Documentos de Académico
Documentos de Profesional
Documentos de Cultura
242
the Nutch1 open source search engine in the content extract the top K relevant terms from a sliding window
modeling phase, followed by content based filtering as consisting of the last W pages in the current session,
a recommendation strategy. Using a search engine’s and then submit them to the search engine in order to
powerful capabilities can automate and scale the compute recommendation links. We also applied the
crawling and indexing of the learning material; (2) we (CF) approach alone using Association Rules (ARs)
automate the indexing of educational content based on (mined in the offline phase), and we compared the
norms and standards used in e-learning. Thus, our sliding window pages to these ARs in order to find
addition of an index used in LOM (Learning Object relevant recommendations. The Apriori algorithm [1]
Metadata) for learning content (if available) to the was used as the AR mining technique. Finally,
preliminary crawling and indexing phase done by following our initial goals, we included the possibility
Nutch is expected to improve the accuracy of the final to combine both of the recommendation approaches
index, and likewise, improve content search and (CBF and CF) in order to improve the recommendation
recommendations. Finally, we note that a byproduct of quality and generate the most relevant learning objects
our crawling and indexing with Nutch is an available to learners. Hence, two approaches were considered:
interface to use for explicit searches over the material Hybrid content via profile based collaborative filtering
if needed. with cascaded/feature augmentation combination,
which performs collaborative recommendation
2.2. Recommendation phase followed by content recommendation (the reverse
order could also be considered); and Hybrid content
2.2.1. Formulating a learner’s implicit query. This and profile based collaborative filtering with weighted
initial step in the recommendation process consists of combination, where the collaborative filtering and
transforming a new user session into a set of URLs content based filtering recommendations are performed
expressing user preferences and interests in visited simultaneously, then the results of both techniques are
learning resources. This task is accomplished in two combined together to produce a single
phases (1) delimiting the current active learner session, recommendation set [9].
and (2) extracting URLs of interest from this active
session. Since the active user session will be extracted 3. Experimentation and results
from the Web log file, we identify only the log records
representing the last W visited pages in the active user To implement and evaluate the proposed
session, which is called the sliding window. The personalization approach, we used the course
recommendation process will then depend on these repository of the Virtual University of Tunis, which is
pages or on the relevant terms that they contain. On the accessible via the RPL platform2. We conceived a
other hand, to make selected pages more closely system composed by a set of components, where each
express learner interests, a weight can be associated component is performing a number of services. The
with each URL contained in the learner’s session. This main features of the proposed recommender system are
weight can be binary (existence or non-existence of a shown in Figure. 2. Component 1 and Component 2
URL referenced in a session), or it can be computed as represent the input for the recommender system in two
a function of a number of features based essentially on different forms. Component 1 extracts the sliding
the frequency of occurrence of the URL within a window pages (the W last visited pages from the active
session and/or the time that a learner spends on a learner session). These pages are extracted from the
particular page, which could express implicitly the fact Web log file, starting from the time that the learner
that a learner liked or disliked the URL [10], [11]. connected to the platform until he/she asks for
However, since such measures have been found to be recommendations. We consider, here, a fixed window
inaccurate as indicators of the user interest [6], we did size W =3, so that only the last three visited pages may
not consider the URL weight in the present work. affect the recommendation. The second input form is
performed by Component 2 which extracts from
2.2.2. Recommendation process. This task is Component 1’s results (sliding window) the top K
principally based on a content-based filtering approach relevant terms. Each URL of the sliding window is
(CBF) and a collaborative filtering approach (CF). transformed into a set of content terms which are the
Several recommendation strategies based on these most relevant terms characterizing this URL. This task
approaches have been investigated in our work. First, is performed using Nutch’s built-in functionalities for
we applied the (CBF) approach alone using the search parsing HTML pages, and a plug-in for stop word
functionalities of the Nutch search engine. We first elimination. Component 1 and Component 2 are
1 2
http://lucene.apache.org/nutch/ http://cours.uvt.rnu.tn
243
performed in an on-line mode. Component 3 concerns 800000
600000
usage sessions (from which the profiles are to be 500000
log files (Apache web access log files3) in the period 200000
0
Feb Mar A pr May Jun Jul
Tot al
Tot al delet ed
Tot al rem ained
3000
2500
2000
1500
1000
500
0
Feb Mar Apr May Jun Jul
244
N, etc). Figure 5 shows a screenshot explaining the 5. References
obtained recommended links within the RPL platform
based on a cascaded hybrid recommendation strategy. [1] R. Agrawal, and R. Srikant, “Fast Algorithms for Mining
Association Rules”, In Proc. of the 20th Int'l Conference on
Very Large Databases, Santiago, Chile, September 1994.
245