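Two operators are needed to realize a recommender system with Odysseus: RECOMMENDATION_LEARN, which learns recommendation models from a stream of ratings, and RECOMMENDATION, which uses these models to produce recommendations for a stream of users.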
This example uses the MovieLens 100k dataset (ml-100k).
The file u_ordered.data is ordered by timestamp. This is not required, but it allows implementations to take advantage of temporal effects, e.g. concept drift.
The file unique_temporal_ordered_users.data contains only the user column of u_ordered.data, with duplicates removed.
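The two input files can be derived from the original MovieLens ratings file. The following Python sketch shows one possible way to do this. It is not part of Odysseus, and it rests on assumptions: the source file is the tab-separated u.data of ml-100k (user id, item id, rating, timestamp), and "duplicates are removed" means each user id is kept at its first occurrence in timestamp order. The function names are hypothetical.

```python
import csv


def order_by_timestamp(rows):
    """Return rating rows sorted by the timestamp column (index 3).

    sorted() is stable, so rows with equal timestamps keep their input order.
    """
    return sorted(rows, key=lambda row: int(row[3]))


def unique_users(ordered_rows):
    """Return user ids in order of first appearance, duplicates removed."""
    # A dict preserves insertion order, so the temporal order is kept.
    return list(dict.fromkeys(row[0] for row in ordered_rows))


def write_datasets(source, ordered_target, users_target):
    """Read the tab-separated ratings file and write both derived files."""
    with open(source, newline="") as f:
        rows = [row for row in csv.reader(f, delimiter="\t") if row]
    ordered = order_by_timestamp(rows)
    with open(ordered_target, "w", newline="") as f:
        csv.writer(f, delimiter="\t", lineterminator="\n").writerows(ordered)
    with open(users_target, "w", newline="") as f:
        f.writelines(user + "\n" for user in unique_users(ordered))
```

Calling write_datasets with the paths of u.data, u_ordered.data and unique_temporal_ordered_users.data (in that order) would produce both files in the tab-separated layout that the CSV protocol handler below reads.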
#PARSER CQL
#RUNQUERY
CREATE STREAM ml100k (userid Integer, itemid Integer, rating Double, timestamp Long)
    WRAPPER 'GenericPull'
    PROTOCOL 'CSV'
    TRANSPORT 'File'
    DATAHANDLER 'Tuple'
    OPTIONS (
        'filename' '${PROJECTPATH}/datasets/ml-100k/u_ordered.data',
        'delimiter' '\t',
        'scheduler.delay' '100'
    )

#RUNQUERY
CREATE STREAM ml100k_users (userid Integer)
    WRAPPER 'GenericPull'
    PROTOCOL 'CSV'
    TRANSPORT 'File'
    DATAHANDLER 'Tuple'
    OPTIONS (
        'filename' '${PROJECTPATH}/datasets/ml-100k/unique_temporal_ordered_users.data',
        'delimiter' '\t',
        'scheduler.delay' '1000'
    )

#PARSER PQL
#ADDQUERY
recommendationModels = RECOMMENDATION_LEARN({
        item = 'itemid',
        user = 'userid',
        rating = 'rating',
        learner = 'Mahout',
        options = [
            'OptionRecommender'='SVDRecommender',
            'OptionFactorizer'='SVDPlusPlusFactorizer'
        ]
    }, ml100k)

#ADDQUERY
recommendations = RECOMMENDATION({
        recommender = 'recommender',
        user = 'userid',
        no_of_recommendations = 5
    }, ml100k_users, recommendationModels)