P1 – Recommendation System

This is one of the pre-defined project ideas you can choose for your project.

Recommendation system for MovieLens

Modify your recommendation system from Assignment 1 to use the small MovieLens dataset with 100 000 ratings. You can read about the dataset here.

You are only required to implement user-based collaborative filtering. Item-based filtering is not required, since pre-calculating the matching movies would take a very long time.

The dataset can be downloaded from the Datasets page.

Note! A problem with the calculations used in Assignment 1 is that when many users have rated many movies, as in the MovieLens dataset, many movies get the maximum recommendation score of 5. To make better recommendations, you can make some modifications:
  • Only include users with a similarity greater than 0 in the calculations.
  • Exclude all movies with very few ratings.
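
The two modifications above can be sketched as follows. This is a minimal sketch, assuming that Assignment 1 computed each score as a similarity-weighted average of other users' ratings; that assumption, the function names, the nested-dictionary data layout and the Euclidean stand-in similarity are all illustrative and should be replaced with whatever your own solution uses.

```python
from collections import defaultdict


def euclidean_similarity(a, b):
    """Stand-in similarity (an assumption): 1 / (1 + squared distance) over shared movies."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return 1.0 / (1.0 + sum((a[m] - b[m]) ** 2 for m in shared))


def weighted_scores(user, ratings, rating_counts, min_ratings=1,
                    similarity=euclidean_similarity):
    """Return {movie_id: score} for movies that `user` has not rated yet.

    ratings:       {user_id: {movie_id: rating}}
    rating_counts: {movie_id: number of ratings}
    """
    weighted_sum = defaultdict(float)
    similarity_sum = defaultdict(float)
    seen = ratings[user]
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = similarity(seen, other_ratings)
        if sim <= 0:
            # Modification 1: skip users whose similarity is not above 0.
            continue
        for movie, rating in other_ratings.items():
            if movie in seen:
                continue
            if rating_counts.get(movie, 0) < min_ratings:
                # Modification 2: skip movies with too few ratings.
                continue
            weighted_sum[movie] += sim * rating
            similarity_sum[movie] += sim
    # Every movie in weighted_sum has similarity_sum > 0, so the division is safe.
    return {m: weighted_sum[m] / similarity_sum[m] for m in weighted_sum}
```

Without the second filter, a movie rated 5 by a single similar user ends up with a score of exactly 5, which is why so many movies reach the maximum.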

Grading

Grade   Requirements
E
  • Use the same Recommendation System you developed for Assignment 1 with the MovieLens dataset.
  • Add code for storing the number of ratings each movie has.
  • Modify the score calculation to exclude movies with few ratings.
  • It must be possible to set the minimum number of ratings in the client GUI.
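
Storing the number of ratings per movie can be done while the ratings are loaded. The sketch below assumes the ratings file is a CSV with a header row containing `userId`, `movieId` and `rating` columns; check the column names and separator against the file you actually downloaded. The function name `load_ratings` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_ratings(lines):
    """Read rating rows and return (ratings, rating_counts).

    ratings:       {user_id: {movie_id: rating}}
    rating_counts: {movie_id: number of ratings}
    """
    ratings = defaultdict(dict)
    rating_counts = defaultdict(int)
    for row in csv.DictReader(lines):
        user, movie = int(row["userId"]), int(row["movieId"])
        ratings[user][movie] = float(row["rating"])
        rating_counts[movie] += 1  # one more rating for this movie
    return dict(ratings), dict(rating_counts)


# Made-up sample rows; with a real file, pass open(path, newline="") instead.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,31,2.5,1260759144\n"
    "1,1029,3.0,1260759179\n"
    "2,31,4.0,835355493\n"
)
ratings, rating_counts = load_ratings(sample)
```

The minimum number of ratings set in the client GUI can then be compared against `rating_counts` when scores are calculated.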
C-D
  • If you set the minimum number of ratings to 1, you will get many results with the maximum score of 5.
  • To improve the results, you shall:
    1. round the score to four decimal places
    2. sort the result list by score (highest first)
    3. if two results have equal scores, sort them by number of ratings (highest first).
  • You must show the number of ratings for each movie in the result list in your client GUI.
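
The rounding and sorting steps above could look like the sketch below. The function name `rank_results` and the two input dictionaries are hypothetical; the scores are rounded before sorting so that movies whose rounded scores are equal are ordered by number of ratings.

```python
def rank_results(scores, rating_counts, top_n=5):
    """Return up to top_n rows of (movie_id, rounded_score, number_of_ratings).

    scores:        {movie_id: score}
    rating_counts: {movie_id: number of ratings}
    """
    rows = [
        (movie, round(score, 4), rating_counts[movie])
        for movie, score in scores.items()
    ]
    # Highest score first; ties are broken by highest number of ratings.
    rows.sort(key=lambda row: (-row[1], -row[2]))
    return rows[:top_n]
```

Each returned row carries the number of ratings, so the client GUI can show it next to the score.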
A-B
  • Measure the time it takes to find the top five recommended movies for a user (try, for example, user 256).
  • Build a cache for similarity calculations to avoid calculating the similarity between two users more than once.
  • How much does the cache improve execution times when finding the top five recommended movies?
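
One possible shape for the cache and the time measurement is sketched below. It assumes the similarity measure is symmetric, so that the pair (a, b) can share one entry with (b, a), and that the ratings do not change while the cache is in use. The class name `SimilarityCache` and the helper `timed` are illustrative and not part of any provided code.

```python
import time


class SimilarityCache:
    """Remember similarity values per user pair (assumes a symmetric measure)."""

    def __init__(self, similarity):
        self._similarity = similarity  # function taking two rating dicts
        self._cache = {}
        self.computed = 0  # how many similarities were actually calculated

    def get(self, ratings, a, b):
        # Order the pair so that (a, b) and (b, a) use the same cache entry.
        key = (a, b) if a <= b else (b, a)
        if key not in self._cache:
            self._cache[key] = self._similarity(ratings[a], ratings[b])
            self.computed += 1
        return self._cache[key]


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed time in seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

A single request for one user computes each pair only once anyway, so the improvement is most likely to show when the same request is repeated or when another user is looked up afterwards. Timing the first and second run separately makes that difference visible.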