This is one of the pre-defined project ideas you can choose for your project.
Recommendation system for MovieLens
Modify your recommendation system from Assignment 1 to use the small MovieLens dataset with 100 000 ratings. You can read about the dataset here.
You are only required to implement user-based collaborative filtering, not item-based, since pre-calculating matching movies would take a very long time.
The dataset can be downloaded at the Datasets page.
A problem with the calculations used in Assignment 1 is that when many users have rated many movies, as in the MovieLens dataset, many movies will get the maximum recommendation score of 5. To make better recommendations, we can make two modifications:
- Only include users with a similarity greater than 0 in the calculations
- Exclude all movies with very few ratings
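
The two modifications above can be sketched as follows. This is a minimal sketch, not the required implementation: it assumes the Assignment 1 score is a similarity-weighted average of other users' ratings and that similarity is an inverted Euclidean distance. The names `euclidean_similarity`, `recommend_scores` and `min_ratings`, and the toy data, are hypothetical.

```python
from collections import defaultdict


def euclidean_similarity(ratings_a, ratings_b):
    """Inverted Euclidean distance over co-rated movies; 0.0 if none are shared."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return 0.0
    dist = sum((ratings_a[m] - ratings_b[m]) ** 2 for m in shared)
    return 1.0 / (1.0 + dist)


def recommend_scores(ratings, user, similarity, min_ratings=1):
    """Return {movie: weighted score} for movies that `user` has not rated."""
    # Count how many ratings each movie has in the whole dataset.
    rating_counts = defaultdict(int)
    for user_ratings in ratings.values():
        for movie in user_ratings:
            rating_counts[movie] += 1

    weighted_sum = defaultdict(float)
    similarity_sum = defaultdict(float)
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # modification 1: skip users with similarity 0 or less
        for movie, rating in other_ratings.items():
            if movie in ratings[user]:
                continue  # already seen by the user
            if rating_counts[movie] < min_ratings:
                continue  # modification 2: skip movies with too few ratings
            weighted_sum[movie] += sim * rating
            similarity_sum[movie] += sim
    return {m: weighted_sum[m] / similarity_sum[m] for m in weighted_sum}


# Hypothetical toy data in the form {user_id: {movie: rating}}.
ratings = {
    1: {"A": 5.0, "B": 3.0},
    2: {"A": 5.0, "B": 3.0, "C": 4.0, "D": 5.0},
    3: {"A": 4.0, "C": 2.0},
    4: {"X": 1.0},  # shares no movie with user 1, so similarity is 0
}

print(recommend_scores(ratings, 1, euclidean_similarity, min_ratings=1))
print(recommend_scores(ratings, 1, euclidean_similarity, min_ratings=2))
```

With `min_ratings=1`, movie D gets the maximum score of 5 from a single rating; raising the threshold to 2 removes it from the result.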
- Use the same Recommendation System you developed for Assignment 1 with the MovieLens dataset
- Add code for storing the number of ratings each movie has
- Modify the score calculation to exclude movies with few ratings
- It shall be possible to set the minimum number of ratings in the client GUI
- If you set the minimum number of ratings to 1, you will get lots of results with the maximum score of 5
- To improve the results you shall:
1) round the score to 4 decimals
2) sort the result list by score (highest first)
3) if two results have equal score, sort by number of ratings (highest first)
- You must show the number of ratings for each movie in the result list in your client GUI
- Measure the time it takes to find the top 5 recommended movies for a user (try for example user 256)
- Build a cache for similarity calculations to avoid calculating similarity between two users more than once
- How much does the cache improve execution times when finding the top 5 recommended movies?
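
The ranking rules, the similarity cache and the time measurement from the list above can be sketched as follows. This is one possible approach under stated assumptions: user ids are integers, the similarity measure is symmetric, and all names (`rank_results`, `SimilarityCache`, `timed`, `overlap_similarity`) and the toy data are hypothetical.

```python
import time


def rank_results(scores, rating_counts, top_n=5):
    """Round scores to 4 decimals, then sort by score and rating count (both descending)."""
    rows = [(movie, round(score, 4), rating_counts[movie])
            for movie, score in scores.items()]
    rows.sort(key=lambda row: (-row[1], -row[2]))
    return rows[:top_n]


class SimilarityCache:
    """Wraps a symmetric similarity function and stores one result per user pair."""

    def __init__(self, similarity):
        self._similarity = similarity
        self._cache = {}
        self.computed = 0  # number of real similarity calculations

    def __call__(self, ratings, user_a, user_b):
        # sim(a, b) == sim(b, a), so both orders share one cache entry.
        key = (min(user_a, user_b), max(user_a, user_b))
        if key not in self._cache:
            self._cache[key] = self._similarity(ratings[user_a], ratings[user_b])
            self.computed += 1
        return self._cache[key]


def timed(func, *args):
    """Return (result, elapsed seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Hypothetical toy data; a real run would use the MovieLens ratings.
ratings = {1: {"A": 5.0}, 2: {"A": 4.0}, 3: {"A": 1.0}}


def overlap_similarity(a, b):
    """Stand-in similarity: number of co-rated movies, scaled into [0, 1)."""
    shared = len(a.keys() & b.keys())
    return shared / (1.0 + shared)


cached = SimilarityCache(overlap_similarity)
_, first = timed(cached, ratings, 1, 2)   # calculated and stored
_, second = timed(cached, ratings, 2, 1)  # served from the cache
print(cached.computed, first, second)

print(rank_results({"A": 5.0, "B": 4.99996, "D": 3.14159, "C": 5.0},
                   {"A": 3, "B": 40, "C": 12, "D": 7}))
```

Note that a single top 5 query for one user compares that user with every other user exactly once, so a cache like this is likely to pay off mainly when recommendations are requested repeatedly or for several users. Measuring both cases helps answer the last question.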