Recommendation-System-with-ListenBrainz-Dataset

Project Overview:

This project focuses on building recommendation systems using the ListenBrainz Dataset, which contains 21,684 users and 695 million playtime records. The dataset is divided into three tables: tracks, interactions, and users, with our analysis centered on the tracks and interactions datasets.

Data Preparation:

We worked with a smaller version of the dataset from 2018, joined the tracks and interactions datasets on user_id, and cleaned the data by replacing missing values and filtering out tracks with fewer than 10 plays.
To avoid the cold start problem, we partitioned the dataset by user and split it into training (70%) and validation (30%) sets using window functions.

Modeling and Evaluation:

Alternating Least Squares (ALS) Model:
- Implemented using PySpark, the ALS model learns latent factors for users and tracks based on listen counts.
- We tuned three parameters—rank, regParam, and alpha—and found that rank=200, regParam=0.1, and alpha=1 provided the best results with a validation MAP of 0.061372 and precision at 100 of 0.129533.
- The model was applied to the test set, achieving a MAP of 0.05311.
LightFM Model:
- We used the LightFM Python library for collaborative filtering, focusing on improving performance by incorporating metadata.
- The LightFM model, using the same dataset partitioning, achieved higher precision at k=100 compared to the ALS model with faster training times, making it more ideal for smaller datasets.

Key Metrics:

Mean Average Precision (MAP): Evaluates the order of recommended items.
Precision: Measures the accuracy of recommendations.
RMSE: Indicates the fit of the ALS model to the training data.

Findings:

The ALS model performed well on smaller datasets, but LightFM provided slightly better precision with faster training.
LightFM may be more suitable for smaller datasets, while ALS might be better for larger datasets, though this was not tested due to resource limitations.

Conclusion: This project demonstrates the application of collaborative filtering using ALS and LightFM models on the ListenBrainz dataset, highlighting the trade-offs between model complexity and computational efficiency.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
Recommendation_System_Final_Project_Report.pdf		Recommendation_System_Final_Project_Report.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Recommendation-System-with-ListenBrainz-Dataset

About

Releases

Packages

ireneli393/Music-Recommendation-System-with-ListenBrainz-Dataset

Folders and files

Latest commit

History

Repository files navigation

Recommendation-System-with-ListenBrainz-Dataset

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages