itstooerli/song-popularity-predictors

Song Popularity Predictors and Similar Songs

This is a data science project that uses a dataset of over 170,000 songs. We aim to build a playlist of songs with similar characteristics that shuffles songs of differing popularity, improving the likelihood that a user discovers a new song they will enjoy.

Description

To accomplish this goal, our project investigates the following three questions in order:

  1. Can we predict whether a song is popular or not based on its attributes?
  2. Can we predict whether a song is relatively more popular than another based on their attributes?
  3. Can we suggest to a user a new song based on the current song they are listening to?

Navigation

Summary

(I recognize that the facecolor of the plot axes may blend in on dark mode. I will fix this in a future iteration of this README.md. For a closer look, please review the provided report/notebook.)

To answer question 1, we used a voting ensemble classifier with a random forest classifier base estimator with default parameters to predict whether a song is popular or not popular. The model's F1 score was 0.878.

[Figure: classification_rocauc]
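The question 1 setup can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code. The synthetic data from `make_classification` stands in for the song attributes, the ensemble is shown with only the random forest member because the other members are not listed here, and `random_state` is set purely for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the song attribute table and popular/not-popular labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Voting ensemble with a random forest base estimator (default hyperparameters).
# Further (name, estimator) pairs could be appended to the list.
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0))],
    voting="soft",
)
ensemble.fit(X_train, y_train)

# F1 is the harmonic mean of precision and recall on the positive class.
y_pred = ensemble.predict(X_test)
score = f1_score(y_test, y_pred)
print(f"F1 on synthetic data: {score:.3f}")
```

The score printed here reflects the synthetic data only and is not comparable to the 0.878 reported above.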

To answer question 2, we arrived at a gradient boosting regressor with a learning rate of 0.07 and a max depth of 10 to produce a pairwise ranking accuracy of 0.82. The feature importance is given below.

[Figure: feature_importance]
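For readers unfamiliar with the metric used for question 2, one common definition of pairwise ranking accuracy is the fraction of song pairs whose predicted popularity order matches their true order. The sketch below assumes that definition and skips pairs with tied true popularity; the function name and the sample values are hypothetical, and the project's notebook may compute the metric differently.

```python
from itertools import combinations


def pairwise_ranking_accuracy(y_true, y_pred):
    """Fraction of pairs (i, j) ranked in the same order by y_pred as by y_true."""
    correct = 0
    total = 0
    for i, j in combinations(range(len(y_true)), 2):
        true_diff = y_true[i] - y_true[j]
        if true_diff == 0:
            continue  # no defined order for tied true values
        total += 1
        pred_diff = y_pred[i] - y_pred[j]
        if true_diff * pred_diff > 0:
            correct += 1  # same sign means the pair is ordered correctly
    return correct / total if total else float("nan")


# Hypothetical true and predicted popularity for four songs.
true_popularity = [80, 55, 30, 10]
predicted = [70, 60, 20, 25]
accuracy = pairwise_ranking_accuracy(true_popularity, predicted)
print(accuracy)
```

In this toy example only the last two songs are predicted in the wrong order, so 5 of the 6 pairs are ranked correctly.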

To answer question 3, we reduced the dimensionality of the dataset to 2 dimensions with PCA and leveraged k-means clustering to cluster similar songs. From the elbow method, we determined that 4 clusters were most effective at describing the dataset.

[Figure: clustering_pca]
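The question 3 pipeline (PCA to 2 dimensions, elbow method, k-means with 4 clusters) might look roughly like the sketch below. It assumes synthetic blob data in place of the song attributes and standardizes features before PCA, which is an assumption rather than something stated above. `similar_songs` is a hypothetical helper showing how cluster labels could drive a suggestion.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the numeric song attributes.
X, _ = make_blobs(n_samples=400, n_features=8, centers=4, random_state=0)

# Standardize (assumed step), then project onto the first 2 principal components.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Elbow method: record within-cluster sum of squares (inertia) for each k
# and look for the k after which the decrease flattens out.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_2d)
    inertias[k] = model.inertia_

# Final model with the 4 clusters chosen in the summary above.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_2d)
labels = kmeans.labels_


def similar_songs(index, labels):
    """Return indices of other songs in the same cluster as the song at `index`."""
    return [i for i, lab in enumerate(labels) if lab == labels[index] and i != index]


candidates = similar_songs(0, labels)
print(len(candidates), "candidate songs share a cluster with song 0")
```

Plotting `inertias` against k gives the elbow curve; candidates from the same cluster could then be shuffled by popularity to build the playlist.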

Authors

Contributor names

  • Eric Li
  • Wanqin Chen
  • Yuwei Wang

Acknowledgments

Dataset and supplemental material
