Two student researchers at the University of San Francisco (USF) recently tried to predict Billboard hits using machine-learning models. In their study, posted as a preprint on arXiv, they trained four models on song data extracted with the Spotify Web API, then evaluated how well each model predicted which songs would become hits.

The two students, Kai Middlebrook and Kian Sheik, used the Spotify Web API to collect data for 1.8 million songs, including features such as each song’s tempo, key and valence. They also collected approximately 30 years’ worth of data from the Billboard Hot 100 chart. They then trained four models: logistic regression, a random forest (RF), a support vector machine (SVM) and a neural network. Of the four, the logistic regression model is the easiest to interpret, while the neural network is the hardest. The researchers carried out a series of evaluations to test how well each model could predict Billboard hits. They found that the SVM achieved the highest precision (99.53%), while the random forest model attained the best accuracy (88%) and recall (85.51%).
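The three metrics reported above measure different things, which is why one model can lead on precision while another leads on accuracy and recall. The minimal Python sketch below shows how each is computed from binary hit/non-hit predictions. The function name `hit_metrics` and the toy labels are invented for illustration; they are not taken from the study's code or data.

```python
from typing import Dict, Sequence


def hit_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision and recall for binary labels (1 = hit, 0 = non-hit)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    total = tp + fp + fn + tn

    return {
        # Share of all songs labeled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted hits that really were hits.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real hits that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented toy data: four real hits, four non-hits.
# This cautious model predicts "hit" only twice, and is right both times.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]

metrics = hit_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the model never raises a false alarm, so its precision is perfect, yet it misses half of the real hits, so its recall is only 50%. A model that rarely predicts "hit" can therefore score very high on precision without being the best overall.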

According to Sheik, if companies and producers start using algorithms to make artistic decisions, these models should be designed in a way that does not stunt the progress of art. The architectures developed by the two researchers at USF, however, are not yet able to achieve this.
