B.Tech Semester III | Machine Learning Project

🎵 Song Popularity Analysis

Anoushka | February 2026

Final Results

  • Tracks Analyzed: 114K+
  • XGBoost Accuracy: 77.5%
  • Peak ROC-AUC: 0.859
  • Popular Song Recall: 84%

✨ Technical Highlights

🔧 Engineered Features

Engineered two interaction features, Vocal Intensity and Energy Balance, to capture how individual audio attributes combine.
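
The poster does not give the formulas behind these two features, so the sketch below is only an illustration of what such interaction features could look like. The input names (energy, acousticness, instrumentalness) and both formulas are assumptions, not the project's actual definitions.

```python
import numpy as np

def add_interaction_features(energy, acousticness, instrumentalness):
    """Return two illustrative interaction features.

    Both definitions are hypothetical; inputs are assumed to lie in [0, 1].
    """
    energy = np.asarray(energy, dtype=float)
    acousticness = np.asarray(acousticness, dtype=float)
    instrumentalness = np.asarray(instrumentalness, dtype=float)

    # Assumed definition: vocal presence (1 - instrumentalness) scaled by energy.
    vocal_intensity = (1.0 - instrumentalness) * energy

    # Assumed definition: 1.0 when energy and acousticness are equal,
    # falling toward 0.0 as they move apart.
    energy_balance = 1.0 - np.abs(energy - acousticness)

    return vocal_intensity, energy_balance
```

Products and differences like these give a tree model a ready-made signal that it would otherwise need several splits to approximate.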

🤖 Advanced Ensemble

Moved beyond the linear baseline (Logistic Regression) to XGBoost, which captures non-linear relationships and lifts ROC-AUC from 0.621 to 0.859.

🔍 Explainable AI

Used SHAP Analysis to demystify the "black box," visualizing how each feature pulls the prediction toward popularity.
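
To make the idea of a feature "pulling" a prediction concrete, here is a minimal sketch that computes exact Shapley values for a toy scorer by brute force. It is not the project's code: `toy_model`, the inputs, and the baseline are hypothetical, and SHAP tooling for tree models relies on much faster tree-specific algorithms rather than enumerating every subset.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical non-linear scorer standing in for a trained model.
def toy_model(f):
    return 0.5 * f[0] + 0.3 * f[1] * f[2]

x = [0.9, 0.8, 0.5]          # the track being explained (made-up values)
baseline = [0.5, 0.5, 0.5]   # reference input (made-up values)
phi = shapley_values(toy_model, x, baseline)
```

The contributions in `phi` sum to the gap between the prediction for `x` and the prediction for the baseline, which is the property that lets a SHAP plot decompose a single prediction feature by feature.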

🎯 Precision Tuning

Set the decision threshold to 0.4436, below the conventional 0.5, to optimize the F1-Score and catch more potential hits.
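
One common way to arrive at a tuned threshold like this is a simple sweep: score each candidate cut-off by F1 on held-out validation predictions and keep the best. The sketch below assumes that approach; the function name and the candidate grid are illustrative, and the poster does not say how 0.4436 was actually found.

```python
import numpy as np

def best_f1_threshold(y_true, y_prob, thresholds=None):
    """Return (threshold, f1) for the candidate cut-off with the highest F1."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if thresholds is None:
        # Assumed candidate grid; a finer grid gives a more precise cut-off.
        thresholds = np.linspace(0.05, 0.95, 91)

    best_t, best_f1 = 0.5, -1.0
    for t in thresholds:
        y_pred = y_prob >= t
        tp = np.sum(y_pred & y_true)
        fp = np.sum(y_pred & ~y_true)
        fn = np.sum(~y_pred & y_true)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1
```

The sweep should run on validation data rather than the final test set, otherwise the reported F1 is optimistically biased.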

🏆 Model Leaderboard

Model                 Accuracy  ROC-AUC  F1-Score  Stability
XGBoost (Champion)    0.775     0.859    0.780*    +/- 0.006
Gradient Boosting     0.766     0.850    0.760     +/- 0.007
Random Forest         0.761     0.839    0.771     +/- 0.010
Logistic Regression   0.577     0.621    0.590     Baseline

* Scores validated via 5-Fold Stratified Cross-Validation for robustness.
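
As a rough illustration of what stratified 5-fold validation involves, the sketch below assigns fold ids so that every fold keeps about the same share of popular tracks, then summarizes per-fold scores as a mean and a spread. It assumes the Stability column reports the spread of scores across folds; the helper names are hypothetical, and scikit-learn's `StratifiedKFold` is the usual off-the-shelf choice in practice.

```python
import numpy as np

def stratified_fold_ids(y, n_splits=5, seed=0):
    """Assign each sample a fold id so every fold keeps roughly the class ratio."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_ids = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        # Deal the shuffled members of this class round-robin across the folds.
        fold_ids[idx] = np.arange(len(idx)) % n_splits
    return fold_ids

def summarize(scores):
    """Mean and standard deviation of per-fold scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std()
```

Each fold serves once as the validation set while the other four train the model; a small spread across folds suggests the score does not hinge on one lucky split.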

🔑 The "Hit" Ingredients

What Drives Popularity?

  • Genre Encodings: The strongest predictor of initial popularity potential.
  • Emotional Valence: Listeners prefer positive, upbeat tones over somber ones.
  • Danceability: Rhythmic consistency remains a staple of chart-topping tracks.
  • Energy Balance: Success favors a balanced mix of organic and electronic production.
  • Strategic Length: Modern streaming favors brevity; 2.5–3.5 mins is the current sweet spot.