Case Study: Popular Artists Peak Ages using Spotify API and MusicBrainz

To see the full project, use this GitHub link.


Kyr Nastahunin
Malika Yelyubayeva
Mykyta Paroviy
Nicholas Rachfal
Yelizaveta Semikina

Description of the project

This project looks at the top music charts in the United States over the last 12 years (2010–2021). It focuses on determining the age at which music artists peak and on the factors that contribute to it. In the Washington Post article "When you will most likely hit your creative peak, according to science", Christopher Ingraham examines when artists of all kinds reach their creative peak. According to Ingraham, the average peak age is around 30–35 years old. However, we argue that the situation is different for performing artists such as individual musicians, with most careers peaking in the early and mid-twenties. Data from Spotify and MusicBrainz helps us find the most popular artists over that period, their popularity, and their birth dates.


Spotify API
MusicBrainz API

Technology used:

  • Python 3 and its libraries
  • Jupyter Notebook

Importing Python libraries

import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import seaborn as sns
import pandas as pd
import numpy as np
%matplotlib inline
import sklearn
import time
import nltk
import final_report
import ml_age

Data wrangling

To save time on data cleaning, we decided to acquire our data through Spotify’s API.

The API keys are stored in the project root directory in a file named ‘api_key.txt’. The file must contain two lines: the Client ID on the first and the Client Secret on the second. This project accesses the Spotify API through a wrapper library instead of writing manual requests.

client_id, client_secret = final_report.read_api_key()
spotify = final_report.initiate_spotify_api(client_id, client_secret)
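The article does not show how final_report.read_api_key is implemented. As a rough sketch only, a helper that reads the two-line key file described above could look like this. The default path and the error handling are assumptions made for illustration, not the project's actual code.

```python
from pathlib import Path


def read_api_key(path="api_key.txt"):
    """Return (client_id, client_secret) from a two-line text file.

    Hypothetical stand-in for final_report.read_api_key.
    """
    # Ignore blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError(f"{path} must contain the Client ID and the Client Secret on separate lines")
    return lines[0], lines[1]
```

Keeping the credentials in a separate file like this also makes it easy to exclude them from version control.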

Using the API, we can now retrieve the Top 100 songs of each year by passing the playlist IDs as parameters to the Spotify Playlist endpoint. The response contains a lot of information that is irrelevant to our research, so we drop the unnecessary columns and add columns of our own that specify the year in which the song charted and its chart position. As a result, we get a dataframe of the most popular artists and their most popular songs from the last 12 years, containing 1150 songs (2020 only had a top 50 chart).

playlist_ids = ['37i9dQZF1DXc6IFF23C9jj', '37i9dQZF1DXcagnSNtrGuJ', '37i9dQZF1DX0yEZaMOXna3', '37i9dQZF1DX3Sp0P28SIer',
'37i9dQZF1DX0h0QnLkMBl4', '37i9dQZF1DX9ukdrXQLJGZ', '37i9dQZF1DX8XZ6AUo9R4R', '37i9dQZF1DWTE7dVUebpUW',
'37i9dQZF1DXe2bobNYDtW8', '37i9dQZF1DWVRSukIED0e9', '37i9dQZF1DX7Jl5KP2eZaS', '37i9dQZF1DX7EqpAEG8F4f']
top_charts = final_report.get_top_playlists(playlist_ids, spotify)
print("Total songs: " + str(len(top_charts.index)))
The output of the above code
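To illustrate the "drop what we don't need, add year and chart position" step, here is a minimal sketch that flattens playlist items into a dataframe. It assumes each item holds a "track" object with "name", "popularity", and "artists" fields, as in Spotify playlist responses. The function name, the "chart_position" column name, and the sample data are made up for the example, so the real get_top_playlists may differ.

```python
import pandas as pd


def playlist_items_to_frame(items, chart_year):
    """Flatten playlist items into one row per track (hypothetical helper)."""
    rows = []
    # The playlist order is used as the chart position, starting at 1.
    for position, item in enumerate(items, start=1):
        track = item["track"]
        rows.append({
            "name": track["name"],
            "artists": [artist["name"] for artist in track["artists"]],
            "popularity": track["popularity"],
            "chart_year": chart_year,
            "chart_position": position,
        })
    return pd.DataFrame(rows)


# Invented sample items, shaped like the assumed response.
sample_items = [
    {"track": {"name": "Song A", "popularity": 80, "artists": [{"name": "Artist X"}]}},
    {"track": {"name": "Song B", "popularity": 65,
               "artists": [{"name": "Artist X"}, {"name": "Artist Y"}]}},
]
sample_chart = playlist_items_to_frame(sample_items, chart_year=2010)
```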

Now that we have the names of the most popular artists, we need to find their dates of birth. We discovered a Python library called ‘musicbrainzngs’ that retrieves all kinds of music metadata from the MusicBrainz database. We obtain the artists’ birth dates by extracting the artist names from our Spotify dataframe, dropping any duplicates, and querying those names with the musicbrainzngs ‘search_artists’ function, which returns a dictionary that contains the artist’s birth date. After extracting the birth dates, we calculate each artist’s current age and add it to the dataframe. However, we had difficulty calculating an age for artists that are part of a group, so we decided to drop all groups.

artists = final_report.get_unique_artists(top_charts)
artists = final_report.get_dobs(artists) # no dob for bands
artists = artists.dropna()
artists['age'] = artists['dob'].apply(final_report.calculate_age)
The output of the code above
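The age calculation itself is not shown in the article. One possible sketch, assuming the date of birth arrives as a full 'YYYY-MM-DD' string, is below. The optional today parameter is an addition for testability, and partial dates (MusicBrainz can return only a year, for instance) would need extra handling that is left out here.

```python
from datetime import date


def calculate_age(dob, today=None):
    """Return the age in whole years for a 'YYYY-MM-DD' date of birth.

    Hypothetical stand-in for final_report.calculate_age.
    """
    today = today or date.today()
    born = date.fromisoformat(dob)
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)
```

Passing a song's release date as today would give the artist's age at release rather than their current age.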

Exploratory data analysis (EDA)

Our data comes from two sources and is combined into one when needed. The majority of the data comes from Spotify. After dropping the unnecessary columns, we get the following data:

Each entry in this table is an individual song that was featured in the top charts in one of the years in our exploration window. Each song has the following data: list of artists, name of the song, popularity, the year it was featured in the top charts, and its chart position. We are interested in the popularity feature, which allows us to estimate how popular an artist was in a specific year. However, there is no historical popularity data, only the current value.

Each entry in this table is an artist who appeared in the top charts from 2010–2021. Each artist has the following data: a date of birth and their current age. The age is a temporary feature that we’ll only use for EDA, since we will need to calculate the artist’s age at the release of each song.

artists.drop(artists[artists.age < 15].index, inplace=True)
print("The average age in the charts is " + str(artists['age'].mean()))
print("The max age in the charts is " + str(artists['age'].max()) + " and the minimum age is " + str(artists['age'].min()))
print("The standard deviation is " + str(artists['age'].std()) + " and variance is " + str(artists['age'].var()))

As part of our exploratory data analysis, we became interested in which artists featured in the charts the most over the last 12 years. We aggregated our top_charts dataframe by artist and plotted the number of appearances. As expected, there aren’t many young artists here, since their careers have been shorter than those of older artists. In fact, most of these artists are in their 30s now.

charts = top_charts.copy()
charts = charts.explode('artists')
final_report.plot_most_frequented_with_age(charts, artists)
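Since each song stores a list of artists, the explode step is what makes per-artist counting possible. A small sketch on invented data shows the idea. The counting with value_counts is an assumption about what the plotting helper does internally.

```python
import pandas as pd

# Toy chart data: the 'artists' column holds a list per song.
toy_charts = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "artists": [["Artist X"], ["Artist X", "Artist Y"], ["Artist Y", "Artist Z"]],
})

# explode() gives every list element its own row and repeats the other columns.
per_artist = toy_charts.explode("artists")

# Number of chart entries per artist, most frequent first.
appearances = per_artist["artists"].value_counts()
```

A song with two credited artists counts once for each of them, which is worth keeping in mind when reading the plot.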

To get a better understanding of when music stars reach the end of their careers, we took our top_charts dataframe and added two attributes, ‘age group’ and ‘release date’, to aggregate artists by their age group and the year they released their track. We defined our age groups in five-year bands: 20–24 is the early twenties, 25–29 the late twenties, and so on. We can see that by the late thirties only a small number of songs are released and featured in the top charts, as these artists either stopped making songs or their songs no longer reached the top charts. So in many cases, music stars reach the end of their careers after the age of 35.

charts = top_charts.copy()
final_report.plot_age_groups(charts, artists)
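One way to assign the five-year age groups described above is pandas' cut function. This is only a sketch: the bin edges outside the twenties, the label names other than the twenties, and the sample ages are assumptions for illustration.

```python
import pandas as pd

# Invented ages at release.
release_ages = pd.Series([19, 22, 24, 25, 29, 31, 36, 41])

# Left-closed bins: [15, 20), [20, 25), [25, 30), ...
bins = [15, 20, 25, 30, 35, 40, 45]
labels = ["Late Teens", "Early Twenties", "Late Twenties",
          "Early Thirties", "Late Thirties", "Early Forties"]
age_group = pd.cut(release_ages, bins=bins, labels=labels, right=False)

# Number of songs per age group.
songs_per_group = age_group.value_counts()
```

Setting right=False matters here: it places a 25-year-old in the late twenties (25–29) rather than the early twenties.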

Furthermore, to help determine the artists’ peak ages, we looked at all the songs that were released in the last 12 years and plotted the ages at which their artists released them. We can see that the most common age is 26, with the counts falling off on both sides of that age. However, more songs were released after the age of 26 than before.

charts = top_charts.copy()
final_report.plot_chart_aged(charts, artists)

Now let’s look at the peak ages of the artists. First, let’s keep only the artists who were featured in the charts at least 3 times, so that we don’t analyze artists with too little data. Then we find each artist’s most popular track of all time and drop the rest.

df = pd.read_csv('tracks.csv')
# Keep only chart years before 2021
until_2021 = df[df['chart_year'] < 2021]
# Count chart entries per artist and keep artists with at least 3
names = until_2021[['artist_name']]
names = names.pivot_table(columns=['artist_name'], aggfunc='size').reset_index().rename(columns={0: 'count'})
names = names[names['count'] > 2]

until_2021 = until_2021.merge(names, on='artist_name').drop(columns=['count'])
# Find each artist's highest popularity score, then keep only the tracks that match it
most_popular_songs = until_2021.groupby('artist_name')['popularity'].max().to_frame()
most_popular_songs = until_2021.merge(most_popular_songs,
                                      on=['artist_name', 'popularity']).drop_duplicates(subset=['artist_name', 'name'])

Now let’s see the distribution of age and popularity, considering only the most popular song for each artist.

# make a 2D scatter plot of popularity against release age
figure(figsize=(5, 5), dpi=80)
plt.scatter(x=most_popular_songs['release_age'], y=most_popular_songs['popularity'])
plt.xlabel("Age", labelpad=14)
plt.ylabel("Song popularity", labelpad=14)
plt.title("There's still a significant number of artists that made their most popular song after 30")

Machine Learning

Using Spotify’s audio features for each track, we decide which artist age category the track belongs to. The categories are defined the same way as in the age-group graph, e.g. Early Twenties, Late Twenties, …


Predict the popularity using audio features, the collab feature, and genre.

import ml_pca

projectionsDf, y_data =

Trying a linear regression model with the 2 best features resulting from the PCA above.

ml_pca.additional(projectionsDf, y_data)
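The ml_pca module is not shown in the article, so the following is only a generic sketch of the technique named above: projecting features onto two principal components and fitting a linear regression on them with scikit-learn. The data is synthetic, and the feature count, coefficients, and variable names are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for audio features (200 tracks, 6 features) and popularity.
X = rng.normal(size=(200, 6))
y = 50 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2.0, size=200)

# Standardize first so that no single feature dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# Keep the two directions of largest variance.
pca = PCA(n_components=2)
projections = pca.fit_transform(X_scaled)

# Ordinary least squares on the two projected features.
model = LinearRegression().fit(projections, y)
r2 = model.score(projections, y)
```

Note that PCA picks components by variance in the features, not by how well they predict popularity, so the first two components are not guaranteed to be the most predictive ones.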

Now let’s perform a hypothesis test to confirm at what age artists actually peak. Our team voted and decided that the peak age for artists would probably be around 27. Let’s test it!

Null hypothesis: the majority of popular music performers reach their creative peak at 27 years old.

Alternative hypothesis: the majority of popular music performers reach their creative peak after 27 years old.

# one-sample t-test for our hypothesis
# one-tailed critical t-values for 120 degrees of freedom, keyed by alpha
significance = {
    0.05: 1.658,
    0.025: 1.980,
}
hypothesis_mean = 27.0
sample_mean = most_popular_songs['release_age'].mean()
standard_deviation = most_popular_songs['release_age'].std()
sample_size = most_popular_songs.shape[0]
print("Sample size: " + str(sample_size))
t = (sample_mean - hypothesis_mean) / (standard_deviation / np.sqrt(sample_size))
print("T-value: " + str(t))
print("At significance-level 0.05 the null hypothesis is " + ("not " if t < significance[0.05] else "") + "rejected")
print("At significance-level 0.025 the null hypothesis is " + ("not " if t < significance[0.025] else "") + "rejected")
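The same t-statistic can be wrapped in a small reusable function. This is a sketch with invented ages, and the function name is made up for illustration. It uses NumPy with ddof=1 so the result matches the pandas .std() call used above.

```python
import numpy as np


def one_sample_t(values, hypothesis_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # ddof=1 gives the sample standard deviation, matching pandas' default .std().
    standard_error = values.std(ddof=1) / np.sqrt(n)
    t = (values.mean() - hypothesis_mean) / standard_error
    return t, n - 1


# Invented release ages: mean 30, sample standard deviation sqrt(10).
t_value, dof = one_sample_t([26, 28, 30, 32, 34], hypothesis_mean=27.0)
```

Returning the degrees of freedom alongside t is a reminder that the critical value has to be looked up for the actual sample size. The table values above apply only when the sample has roughly 120 degrees of freedom.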


We were able to disprove the statistics given by the Washington Post article. We learned that the peak age for music artists is about 26–29 years old. However, there is still a significant number of artists who peak earlier or later. We also learned that the careers of musical artists rarely survive past 35. The majority of them either stop making new songs, or their songs rarely reach the top charts. Current starlets might use this information to plan their careers and see what they can expect in the future.



