Technology has reshaped hip hop and changed the record industry for all genres. The quality of a record has changed repeatedly as the music industry moved from vinyl to cassettes, compact discs and digital music sites. This is the first in a series of pieces that will examine how Spotify and Spotify listeners evaluate 1990s hip-hop. It is a high-level analysis of Spotify’s evaluation system and the Notorious B.I.G discography provided by Spotify. The visualization above lists the ten most popular Notorious B.I.G records and their popularity scores.
- Hypnotize
- Big Poppa
- Juicy
- Mo Money, Mo Problems
- Nasty Girl
- Notorious Thugs
- Who Shot Ya
- Suicidal Thoughts
- Gimme the Loot
- 1970 Something
SPOTIFY DATA
The data was acquired through the Spotify API. The extraction was carried out with Python code provided on Kaggle. The analysis and visualizations were done in R.
The data cleaning process required the removal of features that were not needed for the analysis. Columns “X”, “id” and “uri” were removed from the data set.
The image above provides central tendency summary information for each column. We will analyze central tendency through visualizations but the image also provides the columns in the data frame post cleaning. The data has no NA’s and no further cleaning was done beyond what has been mentioned above.
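The cleaning and summary steps described above can be sketched in a few lines. The original work was done in R; the sketch below uses plain Python for illustration only. The sample rows, the tempo values and the helper names (drop_columns, count_missing, summarize) are assumptions made for this example, not the author's data or code.

```python
import statistics

# Hypothetical sample rows. Only the column names "X", "id" and "uri" and the
# Hypnotize popularity score of 79 come from the article; every other value
# is a made-up placeholder.
rows = [
    {"X": 0, "id": "id-0", "uri": "uri-0", "name": "Hypnotize", "popularity": 79, "tempo": 90.0},
    {"X": 1, "id": "id-1", "uri": "uri-1", "name": "Track B", "popularity": 60, "tempo": 95.0},
    {"X": 2, "id": "id-2", "uri": "uri-2", "name": "Track C", "popularity": 55, "tempo": 100.0},
    {"X": 3, "id": "id-3", "uri": "uri-3", "name": "Track D", "popularity": 40, "tempo": 180.0},
]

COLUMNS_TO_DROP = {"X", "id", "uri"}


def drop_columns(records, columns):
    """Return new records without the named columns."""
    return [{k: v for k, v in rec.items() if k not in columns} for rec in records]


def count_missing(records):
    """Count missing (None) values per column."""
    counts = {}
    for rec in records:
        for key, value in rec.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts


def summarize(records, column):
    """Return min, median, mean and max for one numeric column."""
    values = [rec[column] for rec in records]
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


cleaned = drop_columns(rows, COLUMNS_TO_DROP)
missing = count_missing(cleaned)
tempo_summary = summarize(cleaned, "tempo")

print(sorted(cleaned[0]))  # remaining column names
print(missing)             # missing values per column
print(tempo_summary)       # central tendency for tempo
```

In the placeholder data a single fast track pulls the mean above the median, which is the kind of gap a summary table makes easy to spot.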
MEANING OF SPOTIFY MEASUREMENTS
The album category provides all of the Notorious B.I.G albums on Spotify. Track_number lists the track number in the album. The name category provides all of the record titles within the album. The remaining columns are measurements created by Spotify. Special thanks to The Verge and Towards Data Science for providing the definitions for each category.
Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
Danceability: Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: A measure from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
Instrumentalness: Predicts whether a track contains no vocals. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater the likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
Liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood the track is live.
Speechiness: Detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
Tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
Loudness: The overall loudness of a track in decibels. Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks.
Popularity: Calculated by an algorithm and based, for the most part, on the total number of plays the track has had and how recent those plays are. Songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past.
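Two of the definitions above come with explicit cut-offs: instrumentalness above 0.5 and liveness above 0.8. A minimal sketch of how those thresholds could be applied to a track's measurements follows. The function name interpret and both sample tracks are hypothetical and are not taken from the data set.

```python
# Thresholds quoted in the definitions above.
INSTRUMENTAL_THRESHOLD = 0.5  # values above 0.5 are intended to represent instrumental tracks
LIVE_THRESHOLD = 0.8          # values above 0.8 give a strong likelihood the track is live


def interpret(features):
    """Turn raw Spotify-style scores into simple yes/no flags."""
    return {
        "likely_instrumental": features["instrumentalness"] > INSTRUMENTAL_THRESHOLD,
        "likely_live": features["liveness"] > LIVE_THRESHOLD,
    }


# Made-up measurements for two imaginary tracks (not from the data set).
studio_rap_track = {"instrumentalness": 0.0, "liveness": 0.12}
live_instrumental = {"instrumentalness": 0.9, "liveness": 0.95}

print(interpret(studio_rap_track))
print(interpret(live_instrumental))
```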
Of interest is the Valence category. Valence is defined as “describing musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).” It appears that Spotify measures how the record sounds as opposed to doing NLP and Sentiment Analysis. This is a theory based on the definition and something we will examine in the data.
According to the definition, the popularity value weighs records that are being played now more heavily than records that were played a lot in the past. This means that current artists like Drake are likely to have higher popularity scores than artists like Notorious B.I.G., 2Pac, etc. This provides an area of exploration in the future.
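Spotify does not publish its popularity formula, so the following is only a toy model of the idea that recent plays count for more than old ones. The exponential decay, the 30-day half-life and the function name recency_weighted_plays are all assumptions chosen for illustration; none of them is Spotify's method.

```python
def recency_weighted_plays(play_ages_in_days, half_life_days=30.0):
    """Sum of plays where each play loses half its weight every half_life_days.

    Toy model only: the decay shape and the half-life are assumptions.
    """
    return sum(0.5 ** (age / half_life_days) for age in play_ages_in_days)


# Made-up play histories: many plays a year ago versus fewer plays yesterday.
older_record = [365] * 1000   # 1,000 plays, each one year old
current_record = [1] * 100    # 100 plays, each one day old

older_score = recency_weighted_plays(older_record)
current_score = recency_weighted_plays(current_record)

print(older_score < current_score)
```

Under these assumptions the record with a tenth of the plays still scores higher, because its plays are recent.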
SPOTIFY’S NOTORIOUS B.I.G DISCOGRAPHY
A quick examination of the unique values in the album column provides the Notorious B.I.G discography on Spotify. Nine albums are listed:
- Life After Death 25th Anniversary Super Deluxe Edition
- Music Inspired By Biggie: I Got A Story To Tell
- The King & I
- Duets: The Final Chapter
- Born Again
- Life After Death 2014 Remastered Edition
- Life After Death 2014 Remaster
- Ready to Die The Remaster
- Ready to Die The Remaster: 2015 Remaster
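Listing the unique values of a column is a one-line operation in most tools. A minimal Python sketch is below; the helper name unique_in_order and the short album column are illustrative, with the album titles copied from the list above.

```python
def unique_in_order(values):
    """Return the distinct values, keeping the order of first appearance."""
    return list(dict.fromkeys(values))


# Illustrative album column: one entry per track, so titles repeat.
album_column = [
    "Born Again",
    "Born Again",
    "The King & I",
    "Duets: The Final Chapter",
    "The King & I",
]

albums = unique_in_order(album_column)
print(albums)
print(len(albums))
```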
Biggie has two studio albums: Ready to Die (1994) and Life After Death (1997). He has two collaboration albums: Conspiracy with Junior M.A.F.I.A (not listed in the Spotify dataset) and The King & I with Faith Evans (2017). There are two posthumous compilation albums: Born Again (1999) and Duets: The Final Chapter (2005). The Spotify dataset has the majority of Biggie’s discography. Life After Death and Ready to Die each appear more than once, through the Remaster versions and the Super Deluxe Edition of Life After Death. Because of the duplicate albums there are duplicate records. The purpose of this analysis is to focus on how Spotify evaluates the entire discography. We are looking for patterns and correlations between the Spotify variables. The main question we are trying to answer is: what is the relationship between popularity and all the other variables (Loudness, Valence, Tempo, Speechiness, etc.)?
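Duplicate records can be surfaced by grouping tracks by name and keeping the names that appear on more than one album. The sketch below assumes track names match exactly apart from case and surrounding spaces; real remaster listings may carry suffixes that need extra handling. The rows and the function name find_duplicate_records are illustrative.

```python
from collections import defaultdict

# Illustrative (album, record name) pairs. Album titles come from the list
# above; "Placeholder Track" is a stand-in name, not a real record.
tracks = [
    ("Ready to Die The Remaster", "Juicy"),
    ("Ready to Die The Remaster: 2015 Remaster", "Juicy"),
    ("Life After Death 2014 Remaster", "Hypnotize"),
    ("Born Again", "Placeholder Track"),
]


def find_duplicate_records(pairs):
    """Map each record name found on more than one album to those albums."""
    albums_by_name = defaultdict(list)
    for album, name in pairs:
        albums_by_name[name.strip().lower()].append(album)
    return {name: albums for name, albums in albums_by_name.items() if len(albums) > 1}


duplicates = find_duplicate_records(tracks)
print(duplicates)
```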
BIGGIE DISCOGRAPHY DENSITY ANALYSIS
The visualization is a representation of the summary data. The yellow density charts represent variables that are least found in Biggie’s discography. The data skews to the left in these plots, meaning the mass sits on the left: frequency is low at the higher measurements on the X-axis and the most frequent values sit near zero. (In strict statistical terms a long tail toward higher values is called right, or positive, skew; in this piece “skews to the left” describes where the bulk of the values sit.) The Spotify metrics do not find much instrumentalness in the albums, and the shape is not a curve but a descending triangle, which means that the majority of the points are zeros. Instrumentalness is defined as a prediction that a record has no vocals. Rap is vocal, which explains the shape of the density graph.
The other variables that skew to the left are liveness and acousticness. Liveness detects an audience in the record. It is more likely to play a role if an album is a live album or if the record is a live recording from a concert or other live performance. Acousticness has a low confidence level, with the majority of the points far from 1.0, the value that signals high confidence that a track is acoustic.
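The shape of these plots can also be checked numerically with a skewness coefficient. By statistical convention, values that pile up near zero with a long tail toward higher values give a positive coefficient (right skew), which is the shape described above as sitting on the left of the plot. The sketch below uses the moment-based coefficient; the sample values are made up and the function name sample_skewness is an assumption.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no skew to measure
    return m3 / m2 ** 1.5


# Made-up scores shaped like the yellow plots: mostly zero, a few high values.
mostly_low = [0.0, 0.0, 0.0, 0.0, 0.1, 0.9]
# Mirror image: mostly high values, a few low ones.
mostly_high = [1.0 - v for v in mostly_low]

print(sample_skewness(mostly_low) > 0)   # mass on the left, tail to the right
print(sample_skewness(mostly_high) < 0)  # mass on the right, tail to the left
```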
The green density graphs skew to the right. The main takeaway is that these variables are more prevalent in the discography and occur at the higher values. The theory was that Valence did not involve NLP or sentiment analysis. Valence skews to the right, which means that the majority of the discography is scored as musically positive. This is consistent with Valence measuring the sound of the music: other sound-based values like danceability, energy and loudness also skew to the right, so the music is positive sounding. It lacks melancholy despite the darkness of the lyrical content.
The legend references popularity, the feature that most resembles a Gaussian (normal) distribution. The legend is also a reminder that the scale does not extend to the maximum and fades out at 80. The majority of the values are at the center, but the curve skews slightly to the right, as the maximum score is 79. The visual is not a true Gaussian distribution, but it does provide a high-level view of where the majority of Biggie’s records fall on the Spotify popularity metric.
Tempo has a median of 99.14 beats per minute (BPM), a mean of 113.65 BPM and a maximum of 201.94 BPM. The visualization is not a normal distribution. There are two peaks, both toward the left of the plot. The high tempo measurement combined with the energy and danceability variables is the likely reason the valence is positive. The sound is designed to create a positive body reaction. Feet tap. Heads nod.
VARIABLE CORRELATION
When examining correlation it is important to remember that correlation does not mean causation. Correlation measures how two variables move in relation to each other. It does not mean that one variable causes the change in the other. By evaluating the correlation we can find patterns, and those patterns can lead to discovering causation.
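For readers who want to see what a correlation coefficient measures, here is a minimal sketch of the Pearson coefficient written from its definition. The two short series are made-up numbers, not values from the data set, and the function name pearson_r is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up series: the second tends to rise with the first, the third to fall.
feature = [1, 2, 3, 4, 5]
rises_with_feature = [2, 4, 5, 4, 5]
falls_with_feature = [5, 4, 5, 4, 2]

print(round(pearson_r(feature, rises_with_feature), 4))  # positive coefficient
print(round(pearson_r(feature, falls_with_feature), 4))  # negative coefficient
```

The coefficient only describes how the two series move together. It says nothing about whether one drives the other.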
The correlation plot includes histograms, density functions, smoothed regression lines and correlation coefficients with their corresponding significance levels. Our focus is on the correlation coefficients, and their significance levels, between popularity and the other variables. Popularity has its highest correlation coefficients and significance levels with danceability and loudness. We can also see the coefficients and significance levels between popularity and the variables that were skewed to the right and to the left. Popularity has a negative correlation with the variables that skewed to the left (acousticness, instrumentalness and liveness). The last point of note is the relationship between tempo and popularity, which has the lowest correlation coefficient of all the variables that were skewed to the right.
The benefit of the correlation plot is that it provides some evidence that can help in the selection of features to predict popularity of rap albums. For now, we are trying to understand the Spotify data.
The correlation relationship is also visible in the scatterplots. A higher correlation coefficient (in absolute value) means that the points are more condensed. A correlation coefficient of 1.0 means that the relationship is perfectly linear and all of the points fall on one line. Recall the equation of a line, y = mx + b, where m is the slope and b is the intercept. When the correlation coefficient is 1.0 or -1.0, every point can be predicted from that formula. A positive slope gives an ascending line (coefficient 1.0) and a negative slope gives a descending line (coefficient -1.0), but in both cases all of the points are on the line. In the charts above the correlation coefficient is not 1.0. The highest correlation coefficient is 0.39, and we can see that for danceability and loudness the points are condensed and close to the lines drawn on the plots. As the correlation coefficient gets closer to 0.0, the points fan out and are more spread out. Tempo has the lowest correlation, and we can see that its points are spread out and not as condensed as those for popularity and loudness.
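The link between y = mx + b and how tightly points hug a line can be sketched with a least-squares fit. The code below fits a line and reports the share of variation the line explains, which for a single predictor equals the squared correlation coefficient. The data points and the function names fit_line and r_squared are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(xs, ys, m, b):
    """Share of the variation in ys explained by the line y = m*x + b."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [0, 1, 2, 3]
on_rising_line = [1, 3, 5, 7]   # exactly y = 2x + 1
on_falling_line = [3, 2, 1, 0]  # exactly y = -x + 3
scattered = [1, 4, 2, 6]        # made-up points that only loosely follow a line

for ys in (on_rising_line, on_falling_line, scattered):
    m, b = fit_line(xs, ys)
    print(round(m, 2), round(b, 2), round(r_squared(xs, ys, m, b), 2))
```

Both exact lines give a value of 1 regardless of the sign of the slope, while the scattered points give less. For comparison, a correlation coefficient of 0.39 squares to about 0.15, so a straight line accounts for roughly 15% of the variation in that case.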
We must take into consideration that popularity is affected by time. According to Spotify, current records are more likely to have higher popularity scores than older records. With more data we can test whether the patterns we see in Biggie’s discography hold up with current artists or change. We can also compare other hip hop artists from the 1990s to the Biggie data. We did not find a single feature that can be identified as the cause of popularity.
CONCLUSION
We learned that the features Acousticness, Liveness and Instrumentalness have a negative correlation with popularity and that they skew to the left because the sound of the Biggie Smalls discography does not include these features. We also learned that Valence measures how positive the music sounds and is not a measure of lyrical sentiment. The most popular record in the discography is Hypnotize, with a score of 79. Popularity is influenced by time and favors more recent records over older ones. Hypnotize is not only one of Biggie Smalls’ most popular records of all time, it was also featured in the film Spider-Man: Into the Spider-Verse. In the past, songs such as Bohemian Rhapsody have seen a rebirth in popularity because they were featured in films. The popularity algorithm on Spotify lends itself to such a strategy, and a resurgence can occur for older hip hop records.
In Part II we will compare and contrast the albums Ready to Die and Life After Death.