When heavy metal meets data science | Episode III

Sentiment analysis and topic modeling

Luca Ballore
17 min read · Jun 1, 2020

All the code related to this work is available at this GitHub repository.

Artists write songs because they have a message they want to share. Sometimes the message is about promoting an idea or a cause the writer believes in. Other times it could be about religious or moral issues, or just an attempt to share something positive with the world.

Regardless of what the contents are, a song is about strong feelings, a stream of consciousness encoded in words. In this article, I will continue the journey in the world of heavy metal and try to discover topics, sentiments, and emotions with the help of data science.

Sentiment analysis of heavy metal

The guitarist Marty Friedman (picture by German Rojas)

In my previous article, I tried to find a way to measure the “metalness” of the lyrics in my dataset. The idea behind my solution was to assign a “metal score” to all the words contained in the corpus, excluding punctuation and the so-called stopwords. The peculiarity of the metalness index is that it is based on word frequency in the metal dataset relative to another dataset of non-metal songs.

A somewhat similar approach is used by Hedonometer, a tool that makes use of Twitter time series to map the happiness of the users on a daily basis.
To quantify the happiness of the texts, their team merged the most frequent words from a collection of four corpora: Google Books, New York Times articles, Music Lyrics, and Twitter messages. This operation resulted in a composite set of roughly 10,000 unique words.
With the help of tools like Amazon’s Mechanical Turk service, they have been able to score each of these words on a nine-point scale of happiness: [1] sad to [9] happy.

Is it possible to apply this method to my text corpora?

Scoring sentiment

A simple method I used to measure the sentiment of the heavy metal dataset was to download the scored list compiled by Hedonometer (available for download at this address) and assign the corresponding “happiness score” to the words of my dataset. I took into consideration only the words the two lists have in common, so the intersection resulted in a smaller table, reducing it from ~10,000 to 7,881 entries:

Fig. 1 — Ranked words in the heavy metal dataset
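Under the hood, this is essentially an inner join between two word lists. Here is a minimal sketch in pandas, assuming the Hedonometer scores were saved as hedonometer.csv with word and happiness columns, and the ranked metal words as metal_word_scores.csv with word, metalness, and count columns (the file and column names are assumptions, not necessarily the ones used in the repository):

```python
import pandas as pd

# Assumed file and column names; adjust to the actual downloads.
hedonometer = pd.read_csv("hedonometer.csv")        # columns: word, happiness
metal_words = pd.read_csv("metal_word_scores.csv")  # columns: word, metalness, count

# An inner join keeps only the words the two lists have in common,
# shrinking the ~10,000 Hedonometer entries to the shared subset.
scored = metal_words.merge(hedonometer, on="word", how="inner")

print(len(scored))                                           # 7,881 in my case
print(scored.sort_values("count", ascending=False).head(10))
```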

Exploring the metalness/happiness plane

One of the first things I was curious about was whether metalness and happiness could somehow be correlated. Even if my definition of metalness is based on word frequency (easy and “cold” calculations), there are undeniable links between emotions and what people define as metal.

I placed the previously ranked words on a two-dimensional metalness/happiness plane to look for evidence of that in my data. Using all the ranked words would have made the plane unreadable, so I reduced the set to the 100 most common words, resulting in the graph below.

Fig.2 — Metalness/Happiness plane for the first 100 most common words in the dataset
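For reference, a plane like this can be drawn with matplotlib. The sketch below reuses the merged scored table from the previous snippet and puts metalness on the x-axis and happiness on the y-axis, which is how I read the figure; the axis orientation and column names are my assumptions.

```python
import matplotlib.pyplot as plt

# Keep only the 100 most common words so the plane stays readable.
top100 = scored.sort_values("count", ascending=False).head(100)

fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(top100["metalness"], top100["happiness"], alpha=0.4)

# Label every point with its word.
for _, row in top100.iterrows():
    ax.annotate(row["word"], (row["metalness"], row["happiness"]), fontsize=8)

ax.set_xlabel("Metalness")
ax.set_ylabel("Happiness (Hedonometer score, 1-9)")
ax.set_title("Metalness/Happiness plane, 100 most common words")
plt.show()
```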

This empirical approach worked better than expected. If we divide the graph into quadrants, it becomes easy to spot a cluster of death- and evil-related words (death, evil, war, hell, pain, etc.) in the bottom-right corner.
Another identifiable cluster can be seen in the middle of the plane. It contains words related to emptiness and loneliness: left, far, away, fall, nothing, last, alone, etc. The words in this cluster are sentiment-neutral (with a happiness score between 4 and 5) but still have a medium-high metalness index. This makes complete sense, because these are recurrent themes in heavy metal lyrics.
Moving up into the “happier” quadrants, I could find another cluster, this time related to freedom and hope: sun, free, sky, dreams, life, etc.

However, this method has clear limitations. Many of the words that form the clusters revealed in the graph can mean very different things depending on the context in which they are used. The solution described above does not take any context into consideration, and this can lead to inaccuracy.

Let us have a look at some examples of how the word God is used:

God hates us all, God hates us all
Yeah, he fuckin’ hates me

(Slayer, Disciple)

God of thunder god of rain
Earth shaker who feels no pain
The powerhead of the universe
Now send your never ending curse

(Manowar, Thor (The Powerhead))

Wearing black, a bow without arrows
God, have mercy on his soul

(Angra, The Shadow Hunter)

In the first example, God refers to something negative and hateful. In the second one, it is used to invoke the power of a God (Thor), and in the third one, God is named in a sort of prayer for a lost soul.
In each of these examples, the word God has a very different meaning, but the scoring method I used treats them the same.

As you can see, telling apart words used in a positive sense from words used in a negative one is not a trivial task in natural language processing. Other approaches are better suited for this purpose. For example, it is possible to:

  • Utilize more powerful models, like recurrent neural networks or 1D convolutional neural networks attached to a densely connected classifier, which have proven to work efficiently for sentiment classification tasks;
  • Try to deduce the part-of-speech (POS) of a word, and then apply a similar scoring to the (word, POS) tuples, instead of doing that with the words only. The inference of POS of a word has already been described in my previous article;
  • Use a lexicon- and rule-based sentiment analysis tool like VADER, which can be applied across multiple domains with similar results and has proven to work well even with slang.

My choice fell on the third method because I thought it could guarantee a good compromise between performance and time required for the implementation.

Sentiment analysis with VADER

Photo: https://tookapic.com/dvader

The documentation defines VADER as follows:

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.

A sentiment lexicon is a list of lexical features (words) that are usually labeled according to their semantic orientation as either positive or negative. As with Hedonometer, the VADER developers used Amazon’s Mechanical Turk to obtain most of the ratings they needed.
Some of the characteristics that make VADER a good fit for this task are:

  • It works very well on social media type text, but can also be used profitably in other domains;
  • The widely applicable, valence-based and human-curated nature of the lexicon makes VADER a tool that does not require any training data;
  • It performs very well with slang and acronyms in sentences, which are not uncommon in heavy metal lyrics;
  • VADER is not only able to identify the sentiment of a text, but it can also quantify how positive or negative it is;
  • It is included in the NLTK library I already used to perform other tasks such as tokenization and POS inference;

For more insights, you can find complete details on the GitHub page of the VADER project.

Thanks to the NLTK library, it took just a few lines of code to set up a function that calculates a VADER sentiment index for each song in the dataset. The result is a float in the interval [-1, 1], where the higher the number, the happier the text.
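Here is a minimal sketch of that function, using the VADER interface bundled with NLTK; the songs DataFrame and its lyrics column are assumptions about how the data is organized.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-off download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

def vader_sentiment(lyrics: str) -> float:
    """Return VADER's compound score in [-1, 1] for a song's lyrics."""
    return sia.polarity_scores(lyrics)["compound"]

# Assuming a `songs` DataFrame with one row per song and a `lyrics` column:
songs["sentiment"] = songs["lyrics"].apply(vader_sentiment)
```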

According to my results, here is the most “positive” song:

There I was talkin’ ‘bout you out on the street
it won’t be long before we finally meet
I played my cards and now my hand was alright
and now I’m searchin’ but I don’t see the light

We had some action but it split at the seems
the situation had me weak in the knees
Now you look for me to hold you near
I still need you so dry your tears

I only wanted to be loved by you
ooh ah loved by you you loved by you
whoa loved by you

You gave me feeling all around in my brain
now don’t you stop I’ve got to feel it again
Your door is open and I come inside
my mind is achin’ and my body’s afire

There was a time you acted so refined
there was a time when you would blow my mind
And now I’m lookin’ at the way you changed
here we are playin’ the same old game

I only wanted to be loved by you
oh loved by you ooh loved by you you
loved by you yeah yeah oh

Oh oh oh you know what daddy likes
ooh ooh ooh ooh ooh ooh oh oh yeah oh aaaah

There was a time you acted so refined
there was a time when you would blow my mind
And now I’m lookin’ at the way you changed
here we are playin’ the same old game

I only wanted to be loved by you
oh loved by you you loved by you baby you
loved by you oh

Loved by you baby loved by you you loved
by you eh hey loved by you tell me
Loved by you you wanna love me loved by you
I know you do loved by you sugar loved by you oh

Loved by you you loved by you oh
loved by you you and you
loved by you oh

Loved by you ooh
loved by you whoa huh
hey you know that loved by you
love me loved by you

(Riot, Loved By You)

As you can notice, this song is very much about love and the desire to be loved. It obtained a sentiment score of 0.9996, which is higher than I expected. One of the limits of VADER is that it is not always good at catching the melancholic shades of a text.
In this case, the score has also been boosted by the obsessive repetition of “positive” n-grams like “loved by you” or “love me”, which VADER ranks highly.

Let’s have a look at the song ranked as most “negative”:

The end of all law
4 shots fired another body falls
I execute the guilty violently
Undercover killing spree, no warning shot

Die motherfucker, die, die
Die motherfucker, die, die
Die motherfucker, die, die

I’ll put a bullet between your fucking eyes
Pull the trigger — cock the hammer back
5th shot to the back of your neck
You’re not a threat, you’re a fucking disease

Eradicate the enemy
Dead body, another crime scene
Blood-stained pavement, chalk outline
Bullet holes you’re dead and cold
The end of all law, no warning shot

Die motherfucker, die, die
Die motherfucker, die, die
Die motherfucker, die, die
Die, I put the gun to the side of your head

Squeezing the trigger
Powder burnt skin, breaking through cranial bone
Decayed brain tissue implodes

Just another life that you thought you could control
Just another pig, dead, with some extra holes
You better think again, before I kill again
You won’t survive, when the bullets start to fly

Protect and serve yourself
Dug your own grave, now rot
In that hole decay
The murder will never stop, no warning shot

Die motherfucker, die, die
Die motherfucker, die, die
Die motherfucker, die, die
I’ll put a bullet between your fucking eyes
Die

(Six Feet Under, No Warning Shot)

This text obtained a sentiment index of -0.9997. In this case, hate and violence expressed in such a brutal way made it hard to argue against the VADER score.

Ranking metal bands sentiment

Once I ranked the sentiment of every single song of my dataset, I was able to compute the mean index for each band:

Fig. 3 — The most positive and most negative bands according to VADER index
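In pandas terms, this ranking boils down to a group-by over the per-song scores. A sketch, assuming the same songs DataFrame as before, extended with band and metalness columns (again, assumptions):

```python
# Mean sentiment (and metalness) per band, sorted from happiest to angriest.
band_ranking = (
    songs.groupby("band")[["sentiment", "metalness"]]
         .mean()
         .sort_values("sentiment", ascending=False)
)

print(band_ranking.head(10))  # most "positive" bands
print(band_ranking.tail(10))  # most "negative" bands
```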

It is interesting to note that even the most “positive” band has an index that barely surpasses the neutral level, while the most negative one has a much larger absolute value. Also, mean metalness tends to be higher for bands with a lower sentiment index.
I see this result as further confirmation of the strong connection between heavy metal and feelings like anger, sadness, and depression. It is no coincidence that a recent study demonstrated that heavy metal helps combat these feelings. There is a quote from that paper that I think sheds further light on the matter:

“When I’m sad I don’t want to listen to Happy by Pharrell, I want to listen to something sad, something that understands me.”

Redefinition of the Metalness/Happiness plane

The graph shown in Fig. 2 displays clusters of words that belong to the same topic, obtained using only a customized metalness index and an imperfect sentiment scoring method.
After the adoption of VADER, I decided to repeat the same experiment, adding some variations:

  • The plane would no longer contain words, but heavy metal bands;
  • The number of entries would be restricted to a sample of 65 bands chosen from among the most popular ones in the dataset;

This time, the aim of my experiment was to find out whether the metalness/happiness plane could aggregate bands that belong to the same sub-genre of metal.
In music, a genre or a sub-genre is just a conventional category that identifies pieces of music sharing a set of unwritten rules, which include topics and, of course, sentiments. If this definition is correct, then the graph should reveal clusters of bands that share the same set of “conventions”.

Fig. 4 — Metalness/Happiness plane for a sample of 65 bands in the dataset

The plane in Fig. 4 matched my intuition surprisingly well. In the area close to the upper-left corner I noticed a big cluster of power/prog metal bands, like Kamelot, Dream Theater, Ayreon, Angra, Gamma Ray, Stratovarius, Sonata Arctica, etc.
Further to the right, I noticed another cluster of bands I would classify as power/epic metal: Hammerfall, Rhapsody, Blind Guardian, Virgin Steele, Manowar. I was also able to find a death/black metal cluster (Dimmu Borgir, Immortal, Satyricon, Morbid Angel, Arch Enemy, Death, etc.) and a smaller group of similar thrash metal bands (Sodom, Destruction, Venom).
Anthrax, Megadeth, and Metallica are also members of a shared cluster, missing only Slayer to complete the so-called “big 4 of thrash”.

Like the first experiment (Fig. 2), this one also showed some limitations. There are always bands that appear inside clusters they should not belong to, because sentiments, as well as topics, are not exclusive to a particular sub-genre. Moreover, the artistic nature of music often makes these classifications arbitrary and controversial. In other words, some sub-genres may overlap.

Another problem is that music is not a static entity. It is always in motion: messages, sentiments, and topics (as well as the music itself) change continuously over the course of a band’s career.
Let us have a look at the position of Opeth in Fig. 4. It is located precisely in the middle between a cluster of death metal bands (Arch Enemy, Death, Morbid Angel, etc.) and another group of prog metal bands (Ayreon, Dream Theater, etc.). This reflects in some way the career path of this band.

Their musical journey convinced me that it was worth investigating the “sentiment paths” of metal bands.

How does their position on the metalness/happiness plane vary over the years?

Band sentiment paths

Opeth’s frontman Mikael Åkerfeldt (Down The Barrel photography)

Before I could outline the so-called “sentiment path” of a band, I had to measure metalness and sentiment at discrete time intervals and look at how they changed over time. The best way to do that with the data I had available was to calculate the mean values of the two variables for each album and then plot a metalness/happiness plane with the records of the bands I aimed to analyze.
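A sketch of that aggregation and plot, assuming each song row also carries album and year columns; like the other snippets, this is an illustration rather than the exact code from the repository.

```python
import matplotlib.pyplot as plt

def sentiment_path(songs, band, ax):
    """Plot one band's album-by-album trajectory on the metalness/happiness plane."""
    albums = (
        songs[songs["band"] == band]
        .groupby(["year", "album"])[["metalness", "sentiment"]]
        .mean()
        .reset_index()
        .sort_values("year")
    )
    # Connect the albums in chronological order.
    ax.plot(albums["metalness"], albums["sentiment"], marker="o")
    for _, row in albums.iterrows():
        ax.annotate(f"{row['album']} ({row['year']})",
                    (row["metalness"], row["sentiment"]), fontsize=8)
    ax.set_xlabel("Metalness")
    ax.set_ylabel("Sentiment (VADER)")
    ax.set_title(f"Sentiment path: {band}")

fig, ax = plt.subplots(figsize=(10, 7))
sentiment_path(songs, "Opeth", ax)
plt.show()
```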

This is an example of the sentiment path of the Swedish band Opeth, already mentioned in the previous paragraph:

Fig. 5 — Sentiment path for Opeth

As a fan of the band in question, I found the graph above (Fig. 5) very accurate. Opeth is a band that, in some ways, has always been atypical for the sub-genre it has been labeled with. Thanks to an excellent mix of aggressive death metal riffs and acoustic guitars with clean vocals, their early albums were well received as important contributions to melodic death metal.
In particular, Blackwater Park (2001) gave the band a higher profile in the metal scene.
In recent years, however, the lead singer, guitarist, and songwriter Mikael Åkerfeldt has abandoned death metal growls and metal riffs for songs more inspired by his record collection: seventies rock and prog. This is also partially reflected in the lyrics, and in the metalness value, which moved slightly to the left along its axis. The sentiment seems to move towards unhappier values, suggesting (again) that the happiness of the treated topics is not always proportional to metalness.

For example, Opeth’s frontman said that the album Sorceress (2016) was inspired by “the jealousy, the mindfucks, the paranoia” that come with affairs of the heart. In this case, the dark sides of a topic often perceived as “positive” (love) have been successfully detected by VADER, which gave the album a negative sentiment score.

Other examples of sentiment paths are plotted in the graph below: Metallica and Slayer.

Do you agree with these results?

Fig. 6 — Sentiment path for Metallica and Slayer

Heavy metal sentiment paths in literature

The same concept can also be applied to other forms of literature.

How metal is The Lord of The Rings for example? And how does the sentiment change throughout the books of the saga?

All I had to do was to find a text corpus of Tolkien’s masterpiece (Kaggle turned out to be pretty useful in this case) and perform the same operations I did for my heavy metal dataset. Here are the results:

Fig. 7 — Sentiment path for The Lord of The Rings saga
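For an external corpus like this, the per-book computation could look roughly like the sketch below. The file names, the aggregation of word scores into a single metalness value, and the reuse of the scored table and the vader_sentiment helper from earlier snippets are all assumptions on my part.

```python
import nltk

nltk.download("punkt")  # tokenizer models, one-off

def metalness_of(text: str, word_scores: dict) -> float:
    """Mean metal score of the recognized words in a text (my guess at the aggregation)."""
    tokens = [w for w in nltk.word_tokenize(text.lower()) if w.isalpha()]
    scores = [word_scores[w] for w in tokens if w in word_scores]
    return sum(scores) / len(scores) if scores else 0.0

# Assumed: one plain-text file per book, in reading order.
books = ["fellowship.txt", "two_towers.txt", "return_of_the_king.txt"]
word_scores = dict(zip(scored["word"], scored["metalness"]))

for path in books:
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # `vader_sentiment` is the helper defined earlier in the article.
    print(path, metalness_of(text, word_scores), vader_sentiment(text))
```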

The Lord of The Rings turned out to be quite metal, with a metalness index close to 0.45 at its maximum. The sentiment moves slightly in the plane from positive to neutral, matching, in my opinion, the course of events. At the end of the saga good prevails, but evil is not ultimately defeated, and the sentiment reflects the price paid in the many sacrifices and deaths.

And what about the Harry Potter saga?

Fig. 8 — Sentiment path for the Harry Potter saga

J. K. Rowling’s series does not have a very high metalness score, and it is also quite neutral when it comes to sentiment. Most of the books are clustered close together on the plane, but there are a couple of outliers. The last book, The Deathly Hallows, is much more “metal” than the others, and together with The Prisoner of Azkaban, it is also the unhappiest.

My curiosity pushed me to analyze another very famous “saga”: the King James Version of the Holy Bible. It has a lot of books, and some of them have pretty dark atmospheres.

But which ones are the most metal and the most unhappy?

Fig. 9 — Metalness/Happiness plane of the King James’ Bible

Unsurprisingly, the most metal book is The Revelation of Saint John the Divine, also known as The Apocalypse. Apart from a few outliers, the Bible’s books have a pretty high mean metalness score, and some of them, like Habakkuk and The Lamentations of Jeremiah, are also quite unhappy.

In wrath you strode through the earth
and in anger you threshed the nations.

You came out to deliver your people,
to save your anointed one.
You crushed the leader of the land of wickedness,
you stripped him from head to foot.

(Habakkuk 3:12–13)

Simple topic modeling

Earlier in this article, I described a simple method to cluster words belonging to the same topic on a metalness/happiness plane (Fig. 2). It worked fairly well for a naive approach, but when it comes to topic modeling, other methods have proven to be much more robust and effective.
I did some experiments with one of them, called Latent Dirichlet Allocation (LDA).

Despite the pompous name, the concepts behind LDA are quite intuitive. I will not dive into the details of this method, but it could be useful to keep in mind the two principles that guide it:

  • Every document is a mixture of topics. Imagine that each document contains (or may contain) words from a finite number of topics in specific proportions. As an example, consider a two-topic model. We could say that “Document X₁ is 80% topic A and 20% topic B, Document X₂ is 40% topic A and 60% topic B”, and so on.
  • Every topic is a mixture of words. Take as an example the two-topic model named before. Imagine applying that model in a sports news text corpora, where we have one topic for “football” and another one for “basketball”. The most common words in football might be “goal”, “midfielder”, and “striker”, while in basketball they could be “center”, “guard”, and “dunk”. More importantly, words can be shared between topics; terms like “ball” or “pitch” might appear in both equally.

LDA is, in other words, a method that estimates both of these mixtures at the same time: it finds the mixture of words associated with each topic and, simultaneously, the mixture of topics that describes each document.

Before I could apply LDA to the heavy metal text corpus, I had to set some parameters to initialize the algorithm (see the sketch after this list):

  • The number of topics to be found. I set it to 9 in order to match the number of the most represented metal sub-genres in the dataset: Heavy, Death, Thrash, Black, Power, Epic, Prog, Glam, and Doom.
  • The number of top words per topic to be displayed. The choice of this parameter is more arbitrary, and I thought 20 was a reasonable choice;
  • A random state value to make the inference more stable. This value is an arbitrary integer number.
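As an illustration of this setup (not necessarily the code in the repository), here is a minimal sketch with scikit-learn, assuming the lyrics live in the same songs DataFrame used above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

N_TOPICS = 9        # one per major sub-genre represented in the dataset
N_TOP_WORDS = 20    # words displayed per topic
RANDOM_STATE = 42   # arbitrary integer, fixed for reproducibility

# Bag-of-words counts, ignoring English stopwords and very rare terms.
vectorizer = CountVectorizer(stop_words="english", min_df=5)
doc_term = vectorizer.fit_transform(songs["lyrics"])

lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=RANDOM_STATE)
lda.fit(doc_term)

# Print the top words of each topic.
vocab = vectorizer.get_feature_names_out()
print("Topics found via LDA:")
for idx, topic in enumerate(lda.components_):
    top = topic.argsort()[-N_TOP_WORDS:][::-1]
    print(f"Topic #{idx}:")
    print(" ".join(vocab[i] for i in top))
```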

Here are the results:

Topics found via LDA:
Topic #0:
dead blood death life pain flesh hate die fucking kill body lies face inside human skin head mind eyes self
Topic #1:
know just time away life way love feel say world day like make won want heart come need tell right
Topic #2:
death shall blood war world rise earth power fight gods come end black man fall hell souls die stand time
Topic #3:
war kill die man world time people law blood ready gun like death killing fight attack red streets end killer
Topic #4:
life time soul eyes night mind light world dark fear end pain inside come lost darkness death dreams feel free
Topic #5:
god hell holy lord evil devil satan son cross christ heaven father jesus blood come burn sin man black soul
Topic #6:
like love light heart eyes sun away night sky cold black rain burn tears come deep dark fall sea day
Topic #7:
got like just gonna know want wanna rock baby need fuck little make come right think love way feel man
Topic #8:
night ride sky metal land wind hear high fly thunder wild steel king black ice fight tonight cold come time

LDA is not able to name the topics; that is still a task left to whoever analyzes the data. Here is my interpretation:

  • Topic #0: death metal;
  • Topic #1: undefined; these are words that many sub-genres have in common;
  • Topic #2: heavy/power metal;
  • Topic #3: thrash metal;
  • Topic #4: gothic metal;
  • Topic #5: black metal;
  • Topic #6: doom metal;
  • Topic #7: glam metal;
  • Topic #8: power/epic metal;

Just to be clear, with this interpretation I do not claim that barely 20 words can capture the complexity of a sub-genre. The only thing the list above does is visualize how likely it is to find words related to a given topic in a specific sub-genre.

Conclusions

In this article, I described my attempts to analyze the words and sentiments of metal music. In particular, I was able to see how sentiment changes across the albums of a band, and how words alone, even without the sound of the instruments, can shape all those conventions and traditions commonly known as music genres.

In my next article, I will try to use deep learning to create a generator of heavy metal lyrics.

Stay tuned!

Metal isn’t necessarily aggressive. There’s metal that’s contemplative, there’s metal that’s sad, and there’s metal that’s exuberant. No genre is limited in what it can express.

(John Darnielle)
