Music Emotion Recognition by Homer H. Chen, Yi-Hsuan Yang
10 Lyrics Analysis and Its Application to MER
The viability of an MER system largely lies in the accuracy of emotion recognition. However, due to the semantic gap between the object feature level and the human cognitive level of emotion perception, it is difficult for the machine to accurately compute the emotion values, especially the valence values. Consequently, many efforts have been made to incorporate mid-level features of music into MER. For example, B. Schuller et al. incorporated genre, ballroom dance style, chord progression, and lyrics in their MER system and found that many of them contribute positively to the prediction accuracy [286–288]. Similar observations have also been made by many other researchers [60, 139, 183, 207]. The following three chapters describe how such mid-level features, including lyrics, chord progression, and genre metadata, can be utilized to improve MER. For simplicity, we focus on categorical MER in these three chapters. We begin with the use of text features extracted from lyrics in this chapter.
10.1 Motivation
A popular approach to categorical MER uses audio features such as MFCCs to represent a music signal and employs machine learning techniques to classify the emotion embedded in the signal. The progress of such a monomodal approach, however, has been stagnant due to the so-called semantic gap, the chasm between raw data (signals) and high-level semantics (meanings). While mid-level audio features such as chord, mode, articulation, and instrumentation carry more semantic information [30, 60, 63, 217], robust techniques for extracting such features still need to be developed.
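To make the conventional audio-only approach concrete, the sketch below summarizes each clip by MFCC statistics and trains a generic classifier on categorical emotion labels. The library choices (librosa, scikit-learn), the file paths, and the emotion classes are illustrative assumptions, not the setup evaluated in this book.

import numpy as np
import librosa
from sklearn.svm import SVC

def extract_mfcc_features(path, n_mfcc=20):
    """Summarize a clip by the mean and standard deviation of its MFCCs."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Placeholder training data: audio files and categorical emotion labels.
train_paths = ["song1.wav", "song2.wav"]
train_labels = ["happy", "sad"]

X_train = np.vstack([extract_mfcc_features(p) for p in train_paths])
audio_clf = SVC(kernel="rbf", probability=True).fit(X_train, train_labels)

Such a classifier sees only the signal-level features; the semantic gap discussed above is precisely what limits its accuracy, especially along the valence dimension.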
Complementary to the music signal, lyrics are semantically rich and expressive and have a profound impact on human perception of music [20]. It is often easy for us to tell from the lyrics whether a song expresses love, sadness, happiness, or something else. Incorporating lyrics into the analysis of music emotion is feasible because most popular songs sold in the market come with lyrics and because most lyrics are written in accordance with the music signal [94]. One can also analyze lyrics to generate text feature descriptors of music.
The application of text analysis to song lyrics has been explored for singer identification [216], structure analysis, similarity search [222], and genre classification [226]. The use of features extracted from lyrics to improve MER has received increasing attention in the MIR community, and many different lyrics features have been proposed. Earlier approaches (e.g., [62, 64, 352]) use either manually or automatically generated affect lexicons to analyze lyrics. These lexicon-based approaches are considered less favorable because they are language dependent and cannot be applied to songs of arbitrary languages. Later approaches [138, 183, 219, 288, 326, 353, 363] are mainly based on statistical natural language processing (NLP) [225], which is more general and well grounded.
This chapter presents a multimodal approach to categorical MER. Features extracted from both the music signal and the associated lyrics are utilized to model our emotion perception. Specifically, NLP techniques such as bag-of-words [290] and probabilistic latent semantic analysis (PLSA) [129] are adopted to extract text features from the lyrics. These feature extraction algorithms are general and can be applied to lyrics of any language. This chapter also describes a number of multimodal fusion methods that properly integrate the extracted text and audio features. Evaluation results show that the incorporation of lyrics indeed improves the accuracy of MER. In particular, the multimodal fusion method late fusion by subtask merging significantly outperforms the conventional audio-based approach, with a relative improvement of 21% in classification accuracy.
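As a rough illustration of the multimodal idea, the sketch below extracts bag-of-words (tf-idf weighted) features from lyrics, trains a text classifier alongside an audio classifier, and combines the two by simply averaging their class probabilities. This generic late fusion is an assumption for illustration only; it is not the late fusion by subtask merging scheme evaluated later in this chapter, and the toy lyrics, labels, and audio features are placeholders.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy placeholder data: two songs, two emotion classes.
lyrics_train = ["we dance all night so happy and free",
                "tears fall down in the cold and lonely rain"]
labels = ["happy", "sad"]
X_audio_train = np.array([[0.8, 0.1], [0.2, 0.9]])  # stand-in audio features

# Bag-of-words text features from the lyrics.
vectorizer = TfidfVectorizer(max_features=5000)
X_text_train = vectorizer.fit_transform(lyrics_train)

text_clf = LogisticRegression(max_iter=1000).fit(X_text_train, labels)
audio_clf = LogisticRegression(max_iter=1000).fit(X_audio_train, labels)

def fuse_predict(x_audio, lyric):
    """Average the class probabilities of the audio and text classifiers."""
    p_audio = audio_clf.predict_proba(x_audio.reshape(1, -1))
    p_text = text_clf.predict_proba(vectorizer.transform([lyric]))
    return text_clf.classes_[(p_audio + p_text).argmax()]

print(fuse_predict(np.array([0.7, 0.2]), "so happy and free tonight"))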
10.2 Lyrics Feature Extraction
Lyrics are normally available on the Web and can be downloaded with a simple script-based URL (uniform resource locator) lookup [45, 103]. Many websites follow a fixed pattern such as “http://[...]/?artist=[...]&song=[...]” that can be utilized to search for lyrics; one can query such a website by putting the artist name and song title in the URL pattern. Alternatively, the well-known website LyricWiki [8] provides an application programming interface (API) that allows programmatic access to the content of its lyrics database [183].
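A minimal sketch of such a script-based lookup is given below. The URL pattern and site are purely hypothetical placeholders; a real crawler should respect the site's terms of service or use a proper API such as the one mentioned above.

import urllib.parse
import urllib.request

# Hypothetical URL pattern of a lyrics website.
URL_PATTERN = "http://example-lyrics-site.com/?artist={artist}&song={song}"

def fetch_lyrics_page(artist, song):
    """Fetch the raw HTML page for a given artist name and song title."""
    url = URL_PATTERN.format(
        artist=urllib.parse.quote(artist),
        song=urllib.parse.quote(song),
    )
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# html = fetch_lyrics_page("Some Artist", "Some Song")  # then parse out the lyric text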
The acquired lyrics are then preprocessed with traditional information retrieval operations such as stopword removal, stemming, and word segmentation [290]. Stopword removal is the process of filtering out common words, such as a, the, and, can, it, and we, to name a few. These words are removed because they are so common that they carry little information for distinguishing one document from another.
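The sketch below illustrates the stopword removal and stemming steps named above using NLTK; the tokenization rule and stopword list are standard NLTK resources, not the authors' exact configuration, and word segmentation (needed for languages such as Chinese) would require a separate segmenter.

import re
from nltk.corpus import stopwords      # requires nltk.download("stopwords")
from nltk.stem import PorterStemmer

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(lyrics):
    """Lowercase, tokenize, drop stopwords, and stem the remaining words."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    return [stemmer.stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("And we can sing the saddest songs"))  # e.g., ['sing', 'saddest', 'song']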