
Music Emotion Recognition by Homer H. Chen, Yi-Hsuan Yang


Saunder January 24, 2011 10:39 book
9 Probability Music Emotion Distribution Prediction
This chapter describes a new approach to dimensional MER that assigns soft
(probabilistic) emotion values instead of hard (deterministic) emotion values to
music pieces. This approach considers the affective content of a music signal as an
emotion distribution in the emotion plane and trains a computational model to
predict the music emotion distribution. In this way, one can model how subjective
the perceived emotion of a music piece is and how likely a specific emotion (defined
by valence and arousal values) would be perceived by a person while listening to
the music piece. To the best of our knowledge, this is the first approach to dimensional
MER that computes soft emotion values. This chapter also presents an extensive
performance evaluation of this approach and describes how this approach can be
applied to enhance our understanding of music emotion.
9.1 Motivation
As we have discussed in the previous chapters, simply representing a song as a single
point in the emotion plane according to the mean valence and arousal (VA) values
is not enough due to the subjectivity of emotion perception. As Figure 9.1 shows,
the perceived emotions of a song in fact constitute an emotion distribution in the
emotion plane. For some songs the distribution is dense and monomodal, whereas
for others it is sparse and multimodal.
[Figure 9.1: four scatter plots, panels (a) to (d), with valence on the
horizontal axis and arousal on the vertical axis, each ranging from -1 to 1.]
Figure 9.1 Emotion annotations in the 2D valence-arousal emotion plane [272]
for four songs: (a) “Smells Like Teen Spirit” by Nirvana, (b) “A Whole New World”
by Peabo Bryson and Regina Belle, (c) “The Rose” by Janis Joplin, and (d) “Tell Laura
I Love Her” by Ritchie Valens. Each circle corresponds to a subject’s annotation
of the song. It can be observed that emotion perception is indeed subjective and
that different subjects’ annotations of a song constitute an “emotion distribution”
in the emotion plane. (Data from Y.-H. Yang et al., Proc. ACM Int. Workshop on
Human-Centered Multimedia, 2007.)
Motivated by the above observation, this chapter describes a new perspective
that considers the perceived emotion of a music piece as a probabilistic distribution
instead of a single point in the emotion plane. Modeling music emotion distribution
is central to the understanding of music emotion and the design of an emotion-based
music retrieval system, as the disparate emotion distribution is a natural consequence
of the interplay between musical and personal factors of emotion perception [100].
In addition, this probabilistic model provides a solid basis for personalized emotion-
based retrieval. An emotion distribution can be considered as a collection of users’
perceived emotions of a music piece, whereas the perceived emotion of a specific user
can be considered as a sample of the distribution. Based on the probabilistic model,
one can formulate MER and personalized emotion-based music retrieval under a
unified probabilistic framework.
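To make the sampling view concrete, the following minimal sketch draws individual listeners' perceived emotions from a made-up two-mode distribution in the valence-arousal plane. The mixture parameters, the choice of a Gaussian mixture, and the name `sample_listener` are illustrative assumptions only; they are not taken from the method described in this chapter.

```python
import random

def sample_listener(rng):
    """Draw one hypothetical listener's (valence, arousal) annotation in [-1, 1]^2."""
    # Assumed modes: most listeners hear the song as positive and aroused,
    # a minority hears it as mildly negative and calm.
    means = [(0.5, 0.6), (-0.4, -0.2)]
    weights = [0.7, 0.3]
    std = 0.15  # assumed spread around each mode

    # Pick a mode, then perturb it with Gaussian noise.
    mean_v, mean_a = rng.choices(means, weights=weights, k=1)[0]
    valence = min(1.0, max(-1.0, rng.gauss(mean_v, std)))
    arousal = min(1.0, max(-1.0, rng.gauss(mean_a, std)))
    return valence, arousal

rng = random.Random(0)  # fixed seed so the sketch is repeatable
annotations = [sample_listener(rng) for _ in range(200)]
print(annotations[:3])
```

Each tuple plays the role of one user's perceived emotion, and the whole list plays the role of the emotion distribution of the song.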
Essential to this idea is the development of a computational model for
predicting the emotion distribution of a music piece directly from music features. That
is, given a music signal (and its feature representation), the computational model
predicts its emotion mass at every point in the emotion plane, with the values summing
to one. Here the term emotion mass refers to the probability that a listener’s perceived
emotion of a music piece is located at a specific point. The formulation of MER as the
prediction of emotion mass calls for novel ways of generating the ground truth data,
training the machine learning model, and representing the result. Below we provide
the details of the methods that have been developed to tackle these issues.
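As a rough illustration of what "emotion mass summing to one" means, the sketch below discretizes the emotion plane into a grid and estimates the mass of each cell as the fraction of listener annotations that fall into it. This simple histogram estimate, the grid size, and the name `emotion_mass` are assumptions made for the example; the chapter's own procedure for generating ground truth may differ.

```python
def emotion_mass(annotations, grid_size=8):
    """Return a grid_size x grid_size table of emotion mass that sums to one.

    annotations: list of (valence, arousal) pairs, each value in [-1, 1].
    mass[j][i] is the fraction of annotations in valence bin i and arousal bin j.
    """
    if not annotations:
        raise ValueError("need at least one annotation")

    counts = [[0] * grid_size for _ in range(grid_size)]

    def to_bin(value):
        # Map [-1, 1] onto 0..grid_size-1; the value 1.0 falls in the last bin.
        index = int((value + 1.0) / 2.0 * grid_size)
        return min(max(index, 0), grid_size - 1)

    for valence, arousal in annotations:
        counts[to_bin(arousal)][to_bin(valence)] += 1

    total = len(annotations)
    return [[count / total for count in row] for row in counts]

# Four made-up annotations: three agree closely, one listener differs.
annotations = [(0.6, 0.7), (0.5, 0.8), (0.55, 0.75), (-0.3, 0.1)]
mass = emotion_mass(annotations, grid_size=4)
total_mass = sum(sum(row) for row in mass)
print(total_mass)
```

A song whose annotations cluster tightly puts most of its mass in a few cells, whereas a song perceived very differently by different listeners spreads its mass over many cells.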
9.2 Problem Formulation
We begin with the mathematical formulation of music emotion distribution
prediction. Given the feature representation $x_s$ of an input song $d_s$, the goal is
to predict the probability of the perceived emotion of the song being
$e_{ij} = [v_i, a_j]$, where
