Music Emotion Recognition by Homer H. Chen, Yi-Hsuan Yang

Saunder January 24, 2011 10:39 book
8 Two-Layer Personalization
In the previous chapter, we showed that personalization is an effective way to
deal with the subjectivity issue of MER. In this chapter, we move on to discuss more
advanced methods for addressing the subjectivity issue: the bag-of-users model
and residual modeling. Based on these two methods, a two-layer personalization
scheme is developed. A first-layer regressor is trained to predict the general
perception of a music piece, and a second-layer regressor is trained to predict the
difference between the general perception and a user's individual perception. This
two-layer personalization scheme is more effective than the single-layer one (that
is, personalized MER) because the music content and the individuality of the user
are treated separately.
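The two-layer idea can be sketched in a few lines. The following Python snippet is a minimal illustration with synthetic data and simple ridge regressors; it is not the book's MATLAB implementation, and all variable names and dimensions are invented for the example. It trains a first-layer regressor on averaged annotations and a second-layer regressor on one user's residuals:

```python
import numpy as np

def ridge_fit(X, y, lam=1e-3):
    """Closed-form ridge regression with a bias column."""
    A = np.hstack([X, np.ones((len(X), 1))])
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

def ridge_predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                     # features of 50 music pieces
y_general = X @ np.array([0.5, -0.2, 0.1, 0.3])  # averaged (general) emotion values
y_user = y_general + 0.3 * X[:, 0]               # one user's individual perception

# Layer 1: general regressor trained on the averaged annotations
w_gen = ridge_fit(X, y_general)
y_hat = ridge_predict(w_gen, X)

# Layer 2: residual regressor trained on the user-specific difference
w_res = ridge_fit(X, y_user - y_hat)
y_personal = y_hat + ridge_predict(w_res, X)

print(np.mean((y_personal - y_user) ** 2))       # small residual error
```

Because the residual target depends on the user, the second layer can be retrained per user without touching the first layer; this is what it means for the music content and the user's individuality to be treated separately.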
8.1 Problem Formulation
One can distinguish the following two problems in dimensional MER:

General prediction: Given a music piece d_i, predict the emotion value y_i generally perceived by every user. This is what the general regressor described in Chapter 7 aims to predict.

Personalized prediction: Given a music piece d_i and a user u_j, predict the emotion value y_ij perceived by that user. This is what the personalized regressor described in Chapter 7 aims to predict.
Following Chapters 4 and 7, this chapter also formulates MER as a regression problem. Given N inputs (x_i, y_i), 1 ≤ i ≤ N, where x_i is a feature vector of the i-th music piece d_i, and y_i ∈ [−1, 1] is the emotion value obtained by averaging the annotations of subjects, a general regressor r(·) is trained by minimizing the squared error between y_i and ŷ_i, where ŷ_i = r(x_i) is the prediction result for d_i. We also use this method as the baseline in this chapter. On the other hand, according to the personalized MER (PMER) scheme described in Chapter 7, a personalized regressor r_j(·) is trained by minimizing the squared error between y_ij and r_j(x_i).
The baseline method and the PMER scheme described above, however, do not
make the best use of available data for model training. Below we describe two more ad-
vanced methods that were developed in [361] to improve the performance of general
prediction and personalized prediction, respectively. The first method, bag-of-users
model (BoU), improves general prediction by better utilizing the annotations col-
lected from subjective tests. The second method, a two-layer personalization scheme,
improves personalized prediction by modeling the music content and the individuality
of the user in two stages. As shown in Section 8.4, BoU and the two-layer personalization
scheme significantly outperform the baseline method and the single-layer scheme
(i.e., PMER), respectively.
8.2 Bag-of-Users Model
The ground truth data needed for training a general regressor is typically obtained
by averaging the opinions of subjects. This procedure, however, makes little use of
the individual annotations assigned by each subject, which provide abundant cues
of the affective content of a music piece. Figure 7.1 illustrates that simply averaging
annotations loses the information that for some music pieces the perceived emotion
values are fairly sparse.
The bag-of-users (BoU) model trains a regressor r_j(·) for each subject u_j using his/her annotations and obtains a bag of models {r_1(·), r_2(·), ..., r_U(·)}, where U denotes the number of subjects. The BoU model then aggregates the models using a super regression model to make a general prediction. Let ŷ_i = [ŷ_i1, ŷ_i2, ..., ŷ_iU] denote a vector of the prediction results for d_i, where ŷ_ij = r_j(x_i). The super model r*(·) is trained by minimizing the error between y_i and r*(ŷ_i). The estimate r*(ŷ_i) can be regarded as the aggregation of the opinions of the U subjects.
The strength of BoU is that it is able to assign different weights to different
subjects through the super model r*(·). A lower weight would be assigned to a
subject u_j (more accurately, to the corresponding regressor r_j(·)) whose annotations
are considered less reliable or less consistent, thereby removing the effect of outliers
on the ground truth data.
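The BoU procedure can be sketched as follows in Python. The data are synthetic, least-squares regressors stand in for the actual regression algorithm, and the unreliable subject is invented purely to show why a super model over per-subject predictions can help:

```python
import numpy as np

def fit(X, y):
    """Least-squares regressor with a bias term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

rng = np.random.default_rng(2)
N, U, D = 60, 4, 3                      # pieces, subjects, feature dimension
X = rng.normal(size=(N, D))             # feature vectors x_i
w_true = rng.normal(size=D)
Y = np.outer(X @ w_true, np.ones(U)) + 0.05 * rng.normal(size=(N, U))
Y[:, -1] = rng.normal(size=N)           # last subject annotates unreliably
y = Y.mean(axis=1)                      # averaged ground truth y_i

# bag of users: one regressor r_j per subject, trained on that subject's labels
models = [fit(X, Y[:, j]) for j in range(U)]
Yhat = np.column_stack([predict(w, X) for w in models])  # [yhat_i1 .. yhat_iU]

# super model: regress y_i on the U per-subject predictions; an unreliable
# subject's regressor can thereby receive a low weight
w_super = fit(Yhat, y)
y_bou = predict(w_super, Yhat)
print(np.mean((y_bou - y) ** 2))
```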
Tables 8.1 and 8.2 show example MATLAB code for the baseline method and
the BoU model, respectively. It can be seen that the two methods differ in the way
user annotations are utilized and the way the regressor models are trained. Note that
both methods are designed for general prediction, not personalized prediction.
