Saunder January 24, 2011 10:39 book
Annotation and Model
Dimensional MER involves an emotion annotation process that is more labor-intensive
than that of its categorical counterpart. Subjects need to determine the numerical
valence and arousal (VA) values of music pieces rather than assign emotion labels
to them. The heavy cognitive load of emotion annotation impedes the collection of
large-scale ground truth annotations and also reduces the reliability of the annotations.
Because the generality of the training instances (which is related to the size of the
training data set) and the quality of the ground truth annotations are essential to the
performance of a machine learning model, reducing the effort of emotion annotation
plays a key role in the progress of dimensional MER. This chapter details a ranking
approach that addresses this issue.
To collect the ground truth data for dimensional MER, subjects are often asked
to rate the emotion values of music pieces on a continuous scale [234, 282, 364, 366].
Performing such emotion ratings, however, imposes a heavy cognitive load on the
subjects. Poorly motivated subjects may give largely uniform ratings and thereby
understate the differences in emotion values among songs. Moreover, it is unclear
whether, in a subject's mind, the distance between two values rated 0.7 and 0.9 is
the same as the distance between two other values rated 0.2 and 0.4.
Consequently, the quality of the ground truth data can vary considerably, which in
turn degrades the accuracy of emotion recognition.
82 Music Emotion Recognition
To overcome this difficulty, the ranking approach to MER determines the coordinates
of a music piece in the 2D emotion plane from the relative emotion of the song with
respect to other songs, instead of directly computing the exact emotion values of the
song. Specifically, this approach uses a learning-to-rank algorithm to train two
computational models, one for valence and one for arousal, each of which ranks a
collection of music pieces by the corresponding emotion value. The ranking order of
the music pieces is then mapped to valence or arousal values: the pieces ranked
topmost are assigned the maximal valence or arousal values, and those ranked
lowermost the minimal values. In this way, a dimensional visualization of the music
pieces in the emotion plane can also be generated.
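The mapping from ranking order to emotion values can be sketched as follows. This is a minimal sketch assuming a linear, evenly spaced mapping onto [-1, 1]; the text states only that the topmost piece receives the maximal value and the lowermost piece the minimal value. The function names `ranking_to_values` and `emotion_plane_coordinates` and the song names are hypothetical.

```python
from typing import Dict, List, Tuple


def ranking_to_values(ranking: List[str], v_min: float = -1.0,
                      v_max: float = 1.0) -> Dict[str, float]:
    """Map a ranking (topmost piece first) to evenly spaced emotion values.

    Assumption: a linear mapping onto [v_min, v_max].
    """
    n = len(ranking)
    if n == 0:
        return {}
    if n == 1:
        # A single piece has no relative position; place it at the midpoint.
        return {ranking[0]: (v_min + v_max) / 2}
    step = (v_max - v_min) / (n - 1)
    # Position 0 (topmost) -> v_max, position n - 1 (lowermost) -> v_min.
    return {piece: v_max - pos * step for pos, piece in enumerate(ranking)}


def emotion_plane_coordinates(valence_ranking: List[str],
                              arousal_ranking: List[str]) -> Dict[str, Tuple[float, float]]:
    """Combine the valence and arousal rankings into (valence, arousal) points."""
    v = ranking_to_values(valence_ranking)
    a = ranking_to_values(arousal_ranking)
    return {piece: (v[piece], a[piece]) for piece in v}


coords = emotion_plane_coordinates(
    ["song_c", "song_a", "song_b"],  # ranked by valence, highest first
    ["song_a", "song_b", "song_c"],  # ranked by arousal, highest first
)
print(coords)  # -> {'song_c': (1.0, -1.0), 'song_a': (0.0, 1.0), 'song_b': (-1.0, 0.0)}
```

Because the extremes of each ranking are pinned to the ends of the scale, the resulting points always span the full valence and arousal ranges.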
The advantage of this approach is twofold. First, because the model training process
of this approach requires only a ranking of the music pieces by emotion, the annotation
process of MER is greatly simplified. The subjects only have to rank the music pieces
(e.g., by making pairwise comparisons), which is intuitively a much easier task than
determining exact emotion values. In practice, the ranking approach has been found to
work remarkably better than the conventional rating approach. Relieving the cognitive
load on the subjects also enhances the reliability of the ground truth.
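Pairwise comparisons can be aggregated into a ranking in many ways. The sketch below uses simple win counting (each piece is scored by the number of comparisons it won); this is an assumed aggregation rule chosen for illustration, not necessarily the one used by the ranking approach described here. `ranking_from_pairwise` and the toy comparisons are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def ranking_from_pairwise(pieces: List[str],
                          comparisons: Iterable[Tuple[str, str]]) -> List[str]:
    """Order pieces by how many pairwise comparisons they won."""
    # Each comparison is a (winner, loser) pair, e.g. the piece judged higher in valence.
    wins: Dict[str, int] = {p: 0 for p in pieces}
    for winner, _loser in comparisons:
        wins[winner] += 1
    # Sort by descending win count; sorted() is stable, so ties keep the input order.
    return sorted(pieces, key=lambda p: -wins[p])


comparisons = [("b", "a"), ("b", "c"), ("c", "a"), ("d", "a"), ("b", "d"), ("c", "d")]
print(ranking_from_pairwise(["a", "b", "c", "d"], comparisons))  # -> ['b', 'c', 'd', 'a']
```

Win counting tolerates missing comparisons and inconsistent judgments (cycles such as a > b > c > a), which is one reason simple counting schemes are a common starting point for aggregating pairwise annotations.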
Second, due to the semantic gap between the object feature level and the human
cognitive level of emotion perception, it is difficult to compute emotion values
accurately. Machine learning algorithms that minimize the mean squared error (MSE)
between the ground truth and the estimates tend to make conservative estimates when
the computational model is inaccurate: when the features explain little of the
variation in emotion, the MSE-optimal estimates shrink toward the mean of the training
data. As a result, the regression approach described in the previous chapter suffers
from reduced coverage of the emotion plane. For example, it can be observed from
Figure 4.5 that the range of the estimates is smaller than that of the ground truth.
The ranking approach is free of this issue because the songs ranked topmost and
lowermost are assigned the highest and lowest emotion values, respectively, producing
full coverage of the emotion plane.
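The coverage argument can be illustrated with a toy example. The data below are synthetic, ordinary least squares stands in for an MSE-minimizing regressor, and the rank mapping is applied directly to the regression outputs rather than to the output of a learning-to-rank model; all names are hypothetical.

```python
from statistics import fmean

# Hypothetical toy data: 21 ground-truth valence values spread evenly over [-1, 1].
truth = [-1 + i / 10 for i in range(21)]
# A weakly informative feature: the true value plus alternating +1/-1 noise.
feature = [y + (1 if i % 2 == 0 else -1) for i, y in enumerate(truth)]

# Ordinary least squares fit of truth on feature (minimizes MSE).
mx, my = fmean(feature), fmean(truth)
slope = (sum((x - mx) * (y - my) for x, y in zip(feature, truth))
         / sum((x - mx) ** 2 for x in feature))
intercept = my - slope * mx
estimates = [intercept + slope * x for x in feature]


def rank_map(values, v_min=-1.0, v_max=1.0):
    """Replace each value by an evenly spaced value determined by its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # lowest estimate first
    mapped = [0.0] * n
    for pos, idx in enumerate(order):
        mapped[idx] = v_min + (v_max - v_min) * pos / (n - 1)
    return mapped


def spread(values):
    return max(values) - min(values)


ranked = rank_map(estimates)
print(f"ground truth range:   {spread(truth):.2f}")
print(f"MSE regression range: {spread(estimates):.2f}")
print(f"rank-mapped range:    {spread(ranked):.2f}")
```

Running it shows the regression estimates covering only about half of the ground-truth range [-1, 1], because the fitted slope is well below 1, whereas the rank-mapped values span the full range while preserving the estimated order.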
Below we first describe the ranking-based annotation method that is employed to
replace conventional rating-based methods and then the learning-to-rank algorithms
that better exploit the ranking-based annotations.
5.2 Ranking-Based Emotion Annotation
The basic idea of ranking-based emotion annotation is to ask the subjects to rank
music pieces by emotion rather than rate their exact emotion values. This approach
greatly reduces the cognitive load of emotion annotation because ranking is generally
easier to perform than rating. Intuitively, emotion ranking can be accomplished by
asking subjects to determine the complete order of a number of music pieces. However,
this would be a lengthy process: determining the complete order of n music pieces by
comparing every pair requires n(n − 1)/2 comparisons.
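The quadratic growth of exhaustive pairwise comparison is easy to check by enumeration. `all_pairs` is a hypothetical helper that lists every unordered pair a subject would have to judge.

```python
from itertools import combinations
from math import comb


def all_pairs(pieces):
    """Every unordered pair of pieces in an exhaustive pairwise comparison."""
    return list(combinations(pieces, 2))


for n in (10, 20, 50, 100):
    pairs = all_pairs(range(n))
    # The enumeration agrees with the closed form n(n - 1)/2 = C(n, 2).
    assert len(pairs) == n * (n - 1) // 2 == comb(n, 2)
    print(f"n = {n:3d}: {len(pairs)} comparisons")
```

This prints 45, 190, 1225, and 4950 comparisons, so a subject annotating 100 pieces exhaustively would already face 4,950 judgments.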