Music Emotion Recognition by Homer H. Chen, Yi-Hsuan Yang

Saunder January 24, 2011 10:39 book
3 Music Features
The experience of music listening is multidimensional. Different emotion perceptions of music are usually associated with different patterns of acoustic cues [126, 157, 175]. For example, while arousal is related to tempo (fast/slow), pitch (high/low), loudness (high/low), and timbre (bright/soft), valence is related to mode (major/minor) and harmony (consonant/dissonant) [101]. It is also worth noting that emotion perception rarely depends on a single music factor but rather on a combination of them [126, 268]. For example, loud chords and high-pitched chords may suggest more positive valence than soft chords and low-pitched chords, irrespective of mode. See [101] for an overview of the empirical research concerning the influence of different music factors on emotion perception.
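To make these associations concrete, the following is a toy, purely illustrative rule-based sketch that maps simplified cues to a valence-arousal estimate. The function name, cue ranges, and weights are all hypothetical choices made for this example; they mirror the associations cited above (arousal with tempo and loudness, valence with mode and harmony) but do not come from the book or from any real MER system.

```python
def toy_valence_arousal(tempo_bpm, loudness_db, major_mode, consonance):
    """Map simplified musical cues to (valence, arousal), each in [-1, 1].

    tempo_bpm   : tempo in beats per minute (clipped to 40-200)
    loudness_db : average loudness in dB (clipped to 40-100)
    major_mode  : True for major mode, False for minor
    consonance  : harmonic consonance in [0, 1] (1 = fully consonant)
    """
    # Arousal: driven by tempo and loudness, each rescaled to [-1, 1].
    tempo_term = (min(max(tempo_bpm, 40), 200) - 120) / 80.0
    loud_term = (min(max(loudness_db, 40), 100) - 70) / 30.0
    arousal = 0.5 * tempo_term + 0.5 * loud_term

    # Valence: driven by mode and harmony, each rescaled to [-1, 1].
    mode_term = 1.0 if major_mode else -1.0
    valence = 0.5 * mode_term + 0.5 * (2.0 * consonance - 1.0)
    return valence, arousal
```

Under this sketch, a fast, loud, major, consonant excerpt lands in the positive-valence/high-arousal region, while a slow, quiet, minor, dissonant one lands in the opposite corner, consistent with the cue-emotion associations described above.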
As summarized in Table 3.1, several features are extracted to represent the following five perceptual dimensions of music listening: energy, rhythm, temporal, spectrum, and harmony. Many of these features have been used for MER. This chapter describes the semantic meanings of these features, how they are extracted, and their relationship to music emotion. To better illustrate the relationship between these features and emotion perception, we show the features of the following four songs as running examples:
(a) Smells Like Teen Spirit by Nirvana: negative valence and high arousal (quadrant II of 2DES).
(b) Are We the Waiting by Green Day: positive valence and high arousal (quadrant I of 2DES).
(c) Mad World by Gary Jules: negative valence and low arousal (quadrant III of 2DES).
(d) White Christmas by Lisa One: positive valence and low arousal (quadrant IV of 2DES).
These songs are randomly selected from the data set #2 of Table 1.4.
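The quadrant labels above refer to the two-dimensional emotion space (2DES), whose quadrants combine the sign of valence with the level of arousal. A minimal lookup table, for illustration only, of what each quadrant denotes:

```python
# Quadrants of the 2DES (two-dimensional emotion space), as used for
# the four running examples above.
QUADRANT_2DES = {
    "I":   ("positive valence", "high arousal"),  # e.g., Are We the Waiting
    "II":  ("negative valence", "high arousal"),  # e.g., Smells Like Teen Spirit
    "III": ("negative valence", "low arousal"),   # e.g., Mad World
    "IV":  ("positive valence", "low arousal"),   # e.g., White Christmas
}
```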
Table 3.1 Extracted Feature Sets

Feature Set | Extractor              | Features
------------+------------------------+------------------------------------------------
Energy      | PsySound [44]          | Dynamic loudness
            | SDT [31]               | Audio power, total loudness, and specific
            |                        | loudness sensation coefficients
Rhythm      | Marsyas [324]          | Beat histogram
            | MA toolbox [246],      | Rhythm pattern, rhythm histogram, and tempo
            | RP extractor [206]     |
            | MIR toolbox [182]      | Rhythm strength, rhythm regularity, rhythm
            |                        | clarity, average onset frequency, and average
            |                        | tempo [217]
Temporal    | SDT                    | Zero-crossings, temporal centroid, and log
            |                        | attack time
Spectrum    | Marsyas, SDT           | Spectral centroid, spectral rolloff, spectral
            |                        | flux, spectral flatness measures, and spectral
            |                        | crest factors
            | MA toolbox,            | Mel-frequency cepstral coefficients
            | Marsyas, SDT           |
            | MATLAB                 | Spectral contrast [152], Daubechies wavelets
            |                        | coefficient histogram [205], tristimulus,
            |                        | even-harm, and odd-harm [338]
            | MIR toolbox            | Roughness, irregularity, and inharmonicity
Harmony     | MIR toolbox            | Salient pitch, chromagram centroid, key clarity,
            |                        | musical mode, and harmonic change
            | Marsyas                | Pitch histogram
            | PsySound               | Sawtooth waveform inspired pitch estimate [46]
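As a concrete illustration of a few of the features listed in Table 3.1, the following is a minimal NumPy sketch of zero-crossing rate, spectral centroid, and spectral rolloff. This is not how Marsyas, SDT, or the MIR toolbox actually implement these features (which typically operate frame by frame with windowing); it only shows the underlying definitions on a whole signal.

```python
import numpy as np

def zero_crossing_rate(x):
    # Fraction of consecutive sample pairs where the signal changes sign.
    return np.mean(np.abs(np.diff(np.signbit(x).astype(int))))

def spectral_centroid(x, sr):
    # Magnitude-weighted mean frequency (the "center of mass" of the
    # spectrum), in Hz.
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

def spectral_rolloff(x, sr, pct=0.85):
    # Frequency below which pct of the total spectral energy lies.
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    cum = np.cumsum(power)
    return freqs[np.searchsorted(cum, pct * cum[-1])]

# Sanity check on a pure 440 Hz tone: one second of signal at 22,050 Hz,
# so the spectrum is concentrated at the 440 Hz bin.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
```

For the pure tone, the spectral centroid and rolloff both sit at about 440 Hz, and the zero-crossing rate is about 2 x 440 / 22,050 crossings per sample, since a sinusoid crosses zero twice per cycle.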
3.1 Energy Features
The energy of a song is often highly correlated with the perception of arousal [101]. We can measure perceived loudness by the dynamic loudness model of Chalupper and Fastl [53] implemented in PsySound [44, 267], a computer program that models parameters of auditory sensation based on some psychoacoustic models, such as the Bark critical band [374] for modeling auditory filters in our ears, an auditory
