Music Emotion Recognition by Homer H. Chen, Yi-Hsuan Yang

Saunder January 24, 2011 10:39 book
11 Chord Recognition and Its Application to MER
The chord is one of the most important mid-level features of music. Given chord sequences, songs that are similar in various respects can be identified and retrieved more effectively. This chapter describes the extraction of chord features from the chord sequences of music pieces to improve MER. We begin by introducing a chord recognition system that recognizes the chord sequence from low-level music features, and then describe two features computed from the chord sequence. Empirical evaluation shows that these chord features improve the accuracy of valence classification.
11.1 Chord Recognition
A chord is a set of harmonically related musical pitches (notes) that sound almost simultaneously, and the sequence of chords determines the harmonic progression and the tonal structure of a song. Similar chord sequences can be observed in songs that are close in genre or emotion. The use of chords has also been shown effective for cover song detection [194] and music segmentation [30]. However, because the chord sequence is hidden in the audio waveform, an automatic chord recognition system must be built to obtain it.
In this section, we describe a chord recognition system based on the N-gram model and the hidden Markov model (HMM) [144]. This system is conceptually consistent with music theory [30], effective, and more time-efficient than conventional chord recognition systems. Its simplicity and time efficiency are desirable for practical applications.
Figure 11.1 A schematic diagram of the chord recognition system. (Pipeline: input audio → beat tracking → PCP feature extraction → observation probability estimation with chord templates (acoustic model), combined with transition probability estimation from music knowledge or a trained N-gram (language model) → output chord sequence.)
Figure 11.1 shows a schematic diagram of the chord recognition system. In the training phase, an N-gram model is trained on ground-truth chord transcriptions to learn common rules of chord progression. In the testing phase, for each segment of the input audio, the chord with maximum likelihood is estimated using the pretrained acoustic and language models. More details of the system are described below.
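As a concrete illustration of the acoustic side of this estimation, the observation score for each chord can be approximated by matching a segment's PCP vector against binary chord templates, as suggested by the chord-template block in Figure 11.1. The 24-triad template set and cosine-similarity scoring below are a minimal sketch, not the book's exact formulation:

```python
import numpy as np

# 24 binary chord templates (12 major + 12 minor triads).  The naming
# scheme and cosine-similarity scoring are illustrative assumptions.
NAMES = [f"{n}{q}" for q in ("maj", "min")
         for n in ["C", "C#", "D", "D#", "E", "F",
                   "F#", "G", "G#", "A", "A#", "B"]]

def chord_templates():
    templates = np.zeros((24, 12))
    for root in range(12):
        templates[root, [root, (root + 4) % 12, (root + 7) % 12]] = 1       # major triad
        templates[12 + root, [root, (root + 3) % 12, (root + 7) % 12]] = 1  # minor triad
    return templates

def match_chord(pcp, templates):
    """Return the index of the template most similar to the PCP vector."""
    pcp = pcp / (np.linalg.norm(pcp) + 1e-9)
    t = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    return int(np.argmax(t @ pcp))

templates = chord_templates()
# A PCP with energy on pitch classes C, E, and G should match C major.
pcp = np.zeros(12)
pcp[[0, 4, 7]] = 1.0
print(NAMES[match_chord(pcp, templates)])  # prints "Cmaj"
```

In a full system these per-chord scores would serve as (unnormalized) observation probabilities for the HMM decoding described later in this chapter.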
11.1.1 Beat Tracking and PCP Extraction
For an input music piece, a beat tracking system called BeatRoot [76] is applied to detect the beat times, and the piece is then segmented at those times. Each resulting segment is assumed to have a consistent chord.
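A minimal sketch of this beat-synchronous segmentation, assuming the beat times (in seconds) have already been obtained from a tracker such as BeatRoot:

```python
import numpy as np

def beat_segments(audio, beat_times, sr):
    """Split an audio signal into beat-synchronous segments.

    `beat_times` stands in for a beat tracker's output; the function
    signature is an illustrative assumption, not BeatRoot's API.
    """
    bounds = np.round(np.asarray(beat_times) * sr).astype(int)
    bounds = np.concatenate(([0], bounds, [len(audio)]))
    # One segment between each pair of consecutive boundaries.
    return [audio[a:b] for a, b in zip(bounds[:-1], bounds[1:]) if b > a]

sr = 22050
audio = np.zeros(sr * 4)                     # 4 seconds of dummy audio
beats = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]  # 7 detected beat times
segments = beat_segments(audio, beats, sr)
print(len(segments))  # 8: before the first beat, between beats, after the last
```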
Each music segment is represented by the pitch class profile (PCP), which summarizes the frequency spectrum in 12 bins corresponding to the 12 distinct semitones (or chroma) of the musical octave (see Section 3.5 for more details on PCP). PCP is commonly adopted in chord recognition systems because it captures information about the musical pitches (notes) present. To extract PCP, the algorithm described in [248] can be adopted.
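The folding of a spectrum into 12 pitch classes can be sketched as follows. This is a deliberately simplified illustration, omitting the tuning correction and spectral weighting that practical PCP extractors such as [248] apply:

```python
import numpy as np

def pcp(frame, sr, f_ref=261.63):
    """Fold an FFT magnitude spectrum into a 12-bin pitch class profile.

    Each frequency bin is mapped to its nearest semitone relative to a
    reference frequency (C4 here) and its energy is accumulated in the
    corresponding pitch class bin.
    """
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    profile = np.zeros(12)
    for f, mag in zip(freqs[1:], spectrum[1:]):  # skip the DC bin
        pitch_class = int(round(12 * np.log2(f / f_ref))) % 12
        profile[pitch_class] += mag ** 2
    return profile / (profile.sum() + 1e-9)      # normalize to sum to 1

sr = 22050
t = np.arange(sr) / sr
frame = np.sin(2 * np.pi * 440.0 * t)            # one second of a pure A4 tone
print(np.argmax(pcp(frame, sr)))                 # 9, the pitch class of A
```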
11.1.2 Hidden Markov Model and N-Gram Model
Chord recognition can be effectively modeled using basic concepts from digital speech processing. Inspired by the way humans recognize chords, the task is divided into two parts: acoustic modeling and language modeling. In acoustic modeling, the hidden Markov model (HMM) is employed to learn the relationship between PCP features and ground-truth chord labels.
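Once per-segment observation likelihoods (acoustic model) and chord-to-chord transition probabilities (language model) are available, the maximum-likelihood chord sequence can be decoded with the Viterbi algorithm. The following is a generic Viterbi sketch with toy numbers, not the book's exact formulation:

```python
import numpy as np

def viterbi(obs_logprob, trans_logprob, init_logprob):
    """Find the most likely state (chord) sequence.

    obs_logprob:   (T, K) log-likelihood of each of K chords per segment
                   (the acoustic model)
    trans_logprob: (K, K) log transition probabilities between chords
                   (the bigram language model)
    init_logprob:  (K,) log initial chord probabilities
    """
    T, K = obs_logprob.shape
    delta = init_logprob + obs_logprob[0]        # best score ending in each chord
    back = np.zeros((T, K), dtype=int)           # backpointers
    for t in range(1, T):
        scores = delta[:, None] + trans_logprob  # scores[i, j]: come from i, go to j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(K)] + obs_logprob[t]
    path = [int(np.argmax(delta))]               # backtrack from the best final chord
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example with 2 chords: the observations favor chord 0, then
# switch to chord 1, and the transition model permits the change.
obs = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]))
trans = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))
init = np.log(np.array([0.5, 0.5]))
print(viterbi(obs, trans, init))  # [0, 0, 1]
```

The transition matrix is where the N-gram language model enters: its entries encode how plausible each chord-to-chord progression is, so the decoder can override a noisy acoustic observation that conflicts with common harmonic practice.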
