188 Music Emotion Recognition
Figure 11.1 A schematic diagram of the chord recognition system.
Figure 11.1 shows a schematic diagram of the chord recognition system. In the
training phase, an N-gram model is trained on ground-truth chord transcriptions
to learn the common rules of chord progression. In the testing phase, for each
segment of the input audio, the chord with maximum likelihood is estimated using
the pretrained acoustic and language models. More details of the system are
described in the following subsections.
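To make the decoding step concrete, the following is a minimal sketch of how acoustic likelihoods and N-gram (here, bigram) chord-transition probabilities can be combined by Viterbi decoding to find the maximum-likelihood chord sequence. The chord labels, log-likelihood values, and the `viterbi` helper are all illustrative assumptions, not values from the actual system.

```python
# Hypothetical toy setup: three chord labels and per-segment acoustic
# log-likelihoods (all numbers are illustrative, not from the book).
chords = ["C", "F", "G"]

# log P(PCP_t | chord) for four beat-synchronous segments (acoustic model).
emission = [
    {"C": -0.2, "F": -2.0, "G": -2.5},
    {"C": -1.8, "F": -0.3, "G": -2.2},
    {"C": -2.0, "F": -1.9, "G": -0.4},
    {"C": -0.5, "F": -2.1, "G": -1.7},
]

# Bigram (N-gram, N=2) chord-transition log-probabilities (language model).
transition = {
    ("C", "C"): -0.5, ("C", "F"): -1.2, ("C", "G"): -1.2,
    ("F", "C"): -0.9, ("F", "F"): -0.7, ("F", "G"): -1.5,
    ("G", "C"): -0.6, ("G", "F"): -1.6, ("G", "G"): -0.9,
}

def viterbi(emission, transition, chords):
    """Return the chord sequence maximizing acoustic + transition scores."""
    # Initialize with the first segment's acoustic scores.
    score = {c: emission[0][c] for c in chords}
    back = []
    for obs in emission[1:]:
        new_score, pointers = {}, {}
        for c in chords:
            # Best previous chord given the transition model.
            prev = max(chords, key=lambda p: score[p] + transition[(p, c)])
            new_score[c] = score[prev] + transition[(prev, c)] + obs[c]
            pointers[c] = prev
        score, back = new_score, back + [pointers]
    # Trace the best path backward from the highest-scoring final chord.
    path = [max(chords, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi(emission, transition, chords))  # → ['C', 'F', 'G', 'C']
```

The transition term plays the role of the trained N-gram: it biases the decoder toward common chord progressions even when a segment's acoustic evidence is ambiguous.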
11.1.1 Beat Tracking and PCP Extraction
For an input music piece, a beat tracking system called BeatRoot is applied to
detect the beat times. The music piece is then segmented according to the beat
times; that is, each music segment is assumed to carry a single, consistent chord.
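The segmentation step can be sketched as follows. BeatRoot itself is a separate tool, so this illustrative helper (`segment_by_beats` is a hypothetical name) simply assumes the detected beat times are already available and slices the signal into beat-synchronous segments:

```python
def segment_by_beats(samples, sr, beat_times):
    """Split an audio signal into beat-synchronous segments.

    samples    : list/array of audio samples
    sr         : sampling rate in Hz
    beat_times : detected beat times in seconds (e.g., from a beat tracker)
    """
    # Convert beat times to sample indices, bracketing the full signal.
    bounds = [0] + [int(t * sr) for t in beat_times] + [len(samples)]
    # One segment per consecutive pair of boundaries; each segment is
    # assumed to carry a single chord.
    return [samples[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

# Toy usage: 1 second of "audio" at 8 kHz with beats at 0.25 s and 0.5 s.
sig = [0.0] * 8000
segs = segment_by_beats(sig, 8000, [0.25, 0.5])
print([len(s) for s in segs])  # → [2000, 2000, 4000]
```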
Each music segment is represented by the pitch class profile (PCP), which
summarizes the frequency spectrum in 12 bins corresponding to the 12 distinct
semitones (or chroma) of the musical octave (see Section 3.5 for more details of
PCP). PCP is commonly adopted in chord recognition systems because it captures
information about the musical pitches (notes) present in a segment. To extract
PCP, the algorithm described in the original reference can be applied.
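The core idea of PCP extraction, folding spectral energy into 12 chroma bins, can be sketched as below. This is a simplified illustration, not the specific algorithm the chapter cites; the function name `pcp` and the convention of anchoring pitch class 0 at A are assumptions for this example.

```python
import math

def pcp(spectrum, sr, n_fft, f_ref=440.0):
    """Fold a magnitude spectrum into a 12-bin pitch class profile.

    spectrum : magnitudes for FFT bins 1..len(spectrum) (DC excluded)
    sr, n_fft: sampling rate and FFT size used to compute the spectrum
    f_ref    : reference frequency (A4), anchoring pitch class 0 at A
    """
    profile = [0.0] * 12
    for k, mag in enumerate(spectrum, start=1):
        f = k * sr / n_fft                        # bin center frequency in Hz
        # Map frequency to the nearest semitone relative to A4, then wrap
        # into one octave (0 = A here; other systems use 0 = C).
        semitone = round(12 * math.log2(f / f_ref))
        profile[semitone % 12] += mag
    total = sum(profile)
    # Normalize so the profile sums to 1 (when there is any energy).
    return [v / total for v in profile] if total else profile

# Toy usage: a spectrum with energy only near 440 Hz (sr=8000, n_fft=800,
# so bin 44 sits at exactly 440 Hz) yields all energy in pitch class A.
spec = [0.0] * 400
spec[43] = 1.0
print(pcp(spec, 8000, 800))
```

Because all octaves of the same note fold into one bin, the profile reflects which pitch classes sound in a segment, which is exactly the evidence a chord model needs.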
11.1.2 Hidden Markov Model and N-Gram Model
Chord recognition can be effectively modeled using basic concepts from digital
speech processing. Inspired by the way humans recognize chords, the task can be
divided into two parts: acoustic modeling and language modeling. In acoustic
modeling, a hidden Markov model (HMM) is employed to learn the relationship
between PCP features and ground-truth chord