Get at iTunes Music Store Metadata

Take a peek at the iTunes Music Store metadata and use the metadata for your own web applications.

Apple’s iTunes Music Store (iTMS) is more than just a place to buy DRM-restricted songs for $0.99 apiece; it is also a massive audio information repository. This searchable database contains loads of valuable metadata about each song track and album—song and album name, publication date, and record label, to name but a few—not to mention a free 30-second preview for each track and a thumbnail image of each CD cover. Of course, a major limitation is that to search this trove of information you need the iTunes application, which means you need to be sitting in front of a Mac OS X or Windows 2000/XP machine. Here are just some of the possible actions that this limitation precludes:

  • Browsing the iTunes Music Store from your cell phone

  • Querying the iTMS from Linux or (gasp!) Mac OS 9

  • Borrowing thumbnail images and preview clips for use in your own applications

  • Crosschecking iTunes tracks against RIAA Radar (http://www.magnetbox.com/riaa/) to avoid buying RIAA-owned tracks (see http://www.riaa.com)

  • Developing a full-blown iTMS client for your favorite platform

There are many other desirable actions like these, all singing the same refrain: it sure would be nice to be able to access the iTMS from anywhere.

Thankfully, there is a solution, and it’s called (appropriately enough) iTMS-4-ALL.

iTMS-4-ALL

iTMS-4-ALL (http://hcsoftware.sourceforge.net/jason-rohrer/itms4all/; free) is a Perl-based CGI script that lets you search the iTMS from any web browser. In addition to being a useful search tool in its own right, the script serves as an example of how to interact with Apple’s Store server.

You can download and run iTMS-4-ALL on your own web server or just take it for a spin at http://itunes.punboy.net/cgi-bin/itms4all.pl. Figure 4-64 shows iTMS-4-ALL in action.

The iTMS Protocol

Before diving into the code for this hack, let’s examine the details of the iTMS protocol. How does your iTunes client communicate with Apple’s Music Store server? What kind of information is exchanged that might be useful to us? Here is what we know so far:

  • iTunes communicates with Apple almost exclusively through HTTP.

  • iTunes authentication (logging in so you can actually buy something) happens not through HTTP, but instead through HTTPS. For some reason, iTunes will not direct its HTTPS requests through a web proxy, even though other applications (such as Internet Explorer) will.

  • iTunes fetches gzipped (i.e., compressed using the GZIP format) XML files from Apple to lay out its GUI (to display the storefront, genre pages, and search results).

  • Every gzipped XML file is encrypted with AES-128 (Rijndael) in CBC mode. The CBC initialization vector is included as one of the HTTP headers (x-apple-crypto-iv). In other words, you essentially need two 128-bit strings to decrypt the XML: the first one (the initialization vector) is provided right in the HTTP response, while the second one (the AES key) is supposed to be a secret shared by Apple’s server and your iTunes client.

  • The secret AES key used by Apple and your iTunes client is 8a9dad399fb014c131be611820d78895. This secret key is used over and over, though a fresh initialization vector is selected for each communication. (Sean Kasun gleaned this key from the iTunes application.)

Figure 4-64. iTMS-4-ALL in action

Fetching information from Apple (for example, searching for “Xiu Xiu,” a flamboyant post-rock band) involves the following steps:

  1. iTunes sends the following HTTP (web) request to phobos.apple.com on port 80:

        GET /WebObjects/MZSearch.woa/wa/com.apple.jingle.search.DirectAction/search?term=Xiu%20Xiu HTTP/1.1
        User-Agent: iTunes/4.2 (Macintosh; U; PPC Mac OS X 10.2) 
        Accept-Language: en-us, en;q=0.50 
        Cookie: countryVerified=1 
        Accept-Encoding: gzip, x-aes-cbc 
        Connection: close 
        Host: phobos.apple.com

    Tip

    The User-Agent header is important: Apple will not return information to non-iTunes agents.

  2. Apple responds with the following wodge of HTTP:

        HTTP/1.1 200 Apple
        Date: Fri, 16 Apr 2004 13:55:07 GMT
        Content-Length: 4320
        Content-Type: text/xml; charset=UTF-8
        Cache-Control: no-cache
        Connection: close
        Server: Apache/1.3.27 (Darwin)
        Pragma: no-cache
        content-encoding: gzip, x-aes-cbc
        x-apple-max-age: 3600
        x-apple-crypto-iv: 19953b75e9846ea59715be906cdca0c8
        x-apple-protocol-key: 2
        x-apple-asset-version: 2118
        x-apple-application-instance: 20
        Via: 1.1 netcache04 (NetCache NetApp/5.2.1R3)
    
        [-- encrypted gzip archive starts here --]

  3. iTunes then initializes an AES-128 CBC cipher with its key (8a9dad399fb014c131be611820d78895) and the initialization vector provided by x-apple-crypto-iv (19953b75e9846ea59715be906cdca0c8). It uses this cipher to decrypt the response body, which yields a GZIP archive, and then un-gzips that archive to get the raw XML.
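The header handling in steps 2 and 3 can be sketched in a few lines of Perl. This is only an illustration, not code from iTMS-4-ALL: the parse_headers helper is a hypothetical name, and the sample headers are copied from the response shown above. It pulls out the initialization vector and reads the content-encoding list to work out the order in which the encodings must be undone.

```perl
use strict;
use warnings;

# Split a raw HTTP header block into a hash keyed by lowercased header name.
# (parse_headers is a hypothetical helper written for this sketch.)
sub parse_headers {
    my ($raw) = @_;
    my %headers;
    for my $line ( split /\r?\n/, $raw ) {
        # Skip the status line and anything that is not "Name: value".
        next unless $line =~ /^([A-Za-z0-9-]+):\s*(.*?)\s*$/;
        $headers{ lc $1 } = $2;
    }
    return \%headers;
}

# Sample headers copied from the response shown above.
my $raw = join "\n",
    'HTTP/1.1 200 Apple',
    'Content-Type: text/xml; charset=UTF-8',
    'content-encoding: gzip, x-aes-cbc',
    'x-apple-crypto-iv: 19953b75e9846ea59715be906cdca0c8';

my $headers = parse_headers($raw);
my $iv_hex  = $headers->{'x-apple-crypto-iv'};

# Encodings are listed in the order they were applied (gzip, then AES),
# so they have to be undone in reverse: decrypt first, then gunzip.
my @encodings = split /\s*,\s*/, $headers->{'content-encoding'};

print "IV: $iv_hex\n";
print "Undo in this order: ", join( ', ', reverse @encodings ), "\n";
```

Lowercasing the header names is a convenience: the sample response mixes capitalized and lowercase names, and HTTP header names are case-insensitive anyway.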

The full XML document for search results is too long to show here (one example is 72 KB of text when uncompressed). The XML includes lots of layout information, so Apple can change the way results are displayed to the user without upgrading the iTunes client. The dict entries near the end of the document contain information for each track matching your search. These entries are dictionaries (think about looking up something in the dictionary: you want a definition associated with a particular word) that map various key names to pieces of metadata. Here is an example dict entry:

    <dict>
    <key>kind</key><string>song</string>
    <key>artistName</key><string>Xiu Xiu</string>
    <key>artistId</key><string>3208396</string>
    <key>bitRate</key><integer>128</integer>
    <key>buyParams</key><string>productType=S&amp;salableAdamId=5390052&amp;price=990</string>
    <key>price</key><integer>990</integer>
    <key>copyright</key><string>_ 2004 5 Rue Christine</string>
    <key>dateModified</key><date>2004-03-10T06:44:25Z</date>
    <key>discCount</key><integer>1</integer>
    <key>discNumber</key><integer>1</integer>
    <key>duration</key><integer>179164</integer>
    <key>explicit</key><integer>0</integer>
    <key>fileExtension</key><string>m4p</string>
    <key>genre</key><string>Alternative</string>
    <key>genreId</key><integer>20</integer>
    <key>playlistName</key><string>Fabulous Muscles</string>
    <key>playlistArtistName</key><string>Xiu Xiu</string>
    <key>playlistArtistId</key><integer>3208396</integer>
    <key>playlistId</key><string>5390070</string>
    <key>previewURL</key><string>http://a1535.phobos.apple.com/Music/y2004/m02/d06/h14/s05.ojrmonwq.p.m4p</string>
    <key>previewLength</key><integer>30</integer>
    <key>relevance</key><string>1.0</string>
    <key>releaseDate</key><string>2004-02-17T08:00:00Z</string>
    <key>sampleRate</key><integer>44100</integer>
    <key>songId</key><integer>5390052</integer>
    <key>comments</key><string></string>
    <key>trackCount</key><integer>10</integer>
    <key>trackNumber</key><integer>2</integer>
    <key>songName</key><string>I Luv the Valley OH!</string>
    <key>vendorId</key><integer>1143</integer>
    <key>year</key><integer>2004</integer>
    </dict>

Just look at all that lovely metadata! The album name (Fabulous Muscles) is provided under the playlistName key, while the song name (I Luv the Valley OH!) is tagged with the songName key. Of particular interest is the previewURL, which in this case is http://a1535.phobos.apple.com/Music/y2004/m02/d06/h14/s05.ojrmonwq.p.m4p; this URL can be fetched by any web browser (baked into iTunes or not) and played on most platforms (Mac, Windows, Unix, etc.) using VideoLAN’s VLC media player (http://www.videolan.org; free).
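To get at these values from a script, you can turn a dict entry into a Perl hash. The following is a minimal sketch, not the parser used by iTMS-4-ALL: parse_dict is a hypothetical helper, it relies on a regular expression rather than a real XML parser, and it assumes flat entries like the one above (no nested dicts, no decoding of XML entities). The sample is a trimmed copy of the entry shown earlier.

```perl
use strict;
use warnings;

# Turn one <dict>...</dict> fragment into a hash of key => value pairs.
# Regex-based sketch: handles only flat string/integer/date entries.
sub parse_dict {
    my ($xml) = @_;
    my %entry;
    while ( $xml =~ m{<key>([^<]+)</key>\s*<(string|integer|date)>(.*?)</\2>}gs ) {
        my ( $key, $value ) = ( $1, $3 );
        $value =~ s/^\s+|\s+$//g;    # trim stray whitespace around the value
        $entry{$key} = $value;
    }
    return \%entry;
}

# A trimmed copy of the dict entry shown above.
my $sample = <<'XML';
<dict>
<key>kind</key><string>song</string>
<key>artistName</key> <string>Xiu Xiu</string>
<key>playlistName</key><string>Fabulous Muscles</string>
<key>songName</key><string>I Luv the Valley OH!</string>
<key>trackNumber</key><integer>2</integer>
<key>comments</key><string></string>
</dict>
XML

my $track = parse_dict($sample);
printf "%s - %s (track %d on %s)\n",
    @{$track}{qw(artistName songName trackNumber playlistName)};
```

For anything beyond a quick experiment, a proper XML parser from CPAN would be the safer choice, since the full search results also contain layout markup that a simple pattern like this does not understand.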

In addition to the metadata included in each dict entry, the search results also include CD cover thumbnails, which appear in the XML as URLs for JPEG files. In our example results, the cover JPEG for Fabulous Muscles, shown in Figure 4-64, has the URL http://a1.phobos.apple.com/Music/y2004/m02/d06/h14/s05.kmxqqbbr.60x60-75.jpg. The current iTunes Music Store incarnation includes up to four thumbnails with each set of search results.
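Because the thumbnails are plain URLs, one simple way to collect them is to scan the XML for anything that looks like a JPEG address. The sketch below makes that assumption explicit: the surrounding markup in the sample is invented for illustration (the text above does not show which element wraps the cover URLs), and find_cover_urls is a hypothetical helper. Only the URL itself comes from the example results.

```perl
use strict;
use warnings;

# Collect unique JPEG URLs from a chunk of XML, in order of appearance.
sub find_cover_urls {
    my ($xml) = @_;
    my ( %seen, @urls );
    while ( $xml =~ m{(https?://[^\s<>"']+\.jpg)}gi ) {
        push @urls, $1 unless $seen{$1}++;    # skip repeats of the same cover
    }
    return @urls;
}

# Hypothetical wrapper markup; only the URL is taken from the example above.
my $cover = 'http://a1.phobos.apple.com/Music/y2004/m02/d06/h14/'
          . 's05.kmxqqbbr.60x60-75.jpg';
my $xml = "<string>$cover</string>\n<string>$cover</string>\n";

print "$_\n" for find_cover_urls($xml);
```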

This is the protocol that iTunes uses to interact with the iTMS server, but how do you interact with the server sans iTunes? Here is where you get to start hacking.

The Code

With knowledge of the protocol in hand, you can now start writing code to fetch search results from Apple and access the XML-formatted metadata.

Searching the iTMS with wget.

wget is a command-line agent for grabbing data off the Web. In general, if you pass a URL to the wget command, wget will download the contents pointed to by the URL and save them to disk. wget is standard issue on most Unix-like platforms, including Mac OS X, and you can also download it for Windows platforms from various sources (try Googling for “wget for Windows”).

You can grab encrypted iTMS data from Apple yourself with wget, but you need to specify an iTunes User-Agent header to override wget’s default User-Agent header:

    $ wget -U "iTunes/4.0 (Macintosh; U; PPC Mac OS X 10.2)" \
        "http://phobos.apple.com/WebObjects/MZSearch.woa/wa/com.apple.jingle.search.DirectAction/search?term=Xiu%20Xiu"

Of course, the fetched file is encrypted with AES, as described above. Unfortunately, there are no standard-issue tools for decrypting these files, so we need to resort to some relatively simple Perl code to go any further.
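If you would rather do the download in Perl as well, the same request can be sketched with HTTP::Tiny, which ships with recent versions of Perl. Treat this as an assumption-laden illustration rather than the way iTMS-4-ALL does it: the helper names (escape_term, build_search_request, fetch_search) are made up for this example, the headers are copied from the iTunes request shown earlier, and there is no guarantee that Apple's server will answer. An advantage over the plain wget call is that the response headers, including x-apple-crypto-iv, arrive together with the body.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $SEARCH_URL = 'http://phobos.apple.com/WebObjects/MZSearch.woa/wa/'
               . 'com.apple.jingle.search.DirectAction/search';

# Percent-encode a search term (assumes a byte string, e.g. plain ASCII).
sub escape_term {
    my ($term) = @_;
    $term =~ s/([^A-Za-z0-9_.~-])/sprintf '%%%02X', ord $1/ge;
    return $term;
}

# Build the URL and headers that mimic the iTunes request shown earlier.
sub build_search_request {
    my ($term) = @_;
    my %headers = (
        'User-Agent'      => 'iTunes/4.2 (Macintosh; U; PPC Mac OS X 10.2)',
        'Accept-Language' => 'en-us, en;q=0.50',
        'Cookie'          => 'countryVerified=1',
        'Accept-Encoding' => 'gzip, x-aes-cbc',
    );
    return ( "$SEARCH_URL?term=" . escape_term($term), \%headers );
}

# Fetch the still-encrypted body plus the IV needed to decrypt it.
sub fetch_search {
    my ($term) = @_;
    my ( $url, $headers ) = build_search_request($term);
    my $http     = HTTP::Tiny->new( agent => $headers->{'User-Agent'} );
    my $response = $http->get( $url, { headers => $headers } );
    die "Request failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    # HTTP::Tiny lowercases response header names.
    return ( $response->{content}, $response->{headers}{'x-apple-crypto-iv'} );
}

# Show the URL that would be requested, without touching the network.
my ($url) = build_search_request('Xiu Xiu');
print "$url\n";
```

To actually run a search, call fetch_search('Xiu Xiu') and keep both return values: the first is the encrypted GZIP data and the second is the hex-encoded IV that goes with it.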

Cryptography programming in Perl.

To decrypt AES-128 CBC, you need two nonstandard Perl modules: Crypt::CBC and Crypt::Rijndael. Both modules can be downloaded from CPAN (http://www.cpan.org).

Tip

In case you are wondering, Rijndael is another name for AES, since the Rijndael algorithm was selected as the AES standard.

CBC.pm is pure Perl, but the Rijndael module must be compiled for your platform. Compilation instructions are included with the module package that you download from CPAN. Once installed, these modules can be included in your Perl program as follows:

    use Crypt::CBC;
    use Crypt::Rijndael;

You can get the encryption initialization vector (IV) from the x-apple-crypto-iv HTTP header, as described previously. Apple picks a fresh IV for each response, and you must use the IV included with a response to decrypt that response. Assume the IV is 19953b75e9846ea59715be906cdca0c8. You can set up variables for the key and IV as follows:

    my $iTunesKeyHex = "8a9dad399fb014c131be611820d78895";
    my $ivHex = "19953b75e9846ea59715be906cdca0c8";

The CBC module requires that both keys and IVs be in binary form, though we currently have them in hex-encoded form. We can pack our key and IV into binary form as follows:

    my $iTunesKeyBinary = pack( "H*", $iTunesKeyHex );
    my $ivBinary = pack( "H*", $ivHex );
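As a quick sanity check on the packing step, the short sketch below (which repeats the four lines above so that it runs on its own) confirms that each 32-digit hex string packs down to 16 bytes, or 128 bits, and that unpack reverses the conversion.

```perl
use strict;
use warnings;

my $iTunesKeyHex = "8a9dad399fb014c131be611820d78895";
my $ivHex        = "19953b75e9846ea59715be906cdca0c8";

my $iTunesKeyBinary = pack( "H*", $iTunesKeyHex );
my $ivBinary        = pack( "H*", $ivHex );

# 32 hex digits pack down to 16 bytes, i.e. 128 bits.
printf "key: %d bytes, IV: %d bytes\n",
    length($iTunesKeyBinary), length($ivBinary);

# unpack reverses the conversion, which makes for an easy round-trip check.
die "hex round trip failed\n"
    unless unpack( "H*", $iTunesKeyBinary ) eq $iTunesKeyHex;
```

If either length comes out as something other than 16, the hex string was probably truncated or picked up stray whitespace when it was copied from the header.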

Using these binary values, you can create a Rijndael CBC cipher as follows:

    my $cipher = Crypt::CBC->new( { 'key'            => $iTunesKeyBinary,
                                    'cipher'         => 'Rijndael',
                                    'iv'             => $ivBinary,
                                    'regenerate_key' => 0,
                                    'padding'        => 'standard',
                                    'prepend_iv'     => 0
                                  } );

You can think of this initialized cipher object as a black box that takes encrypted data as input and outputs decrypted data. Assuming that you have your encrypted GZIP data stored in a variable called $encryptedSearchResults, you can finally decrypt the results as follows:

    my $decryptedSearchResultsGZIP =
        $cipher->decrypt( $encryptedSearchResults );

Now, your results can be decompressed with GZIP, producing raw XML that you can peruse, parse, and otherwise enjoy.
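One way to do the decompression without leaving Perl is the IO::Uncompress::Gunzip module, which is bundled with recent Perl releases. The sketch below is an assumption about how you might do it, not a copy of what iTMS-4-ALL does. So that it runs offline, it first compresses a tiny stand-in document to play the role of $decryptedSearchResultsGZIP; gunzip_buffer is a hypothetical helper name.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Decompress an in-memory GZIP buffer and return the raw bytes.
sub gunzip_buffer {
    my ($gzipped) = @_;
    my $xml;
    gunzip( \$gzipped => \$xml )
        or die "gunzip failed: $GunzipError\n";
    return $xml;
}

# Stand-in for the decrypted data, so the sketch runs without the network.
my $sampleXML = "<dict><key>kind</key><string>song</string></dict>";
my $decryptedSearchResultsGZIP;
gzip( \$sampleXML => \$decryptedSearchResultsGZIP )
    or die "gzip failed: $GzipError\n";

my $searchResultsXML = gunzip_buffer($decryptedSearchResultsGZIP);
print "$searchResultsXML\n";
```

A failed gunzip at this stage is a useful diagnostic: if the data does not decompress, the most likely culprit is a key or IV mismatch in the decryption step rather than the compression itself.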

iTMS-4-ALL.

iTMS-4-ALL is a Perl-based CGI script that pulls all of the aforementioned pieces together into a user-friendly package. The script can be installed on any web server that supports CGI and Perl and then accessed from any web browser. The user interface for searching the iTMS was shown earlier in Figure 4-64. If you want to explore the script right away, you can download the code from http://hcsoftware.sourceforge.net/jason-rohrer/itms4all/. A live installation of the script is also available on that page, so you can search the iTMS from your browser without installing anything.

The HTML user interface generated by iTMS-4-ALL is basic by design: it works in all web browsers, including text-mode applications such as Lynx and the palmtop microbrowsers present on cell phones. Thus, iTMS-4-ALL not only unshackles iTMS searching from the officially supported iTunes platforms, it also enables searching away from the desktop. You can now browse the iTunes store while sitting on the bus.

Installing the script on your own web server is relatively painless. All necessary Perl modules are included with the download package, and a script is provided to compile the modules for your server’s platform. After running the compilation script, you need to copy the files into your web server’s cgi-bin directory. For example, if your server keeps CGI scripts in /httpd/cgi-bin, you would type:

	cp -r itms4all.pl Crypt IO auto /httpd/cgi-bin

Finally, you need to make sure that your web server has permission to execute your script. For most common server setups, you can grant permission with the following command:

	chmod o+x /httpd/cgi-bin/itms4all.pl

This command grants execution permission (x) to the other users (o), including your web server. Now you are ready to test the script. If your server had the address http://www.myserver.com, you could run the script by pointing your browser to http://www.myserver.com/cgi-bin/itms4all.pl.

Jason Rohrer
