Now that we've seen how YouTube's API works, we'll show how to download YouTube videos, first manually with your browser and then programmatically with code.
YouTube videos are stored in the Flash video (FLV) format, which is what we'll want to download.
Before we show how you can programmatically download a YouTube video, let's look at how you can download a video just by using your browser. To begin with, open any YouTube video page—say, the infamous Tay Zonday "Chocolate Rain" video, which is at http://www.youtube.com/watch?v=EwTZ2xpQwpA. From this URL, we can determine that the video ID (v=) is "EwTZ2xpQwpA".
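Reading the video ID off the URL by eye can also be sketched in code. The helper below is a minimal illustration, not part of the InnerTube source; the class name VideoIdSketch and the method GetVideoId are hypothetical, and it assumes the ID is always carried in the v query parameter, as it is in the URL above.

```csharp
using System;

public static class VideoIdSketch
{
    // Hypothetical helper: returns the value of the "v" query parameter,
    // or null when the URL has no such parameter.
    public static string GetVideoId(string watchUrl)
    {
        Uri uri = new Uri(watchUrl);
        string query = uri.Query.TrimStart('?');

        foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split each pair into at most two parts: name and value.
            string[] parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2 && parts[0] == "v")
            {
                return Uri.UnescapeDataString(parts[1]);
            }
        }
        return null;
    }
}
```

Called with http://www.youtube.com/watch?v=EwTZ2xpQwpA, this returns "EwTZ2xpQwpA".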
YouTube has a (hidden) URL that you can use to download a video directly, but you need to pass in two things: a video_id and a session token. A session token is an identifier that YouTube assigns to a browser and that lasts for approximately 15 minutes. If you don't have a valid session token, YouTube will block your request to download a Flash video.
To see this in action, open your browser, go to http://www.youtube.com/get_video?video_id=EwTZ2xpQwpA, and notice how YouTube pretends the URL is bad by sending back an HTTP 404 Not Found error to Internet Explorer (Firefox 3 users will see a blank page), as shown in Figure 4-7. You get a 404 error because you must append a valid session token to the URL for the request to work.
To get a valid session token from YouTube, open the YouTube page for "Chocolate Rain" at http://www.youtube.com/watch?v=EwTZ2xpQwpA in a browser and click View Source to view the page's HTML. In the HTML, you'll find a JavaScript variable named swfArgs that YouTube uses to pass a number of parameters to the Flash player, as shown in Example 4-19.
Example 4-19. The swfArgs JavaScript variable needed to play a YouTube video
var swfArgs = {"usef": 0, "cust_p": "jMWn75PwKgutJQ0J3mrLbA", "iv_storage_server": "http://www.google.com/reviews/y/", "ad_module": "http://s.ytimg.com/yt/swf/ad-vfl59966.swf", "ad_channel_code": "invideo_overlay_480x70_cat10,afv_overlay", "video_id": "EwTZ2xpQwpA", "l": 292, "fmt_map": "34/0/9/0/115", "ad_host": "ca-host-pub-5311789755034317", "sk": "_N53QD2G0B79IwT2MIi7nNSvpkgWSWWwC", "invideo": true, "t": "OEgsToPDskJUt8xv3hrKiGOAYlLYcl1L", "hl": "en", "plid": "AARaIAIC2yUqgjl1AAAA-YT8YQA", "vq": null, "iv_module": "http://s.ytimg.com/yt/swf/iv_module-vfl57703.swf", "afv": true, "ad_host_tier": "12789", "ad_video_pub_id": "ca-pub-6219811747049371", "tk": "5Eu9v6C5n2lxivxOt0UqRqT0yNnUmUCs3oR7ZzuDg8_JZHd6DyO2jw=="};
Buried deep inside the swfArgs variable is the name/value pair "t": "OEgsToPDskJUt8xv3hrKiGOAYlLYcl1L", which represents a valid session token that can be used to download the video. Now try the same download URL for which YouTube previously returned a 404 error, except this time append the "t" session token value as shown below:
http://www.youtube.com/get_video?video_id=EwTZ2xpQwpA&t=OEgsToPDskJUt8xv3hrKiGOAYlLYcl1L
Note
This exact download URL will not work because, by the time you read this, the session token (the t= part) will have expired. If you want to try this in your browser, you will first have to navigate to a YouTube video page, click View Source in your browser, and manually copy and paste the session token into the download URL. If you try to use a session token after the session has expired, you will receive an HTTP 410 Gone error saying the page no longer exists. Session tokens are also unique to a video, so you cannot use the same session token to download a different video.
You should now be prompted to save the 11.3 MB FLV file to your hard drive, as shown in Figure 4-8.
Now that we've shown how you can do it in a browser, we want to replicate the process in code: read the HTML from the video page, pull out the swfArgs JavaScript variable, and then call the download link with a valid session token.
Note
HTML parsing like this is brittle and will break if YouTube changes the way it handles session tokens or renames its JavaScript variables. Given that there is no API to download videos, developers building applications that depend on the HTML internals of a website need to test continually to make sure a change to the site doesn't break the application.
To do this in code, we've created a method in the Download class called CreateTokenRequest. When given an InnerTubeVideo, it makes a WebClient request to download the video page HTML as a string. It then parses out the swfArgs JavaScript variable and retrieves the "t" argument, just like we did manually with the browser.
The code in Example 4-20 and Example 4-21 retrieves the HTML from the web page using the WebClient DownloadString method, passing in the Link (the URL for the YouTube video). Next, it gets the index of the swfArgs variable (see Example 4-19) in the HTML string. Once we find the location of swfArgs, we can retrieve the variable's value by finding the indexes of the brace characters "{" and "}" that enclose the value. Once we have both indexes, we use the Substring method to retrieve the value between the braces and load it into the variable fullString.
Example 4-20. C# code to programmatically get a session token
    // Requires: using System; using System.Net;
    private static string CreateTokenRequest(InnerTubeVideo video)
    {
        //YouTube variables
        const string jsVariable = "swfArgs";
        const string argName = "t";

        //get raw html from YouTube video page
        string rawHtml;
        using (WebClient wc = new WebClient())
        {
            rawHtml = wc.DownloadString(video.Link);
        }

        //extract the JavaScript name/value pairs
        int jsIndex = rawHtml.IndexOf(jsVariable);
        int startIndex = rawHtml.IndexOf("{", jsIndex);
        int endIndex = rawHtml.IndexOf("}", startIndex);
        string fullString = rawHtml.Substring(startIndex + 1, endIndex - startIndex - 1);

        //remove all quotes (")
        fullString = fullString.Replace("\"", "");

        //split all values
        string[] allArgs = fullString.Split(',');

        //loop through JavaScript parameters
        foreach (string swfArg in allArgs)
        {
            //match "t:" rather than just "t" so that other names
            //beginning with t (such as "tk") are not picked up by mistake
            if (swfArg.Trim().StartsWith(argName + ":"))
            {
                var nameValuePair = swfArg.Split(':');
                return string.Format("{0}={1}", argName, nameValuePair[1].Trim());
            }
        }

        throw new Exception(string.Format("token not found in swfArgs," +
            " make sure {0} is accessible", video.Link));
    }
Example 4-21. Visual Basic code to programmatically get a session token
    ' Requires: Imports System, Imports System.Net
    Private Shared Function CreateTokenRequest(ByVal video As InnerTubeVideo) As String
        'YouTube variables
        Const jsVariable As String = "swfArgs"
        Const argName As String = "t"

        'get raw html from YouTube video page
        Dim rawHtml As String
        Using wc As New WebClient()
            rawHtml = wc.DownloadString(video.Link)
        End Using

        'extract the JavaScript name/value pairs
        Dim jsIndex As Integer = rawHtml.IndexOf(jsVariable)
        Dim startIndex As Integer = rawHtml.IndexOf("{", jsIndex)
        Dim endIndex As Integer = rawHtml.IndexOf("}", startIndex)
        Dim fullString As String = rawHtml.Substring(startIndex + 1, _
            endIndex - startIndex - 1)

        'remove all quotes (")
        fullString = fullString.Replace("""", "")

        'split all values
        Dim allArgs() As String = fullString.Split(","c)

        'loop through JavaScript parameters
        For Each swfArg As String In allArgs
            'match "t:" rather than just "t" so that other names
            'beginning with t (such as "tk") are not picked up by mistake
            If swfArg.Trim().StartsWith(argName & ":") Then
                Dim nameValuePair = swfArg.Split(":"c)
                Return String.Format("{0}={1}", argName, nameValuePair(1).Trim())
            End If
        Next swfArg

        Throw New Exception(String.Format("token not found in swfArgs, " & _
            "make sure {0} is accessible", video.Link))
    End Function
Once the fullString variable holds the value of swfArgs, we need to do a little more work to parse out the session token. To do that, we remove the quotes from the string and split the contents of the variable (the arguments are comma-delimited). Finally, we loop through the arguments looking for the "t" argument (the session token). Once found, we return a string in the format "Name=Value", which, for this video, would be the following:
t=OEgsToPDskJUt8xv3hrKiGOAYlLYcl1L
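Because splitting on commas and colons can trip over values that themselves contain those characters (the URLs in swfArgs, for example), a regular expression is one possible alternative. The sketch below is not the book's method; TokenSketch and ExtractToken are hypothetical names, and it assumes the token always appears as a quoted "t": "value" pair as in Example 4-19.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenSketch
{
    // Matches "t": "value" and captures the value.
    // The closing quote after t keeps names such as "tk" from matching.
    private static readonly Regex TokenPattern =
        new Regex("\"t\"\\s*:\\s*\"([^\"]+)\"");

    // Hypothetical helper: returns "t=<token>", or null if no token is found.
    public static string ExtractToken(string swfArgsText)
    {
        Match match = TokenPattern.Match(swfArgsText);
        return match.Success ? "t=" + match.Groups[1].Value : null;
    }
}
```

Like the index-based parsing, this still depends on YouTube's page internals and would need to be retested whenever the page changes.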
Now that we've solved how to get the session token programmatically, we need to download the file to our hard drive. To do this, we will use the DownloadVideo method of the Download class, which is included in its entirety below in Example 4-22 and Example 4-23.
The first thing our code does is make sure we haven't downloaded the Flash video (FLV) before by checking if the file already exists.
Next, we build the download URL which, as in our browser example, is a combination of the download URL, the video ID, and the session token that we get by calling the CreateTokenRequest method from above.
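As a standalone illustration of this step, the sketch below assembles the same kind of URL from a video ID and a token. DownloadUrlSketch and Build are hypothetical names, and the base address is hardcoded here as an assumption, whereas the InnerTube code reads it from the video's DownloadLink property.

```csharp
using System;

public static class DownloadUrlSketch
{
    // Hypothetical helper: combines the download URL, video ID, and token.
    public static string Build(string videoId, string token)
    {
        UriBuilder builder = new UriBuilder("http://www.youtube.com/get_video");

        // Escape both values so unexpected characters cannot break the query.
        builder.Query = "video_id=" + Uri.EscapeDataString(videoId) +
                        "&t=" + Uri.EscapeDataString(token);

        // Going through the Uri property avoids an explicit default port
        // appearing in the resulting string.
        return builder.Uri.AbsoluteUri;
    }
}
```

For the "Chocolate Rain" ID and the token from Example 4-19, this yields the same URL we typed into the browser earlier.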
We open a request to the YouTube Flash video file on YouTube's server using the download URL we previously built. At the same time, we also create a FileStream on our hard drive that we'll use to save the video file. The code reads up to 64 KB (65,536 bytes) of data from the web stream of the video and writes those bytes directly to the hard drive file stream. The code continues looping, reading from the Web and writing to the hard drive, until we reach the end of the file, at which point the Read call on the web stream returns 0, readBytes is set to 0, and we exit the loop.
Note
Both the web stream (webStream) and the FileStream are wrapped in using statements, which handle the proper disposal of these resources through a compiler-generated try/finally block. We do this to ensure that even if an exception is thrown, we will still clean up system resources properly.
Example 4-22. C# code for the DownloadVideo method
    // Requires: using System; using System.Diagnostics; using System.IO; using System.Net;
    public static void DownloadVideo(InnerTubeVideo source, string destination)
    {
        //skip the download if the file already exists
        if (!File.Exists(destination))
        {
            //build the download URL: base link + video ID + session token
            UriBuilder final = new UriBuilder(source.DownloadLink);
            final.Query = "video_id=" + source.Id + "&" + CreateTokenRequest(source);

            WebRequest request = WebRequest.Create(final.ToString());
            request.Timeout = 500000;

            try
            {
                WebResponse response = request.GetResponse();
                using (Stream webStream = response.GetResponseStream())
                {
                    try
                    {
                        int _bufferSize = 65536;
                        using (FileStream fs = File.Create(destination, _bufferSize))
                        {
                            int readBytes = -1;
                            byte[] inBuffer = new byte[_bufferSize];

                            //Loop until we hit the end
                            while (readBytes != 0)
                            {
                                //read data from web into filebuffer, then write to file
                                readBytes = webStream.Read(inBuffer, 0, _bufferSize);
                                fs.Write(inBuffer, 0, readBytes);
                            }
                        }
                    }
                    catch (Exception ex)
                    {
                        Debug.WriteLine("Error in Buffer Download");
                        Debug.Indent();
                        Debug.WriteLine(ex.Message);
                    }
                }
            }
            catch (Exception ex)
            {
                Debug.WriteLine("Error in request.GetResponse()");
                Debug.Indent();
                Debug.WriteLine(ex.Message);
            }
        }
    }
Example 4-23. Visual Basic code for the DownloadVideo method
    ' Requires: Imports System, System.Diagnostics, System.IO, System.Net
    Public Shared Sub DownloadVideo(ByVal source As InnerTubeVideo, ByVal destination As String)
        'skip the download if the file already exists
        If (Not File.Exists(destination)) Then
            'build the download URL: base link + video ID + session token
            Dim final As New UriBuilder(source.DownloadLink)
            final.Query = "video_id=" & source.Id & "&" & CreateTokenRequest(source)

            Dim request As WebRequest = WebRequest.Create(final.ToString())
            request.Timeout = 500000

            Try
                Dim response As WebResponse = request.GetResponse()
                Using webStream As Stream = response.GetResponseStream()
                    Try
                        Dim _bufferSize As Integer = 65536
                        Using fs As FileStream = File.Create(destination, _bufferSize)
                            Dim readBytes As Integer = -1
                            Dim inBuffer(_bufferSize - 1) As Byte

                            'Loop until we hit the end
                            Do While readBytes <> 0
                                'read data from web into filebuffer, then write to file
                                readBytes = webStream.Read(inBuffer, 0, _bufferSize)
                                fs.Write(inBuffer, 0, readBytes)
                            Loop
                        End Using
                    Catch ex As Exception
                        Debug.WriteLine("Error in Buffer Download")
                        Debug.Indent()
                        Debug.WriteLine(ex.Message)
                    End Try
                End Using
            Catch ex As Exception
                Debug.WriteLine("Error in request.GetResponse()")
                Debug.Indent()
                Debug.WriteLine(ex.Message)
            End Try
        End If
    End Sub
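The read/write loop at the heart of DownloadVideo can be tried in isolation, without a network connection. The sketch below is a hypothetical helper (StreamCopySketch.Copy is not part of InnerTube) that copies any Stream to another in fixed-size chunks; it stops as soon as Read returns 0, so it never issues a zero-length write.

```csharp
using System.IO;

public static class StreamCopySketch
{
    // Hypothetical helper: copies source to destination in chunks of
    // bufferSize bytes and returns the total number of bytes copied.
    public static long Copy(Stream source, Stream destination, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;

        // Read returns 0 only at the end of the stream; it may return
        // fewer bytes than requested before that, which is why we write
        // exactly "read" bytes each time.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

Passing two MemoryStream instances is a quick way to exercise the loop; in the download scenario the source would be the web response stream and the destination a FileStream.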
In addition to downloading a YouTube video, InnerTube also downloads the large thumbnail image, sized 425 pixels × 344 pixels, to use as the preview image before a video starts playing. The format of the large image thumbnail URL is http://img.youtube.com/vi/VideoID/0.jpg.
The code to download the image (Example 4-24 and Example 4-25) simply checks whether the file has already been downloaded, and if it hasn't, calls the DownloadFile method, passing in the location of the source file to download and the destination filename.
Example 4-24. C# code to download a video's thumbnail image
    public static void DownloadImage(InnerTubeVideo source, string destination)
    {
        //if we haven't downloaded the image yet, download it
        if (!File.Exists(destination))
        {
            using (WebClient wc = new WebClient())
            {
                wc.DownloadFile(new Uri(source.ThumbnailLink), destination);
            }
        }
    }
Example 4-25. Visual Basic code to download a video's thumbnail image
    Public Shared Sub DownloadImage(ByVal source As InnerTubeVideo, ByVal destination As String)
        'if we haven't downloaded the image yet, download it
        If (Not File.Exists(destination)) Then
            Using wc As New WebClient()
                wc.DownloadFile(New Uri(source.ThumbnailLink), destination)
            End Using
        End If
    End Sub