Downloading Videos from YouTube

Now that we've seen how YouTube's API works, we'll show how to download YouTube videos, first manually with your browser and then programmatically with code.

Downloading a YouTube Video with Your Browser

YouTube videos are stored in the Flash video (FLV) format, which is what we'll want to download.

Before we show how you can programmatically download a YouTube video, let's look at how you can download a video just by using your browser. To begin with, open any YouTube video page—say, the infamous Tay Zonday "Chocolate Rain" video, which is at http://www.youtube.com/watch?v=EwTZ2xpQwpA. From this URL, we can determine that the video ID (v=) is "EwTZ2xpQwpA".
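The same video ID can also be read from the URL in code. The following is a minimal sketch, not part of the InnerTube sample; the VideoIdSketch class and GetVideoId method are hypothetical names used only for illustration, and the sketch assumes the ID is always carried in the v query parameter.

```csharp
using System;

public static class VideoIdSketch
{
    // Returns the value of the "v" query parameter, or null if it is absent.
    // Hypothetical helper for illustration; not part of the InnerTube code.
    public static string GetVideoId(string watchUrl)
    {
        Uri uri = new Uri(watchUrl);

        // Uri.Query includes the leading '?', for example "?v=EwTZ2xpQwpA"
        string query = uri.Query.TrimStart('?');

        foreach (string pair in query.Split('&'))
        {
            // split each pair into at most two parts: name and value
            string[] parts = pair.Split(new char[] { '=' }, 2);
            if (parts.Length == 2 && parts[0] == "v")
            {
                return Uri.UnescapeDataString(parts[1]);
            }
        }
        return null;
    }
}
```

Calling VideoIdSketch.GetVideoId("http://www.youtube.com/watch?v=EwTZ2xpQwpA") returns "EwTZ2xpQwpA".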

YouTube Blocks You (by Design)

YouTube has a (hidden) URL that you can use to download a video directly, but you need to pass in two things: a video_id and a session token. A session token is an identifier that YouTube assigns to a browser; it lasts for approximately 15 minutes. If you don't have a valid session token, YouTube will block your request to download a Flash video.

To see this in action, open your browser, go to http://www.youtube.com/get_video?video_id=EwTZ2xpQwpA, and notice how YouTube pretends the URL is a bad URL by sending back an HTTP 404 Not Found error to Internet Explorer browsers (Firefox 3 users will see a blank page), as shown in Figure 4-7. You get a 404 error because you must append a valid session token to the URL for the request to work.

Figure 4-7. YouTube will return a 404 error in Internet Explorer unless you provide a session token

Getting a Session Token from JavaScript

To get a valid session token from YouTube, we will have to open a browser to the YouTube page for "Chocolate Rain" at: http://www.youtube.com/watch?v=EwTZ2xpQwpA and click View Source to view the page's HTML contents. In the HTML, you'll find a JavaScript variable named swfArgs that YouTube uses to pass in a number of parameters to the Flash player as shown in Example 4-19.

Example 4-19. The swfArgs JavaScript variable needed to play a YouTube video

var swfArgs = {"usef": 0, "cust_p": "jMWn75PwKgutJQ0J3mrLbA",
"iv_storage_server": "http://www.google.com/reviews/y/",
"ad_module": "http://s.ytimg.com/yt/swf/ad-vfl59966.swf",
"ad_channel_code": "invideo_overlay_480x70_cat10,afv_overlay",
"video_id": "EwTZ2xpQwpA", "l": 292, "fmt_map": "34/0/9/0/115",
"ad_host": "ca-host-pub-5311789755034317",
"sk": "_N53QD2G0B79IwT2MIi7nNSvpkgWSWWwC", "invideo": true,
"t": "OEgsToPDskJUt8xv3hrKiGOAYlLYcl1L", "hl": "en",
"plid": "AARaIAIC2yUqgjl1AAAA-YT8YQA", "vq": null,
"iv_module": "http://s.ytimg.com/yt/swf/iv_module-vfl57703.swf",
"afv": true, "ad_host_tier": "12789",
"ad_video_pub_id": "ca-pub-6219811747049371",
"tk": "5Eu9v6C5n2lxivxOt0UqRqT0yNnUmUCs3oR7ZzuDg8_JZHd6DyO2jw=="};

Houston, We Have a Token!

Buried deep inside the swfArgs variable is the name/value pair "t": "OEgsToPDskJUt8xv3hrKiGOAYlLYcl1L", which represents a valid session token that can be used to download the video. Now try the same download URL for which YouTube previously returned a 404 error, except this time we'll append the "t" session token value as shown below:

http://www.youtube.com/get_video?video_id=EwTZ2xpQwpA&t=OEgsToPDskJUt8xv3hrKiGOAYlLYcl1L

Note

This exact download URL will not work because, by the time you read this, the session token (the t= part) will have expired. If you want to try this in your browser, you will first have to navigate to a YouTube video page, click View Source in your browser, and manually copy and paste the session token into the download URL. If you try to use a session token after the session has expired, you will receive an HTTP 410 Gone error saying the page does not exist. Session tokens are also unique to a video, so you cannot use the same session token to download a different video.

You should now be prompted to save the 11.3 MB FLV file to your hard drive as shown in Figure 4-8. Let's now see how we can do the same thing using code.

Figure 4-8. Prompt to save a YouTube video to your PC

Getting a Token Programmatically

Now that we've shown how you can do it in a browser, we want to replicate the process of getting a token in code by reading the HTML from the video page, pulling out the swfArgs JavaScript variable, and then calling the download link with a valid session token.

Note

HTML parsing like this is brittle and will break if YouTube changes the way it issues session tokens or renames its JavaScript variables. Given that there is no API to download videos, developers building applications that depend on the HTML internals of a website need to test continually to make sure a change to the site doesn't cause an application to break.

To do this in code, we've created a method in the Download class called CreateTokenRequest. When given an InnerTubeVideo, it makes a WebClient request to download the video page HTML as a string. It then parses out the swfArgs JavaScript variable and retrieves the "t" argument, just like we did manually with the browser.

The code in Example 4-20 and Example 4-21 retrieves the HTML from the web page using the WebClient DownloadString method, passing in the Link (the URL for the YouTube video). Next, it gets the index of the swfArgs variable (see Example 4-19) in the HTML string. Once we find the location of swfArgs, we can retrieve the variable's value by finding the index of the brace characters "{" and "}" that contain the value. Once we get the index of both of those characters, we use the Substring method to retrieve the value between the braces and load that into the variable fullString.

Example 4-20.  C# code to programmatically get a session token

private static string CreateTokenRequest(InnerTubeVideo video)
{
  //YouTube variables
  const string jsVariable = "swfArgs";
  const string argName = "t";

  //get raw html from YouTube video page
  string rawHtml;
  using (WebClient wc = new WebClient())
  {
    rawHtml = wc.DownloadString(video.Link);
  }

  //extract the JavaScript name/value pairs
  int jsIndex = rawHtml.IndexOf(jsVariable);
  int startIndex = rawHtml.IndexOf("{", jsIndex);
  int endIndex = rawHtml.IndexOf("}", startIndex);
  string fullString = rawHtml.Substring(startIndex + 1, endIndex - startIndex - 1);

  //remove all quotes (")
  fullString = fullString.Replace("\"","");

  //split all values
  string[] allArgs = fullString.Split(',');

  //loop through javascript parameters
  foreach (string swfArg in allArgs)
  {
    //match the name exactly so that "tk" is not mistaken for "t"
    var nameValuePair = swfArg.Split(':');
    if (nameValuePair.Length > 1 && nameValuePair[0].Trim() == argName)
    {
      return string.Format("{0}={1}", argName, nameValuePair[1].Trim());
    }
  }
  throw new Exception(string.Format("token not found in swfArgs," +
    " make sure {0} is accessible", video.Link));
}

Example 4-21.  Visual Basic code to programmatically get a session token

Private Shared Function CreateTokenRequest(ByVal video As InnerTubeVideo) As String
  'YouTube variables
  Const jsVariable As String = "swfArgs"
  Const argName As String = "t"

  'get raw html from YouTube video page
  Dim rawHtml As String
  Using wc As New WebClient()
    rawHtml = wc.DownloadString(video.Link)
  End Using

  'extract the JavaScript name/value pairs
  Dim jsIndex As Integer = rawHtml.IndexOf(jsVariable)
  Dim startIndex As Integer = rawHtml.IndexOf("{", jsIndex)
  Dim endIndex As Integer = rawHtml.IndexOf("}", startIndex)
  Dim fullString As String = rawHtml.Substring(startIndex + 1, _
    endIndex - startIndex - 1)

  'remove all quotes (")
  fullString = fullString.Replace("""", "")

  'split all values
  Dim allArgs() As String = fullString.Split(","c)

  'loop through javascript parameters
  For Each swfArg As String In allArgs
    'match the name exactly so that "tk" is not mistaken for "t"
    Dim nameValuePair = swfArg.Split(":"c)
    If nameValuePair.Length > 1 AndAlso nameValuePair(0).Trim() = argName Then
      Return String.Format("{0}={1}", argName, nameValuePair(1).Trim())
    End If
  Next swfArg
  Throw New Exception(String.Format("token not found in swfArgs, " & _
    "make sure {0} is accessible", video.Link))
End Function

Once the fullString variable holds the value of swfArgs, we'll need to do a little bit more work to parse the session token. To do that, we'll remove quotes from the string and split the contents of the variable (each argument is comma delimited). Finally, we'll loop through each argument looking for the "t" argument (the session token). Once found, we return a string in the format "Name=Value", which, for this video, would return the following:

t=OEgsToPDskJUt8xv3hrKiGOAYlLYcl1L
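As an alternative to removing quotes and splitting on commas and colons, a regular expression can pull the "t" value out of the swfArgs text in one step. This is only a sketch under the assumption that the token appears as a quoted "t" name followed by a quoted value, as in Example 4-19; the TokenSketch class and ExtractToken method are hypothetical names and are not part of the InnerTube code. It is just as dependent on YouTube's page layout as the original approach.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenSketch
{
    // Matches "t": "<value>" and captures the value.
    // The quotes around t keep names such as "tk" from matching.
    private static readonly Regex TokenPattern =
        new Regex("\"t\"\\s*:\\s*\"([^\"]+)\"");

    // Returns "t=<token>", or null when no token is present in the text.
    // Hypothetical helper for illustration only.
    public static string ExtractToken(string swfArgsText)
    {
        Match match = TokenPattern.Match(swfArgsText);
        return match.Success ? "t=" + match.Groups[1].Value : null;
    }
}
```

Passing the text of Example 4-19 to TokenSketch.ExtractToken would yield the same Name=Value string shown above.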

Downloading the Video

Now that we can get the session token programmatically, we need to download the file to our hard drive. To do this, we will use the DownloadVideo method of the Download class, which is included in its entirety below in Example 4-22 and Example 4-23.

The first thing our code does is make sure we haven't downloaded the Flash video (FLV) before by checking if the file already exists.

Next, we build the download URL, which, as in our browser example, is a combination of the base download URL, the video ID, and the session token that we get from calling CreateTokenRequest.

We open a request to the YouTube Flash video file on YouTube's server using the download URL we previously built. At the same time, we also create a FileStream on our hard drive that we'll use to save the video file. The code reads up to 64 KB (65,536 bytes) of data from the web stream of the video and writes those bytes directly to the hard drive file stream. The code continues looping, reading up to 64 KB from the Web and writing it to the hard drive, until we reach the end of the file, at which point calling Read on the web stream returns 0 and we exit the loop.

Note

Both the web stream (webStream) and the FileStream are wrapped in using statements, which the compiler expands into try/finally blocks that dispose of these resources. We do this to ensure that even if an exception is thrown, we will still clean up system resources properly.
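The read-until-zero loop described above can be tried in isolation without a network connection. The following sketch is an illustration only; the CopySketch class and CopyStream method are hypothetical names, not part of the Download class. It works against any pair of Stream objects and counts the bytes it copies.

```csharp
using System;
using System.IO;

public static class CopySketch
{
    // Copies source to destination in chunks of up to bufferSize bytes
    // and returns the total number of bytes copied.
    // Hypothetical helper for illustration only.
    public static long CopyStream(Stream source, Stream destination, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;

        // Read returns 0 only when the end of the stream has been reached
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // write only the bytes that were actually read
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

For example, copying a MemoryStream that holds 100,000 bytes into an empty MemoryStream with a bufferSize of 65536 takes two reads and returns 100000.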

Example 4-22.  C# code for the DownloadVideo method

public static void DownloadVideo(InnerTubeVideo source, string destination)
{
  if (!File.Exists(destination))
  {
    UriBuilder final = new UriBuilder(source.DownloadLink);
    final.Query = "video_id=" + source.Id + "&" + CreateTokenRequest(source);

    WebRequest request = WebRequest.Create(final.ToString());
    request.Timeout = 500000;

    try
    {
      WebResponse response = request.GetResponse();

      using (Stream webStream = response.GetResponseStream())
      {
        try
        {
          int _bufferSize = 65536;

          using (FileStream fs = File.Create(destination, _bufferSize))
          {
            int readBytes = -1;
            byte[] inBuffer = new byte[_bufferSize];

            //Loop until we hit the end
            while (readBytes != 0)
            {
              //read data from web into filebuffer, then write to file
              readBytes = webStream.Read(inBuffer, 0, _bufferSize);
              fs.Write(inBuffer, 0, readBytes);
            }
          }
        }
        catch (Exception ex)
        {
          Debug.WriteLine("Error in Buffer Download");
          Debug.Indent();
          Debug.WriteLine(ex.Message);
        }
      }
    }
    catch (Exception ex)
    {
      Debug.WriteLine("Error in request.GetResponse()");
      Debug.Indent();
      Debug.WriteLine(ex.Message);
    }
  }
}

Example 4-23.  Visual Basic code for the DownloadVideo method

Public Shared Sub DownloadVideo(ByVal source As InnerTubeVideo, ByVal destination As String)
  If (Not File.Exists(destination)) Then
    Dim final As New UriBuilder(source.DownloadLink)
    final.Query = "video_id=" & source.Id & "&" & CreateTokenRequest(source)

    Dim request As WebRequest = WebRequest.Create(final.ToString())
    request.Timeout = 500000

    Try
      Dim response As WebResponse = request.GetResponse()

      Using webStream As Stream = response.GetResponseStream()
        Try
          Dim _bufferSize As Integer = 65536

          Using fs As FileStream = File.Create(destination, _bufferSize)
            Dim readBytes As Integer = -1
            Dim inBuffer(_bufferSize - 1) As Byte

            'Loop until we hit the end
            Do While readBytes <> 0
              'read data from web into filebuffer, then write to file
              readBytes = webStream.Read(inBuffer, 0, _bufferSize)
              fs.Write(inBuffer, 0, readBytes)
            Loop
          End Using
        Catch ex As Exception
          Debug.WriteLine("Error in Buffer Download")
          Debug.Indent()
          Debug.WriteLine(ex.Message)
        End Try
      End Using
    Catch ex As Exception
      Debug.WriteLine("Error in request.GetResponse()")
      Debug.Indent()
      Debug.WriteLine(ex.Message)
    End Try
  End If
End Sub

Downloading a Video's Thumbnail Image

In addition to downloading a YouTube video, InnerTube also downloads the large thumbnail image, sized 425 pixels × 344 pixels, to use as the preview image before a video starts playing. The format of the large image thumbnail URL is http://img.youtube.com/vi/VideoID/0.jpg.
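As a small illustration of that URL format, the thumbnail address can be assembled from a video ID as sketched here. The ThumbnailSketch class and GetLargeThumbnailUrl method are hypothetical names; how InnerTube itself populates the ThumbnailLink property is not shown in this section.

```csharp
using System;

public static class ThumbnailSketch
{
    // Builds the large thumbnail URL for a video ID using the
    // http://img.youtube.com/vi/VideoID/0.jpg format described in the text.
    // Hypothetical helper for illustration only.
    public static string GetLargeThumbnailUrl(string videoId)
    {
        return string.Format("http://img.youtube.com/vi/{0}/0.jpg",
            Uri.EscapeDataString(videoId));
    }
}
```

For the "Chocolate Rain" video, ThumbnailSketch.GetLargeThumbnailUrl("EwTZ2xpQwpA") returns http://img.youtube.com/vi/EwTZ2xpQwpA/0.jpg.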

The code to download the image (Example 4-24 and Example 4-25) simply checks whether the file has already been downloaded, and if it hasn't, calls the DownloadFile method, passing in the location of the source file to download and the destination filename.

Example 4-24.  C# code to download a video's thumbnail image

public static void DownloadImage(InnerTubeVideo source, string destination)
{
    //if we haven't downloaded the image yet, download it
    if (!File.Exists(destination))
    {
        using (WebClient wc = new WebClient())
        {
            wc.DownloadFile(new Uri(source.ThumbnailLink), destination);
        }
    }
}

Example 4-25.  Visual Basic code to download a video's thumbnail image

Public Shared Sub DownloadImage(ByVal source As InnerTubeVideo, ByVal destination As String)
  'if we haven't downloaded the image yet, download it
  If (Not File.Exists(destination)) Then
    Using wc As New WebClient()
      wc.DownloadFile(New Uri(source.ThumbnailLink), destination)
    End Using
  End If
End Sub
