One of the core tasks of Node.js is to act as a web server. This is such a key part of the system that when Ryan Dahl started the project, he rewrote the HTTP stack for V8 to make it nonblocking. Although both the API and the internals for the original HTTP implementation have morphed a lot since it was created, the core activities are still the same. The Node implementation of HTTP is nonblocking and fast. Much of the code has moved from C into JavaScript.
HTTP uses a pattern that is common in Node. Pseudoclass factories provide an easy way to create a new server.[7] The http.createServer() method provides us with a new instance of the HTTP Server class, which is the class we use to define the actions taken when Node receives incoming HTTP requests. There are two other main pieces of the HTTP module, and of other Node modules in general: the events the Server class fires and the data structures that are passed to the callbacks. Knowing about these three pieces (the class itself, its events, and the callback data structures) allows you to use the HTTP module well.
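As a small sketch of the factory pattern described above (the variable names are illustrative, not taken from the book's listings), the object returned by http.createServer() can be checked against the classes it derives from:

```javascript
var http = require('http');
var EventEmitter = require('events').EventEmitter;

// The factory hands back a ready-made Server instance; no `new` is needed.
var server = http.createServer();

var isServer = server instanceof http.Server;    // built by the factory
var isEmitter = server instanceof EventEmitter;  // which is why it has on()

console.log(isServer, isEmitter);
```

Because the server is never told to listen, the script exits as soon as it has printed its result.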
Acting as an HTTP server is probably the most common current use case for Node. In Chapter 1, we set up an HTTP server and used it to serve a very simple request. However, HTTP is a lot more multifaceted than that. The server component of the HTTP module provides the raw tools to build complex and comprehensive web servers. In this chapter, we are going to explore the mechanics of dealing with requests and issuing responses. Even if you end up using a higher-level server such as Express, many of the concepts it uses are extensions of those defined here.
As we’ve already seen, the first step in using HTTP servers is to create a new server using the http.createServer() method. This returns a new instance of the Server class, which has only a few methods because most of the functionality is provided through events. The http server class has six events and three methods. The other thing to notice is that most of the methods are used to initialize the server, whereas events are used during its operation.
Let’s start by creating the smallest basic HTTP server code we can in Example 4-7.
This example is not good code. However, it illustrates some important points. We’ll fix the style shortly. The first thing we do is require the http module. Notice how we can chain methods to access the module without first assigning it to a variable. Many things in Node return a function,[8] which allows us to invoke those functions immediately. From the included http module, we call createServer. This doesn’t have to take any arguments, but we pass it a function to attach to the request event. Finally, we tell the server created with createServer to listen on port 8125.
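The chained style this walkthrough describes might look like the following sketch. The wrapper function startTinyServer, its port parameter, and the response text are assumptions added for illustration so that the caller decides when the server starts; the chain inside the wrapper is the part under discussion.

```javascript
// Hypothetical wrapper: lets the caller choose the port and keep the result.
function startTinyServer(port) {
  // require() returns the module, createServer() returns a Server, and
  // listen() returns that same Server, so the calls can be chained.
  return require('http').createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('hello\n'); // placeholder response body
  }).listen(port);
}
```

Calling startTinyServer(8125) would start a server on the port used in the text.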
We hope you never write code like this in real situations, but it does show the flexibility of the syntax and the potential brevity of the language. Let’s be a lot more explicit about our code. The rewrite in Example 4-8 should make it a lot easier to understand and maintain.
This example implements the minimal web server again. However, we’ve started assigning things to named variables. This not only makes the code easier to read than when it’s chained, but also means you can reuse it. For example, it’s not uncommon to use http more than once in a file. You may want both an HTTP server and an HTTP client, so reusing the module object is really helpful. Even though JavaScript doesn’t force you to think about memory, that doesn’t mean you should thoughtlessly litter unnecessary objects everywhere. Rather than use an anonymous callback, we’ve also named the function that handles the request event. This is less about memory usage and more about readability. We’re not saying you shouldn’t use anonymous functions, but if you can lay out your code so it’s easy to find, that helps a lot when maintaining it.
Note
Remember to look at Part I of the book for more help with programming style. Chapters 1 and 2 deal with programming style in particular.
Because we didn’t pass the request event listener as part of the factory method for the http Server object, we need to add an event listener explicitly. Calling the on method from EventEmitter does this. Finally, as with the previous example, we call the listen method with the port we want to listen on. The http module provides other functions, but this example illustrates the most important ones.
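The explicit style described above could be sketched as follows. The response text, the use of port 0 (any free port), and the single self-request that shuts the server down are assumptions added so the sketch runs once and exits; a long-running server would simply call server.listen(8125).

```javascript
var http = require('http');

// A named handler is easier to find later than an anonymous callback.
var handleRequest = function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('hello\n'); // placeholder response body
};

var server = http.createServer();     // nothing passed to the factory
server.on('request', handleRequest);  // so attach the listener explicitly

// Demo scaffolding (assumption): request ourselves once, then close.
server.listen(0, '127.0.0.1', function () {
  var options = {host: '127.0.0.1', port: server.address().port, agent: false};
  http.get(options, function (res) {
    res.setEncoding('utf8');
    res.on('data', function (chunk) { process.stdout.write(chunk); });
    res.on('end', function () { server.close(); });
  });
});
```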
The http server supports a number of events, which are associated with either the TCP or HTTP connection to the client. The connection and close events indicate the buildup or teardown of a TCP connection to a client. It’s important to remember that some clients will be using HTTP 1.1, which supports keepalive. This means that their TCP connections may remain open across multiple HTTP requests.
The request, checkContinue, upgrade, and clientError events are associated with HTTP requests. We’ve already used the request event, which signals a new HTTP request.
The checkContinue event is a special case. It allows you to take more direct control of an HTTP request in which the client streams chunks of data to the server. Before the client sends its data, it can ask the server whether it may continue (by sending an Expect: 100-continue header), at which point this event will fire. If an event handler is created for this event, the request event will not be emitted.
The upgrade event is emitted when a client asks for a protocol upgrade. The http server will deny HTTP upgrade requests unless there is an event handler for this event.
Finally, the clientError event passes on any error events sent by the client.
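To make the checkContinue behavior concrete, here is a minimal sketch in which a client sends Expect: 100-continue and the server answers through a checkContinue handler. The local stand-in server, the payload, and the counter requestEvents are assumptions for illustration; the counter is there only to show that the request event stays silent.

```javascript
var http = require('http');

var requestEvents = 0;
var server = http.createServer();

// Registered only to show that it is not called for this request.
server.on('request', function () { requestEvents++; });

server.on('checkContinue', function (req, res) {
  res.writeContinue();  // tell the client to go ahead and send the body
  var body = '';
  req.setEncoding('utf8');
  req.on('data', function (chunk) { body += chunk; });
  req.on('end', function () {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('received ' + body.length + ' characters\n');
  });
});

server.listen(0, '127.0.0.1', function () {
  var req = http.request({
    host: '127.0.0.1',
    port: server.address().port,
    path: '/',
    method: 'POST',
    agent: false,                         // one-off connection so the demo exits
    headers: {'Expect': '100-continue'}   // this header triggers checkContinue
  }, function (res) {
    res.setEncoding('utf8');
    res.on('data', function (chunk) { process.stdout.write(chunk); });
    res.on('end', function () { server.close(); });
  });

  // Wait for the server's go-ahead before sending the body.
  req.on('continue', function () { req.end('payload'); });
});
```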
The HTTP server can emit a few events. The most common one is request, but you can also get events associated with the TCP connection for the request as well as other parts of the request life cycle.
When a new TCP stream is created for a request, a connection event is emitted. This event passes the TCP stream for the request as a parameter. The stream is also available as a request.connection variable for each request that happens through it. However, only one connection event will be emitted for each stream. This means that many requests can happen from a client with only one connection event.
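The relationship between connection and request events can be observed with a sketch like the one below. It assumes a local stand-in server and a keepalive http.Agent limited to one socket, so that the second request has to travel over the first request's TCP stream; the counters and the helper fetchOnce are illustrative names.

```javascript
var http = require('http');

var connections = 0;
var requests = 0;

var server = http.createServer(function (req, res) {
  requests++;
  res.end('ok\n');
});

// Fires once per TCP stream, however many requests travel over it.
server.on('connection', function (socket) { connections++; });

// Assumption: a keepalive agent capped at one socket forces reuse.
var agent = new http.Agent({keepAlive: true, maxSockets: 1});

function fetchOnce(port, done) {
  var options = {host: '127.0.0.1', port: port, path: '/', agent: agent};
  http.get(options, function (res) {
    res.resume();        // discard the body so that 'end' fires
    res.on('end', done);
  });
}

server.listen(0, '127.0.0.1', function () {
  var port = server.address().port;
  fetchOnce(port, function () {
    fetchOnce(port, function () {
      console.log('connections:', connections, 'requests:', requests);
      agent.destroy();   // drop the keepalive socket so the script can exit
      server.close();
    });
  });
});
```

With the agent configured this way, the two requests are expected to share a single connection event.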
Node is also great when you want to make outgoing HTTP connections. This is useful in many contexts, such as using web services, connecting to document store databases, or just scraping websites. You can use the same http module when doing HTTP requests, but should use the http.ClientRequest class. There are two factory methods for this class: a general-purpose one and a convenience method. Let’s take a look at the general-purpose case in Example 4-9.
The first thing you can see is that an options object defines a lot of the functionality of the request. We must provide the host name (although an IP address is also acceptable), the port, and the path. The method is optional and defaults to a value of GET if none is specified. In essence, the example is specifying that the request should be an HTTP GET request to http://www.google.com/ on port 80.
The next thing we do is use the options object to construct an instance of http.ClientRequest using the factory method http.request(). This method takes an options object and an optional callback argument. The passed callback listens to the response event, and when a response event is received, we can process the results of the request. In the previous example, we simply output the response object to the console. However, it’s important to notice that the body of the HTTP response is actually received via a stream in the response object. Thus, you can subscribe to the data event of the response object to get the data as it becomes available (see the section Readable streams for more information).
The final important point to notice is that we had to end() the request. Because this was a GET request, we didn’t write any data to the server, but for other HTTP methods, such as PUT or POST, you may need to. Until we call the end() method, request won’t initiate the HTTP request, because it doesn’t know whether it should still be waiting for us to send data.
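The general-purpose flow just described (options object, http.request(), response callback, data events, and a closing end()) can be sketched as follows. To keep the sketch runnable without network access, it talks to a local stand-in server rather than www.google.com; that server, the agent: false option, and the variable seenStatus are assumptions.

```javascript
var http = require('http');

// Stand-in server (an assumption) so the sketch runs offline.
var server = http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end(req.method + ' ' + req.url + '\n');
});

var seenStatus = null;

server.listen(0, '127.0.0.1', function () {
  var options = {
    host: '127.0.0.1',            // a hostname or an IP address
    port: server.address().port,
    path: '/',
    method: 'GET',                // optional; GET is the default
    agent: false                  // one-off connection so the demo exits
  };

  var req = http.request(options, function (res) {
    seenStatus = res.statusCode;
    console.log('status:', res.statusCode);
    console.log('content-type:', res.headers['content-type']);
    res.setEncoding('utf8');
    // The body arrives as a stream of data events on the response.
    res.on('data', function (chunk) { console.log('body chunk:', chunk); });
    res.on('end', function () { server.close(); });
  });

  req.end(); // nothing to write for a GET, but the request must be ended
});
```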
Since GET
is such a common HTTP use case, there is a special factory method to support it in
a more convenient way, as shown in Example 4-10.
This example of http.get() does exactly the same thing as the previous example, but it’s slightly more concise. We’ve lost the method attribute of the options object, and left out the call to request.end() because it’s implied.
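A sketch of the convenience form, again against a local stand-in server (an assumption, as are the variable names), shows the two differences: no method in the options and no explicit end().

```javascript
var http = require('http');

// Stand-in server (an assumption) that reports which method it saw.
var server = http.createServer(function (req, res) {
  res.end('method was ' + req.method + '\n');
});

var received = '';

server.listen(0, '127.0.0.1', function () {
  var options = {
    host: '127.0.0.1',
    port: server.address().port,
    path: '/',
    agent: false   // one-off connection so the demo exits
  };

  http.get(options, function (res) {
    res.setEncoding('utf8');
    res.on('data', function (chunk) { received += chunk; });
    res.on('end', function () {
      process.stdout.write(received);
      server.close();
    });
  });
  // No req.end() here: http.get() ends the request for us.
});
```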
If you run the previous two examples, you are going to get back raw Buffer objects. As described later in this chapter, a Buffer is a special class defined in Node to support the storage of arbitrary, binary data. Although it’s certainly possible to work with these, you often want a specific encoding, such as UTF-8 (an encoding for Unicode characters). You can specify this with the response.setEncoding() method (see Example 4-11).
Example 4-11. Comparing raw Buffer output to output with a specified encoding
> var http = require('http');
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) {
... console.log(res);
... res.on('data', function(c) { console.log(c); });
... });
>
<Buffer 3c 21 64 6f 63 74 79 70 ... 65 2e 73 74>
<Buffer 61 72 74 54 69 ... 69 70 74 3e>
>
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) {
... res.setEncoding('utf8');
... res.on('data', function(c) { console.log(c); });
... });
>
<!doctype html><html><head><meta http-equiv="content-type ...
load.t.prt=(f=(new Date).getTime());
})();
</script>
>
In the first case, we do not call ClientResponse.setEncoding(), and we get chunks of data in Buffers. Although the output is abridged in the printout, you can see that it isn’t just a single Buffer, but that several Buffers have been returned with data. In the second example, the data is returned as UTF-8 because we specified res.setEncoding('utf8'). The chunks of data returned from the server are still the same, but are given to the program as strings in the correct encoding rather than as raw Buffers. Although the printout may not make this clear, there is one string for each of the original Buffers.
Not all HTTP is GET. You might also need to call POST, PUT, and other HTTP methods that alter data on the other end. This is functionally the same as making a GET request, except you are going to write some data upstream, as shown in Example 4-12.
This example is very similar to Example 4-10, but uses the http.ClientRequest.write() method. This method allows you to send data upstream, and as explained earlier, it requires you to explicitly call http.ClientRequest.end() to indicate you’re finished sending data. Whenever ClientRequest.write() is called, the data is sent upstream (it isn’t buffered), but the server will not respond until ClientRequest.end() is called.
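The write-then-end sequence described above might be sketched like this. The echoing stand-in server, the strings written, and the variable reply are assumptions for illustration.

```javascript
var http = require('http');

// Stand-in server (an assumption): echoes back what was uploaded.
var server = http.createServer(function (req, res) {
  var body = '';
  req.setEncoding('utf8');
  req.on('data', function (chunk) { body += chunk; });
  req.on('end', function () { res.end(req.method + ' body: ' + body); });
});

var reply = '';

server.listen(0, '127.0.0.1', function () {
  var options = {
    host: '127.0.0.1',
    port: server.address().port,
    path: '/',
    method: 'POST',
    agent: false   // one-off connection so the demo exits
  };

  var req = http.request(options, function (res) {
    res.setEncoding('utf8');
    res.on('data', function (chunk) { reply += chunk; });
    res.on('end', function () {
      console.log(reply);
      server.close();
    });
  });

  req.write('my data');          // sent upstream
  req.write('more of my data');  // any number of writes is allowed
  req.end();                     // the stand-in server replies only after this
});
```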
You can stream data to a server using ClientRequest.write() by coupling the writes to the data event of a Stream. This is ideal if you need to, for example, send a file from disk to a remote server over HTTP.
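Coupling writes to a stream's data event could look like the sketch below. An in-memory Readable stands in for a file stream such as one returned by fs.createReadStream(); that substitution, the local server, the /upload path, and the byte counter are all assumptions. The sketch also ignores the return value of write(), so it does no flow control.

```javascript
var http = require('http');
var Readable = require('stream').Readable;

var uploadedBytes = 0;

// Stand-in server (an assumption) that counts the bytes it receives.
var server = http.createServer(function (req, res) {
  req.on('data', function (chunk) { uploadedBytes += chunk.length; });
  req.on('end', function () { res.end('got ' + uploadedBytes + ' bytes\n'); });
});

server.listen(0, '127.0.0.1', function () {
  var req = http.request({
    host: '127.0.0.1',
    port: server.address().port,
    path: '/upload',   // hypothetical path
    method: 'PUT',
    agent: false       // one-off connection so the demo exits
  }, function (res) {
    res.setEncoding('utf8');
    res.on('data', function (chunk) { process.stdout.write(chunk); });
    res.on('end', function () { server.close(); });
  });

  // In-memory stand-in for a file stream read from disk.
  var source = Readable.from(['first chunk\n', 'second chunk\n']);
  source.on('data', function (chunk) { req.write(chunk); }); // forward each chunk
  source.on('end', function () { req.end(); });              // then finish the request
});
```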
The ClientResponse object stores a variety of information about the response. In general, it is pretty intuitive. Some of its obvious properties that are often useful include statusCode (which contains the HTTP status) and headers (which is the response header object). Also hung off of ClientResponse are various streams and properties that you may or may not want to interact with directly.
The URL module provides tools for easily parsing and dealing with URL strings. It’s extremely useful when you have to deal with URLs. The module offers three methods: parse, format, and resolve. Let’s start by looking at Example 4-13, which demonstrates parse using the Node REPL.
Example 4-13. Parsing a URL using the URL module
> var URL = require('url');
> var myUrl = "http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash";
> myUrl
'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
> parsedUrl = URL.parse(myUrl);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query: 'with=query&param=that&are=awesome'
, pathname: '/some/url/'
}
> parsedUrl = URL.parse(myUrl, true);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query:
  { with: 'query'
  , param: 'that'
  , are: 'awesome'
  }
, pathname: '/some/url/'
}
>
The first thing we do, of course, is require the URL module. Note that the names of modules are always lowercase. We’ve created a url as a string containing all the parts that will be parsed out. Parsing is really easy: we just call the parse method from the URL module on the string. It returns a data structure representing the parts of the parsed URL. The components it produces are:
The href is the full URL that was originally passed to parse. The protocol is the protocol used in the URL, including the trailing colon (e.g., http:, https:, ftp:, etc.). host is the fully qualified hostname of the URL. This could be as simple as the hostname for a local server, such as printserver, or a fully qualified domain name such as www.google.com. It might also include a port number, such as 8080, or username and password credentials like un:pw@ftpserver.com. The various parts of the hostname are broken down further into auth, containing just the user credentials; port, containing just the port; and hostname, containing the hostname portion of the URL. An important thing to know about hostname is that it is still the full hostname, including the top-level domain (TLD; e.g., .com, .net, etc.) and the specific server. If the URL were http://sports.yahoo.com/nhl, hostname would not give you just the domain (yahoo.com) or just the host (sports), but the entire hostname (sports.yahoo.com). The URL module doesn’t have the capability to split the hostname down into its components, such as domain or TLD.
The next set of components of the URL relates to everything after the host. The pathname is the entire filepath after the host. In http://sports.yahoo.com/nhl, it is /nhl. The next component is the search component, which stores the HTTP GET parameters in the URL. For example, if the URL were http://mydomain.com/?foo=bar&baz=qux, the search component would be ?foo=bar&baz=qux. Note the inclusion of the ?. The query parameter is similar to the search component. It contains one of two things, depending on how parse was called.
parse takes two arguments: the url string and an optional Boolean that determines whether the query string should be parsed using the querystring module, discussed in the next section. If the second argument is false, query will just contain a string similar to that of search but without the leading ?. If it is true, query will contain an object whose properties are the parsed parameters, as in the second call in Example 4-13. If you don’t pass anything for the second argument, it defaults to false.
The final component is the fragment portion of the URL. This is the part of the URL after the #. Commonly, this is used to refer to named anchors in HTML pages. For instance, http://abook.com/#chapter2 might refer to the second chapter on a web page hosting a whole book. The hash component in this case would contain #chapter2. Again, note the included # in the string. Some sites, such as http://twitter.com, use more complex fragments for AJAX applications, but the same rules apply. So the URL for the Twitter mentions account, http://twitter.com/#!/mentions, would have a pathname of / but a hash of #!/mentions.
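The components described in this section can be inspected with a short sketch. The URL below is a hypothetical one assembled from the fragments used in the text. The values in the comments reflect how parse behaves in current Node releases, where credentials are reported in auth and host holds the hostname plus port.

```javascript
var url = require('url');

// Hypothetical URL combining credentials, port, path, query, and fragment.
var parsed = url.parse('http://un:pw@ftpserver.com:8080/nhl?foo=bar&baz=qux#chapter2');

console.log(parsed.protocol);  // 'http:'
console.log(parsed.auth);      // 'un:pw'
console.log(parsed.host);      // 'ftpserver.com:8080'
console.log(parsed.port);      // '8080' (a string)
console.log(parsed.hostname);  // 'ftpserver.com'
console.log(parsed.pathname);  // '/nhl'
console.log(parsed.search);    // '?foo=bar&baz=qux'
console.log(parsed.query);     // 'foo=bar&baz=qux'
console.log(parsed.hash);      // '#chapter2'

// With true as the second argument, query becomes an object.
var withQuery = url.parse('http://mydomain.com/?foo=bar&baz=qux', true);
console.log(withQuery.query.foo, withQuery.query.baz);  // bar qux

// A fragment-only "path" leaves pathname as '/'.
var fragmentOnly = url.parse('http://twitter.com/#!/mentions');
console.log(fragmentOnly.pathname, fragmentOnly.hash);  // / #!/mentions
```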
The querystring module is a very simple helper module to deal with query strings. As discussed in the previous section, query strings are the parameters encoded at the end of a URL. However, when reported back as just a JavaScript string, the parameters are fiddly to deal with. The querystring module provides an easy way to create objects from the query strings. The main methods it offers are parse and decode, but some internal helper functions, such as escape, unescape, unescapeBuffer, encode, and stringify, are also exposed. If you have a query string, you can use parse to turn it into an object, as shown in Example 4-14.
Here, the module’s parse function turns the query string into an object in which the properties are the keys and the values correspond to the ones in the query string. You should notice a few things, though. First, the numbers are returned as strings, not numbers. Because JavaScript is loosely typed and will coerce a string into a number in a numerical operation, this works pretty well. However, it’s worth bearing in mind for those times when that coercion doesn’t work.
Additionally, it’s important to note that you must pass the query string without the leading ? that demarks it in the URL. A typical URL might look like http://www.bobsdiscount.com/?item=304&location=san+francisco. The query string starts with a ? to indicate where the filepath ends, but if you include the ? in the string you pass to parse, the first key will start with a ?, which is almost certainly not what you want.
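Both caveats (string-typed values and the stray ?) can be seen in a short sketch that reuses the query string from the URL above; the variable names are illustrative.

```javascript
var querystring = require('querystring');

var parsed = querystring.parse('item=304&location=san+francisco');
console.log(parsed.item);             // '304' (a string, not a number)
console.log(parsed.location);         // 'san francisco'
console.log(parsed.item * 2);         // 608: coerced in a numerical operation
console.log(parsed.item + 1);         // '3041': + concatenates instead
console.log(Number(parsed.item) + 1); // 305 after explicit conversion

// Leaving the ? in place pollutes the first key.
var wrong = querystring.parse('?item=304&location=san+francisco');
console.log(Object.keys(wrong));      // [ '?item', 'location' ]
```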
This library is really useful in a bunch of contexts because query strings are used in situations other than URLs. When you get content from an HTTP POST that is form-encoded (application/x-www-form-urlencoded), it will also be in query string form. All the browser manufacturers have standardized around this approach. By default, forms in HTML will send data to the server in this way also.
The querystring module is also used as a helper module to the URL module. Specifically, when decoding URLs, you can ask URL to turn the query string into an object for you rather than just a string. That’s described in more detail in the previous section, but the parsing that is done uses the parse method from querystring.
Another important part of querystring is encode (Example 4-15). This function takes a query string’s key-value pair object and stringifies it. This is really useful when you’re working with HTTP requests, especially POST data. It makes it easy to work with a JavaScript object until you need to send the data over the wire and then simply encode it at that point. Any JavaScript object can be used, but ideally you should use an object that has only the data that you want in it because the encode method will add all properties of the object. However, if the property value isn’t a string, Boolean, or number, it won’t be serialized and the key will just be included with an empty value.
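A minimal sketch of encode, using a hypothetical data object, shows both the normal case and the empty value produced for a nested object:

```javascript
var querystring = require('querystring');

// Hypothetical data; details is not a string, Boolean, or number.
var data = {
  item: 304,
  location: 'san francisco',
  inStock: true,
  details: {color: 'red'}
};

var encoded = querystring.encode(data);
console.log(encoded);
// item=304&location=san%20francisco&inStock=true&details=
```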
[7] When we talk about a pseudoclass, we are referring to the definition found in Douglas Crockford’s JavaScript: The Good Parts (O’Reilly). From now on, we will use “class” to refer to a “pseudoclass.”
[8] This works in JavaScript because it supports first-class functions.