One of the challenges of writing a book is explaining things in the simplest way possible, which runs counter to showing the kind of robust, production-ready code you'd actually want to deploy. Although we should always strive for the simplest, most understandable code, sometimes you need to trade a little simplicity for robustness or speed. This section provides guidance on hardening the applications you deploy, which you can take with you as you explore the upcoming chapters. It's about writing mature code that will keep your application running long into the future. The list isn't exhaustive, but writing robust code up front will spare you many maintenance headaches later. One of the trade-offs of Node's single-threaded approach is a tendency toward brittleness, and these techniques help mitigate that risk.
Deploying a production application is not the same as running test programs on your laptop. Servers can have a wide variety of resource constraints, but they tend to have a lot more resources than the typical machine you would develop on. Typically, frontend servers have many more cores (CPUs) than laptop or desktop machines, but less hard drive space. They also have a lot of RAM. Node currently has some constraints, such as a maximum JavaScript heap size. This affects the way you deploy because you want to maximize the use of the CPUs and memory on the machine while using Node’s easy-to-program single-threaded approach.
As we saw earlier in this chapter, Node splits I/O activities off from the main flow of your program, and error handling is affected by that split. JavaScript includes try/catch functionality, but it can catch only errors that happen inline. When you do nonblocking I/O in Node, you pass a callback to the function, and that callback runs when the I/O event fires, long after the surrounding try/catch block has exited. We need error handling that works in asynchronous situations. Consider the code in Example 3-9.
Example 3-9. Trying to catch an error in a callback and failing
var http = require('http')

var opts = {
  host: 'sfnsdkfjdsnk.com',
  port: 80,
  path: '/'
}

try {
  http.get(opts, function(res) {
    console.log('Will this get called?')
  })
} catch (e) {
  console.log('Will we catch an error?')
}
When you call http.get(), what is actually happening? We pass some parameters specifying the I/O we want to happen, along with a callback function. When the I/O completes, the callback will be fired. However, the http.get() call itself succeeds as soon as it has issued the request and registered the callback. An error that occurs later, during the GET itself, cannot be caught by a try/catch block.
The disconnect from I/O errors is even more obvious in the Node REPL. Because the REPL shell prints out any return values that are not assigned, we can see that the return value of calling http.get() is the http.ClientRequest object that is created.
This means that the try/catch did its job by making sure the specified
code returned without errors. However, because the hostname is nonsense,
a problem will occur within this I/O request. This means the callback
can’t be completed successfully. A try/catch can’t help with this,
because the error has happened outside the JavaScript, and when Node is
ready to report it, we are not in that call stack any more. We’ve moved
on to dealing with another event.
We deal with this in Node by using the error event. This is a special event that is fired when an error occurs, giving a module engaged in I/O a way to report the failure through a channel other than the callback it would fire on success. The error event lets us handle errors that occur in any of the callbacks in any of the modules we use. Let's write the previous example correctly, as shown in Example 3-10.
Example 3-10. Catching an I/O error with the error event
var http = require('http')

var opts = {
  host: 'dskjvnfskcsjsdkcds.net',
  port: 80,
  path: '/'
}

var req = http.get(opts, function(res) {
  console.log('This will never get called')
})

req.on('error', function(e) {
  console.log('Got that pesky error trapped')
})
By using the error event, we got to deal with the error (in this case by ignoring it). However, our program survived, which is the main thing. Like try/catch in JavaScript, the error event catches all kinds of exceptions. A good general approach to exception handling is to set up conditionals to check for known error conditions and deal with them if possible. Otherwise, catching any remaining errors, logging them, and keeping your server running is probably the best approach.
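For instance, here is a minimal sketch of that pattern, reusing the nonsense hostname from Example 3-10. The ENOTFOUND check assumes a failed DNS lookup is the error we expect; anything else gets logged and the process carries on:

var http = require('http')

var req = http.get({ host: 'dskjvnfskcsjsdkcds.net', port: 80, path: '/' }, function(res) {
  console.log('This will never get called')
})

req.on('error', function(e) {
  if (e.code === 'ENOTFOUND') {
    // A known condition: the DNS lookup failed
    console.log('Host not found, skipping this request')
  } else {
    // Anything unexpected: log it and keep the server running
    console.log('Unexpected error: ' + e.message)
  }
})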
As we've mentioned, Node is single-threaded, which means Node uses only one processor to do its work. Most servers, however, have multicore processors, and a single multicore chip contains several processors. A server with two physical CPU sockets might expose 24 logical cores to the operating system, that is, 24 processors Node could potentially use. To make the best use of Node, we should use those too. So if we don't have threads, how do we do that?
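If you are curious how many logical cores Node can see on a particular machine, the os module will tell you. A quick check:

var os = require('os')

// os.cpus() returns one entry per logical core the OS exposes
console.log('This machine exposes ' + os.cpus().length + ' logical cores')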
Node provides a module called cluster that allows you to delegate work to child processes. This means that Node creates a copy of its current program in another process (on Windows, it is actually another thread). Each child process has some special abilities, such as the ability to share a socket with other children. This allows us to write Node programs that start many other Node programs and then delegate work to them.
It is important to understand that when you use cluster to share work between a number of copies of a Node program, the master process isn't involved in every transaction. The master process manages the child processes, but when the children interact with I/O they do it directly, not through the master. This means that if you set up a web server using cluster, requests don't go through your master process, but directly to the children. Hence, dispatching requests does not create a bottleneck in the system.
By using the cluster API, you can distribute work to a Node process on every available core of your server, making the best use of the machine's resources. Let's look at a simple cluster script in Example 3-11.
Example 3-11. Using cluster to distribute work
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('death', function(worker) {
    console.log('worker ' + worker.pid + ' died');
  });
} else {
  // Worker processes have a http server.
  http.Server(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
  }).listen(8000);
}
In this example, we use a few parts of Node core to evenly distribute the work across all of the CPUs available: the cluster module, the http module, and the os module. From the last of these, we simply get the number of CPUs on the system.
The way cluster works is that each Node process becomes either a "master" or a "worker" process. When a master process calls the cluster.fork() method, it creates a child process that is identical to the master, except for two attributes that each process can check to see whether it is a master or a child. In the master process, the one in which the script has been directly invoked with Node, cluster.isMaster returns true and cluster.isWorker returns false. In the child, cluster.isMaster returns false and cluster.isWorker returns true.
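Here is a minimal sketch that prints both flags from each side of the fork:

var cluster = require('cluster')

if (cluster.isMaster) {
  console.log('master: isMaster=' + cluster.isMaster + ', isWorker=' + cluster.isWorker)
  cluster.fork()
} else {
  console.log('worker: isMaster=' + cluster.isMaster + ', isWorker=' + cluster.isWorker)
  process.exit(0)
}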
The example shows a master script that invokes a worker for each CPU. Each child starts an HTTP server, which brings up another unique aspect of cluster: when you listen() to a socket where cluster is in use, many processes can listen to the same socket. If you simply started several Node processes with node myscript.js, this wouldn't be possible, because the second process to start would throw the EADDRINUSE exception.
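You can see this for yourself with a plain HTTP server that traps the exception instead of crashing. Start two copies of this sketch and the second will log the conflict, because the server emits an error event with the code EADDRINUSE when the port is already taken:

var http = require('http')

var server = http.Server(function(req, res) {
  res.writeHead(200)
  res.end('hello\n')
})

// Without cluster, the second process to bind the port fails;
// handling the error event keeps it alive instead of crashing
server.on('error', function(e) {
  if (e.code === 'EADDRINUSE') {
    console.log('Port 8000 is already taken by another process')
  }
})

server.listen(8000)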
cluster provides a cross-platform way to invoke several processes that share a socket. And even when the children all share a connection to a port, if one of them is jammed, it doesn't stop the other workers from getting connections.
We can do more with cluster than simply share sockets, because it is based on the child_process module. This gives us a number of attributes, and some of the most useful ones relate to the health of the child processes. In the previous example, when a child dies, the master process uses console.log() to print out a death notification. However, a more useful script would cluster.fork() a new child, as shown in Example 3-12.
Example 3-12. Forking a new worker when a death occurs
if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('death', function(worker) {
    console.log('worker ' + worker.pid + ' died');
    cluster.fork();
  });
}
This simple change means that our master process can keep restarting dying workers, keeping our server firing on all CPUs. However, this is just a basic check for running processes. We can also do fancier things. Because workers can pass messages to the master, we can have each worker report some stats, such as memory usage, to the master. This allows the master to determine when workers are becoming unruly, or to confirm that workers are not frozen or stuck in long-running events (see Example 3-13).
Example 3-13. Monitoring worker health using message passing
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;
var rssWarn = (12 * 1024 * 1024)
  , heapWarn = (10 * 1024 * 1024)

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    var worker = cluster.fork();
    worker.on('message', function(m) {
      if (m.memory) {
        if (m.memory.rss > rssWarn) {
          console.log('Worker ' + m.process + ' using too much memory.')
        }
      }
    })
  }
} else {
  // Server
  http.Server(function(req, res) {
    res.writeHead(200);
    res.end('hello world\n')
  }).listen(8000)
  // Report stats once a second
  setInterval(function report() {
    process.send({memory: process.memoryUsage(), process: process.pid});
  }, 1000)
}
In this example, workers report on their memory usage, and the master sends an alert to the log when a process uses too much memory. This replicates the functionality of many health reporting systems that operations teams already use. It gives control to the master Node process, however, which has some benefits. This message-passing interface allows the master process to send messages back to the workers too. This means you can treat a master process as a lightly loaded admin interface to your workers.
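Here is a sketch of a message going the other way, from the master down to a worker. Note that the 'logLevel' command name is our own invention for illustration, not part of the cluster API:

var cluster = require('cluster')

if (cluster.isMaster) {
  var worker = cluster.fork()
  // Push a setting down to the worker over the same channel
  // the worker uses to report its stats
  worker.send({cmd: 'logLevel', level: 'debug'})
} else {
  process.on('message', function(m) {
    if (m.cmd === 'logLevel') {
      console.log('Worker ' + process.pid + ' now logging at level ' + m.level)
    }
  })
}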
There are other things we can do with message passing that we can't do from outside of Node. Because Node relies on an event loop to do its work, there is the danger that the callback of an event in the loop could run for a long time. This means that other users of the process won't get their requests met until that long-running event's callback has concluded. The master process has a connection to each worker, so we can tell it to expect an "all OK" notification periodically. This lets us validate that the event loop is turning over at an appropriate rate and hasn't become stuck on one callback. Sadly, identifying a long-running callback doesn't give us a way to interrupt it: any notification we could send to the process gets added to the event queue, so it would have to wait for the long-running callback to finish. Consequently, although using the master process allows us to identify zombie workers, our only remedy is to kill the worker and lose all the tasks it was doing.
Some preparation can give you the capability to kill an individual worker that threatens to take over its processor; see Example 3-14.
Example 3-14. Killing zombie workers
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;
var rssWarn = (50 * 1024 * 1024)
  , heapWarn = (50 * 1024 * 1024)

var workers = {}

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    createWorker()
  }
  setInterval(function() {
    var time = new Date().getTime()
    for (var pid in workers) {
      if (workers.hasOwnProperty(pid) &&
          workers[pid].lastCb + 5000 < time) {
        console.log('Long running worker ' + pid + ' killed')
        workers[pid].worker.kill()
        delete workers[pid]
        createWorker()
      }
    }
  }, 1000)
} else {
  // Server
  http.Server(function(req, res) {
    // mess up 1 in 200 reqs
    if (Math.floor(Math.random() * 200) === 4) {
      console.log('Stopped ' + process.pid + ' from ever finishing')
      while (true) { continue }
    }
    res.writeHead(200);
    res.end('hello world from ' + process.pid + '\n')
  }).listen(8000)
  // Report stats once a second
  setInterval(function report() {
    process.send({cmd: "reportMem", memory: process.memoryUsage(), process: process.pid})
  }, 1000)
}

function createWorker() {
  var worker = cluster.fork()
  console.log('Created worker: ' + worker.pid)
  // allow boot time
  workers[worker.pid] = {worker: worker, lastCb: new Date().getTime() - 1000}
  worker.on('message', function(m) {
    if (m.cmd === "reportMem") {
      workers[m.process].lastCb = new Date().getTime()
      if (m.memory.rss > rssWarn) {
        console.log('Worker ' + m.process + ' using too much memory.')
      }
    }
  })
}
In this script, we've added an interval to the master as well as the workers. Now whenever a worker sends a report to the master process, the master stores the time of the report. Every second or so, the master process looks at all its workers to check whether any of them haven't responded in longer than 5 seconds (using > 5000 because timeouts are in milliseconds). If that is the case, it kills the stuck worker and restarts it. To make this process effective, we moved the creation of workers into a small function. This allows us to do the various pieces of setup in a single place, regardless of whether we are creating a new worker or restarting a dead one.
We also made a small change to the HTTP server in order to give each request a 1 in 200 chance of failing, so you can run the script and see what it’s like to get failures. If you do a bunch of parallel requests from several sources, you’ll see the way this works. These are all entirely separate Node programs that interact via message passing, which means that no matter what happens, the master process can check on the other processes because the master is a small program that won’t get jammed.
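One quick way to generate that kind of parallel traffic is a small driver script in another Node process (a hypothetical helper, not part of the example itself). Each response includes the pid of the worker that served it, so you can watch both the distribution of work and the occasional stalled worker being replaced:

var http = require('http')

// Fire off 50 requests in parallel against the cluster from Example 3-14
for (var i = 0; i < 50; i++) {
  http.get({ host: 'localhost', port: 8000, path: '/' }, function(res) {
    res.on('data', function(chunk) {
      process.stdout.write(chunk)
    })
  }).on('error', function(e) {
    console.log('Request failed: ' + e.message)
  })
}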