Building Scalable Web Sites

Chapter 1. Introduction

Before we dive into any design or coding work, we need to step back and define our terms. What is it we’re trying to do and how does it differ from what we’ve done before? If you’ve already built some web applications, you’re welcome to skip ahead to the next chapter (where we’ll start to get a bit nerdier), but if you’re interested in getting some general context then keep on reading.

What Is a Web Application?

If you’re reading this book, you probably have a good idea of what a web application is, but it’s worth defining our terms because the label has been routinely misapplied. A web application is neither a web site nor an application in the usual desktop-ian sense. A web application sits somewhere between the two, with elements of both.

While a web site contains pages of data, a web application is comprised of data with a separate delivery mechanism. While web accessibility enthusiasts get excited about the separation of markup and style with CSS, web application designers get excited about real data separation: the data in a web application doesn’t have to have anything to do with markup (although it can contain markup). We store the messages that comprise the discussion component of a web application separately from the markup. When the time comes to display data to the user, we extract the messages from our data store (typically a database) and deliver the data to the user in some format over some medium (typically HTML over HTTP). The important part is that we don’t have to deliver the data using HTML; we could just as easily deliver it as a PDF by email.
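
As a rough illustration of this separation, the sketch below keeps a handful of discussion messages as plain data and delivers them in two formats. Everything in it is an assumption made for the example: the `Message` class, the in-memory `MESSAGES` list standing in for a database, and the two render functions. Plain text stands in for the PDF-by-email case to keep the example short.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Message:
    author: str
    body: str


# Hypothetical stand-in for the data store; a real application would
# typically read these rows from a database.
MESSAGES = [
    Message("alice", "Has anyone tried the new build?"),
    Message("bob", "Yes <works> fine for me."),
]


def render_html(messages):
    """Deliver the messages as an HTML fragment (typically sent over HTTP)."""
    items = "".join(
        f"<li><b>{escape(m.author)}</b>: {escape(m.body)}</li>" for m in messages
    )
    return f"<ul>{items}</ul>"


def render_plain_text(messages):
    """Deliver the same messages as plain text (for example, an email body)."""
    return "\n".join(f"{m.author}: {m.body}" for m in messages)
```

The stored messages contain no markup of their own; each delivery function decides how to escape and wrap them for its medium.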

Web applications don’t have pages in the same way web sites do. While a web application may appear to have 10 pages, adding more data to the data store increases the page count without our having to add further markup or source code to our application. With a feature such as search, which is driven by user input, a web application can have a near infinite number of “pages,” but we don’t have to enter each of these as a blob of HTML. A small set of templates and logic allows us to generate pages on the fly based on input parameters such as URL or POST data.
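
The search case might look something like the following minimal sketch, in which one template and one function produce a distinct “page” for every possible query. The `POSTS` list, the `q` parameter name, and the `search_page` function are hypothetical, chosen only for the example.

```python
from html import escape
from string import Template

# One template serves every search "page"; no per-page HTML is stored.
SEARCH_PAGE = Template("<h1>Results for $query</h1><ul>$items</ul>")

# Hypothetical data store: adding entries here adds pages without new markup.
POSTS = ["Scaling MySQL", "Scaling memcached", "Caching templates"]


def search_page(params):
    """Build a page on the fly from request parameters (a parsed query string)."""
    query = params.get("q", "")
    # Case-insensitive substring match against the stored titles.
    matches = [p for p in POSTS if query.lower() in p.lower()]
    items = "".join(f"<li>{escape(p)}</li>" for p in matches)
    return SEARCH_PAGE.substitute(query=escape(query), items=items)
```

Calling `search_page({"q": "scaling"})` and `search_page({"q": "caching"})` yields two different pages from the same template and logic.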

To the average user, a web application can be indistinguishable from a web site. For a simple weblog, we can’t tell by looking at the outputted markup whether the pages are being generated on the fly from a data store or written as static HTML documents. The file extension can give us a clue, but can be faked for good reason in either direction. A web application tends to appear to be an application only to those users who edit the application’s data. This is often, although not always, accomplished via an HTML interface, but could just as easily be achieved using a desktop application that edits the data store directly or remotely.

With the advent of Ajax (Asynchronous JavaScript and XML, previously known as remote scripting or “remoting”), the interaction model for web applications has been extended. In the past, users interacted with web applications using a page-based model. A user would request a page from the server, submit his changes using an HTTP POST, and be presented with a new page, either confirming the changes or showing the modified data. With Ajax, we can send our data modifications in the background without changing the page the user is on, bringing us closer to the desktop application interaction model.
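
Seen from the server, the difference between the two models can be sketched roughly as below. This is an illustration under stated assumptions rather than a description of any particular framework: the `TITLES` store and both handler functions are hypothetical, and the background response uses JSON purely for brevity, although the “X” in Ajax refers to XML.

```python
import json

# Hypothetical in-memory data store keyed by photo ID.
TITLES = {1: "Untitled"}


def render_page(photo_id):
    """Render the whole page for one photo."""
    return f"<html><body><h1>{TITLES[photo_id]}</h1></body></html>"


def handle_post_page_model(photo_id, new_title):
    """Page-based model: apply the change, then respond with a complete new page."""
    TITLES[photo_id] = new_title
    return render_page(photo_id)


def handle_post_background(photo_id, new_title):
    """Ajax-style model: apply the change and respond with a small data payload.

    Script running in the browser updates the page already on screen,
    so the server does not send a new page.
    """
    TITLES[photo_id] = new_title
    return json.dumps({"ok": True, "title": new_title})
```

Both handlers make the same modification to the data; only the shape of the response differs.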

The nature of web applications is slowly changing. It can’t be denied that we’ve already come a long way from the first interactive applications on the Web, but there’s still a fair way to go. With applications like Google’s Gmail and Microsoft’s Office Live, the web application market is moving toward applications delivered over the Web with the features and benefits of desktop applications combined with the benefits of web applications. While desktop applications give us rich interactivity and speed, web applications can offer zero-effort upgrades, truly portable data, and reduced client requirements. Whatever the model of interaction, one thing remains constant: web applications are systems with a core data set that can be accessed and modified using web pages, with the possibility of other interfaces.

How Do You Build Web Applications?

To build a web application, we need to create at least two major components: a hardware platform and a software platform. For small, simple applications, a hardware platform may comprise a single shared server running a web server and a database. At small scales we don’t need to think about hardware as a component of our applications, but as we start to scale out, it becomes a more and more important part of the overall design. In this book we’ll look extensively at both sides of application design and engineering, how they affect each other, and how we can tie the two together to create an effective architecture.

Developers who have worked at the small scale might be asking themselves why we need to bother with “platform design” when we could just use some kind of out-of-the-box solution. For small-scale applications, this can be a great idea. We save time and money up front and get a working and serviceable application. The problem comes at larger scales: there are no off-the-shelf kits that will allow you to build something like Amazon or Friendster. While building similar functionality might be fairly trivial, making that functionality work for millions of products and millions of users, without spending far too much on hardware, requires us to build something highly customized and optimized for our exact needs. There’s a good reason why the largest applications on the Internet are all bespoke creations: no other approach can create massively scalable applications within a reasonable budget.

We’ve already said that at the core of web applications we have some set of data that can be accessed and perhaps modified. Within the software element of an application, we need to decide how we store that data (a schema), how we access and modify it (business logic), and how we present it to our users (interaction logic). In Chapter 2 we’ll be looking at these different components, how they interact, and what comprises them. A good application design works down from the very top, defining software and hardware architecture, the components that comprise your platform, and the functionality implemented by those layers.
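
To make the three software decisions concrete, here is a minimal sketch that keeps them apart in a few lines. It assumes an in-memory SQLite database and a hypothetical `messages` table; the function names and the empty-body rule are inventions for the example, not the design discussed in Chapter 2.

```python
import sqlite3
from html import escape

# Schema: how the data is stored (a hypothetical single-table example).
SCHEMA = (
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, author TEXT NOT NULL, body TEXT NOT NULL)"
)


def open_store():
    """Create an in-memory database with the schema applied."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


# Business logic: how the data is accessed and modified, including the rules.
def post_message(db, author, body):
    """Store a message, rejecting empty bodies; returns the new row ID."""
    if not body.strip():
        raise ValueError("message body must not be empty")
    cur = db.execute(
        "INSERT INTO messages (author, body) VALUES (?, ?)", (author, body.strip())
    )
    db.commit()
    return cur.lastrowid


def list_messages(db):
    """Return (author, body) tuples in posting order."""
    return db.execute("SELECT author, body FROM messages ORDER BY id").fetchall()


# Interaction logic: how the data is presented to the user.
def messages_page(db):
    """Render all messages as an HTML fragment."""
    rows = "".join(f"<p>{escape(a)}: {escape(b)}</p>" for a, b in list_messages(db))
    return f"<div>{rows}</div>"
```

Because the presentation function only calls `list_messages`, the schema could change without touching the markup, and the markup could change without touching the storage rules.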

This book aims to be a practical guide to designing and building large-scale applications. By the end of the book, you’ll have a good idea of how to go about designing an application and its architecture, how to scale your systems, and how to go about implementing and executing those designs.

What Is Architecture?

We like to talk about architecting applications, but what does that really mean? When an architect designs a house, he has a fairly well-defined task: gather requirements, explore the options, and produce a blueprint. When the builders turn that blueprint into a building, we expect a few things: the building should stay standing, keep the rain and wind out, and let enough light in. Sorry to shatter the illusion, but architecting applications is not much like this.

For a start, if buildings were like software, the architect would be involved in the actual building process, from laying the foundations right through to installing the fixtures. When he designed and built the house, he would start with a couple of rooms and some basic amenities, and some people would then come and start living there before the building was complete. When it looked like the building work was about to finish, a whole bunch more people would turn up and start living there, too. But these new residents would need new features: more bedrooms to sleep in, a swimming pool, a basement, and on and on. The architect would design these new rooms and features, augmenting his original design. But when the time came to build them, the current residents wouldn’t leave. They’d continue living in the house even while it was extended, all the time complaining about the noise and dust from the building work. In fact, against all reason, more people would move in while the extensions were being built. By the time the modifications were complete, more would be needed to house the newcomers and keep them happy.

The key to good application architecture is planning for these issues from the beginning. If the architect of our mythical house started out by building a huge, complex house, it would be overkill. By the time it was ready, the residents would have gone elsewhere to live in a smaller house built in a fraction of the time. If we build in such a way that extending our house takes too long, then our residents might move elsewhere. We need to know how to start at the right scale and allow our house to be extended as painlessly as possible.

That’s not to say that we’re going to get everything right the first time. In the scaling of a typical application, every aspect and feature is probably going to be revisited and refactored. That’s fine: the task of an application architect is to minimize the time it takes to refactor each component, through careful initial and ongoing design.

How Do I Get Started?

To get started designing and building your first large-scale web application, you’ll need four things. First, you’ll need an idea. This is typically the hardest thing to come up with and not traditionally the role of engineers ;). While the techniques and technologies in this book can be applied to small projects, they are optimal for larger projects involving multiple developers and heavy usage. If you have an application that hasn’t been launched or is small and needs scaling, then you’ve already done the hardest part and you can start designing for the large scale. If you already have a large-scale application, it’s still a good idea to work your way through the book from front to back to check that you’ve covered your bases.

Once you have an idea of what you want to build, you’ll need to find some people to build it. While small and medium applications are buildable by a single engineer, larger applications tend to need larger teams. As of December 2005, Flickr has over 100,000 lines of source code, 50,000 lines of template code, and 10,000 lines of JavaScript. This is too much code for a single engineer to maintain, so down-the-road responsibility for different areas of the application needs to be delegated to different people. We’ll look at some techniques for managing development with multiple developers in Chapter 3. To build an application with any size team, you’ll need a development environment and a staging environment (assuming you actually want to release it). We’ll talk more about development and staging environments as well as the accompanying build tools in Chapter 3, but at a basic level, you’ll need a machine running your web server and database server software.

The most important thing you need is a method of discussing and recording the development process. Detailed spec documents can be tedious overkill, but not writing anything down can be similarly catastrophic. A good pad of paper can suffice for very small teams, or a good whiteboard (which you can then photograph to keep a persistent copy of your work). If you find you can’t tear yourself away from a computer long enough to grasp a pen, a Wiki can fulfill a similar role. For larger teams a Wiki is a good way to organize development specifications and notes, allowing all your developers to add and edit and allowing them to see the work of others.

While the classic waterfall development methodology can work well for monolithic and giant web applications, web application development often benefits from a fast iterative approach. As we develop an application design, we want to avoid taking any steps that pin us in a corner. Every decision we make should be quickly reversible if we find we took a wrong turn. New features can be designed technically at a very basic level, implemented, and then iterated upon before release (or even after release). Using lightweight tools such as a Wiki for ongoing documentation allows us plenty of flexibility: we don’t need to spend six months developing a spec and then a year implementing it. We can develop a spec in a day and then implement it in a couple of days, leaving months to iterate and improve on it. The sooner we get working code to play with, the sooner we find out about any problems with our design and the less time we will have wasted if we need to take a different approach. The last point is fairly important: the less time we spend on a single unit of functionality (which tends to mean our units are small and simple), the less invested we’ll be in it and the easier it will be to throw away if need be. For a lot more information about development methodologies and techniques, pick up a copy of Steve McConnell’s Rapid Development (Microsoft Press).

With pens and Wiki in hand, we can start to design our application architecture and then start implementing our world-changing application.


Building Scalable Web Sites by Cal Henderson
