The World Wide Web has grown quickly over the last couple of decades to become an invaluable resource for communication, research, and entertainment. The Web has also become an open platform on which powerful services and applications can be built by established companies and newcomers alike. It is a very accessible platform that allows even small companies to create web applications and build a business without requiring the backing of a large enterprise. A person or group with some expertise, some time, and a good enough idea can create a web application that competes with the offerings of larger corporations—or even carves out an entirely new market. On the Web, the size and marketing clout of a large corporation does not guarantee it a monopoly on the attention and patronage of a global audience.
The Web is full of opportunities for companies both large and small, but the smaller companies face a difficult problem: infrastructure.
Web applications that are popular and have thousands of users require significant infrastructure to provide the high performance and smooth experience that users demand. Industrial-strength infrastructure is very expensive to buy and maintain, so smaller companies with fewer users are often forced to do without. Yet in today’s world of web publicity flash storms caused by sites such as Slashdot and Digg, the difference between a web application serving a few dozen users and serving thousands may be no more than a glowing article and a few hours’ time.
Although this kind of attention may be exactly what you hope for, unless you have invested heavily in infrastructure, your application may not survive the onslaught. On the other hand, if you spend too much money on servers, bandwidth, hosting, and the management of all this infrastructure, there will be little left to develop the application itself. A dilemma facing many small development teams is how to strike the right balance between investing in application development and funding robust and scalable infrastructure.
Amazon offers a new and compelling solution to this dilemma in the form of infrastructure web services. These services allow application developers to avoid altogether the burden of buying and maintaining physical infrastructure by making it possible to rent virtual infrastructure instead. In this book we will show you how you can build your applications on top of Amazon’s services and effectively outsource your infrastructure.
Amazon Simple Storage Service (S3) offers secure online storage space for any kind of data, providing an alternative to building, maintaining, and backing up your own storage systems. It makes your data accessible to any other applications or individuals you allow from anywhere on the Web. There are no limits on how much data you can store in the service, how long you can store it, or on how much bandwidth you can use to transfer or publish it.
S3 is a scalable, distributed system that stores your information reliably across multiple Amazon data centers, and it is able to serve it quickly to massive audiences. Its storage application programming interface (API) is deliberately simple and makes no assumptions about the nature of the data you are storing. This simplicity means you can maintain complete control over how your data is represented in the service.
Amazon Elastic Compute Cloud (EC2) makes it possible to run multiple virtual Linux servers on demand, providing as many computers as you need to process your data or run your web application without having to purchase or rent physical machines. In EC2 you have full control over each server with root access to the operating system (the root user is the ultimate system administrator on Linux machines), a configurable firewall to manage network access, and the freedom to install any software you please. Once you have set up an EC2 server the way you like it, you can save it permanently as a server image. You can then launch new servers from this image to create virtual machines that are preconfigured and ready to do your bidding.
The EC2 service offers computing resources that are very flexible. You can run as many servers as you need for as long as you need them, and you can shut them all down when they have served their purpose. The service offers an API to start and stop server instances, apply access and networking permissions, and manage your server images. You manage each individual server using standard Linux tools over a secure shell session.
Amazon Simple Queue Service (SQS) delivers short messages between any computers or systems with access to the Internet, allowing the components of your distributed web applications to communicate reliably without you having to build or maintain your own messaging system. With SQS you can send an unlimited number of messages via an unlimited number of message queues, and you can configure the performance characteristics and access permissions for each queue. The service uses a message locking and timeout mechanism that helps prevent messages from being delivered more than once, while still ensuring they will be delivered despite any component failures or network dropouts.
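To make the locking and timeout idea concrete, here is a minimal in-memory sketch in Ruby. It is a toy model, not how SQS is implemented, and the `ToyQueue` class, its method names, and the 30-second lock period are all our own assumptions for illustration. Time is passed in explicitly so the behavior is easy to follow.

```ruby
# Toy in-memory queue illustrating message locking with a timeout.
# Hypothetical class for illustration only; it does not talk to SQS.
class ToyQueue
  Message = Struct.new(:id, :body, :locked_until)

  def initialize(lock_seconds:)
    @lock_seconds = lock_seconds
    @messages = []
    @next_id = 0
  end

  def send_message(body)
    @next_id += 1
    @messages << Message.new(@next_id, body, 0)
  end

  # Return the first unlocked message and lock it for @lock_seconds.
  def receive(now)
    message = @messages.find { |m| m.locked_until <= now }
    message.locked_until = now + @lock_seconds if message
    message
  end

  # A consumer deletes a message once it has processed it successfully.
  def delete(id)
    @messages.reject! { |m| m.id == id }
  end
end

queue = ToyQueue.new(lock_seconds: 30)
queue.send_message("resize image 42")

first  = queue.receive(0)   # consumer A receives and locks the message
second = queue.receive(10)  # consumer B gets nothing while the lock holds
third  = queue.receive(31)  # A never deleted it, so it becomes visible again
```

In this sketch a message that is received but never deleted reappears once its lock expires, which is the property that lets a queue tolerate a consumer that crashes partway through its work.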
SQS is implemented as a distributed application within Amazon. Your messages are stored redundantly across multiple servers and data centers. The service’s API allows you to send and receive messages, and to control their full life cycle.
Amazon Flexible Payments Service (FPS) transfers money between individuals or companies that have Amazon Payments accounts, allowing you to build applications that provide an online store or that implement a marketplace between customers and third-party vendors. With FPS you can make payments from traditional sources, such as credit cards and bank accounts, or from sources internal to Amazon Payments accounts that have lower fees and are designed to make micro-payment transactions feasible.
All transactions need to be authorized by everyone involved in the transaction. The parties involved can impose detailed constraints on transactions, such as how and when transactions can be performed, how much money can be transferred, and who can send and receive the funds.
Customers interact with your FPS application through an Amazon Payments gateway using their Amazon.com account. Because the transactions are mediated by Amazon, your customers are not required to provide you with their personal banking information, and you do not have the burden of securely storing this highly sensitive information.
At the time of this writing, the FPS service is still in beta. This means that the service’s features are still evolving quickly, and there is an increased risk of problems that may make the service unsuitable for use in production systems. The full functionality of the service is only available to developers or users with U.S.-based credit cards and bank accounts. International users can access only limited functionality.
Amazon SimpleDB (SimpleDB) stores small pieces of textual information in a simple database structure that is easy to manage, modify and search. If your application relies on a relatively simple database, this service can replace your traditional relational database (RDBMS) server, leaving you with one less piece of infrastructure to purchase and maintain.
SimpleDB is designed to minimize the complexity and administrative overhead involved in managing your data. It does not require a pre-defined schema so you can alter the structure and content of your database whenever you need to. It indexes every piece of information you store so all your queries run quickly. And it stores your data securely, redundantly and safely within Amazon’s network of data centers.
At the time of this writing, the SimpleDB service is still in beta. This means that the service’s features are still evolving quickly, and there is an increased risk of problems that may make the service unsuitable for use in production systems.
These five web services—S3, EC2, SQS, FPS, and SimpleDB—share the same fundamental characteristics. They are pay-as-you-go, meaning you pay predictable fees based on how much or how little you use the service. There are no initial costs to join, no long-term subscription payments, and the usage fees are attractively low. The services are highly scalable, performing equally well in modest or massively demanding usage scenarios. This means that the applications built on them can be similarly scalable and are able to grow rapidly at short notice without hitting limits imposed by insufficient infrastructure. One significant feature is that all the services are designed to be highly reliable and fault-tolerant: the services and data resources are distributed across multiple servers and data centers within Amazon’s infrastructure, and they are managed by a company with significant experience and investments in the operation of a global web business.
To use AWS you first need to register for an account and provide a credit card to be billed for your service usage. If you already have an Amazon.com account for Amazon’s online store, you can associate your AWS membership with this existing account.
Create a new AWS account at the AWS home page—http://www.aws.amazon.com. This is where you can manage your AWS account, sign up for services, view your service activity, and track billing information.
Once you have registered for an AWS account, you need to sign up separately for each AWS service you wish to use. If you have not explicitly signed up for a service, you will not be able to access its API. To sign up for a service, visit the home page for that service and click on the button “Sign Up For This Web Service.”
Here are the home pages for the infrastructure services we discuss in this book:
To view a listing of the services you have signed up for and the billing history for these services, click on the “Your Web Services Account” button on the AWS home page. Figure 1-1 shows this button, which leads to your AWS account information and also gives you access to the AWS Access Identifiers page (see Figure 1-2), where you can look up the AWS access key and X.509 certificate credentials associated with your AWS account.
You will be billed monthly for your usage of AWS, at which time Amazon will automatically debit your usage fees from the credit card associated with your AWS account. All service charges and payments are in U.S. dollars.
Before you start building applications based on AWS, it is worthwhile to consider the thinking behind these services. What were the key goals that led Amazon to build the services in the first place? And how did these goals influence the design and implementation of the services?
Initially, the AWS infrastructure services were not conceived as products to be sold to developers external to Amazon but were instead designed to meet specific needs within Amazon’s own internal systems. It was only later that these services were opened up to the public. The key implementation details of the services are therefore intended primarily to serve Amazon’s needs and will not necessarily use the methodologies or techniques common in the rest of the industry. Appreciating the reasoning behind the architectural decisions and their implementation details can help you to adjust your expectations for the services. This, in turn, will make it easier to design applications that work well with the services’ capabilities.
Amazon’s services are designed to power the Amazon.com web site and related partner applications. The services operate as small component cogs in a large service-oriented architecture (SOA). Each service performs a specific task as simply and efficiently as possible, while the strengths of many different services are combined as required to perform complex processes and build the rich Amazon.com web pages with which we are familiar. Amazon’s SOA has been developed over many years of hard-won experience to be highly scalable to meet growing demand and be highly reliable despite the inevitable hardware and network failures that will occur in such an environment.
The AWS infrastructure services we will examine in this book were designed to fulfill specific tasks in this SOA environment. You will gain the most from the services with the fewest headaches if you design your applications to work like Amazon’s. Instead of taking the traditional approach of building a system with the expectation that everything will work as expected all the time, and that problems will be so rare that you can deal with them as an afterthought, you need to accept from the start that failures will occur, and you should design your application to deal with them. For example, you should aim to build application components that can recover from temporary network glitches, gracefully handle error conditions, and restart quickly. Try to avoid creating architectural bottlenecks that are single points of failure. Instead, share the work burden between multiple components in a service pool that can be expanded or contracted in response to demand, and ensure that each component in the group can be easily replaced.
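As a small illustration of designing for failure, the following Ruby sketch retries an operation that fails with a temporary error, waiting longer after each attempt. The `TransientError` class, the `with_retries` helper, and its default values are hypothetical names we have chosen for this example; a real application would decide for itself which errors are worth retrying.

```ruby
# Hypothetical error class standing in for a temporary network glitch.
class TransientError < StandardError; end

# Retry the given block when it raises TransientError, doubling the
# delay after each failed attempt (exponential backoff).
# The sleeper argument exists so the delay can be replaced in tests.
def with_retries(max_attempts: 4, base_delay: 0.5, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientError
    raise if attempt >= max_attempts   # give up and surface the error
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Usage: the block fails twice, then succeeds on the third attempt.
delays = []
result = with_retries(sleeper: ->(s) { delays << s }) do |attempt|
  raise TransientError, "simulated network glitch" if attempt < 3
  "ok after #{attempt} attempts"
end
```

Capping the number of attempts matters as much as retrying: a component that retries forever can hide a genuine outage and become a bottleneck of its own.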
With the right mindset, you can take full advantage of the AWS infrastructure and build your own applications that, like Amazon’s, are scalable, reliable, and cheap to run. If you do not embrace this chaotic and contingency-based approach, or if your application simply does not fit this model, you may end up fighting the AWS infrastructure every step of the way.
It should go without saying that the infrastructure services provided by AWS will not be suitable for every circumstance or application. There are a number of things you need to consider carefully before deciding whether a full or partial move to Amazon’s virtual infrastructure is appropriate in your situation. In this section we will briefly discuss some of the common objections to making such a move and suggest counter-arguments to these objections. Our aim is not to persuade you one way or another, but to raise the issues you need to consider before you make up your own mind.
The infrastructure provided by AWS is only available when you have a working Internet connection and a clear network path to the services. It is vital to have a high-speed Internet connection. If you have an intermittent or slow connection to the Internet, these services will not be a practical option. Even with a fast connection, with the fragility of networking hardware and the vagaries of Internet traffic routing, it is likely that sooner or later you will be unable to reach AWS for a brief period of time. If your application is completely dependent on AWS, this could result in downtime for your application and disruption for your customers. With these kinds of issues, it may not be possible for Amazon to help, especially if the problem is caused by network resources outside its control.
Amazon seeks to be resistant to traffic routing problems by making its services accessible from multiple data center locations. These data centers are generally located in the United States, though S3 is also available from data centers located in Europe. Overall, for any web-based resource, there is always the possibility of losing connectivity, and you need to take this risk into account when planning your application.
Amazon does not provide Service Level Agreements (SLAs) for all its services. This means that it does not always define the level of service it will guarantee to AWS customers, nor does it offer compensation should the level of service fall below expectations. Of the five infrastructure services discussed in this book, only S3 has an SLA. For some organizations, the lack of a formal service agreement will make the AWS offerings too risky to accept. Even for groups or individuals without such strict requirements, the lack of an SLA can be disquieting, because Amazon generally only promises “best-effort” service. In addition to this, the AWS Terms and Conditions agreement permits the termination of AWS accounts at the sole discretion of Amazon. In legal terms, there does not seem to be a strong commitment by Amazon to providing high-quality service to AWS customers.
Looking at the AWS offering from a nonlegal perspective, Amazon’s commitment seems clear and solid. The services are continually evolving, new services are added periodically, and Amazon staff at all levels from CEO down is actively evangelizing the service and maintaining a dialogue with AWS users to improve the offering. If actions speak louder than words, the future of AWS seems assured.
Even if you are not prepared to rely completely on AWS without greater assurances, the services can still provide a worthwhile resource to use in the meantime, provided you have a backup plan in case things go awry.
AWS is still a relatively new suite of services, and Amazon has so far been willing to provide an SLA only for the oldest and most stable of the services: S3. It is possible that as its services mature, and as AWS users maintain pressure on the company to do so, Amazon will eventually offer SLAs for its other services as well.
Developers of web applications that store, transmit, or process sensitive information need to be mindful of their responsibility to protect the security of this information and keep it private. Exposing this information to a third party, like Amazon, by using AWS infrastructure services can potentially reduce the degree of control you can maintain over the data, and transmitting the information to and from these services over the Internet can also pose a risk. In the effort to maintain the security and privacy of your data, there are three things you need to consider:
User authentication;
Secure transmission; and
Secure storage.
Although Amazon uses industry standard mechanisms to protect data through user authentication and secure transmission, you may need to do some extra work to guarantee your data is stored securely.
AWS user authentication is provided by the access mechanism for the APIs. A client program that sends requests to an AWS service is required to authenticate itself by signing the request message. Requests are digitally signed with the credentials of the AWS account holder, proving that they could only have been generated by a client in possession of the credentials and allowing Amazon to detect if they are altered in transit. This means you can be sure that you are the only person able to access your services and the data they contain. We will discuss the request signing procedure for AWS in more detail in “User Authentication” in Chapter 2.
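The general idea of signing a request with a shared secret can be sketched in a few lines of Ruby. This is only an illustration of the principle: the choice of HMAC-SHA1, the `sign` and `valid_signature?` helpers, and the layout of the string being signed are our assumptions here, not the AWS procedure itself, which is covered in Chapter 2.

```ruby
require "openssl"

# Hypothetical helper: compute a keyed signature over a description of
# the request and return it Base64-encoded.
def sign(secret_key, string_to_sign)
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA1"), secret_key, string_to_sign)
  [digest].pack("m0")   # "m0" is Base64 without line breaks
end

# The receiver, who also knows the secret, recomputes the signature and
# compares it with the one supplied by the client.
def valid_signature?(secret_key, string_to_sign, signature)
  sign(secret_key, string_to_sign) == signature
end

secret    = "example-secret-key"                       # made-up credential
request   = "GET\n/my-resource\n2008-01-01T00:00:00Z"  # made-up request description
signature = sign(secret, request)

untouched = valid_signature?(secret, request, signature)
tampered  = valid_signature?(secret, request.sub("GET", "DELETE"), signature)
```

Because the signature depends on both the secret and the request contents, changing either one produces a mismatch, which is how a service can detect forged or altered requests without the secret ever being transmitted.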
It is easy to secure your AWS service communication channels using the standard HTTP over SSL protocol. Amazon makes the infrastructure web service APIs available over both the standard HTTP protocol and the secure version, HTTPS. By using HTTPS for all your communication with the services, you can ensure your data is automatically encrypted in transit, and by taking advantage of the endpoint authentication mechanism available in the SSL protocol, you can guarantee that you only communicate with an authentic Amazon server.
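In Ruby, the standard `Net::HTTP` library can be configured to use SSL and to verify the server's certificate. The sketch below only builds the connection object and does not open it; the `secure_connection` helper and the `service.example.com` host name are placeholders of our own, not a real service endpoint.

```ruby
require "net/http"
require "openssl"
require "uri"

# Hypothetical helper: build (but do not open) a connection that uses
# SSL and insists on verifying the server's certificate.
def secure_connection(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER   # reject unverifiable servers
  http
end

http = secure_connection("https://service.example.com/")
```

Setting the verify mode explicitly is the part that provides endpoint authentication; encryption alone does not tell you who is at the other end of the connection.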
The onus is on you, as an AWS developer, to ensure the secure storage of your data in AWS. Data stored within Amazon’s infrastructure is protected by the user authentication mechanisms that control access to the information based on the client’s credentials. For highly sensitive data, you need to consider the risk that the information could be accessed by malevolent individuals who gain access to the service from inside Amazon or by stealing your AWS credentials. To properly protect sensitive information, it should be encrypted prior to being transmitted or stored, making it indecipherable even if someone manages to access the stored data. Because Amazon’s services require no particular data format, you are free to encrypt information in any way you wish, but support for doing so is not built in to the services.
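As a rough sketch of encrypting data before it leaves your own systems, the following Ruby code uses the standard OpenSSL library with AES-256 in GCM mode. The cipher choice, the `encrypt` and `decrypt` helpers, and the hash used to bundle the results are assumptions made for this example; key management, which is the hard part in practice, is not addressed here.

```ruby
require "openssl"

# Hypothetical helper: encrypt data locally before handing it to any
# storage service. Returns everything needed to decrypt except the key.
def encrypt(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  { iv: iv, tag: cipher.auth_tag, data: ciphertext }
end

# Hypothetical helper: reverse the process. Raises
# OpenSSL::Cipher::CipherError if the data or tag has been altered.
def decrypt(package, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = key
  cipher.iv = package[:iv]
  cipher.auth_tag = package[:tag]
  cipher.update(package[:data]) + cipher.final
end

key      = OpenSSL::Random.random_bytes(32)   # keep this outside the storage service
package  = encrypt("account number 12345", key)
restored = decrypt(package, key)
```

Only the contents of `package` would be stored remotely, so someone who obtains the stored data without the key sees only ciphertext.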
If you have specific infrastructure requirements for your web application that do not align with the capabilities of AWS, you may have no choice but to find an alternative infrastructure provider or build your own. To be sure AWS can meet your needs, you will have to research the capabilities and limitations of Amazon’s services, and determine whether any limitations can be overcome by rethinking your approach or designing around them.
To obtain the detailed technical information you will need to make these judgments, we encourage you to do two things. First, read the chapters in this book that discuss the technical and usage details of the services to gain an understanding of how the APIs work. Once you are in a position to ask detailed questions, you can take advantage of the considerable expertise and advice offered by the developers and Amazon staff who populate the AWS forums.
There is no direct channel for notifying Amazon when you are experiencing service faults, for escalating unresolved problems, or for obtaining information about any actions Amazon is taking to resolve the issues. Although there is an email address advertised for making technical enquiries (email@example.com), this has not proven to be an effective means of notifying Amazon of service faults. The only channel currently available for sending such notifications and for following up on service issues is through the Developer Connection online discussion forums.
When faults occur in AWS, the forums are also the only place you will find any detailed information or status reports. Most of this information is generated by other service users and is not officially made available by Amazon. Amazon does not provide a publicly accessible status page that shows the current health of services or details about current issues, and its staff does not tend to proactively post details about service outages in the forums, probably because the staff prefers to spend its time fixing the issue. This is an area in which Amazon could definitely improve, and we hope it will do so as the services become more mature and its customers demand a more acceptable solution.
AWS infrastructure services are made available through three separate APIs: REST, Query, and SOAP. In this book we will focus only on the REST and Query APIs and will not demonstrate how to use the SOAP APIs. We have a number of reasons for doing this, reasons which will become clearer after a brief explanation of the differences between the interfaces.
The REST interfaces offered by AWS use only the standard components of HTTP request messages to represent the API action that is being performed. These components include:
HTTP method: describes the action the request will perform
Uniform Resource Identifier (URI): path and query elements that indicate the resource on which the action will be performed
Request Headers: pieces of metadata that provide more information about the request itself or the requester
Request Body: the data on which the service will perform an action
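The four components listed above map directly onto the request objects in Ruby's standard `Net::HTTP` library. In this sketch the request is only constructed, never sent, and the host name and resource path are made-up placeholders rather than a real AWS endpoint.

```ruby
require "net/http"
require "uri"

# Hypothetical resource location, for illustration only.
uri = URI.parse("https://storage.example.com/my-bucket/notes.txt")

request = Net::HTTP::Put.new(uri.request_uri)  # HTTP method and URI path
request["Content-Type"] = "text/plain"         # request header (metadata)
request.body = "Hello, world"                  # request body (the data itself)
```

Everything a server needs to know about the operation is visible in these standard HTTP components: the method says what to do, the URI says what to do it to, and the headers and body carry the details.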
The Query interfaces offered by AWS also use the standard components of the HTTP protocol to represent API actions; however these interfaces use them in a different way. Query requests rely on parameters, simple name and value pairs, to express both the action the service will perform and the data the action will be performed on. When you are using a Query interface, the HTTP envelope serves merely as a way of delivering these parameters to the service.
To perform an operation with a Query interface, you can express the parameters in the URI of a GET request, or in the body of a POST request. The method component of the HTTP request merely indicates where in the message the parameters are expressed, while the URI may or may not indicate a resource to act upon.
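The two ways of delivering Query parameters can be sketched with Ruby's standard library. The `Action` and `MaxItems` parameter names and their values are invented for this example and do not correspond to any particular AWS operation; the requests are built but not sent.

```ruby
require "net/http"
require "uri"

# Hypothetical action and parameter names, for illustration only.
params  = { "Action" => "ListItems", "MaxItems" => "10" }
encoded = URI.encode_www_form(params)

# Option 1: express the parameters in the URI of a GET request.
get_request = Net::HTTP::Get.new("/?" + encoded)

# Option 2: express the same parameters in the body of a POST request.
post_request = Net::HTTP::Post.new("/")
post_request["Content-Type"] = "application/x-www-form-urlencoded"
post_request.body = encoded
```

Both requests carry identical name and value pairs; only their position in the HTTP message differs, which is why the HTTP method tells you little about the operation being performed.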
These characteristics mean that the Query interfaces cannot be considered properly RESTful because they do not use the HTTP message components to fully describe API operations. Instead, the Query interfaces can be considered REST-like, because although they do things differently, they still only use standard HTTP message components to perform operations.
The SOAP interfaces offered by AWS use XML documents to express the action that will be performed and the data that will be acted upon. These SOAP XML documents are constructed as another layer on top of the underlying HTTP request, such that all the information about the operation is moved out of the HTTP message and encapsulated in the SOAP message instead.
For operations performed with a SOAP interface, the HTTP components of the request message are nearly irrelevant: all that is important is the XML document sent to the service as the body of the request. The valid structure and content of SOAP messages are defined in a Web Services Description Language (WSDL) document that describes the operations the service can perform, and the structure of the input and output data documents the service understands. To create a client program for a SOAP interface, you will typically use a third-party tool to interpret the WSDL document and generate the client stub code necessary to interact with the service.
The approach used in the SOAP interfaces is very different from that used by the REST and Query interfaces. Operations expressed in SOAP messages are completely divorced from the underlying HTTP message used to transmit the request, and the HTTP message components, such as method and URI, reveal nothing about the operation being performed.
The main reason we eschew the SOAP interface in this book is that we believe SOAP interfaces in general add unnecessary complexity and overhead, effectively spoiling the simplicity and transparency that can make web services such powerful and flexible tools. We are not alone in feeling this way. According to Amazon staff members, a vast majority of developers use the REST-based APIs to interact with AWS.
When a client performs an action using a REST-based interface, the messages are relatively simple. They can be constructed and interpreted by standard tools that recognize the HTTP protocol. SOAP messages, on the other hand, are much more complicated than those based on REST. To build a SOAP interface, you will generally require sophisticated tools to generate stub code before you can send any requests at all. Although such tools are readily available for some programming languages, typically those used in big business, they remain unavailable or immature on many platforms.
A secondary reason for avoiding SOAP is that we have tried to follow the KISS principle in this book. We have sought to keep our code samples as clear, simple, and broadly applicable as possible. Although most of our samples are written in the Ruby language, the techniques we demonstrate should be easy to apply in any other language that provides support for HTTP. If we presented SOAP interface clients, this code would demonstrate the subtleties of a particular third-party SOAP library much more effectively than it would the general-purpose techniques for using AWS that will work across multiple languages.
The steps for using a service’s REST or Query interface will be very similar for the SOAP interface as well. As the different interfaces all look similar and act in much the same way, readers already familiar with SOAP, and with access to high-quality tools for interacting with SOAP services, should be able to follow along with our API discussions using the SOAP interfaces. However, be aware that the SOAP service interfaces do not always provide all the functionality available in the REST or Query interfaces.
 The distinctions we allude to between different web service architectures are described in much more detail in RESTful Web Services by Leonard Richardson and Sam Ruby (O’Reilly). This book presents a strong case for avoiding the complexity that is SOAP, while discussing the theory and techniques necessary to build web services that take advantage of the simple, elegant, and efficient REST-based architecture that has served the Web so well.