Building Scalable Web Sites

Name: Building Scalable Web Sites
Author: Cal Henderson
ISBN: 9780596102357

May 2006

Intermediate to advanced

349 pages

11h 55m

English

O'Reilly Media, Inc.

A Note Regarding Supplemental Files
Preface
  What This Book Is About
  What You Need to Know
  Conventions Used in This Book
  Using Code Examples
  Safari® Enabled
  How to Contact Us
  Acknowledgments
1. Introduction
  1.1. What Is a Web Application?
  1.2. How Do You Build Web Applications?
  1.3. What Is Architecture?
  1.4. How Do I Get Started?
2. Web Application Architecture
  2.1. Layered Software Architecture
  2.2. Layered Technologies
  2.3. Software Interface Design
  2.4. Getting from A to B
  2.5. The Software/Hardware Divide
  2.6. Hardware Platforms
  2.6.1. Shared Hardware
  2.6.2. Dedicated Hardware
  2.6.3. Co-Located Hardware
  2.6.4. Self-Hosting
  2.7. Hardware Platform Growth
  2.7.1. Availability and Lead Times
  2.7.2. Importing, Shipping, and Staging
  2.7.3. Space
  2.7.4. Power
  2.7.5. NOC Facilities
  2.7.6. Connectivity
  2.8. Hardware Redundancy
  2.9. Networking
  2.10. Languages, Technologies, and Databases
3. Development Environments
  3.1. The Three Rules
  3.2. Use Source Control
  3.2.1. What Is Source Control?
  3.2.1.1. Versioning
  3.2.1.2. Rollback
  3.2.1.3. Logs
  3.2.1.4. Diffs
  3.2.1.5. Multiuser editing and merging
  3.2.1.6. Annotation (blame)
  3.2.1.7. The locking debate
  3.2.1.8. Projects and modules
  3.2.1.9. Tagging
  3.2.1.10. Branching
  3.2.1.11. Merging
  3.2.2. Utilities—the “Nice to Haves”
  3.2.2.1. Shell and editor integration
  3.2.2.2. Web interfaces
  3.2.2.3. Commit-log mailing list
  3.2.2.4. Commit-log RSS feed
  3.2.2.5. Commit database
  3.2.2.6. Commit hooks
  3.2.3. Source-Control Products
  3.2.4. The Revision Control System (RCS)
  3.2.4.1. The Concurrent Versions System (CVS)
  3.2.4.1.1. Client availability
  3.2.4.1.2. Web interfaces
  3.2.4.1.3. Mailing list and RSS feed
  3.2.4.1.4. Commit database
  3.2.4.2. Subversion (SVN)
  3.2.4.2.1. Client availability
  3.2.4.2.2. Web interfaces
  3.2.4.2.3. Mailing list and RSS feed
  3.2.4.2.4. Commit database
  3.2.4.3. Perforce
  3.2.4.3.1. Client availability
  3.2.4.3.2. Web interfaces
  3.2.4.3.3. Mailing list and RSS feed
  3.2.4.3.4. Commit database
  3.2.4.4. Visual Source Safe (VSS)
  3.2.4.4.1. Client availability
  3.2.4.4.2. Web interfaces
  3.2.4.4.3. Mailing list and RSS feed
  3.2.4.4.4. Commit database
  3.2.4.5. And the rest . . .
  3.2.4.6. Summary
  3.2.5. What to Put in Source Control
  3.2.5.1. Documentation
  3.2.5.2. Software configurations
  3.2.5.3. Build tools
  3.2.6. What Not to Put in Source Control
  3.3. One-Step Build
  3.3.1. Editing Live
  3.3.2. Creating a Work Environment
  3.3.2.1. Development
  3.3.2.1.1. Personal development environments
  3.3.2.2. Staging
  3.3.2.2.1. Sub-staging
  3.3.2.3. Production
  3.3.2.4. Beta production
  3.3.3. The Release Process
  3.3.4. Build Tools
  3.3.5. Release Management
  3.3.6. What Not to Automate
  3.3.6.1. Database schema changes
  3.3.6.2. Software and hardware configuration changes
  3.4. Issue Tracking
  3.4.1. The Minimal Feature Set
  3.4.2. Issue-Tracking Software
  3.4.2.1. FogBugz
  3.4.2.2. Mantis Bug Tracker
  3.4.2.3. Request Tracker (RT)
  3.4.2.4. Bugzilla
  3.4.2.5. Trac
  3.4.3. What to Track
  3.4.3.1. Bugs
  3.4.3.2. Features
  3.4.3.3. Operations
  3.4.3.4. Support requests
  3.4.4. Issue Management Strategy
  3.4.4.1. High-level categorization
  3.4.5. CADT
  3.5. Scaling the Development Model
  3.6. Coding Standards
  3.7. Testing
  3.7.1. Regression Testing
  3.7.2. Manual Testing
4. i18n, L10n, and Unicode
  4.1. Internationalization and Localization
  4.1.1. Internationalization in Web Applications
  4.1.2. Localization in Web Applications
  4.1.2.1. String substitution
  4.1.2.2. Multiple template sets
  4.1.2.3. Multiple frontends
  4.2. Unicode in a Nutshell
  4.3. Unicode Encodings
  4.3.1. Code Points and Characters, Glyphs and Graphemes
  4.3.2. Byte Order Mark
  4.4. The UTF-8 Encoding
  4.5. UTF-8 Web Applications
  4.5.1. Handling Output
  4.5.2. Handling Input
  4.6. Using UTF-8 with PHP
  4.7. Using UTF-8 with Other Languages
  4.8. Using UTF-8 with MySQL
  4.9. Using UTF-8 with Email
  4.10. Using UTF-8 with JavaScript
  4.11. Using UTF-8 with APIs
5. Data Integrity and Security
  5.1. Data Integrity Policies
  5.2. Good, Valid, and Invalid
  5.3. Filtering UTF-8
  5.4. Filtering Control Characters
  5.5. Filtering HTML
  5.5.1. Why Use HTML?
  5.5.2. HTML Input Filtering
  5.5.3. Blacklists and Whitelists
  5.5.4. Balancing
  5.5.5. Dealing with HTML
  5.6. Cross-Site Scripting (XSS)
  5.6.1. The Canonical Hole
  5.6.2. User Input Holes
  5.6.3. Tag and Bracket Balancing
  5.6.4. Protocol Filtering
  5.7. SQL Injection Attacks
  5.7.1. Mitigating SQL Injection Attacks
  5.7.2. Avoiding SQL Injection Attacks
6. Email
  6.1. Receiving Email
  6.2. Injecting Email into Your Application
  6.2.1. An Alternative Approach
  6.3. The MIME Format
  6.4. Parsing Simple MIME Emails
  6.5. Parsing UU Encoded Attachments
  6.6. TNEF Attachments
  6.7. Wireless Carriers Hate You
  6.8. Character Sets and Encodings
  6.9. Recognizing Your Users
  6.10. Unit Testing
7. Remote Services
  7.1. Remote Services Club
  7.2. Sockets
  7.3. Using HTTP
  7.3.1. The HTTP Request and Response Cycle
  7.3.2. HTTP Authentication
  7.3.3. Making an HTTP Request
  7.4. Remote Services Redundancy
  7.5. Asynchronous Systems
  7.6. Exchanging XML
  7.6.1. Parsing XML
  7.6.2. REST
  7.6.3. XML-RPC
  7.6.4. SOAP
  7.7. Lightweight Protocols
  7.7.1. Memory Usage
  7.7.2. Network Speed
  7.7.3. Parsing Speed
  7.7.4. Writing Speed
  7.7.5. Downsides
  7.7.6. Rolling Your Own
8. Bottlenecks
  8.1. Identifying Bottlenecks
  8.1.1. Application Areas by Software Component
  8.1.2. Application Areas by Hardware Component
  8.1.3. CPU Usage
  8.1.4. Code Profiling
  8.1.5. Opcode Caching
  8.1.6. Speeding Up Templates
  8.1.7. General Solutions
  8.1.8. I/O
  8.1.9. Disk I/O
  8.1.10. Network I/O
  8.1.11. Memory I/O
  8.1.12. Memory and Swap
  8.2. External Services and Black Boxes
  8.2.1. Databases
  8.2.2. Query Spot Checks
  8.2.3. Query Profiling
  8.2.4. Query and Index Optimization
  8.2.5. Caching
  8.2.6. Denormalization
9. Scaling Web Applications
  9.1. The Scaling Myth
  9.1.1. What Is Scalability?
  9.1.2. Scaling a Hardware Platform
  9.1.3. Vertical Scaling
  9.1.4. Horizontal Scaling
  9.1.5. Ongoing Work
  9.1.6. Redundancy
  9.2. Scaling the Network
  9.2.1. Scaling PHP
  9.3. Load Balancing
  9.3.1. Load Balancing with Hardware
  9.3.2. Load Balancing with Software
  9.3.3. Layer 4
  9.3.4. Layer 7
  9.3.5. Huge-Scale Balancing
  9.3.6. Balancing Non-HTTP Traffic
  9.4. Scaling MySQL
  9.4.1. Storage Backends
  9.5. MyISAM
  9.5.1. InnoDB
  9.5.2. BDB
  9.5.3. Heap
  9.6. MySQL Replication
  9.6.1. Master-Slave Replication
  9.6.2. Tree Replication
  9.6.3. Master-Master Replication
  9.6.4. Replication Failure
  9.6.5. Replication Lag
  9.7. Database Partitioning
  9.7.1. Clustering
  9.7.2. Federation
  9.8. Scaling Large Database
  9.9. Scaling Storage
  9.9.1. Filesystems
  9.9.2. Protocols
  9.9.3. RAID
  9.9.4. Federation
  9.9.5. Caching
  9.9.6. Caching Data
  9.9.7. Caching HTTP Requests
  9.9.8. Scaling in a Nutshell
10. Statistics, Monitoring, and Alerting
  10.1. Tracking Web Statistics
  10.1.1. Server Logfiles
  10.1.2. Analysis
  10.1.3. Using Beacons
  10.1.4. Spread
  10.1.5. Load Balancers
  10.1.6. Tracking Custom Metrics
  10.2. Application Monitoring
  10.2.1. Bandwidth Monitoring
  10.2.2. Long-Term System Statistics
  10.2.2.1. MySQL statistics
  10.2.2.2. Apache statistics
  10.2.2.3. memcached statistics
  10.2.2.4. Squid statistics
  10.2.3. Custom Visualizations
  10.3. Alerting
  10.3.1. Uptime Checks
  10.3.2. Resource-Level Monitoring
  10.3.3. Threshold Checks
  10.3.4. Low-Watermark Checks
11. APIs
  11.1. Data Feeds
  11.1.1. RSS
  11.1.2. RDF
  11.1.3. Atom
  11.1.4. The Others
  11.1.5. Feed Auto-Discovery
  11.1.6. Feed Templating
  11.1.7. OPML
  11.1.8. Feed Authentication
  11.2. Mobile Content
  11.2.1. The Wireless Application Protocol (WAP)
  11.2.2. XHTML Mobile Profile
  11.3. Web Services
  11.4. API Transports
  11.4.1. REST
  11.4.2. XML-RPC
  11.4.3. SOAP
  11.4.4. Transport Abstraction
  11.5. API Abuse
  11.5.1. Monitoring with API Keys
  11.5.2. Throttling
  11.5.3. Caching
  11.6. Authentication
  11.6.1. None at All
  11.6.2. Plain Text
  11.6.3. Message Authentication Code (MAC)
  11.6.4. Token-Based Systems
  11.7. The Future
Index
About the Author
Colophon
Copyright

Content preview from Building Scalable Web Sites

Chapter 4. i18n, L10n, and Unicode

Internationalization, localization, and Unicode are all hot topics in the field of modern web application development. If you build and launch an application without support for multiple languages, you’re going to be missing out on a huge portion of your possible user base. Current research suggests that there are about 510 million English-speaking people in the world. If your application caters only to English speakers, you’ve immediately blocked 92 percent of your potential global audience. These numbers are actually wildly inaccurate and generally used as a scare tactic; you have to consider how many of the world’s six billion or so people are online to begin with. But even once we factor this in, we are still left with 64 percent of online users (around 680 million people) who don’t speak English (these statistics come from the Global Reach web site: http://global-reach.biz/). That’s still a huge number of potential users you’re blocking from using your application.

Addressing this problem has historically been a huge deal. Developers would need advanced knowledge of character sets and text processing, language-dependent data would need to be stored separately, and data from one group of users could not be shared with another. But in a world where the Internet is becoming more globally ubiquitous, these problems needed solving. The solutions that were finally reached cut out a lot of the hard work for developers—it’s now almost trivially ...

Publisher Resources

ISBN: 0596102356
