
The Tangled Web

Name: The Tangled Web
Author: Michal Zalewski
ISBN: 9781593273880


November 2011

Intermediate to advanced

320 pages

10h 18m

English

No Starch Press


The Tangled Web
PRAISE FOR THE TANGLED WEB
PRAISE FOR SILENCE ON THE WIRE BY MICHAL ZALEWSKI
Preface
Acknowledgments
1. Security in the World of Web Applications
  Information Security in a Nutshell
  Flirting with Formal Solutions
  Enter Risk Management
  Enlightenment Through Taxonomy
  Toward Practical Approaches
A Brief History of the Web
  Tales of the Stone Age: 1945 to 1994
  The First Browser Wars: 1995 to 1999
  The Boring Period: 2000 to 2003
  Web 2.0 and the Second Browser Wars: 2004 and Beyond
The Evolution of a Threat
  The User as a Security Flaw
  The Cloud, or the Joys of Communal Living
  Nonconvergence of Visions
  Cross-Browser Interactions: Synergy in Failure
  The Breakdown of the Client-Server Divide
Global browser market share, May 2011
I. Anatomy of the Web
2. It Starts with a URL
  Uniform Resource Locator Structure
  Scheme Name
  Indicator of a Hierarchical URL
  Credentials to Access the Resource
  Server Address
  Server Port
  Hierarchical File Path
  Query String
  Fragment ID
  Putting It All Together Again
Reserved Characters and Percent Encoding
Handling of Non-US-ASCII Text

Common URL Schemes and Their Function
  Browser-Supported, Document-Fetching Protocols
  Protocols Claimed by Third-Party Applications and Plug-ins
  Nonencapsulating Pseudo-Protocols
  Encapsulating Pseudo-Protocols
  Closing Note on Scheme Detection
Resolution of Relative URLs
3. Hypertext Transfer Protocol
  Basic Syntax of HTTP Traffic
  The Consequences of Supporting HTTP/0.9
  Newline Handling Quirks
  Proxy Requests
  Resolution of Duplicate or Conflicting Headers
  Semicolon-Delimited Header Values
  Header Character Set and Encoding Schemes
  Referer Header Behavior
HTTP Request Types
  GET
  POST
  HEAD
  OPTIONS
  PUT
  DELETE
  TRACE
  CONNECT
  Other HTTP Methods
Server Response Codes
  200-299: Success
  300-399: Redirection and Other Status Messages
  400-499: Client-Side Error
  500-599: Server-Side Error
  Consistency of HTTP Code Signaling
Keepalive Sessions
Chunked Data Transfers
Caching Behavior
HTTP Cookie Semantics
HTTP Authentication
Protocol-Level Encryption and Client Certificates
  Extended Validation Certificates
  Error-Handling Rules
4. Hypertext Markup Language
  Basic Concepts Behind HTML Documents
  Document Parsing Modes
  The Battle over Semantics
Understanding HTML Parser Behavior
  Interactions Between Multiple Tags
  Explicit and Implicit Conditionals
  HTML Parsing Survival Tips
Entity Encoding
HTTP/HTML Integration Semantics
Hyperlinking and Content Inclusion
  Plain Links
  Forms and Form-Triggered Requests
  Frames
  Type-Specific Content Inclusion
  A Note on Cross-Site Request Forgery
5. Cascading Style Sheets
  Basic CSS Syntax
  Property Definitions
  @ Directives and XBL Bindings
  Interactions with HTML
Parser Resynchronization Risks
Character Encoding
6. Browser-Side Scripts
  Basic Characteristics of JavaScript
  Script Processing Model
  Parsing
  Function Resolution
  Code Execution
  Execution Ordering Control
  Code and Object Inspection Capabilities
  Modifying the Runtime Environment
  Overriding Built-Ins
  Setters and Getters
  Impact on Potential Uses of the Language
  JavaScript Object Notation and Other Data Serializations
  E4X and Other Syntax Extensions
Standard Object Hierarchy
  The Document Object Model
  Access to Other Documents
Script Character Encoding
Code Inclusion Modes and Nesting Risks
The Living Dead: Visual Basic
7. Non-HTML Document Types
Plaintext Files
Bitmap Images
Audio and Video
XML-Based Documents
  Generic XML View
  Scalable Vector Graphics
  Mathematical Markup Language
  XML User Interface Language
  Wireless Markup Language
  RSS and Atom Feeds
A Note on Nonrenderable File Types
8. Content Rendering with Browser Plug-ins
  Invoking a Plug-in
  The Perils of Plug-in Content-Type Handling
Document Rendering Helpers
Plug-in-Based Application Frameworks
  Adobe Flash
  Properties of ActionScript
  Microsoft Silverlight
  Sun Java
  XML Browser Applications (XBAP)
ActiveX Controls
Living with Other Plug-ins
II. Browser Security Features
9. Content Isolation Logic
  Same-Origin Policy for the Document Object Model
  document.domain
  postMessage(...)
  Interactions with Browser Credentials
Same-Origin Policy for XMLHttpRequest
Same-Origin Policy for Web Storage
Security Policy for Cookies
  Impact of Cookies on the Same-Origin Policy
  Problems with Domain Restrictions
  The Unusual Danger of “localhost”
  Cookies and “Legitimate” DNS Hijacking
Plug-in Security Rules
  Adobe Flash
  Markup-Level Security Controls
  Security.allowDomain(...)
  Cross-Domain Policy Files
  Policy File Spoofing Risks
  Microsoft Silverlight
  Java
Coping with Ambiguous or Unexpected Origins
  IP Addresses
  Hostnames with Extra Periods
  Non-Fully Qualified Hostnames
  Local Files
  Pseudo-URLs
  Browser Extensions and UI
Other Uses of Origins
10. Origin Inheritance
Origin Inheritance for about:blank
Inheritance for data: URLs
Inheritance for javascript: and vbscript: URLs
A Note on Restricted Pseudo-URLs
11. Life Outside Same-Origin Rules
  Window and Frame Interactions
  Changing the Location of Existing Documents
  Frame Hijacking Risks
  Frame Descendant Policy and Cross-Domain Communications
  Unsolicited Framing
  Beyond the Threat of a Single Click
Cross-Domain Content Inclusion
A Note on Cross-Origin Subresources
Privacy-Related Side Channels
Other SOP Loopholes and Their Uses
12. Other Security Boundaries
Navigation to Sensitive Schemes
Access to Internal Networks
Prohibited Ports
Limitations on Third-Party Cookies
13. Content Recognition Mechanisms
  Document Type Detection Logic
  Malformed MIME Types
  Special Content-Type Values
  Unrecognized Content Type
  Defensive Uses of Content-Disposition
  Content Directives on Subresources
  Downloaded Files and Other Non-HTTP Content
Character Set Handling
  Byte Order Marks
  Character Set Inheritance and Override
  Markup-Controlled Charset on Subresources
  Detection for Non-HTTP Files
14. Dealing with Rogue Scripts
  Denial-of-Service Attacks
  Execution Time and Memory Use Restrictions
  Connection Limits
  Pop-Up Filtering
  Dialog Use Restrictions
Window-Positioning and Appearance Problems
Timing Attacks on User Interfaces
15. Extrinsic Site Privileges
  Browser- and Plug-in-Managed Site Permissions
  Hardcoded Domains
Form-Based Password Managers
Internet Explorer’s Zone Model
Mark of the Web and Zone.Identifier
III. A Glimpse of Things to Come
16. New and Upcoming Security Features
  Security Model Extension Frameworks
  Cross-Domain Requests
  CORS Request Types
  Security Checks for Simple Requests
  Non-simple Requests and Preflight
  Current Status of CORS
  XDomainRequest
  Other Uses of the Origin Header
Security Model Restriction Frameworks
  Content Security Policy
  Primary CSP Directives
  Policy Violations
  Criticisms of CSP
  Sandboxed Frames
  Scripting, Forms, and Navigation
  Synthetic Origins
  Strict Transport Security
  Private Browsing Modes
Other Developments
  In-Browser HTML Sanitizers
  XSS Filtering
17. Other Browser Mechanisms of Note
URL- and Protocol-Level Proposals
Content-Level Features
I/O Interfaces
18. Common Web Vulnerabilities
Vulnerabilities Specific to Web Applications
Problems to Keep in Mind in Web Application Design
Common Problems Unique to Server-Side Code
A. Epilogue
Notes
  Chapter 1
  Chapter 2
  Chapter 3
  Chapter 4
  Chapter 5
  Chapter 6
  Chapter 7
  Chapter 8
  Chapter 9
  Chapter 10
  Chapter 11
  Chapter 12
  Chapter 13
  Chapter 14
  Chapter 15
  Chapter 16
  Chapter 17
Index
About the Author
UPDATES


Character Set Handling

Document type detection is one of the more important pieces of the content-processing puzzle, but it is certainly not the only one. For all types of text-based files rendered in the browser, one more determination needs to be made: The appropriate character set transformation must be identified and applied to the input stream. The output encoding sought by the browser is typically UTF-8 or UTF-16; the input, on the other hand, is up to the author of the page.
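The transformation described above can be sketched in a few lines of Python. This is a simplified illustration, not browser code: it assumes the input charset has already been determined, and it treats Python's codec names as a stand-in for the labels browsers recognize (browsers follow their own label tables and, for instance, decode content labeled "iso-8859-1" as windows-1252). The function name transcode_to_utf8 is hypothetical.

```python
def transcode_to_utf8(body: bytes, declared_charset: str) -> bytes:
    """Decode an input byte stream using its declared charset, then re-encode it as UTF-8.

    Simplified sketch: Python codec names stand in for browser charset labels,
    and undecodable bytes become U+FFFD instead of aborting the conversion.
    """
    # Step 1: interpret the raw bytes according to the charset the author declared.
    text = body.decode(declared_charset, errors="replace")
    # Step 2: emit the output encoding the browser works with (UTF-8 here).
    return text.encode("utf-8")


# The byte 0xE9 means "é" in ISO-8859-1 but is not valid on its own in UTF-8.
latin1_bytes = "café".encode("iso-8859-1")
print(transcode_to_utf8(latin1_bytes, "iso-8859-1"))  # b'caf\xc3\xa9'
print(transcode_to_utf8(latin1_bytes, "utf-8"))       # b'caf\xef\xbf\xbd'
```

The second call shows why the choice of input charset matters: the same four bytes decode to "café" under one interpretation and to "caf" plus a replacement character under another.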

In the simplest scenario, the appropriate encoding method will be provided by the server in a charset parameter of the Content-Type header. In the case of HTML documents, the same information may also be conveyed to some extent through the <meta> directive. (The browser ...
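One way to picture this selection step is the minimal Python sketch below. It rests on several simplifying assumptions: the charset parameter in Content-Type takes precedence over a <meta> declaration, the <meta> tag is looked for only in the first 1024 bytes with a rough regular expression rather than a real HTML prescan, and windows-1252 is the fallback. It ignores byte order marks, user overrides, and charset inheritance. All function and constant names are hypothetical.

```python
import re
from typing import Optional

# Matches both <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
# A regex is only a rough approximation of how a browser scans markup.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def charset_from_content_type(header: Optional[str]) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    if not header:
        return None
    # The first ';'-separated piece is the MIME type; the rest are parameters.
    for param in header.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            value = value.strip().strip("\"'")
            return value.lower() or None
    return None


def charset_from_meta(body: bytes) -> Optional[str]:
    """Look for a <meta> charset declaration near the start of an HTML document."""
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def pick_charset(content_type: Optional[str], body: bytes,
                 default: str = "windows-1252") -> str:
    """HTTP header first, then <meta>, then a fixed fallback (assumed precedence)."""
    return (charset_from_content_type(content_type)
            or charset_from_meta(body)
            or default)


print(pick_charset("text/html; charset=UTF-8", b'<meta charset="shift_jis">'))  # utf-8
print(pick_charset("text/html", b'<meta charset="Shift_JIS">'))                 # shift_jis
print(pick_charset("text/html", b"<p>no declaration</p>"))                      # windows-1252
```

The `or` chain keeps the precedence order explicit: each source is consulted only when every higher-priority source yields nothing.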
