The Art of SQL

by Stephane Faroult, Peter Robson

March 2006

Intermediate to advanced

367 pages

11h 20m

English

O'Reilly Media, Inc.


Dedication
Preface
Why Another SQL Book?
Audience
Assumptions This Book Makes
Contents of This Book
Conventions Used in This Book
Using Code Examples
Comments and Questions
Safari® Enabled
Acknowledgments
1. Laying Plans
1.1. The Relational View of Data
1.2. The Importance of Being Normal
1.2.1. Step 1: Ensure Atomicity
1.2.2. Step 2: Check Dependence on the Whole Key
1.2.3. Step 3: Check Attribute Independence
1.3. To Be or Not to Be, or to Be Null
1.4. Qualifying Boolean Columns
1.5. Understanding Subtypes
1.6. Stating the Obvious
1.7. The Dangers of Excess Flexibility
1.8. The Difficulties of Historical Data
1.9. Design and Performance
1.10. Processing Flow
1.11. Centralizing Your Data
1.12. System Complexity
1.13. The Completed Plans
2. Waging War
2.1. Query Identification
2.2. Stable Database Connections
2.3. Strategy Before Tactics
2.4. Problem Definition Before Solution
2.5. Stable Database Schema
2.6. Operations Against Actual Data
2.7. Set Processing in SQL
2.8. Action-Packed SQL Statements
2.9. Profitable Database Accesses
2.10. Closeness to the DBMS Kernel
2.11. Doing Only What Is Required
2.12. SQL Statements Mirror Business Logic
2.13. Program Logic into Queries
2.14. Multiple Updates at Once
2.15. Careful Use of User-Written Functions
2.16. Succinct SQL
2.17. Offensive Coding with SQL
2.18. Discerning Use of Exceptions
3. Tactical Dispositions
3.1. The Identification of “Entry Points”
3.2. Indexes and Content Lists
3.3. Making Indexes Work
3.4. Indexes with Functions and Conversions
3.5. Indexes and Foreign Keys
3.6. Multiple Indexing of the Same Columns
3.7. System-Generated Keys
3.8. Variability of Index Accesses
4. Maneuvering
4.1. The Nature of SQL
4.1.1. SQL and Databases
4.1.2. SQL and the Optimizer
4.1.3. Limits of the Optimizer
4.2. Five Factors Governing the Art of SQL
4.2.1. Total Quantity of Data
4.2.2. Criteria Defining the Result Set
4.2.3. Size of the Result Set
4.2.4. Number of Tables
4.2.4.1. Joins
4.2.4.2. Complex queries and complex views
4.2.5. Number of Other Users
4.3. Filtering
4.3.1. Meaning of Filtering Conditions
4.3.2. Evaluation of Filtering Conditions
4.3.2.1. Buyers of Batmobiles
4.3.2.2. More Batmobile purchases
4.3.2.3. Lessons to be learned from the Batmobile trade
4.3.3. Querying Large Quantities of Data
4.3.4. The Proportions of Retrieved Data
5. Terrain
5.1. Structural Types
5.2. The Conflicting Goals
5.3. Considering Indexes as Data Repositories
5.4. Forcing Row Ordering
5.5. Automatically Grouping Data
5.5.1. Round-Robin Partitioning
5.5.2. Data-Driven Partitioning
5.6. The Double-Edged Sword of Partitioning
5.7. Partitioning and Data Distribution
5.8. The Best Way to Partition Data
5.9. Pre-Joining Tables
5.10. Holy Simplicity
6. The Nine Situations
6.1. Small Result Set, Direct Specific Criteria
6.1.1. Index Usability
6.1.2. Query Efficiency and Index Usage
6.1.3. Data Dispersion
6.1.4. Criterion Indexability
6.2. Small Result Set, Indirect Criteria
6.3. Small Intersection of Broad Criteria
6.4. Small Intersection, Indirect Broad Criteria
6.5. Large Result Set
6.6. Self-Joins on One Table
6.7. Result Set Obtained by Aggregation
6.8. Simple or Range Searching on Dates
6.8.1. Many Items, Few Historical Values
6.8.1.1. Using subqueries
6.8.1.2. Using OLAP functions
6.8.2. Many Historical Values Per Item
6.8.3. Current Values
6.9. Result Set Predicated on Absence of Data
7. Variations in Tactics
7.1. Tree Structures
7.1.1. Tree Structures Versus Master/Detail Relationships
7.1.2. Practical Examples of Hierarchies
7.2. Representing Trees in an SQL Database
7.3. Practical Implementation of Trees
7.3.1. Adjacency Model
7.3.2. Materialized Path Model
7.3.3. Nested Sets Model (After Celko)
7.4. Walking a Tree with SQL
7.4.1. Top-Down Walk: The Vandamme Query
7.4.1.1. Adjacency model
7.4.1.2. Materialized path model
7.4.1.3. Nested sets model
7.4.1.4. Comparing the Vandamme query under the various models
7.4.2. Bottom-Up Walk: The Highlanders Query
7.4.2.1. Adjacency model
7.4.2.2. Materialized path model
7.4.2.3. Nested sets model
7.4.2.4. Comparing the various models for the Highlanders query
7.5. Aggregating Values from Trees
7.5.1. Aggregation of Values Stored in Leaf Nodes
7.5.1.1. Modeling head counts
7.5.1.2. Computing head counts at every level
7.5.2. Propagation of Percentages Across Different Levels
8. Weaknesses and Strengths
8.1. Deceiving Criteria
8.2. Abstract Layers
8.3. Distributed Systems
8.4. Dynamically Defined Search Criteria
8.4.1. Designing a Simple Movie Database and the Main Query
8.4.2. Right-Sizing Queries
8.4.3. Wrapping SQL in PHP

9. Multiple Fronts
9.1. The Database Engine as a Service Provider
9.1.1. The Virtues of Indexes
9.1.2. A Just-So Story
9.1.3. Get in Line
9.2. Concurrent Data Changes
9.2.1. Locking
9.2.1.1. Locking granularity
9.2.1.2. Lock handling
9.2.1.3. Locking and committing
9.2.1.4. Locking and scalability
9.2.2. Contention
9.2.2.1. Insertion and contention
9.2.2.2. DBA solutions
9.2.2.3. Architectural solutions
9.2.2.4. Development solutions
9.2.2.5. Results
10. Assembly of Forces
10.1. Increasing Volumes
10.1.1. Sensitivity of Operations to Volume Increases
10.1.1.1. Insensitivity to volume increase
10.1.1.2. Linear sensitivity to volume increases
10.1.1.3. Non-linear sensitivity to volume increases
10.1.1.4. Putting it all together
10.1.1.5. Disentangling subqueries
10.1.2. Partitioning to the Rescue
10.1.3. Data Purges
10.2. Data Warehousing
10.2.1. Facts and Dimensions: the Star Schema
10.2.2. Query Tools
10.2.3. Extraction, Transformation, and Loading
10.2.3.1. Data extraction
10.2.3.2. Transformation
10.2.3.3. Loading
10.2.3.4. Integrity constraints and indexes
10.2.4. Querying Dimensions and Facts: Ad Hoc Reports
10.2.4.1. The star transformation
10.2.4.2. Emulating the star transformation
10.2.4.3. Querying a star schema the way it is not intended to be queried
10.2.5. A (Strong) Word of Caution
11. Stratagems
11.1. Turning Data Around
11.1.1. Rows That Should Have Been Columns
11.1.2. Columns That Should Have Been Rows
11.1.2.1. Creating a pivot table
11.1.2.2. Multiplying rows with a pivot table
11.1.2.3. Using pivot table values
11.1.2.4. The pivot and unpivot operators
11.1.3. Single Columns That Should Have Been Something Else
11.1.3.1. First normal form on the fly
11.1.3.2. Lifting the veil on the Chapter 7 mystery path explosion
11.2. Querying with a Variable in List
11.3. Aggregating by Range (Bands)
11.4. Superseding a General Case
11.5. Selecting Rows That Match Several Items in a List
11.6. Finding the Best Match
11.7. Optimizer Directives
12. Employment of Spies
12.1. The Database Is Slow
12.2. The Components of Server Load
12.3. Defining Good Performance
12.3.1. Knowing What You Spend
12.3.2. Knowing What You Get
12.3.3. Checking Against Acknowledged Standards
12.3.4. Defining Performance Goals
12.4. Thinking in Business Tasks
12.5. Execution Plans
12.5.1. Identifying the Fastest Execution Plan
12.5.1.1. Our contestants
12.5.1.2. Our battle field
12.5.1.3. And the winner is...
12.5.2. Forcing the Right Execution Plan
12.5.2.1. A stubborn query
12.5.2.2. Study of search criteria
12.5.2.3. A moral to the story
12.6. Using Execution Plans Properly
12.6.1. How Not to Execute a Query
12.6.2. Hidden Complexity
12.7. What Really Matters?
PHOTO CREDITS
About the Authors
About the Author
Copyright

Content preview from The Art of SQL

Preface

There used to be a time when what is known today as “Information Technology” or IT was less glamorously known as “Electronic Data Processing.” And the truth is that for all the buzz about trendy techniques, the processing of data is still at the core of our systems—and all the more as the volume of data under management seems to be increasing even faster than the speed of processors. The most vital corporate data is today stored in databases and accessed through the imperfect, but widely known, SQL language—a combination that had begun to gain acceptance in the pinstriped circles at the beginning of the 1980s and has since wiped out the competition.

You can hardly interview a young developer today who doesn’t claim a good working knowledge of SQL, the lingua franca of database access, a standard part of any basic IT course. This claim is usually reasonably true, if you define knowledge as the ability to obtain, after some effort, functionally correct results. However, enterprises all over the world are today confronted with exploding volumes of data. As a result, “functionally correct” results are no longer enough: they also have to be fast. Database performance has become a major headache in many companies. Interestingly, although everyone agrees that the source of performance issues lies in the code, it seems accepted everywhere that the first concern of developers should be to provide code that works—which seems to be a reasonable expectation. The thought seems to be that ...


Publisher Resources

ISBN: 0596008945
