Chapter 32. Filter Early
Have you ever moved? I mean, packed up all your stuff, put it in a truck, and driven it halfway across the country? Seems like every time I’ve done it, I spend time in my new place throwing stuff away as I unpack it. It’s crazy. If I had just thrown that stuff away before I moved it, I could have bought fewer boxes and saved time packing and unpacking. Maybe I could have rented a smaller truck.
That’s what happened to Nancy. She had been transporting every name in her company’s vendor table across town and throwing away everything that wasn’t Xerox. All this time, the guys in a room a few miles away were scratching their heads, wondering if they needed a bigger truck.
To filter early is to discard unwanted material as early in a process as possible. Whenever I see someone like Jeff make a database query run 100,000 times faster, it’s almost always because he improved the data access algorithm to filter earlier. Here’s an example of how he does it. Here’s before:
# Baseline: algorithm is slow because it filters late
For each row R (1,000,000 rows):
    Calculate some result S = f(R).
    If R matches our predicate (10 rows), then add S to the result set.
And here’s after:
# After improving: algorithm is 100,000× faster because it filters early
For each row R that matches our predicate (10 rows):
    Add S = f(R) to the result set.
The baseline algorithm will execute f a million times. The improved algorithm will execute f ten times. ...
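The two loops can be sketched in Python to make the call counts concrete. This is a minimal illustration under stated assumptions, not the author's code. The function f (doubling the row value) and the predicate `matches` (true for exactly 10 of 1,000,000 integer "rows") are hypothetical stand-ins. A small wrapper counts how often f runs.

This sketch still tests the predicate against every row. A database access path such as an index can avoid visiting the non-matching rows at all, which saves even more work.

```python
from typing import Callable, Iterable, List


class Counted:
    """Wraps a function and counts how many times it is called."""

    def __init__(self, fn: Callable[[int], int]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, row: int) -> int:
        self.calls += 1
        return self.fn(row)


def matches(row: int) -> bool:
    # Hypothetical predicate: true for exactly 10 of the 1,000,000 rows.
    return row % 100_000 == 0


def filter_late(rows: Iterable[int], f: Callable[[int], int]) -> List[int]:
    result = []
    for r in rows:
        s = f(r)              # computed for every row...
        if matches(r):
            result.append(s)  # ...but kept only for matching rows
    return result


def filter_early(rows: Iterable[int], f: Callable[[int], int]) -> List[int]:
    # Discard non-matching rows first, so f runs only on the survivors.
    return [f(r) for r in rows if matches(r)]


rows = range(1_000_000)
f_late = Counted(lambda r: r * 2)   # hypothetical stand-in for f
f_early = Counted(lambda r: r * 2)  # same function, separate call counter

late = filter_late(rows, f_late)
early = filter_early(rows, f_early)
print(late == early, f_late.calls, f_early.calls)  # True 1000000 10
```

Both versions return the same 10 results. The only difference is how much work is wasted on rows that end up thrown away.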