#75: Loading a Terabyte Into Memory Just to Filter 99% of It Is an Architectural Failure


Predicate pushdown is the difference between a sub-second query and an out-of-memory crash. It means filtering spatial data at the storage layer—before it ever reaches your compute engine—using bounding box metadata to skip irrelevant chunks entirely.

Filtering at disk beats filtering in memory. Traditional workflows load a dataset, then filter: “Give me all features in this region.” The entire dataset—geometry, attributes, everything—crosses the network into RAM. Then the query engine applies the spatial filter and discards 99%. You paid to move and store terabytes of garbage.

Predicate pushdown inverts this. The query engine first reads the file metadata (in Parquet, the min/max statistics recorded for each row group) and asks: “Which chunks could contain geometries in my target region?” Only those chunks are read and transferred. Irrelevant data never leaves disk.

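Bounding boxes are the mechanism. A GeoParquet file can store each feature’s bounding box in a covering column (added in GeoParquet 1.1), and Parquet keeps min/max statistics for that column in every chunk (row group). A query for “features in Seattle” compares its own bounding box against those statistics and skips every chunk that doesn’t overlap Seattle. When the data is well organized, that one metadata check rules out the 99% you would otherwise load and discard.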
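The skip decision itself is just a rectangle-overlap test against per-chunk metadata. Here is a minimal sketch in plain Python. It is not how any particular engine implements pruning, and the chunk names, bounding boxes, row counts, and Seattle coordinates are made-up, approximate values for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (xmin, ymin, xmax, ymax) in longitude/latitude degrees
BBox = Tuple[float, float, float, float]


@dataclass
class Chunk:
    """Stand-in for one row group's metadata (hypothetical values)."""
    name: str
    bbox: BBox
    num_rows: int


def overlaps(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prune(chunks: List[Chunk], query: BBox) -> List[Chunk]:
    """Keep only chunks whose bounding box could contain matching rows."""
    return [c for c in chunks if overlaps(c.bbox, query)]


# Hypothetical file layout: four chunks, each covering a different region.
chunks = [
    Chunk("rg0", (-124.8, 45.5, -120.0, 49.0), 1_000_000),  # Pacific Northwest
    Chunk("rg1", (-80.0, 25.0, -70.0, 45.0), 1_000_000),    # US East Coast
    Chunk("rg2", (0.0, 40.0, 20.0, 55.0), 1_000_000),       # Central Europe
    Chunk("rg3", (100.0, -10.0, 150.0, 40.0), 1_000_000),   # East Asia
]

seattle = (-122.46, 47.48, -122.22, 47.74)  # approximate, for illustration

to_read = prune(chunks, seattle)
rows_total = sum(c.num_rows for c in chunks)
rows_read = sum(c.num_rows for c in to_read)
print([c.name for c in to_read], f"{rows_read}/{rows_total} rows read")
```

Only the metadata is touched to make the decision. The rows inside the skipped chunks are never read, which is where the savings come from.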

Modern query engines (Spark, Trino, DuckDB) push predicates down automatically, but only if your data is structured to support it. Read the execution plan (each of these engines exposes it through an EXPLAIN statement). If you see a spatial filter happening after a full table scan, pushdown failed. You’re paying full network and memory costs.

Poorly formatted files bypass pushdown entirely. When geometries are written in arbitrary order, every chunk holds features from all over the map, so its bounding box spans nearly the entire dataset and overlaps every query. Nothing can be skipped.
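A small simulation makes the effect of ordering concrete. This is a sketch under stated assumptions: the points are random, the chunk size is arbitrary, and the coarse grid sort is a simple stand-in for whatever spatial ordering you actually use (a Hilbert or Z-order curve, for example). The query box is the same approximate Seattle extent, chosen for illustration.

```python
import random


def chunk_bboxes(points, chunk_size):
    """Split points into consecutive chunks and return each chunk's bbox."""
    boxes = []
    for i in range(0, len(points), chunk_size):
        chunk = points[i:i + chunk_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def overlaps(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def chunks_to_read(boxes, query):
    """Count the chunks that bbox-based pruning cannot skip."""
    return sum(1 for b in boxes if overlaps(b, query))


def grid_key(point, cell=10.0):
    """Coarse grid cell of a point; a simplified spatial sort key."""
    return (int((point[0] + 180) // cell), int((point[1] + 90) // cell))


random.seed(42)  # fixed seed so the run is repeatable
points = [(random.uniform(-180, 180), random.uniform(-90, 90))
          for _ in range(100_000)]
query = (-122.46, 47.48, -122.22, 47.74)  # approximate Seattle extent

# Arbitrary write order: each chunk samples the whole world.
unsorted_boxes = chunk_bboxes(points, 1_000)

# Spatially sorted write order: each chunk covers a compact region.
sorted_boxes = chunk_bboxes(sorted(points, key=grid_key), 1_000)

unsorted_hits = chunks_to_read(unsorted_boxes, query)
sorted_hits = chunks_to_read(sorted_boxes, query)
print(f"unsorted: read {unsorted_hits} of {len(unsorted_boxes)} chunks")
print(f"sorted:   read {sorted_hits} of {len(sorted_boxes)} chunks")
```

The data and the query are identical in both runs. Only the write order changes, and that alone decides whether the metadata can rule anything out.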

The rule: Filter at the disk; compute in memory. If your spatial filter isn’t pushed down, you’re paying to move data across the network that you’ll immediately discard. A spatial query without predicate pushdown is just an expensive full table scan.
