39: Spatial Predicate Pushdown: Why GeoParquet Reads Less


Most people think cloud-native GIS is fast because of better hardware. It’s actually fast because it reads less data, and spatial predicate pushdown is how.

Predicate pushdown means filtering happens at the storage layer, before data is loaded into memory. A traditional workflow reads the entire file, then filters (“which features intersect this bounding box?”). Cloud-native engines push that filter down: before reading any data, the query engine checks the GeoParquet file’s metadata for row groups whose bounding boxes overlap the query geometry, and skips the rest entirely.
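A toy model makes the difference concrete. This is a minimal simulation, not any engine’s real code: a “file” is a list of hypothetical `RowGroup` objects, each carrying a bounding box (standing in for footer metadata) plus its points. Both paths return the same matches; the pushdown path simply loads fewer rows.

```python
from dataclasses import dataclass

# A bounding box as (xmin, ymin, xmax, ymax).
BBox = tuple[float, float, float, float]

def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap (shared edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class RowGroup:
    bbox: BBox                          # stand-in for footer metadata
    points: list[tuple[float, float]]   # the data itself, costly to fetch

def point_in(p, query: BBox) -> bool:
    """Treat a point as a zero-size box and test it against the query."""
    return intersects((p[0], p[1], p[0], p[1]), query)

def read_then_filter(groups, query):
    """Traditional path: load every row, then test each one."""
    loaded = [p for g in groups for p in g.points]
    return [p for p in loaded if point_in(p, query)], len(loaded)

def pushdown(groups, query):
    """Pushdown path: check each group's bbox first, load only overlapping groups."""
    loaded = [p for g in groups if intersects(g.bbox, query) for p in g.points]
    return [p for p in loaded if point_in(p, query)], len(loaded)

groups = [
    RowGroup((0, 0, 10, 10), [(1, 1), (5, 5), (9, 9)]),
    RowGroup((20, 20, 30, 30), [(21, 21), (25, 25), (29, 29)]),
    RowGroup((40, 40, 50, 50), [(41, 41), (45, 45), (49, 49)]),
]
query = (4, 4, 12, 12)

print(read_then_filter(groups, query))  # ([(5, 5), (9, 9)], 9)
print(pushdown(groups, query))          # ([(5, 5), (9, 9)], 3)
```

Same answer, a third of the rows loaded. Real engines do the metadata check against the file footer instead of Python objects, but the shape of the decision is the same.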

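GeoParquet stores bounding box metadata in the file footer. Every row group (a chunk of rows, often around 128MB, though the size depends on the writer) has min/max bounds recorded. In GeoParquet 1.1 these come from an optional per-row bounding-box column, whose min/max statistics Parquet keeps for each row group. Query engine: “I need features in this region.” GeoParquet: “Row groups 3 and 7 overlap. Row groups 1, 2, and 4-6 don’t.” The engine reads only 3 and 7. The other five row groups (about 70% of the file, if row groups are similar in size) are never fetched from disk or over the network.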
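The row-group check can be sketched the same way. This assumes a GeoParquet 1.1-style bounding-box column with `xmin`, `ymin`, `xmax`, `ymax` fields and per-row-group min/max statistics; the `BBoxColumnStats` class and the seven tile-shaped row groups are hypothetical values chosen to reproduce the “3 and 7” scenario. A row group is skipped only when its statistics prove no feature in it can intersect the query.

```python
from dataclasses import dataclass

@dataclass
class BBoxColumnStats:
    """Hypothetical min/max statistics of a bbox column for one row group."""
    xmin_min: float  # smallest xmin of any feature in the row group
    ymin_min: float  # smallest ymin
    xmax_max: float  # largest xmax
    ymax_max: float  # largest ymax

def may_overlap(stats: BBoxColumnStats, query) -> bool:
    qxmin, qymin, qxmax, qymax = query
    # Skip only when the statistics prove no feature can reach the query box.
    return not (stats.xmin_min > qxmax or stats.xmax_max < qxmin
                or stats.ymin_min > qymax or stats.ymax_max < qymin)

def plan_read(row_group_stats, query) -> list[int]:
    """Return the 1-based numbers of the row groups the engine must read."""
    return [i for i, s in enumerate(row_group_stats, start=1) if may_overlap(s, query)]

# Seven illustrative row groups, each covering a 10x10 tile.
stats = [
    BBoxColumnStats(0, 0, 10, 10),    # 1
    BBoxColumnStats(10, 0, 20, 10),   # 2
    BBoxColumnStats(0, 10, 10, 20),   # 3
    BBoxColumnStats(10, 10, 20, 20),  # 4
    BBoxColumnStats(20, 0, 30, 10),   # 5
    BBoxColumnStats(30, 0, 40, 10),   # 6
    BBoxColumnStats(0, 20, 10, 30),   # 7
]
query = (2, 15, 8, 25)  # xmin, ymin, xmax, ymax

to_read = plan_read(stats, query)
skipped = 1 - len(to_read) / len(stats)
print(to_read)           # [3, 7]
print(f"{skipped:.0%}")  # 71%
```

The test is conservative: a row group that passes may still hold no matching features, so the engine filters individual rows after reading the groups it kept.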

This works only if the data is spatially sorted or clustered. In a randomly shuffled GeoParquet file, every row group’s bounding box covers nearly the entire extent, so pushdown becomes useless. Spatial clustering concentrates nearby features in the same row groups, which keeps each row group’s bounding box tight and makes pushdown effective.
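A quick experiment shows why layout matters. This sketch assumes uniformly distributed points and row groups of 100 rows (both arbitrary choices for illustration) and counts how many row-group bounding boxes overlap a query covering 1% of the extent: once in shuffled order, and once after the crudest possible spatial sort, by x coordinate.

```python
import random

def bbox(points):
    """Bounding box (xmin, ymin, xmax, ymax) of a list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def row_groups_touched(points, rows_per_group, query):
    """Split points into consecutive row groups; return (groups to read, total groups)."""
    groups = [points[i:i + rows_per_group] for i in range(0, len(points), rows_per_group)]
    return sum(intersects(bbox(g), query) for g in groups), len(groups)

rng = random.Random(42)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(10_000)]
query = (10, 10, 20, 20)  # covers 1% of the 100x100 extent

shuffled = points[:]
rng.shuffle(shuffled)
by_x = sorted(points)  # crudest spatial sort: order by x (ties broken by y)

print("shuffled:", row_groups_touched(shuffled, 100, query))
print("sorted by x:", row_groups_touched(by_x, 100, query))
```

Expect nearly all 100 row groups to overlap in the shuffled file, versus roughly a dozen after sorting. Sorting by x alone still produces tall, thin row groups spanning the full y range; an ordering that interleaves both axes keeps them more compact.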

Engines like DuckDB, Apache Sedona, and GeoPandas (via PyArrow) can use this metadata to skip row groups, though some only do so when the query filters on the bounding box explicitly. Query a 100GB GeoParquet file for a small study area? It might read 2GB. Without pushdown, you’d read all 100GB: transfer costs balloon, and memory pressure cripples the query.

The rule: Sort spatial data by geography before saving to Parquet. Z-order curves or spatial clustering matter more than file compression. Pushdown doesn’t fix bad data layout. Query engines can’t optimize what they can’t skip.
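One common way to sort by geography is a Z-order (Morton) key: quantize x and y onto a grid, interleave their bits, and sort by the result, so points that are close in 2D tend to be close in the file. This is a minimal sketch assuming 16 bits per axis and a known extent; the function names are hypothetical, and production writers may use Hilbert curves or other clustering schemes instead.

```python
def spread_bits(v: int) -> int:
    """Spread the low 16 bits of v apart, leaving a zero bit between each."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton_code(ix: int, iy: int) -> int:
    """Z-order code: x bits in even positions, y bits in odd positions."""
    return spread_bits(ix) | (spread_bits(iy) << 1)

def z_order_key(x: float, y: float, extent) -> int:
    """Quantize (x, y) onto a 65,536 x 65,536 grid over extent; return its Morton code."""
    xmin, ymin, xmax, ymax = extent
    cells = (1 << 16) - 1
    ix = int((x - xmin) / (xmax - xmin) * cells)
    iy = int((y - ymin) / (ymax - ymin) * cells)
    # Clamp so points on or slightly outside the edge stay on the grid.
    ix = min(max(ix, 0), cells)
    iy = min(max(iy, 0), cells)
    return morton_code(ix, iy)

extent = (0.0, 0.0, 100.0, 100.0)
points = [(90.0, 90.0), (10.0, 10.0), (90.0, 10.0), (10.0, 90.0)]
ordered = sorted(points, key=lambda p: z_order_key(p[0], p[1], extent))
print(ordered)  # [(10.0, 10.0), (90.0, 10.0), (10.0, 90.0), (90.0, 90.0)]
```

The four points come out in the characteristic “Z” pattern: lower-left, lower-right, upper-left, upper-right. Sorting rows by this key before writing the Parquet file tends to give each row group a compact patch of space.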

