Most data engineers don’t understand why Parquet is faster than CSV. The answer: Parquet reads what you ask for. CSV reads everything, then filters.
Parquet is columnar storage. Instead of storing rows (feature 1: all attributes, feature 2: all attributes), it stores columns (all geometry values, all owner names, all zoning codes). This matters because queries almost never need every column. A vegetation index calculation needs only coordinates and reflectance bands. An ownership lookup needs only ID and owner name.
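With CSV or GeoJSON, reading 1% of the columns still means reading 100% of the file, because every row has to be parsed to find the fields you want. With Parquet, you read only the bytes for the columns you asked for, plus a small footer. Bandwidth drops. Speed increases.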
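To make the difference concrete, here is a toy sketch in plain Python. It is not Parquet's real file format: the table, the column names, and the helper functions are all made up for illustration. It stores the same rows once as CSV and once as one blob per column, then counts how many bytes each layout has to touch to return a single column.

```python
import csv
import io

# Hypothetical parcel table, invented for this illustration.
rows = [
    {"parcel_id": i, "owner": f"owner_{i % 50}", "zoning": "R1" if i % 3 else "C2"}
    for i in range(1000)
]

def row_layout(rows):
    """Serialize row by row, the way a CSV file does."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

def column_layout(rows):
    """Serialize column by column: one independent blob per column."""
    return {
        name: "\n".join(str(r[name]) for r in rows).encode("utf-8")
        for name in rows[0]
    }

def read_column_from_rows(blob, name):
    """Row layout: the whole blob must be parsed to pull out one field."""
    reader = csv.DictReader(io.StringIO(blob.decode("utf-8")))
    return [rec[name] for rec in reader], len(blob)

def read_column_from_columns(blobs, name):
    """Column layout: only the requested column's blob is touched."""
    blob = blobs[name]
    return blob.decode("utf-8").split("\n"), len(blob)

csv_blob = row_layout(rows)
col_blobs = column_layout(rows)

zoning_a, bytes_row = read_column_from_rows(csv_blob, "zoning")
zoning_b, bytes_col = read_column_from_columns(col_blobs, "zoning")

print("bytes touched, row layout:   ", bytes_row)
print("bytes touched, column layout:", bytes_col)
```

Both reads return the same values. The only difference is how much data had to be touched to get them, and that gap grows with every extra column the table carries.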
Compression amplifies this. Values of the same type stored together (millions of integers, millions of dates) compress aggressively. RLE (run-length encoding) collapses repeated values. Dictionary encoding turns repeated strings into integers. Parquet files are often 5-10x smaller than CSV equivalents.
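The two encodings named above are easy to sketch. This is a simplified illustration in plain Python, not Parquet's actual on-disk encoding (which is more involved), and the function names and the sample zoning column are assumptions made up for the example.

```python
from itertools import groupby

def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    dictionary = []   # code -> original value
    index = {}        # original value -> code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    return [(code, sum(1 for _ in run)) for code, run in groupby(codes)]

def decode(dictionary, runs):
    """Reverse both steps to recover the original column."""
    return [dictionary[code] for code, count in runs for _ in range(count)]

# Hypothetical zoning column with long runs of repeated values.
zoning = ["R1"] * 4 + ["C2"] * 3 + ["R1"] * 2

dictionary, codes = dictionary_encode(zoning)
runs = run_length_encode(codes)

print(dictionary)  # ['R1', 'C2']
print(codes)       # [0, 0, 0, 0, 1, 1, 1, 0, 0]
print(runs)        # [(0, 4), (1, 3), (0, 2)]
```

Nine strings shrink to a two-entry dictionary and three pairs. The same trick does little for a row-oriented file, where neighboring values belong to different columns and rarely repeat.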
Predicate pushdown is the second optimization. Parquet file footers can carry min/max statistics for each column in each row group (the chunks a file is split into), and most writers record them. Query “parcels where owner = ‘Smith’”? A Parquet reader skips every row group whose owner min/max range can’t contain ‘Smith’. You avoid reading data you don’t need.
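Here is a minimal sketch of that skipping logic in plain Python. It assumes a made-up list of owner names split into fixed-size groups. The helper names are hypothetical, and a real Parquet reader works from statistics stored in the file footer rather than from in-memory dicts.

```python
def build_row_groups(values, group_size):
    """Split a column into groups and record min/max for each one."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return groups

def scan_equals(groups, target):
    """Return matching values, reading only groups whose range can contain target."""
    matches = []
    groups_read = 0
    for group in groups:
        if target < group["min"] or target > group["max"]:
            continue  # statistics prove the target is absent: skip without reading
        groups_read += 1
        matches.extend(v for v in group["values"] if v == target)
    return matches, groups_read

# Hypothetical owner column, sorted so each group covers a narrow range.
owners = sorted([
    "Adams", "Baker", "Chen", "Diaz", "Evans", "Garcia", "Jones", "Kim",
    "Lopez", "Nguyen", "Patel", "Smith", "Smith", "Taylor", "Walker", "Young",
])

groups = build_row_groups(owners, group_size=4)
matches, groups_read = scan_equals(groups, "Smith")

print(matches)                        # ['Smith', 'Smith']
print(groups_read, "of", len(groups)) # 2 of 4
```

The sort matters. With owners shuffled, nearly every group's range would span the whole alphabet and almost nothing could be skipped, so min/max statistics pay off most when the data is sorted or clustered on the column you filter by.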
The trade-off: Parquet is more expensive to write than CSV (encoding and compression cost CPU). But you write once, read many times. In data lakes, this pays back quickly. GeoParquet extends this to spatial data, storing geometry in a standard binary encoding and letting readers filter spatially at read time when the file carries bounding-box information.
The rule: Use Parquet for analytical workloads, data lakes, and archives. Skip it for transactional systems where you insert and update individual rows constantly. If you query subsets of columns, columnar storage is mandatory, not optional.


