#41: Compression Is a Performance Dial, Not Just Storage Savings

Most teams default to Snappy compression for GeoParquet and then wonder why their egress bills are massive. The right codec depends on how you'll access the data: do you need fast reads, or small files?

Columnar storage makes compression brutally effective. A column of 10 million similarly structured geometries (WKB binary) compresses far better than the same features interleaved with their attributes in JSON. A column of 10 million repetitive attributes (state codes, land use types) compresses even better. Grouping similar data together lets compression algorithms find patterns.
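A quick experiment makes the column-versus-row effect visible. This is only a sketch: it uses Python's built-in `zlib` as a stand-in codec (Snappy and ZSTD are not in the standard library), JSON text rather than real Parquet pages, and synthetic records whose field names are made up for illustration.

```python
import json
import random
import zlib

random.seed(0)  # fixed seed so the comparison is repeatable

STATES = ["CA", "TX", "NY", "WA"]
LAND_USE = ["residential", "commercial", "industrial", "park"]

# Synthetic records; the field names are hypothetical.
rows = [
    {
        "state": random.choice(STATES),
        "land_use": random.choice(LAND_USE),
        "x": round(random.uniform(-125.0, -65.0), 6),
    }
    for _ in range(20_000)
]

# Row layout: every record carries all of its fields, one JSON object per line.
row_bytes = "\n".join(json.dumps(r) for r in rows).encode("utf-8")

# Column layout: each field's values sit next to each other.
columns = {key: [r[key] for r in rows] for key in ("state", "land_use", "x")}
col_bytes = json.dumps(columns).encode("utf-8")

# zlib level 6 is a stand-in for a real Parquet codec.
row_size = len(zlib.compress(row_bytes, 6))
col_size = len(zlib.compress(col_bytes, 6))

print(f"row layout:    {len(row_bytes):>9,} raw -> {row_size:>9,} compressed")
print(f"column layout: {len(col_bytes):>9,} raw -> {col_size:>9,} compressed")
```

The exact numbers depend on the data, but the column layout should come out smaller because the repetitive attributes are no longer broken up by unrelated values.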

Snappy is the default codec for a reason: fast decompression and decent compression (4-6x is typical, though your ratio will vary with the data). Use it for hot data: dashboards, frequent queries, interactive analysis. You trade extra storage for lower latency. At 5x, a 10GB dataset compresses to 2GB and still reads quickly.

ZSTD is the modern standard for archival. Compression levels are tunable from 1 to 22, and ratios of 10-20x are achievable on geometry-heavy data. Levels 3-6 are the practical range: high compression with acceptable decompression speed. Geometry columns (WKB) benefit specifically, because repeated binary patterns compress aggressively under ZSTD. At 20x, a 100GB archive shrinks to 5GB.
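The size figures quoted above are simple division, and the same arithmetic shows where egress bills come from. A minimal sketch, assuming a flat per-GB egress price and a fixed number of full reads per month; both numbers in the example are hypothetical placeholders, not real cloud pricing.

```python
def compressed_gb(raw_gb: float, ratio: float) -> float:
    """Size on disk after compression, given a ratio such as 5 for 5x."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_gb / ratio


def monthly_egress_cost(raw_gb: float, ratio: float,
                        full_reads_per_month: int,
                        price_per_gb: float) -> float:
    """Cost of pulling the whole compressed dataset out N times a month."""
    return compressed_gb(raw_gb, ratio) * full_reads_per_month * price_per_gb


# The two examples from the text.
print(compressed_gb(10, 5))    # 10GB at 5x  -> 2.0
print(compressed_gb(100, 20))  # 100GB at 20x -> 5.0

# Hypothetical workload: 30 full reads a month at a made-up $0.09 per GB.
snappy_bill = monthly_egress_cost(100, 5, 30, 0.09)
zstd_bill = monthly_egress_cost(100, 20, 30, 0.09)
print(f"5x codec:  ${snappy_bill:.2f} per month")
print(f"20x codec: ${zstd_bill:.2f} per month")
```

Plug in your own ratios and prices; the point is that the ratio divides straight into every byte you move.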

LZ4 is for extreme latency sensitivity: interactive dashboards where every millisecond matters. The compression ratio suffers, but decompression is about as fast as it gets.
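One way to keep these three profiles straight is to encode them once and reuse them. The sketch below assumes you write with PyArrow, whose `pyarrow.parquet.write_table` accepts `compression` and `compression_level` keyword arguments. The profile names and the choice of ZSTD level 6 are my own illustrative picks, not anything the format mandates.

```python
# Hypothetical profile names mapped to writer keyword arguments.
# The keys "compression" and "compression_level" match the keyword
# arguments of pyarrow.parquet.write_table.
CODEC_PROFILES = {
    "hot": {"compression": "snappy"},                             # dashboards, frequent queries
    "archive": {"compression": "zstd", "compression_level": 6},   # cold storage
    "latency_critical": {"compression": "lz4"},                   # every millisecond matters
}


def parquet_write_options(profile: str) -> dict:
    """Return a copy of the writer options for a named access profile."""
    try:
        return dict(CODEC_PROFILES[profile])
    except KeyError:
        valid = ", ".join(sorted(CODEC_PROFILES))
        raise ValueError(f"unknown profile {profile!r}; expected one of: {valid}")


for name in CODEC_PROFILES:
    print(name, parquet_write_options(name))
```

With PyArrow installed, the options could be passed along as `pq.write_table(table, path, **parquet_write_options("archive"))`, where `table` and `path` are whatever you are writing.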

Dictionary encoding handles attributes automatically. Parquet replaces repetitive values (state names, zoning codes) with small integer indexes into a lookup table, so you get massive compression before the codec even runs.
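To see what dictionary encoding does, here is the idea in plain Python. It is a conceptual sketch only: real Parquet writers do this per column chunk and pack the indexes far more tightly, and the function names are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with an integer index into a list of unique values."""
    dictionary = []   # unique values in first-seen order
    index_of = {}     # value -> position in `dictionary`
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and the indexes."""
    return [dictionary[i] for i in indices]


states = ["Texas", "Ohio", "Texas", "Texas", "Ohio", "Utah"]
dictionary, indices = dictionary_encode(states)

print(dictionary)  # ['Texas', 'Ohio', 'Utah']
print(indices)     # [0, 1, 0, 0, 1, 2]
```

Six strings become three dictionary entries plus six tiny integers, and the savings grow with every repeat.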

The rule: default to Snappy for active data lakes. Switch to ZSTD for archives and cold storage. Use LZ4 only if latency outweighs cost. Pair your codec choice with spatial sorting, because unsorted data undercuts any codec.
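The sorting point is easy to test for yourself. This sketch again uses `zlib` as a stand-in codec and a coarse one-degree grid as a stand-in for a real spatial ordering such as a Hilbert curve. The points and the per-cell `zone_...` attribute are synthetic and hypothetical.

```python
import random
import zlib

random.seed(42)  # fixed seed so the comparison is repeatable


def grid_cell(lon, lat, size_deg=1.0):
    """Coarse grid cell; a simple stand-in for a Hilbert or geohash key."""
    return (int(lon // size_deg), int(lat // size_deg))


def zone_label(lon, lat):
    """Synthetic attribute that nearby features share."""
    col, row = grid_cell(lon, lat)
    return f"zone_{col}_{row}"


def compressed_column_size(pts):
    """Serialize the attribute column in the given order and compress it."""
    column = "\n".join(zone_label(lon, lat) for lon, lat in pts)
    return len(zlib.compress(column.encode("utf-8"), 6))


# Random points arrive in no particular spatial order.
points = [
    (random.uniform(-125.0, -65.0), random.uniform(25.0, 50.0))
    for _ in range(50_000)
]

unsorted_size = compressed_column_size(points)
sorted_size = compressed_column_size(sorted(points, key=lambda p: grid_cell(*p)))

print(f"unsorted: {unsorted_size:,} bytes")
print(f"sorted:   {sorted_size:,} bytes")
```

Same rows, same codec, different order. Sorting puts neighbors next to each other, and neighbors tend to share values.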
