
#72: Stop Parsing Text. Start Reading Typed Geometries.

March 27, 2026


CSV files are a trap for spatial workflows. No CRS. No schema enforcement. No spatial indexes. Every read forces expensive text parsing that fails silently.

A CSV has no concept of a Coordinate Reference System. You export latitude and longitude as columns, losing the CRS entirely. The downstream pipeline guesses: “probably EPSG:4326?” If it guesses wrong, every distance calculation is wrong. Every buffer is wrong. The errors compound invisibly.
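To make the cost of a wrong guess concrete, here is a minimal sketch using only the Python standard library. The two points and the function names (`haversine_m`, `planar_distance`) are illustrative assumptions, not part of any real pipeline. It compares a great-circle distance, which is valid when the values really are lon/lat degrees, with what you get if the same numbers are treated as already-projected meters.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters, assuming lon/lat in degrees (e.g. EPSG:4326)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def planar_distance(x1, y1, x2, y2):
    """Euclidean distance in whatever units the coordinates happen to be in."""
    return math.hypot(x2 - x1, y2 - y1)


# Hypothetical points, roughly San Francisco and Oakland, as (lon, lat)
sf = (-122.4194, 37.7749)
oak = (-122.2712, 37.8044)

correct = haversine_m(*sf, *oak)      # interprets the numbers as degrees
wrong = planar_distance(*sf, *oak)    # interprets the same numbers as meters

print(f"as lon/lat degrees: {correct:,.0f} m")
print(f"as projected meters: {wrong:.4f} m")
```

Same numbers, two interpretations, and the results differ by several orders of magnitude. Nothing raises an error in either case, which is why the mistake goes unnoticed.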

Text parsing is brutally expensive. A CSV coordinate is a string: "-122.456789". Reading it means converting text to a float and then building a geometry from the parsed values, or parsing Well-Known Text (WKT) if the geometry was exported that way. A binary format like GeoParquet stores coordinates as IEEE 754 doubles that can be copied into memory with no text conversion. CSV parsing burns CPU cycles on every read. At scale, this cost explodes.
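A rough sketch of the difference, standard library only. It assumes the binary side is Well-Known Binary (WKB), which is the default geometry encoding in GeoParquet; the helper names are hypothetical and handle 2D points only. The text path slices a string and converts two substrings to floats. The binary path reads two 8-byte doubles at fixed offsets.

```python
import struct


def point_from_wkt(wkt):
    """Parse a 2D 'POINT (x y)' string: slicing plus two text-to-float conversions."""
    body = wkt.strip()
    if not body.upper().startswith("POINT"):
        raise ValueError(f"not a WKT point: {wkt!r}")
    inner = body[body.index("(") + 1 : body.rindex(")")]
    x_text, y_text = inner.split()
    return float(x_text), float(y_text)


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB: byte order flag, geometry type 1, x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def point_from_wkb(wkb):
    """Decode a 2D WKB point: no text parsing, just fixed-offset reads of doubles."""
    fmt = "<Idd" if wkb[0] == 1 else ">Idd"  # first byte: 1 = little-endian, 0 = big-endian
    geom_type, x, y = struct.unpack_from(fmt, wkb, 1)
    if geom_type != 1:
        raise ValueError(f"not a WKB point (geometry type {geom_type})")
    return x, y


wkt = "POINT (-122.456789 37.123456)"
wkb = point_to_wkb(-122.456789, 37.123456)

print(point_from_wkt(wkt))
print(point_from_wkb(wkb), f"({len(wkb)} bytes)")
```

Pure Python hides most of the performance gap, so treat this as an illustration of the work involved rather than a benchmark. Columnar readers can go further and load whole buffers of doubles without touching individual values.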

Schema enforcement is missing. A CSV has no metadata about what columns mean. Is the longitude column really longitude, or were the coordinates written in the opposite order? Are missing values NULL or just blank? Downstream systems guess. Silent truncation happens. Features vanish.
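If you are stuck with CSV, the checks a typed format would give you for free have to be written by hand. The sketch below is one way to do that with the standard library; the `SCHEMA` layout, column names and sample rows are all hypothetical. It distinguishes blank from present values and flags coordinates that fall outside valid lon/lat ranges.

```python
import csv
import io

# Hypothetical schema: column name -> (type, nullable)
SCHEMA = {
    "id": (int, False),
    "lon": (float, False),
    "lat": (float, False),
    "name": (str, True),
}

SAMPLE = """id,lon,lat,name
1,-122.4194,37.7749,San Francisco
2,37.8044,-122.2712,Oakland
3,,37.3382,
"""


def validate_rows(text):
    """Yield (row_number, typed_row_or_None, error_or_None) for each CSV data row."""
    reader = csv.DictReader(io.StringIO(text))
    for n, raw in enumerate(reader, start=1):
        try:
            row = {}
            for col, (typ, nullable) in SCHEMA.items():
                value = raw.get(col)
                if value is None or value.strip() == "":
                    if not nullable:
                        raise ValueError(f"{col} is blank")
                    row[col] = None  # explicit NULL instead of an empty string
                else:
                    row[col] = typ(value)
            if not -180 <= row["lon"] <= 180:
                raise ValueError(f"lon {row['lon']} out of range")
            if not -90 <= row["lat"] <= 90:
                raise ValueError(f"lat {row['lat']} out of range (lon/lat swapped?)")
            yield n, row, None
        except ValueError as exc:
            yield n, None, str(exc)


results = list(validate_rows(SAMPLE))
for n, row, error in results:
    print(n, row if error is None else f"REJECTED: {error}")
```

Note the limit of the range check: it only catches swapped coordinates when the swapped latitude lands outside -90 to 90. A swap near the equator or the prime meridian passes, which is exactly the kind of ambiguity a declared schema removes.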

Spatial queries require full scans. A CSV has no bounding box metadata, no spatial index. “Find features in this region” reads every row. No predicate pushdown. No optimization. A modern format stores min/max bounds so the query engine can skip every chunk that falls outside the query region, often the vast majority of the data.
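The skipping logic is simple enough to sketch in a few lines. This is a toy model, not how any particular engine implements it: `RowGroup`, `build_row_groups` and `query` are hypothetical names, and the data is a synthetic grid sorted by x so that each chunk covers a compact area.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A chunk of rows plus the bounding box of everything in it (hypothetical layout)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    points: list


def build_row_groups(points, size):
    """Split points into fixed-size chunks and record min/max bounds for each chunk."""
    groups = []
    for i in range(0, len(points), size):
        chunk = points[i : i + size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        groups.append(RowGroup(min(xs), min(ys), max(xs), max(ys), chunk))
    return groups


def query(groups, bbox):
    """Return (matching points, number of chunks actually scanned) for a bbox query."""
    qxmin, qymin, qxmax, qymax = bbox
    hits, scanned = [], 0
    for g in groups:
        # Metadata-only test: skip the chunk if its bounds cannot intersect the query
        if g.xmax < qxmin or g.xmin > qxmax or g.ymax < qymin or g.ymin > qymax:
            continue
        scanned += 1
        hits.extend(p for p in g.points if qxmin <= p[0] <= qxmax and qymin <= p[1] <= qymax)
    return hits, scanned


# Synthetic grid of 1,000 points, ordered by x so chunks are spatially compact
points = [(x, y) for x in range(100) for y in range(10)]
groups = build_row_groups(points, size=100)

hits, scanned = query(groups, bbox=(12, 2, 15, 4))
print(f"{len(hits)} hits, scanned {scanned} of {len(groups)} chunks")
```

The pruning only pays off when rows that are close in space are also close in the file. With randomly ordered rows, every chunk's bounds would cover most of the extent and almost nothing could be skipped.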

Use CSV only for simple one-off exchanges of point data with non-technical users. For production pipelines, use GeoParquet or FlatGeobuf. Schema-enforced. CRS-preserved. Indexed. Metadata-rich.

The rule: If your pipeline guesses the CRS from a column header, your architecture is fragile. Rebuilding geometries from CSV text can waste more compute than the actual spatial join.


Updated on Mar 13, 2026

