#54: NetCDF Assumes You Have a File System. Zarr Assumes You Have a URL

NetCDF and Zarr can hold identical data. They’re optimized for different infrastructure. Conversion runs in one direction at a time: forward to Zarr for the cloud, back to NetCDF for the archive.

NetCDF is monolithic. All metadata and data live in a single binary file (NetCDF-4 is built on HDF5). Reading a variable requires parsing a binary header to find byte offsets before any data can be fetched. Writes go through a single file handle, so multiple writers need coordination, typically file locking or MPI-based parallel I/O. The format was designed for POSIX file systems with strong consistency guarantees. On S3, where every read is a high-latency HTTP request, those header lookups and small seeks get slow.

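Zarr distributes data as objects. Each chunk is a separate object whose key comes from its chunk index: temp/0.0.0, temp/0.0.1, and so on (Zarr v2’s default layout; with a “/” separator the same chunks are temp/0/0/0, temp/0/0/1). Metadata is human-readable JSON (.zarray and .zattrs per array, optionally consolidated into a single .zmetadata). There’s no global lock. Five hundred Lambda functions can write five hundred different chunks simultaneously without knowing each other exists. S3 handles atomicity per object.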
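A minimal sketch of that write pattern, under loud assumptions: a local temporary directory stands in for S3, threads stand in for Lambda functions, and the payload is placeholder bytes rather than compressed array data. The function names are hypothetical, and the keys follow Zarr v2’s default dot-separated layout. This is not the zarr library’s API; it only illustrates why writers never need to coordinate.

```python
import itertools
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def chunk_key(array: str, index: tuple[int, ...]) -> str:
    """Build a Zarr-v2-style key with the "." separator, e.g. temp/0.0.1."""
    return f"{array}/" + ".".join(str(i) for i in index)


def write_chunk(store: Path, array: str, index: tuple[int, ...], payload: bytes) -> str:
    """Write one chunk as its own object: no shared file handle, no lock."""
    key = chunk_key(array, index)
    (store / key).write_bytes(payload)
    return key


def write_all_chunks(store: Path, array: str, grid: tuple[int, ...], workers: int = 8) -> list[str]:
    """Fan out one independent write per chunk index in the chunk grid."""
    (store / array).mkdir(parents=True, exist_ok=True)
    indices = list(itertools.product(*(range(n) for n in grid)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Placeholder payload (16 zero bytes); a real writer would encode array data.
        futures = [pool.submit(write_chunk, store, array, idx, bytes(16)) for idx in indices]
        return sorted(f.result() for f in futures)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        keys = write_all_chunks(Path(tmp), "temp", grid=(2, 2, 3))
        print(len(keys), keys[0], keys[-1])
```

Each worker touches exactly one key, so the only thing the writers share is the naming rule. That holds as long as every writer owns whole chunks; two writers updating parts of the same chunk would still need coordination.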

The cloud penalty on NetCDF is real. Header parsing, locking, and byte-offset lookups each add round trips before any data arrives. Zarr maps chunk indices directly to storage keys, so reading a chunk is a single raw HTTP GET.
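To make the index-to-key mapping concrete, here is a rough sketch that works out which chunk objects a rectangular read touches. The function name, chunk shape, and example request are all made up for illustration, and the keys assume Zarr v2’s dot-separated layout. It assumes every requested range is non-empty (stop greater than start on each axis).

```python
import itertools


def chunks_for_slice(array: str, chunk_shape: tuple[int, ...],
                     start: tuple[int, ...], stop: tuple[int, ...]) -> list[str]:
    """Return the key of every chunk overlapping the half-open index box [start, stop)."""
    per_axis = [
        # First and last chunk index touched along this axis (integer division only).
        range(lo // size, (hi - 1) // size + 1)
        for lo, hi, size in zip(start, stop, chunk_shape)
    ]
    return [f"{array}/" + ".".join(map(str, idx)) for idx in itertools.product(*per_axis)]


if __name__ == "__main__":
    # Hypothetical array chunked as (time=10, y=100, x=100).
    keys = chunks_for_slice("temp", (10, 100, 100), start=(0, 150, 0), stop=(5, 250, 100))
    for key in keys:
        # Each key becomes one GET against a hypothetical store URL.
        print(f"GET https://example-bucket.example.com/store.zarr/{key}")
```

No header has to be read to find the data: the request shape and the chunk shape are enough to name the objects, and the fetches can be issued in parallel.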

The trade-off: NetCDF is better for archival. One file is easier to cite, checksum, and move. Zarr is better for cloud computation (Pangeo, Dask). If you’re analyzing data in the cloud, convert to Zarr first.

Kerchunk bridges the gap. It scans NetCDF files and generates a reference file that maps Zarr chunk keys to byte ranges inside the original files, so Zarr-aware tools can read legacy archives quickly from the cloud without duplicating data.
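The idea behind those references can be sketched in a few lines. The structure below follows the shape of a Kerchunk reference file (metadata stored inline as strings, chunks stored as a URL, offset, and length), but the bucket, file name, offsets, and lengths are invented, and `resolve` is a hypothetical helper rather than part of Kerchunk.

```python
# Illustrative reference set; every URL and number here is made up.
REFERENCES = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "temp/0.0.0": ["s3://example-bucket/archive/file.nc", 20480, 4096],
        "temp/0.0.1": ["s3://example-bucket/archive/file.nc", 24576, 4096],
    },
}


def resolve(references: dict, key: str) -> dict:
    """Turn a Zarr key into either inline content or a byte-range request."""
    entry = references["refs"][key]
    if isinstance(entry, str):
        # Small metadata documents are stored inline in the reference file.
        return {"inline": entry}
    url, offset, length = entry
    # HTTP Range headers are inclusive on both ends, hence the -1.
    return {"url": url, "headers": {"Range": f"bytes={offset}-{offset + length - 1}"}}


if __name__ == "__main__":
    print(resolve(REFERENCES, ".zgroup"))
    print(resolve(REFERENCES, "temp/0.0.0"))
```

The NetCDF file is never rewritten. The header is parsed once, when the references are generated, and every later read skips straight to a ranged GET on the original object.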

The rule: Archive in NetCDF for history and portability. Convert to Zarr for speed. Kerchunk for reading old data fast.
