How It Works

Content identity

Every file is a sequence of bytes. C4 computes a SHA-512 hash of those bytes and encodes it in base58 format. The result is a C4 ID — a permanent, universal name for that content.

c45xZeXwKjQ2nMrBz7L1pYvWqRtN8sHfJd3g6CmAeU9kXoP4bG5hT0iVlDaSwFuO7yE

Same bytes, same ID. Always. On any machine, in any language, forever.
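The hash-then-encode step described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the base58 alphabet (the Bitcoin-style one), the "c4" prefix, and the left-padding to a fixed 90-character length are assumptions, and the function name c4_id is made up for this example.

```python
import hashlib

# Assumed alphabet: Bitcoin-style base58, which omits 0, O, I and l.
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def c4_id(data: bytes) -> str:
    """Return a C4-style ID: "c4" followed by base58(SHA-512(data))."""
    # Treat the 64-byte digest as one big-endian integer.
    n = int.from_bytes(hashlib.sha512(data).digest(), "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(BASE58[rem])
    # Assumption: pad with the zero digit "1" so every ID has the same length.
    # 88 base58 digits are enough to hold any 512-bit value.
    body = "".join(reversed(digits)).rjust(88, "1")
    return "c4" + body


first = c4_id(b"hello world\n")
second = c4_id(b"hello world\n")
other = c4_id(b"hello world!\n")

print(first)
print(first == second)  # identical bytes give an identical ID
print(first == other)   # a one-byte change gives a different ID
```

The ID depends only on the bytes passed in, so file names, paths, and timestamps play no part in it.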

This is not a new idea: git uses content hashing for every object it stores, IPFS uses it for blocks, and package managers use it for checksums. C4 applies it to files and directories in a way that’s human-readable and Unix-composable.

The c4m format

A c4m file is one line per entry. Each line contains:

permissions  timestamp  size  name  c4id

For example:

-rw-r--r-- 2026-03-20T12:00:00Z   8192 README.md  c45xZeXwKjQ2...
-rwxr-xr-x 2026-03-20T12:00:00Z  41023 build.sh   c43zYcLnRtP8...
drwxr-xr-x 2026-03-20T12:00:00Z    ... src/       c47mNqPvWxY1...

That’s it. Plain text, one entry per line. You can:

Read it: it’s a text file
Diff it: diff old.c4m new.c4m works, though c4 diff is smarter
Grep it: grep '\.exr' delivery.c4m finds all EXR files (the backslash makes the dot literal)
Pipe it: c4 id ./footage/ | sort -k4 sorts by filename
Email it: it’s small, even for millions of files
Version it: commit it to git, it’s just text

A c4m file fully describes a directory tree: a complete filesystem description without the file contents. The text format also costs almost nothing: compressed c4m files are within 2% of the size of a purpose-built binary format, because the SHA-512 IDs are high-entropy, barely compress, and dominate the file size regardless of encoding. The text format is not a compromise; it is effectively free.
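Because each entry is a single whitespace-separated line, a c4m listing can be read with ordinary string handling. The sketch below parses the three example lines shown earlier, keeping their truncated IDs exactly as written. The Entry class and parse_line function are hypothetical names for this illustration, and the parser assumes the ID is the last field and contains no whitespace; the real format may have rules (escaping, indentation for nested entries) that this does not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    mode: str
    timestamp: str
    size: Optional[int]  # None when the size field is not a plain integer
    name: str
    c4id: str

    @property
    def is_dir(self) -> bool:
        # Unix-style mode strings start with "d" for directories.
        return self.mode.startswith("d")


def parse_line(line: str) -> Entry:
    # The first three fields are split from the left, the ID from the right,
    # so a name containing spaces would still end up in one piece.
    mode, timestamp, size, rest = line.split(None, 3)
    name, c4id = rest.rsplit(None, 1)
    return Entry(mode, timestamp, int(size) if size.isdigit() else None, name, c4id)


sample = """\
-rw-r--r-- 2026-03-20T12:00:00Z   8192 README.md  c45xZeXwKjQ2...
-rwxr-xr-x 2026-03-20T12:00:00Z  41023 build.sh   c43zYcLnRtP8...
drwxr-xr-x 2026-03-20T12:00:00Z    ... src/       c47mNqPvWxY1...
"""

entries = [parse_line(line) for line in sample.splitlines() if line.strip()]
for entry in entries:
    print(entry.name, entry.size, "dir" if entry.is_dir else "file")
```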

Directory entries

Directories get their own C4 IDs, computed from their contents. If any file inside a directory changes, the directory’s ID changes too. This gives you Merkle tree verification for free — verify the root ID and you’ve verified everything beneath it.
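The Merkle property can be demonstrated with a toy model. The sketch below is not C4's actual directory-ID algorithm, which this page does not specify: it assumes a directory's ID is the hash of its sorted "name id" lines, and it uses a hex SHA-512 digest as a stand-in for a real C4 ID. The names content_id and tree_id are invented for the example.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Stand-in for a C4 ID: hex SHA-512 rather than the real base58 encoding.
    return hashlib.sha512(data).hexdigest()


def tree_id(node) -> str:
    """Return an ID for a file (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):
        return content_id(node)
    # Assumed scheme: hash the sorted "name id" lines of the children.
    lines = [f"{name} {tree_id(node[name])}\n" for name in sorted(node)]
    return content_id("".join(lines).encode("utf-8"))


before = {"README.md": b"hello\n", "src": {"main.c": b"int main(void){return 0;}\n"}}
after = {"README.md": b"hello\n", "src": {"main.c": b"int main(void){return 1;}\n"}}

# One byte changed in src/main.c: the change propagates up to the root.
print(tree_id(before) == tree_id(after))
print(tree_id(before["src"]) == tree_id(after["src"]))
# The untouched file keeps its ID.
print(tree_id(before["README.md"]) == tree_id(after["README.md"]))
```

Comparing two root IDs is therefore enough to tell whether anything beneath them differs, and comparing child IDs narrows down where.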

The content store

When you use c4, file contents are stored in a local content-addressed store by default — a directory where files are named by their C4 IDs. This gives you:

Automatic deduplication — identical files are stored once, regardless of how many directories contain them
Instant verification — if the file exists in the store under a given ID, it’s correct by definition
Shared storage — multiple projects can share the same store

The store is just a directory of files. No database, no daemon, no lock files. Any tool that understands C4 IDs can read from it.
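A store of this kind is simple enough to model directly. The following is a rough sketch under stated assumptions, not the layout c4 actually uses: it keeps every file in one flat directory, names files by a hex SHA-512 digest as a stand-in for a real C4 ID, and the Store class with its put, get, and verify methods is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def content_id(data: bytes) -> str:
    # Stand-in for a C4 ID: hex SHA-512 rather than the real base58 encoding.
    return hashlib.sha512(data).hexdigest()


class Store:
    """A flat directory whose file names are the IDs of their contents."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        path = self.root / cid
        if not path.exists():  # identical content is written only once
            tmp = path.with_suffix(".tmp")
            tmp.write_bytes(data)
            tmp.replace(path)  # rename into place so readers never see a partial file
        return cid

    def get(self, cid: str) -> bytes:
        return (self.root / cid).read_bytes()

    def verify(self, cid: str) -> bool:
        # Re-hash the stored bytes and compare with the name they are stored under.
        return content_id(self.get(cid)) == cid


with tempfile.TemporaryDirectory() as tmpdir:
    store = Store(tmpdir)
    id_a = store.put(b"same frame data")
    id_b = store.put(b"same frame data")  # e.g. the same file in a second project
    print(id_a == id_b)
    print(len(list(Path(tmpdir).iterdir())))  # number of files actually stored
    print(store.verify(id_a))
```

The verify method re-hashes the stored bytes, which is how a reader could confirm that a file found under a given ID has not been damaged on disk.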

How the pieces fit together

The C4 ecosystem is a set of tools that all speak the same language: c4m files and C4 IDs.

Tool	Role
c4	Creates c4m files, diffs them, produces patches
c4sh	Mounts c4m files as virtual directories in your shell
c4py	Pure Python library for C4 identification, manifests, and diffing
c4ts	TypeScript / JavaScript library for browsers and Node.js
c4-swift	Native Swift library for Apple platforms
c4git	Tracks large files in git by C4 ID
libc4	C library for embedding C4 in any application

Because the format is plain text, you don’t need any of these tools to read a c4m file. They exist to make creation, verification, and manipulation fast and convenient.

Unix composability

c4 is designed in the Unix tradition. Nearly every c4 command produces c4m output, which means commands compose naturally:

c4 path/ — scan a directory, output c4m, store content
c4 path/ | c4 — scan, then identify and store the c4m itself, returning its C4 ID
c4 diff a.c4m b.c4m | c4 paths — diff two states, extract just the paths that changed
c4 intersect id a.c4m b.c4m | c4 paths — find content shared between two projects

Because c4 stores content by default, a single command like c4 workspace/ both captures the c4m description and stores every file’s content, leaving the result ready for distribution.

The ergonomic display (c4 cat -e) reformats timestamps, adds comma-separated sizes, and pushes C4 IDs to the right margin for readability, while the canonical form stays machine-parseable.
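Since the format is plain text, the ideas behind diffing and intersecting two listings can be approximated without c4 itself. The sketch below is only an illustration of the concept, not how c4 diff or c4 intersect are implemented. The functions load, diff_paths, and shared_ids are hypothetical, the listings are flat, and the IDs are short placeholders rather than real C4 IDs.

```python
def load(text: str) -> dict:
    """Map each entry name to its ID for a flat c4m-style listing."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Layout assumed from the format section: mode, timestamp, size, name, id.
        _mode, _timestamp, _size, rest = line.split(None, 3)
        name, cid = rest.rsplit(None, 1)
        entries[name] = cid
    return entries


def diff_paths(old: dict, new: dict):
    """Return (added, removed, changed) path lists between two listings."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, changed


def shared_ids(a: dict, b: dict) -> set:
    """Return the IDs present in both listings, whatever the paths are called."""
    return set(a.values()) & set(b.values())


# Placeholder IDs for illustration only.
old = load("""\
-rw-r--r-- 2026-03-20T12:00:00Z 8192 README.md id-readme-v1
-rwxr-xr-x 2026-03-20T12:00:00Z 41023 build.sh id-build-v1
""")
new = load("""\
-rw-r--r-- 2026-03-21T09:00:00Z 8200 README.md id-readme-v2
-rwxr-xr-x 2026-03-20T12:00:00Z 41023 build.sh id-build-v1
-rw-r--r-- 2026-03-21T09:00:00Z 120 notes.txt id-notes-v1
""")

print(diff_paths(old, new))   # → (['notes.txt'], [], ['README.md'])
print(shared_ids(old, new))   # → {'id-build-v1'}
```

Changes are detected by comparing IDs alone, so an entry whose timestamp changed but whose content did not would not be reported as changed in this sketch.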

Why SHA-512?

SHA-512 was chosen for permanence. It’s well-studied, widely implemented, and has a 512-bit output, so even a generic birthday attack would need on the order of 2^256 hash evaluations to find a collision, far beyond practical concern. A C4 ID computed today will still be valid and verifiable decades from now.

See the FAQ for more on the cryptographic choices.