How It Works
Content identity
Every file is a sequence of bytes. C4 computes a SHA-512 hash of those bytes and encodes it in base58 format. The result is a C4 ID — a permanent, universal name for that content.
c45xZeXwKjQ2nMrBz7L1pYvWqRtN8sHfJd3g6CmAeU9kXoP4bG5hT0iVlDaSwFuO7yE
Same bytes, same ID. Always. On any machine, in any language, forever.
This is not a new idea — git uses content hashing for commits, IPFS uses it for blocks, package managers use it for checksums. C4 applies it to files and directories in a way that’s human-readable and Unix-composable.
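The ID computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function name `c4_id_sketch`, the Bitcoin-style base58 alphabet, and the fixed-width padding to 88 digits after the `c4` prefix are assumptions made for the example, so check them against the C4 specification before relying on the output.

```python
import hashlib

# Assumption: Bitcoin-style base58 alphabet (omits 0, O, I and l).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def c4_id_sketch(data: bytes) -> str:
    """Hash the bytes with SHA-512 and render the digest in base58."""
    digest = hashlib.sha512(data).digest()
    n = int.from_bytes(digest, "big")

    # Repeated division by 58 yields the digits, least significant first.
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    encoded = "".join(reversed(digits))

    # Assumption: pad to 88 digits so every ID has the same length.
    # 88 base58 digits are enough to hold any 512-bit value.
    return "c4" + encoded.rjust(88, ALPHABET[0])


if __name__ == "__main__":
    print(c4_id_sketch(b"hello world"))
```

Because the function depends only on the input bytes, calling it twice on the same content always returns the same string, which is the property the rest of this page builds on.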
The c4m format
A c4m file is one line per entry. Each line contains:
permissions timestamp size name c4id
For example:
-rw-r--r-- 2026-03-20T12:00:00Z 8192 README.md c45xZeXwKjQ2...
-rwxr-xr-x 2026-03-20T12:00:00Z 41023 build.sh c43zYcLnRtP8...
drwxr-xr-x 2026-03-20T12:00:00Z ... src/ c47mNqPvWxY1...
That’s it. Plain text, one entry per line. You can:
- Read it: it’s a text file
- Diff it: `diff old.c4m new.c4m` works, `c4 diff` is smarter
- Grep it: `grep '.exr' delivery.c4m` to find all EXR files
- Pipe it: `c4 id ./footage/ | sort -k4` to sort by filename
- Email it: it’s small, even for millions of files
- Version it: commit it to git, it’s just text
A c4m file fully describes a directory tree — a complete filesystem description without the file contents. And the text format costs almost nothing: compressed c4m files are within 2% of a purpose-built binary format, because the SHA-512 IDs are genuinely high-entropy and dominate the file size regardless of encoding. The text format is not a compromise — it’s effectively free.
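Since each entry is a single whitespace-separated line, a reader for the five-field layout shown above fits in a few lines. The sketch below is an assumption-laden illustration: `Entry` and `parse_c4m_line` are hypothetical names, the size is kept as a string because the directory example elides it, and names are assumed to be unquoted (the real format may escape or quote names differently).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    mode: str
    timestamp: str
    size: str  # kept as text; the directory example above elides this field
    name: str
    c4id: str


def parse_c4m_line(line: str) -> Entry:
    """Split one 'permissions timestamp size name c4id' line into fields."""
    # Take the first three fields from the left and the ID from the right,
    # so a name containing spaces stays in one piece (an assumption about
    # how names are written; the real format may quote them).
    mode, timestamp, size, rest = line.strip().split(None, 3)
    name, c4id = rest.rsplit(None, 1)
    return Entry(mode, timestamp, size, name, c4id)


if __name__ == "__main__":
    entry = parse_c4m_line(
        "-rw-r--r-- 2026-03-20T12:00:00Z 8192 README.md c45xZeXwKjQ2..."
    )
    print(entry.name, entry.size)
```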
Directory entries
Directories get their own C4 IDs, computed from their contents. If any file inside a directory changes, the directory’s ID changes too. This gives you Merkle tree verification for free — verify the root ID and you’ve verified everything beneath it.
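The Merkle property can be illustrated with a toy model. The sketch below is not C4's actual directory algorithm: it assumes a directory's ID is the hash of its sorted "name ID" listing, and it uses hex SHA-512 digests instead of C4 IDs to keep the example short. `content_id` and `tree_id` are hypothetical helper names.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Hex SHA-512 stands in for a real C4 ID in this sketch.
    return hashlib.sha512(data).hexdigest()


def tree_id(node) -> str:
    """Return an ID for a file (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):
        return content_id(node)
    # A directory's ID is derived from its children's names and IDs.
    # Sorting by name makes the result independent of insertion order.
    lines = [f"{name} {tree_id(child)}" for name, child in sorted(node.items())]
    return content_id("\n".join(lines).encode("utf-8"))


if __name__ == "__main__":
    project = {"README.md": b"hello", "src": {"main.c": b"int main(){}"}}
    before = tree_id(project)
    project["src"]["main.c"] = b"int main(){return 1;}"
    # Changing one nested file changes the root ID.
    print(before != tree_id(project))
```

Under this model, a change to any file alters its parent's listing, which alters the parent's ID, and so on up to the root. That is why checking the root ID is enough to check everything beneath it.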
The content store
When you use c4 id, the tool can optionally store file contents in a local content-addressed store — a directory where files are named by their C4 IDs. This gives you:
- Automatic deduplication — identical files are stored once, regardless of how many directories contain them
- Instant verification — if the file exists in the store under a given ID, it’s correct by definition
- Shared storage — multiple projects can share the same store
The store is just a directory of files. No database, no daemon, no lock files. Any tool that understands C4 IDs can read from it.
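A store of this kind is simple enough to sketch directly. The example below is an illustration under stated assumptions, not the layout the `c4` tool uses: `put` and `get` are hypothetical helpers, files are named by a hex SHA-512 digest rather than a real C4 ID, and everything sits in one flat directory.

```python
import hashlib
from pathlib import Path


def put(store: Path, data: bytes) -> str:
    """Write data under its content-derived name and return that name."""
    key = hashlib.sha512(data).hexdigest()  # stand-in for a C4 ID
    path = store / key
    if not path.exists():
        store.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first, then rename, so a partially
        # written file never appears under its final name.
        tmp = store / (key + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)
    return key


def get(store: Path, key: str) -> bytes:
    """Read content back by name."""
    return (store / key).read_bytes()


if __name__ == "__main__":
    store = Path("demo-store")
    first = put(store, b"same bytes")
    second = put(store, b"same bytes")
    # Identical content maps to one name, so it is stored once.
    print(first == second, len(list(store.iterdir())))
```

Deduplication falls out of the naming scheme: a second `put` of identical content finds the file already present and writes nothing.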
How the pieces fit together
The C4 ecosystem is a set of tools that all speak the same language: c4m files and C4 IDs.
| Tool | Role |
|---|---|
| c4 | Creates c4m files, diffs them, produces patches |
| c4sh | Mounts c4m files as virtual directories in your shell |
| c4py | Pure Python library for C4 identification, manifests, and diffing |
| c4git | Tracks large files in git by C4 ID |
| libc4 | C library for embedding C4 in any application |
Because the format is plain text, you don’t need any of these tools to read a c4m file. They exist to make creation, verification, and manipulation fast and convenient.
Why SHA-512?
SHA-512 was chosen for permanence. It’s well-studied, widely implemented, and its 512-bit output puts a brute-force collision search at roughly 2^256 hash evaluations, far beyond any practical computation. A C4 ID computed today will still be valid and verifiable decades from now.
See the FAQ for more on the cryptographic choices.