c4py — Python Library
c4py is a pure Python implementation of C4 content identification. It scans directories, creates and diffs c4m manifests, verifies directory trees against those manifests, and manages a content-addressed store, all without requiring a running daemon.
Install
pip install c4py
Quick start
Identify a file
import c4py
c4id = c4py.identify_file("path/to/file.mov")
print(c4id) # c45xZeXwKjQ2...
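For intuition about what `identify_file` returns, here is a stdlib-only sketch of how a C4 ID is derived. It assumes the scheme used by the reference C4 implementation: a SHA-512 digest of the file's bytes, encoded in base58, left-padded to 88 characters and prefixed with `c4`, for 90 characters total. The helpers `encode_c4` and `c4_id_of_file` are hypothetical and not part of c4py. In real code, call `c4py.identify_file`.

```python
import hashlib

# Base58 alphabet (no 0, O, I or l), assumed to match the C4 ID encoding.
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ENCODED_LEN = 88  # a 512-bit digest needs at most 88 base58 digits


def encode_c4(digest: bytes) -> str:
    """Encode a 64-byte SHA-512 digest as a 90-character C4-style ID."""
    n = int.from_bytes(digest, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(BASE58[rem])
    # Most significant digit first, left-padded with "1" (the base58 zero).
    return "c4" + "".join(reversed(digits)).rjust(ENCODED_LEN, "1")


def c4_id_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-512 so large media never sits in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return encode_c4(h.digest())
```

Because the ID depends only on the bytes, renaming or moving a file never changes its ID, while changing a single byte does.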
Scan a directory
import c4py
manifest = c4py.scan("/projects/experiment-042")
print(manifest.summary())
# 1,247 files, 38 directories, 14.2 GB total, 1,201 unique C4 IDs
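Conceptually, a scan walks the tree, identifies every file, and records a path-to-ID mapping plus totals. When the unique-ID count is lower than the file count, some files share identical content. The sketch below is a rough stdlib model of that idea, not c4py's implementation: `scan_tree` and `ScanResult` are hypothetical, hex SHA-512 stands in for a real C4 ID, and only subdirectories (not the root) are counted as directories.

```python
import hashlib
import os
from dataclasses import dataclass, field


def file_digest(path: str) -> str:
    # Stand-in for a real C4 ID: hex SHA-512 of the file contents.
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


@dataclass
class ScanResult:
    ids: dict[str, str] = field(default_factory=dict)  # relative path -> ID
    dir_count: int = 0
    total_bytes: int = 0

    def summary(self) -> str:
        unique = len(set(self.ids.values()))
        return (f"{len(self.ids)} files, {self.dir_count} directories, "
                f"{self.total_bytes} bytes total, {unique} unique IDs")


def scan_tree(root: str) -> ScanResult:
    result = ScanResult()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so traversal order is deterministic
        result.dir_count += len(dirnames)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            result.ids[os.path.relpath(full, root)] = file_digest(full)
            result.total_bytes += os.path.getsize(full)
    return result
```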
Load and save c4m files
import c4py
manifest = c4py.scan("./footage/")
c4py.dump(manifest, "footage.c4m")
# Later, load it back
manifest = c4py.load("footage.c4m")
print(manifest.summary())
Diff two manifests
import c4py
old = c4py.load("dataset-v1.c4m")
new = c4py.load("dataset-v2.c4m")
result = c4py.diff(old, new)
print(f"Added: {len(result.added)}")
print(f"Removed: {len(result.removed)}")
print(f"Modified: {len(result.modified)}")
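Because an entry's C4 ID changes whenever its content changes, a diff reduces to set operations on paths plus an ID comparison for paths present in both manifests. A minimal sketch, assuming each manifest is modeled as a plain `dict` mapping path to ID (this is not c4py's internal representation, and `diff_ids` is hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class DiffResult:
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)

    @property
    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.modified)


def diff_ids(old: dict[str, str], new: dict[str, str]) -> DiffResult:
    """Compare two path -> content-ID mappings."""
    old_paths, new_paths = set(old), set(new)
    return DiffResult(
        added=sorted(new_paths - old_paths),
        removed=sorted(old_paths - new_paths),
        # Same path on both sides, different content.
        modified=sorted(p for p in old_paths & new_paths if old[p] != new[p]),
    )
```

In this simple model a rename appears as one removal plus one addition carrying the same ID. Whether `c4py.diff` detects renames is not covered here.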
Verify a directory against a manifest
import c4py
report = c4py.verify_tree("delivery.c4m", "/mnt/incoming/vendor_A/")
print(f"{len(report.ok)} files OK")
print(f"{len(report.missing)} missing")
print(f"{len(report.corrupt)} corrupt")
if report.is_ok:
    print("Delivery verified.")
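Verification re-identifies each file the manifest lists and sorts it into one of three buckets: the file is absent, present but with a different ID, or matching. The sketch below models that classification under stated assumptions: the manifest is a hypothetical `dict` of relative path to ID, hex SHA-512 stands in for a C4 ID, and extra files on disk that the manifest does not list are ignored. Whether `verify_tree` reports such extras is not covered here.

```python
import hashlib
import os
from dataclasses import dataclass, field


def file_digest(path: str) -> str:
    # Stand-in for a real C4 ID: hex SHA-512 of the file contents.
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


@dataclass
class VerifyReport:
    ok: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)
    corrupt: list[str] = field(default_factory=list)

    @property
    def is_ok(self) -> bool:
        return not (self.missing or self.corrupt)


def verify_ids(expected: dict[str, str], root: str) -> VerifyReport:
    """Check every manifest entry against the files under root."""
    report = VerifyReport()
    for rel, want in sorted(expected.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            report.missing.append(rel)
        elif file_digest(full) != want:
            report.corrupt.append(rel)  # present, but content differs
        else:
            report.ok.append(rel)
    return report
```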
Use cases
ML experiment tracking
import c4py
# Snapshot training data before each run
manifest = c4py.scan("./training_data/")
c4py.dump(manifest, "training-data-v1.c4m")
# Later, verify nothing has changed
report = c4py.verify_tree("training-data-v1.c4m", "./training_data/")
if not report.is_ok:
    print(f"{len(report.corrupt)} files changed, {len(report.missing)} missing")
Data pipeline verification
import c4py
# Record input state
input_manifest = c4py.scan("./input/")
c4py.dump(input_manifest, "pipeline-input.c4m")
# Run pipeline (run_pipeline is a placeholder for your own pipeline code)
run_pipeline()
# Record output state and diff against expected
output_manifest = c4py.scan("./output/")
expected = c4py.load("expected-output.c4m")
result = c4py.diff(expected, output_manifest)
if result.is_empty:
    print("Pipeline output matches expected.")
else:
    print(f"{len(result.added)} added, {len(result.removed)} removed, "
          f"{len(result.modified)} modified")
Find duplicates
import c4py
manifest = c4py.scan("/projects/deliverables/")
dupes = manifest.duplicates()
for c4id, paths in dupes.items():
    print(f"{c4id}: {len(paths)} copies")
    for p in paths:
        print(f"  {p}")
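Duplicate detection falls out of content addressing: files with identical bytes have identical IDs, so grouping paths by ID finds every copy without any byte-by-byte comparison. A minimal sketch of that grouping, assuming a hypothetical `dict` of path to ID rather than a real c4py manifest; `find_duplicates` is illustrative and not necessarily how `manifest.duplicates()` is implemented:

```python
from collections import defaultdict


def find_duplicates(ids: dict[str, str]) -> dict[str, list[str]]:
    """Group paths by content ID and keep only IDs shared by 2+ paths."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, content_id in ids.items():
        groups[content_id].append(path)
    return {cid: sorted(paths) for cid, paths in groups.items() if len(paths) > 1}
```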
Content store
import c4py
store = c4py.open_store()
# Store content
with open("render.exr", "rb") as f:
    c4id = store.put(f)
# Retrieve content
content = store.get(c4id)
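The `put`/`get` pair follows the usual content-addressed pattern: the key is derived from the bytes, so storing the same content twice is a no-op and every read can be checked against its key. Below is a toy, directory-backed sketch of that pattern. `DirStore` and its on-disk layout are hypothetical, it keys blobs by hex SHA-512 rather than a real C4 ID, and it says nothing about where or how `c4py.open_store()` keeps its data.

```python
import hashlib
import os
import tempfile
from typing import BinaryIO


class DirStore:
    """Toy content-addressed store: each blob lives at <root>/<id[:2]>/<id>."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, content_id: str) -> str:
        # Fan out by the first two hex characters to keep directories small.
        return os.path.join(self.root, content_id[:2], content_id)

    def put(self, stream: BinaryIO) -> str:
        h = hashlib.sha512()
        # Hash while copying to a temp file; the ID is only known at the end.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "wb") as out:
            for chunk in iter(lambda: stream.read(1 << 20), b""):
                h.update(chunk)
                out.write(chunk)
        content_id = h.hexdigest()
        dest = self._path(content_id)
        if os.path.exists(dest):
            os.remove(tmp)  # identical content already stored: deduplicate
        else:
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.replace(tmp, dest)
        return content_id

    def get(self, content_id: str) -> bytes:
        with open(self._path(content_id), "rb") as f:
            data = f.read()
        # Content addressing makes an integrity check on read cheap.
        if hashlib.sha512(data).hexdigest() != content_id:
            raise ValueError(f"stored content does not match {content_id}")
        return data
```

Writing to a temporary file and renaming it into place means a crash mid-write never leaves a partial blob under a valid ID.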
Interoperability
c4py produces the same C4 IDs and c4m files as the c4 CLI, c4sh, and the other tools in the C4 ecosystem. A manifest created by the CLI can be loaded by c4py, and vice versa. The c4m format is the common interchange format: every tool reads and writes it.