collate
The objective file-comparison engine for the Print With Synergy stack.
collate answers one question — “do these two files match, and where exactly do they differ?” — and answers it with measured facts, never a verdict. Its first capability is plate ↔ 1-up comparison: align a set of decoded separation plates (1-bit TIFF / Esko LEN) against an approved 1-up PDF and report the per-ink coverage and geometry differences, which separations are missing or extra, and (optionally) a per-ink visual difference image and an AI visual note.
Where collate sits in the stack
The Print With Synergy engines split by who owns what:
| Engine | Owns |
|---|---|
| codex | Extraction — single-file facts (per-separation coverage, screen ruling/angle, Pantone, dieline…). |
| collate | Comparison — two-file difference facts (coverage/geometry deltas, presence, diff images). |
| lint | Policy — rules + verdicts (LPDF_PLATE_CMP_*, pass/fail, tolerances). |
| lens | Display — the visual inspection UX. |
collate is an objective-layer engine: it states the numbers. Whether a
3-point coverage delta or a 1.5 mm geometry shift is acceptable is a tolerance
decision, and tolerances are policy — they live in lint, not here. collate
ships no coverage_mismatch flag, no overall_match roll-up, and no
tolerance constant. It measures; lint judges.
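To make the split concrete, here is a minimal sketch of a caller-side policy check. Only the idea of a measured coverage delta comes from collate; the function name, the tolerance parameter, and the verdict strings are hypothetical and do not describe lint's actual rules.

```python
def judge_coverage(coverage_delta_percent, tolerance_percent):
    """Hypothetical policy check: the measurement comes in, the caller owns the tolerance."""
    # collate would supply coverage_delta_percent; the tolerance is never collate's.
    if abs(coverage_delta_percent) <= tolerance_percent:
        return "pass"
    return "fail"


# The same measured delta can pass under one policy and fail under another.
print(judge_coverage(3.0, tolerance_percent=5.0))
print(judge_coverage(3.0, tolerance_percent=2.0))
```

The measurement is identical in both calls; only the policy input changes, which is why the tolerance has no place in the comparison engine.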
collate owns no raster primitives of its own — it reuses
codex-pdf for plate decode, the
Ghostscript tiffsep separation render, ink normalization, and the Pantone
catalogue. codex stays the extraction layer; collate is comparison built on top.
| Route | Purpose |
|---|---|
| GET /healthz | Liveness + capability flags (Ghostscript availability, codex version). |
| GET /readyz | Readiness (the PDF render path degrades gracefully without Ghostscript). |
| GET /v1/contract | Engine / version / schema / routes / capabilities. |
| POST /v1/compare/plates | Compare a plate set (1+ TIFF/LEN files) against a 1-up PDF → CollateCompareResult. |
| POST /v1/compare/documents | Compare 1+ candidate PDFs against one reference PDF (single 1-up or stepped/gang sheet) → CollateDocumentCompareResult. |
POST /v1/compare/plates is multipart/form-data:
- plates: one or more separation files (repeat the field per file).
- pdf: the approved 1-up PDF.
- page (default 1), dpi (default 150), ai (default false), diff_images (default false).
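The form fields listed above can be assembled by hand when no HTTP library is available. The following is a minimal standard-library sketch of multipart/form-data encoding that repeats the plates field once per file; the helper name, the file names, and the placeholder byte strings are illustrative assumptions, not part of collate.

```python
import uuid


def encode_multipart(fields, files):
    """Encode form fields and files as a multipart/form-data body.

    fields: list of (name, value) pairs.
    files:  list of (field_name, filename, content_bytes) triples; repeating
            a field name sends several files under that one field.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(f"--{boundary}".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")
        parts.append(str(value).encode())
    for name, filename, content in files:
        parts.append(f"--{boundary}".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode()
        )
        parts.append(b"Content-Type: application/octet-stream")
        parts.append(b"")
        parts.append(content)
    parts.append(f"--{boundary}--".encode())
    parts.append(b"")
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


# Placeholder contents; real callers would pass the bytes of each file.
body, content_type = encode_multipart(
    fields=[("dpi", 150), ("diff_images", "true")],
    files=[
        ("plates", "cyan.tif", b"<cyan plate bytes>"),
        ("plates", "black.tif", b"<black plate bytes>"),
        ("pdf", "approved.pdf", b"<pdf bytes>"),
    ],
)
```

The resulting body and content_type would be sent as the request body and Content-Type header of the POST.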
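```sh
# File names are placeholders.
curl -sS https://<collate-host>/v1/compare/plates \
  -F plates=@cyan.tif -F plates=@black.tif \
  -F pdf=@approved.pdf \
  -F dpi=150 -F diff_images=true
```

The response is neutral comparison facts; see
CollateCompareResult. Errors are RFC 7807
Problem Details (application/problem+json).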
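A client can recognise such an error by its media type and read the standard members. The sketch below relies only on what RFC 7807 defines (type, title, status, detail, instance); the helper name and the sample payload are illustrative and were not captured from collate.

```python
import json


def parse_problem(content_type, body):
    """Return the RFC 7807 members of an error body, or None if it is not one."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/problem+json":
        return None
    doc = json.loads(body)
    # All five members are optional; "type" defaults to "about:blank".
    return {
        "type": doc.get("type", "about:blank"),
        "title": doc.get("title"),
        "status": doc.get("status"),
        "detail": doc.get("detail"),
        "instance": doc.get("instance"),
    }


# Made-up payload for illustration only.
example = b'{"title": "Unprocessable Entity", "status": 422, "detail": "pdf field is missing"}'
problem = parse_problem("application/problem+json; charset=utf-8", example)
print(problem["status"], problem["detail"])
```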
When Ghostscript is unavailable the PDF side self-skips: the result carries
the plate-side facts, pdf_rendered: false, and a note. A consumer must not read
that as a clean compare — lint floors it to INCONCLUSIVE.
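A consumer-side guard for this case could look like the sketch below. It assumes the JSON response has been decoded into a dict; pdf_rendered is the field named above, while the helper name and the sample payloads (including the note key) are assumptions made for illustration.

```python
def pdf_side_was_compared(result):
    """Return True only if the result explicitly reports a rendered PDF side."""
    # A missing flag is treated like False: without evidence of a render,
    # plate-side facts alone must not be read as a clean compare.
    return result.get("pdf_rendered") is True


# Hypothetical decoded payloads; the "note" key name is an assumption.
skipped = {"pdf_rendered": False, "note": "Ghostscript unavailable"}
full = {"pdf_rendered": True}

for payload in (skipped, full):
    if pdf_side_was_compared(payload):
        print("two-sided compare: deltas are meaningful")
    else:
        print("plate-side facts only: leave the verdict to lint")
```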
Global vision document compare
POST /v1/compare/documents is collate’s “global vision” compare: it takes
one or more candidate PDFs and one reference PDF — and the reference may
be a single 1-up or a stepped/gang/imposed sheet — auto-aligns the inks, and
reports the per-ink coverage / geometry differences + structural presence, plus a
RAW per-candidate match score. Real-world shape: a customer supplies file(s);
we compare them to an approved proof to find where they match and where they
differ — PDF↔PDF, and N candidate PDFs ↔ one stepped/gang reference (each
candidate is mapped to the reference instance/region it best matches, surfaced as
mapped_instance_index).
It is multipart/form-data:
- candidates: one or more candidate PDFs (repeat the field per file).
- reference: the approved reference PDF (single 1-up or stepped/gang sheet).
- page (default 1), dpi (default 150), ai (default false), diff_images (default false), and optional expected_n_across / expected_n_down (echoed into the layout notes; the matches-expected verdict is lint’s).
```sh
# File names are placeholders.
curl -sS https://<collate-host>/v1/compare/documents \
  -F candidates=@job-a.pdf -F candidates=@job-b.pdf \
  -F reference=@reference.pdf \
  -F dpi=150 -F diff_images=true
```

The response is a CollateDocumentCompareResult: per-candidate
measured differences plus a match_score. The match_score is a raw similarity
in [0, 1], not a verdict: 1 − mean(|coverage_delta|/100) over the shared
inks, minus a small penalty per missing/extra separation, clamped to that range.
Whether a given score is acceptable is lint’s policy call. When Ghostscript is
unavailable the reference side self-skips (reference_rendered: false,
candidate-side facts only, and a note), exactly as the plate path does.
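The score formula described above can be sketched as follows. This is an illustration under stated assumptions, not collate's implementation: the text does not give the penalty size, so the 0.05 default is invented, and treating "no shared inks" as a zero mean delta is likewise an assumption. The function and parameter names are hypothetical.

```python
def match_score(coverage_deltas, n_missing_or_extra, penalty_per_separation=0.05):
    """Raw similarity in [0, 1] from per-ink coverage deltas (percentage points).

    coverage_deltas: signed deltas for the inks present on both sides.
    n_missing_or_extra: count of separations present on one side only.
    penalty_per_separation: ASSUMED value; the source only says "small".
    """
    if coverage_deltas:
        mean_delta = sum(abs(d) for d in coverage_deltas) / len(coverage_deltas) / 100.0
    else:
        mean_delta = 0.0  # assumption: no shared inks contributes no delta term
    raw = 1.0 - mean_delta - penalty_per_separation * n_missing_or_extra
    # Clamp so heavy penalties cannot push the score outside [0, 1].
    return max(0.0, min(1.0, raw))


# Three shared inks with deltas of 2, -4 and 0 points, plus one missing
# separation: mean |delta| is 2 points, so the score is roughly 0.93.
score = match_score([2.0, -4.0, 0.0], n_missing_or_extra=1)
print(round(score, 4))
```

Because the score is a plain number with no threshold attached, deciding what counts as "close enough" stays outside this function.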
Library use
The same comparison runs in process via the client (no HTTP), which is how lint consumes collate when they share a host:
```python
from collate.client import CollateClient

client = CollateClient()  # in-process; or CollateClient(base_url="https://…") for HTTP
result = client.compare_plates(
    [(cyan_bytes, "cyan.tif"), (black_bytes, "black.tif")],
    pdf_bytes,
    dpi=150,
    diff_images=True,
)
for ink in result.inks:
    print(ink.ink_name, ink.presence, ink.coverage_delta_percent)

# Document ↔ document ("global vision"): N candidates vs one reference sheet.
doc = client.compare_documents(
    [(job_a_bytes, "job-a.pdf"), (job_b_bytes, "job-b.pdf")],
    reference_bytes,
    dpi=150,
)
for match in doc.candidates:
    print(match.candidate_name, match.mapped_instance_index, match.match_score)
```

Local dev
The distribution is published as collate-pdf (the bare collate name is
taken on PyPI), matching the codex-pdf family; the import package is collate.
```sh
pip install -e . pytest httpx ruff   # pulls codex-pdf from the index
ruff check src tests
pytest                               # gs-free (the PDF side is monkeypatched)
uvicorn collate.api.main:app --reload --port 8080
```

The comparison’s PDF-render path needs Ghostscript (gs) at runtime; the
test suite is gs-free, but a real POST /v1/compare/plates against a PDF needs
it installed (the Docker image ships it).
License
AGPL-3.0-or-later, matching the engine stack.