
collate

The objective file-comparison engine for the Print With Synergy stack.

collate answers one question — “do these two files match, and where exactly do they differ?” — and answers it with measured facts, never a verdict. Its first capability is plate ↔ 1-up comparison: align a set of decoded separation plates (1-bit TIFF / Esko LEN) against an approved 1-up PDF and report the per-ink coverage and geometry differences, which separations are missing or extra, and (optionally) a per-ink visual difference image and an AI visual note.

The Print With Synergy engines split by who owns what:

| Engine | Owns |
| --- | --- |
| codex | Extraction: single-file facts (per-separation coverage, screen ruling/angle, Pantone, dieline…). |
| collate | Comparison: two-file difference facts (coverage/geometry deltas, presence, diff images). |
| lint | Policy: rules + verdicts (LPDF_PLATE_CMP_*, pass/fail, tolerances). |
| lens | Display: the visual inspection UX. |

collate is an objective-layer engine: it states the numbers. Whether a 3-point coverage delta or a 1.5 mm geometry shift is acceptable is a tolerance decision, and tolerances are policy — they live in lint, not here. collate ships no coverage_mismatch flag, no overall_match roll-up, and no tolerance constant. It measures; lint judges.
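The measure/judge split can be pictured with a small sketch. It is illustrative only: `InkFact`, `within_tolerance`, and the tolerance argument are hypothetical names, not part of collate or lint. The point is that the fact object carries no threshold, and the threshold arrives from the policy side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InkFact:
    """A measured difference, as the comparison layer would state it (hypothetical shape)."""
    ink_name: str
    coverage_delta_percent: float


def within_tolerance(fact: InkFact, max_abs_delta_percent: float) -> bool:
    """Policy-side check: the tolerance is passed in, never stored on the fact."""
    return abs(fact.coverage_delta_percent) <= max_abs_delta_percent


cyan = InkFact("Cyan", 3.0)
# The same measured fact passes one policy and fails a stricter one.
lenient = within_tolerance(cyan, max_abs_delta_percent=5.0)
strict = within_tolerance(cyan, max_abs_delta_percent=2.0)
```

The same 3-point delta is acceptable under one tolerance and not under another, which is why the number and the judgement are kept in separate engines.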

collate owns no raster primitives of its own — it reuses codex-pdf for plate decode, the Ghostscript tiffsep separation render, ink normalization, and the Pantone catalogue. codex stays the extraction layer; collate is comparison built on top.

| Route | Purpose |
| --- | --- |
| GET /healthz | Liveness + capability flags (Ghostscript availability, codex version). |
| GET /readyz | Readiness (the PDF render path degrades gracefully without Ghostscript). |
| GET /v1/contract | Engine / version / schema / routes / capabilities. |
| POST /v1/compare/plates | Compare a plate set (1+ TIFF/LEN files) against a 1-up PDF → CollateCompareResult. |
| POST /v1/compare/documents | Compare 1+ candidate PDFs against one reference PDF (single 1-up or stepped/gang sheet) → CollateDocumentCompareResult. |

POST /v1/compare/plates is multipart/form-data:

  • plates — one or more separation files (repeat the field per file).
  • pdf — the approved 1-up PDF.
  • page (default 1), dpi (default 150), ai (default false), diff_images (default false).
curl -sS https://<collate-host>/v1/compare/plates \
  -F plates=@cyan.tif -F plates=@black.tif \
  -F pdf=@approved-1up.pdf \
  -F dpi=150 -F diff_images=true
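The same multipart request can be assembled from Python. The sketch below only builds the form parts and uses the field names listed above. The helper name `build_plates_form`, the default PDF filename, and the `application/octet-stream` content type for plates are assumptions made for illustration. Plates are passed as `(bytes, filename)` pairs.

```python
from typing import Dict, List, Sequence, Tuple

# (field name, (filename, content, content type))
FilePart = Tuple[str, Tuple[str, bytes, str]]


def build_plates_form(
    plates: Sequence[Tuple[bytes, str]],
    pdf: bytes,
    *,
    pdf_name: str = "approved-1up.pdf",  # hypothetical filename
    page: int = 1,
    dpi: int = 150,
    ai: bool = False,
    diff_images: bool = False,
) -> Tuple[List[FilePart], Dict[str, str]]:
    """Return (files, data) for a multipart POST to /v1/compare/plates."""
    # The "plates" field is repeated once per separation file.
    files: List[FilePart] = [
        ("plates", (name, content, "application/octet-stream"))
        for content, name in plates
    ]
    files.append(("pdf", (pdf_name, pdf, "application/pdf")))
    # Form values travel as strings; booleans are written as true/false,
    # matching the curl example.
    data = {
        "page": str(page),
        "dpi": str(dpi),
        "ai": str(ai).lower(),
        "diff_images": str(diff_images).lower(),
    }
    return files, data
```

With an HTTP client that accepts a list of `(field, file)` tuples, such as httpx, the result can be passed along as `httpx.post(url, files=files, data=data)`.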

The response is neutral comparison facts — see CollateCompareResult. Errors are RFC 7807 Problem Details (application/problem+json).
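A client can detect and unpack such an error body along these lines. This is a minimal sketch: `Problem` and `parse_problem` are hypothetical helper names. The members read here (`type`, `title`, `status`, `detail`) are the standard ones defined by RFC 7807, and any extension members collate may add are not assumed.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Problem:
    type: str
    title: Optional[str]
    status: Optional[int]
    detail: Optional[str]


def parse_problem(content_type: str, body: str) -> Optional[Problem]:
    """Return a Problem if the response is application/problem+json, else None."""
    # Ignore parameters such as "; charset=utf-8" when matching the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/problem+json":
        return None
    data = json.loads(body)
    return Problem(
        type=data.get("type", "about:blank"),  # RFC 7807 default when absent
        title=data.get("title"),
        status=data.get("status"),
        detail=data.get("detail"),
    )
```

Branching on the content type rather than on the status code alone keeps the client from trying to read a non-Problem error page as structured data.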

When Ghostscript is unavailable the PDF side self-skips: the result carries the plate-side facts, pdf_rendered: false, and a note. A consumer must not read that as a clean compare — lint floors it to INCONCLUSIVE.
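A consumer-side guard for that case might look like the following sketch. It assumes the result has been parsed into a dict with a `pdf_rendered` key as named above. The function name and the two status strings are hypothetical, and neither string is a verdict.

```python
from typing import Any, Mapping


def compare_status(result: Mapping[str, Any]) -> str:
    """Say whether both sides were actually compared, without judging the result."""
    # A missing or non-true flag is treated as "not rendered" (a cautious assumption),
    # so a skipped PDF side is never mistaken for a clean compare.
    if result.get("pdf_rendered") is True:
        return "compared"
    return "pdf-side-skipped"
```

Only a result with status `"compared"` carries two-sided difference facts. Anything else should be handed on as plate-side facts only.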

POST /v1/compare/documents is collate’s “global vision” compare. It takes one or more candidate PDFs and one reference PDF (a single 1-up or a stepped/gang/imposed sheet), auto-aligns the inks, and reports the per-ink coverage and geometry differences, the structural presence, and a raw per-candidate match score. The real-world shape: a customer supplies one or more files, and they are compared to an approved proof to find where they match and where they differ. This covers PDF ↔ PDF as well as N candidate PDFs ↔ one stepped/gang reference, where each candidate is mapped to the reference instance/region it best matches, surfaced as mapped_instance_index.

It is multipart/form-data:

  • candidates — one or more candidate PDFs (repeat the field per file).
  • reference — the approved reference PDF (single 1-up or stepped/gang sheet).
  • page (default 1), dpi (default 150), ai (default false), diff_images (default false), and optional expected_n_across / expected_n_down (echoed into the layout notes — the matches-expected verdict is lint’s).
curl -sS https://<collate-host>/v1/compare/documents \
  -F candidates=@job-a.pdf -F candidates=@job-b.pdf \
  -F reference=@reference.pdf \
  -F dpi=150 -F diff_images=true

The response is a CollateDocumentCompareResult: per-candidate measured differences plus a match_score. The match_score is a raw similarity in [0, 1], not a verdict: 1 − mean(|coverage_delta|/100) over the shared inks, minus a small penalty per missing/extra separation, clamped to that range. Whether a given score is acceptable is lint’s policy call. When Ghostscript is unavailable the reference side self-skips (reference_rendered: false, candidate-side facts only, and a note), exactly as the plate path does.
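The score description can be sketched in a few lines. This is not collate's implementation. The penalty size is not stated here, so `penalty_per_separation=0.05` is a placeholder assumption, as are the function name, the parameter names, and the choice to treat an empty set of shared inks as contributing no coverage term.

```python
from statistics import fmean
from typing import Sequence


def raw_match_score(
    coverage_deltas_percent: Sequence[float],
    n_missing_or_extra: int,
    penalty_per_separation: float = 0.05,  # assumed value, not taken from collate
) -> float:
    """1 - mean(|delta| / 100) over shared inks, minus a penalty per
    missing/extra separation, clamped to [0, 1]."""
    if coverage_deltas_percent:
        mean_delta = fmean(abs(d) / 100.0 for d in coverage_deltas_percent)
    else:
        mean_delta = 0.0  # assumption: no shared inks means no coverage term
    score = 1.0 - mean_delta - penalty_per_separation * n_missing_or_extra
    # Clamp so heavy penalties cannot push the score below 0.
    return max(0.0, min(1.0, score))
```

For example, two shared inks with deltas of +2 and −4 points and no missing separations give 1 − 0.03 = 0.97. The number says nothing about acceptability.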

The same comparison runs in process via the client (no HTTP), which is how lint consumes collate when they share a host:

from collate.client import CollateClient

client = CollateClient()  # in-process; or CollateClient(base_url="https://…") for HTTP

# Plate ↔ 1-up: each plate is a (bytes, filename) pair.
result = client.compare_plates(
    [(cyan_bytes, "cyan.tif"), (black_bytes, "black.tif")],
    pdf_bytes,
    dpi=150,
    diff_images=True,
)
for ink in result.inks:
    print(ink.ink_name, ink.presence, ink.coverage_delta_percent)

# Document ↔ document ("global vision"): N candidates vs one reference sheet.
doc = client.compare_documents(
    [(job_a_bytes, "job-a.pdf"), (job_b_bytes, "job-b.pdf")],
    reference_bytes,
    dpi=150,
)
for match in doc.candidates:
    print(match.candidate_name, match.mapped_instance_index, match.match_score)

The distribution is published as collate-pdf (the bare collate name is taken on PyPI), matching the codex-pdf family; the import package is collate.

pip install -e . pytest httpx ruff # pulls codex-pdf from the index
ruff check src tests
pytest # gs-free (the PDF side is monkeypatched)
uvicorn collate.api.main:app --reload --port 8080

The comparison’s PDF-render path needs Ghostscript (gs) at runtime; the test suite is gs-free, but a real POST /v1/compare/plates against a PDF needs it installed (the Docker image ships it).

AGPL-3.0-or-later, matching the engine stack.