Changelog

All notable changes to collate are documented here. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

  • zero-touch async compare API: POST /v1/compare/jobs (202 → {job_id}) + GET /v1/compare/jobs/{job_id}. Submit a document compare, then poll or receive a webhook (callback_url, optionally HMAC-signed via COLLATE_WEBHOOK_SECRET) instead of holding a long request open: the hands-off, API-first path. Retries are safe: an Idempotency-Key header replays the same job, and identical inputs + params resolve to a content-addressed cache (SHA-256 of the bytes + params + engine + schema versions, so a release rotates it, mirroring codex). New collate/jobs.py (in-memory JobStore + background run_job, bounded by COLLATE_MAX_JOBS); new CollateJobResponse model; /v1/contract advertises async_jobs. Jobs run in-process off the event loop; a failed compare is recorded as an error job (never a 500). In-memory only for now: durable/multi-replica storage (Redis, as codex uses), a synergy node-type, and async plates (jobs currently accept kind='documents') are documented follow-ups.
  • composite (“as a whole”) difference regions alongside the per-separation ones. With emit_regions=true, the document compare now also builds one “total ink” raster per side (the MAX ink intensity across separations — reusing the separation PNGs collate already holds, no extra render), registers the two composites, and extracts difference regions in composite space (space="composite", ink_name=None) — the whole-artwork comparison that pairs with the per-plate/separation regions to satisfy “plates vs separations AND as a whole”. New composite_difference_regions in compare/registered.py; a composite-labelled CollateRegistration is included. Verdict-free and self-skipping on codex-pdf < 1.43.0, same as the separation path.
  • registered difference regions for document compare (emit_regions). The document compare can now report where a candidate differs from the reference, per separation — not just scalar coverage deltas. With emit_regions=true, collate composes the two new codex primitives (codex_pdf.align registration + codex_pdf.diff difference regions, codex ≥1.43.0): each shared separation is spatially registered (FFT phase correlation) and then diffed in the registered frame, so a global shift is corrected rather than reported as content. New neutral models CollateRegistration (per-ink pixel shift + confidence) and CollateDifferenceRegion (bbox in mm + area + peak/mean delta + kind added/removed/changed), surfaced as additive optional registrations / difference_regions on CollateDocumentMatch (no COMPARE_SCHEMA_VERSION bump). Logic lives in compare/registered.py; the API route + CollateClient gained the emit_regions flag. Still verdict-free — magnitudes only; the noise floor (sensitivity) is a measurement-resolution knob, not a tolerance; lint owns the defect verdict. Self-skips (empty + a note, never raises) on codex-pdf < 1.43.0. Regions run on whole-reference compares; an instance-mapped candidate self-notes (per-instance crop is a follow-up).
  • Floor-pin codex-pdf>=1.43.0 (was >=1.37.0) for the align / diff primitives the registered compare consumes.
  • collision-free multi-up assignment for document compare. When several candidate 1-ups are compared against ONE stepped/gang reference, each candidate is now mapped to a distinct reference instance via a one-to-one assignment (compare/assignment.py), so two different candidates can no longer collide on the same cell — the “multiple 1-ups ↔ one multi-up” case. The solver is pure and deterministic (greedy on descending coverage-similarity with a stable tie-break), and keeps the non-destructive auto-detect guard via a per-candidate eligibility floor (an auto-detected instance only wins when it beats matching the whole sheet; an operator-declared gang is trusted). Single-candidate compares are unchanged. Still verdict-free — the assignment is a structural FACT and match_score a raw similarity; lint owns acceptability.
  • per-instance consistency is scored against a population consensus, not an arbitrary anchor. consistency_check anchored every instance to window[0], so when the first step was the defective cell it reported the wrong outlier_instance_index (a good cell) and inverted the per-instance consistency_score. Now it picks the medoid window (highest summed cross-correlation to all others) as the reference and scores every window against it; the cheap coverage tier compares against the per-ink median. The lowest-scoring window is the outlier — correct even when instance 0 is the bad one. Neutral facts only; output keys/shape unchanged.
  • uniform-solid step cells no longer read as inconsistent. _xcorr_peak returned 0.0 for any constant-solid crop, so two genuinely-identical 100%-inked cells that differed by one pixel of crop width (normal pitch rounding) were flagged inconsistent with a spurious outlier. A constant-vs-constant pair now resolves by equality — 1.0 when both are the same uniform value, 0.0 only when one is constant and the other is not.
  • plate compare now detects a stepped/gang grid by default. The plate rasters that drive the stepped layer (detect_repeat + per-instance compare) were only decoded when the ai or diff_images lane was requested — so a plain plate compare silently never detected the gang (repeat/instances stayed empty unless you also asked for AI). Decode the rasters unconditionally; the stepped layer already self-skips when there’s no repeating grid, so a single-1-up plate compare is unaffected bar the cheap decode. Gang detection is a core capability, not an AI add-on.
  • POST /v1/compare/plates now accepts expected_n_across / expected_n_down — the operator-declared step-and-repeat grid for a gang/stepped plate set, the same hint the document compare already takes. collate echoes it into the repeat facts (the detected-vs-expected verdict stays lint’s). The core compare_plates_to_pdf and the CollateClient already supported the params; the HTTP route was the missing link (it silently dropped them), so the synergy gateway’s hint never reached the engine. Plate-compare and document-compare are now at parity on the operator-layout input.
  • document compare: an auto-detected repeat no longer breaks a 1-up↔1-up compare. codex’s detect_repeat can false-positive a single 1-up as a stepped sheet (a carton’s symmetric panels read as a vertical step), which split the reference into half-page cells and compared a full candidate against a crop — wrecking the deltas and the match_score (an identical file scored ~0.62). The candidate→region mapper now seeds with the whole reference page for AUTO-detected layouts: an instance wins only if it strictly beats the whole-sheet match, so a false repeat is harmless while a real gang still maps per-instance. An operator-declared gang (expected_n_across / expected_n_down) keeps trusting instance mapping. The detection fact is still surfaced in reference_repeat either way — collate states the geometry, lint judges.
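
The whole-sheet seeding rule and the one-to-one assignment described above can be sketched together. This is a minimal illustration under stated assumptions, not the code in compare/assignment.py: the function name assign_candidates, the similarity-matrix input, and the candidate-then-instance tie-break are all hypothetical.

```python
from typing import Optional


def assign_candidates(
    similarity: list[list[float]],
    whole_sheet: list[float],
    operator_declared: bool = False,
) -> list[Optional[int]]:
    """Greedy one-to-one mapping of candidates to reference instances.

    similarity[c][i] is candidate c's coverage similarity to instance i;
    whole_sheet[c] is its similarity to the whole reference page.
    Returns, per candidate, the assigned instance index or None (whole sheet).
    All names here are hypothetical, for illustration only.
    """
    pairs = []
    for c, row in enumerate(similarity):
        for i, score in enumerate(row):
            # Eligibility floor: an auto-detected instance must strictly beat
            # the whole-sheet match; an operator-declared gang is trusted.
            if operator_declared or score > whole_sheet[c]:
                pairs.append((-score, c, i))

    # Descending similarity; ties resolved by candidate, then instance index,
    # so the result is deterministic.
    pairs.sort()

    assigned: list[Optional[int]] = [None] * len(similarity)
    used_candidates: set[int] = set()
    used_instances: set[int] = set()
    for _neg_score, c, i in pairs:
        if c in used_candidates or i in used_instances:
            continue  # one-to-one: never reuse a candidate or a cell
        assigned[c] = i
        used_candidates.add(c)
        used_instances.add(i)
    return assigned


# Two candidates both prefer instance 0; the stronger match wins it.
print(assign_candidates([[0.9, 0.8], [0.95, 0.7]], [0.5, 0.5]))  # [1, 0]
```

With an auto-detected layout, a candidate whose instance scores never exceed its whole-sheet score maps to None (the whole reference), which is why a false repeat is harmless in this model.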

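The content-addressed cache key mentioned in the async jobs entry (SHA-256 over the bytes, params, engine version, and schema version) might be derived along these lines. A sketch only: the function name cache_key, the length-prefixing of inputs, and the JSON canonicalisation are assumptions, not collate's actual scheme.

```python
import hashlib
import json


def cache_key(
    documents: list[bytes],
    params: dict,
    engine_version: str,
    schema_version: str,
) -> str:
    """Return a hex SHA-256 digest identifying one compare request.

    Hypothetical helper: the real key layout in collate is not documented here.
    """
    digest = hashlib.sha256()
    for blob in documents:
        # Length-prefix each input so (b"a", b"bc") and (b"ab", b"c") differ.
        digest.update(len(blob).to_bytes(8, "big"))
        digest.update(blob)
    # Canonical JSON: sorted keys so parameter order does not change the key.
    canonical = json.dumps(
        {"params": params, "engine": engine_version, "schema": schema_version},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()


key_a = cache_key([b"candidate", b"reference"], {"dpi": 150, "page": 1}, "1.0", "1.1.0")
key_b = cache_key([b"candidate", b"reference"], {"page": 1, "dpi": 150}, "1.0", "1.1.0")
print(key_a == key_b)  # True: same inputs and params, regardless of key order
```

Including the engine and schema versions in the hashed material is what makes a release rotate the cache: the same files compared under a new version produce a different key.
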
collate’s second comparison capability — the “global vision” document compare (POST /v1/compare/documents). It compares 1+ candidate PDFs against one reference PDF (a single 1-up OR a stepped/gang/imposed sheet), auto-aligns the inks, and reports neutral measured differences + a RAW per-candidate match score. Same objective-layer discipline as the plate compare: collate MEASURES, lint JUDGES — no verdict, no tolerance constant, no *_mismatch flag, and the match_score is documented explicitly as a raw similarity, not a pass/fail.

  • POST /v1/compare/documents — multipart candidates (1+ PDFs) + reference (one PDF) + page / dpi / ai / diff_images / expected_n_across / expected_n_down form fields. Returns a CollateDocumentCompareResult. Validates ≥1 candidate + a reference, caps the candidate fan-out + total upload size, runs the CPU work via asyncio.to_thread, and emits RFC 7807 Problem Details on bad input.
  • CollateDocInkDelta — one ink’s candidate↔reference FACT with neutral candidate/reference naming (presence ∈ {both, candidate_only, reference_only}, signed coverage_delta_percent = candidate − reference, geometry_delta_mm, Pantone match, optional AI note). No *_mismatch flag.
  • CollateDocumentMatch: one candidate’s result. Fields: candidate_index / candidate_name, mapped_instance_index (which reference instance/region the candidate aligned to for a stepped/gang reference; None for a single 1-up), the per-ink inks, inks_in_both, a RAW match_score (0..1, computed as 1 − mean(|coverage_delta|/100) over shared inks minus a small per-presence-mismatch penalty, clamped; NOT a pass/fail), and optional diff_images.
  • CollateDocumentCompareResult: candidates, reference_separations, reference_rendered (false when gs unavailable → candidate-side facts only + a note; consumers must not read absence as a clean compare), reference_repeat + reference_instances + layout_source (the reference sheet’s detected step-and-repeat / gang layout, empty for a single 1-up), ai_used, notes.
  • collate.compare.documents module (compare_documents + helpers). It renders both sides through the existing codex tiffsep seam (pdf_separation_coverage), reuses codex’s detect_repeat to find the reference layout (wrapping the reference’s rendered separation PNGs into bool rasters for codex’s own detector), and maps each candidate to its best-matching reference instance by coverage similarity. collate still owns no raster code. Self-skips (never raises) on missing Ghostscript.
  • CollateClient.compare_documents([...], reference_bytes, ...) — HTTP-first with the in-process fallback, mirroring compare_plates.
  • COMPARE_SCHEMA_VERSION bumped 1.0.0 → 1.1.0 (new top-level document-compare shape; the plate-compare result is unchanged).
  • Honest limits recorded in notes: no rotation/scale invariance; the gang/stepped candidate→instance mapping is best-effort coverage matching; a low-confidence layout is relayed (codex’s confidence) and left to the consumer.
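
The raw match_score stated for CollateDocumentMatch can be worked through as follows. A sketch only: the penalty size (0.05 per presence mismatch) and the score returned when no inks are shared are assumptions, since the entry above says only that the penalty is small and the result is clamped.

```python
def raw_match_score(
    ink_facts: list[tuple[str, float]],
    presence_penalty: float = 0.05,  # assumed value; the real constant is not documented
) -> float:
    """Raw similarity in [0, 1] from per-ink (presence, coverage_delta_percent) facts.

    presence is one of "both", "candidate_only", "reference_only".
    """
    shared = [delta for presence, delta in ink_facts if presence == "both"]
    mismatches = sum(1 for presence, _ in ink_facts if presence != "both")

    if shared:
        mean_abs_delta = sum(abs(delta) for delta in shared) / len(shared)
        base = 1.0 - mean_abs_delta / 100.0
    else:
        base = 0.0  # assumption: nothing shared means nothing measurably similar

    score = base - presence_penalty * mismatches
    return max(0.0, min(1.0, score))  # clamp to [0, 1]


facts = [("both", 10.0), ("both", -30.0), ("candidate_only", 5.0)]
print(raw_match_score(facts))
```

Here the two shared inks give a mean absolute delta of 20 points (base 0.8) and the one candidate-only ink subtracts the assumed penalty. The number remains a raw similarity; deciding whether it is acceptable is lint's job.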

The stepped / gang (step-and-repeat) compare layer, ported faithfully from codex and kept 100% neutral: it adds grid geometry + per-instance difference facts while introducing no tolerance, no *_mismatch status, and no overall_match — the objective layer stays verdict-free.

  • Step-and-repeat auto-detect: with no layout hint, collate detects a repeating grid from the decoded plate rasters (reusing codex’s detect_repeat — a single-plate fact codex owns) and reports the grid as a neutral CollateRepeat (counts, pitch, gutter, offset, work-and-turn flip, codex’s confidence, source). Each detected cell becomes a CollateInstance with per-ink CollateInkDeltas (presence + raw coverage / geometry deltas — no thresholding).
  • Cross-instance consistency (instances_consistent + outlier_instance_index + per-instance consistency_score): a tiered coverage-vector / hash / cross-correlation measure of whether every step is the same artwork. It is a measurement, not a verdict — whether a flagged outlier is a defect is lint’s call.
  • Gang / operator layouts: operator_regions (or a parsed CIP3/JDF imposition_layout) supply authoritative region rects, yielding per-region instances with layout_source in {operator, cip3, jdf}; the gang region-to-pdf mapping (pdf_bytes_by_name) selects which 1-up each region compares against. The all-instances-identical check is skipped (gang regions are intentionally distinct).
  • CollateRepeat / CollateInstance neutral-facts models and the additive CollateCompareResult fields (repeat, instances, instances_consistent, outlier_instance_index, layout_source). These are additive optional fields; COMPARE_SCHEMA_VERSION stays 1.0.0.
  • New collate.compare.imposition module (windows_from_step / windows_from_regions / compare_instances / consistency_check + InstanceWindow). It reuses codex raster primitives only (detect_repeat, _coverage_facts, the decoded PlateRaster bitmaps) — collate still owns no raster code.
  • compare_plates_to_pdf gains additive kwargs imposition_layout, operator_regions, pdf_bytes_by_name, expected_n_across, expected_n_down. The stepped/gang layer runs in a try/except: when no layout applies (or it fails), the result is the byte-identical single-1-up aggregate with the new fields empty/None.
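
The cross-instance consistency measure, in its consensus-based form (a medoid reference rather than window[0]), can be illustrated as below. This sketch assumes a precomputed symmetric similarity matrix standing in for the cross-correlation tier; the function name medoid_outlier and the lowest-index tie-breaks are hypothetical.

```python
def medoid_outlier(sim: list[list[float]]) -> tuple[int, list[float], int]:
    """Pick the medoid window and score every window against it.

    sim[i][j] is the similarity between windows i and j (symmetric, 1.0 on
    the diagonal). Returns (medoid_index, consistency_scores, outlier_index).
    """
    n = len(sim)
    # Medoid: the window with the highest summed similarity to all the others.
    totals = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    medoid = max(range(n), key=lambda i: (totals[i], -i))

    # Every window is scored against the medoid, not against window 0.
    scores = [1.0 if i == medoid else sim[i][medoid] for i in range(n)]

    # The lowest-scoring window is reported as the outlier (a fact, not a verdict).
    outlier = min(range(n), key=lambda i: (scores[i], i))
    return medoid, scores, outlier


# Window 0 is the odd one out; windows 1 and 2 agree with each other.
similarity = [
    [1.0, 0.4, 0.4],
    [0.4, 1.0, 0.95],
    [0.4, 0.95, 1.0],
]
print(medoid_outlier(similarity))  # (1, [0.4, 1.0, 0.95], 0)
```

Anchoring on window 0 in this example would have scored the two good windows at 0.4 and flagged one of them; the medoid reference flags window 0 instead.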

The first cut of collate, the objective file-comparison engine — split out of codex so the comparison verdict leaves the extraction layer and the objective layer stays verdict-free.

  • Plate ↔ 1-up comparison (POST /v1/compare/plates): align a plate set (1+ TIFF/LEN separation files) against an approved 1-up PDF and report neutral comparison FACTS — per-ink signed coverage delta, geometry delta, structural presence (both / plate_only / pdf_only), Pantone match, and inks_in_both.
  • Optional per-ink difference images (diff_images=true) and an optional AI visual-diff note (ai=true, gated by CODEX_AI_ENABLED).
  • CollateInkDelta / CollateDiffImage / CollateCompareResult neutral-facts contract (COMPARE_SCHEMA_VERSION = 1.0.0), with no tolerance / verdict surface — pass/fail is lint’s policy call.
  • Ops endpoints: /healthz, /readyz, /v1/contract. RFC 7807 Problem Details error envelope.
  • CollateClient — HTTP-first with an in-process fallback (decode via codex + compare in process).
  • Reuses codex-pdf (>= 1.37.0) for plate decode, the Ghostscript tiffsep render, ink normalization, and the Pantone catalogue. collate adds no raster primitives of its own.
  • Self-skip on missing Ghostscript (plate-side facts + pdf_rendered=false + a note), never an exception.
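
The neutral per-ink facts of the plate ↔ 1-up comparison (structural presence, signed coverage delta, inks_in_both) can be sketched from two coverage maps. Assumptions are loud here: the function name ink_facts, the dictionary keys, and the sign convention (plate minus PDF) are hypothetical, and ink names are taken as already normalized.

```python
from typing import Optional


def ink_facts(
    plate_coverage: dict[str, float],
    pdf_coverage: dict[str, float],
) -> tuple[list[dict], list[str]]:
    """Build per-ink facts from coverage percentages keyed by ink name.

    Returns (facts, inks_in_both). No tolerance is applied and no verdict is
    produced; the output is measurements only.
    """
    facts: list[dict] = []
    inks_in_both: list[str] = []
    for ink in sorted(set(plate_coverage) | set(pdf_coverage)):
        in_plate = ink in plate_coverage
        in_pdf = ink in pdf_coverage
        delta: Optional[float] = None
        if in_plate and in_pdf:
            presence = "both"
            # Assumed sign convention: plate minus PDF.
            delta = plate_coverage[ink] - pdf_coverage[ink]
            inks_in_both.append(ink)
        elif in_plate:
            presence = "plate_only"
        else:
            presence = "pdf_only"
        facts.append({"ink": ink, "presence": presence, "coverage_delta": delta})
    return facts, inks_in_both


facts, shared = ink_facts(
    {"Cyan": 12.5, "PANTONE 185 C": 3.0},
    {"Cyan": 10.0, "Black": 4.0},
)
for fact in facts:
    print(fact)
print(shared)  # ['Cyan']
```

The sketch deliberately stops at facts: an ink present on only one side is reported as such, and whether that counts as a defect is left to lint.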