Preflight capability map
LintPDF preflight capability map (engine)
This document maps what the lint-pdf/ engine can detect today to the major preflight “surfaces” (print production, packaging, accessibility/PDF-UA, and AI regulatory checks), and to the builtin profiles that enable them.
Profiles (builtin)
- lintpdf-default.json
  - Enabled: LPDF_*, PDFX4-*, PDFX1A-*, PDFA-*, AI_*
  - Conformance target: pdfx4
  - Notes: AI enabled (categories=["all"])
- lintpdf-strict.json
  - Enabled: LPDF_*, PDFX4-*
  - Conformance target: pdfx4
  - Notes: stricter DPI + small-text thresholds; AI disabled (no AI_*)
- lintpdf-advisory-only.json
  - Enabled: LPDF_* (with max_severity="advisory")
  - Conformance target: none
  - Notes: good for “non-blocking” runs; explicitly disables PDFX4-*
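The Enabled entries read like glob patterns over check IDs. The following is a minimal sketch of that matching, assuming shell-style globbing and case-sensitive comparison; neither is confirmed by the profile files listed here. The helper is_enabled is hypothetical, and AI_WCAG_001 and PDFX4-001 are illustrative IDs, not documented ones.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Enabled patterns copied from the builtin profile list.
PROFILES: Dict[str, List[str]] = {
    "lintpdf-default": ["LPDF_*", "PDFX4-*", "PDFX1A-*", "PDFA-*", "AI_*"],
    "lintpdf-strict": ["LPDF_*", "PDFX4-*"],
    "lintpdf-advisory-only": ["LPDF_*"],
}


def is_enabled(check_id: str, patterns: List[str]) -> bool:
    """Return True if check_id matches any enabled glob pattern."""
    # fnmatchcase never normalizes case, so "lpdf_box_005" would not match.
    return any(fnmatchcase(check_id, pattern) for pattern in patterns)


# Illustrative IDs only: strict has no AI_* pattern, advisory-only has no PDFX4-*.
strict_runs_ai = is_enabled("AI_WCAG_001", PROFILES["lintpdf-strict"])
advisory_runs_pdfx = is_enabled("PDFX4-001", PROFILES["lintpdf-advisory-only"])
```

A pattern match only says a check is allowed to report. Whether the analyzer behind it is registered at all is a separate question.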
Engine pipeline (what runs)
The main pipeline is PreflightOrchestrator.run() in src/lintpdf/profiles/orchestrator.py:
- Parse PDF (pikepdf)
- Build semantic model (pages, resources, fonts, images, boxes)
- Interpret content streams into events (text/image/path paint events)
- Run engine analyzers (deterministic LPDF_*)
- Run built-in conformance validators (PDF/X, PDF/A variants) where configured
- Run veraPDF conformance (PDF/X, PDF/A, PDF/UA) when configured and opted-in
- Run OCR text-region pass (best-effort; enables outlined-text heuristics)
- Run AI analyzers (if profile enables AI_* and profile.ai.enabled)
- Filter/override findings per profile rules; enrich bboxes from events
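The stage order and gating conditions can be sketched as a small planning function. This is not the code in orchestrator.py. The Profile fields, stage names, and plan_stages are all hypothetical, chosen only to mirror the list of steps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Profile:
    # Hypothetical fields; the real profile schema is not shown in this document.
    enabled_patterns: List[str]
    conformance: Optional[str] = None   # e.g. "pdfx4"
    verapdf_configured: bool = False
    verapdf_opted_in: bool = False
    ai_enabled: bool = False


def plan_stages(profile: Profile) -> List[Tuple[str, bool]]:
    """Return (stage name, will run) pairs in pipeline order."""
    wants_ai = any(p.startswith("AI_") for p in profile.enabled_patterns)
    return [
        ("parse_pdf", True),
        ("build_semantic_model", True),
        ("interpret_content_streams", True),
        ("engine_analyzers", True),
        ("builtin_conformance", profile.conformance is not None),
        ("verapdf_conformance",
         profile.verapdf_configured and profile.verapdf_opted_in),
        ("ocr_text_regions", True),
        # Both conditions must hold: an AI_* pattern and the AI switch.
        ("ai_analyzers", wants_ai and profile.ai_enabled),
        ("filter_and_enrich", True),
    ]


strict = Profile(["LPDF_*", "PDFX4-*"], conformance="pdfx4")
skipped = [name for name, will_run in plan_stages(strict) if not will_run]
```

Returning the plan as data rather than running stages directly makes it easy to log which stages were skipped for a given profile.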
Detection domains (non-AI, LPDF_*)
Print production & structural
- Page boxes / bleed / safety / dimensions: PageGeometryAnalyzer
  - IDs: LPDF_BOX_* (incl. LPDF_BOX_005, LPDF_BOX_006, LPDF_BOX_010)
- Fonts: FontAnalyzer
  - IDs: LPDF_FONT_*
- Images / effective DPI / compression: ImageAnalyzer
  - IDs: LPDF_IMG_*
- Transparency / blend-space: TransparencyAnalyzer
  - IDs: LPDF_TRANS_*
- Overprint: OverprintAnalyzer
  - IDs: LPDF_OVER_*
- ICC profiles / output intent: IccProfileAnalyzer
  - IDs: LPDF_ICC_*
- Metadata / XMP / language: MetadataAnalyzer
  - IDs: LPDF_META_*, LPDF_LANG_*, LPDF_XMP_*
- Structure / encryption / interactive features: StructureAnalyzer, DocumentAnalyzer, AnnotationAnalyzer
  - IDs: LPDF_STRUCT_*, LPDF_DOC_*, LPDF_ANNOT_* (varies by module)
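This document does not define how ImageAnalyzer computes effective DPI. The conventional definition is pixel count divided by placed size in inches, where a PDF image occupies the unit square mapped through the current transformation matrix. A minimal sketch under that assumption follows; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def placed_size_pt(ctm: Sequence[float]) -> Tuple[float, float]:
    """Width and height in points of the unit square after CTM [a b c d e f]."""
    a, b, c, d = ctm[0], ctm[1], ctm[2], ctm[3]
    # The image's x axis maps to (a, b) and its y axis to (c, d).
    return math.hypot(a, b), math.hypot(c, d)


def effective_dpi(pixel_w: int, pixel_h: int,
                  ctm: Sequence[float]) -> Tuple[float, float]:
    """Horizontal and vertical pixels per inch at the placed size."""
    width_pt, height_pt = placed_size_pt(ctm)
    if width_pt <= 0 or height_pt <= 0:
        raise ValueError("image is placed with zero size")
    # 72 points per inch in default PDF user space.
    return pixel_w * 72.0 / width_pt, pixel_h * 72.0 / height_pt


# A 1200 x 900 px image placed at 288 x 216 pt (4 x 3 inches).
dpi_x, dpi_y = effective_dpi(1200, 900, (288, 0, 0, 216, 36, 36))
```

Because the placed size comes from the matrix, the same image scaled up on the page yields a lower effective DPI even though its pixel data is unchanged.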
Hairlines, strokes, and legibility
- HairlineAnalyzer: detects strokes and text rendered as paths that are too thin to reproduce reliably on press. Walks the full content stream (CTM, ExtGState, color state) to compute effective line widths and font sizes after all transformations.
  - IDs: LPDF_STROKE_001 (hairline stroke), LPDF_STROKE_002 (very thin stroke), LPDF_PATH_001/LPDF_PATH_002 (thin path fill), LPDF_TEXT_001/LPDF_TEXT_002/LPDF_TEXT_003 (thin text stroke / small text)
- LegibilityCompositeAnalyzer: catches small outlined text (text rendered in stroke-only mode, e.g., converted to outlines at a tiny point size).
  - ID: LPDF_TEXT_OUTLINED_SMALL
Codex path note (0.1.0b23+): these checks previously produced zero findings on the codex path because the event stream was empty. codex_adapter_events.py now walks pages with pikepdf and emits real PathPaintingEvent/TextRenderedEvent objects so all hairline and legibility checks fire correctly.
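To illustrate what "effective line width after all transformations" can mean, the sketch below scales a nominal line width by the singular values of the accumulated CTM, so a non-uniform scale yields a thinnest and a thickest direction. This is one reasonable approach, not necessarily the one HairlineAnalyzer uses. The 0.25 pt threshold and all function names are assumptions.

```python
import math
from typing import Sequence, Tuple

HAIRLINE_PT = 0.25  # hypothetical threshold, not taken from the engine


def stroke_scale_range(ctm: Sequence[float]) -> Tuple[float, float]:
    """Smallest and largest scale factor of the CTM's linear part [a b c d]."""
    a, b, c, d = ctm[0], ctm[1], ctm[2], ctm[3]
    total = a * a + b * b + c * c + d * d
    spread = math.sqrt((a * a + b * b - c * c - d * d) ** 2
                       + 4.0 * (a * c + b * d) ** 2)
    largest = math.sqrt((total + spread) / 2.0)
    # Clamp at zero to absorb floating-point rounding.
    smallest = math.sqrt(max(0.0, (total - spread) / 2.0))
    return smallest, largest


def effective_width_range(line_width_pt: float,
                          ctm: Sequence[float]) -> Tuple[float, float]:
    """Thinnest and thickest rendered width of a stroke, in points."""
    smallest, largest = stroke_scale_range(ctm)
    return line_width_pt * smallest, line_width_pt * largest


def is_hairline(line_width_pt: float, ctm: Sequence[float],
                threshold_pt: float = HAIRLINE_PT) -> bool:
    """Flag a stroke whose thinnest rendered width is below the threshold."""
    thinnest, _ = effective_width_range(line_width_pt, ctm)
    return thinnest < threshold_pt
```

With these definitions, a 0.4 pt stroke drawn inside a 50% scale renders at 0.2 pt and is flagged, while the same stroke under the identity matrix is not. The ctm argument is expected to be the product of every cm operator in effect when the path is painted.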
Color, ink coverage, and prepress heuristics
- Ink coverage: InkCoverageAnalyzer
  - IDs: LPDF_INK_*
- Spot colors: SpotColorAnalyzer + spot-name analyzers
  - IDs: LPDF_SPOT_*, LPDF_SPOT_NAME_*
- Process/pure-K/rich-black classification:
  - ColorAnalyzer (IDs: LPDF_COLOR_009, LPDF_COLOR_010, …)
  - AdvancedColorAnalyzer (IDs: LPDF_ADV_* incl. LPDF_ADV_005)
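For readers unfamiliar with the terms, ink coverage is usually expressed as total area coverage (the sum of the four CMYK percentages), and a black is "pure K" when only the K channel carries ink and "rich" when CMY is added. A minimal sketch follows. The thresholds and function names are assumptions and do not reflect the engine's actual limits.

```python
def total_area_coverage(c: float, m: float, y: float, k: float) -> float:
    """Total ink coverage in percent for CMYK components given in 0..1."""
    return 100.0 * (c + m + y + k)


def classify_black(c: float, m: float, y: float, k: float,
                   k_min: float = 0.95, cmy_eps: float = 0.01) -> str:
    """Classify a CMYK color as 'pure-k', 'rich-black', or 'not-black'.

    k_min and cmy_eps are hypothetical tolerances chosen for illustration.
    """
    if k < k_min:
        return "not-black"
    if max(c, m, y) <= cmy_eps:
        return "pure-k"
    return "rich-black"


# A common rich-black build (60/40/40/100) sums to 240% coverage.
build = (0.6, 0.4, 0.4, 1.0)
coverage = total_area_coverage(*build)
kind = classify_black(*build)
```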
Packaging-specific
- PackagingAnalyzer (only conditionally added when profile id contains "packaging")
  - IDs: LPDF_PKG_*
- Dieline family: DielineIso19593Analyzer, DielinePerfIndicatorAnalyzer, plus orchestrator dieline detection attachment
  - IDs: AI_DIE_* (AI) and LPDF_* packaging geometry checks depending on analyzer
- Seal zone / keepout: SealZoneKeepoutAnalyzer
  - ID: LPDF_BOX_SEAL_ZONE_VIOLATION
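The conditional registration of PackagingAnalyzer means two independent conditions decide whether an LPDF_PKG_* finding can appear. The sketch below models both, assuming a plain case-sensitive substring test on the profile id and glob matching on check patterns. The helper names, the profile id lintpdf-packaging, and the ID LPDF_PKG_001 are illustrative only.

```python
from fnmatch import fnmatchcase
from typing import List


def packaging_analyzer_added(profile_id: str) -> bool:
    """Assumed registration rule: the profile id contains 'packaging'."""
    return "packaging" in profile_id


def can_report(check_id: str, profile_id: str, patterns: List[str]) -> bool:
    """A packaging finding needs a matching pattern and a registered analyzer."""
    pattern_ok = any(fnmatchcase(check_id, p) for p in patterns)
    return pattern_ok and packaging_analyzer_added(profile_id)


# Same patterns, different profile ids: only the second can emit LPDF_PKG_*.
in_default = can_report("LPDF_PKG_001", "lintpdf-default", ["LPDF_*"])
in_packaging = can_report("LPDF_PKG_001", "lintpdf-packaging", ["LPDF_*"])
```

This shows why a clean report from a non-packaging profile says nothing about packaging checks, even when LPDF_* is enabled.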
Barcodes
- BarcodeAnalyzer
  - 1D candidate detection + grading: LPDF_BARCODE_001–013, 019–031
  - 2D fill-grid heuristics: LPDF_BARCODE_014–018
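Because the 1D range is split around the 2D block, downstream tooling that groups barcode findings has to handle three intervals. A small sketch of that lookup, using only the ranges listed here, follows. The function name, the return labels, and the assumption that IDs use a three-digit suffix are mine.

```python
import re
from typing import Optional

_BARCODE_ID = re.compile(r"LPDF_BARCODE_(\d{3})")


def barcode_family(check_id: str) -> Optional[str]:
    """Map an LPDF_BARCODE_NNN id to '1d' or '2d'; None if out of range."""
    match = _BARCODE_ID.fullmatch(check_id)
    if match is None:
        return None
    number = int(match.group(1))
    if 1 <= number <= 13 or 19 <= number <= 31:
        return "1d"   # candidate detection + grading
    if 14 <= number <= 18:
        return "2d"   # fill-grid heuristics
    return None
```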
Conformance (PDF/X, PDF/A, PDF/UA)
- Built-in validators (engine-side) run based on profile.conformance:
  - pdfx4, pdfx1a, pdfx3, pdfa1b/2b/3b
- veraPDF runner (src/lintpdf/conformance/verapdf_runner.py) runs when configured and emits:
  - LPDF_PDFX_CONF (PDF/X)
  - LPDF_PDFA_CONF (PDF/A)
  - LPDF_UA_CONF (PDF/UA-1; only when profile opts into LPDF_UA_*)
AI analyzers (AI_*)
AI analyzers are registered in src/lintpdf/ai/** and run only when the profile enables AI.
Common families:
- Regulatory compliance: AI_EU1169_*, AI_PHARMA_*, AI_FDA_*, AI_GHS_*, AI_COSM_*, LPDF_TOBACCO_* (implemented as AI analyzer)
- Accessibility (contrast): AI_WCAG_*
Notable gaps / footguns to watch
Section titled “Notable gaps / footguns to watch”- “Enabled pattern” != “actually runs”: some analyzers are conditionally added (e.g. packaging analyzer depends on profile id, not checks pattern).
- veraPDF “silent skip”: when veraPDF is unreachable, findings are suppressed (good for resilience, but you need metadata to know it didn’t run).
- Outlined-text / hidden-layer text: hairline and legibility checks now work on the codex path (0.1.0b23+). The content_stream string-presence check (LPDF_BOX_004) was tightened in 0.1.0b22 to require four independent structural signals before flagging a page as empty.
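The veraPDF "silent skip" is easier to reason about when the run status travels with the findings, so that "no findings" and "did not run" stay distinguishable. The sketch below shows one way a caller might model this. It is not the structure verapdf_runner.py returns, and every name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ConformanceOutcome:
    # One of "ran", "skipped_unreachable", "not_configured" (assumed labels).
    status: str
    findings: List[str] = field(default_factory=list)


def run_verapdf(configured: bool, reachable: bool,
                raw_findings: Iterable[str]) -> ConformanceOutcome:
    """Wrap a veraPDF attempt so a skip is recorded instead of hidden."""
    if not configured:
        return ConformanceOutcome("not_configured")
    if not reachable:
        # Findings are suppressed, but the status says why.
        return ConformanceOutcome("skipped_unreachable")
    return ConformanceOutcome("ran", list(raw_findings))


def is_verified_clean(outcome: ConformanceOutcome) -> bool:
    """True only when veraPDF actually ran and reported nothing."""
    return outcome.status == "ran" and not outcome.findings
```

A gate built on is_verified_clean fails closed: an unreachable veraPDF service yields an empty findings list but is not treated as a pass.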