AutoResearch
Stale · Kept · Medium band · Family Manager

Flyer OCR pre-pass before extraction

Baseline: 71%
Final: 89%
Delta: +18 pts
Variants: 3
Objective

What we set out to improve

Reduce missed dates on scanned camp flyers by adding an OCR pre-pass before structured extraction.

Kept · Promoted to a template · Wrote to a KB

Kept. The high-DPI OCR pre-pass lifted date recall from 0.71 to 0.89 at a medium resource cost, and the heuristic was promoted to the family-documents knowledge base.
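
The page does not say how date recall is computed. As a minimal sketch, assuming recall is the fraction of expected dates on a flyer that the extractor returns, it could look like the following. The function name `date_recall` and the sample dates are hypothetical and are not taken from the run.

```python
from typing import Iterable


def date_recall(expected: Iterable[str], extracted: Iterable[str]) -> float:
    """Fraction of expected dates that the extractor actually returned."""
    expected_set = set(expected)
    if not expected_set:
        return 1.0  # nothing to miss, so nothing was missed
    return len(expected_set & set(extracted)) / len(expected_set)


# Illustrative values only, not data from the run.
gold = ["2025-06-16", "2025-06-20", "2025-07-04", "2025-07-11"]
found = ["2025-06-16", "2025-07-04", "2025-07-11", "2025-08-01"]
print(date_recall(gold, found))  # 0.75
```

Extra dates that are not in the expected set (the last entry in `found`) do not affect recall; they would only show up in a precision metric.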

Iterations

Variants we tried

Each variant is listed with its resource band and coarse objective metric. The kept variant is marked; bars are relative to the best run.

  1. Baseline: extraction only · Low · 71%
  2. Variant A: OCR pre-pass, default DPI · Medium · 83%
  3. Variant B: OCR pre-pass, high DPI + deskew · Winner · Medium · 89%
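
The variant scores above are enough to reproduce the headline delta and the "relative to the best run" bar lengths. This is a small sketch that assumes bars are scaled linearly against the top score; the page does not confirm the exact scaling.

```python
# Scores in percent, copied from the variant list on this page.
scores = {"Baseline": 71, "Variant A": 83, "Variant B": 89}

best = max(scores.values())

# Assumed bar length: each score as a whole-number percentage of the best run.
relative = {name: round(100 * score / best) for name, score in scores.items()}

# Delta in percentage points between the kept variant and the baseline.
delta_pts = scores["Variant B"] - scores["Baseline"]

print(relative)   # {'Baseline': 80, 'Variant A': 93, 'Variant B': 100}
print(delta_pts)  # 18
```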
Run

Stages

  1. baseline

    Succeeded · 4.2s

  2. variant run

    Succeeded · 9.8s

  3. eval

    Succeeded · 1.5s

  4. promote

    Succeeded · 0.3s

Output

Artifacts and what shipped

Redaction-safe artifact previews, diffs, metric tables, and prompt variants with sensitive text removed.

  • Metric table

    Date recall by variant (0.71 → 0.89)

  • Diff summary

    Pipeline diff: insert OCR pre-pass stage

  • KB write

    Promoted OCR heuristic to the family-docs KB
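
The pipeline diff itself is hidden, so the following is only a hedged sketch of what "insert OCR pre-pass stage" might mean structurally. Every name here (`extract_dates`, `make_ocr_prepass`, `insert_before`, `fake_ocr`) is hypothetical, the OCR engine is a stub rather than a real library call, and the date pattern is a simplification.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]
Stage = Callable[[Doc], Doc]
Pipeline = List[Tuple[str, Stage]]

DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # simplified ISO-style dates


def extract_dates(doc: Doc) -> Doc:
    """Stand-in extraction stage: pull dates from whatever text is present."""
    return {**doc, "dates": DATE_RE.findall(doc.get("text", ""))}


def make_ocr_prepass(ocr: Callable[[bytes], str]) -> Stage:
    """Wrap an OCR callable as a stage that fills in missing text."""
    def stage(doc: Doc) -> Doc:
        if doc.get("text"):
            return doc  # already has a text layer, skip OCR
        return {**doc, "text": ocr(doc["image"])}
    return stage


def insert_before(pipeline: Pipeline, target: str, name: str, stage: Stage) -> Pipeline:
    """Return a new pipeline with (name, stage) placed just before `target`."""
    idx = next(i for i, (n, _) in enumerate(pipeline) if n == target)
    return pipeline[:idx] + [(name, stage)] + pipeline[idx:]


def run(pipeline: Pipeline, doc: Doc) -> Doc:
    for _, stage in pipeline:
        doc = stage(doc)
    return doc


def fake_ocr(image: bytes) -> str:
    """Stub OCR engine; the text is illustrative, not from a real flyer."""
    return "Camp starts 2025-06-16"


baseline: Pipeline = [("extract", extract_dates)]
with_ocr = insert_before(baseline, "extract", "ocr_prepass", make_ocr_prepass(fake_ocr))

scanned = {"image": b"scanned-flyer-bytes"}
print(run(baseline, scanned)["dates"])  # []
print(run(with_ocr, scanned)["dates"])  # ['2025-06-16']
```

On a scanned flyer with no text layer, the baseline pipeline has nothing to extract from, which is the failure mode the objective describes. The pre-pass supplies text before extraction runs.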

What you can see, and what is hidden

Every projection on this page is redaction-safe by construction. Redaction level: sample content (curated, public-safe excerpts only).

Shown

  • Identifiers & counts
  • Closed-enum statuses
  • Coarse quality / resource bands
  • Timestamps & freshness

Intentionally hidden

  • Raw prompts
  • Raw documents
  • Raw tool logs
  • Raw trace spans
  • Embedding vectors
  • Free-text feedback
  • Auth internals & secrets
