AutoResearch
Stale · Inconclusive · Medium band · Family Manager

Shared-source hint confidence thresholds

Baseline: 62%
Final: 62%
Delta: +0 pts
Variants: 3
Objective

What we set out to improve

Tune the confidence threshold for suggesting that a source is shared across family members, without over-suggesting on weak signal.

Inconclusive

The eval stage failed before producing a verdict because the holdout set could not be assembled, so no threshold change was promoted. The baseline 0.7 threshold was retained, and the experiment is queued for a clean re-run.
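The retention rule described above can be sketched as a small gate. This is a minimal illustration, not the pipeline's actual code: `EvalResult`, `decide_threshold`, and the verdict strings are hypothetical names, and only the 0.7 baseline comes from this report.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BASELINE_THRESHOLD = 0.7  # the baseline threshold named in this report


@dataclass
class EvalResult:
    """Hypothetical summary of the eval stage."""
    succeeded: bool
    best_threshold: Optional[float] = None  # only meaningful when succeeded


def decide_threshold(
    result: EvalResult, baseline: float = BASELINE_THRESHOLD
) -> Tuple[str, float]:
    """Return (verdict, threshold to use). A failed eval never promotes a change."""
    if not result.succeeded or result.best_threshold is None:
        # No verdict was produced, so the baseline stays in place.
        return ("inconclusive", baseline)
    if result.best_threshold == baseline:
        return ("no_change", baseline)
    return ("promoted", result.best_threshold)
```

Under these assumptions, a failed eval stage always maps to an inconclusive verdict with the baseline threshold, which matches the outcome reported here.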

Iterations

Variants we tried

Each variant and its coarse objective metric. The kept variant is marked; bars are relative to the best run.

  • 1. Baseline: current 0.7 threshold (Low, 62%)
  • 2. Variant A: lower 0.5 threshold (Medium, 60%)
  • 3. Variant B: 0.8 threshold + recency boost (Medium, 62%)
Run

Stages

  1. baseline

    Succeeded · 2.0s

  2. variant run

    Succeeded · 7.1s

  3. eval

    Failed · 640ms

Output

Artifacts and what shipped

Redaction-safe artifact previews, diffs, metric tables, and prompt variants with sensitive text removed.

  • Metric table

    Hint precision by threshold (eval incomplete)

  • Report

    Inconclusive: the eval stage failed before a verdict
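For readers unfamiliar with the metric table's title, hint precision at a given threshold could be computed roughly as follows. This is a sketch under stated assumptions: the input is a hypothetical list of (confidence, is_actually_shared) pairs from a labeled holdout set, and a hint is assumed to fire when confidence is greater than or equal to the threshold. Neither detail is confirmed by this page.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Scored = Tuple[float, bool]  # (model confidence, whether the source is truly shared)


def hint_precision(scored: Iterable[Scored], threshold: float) -> Optional[float]:
    """Fraction of suggested hints that were correct; None if no hint fired."""
    # Assumption: a hint is suggested when confidence >= threshold.
    suggested: List[bool] = [shared for conf, shared in scored if conf >= threshold]
    if not suggested:
        return None  # precision is undefined with zero suggestions
    return sum(suggested) / len(suggested)


def precision_table(
    scored: Iterable[Scored], thresholds: Iterable[float]
) -> Dict[float, Optional[float]]:
    """Hint precision for each candidate threshold, e.g. 0.5, 0.7, 0.8."""
    rows = list(scored)  # materialize once so every threshold sees the same data
    return {t: hint_precision(rows, t) for t in thresholds}
```

Returning `None` when no hint fires keeps an empty or unassembled holdout set from being mistaken for a precision of zero.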

What you can see, and what is hidden

Every projection on this page is redaction-safe by construction. Redaction level: sample content (curated, public-safe excerpts only).

Shown

  • Identifiers & counts
  • Closed-enum statuses
  • Coarse quality / resource bands
  • Timestamps & freshness

Intentionally hidden

  • Raw prompts
  • Raw documents
  • Raw tool logs
  • Raw trace spans
  • Embedding vectors
  • Free-text feedback
  • Auth internals & secrets
