AutoResearch
Stale · Inconclusive · Medium band · Family Manager
Shared-source hint confidence thresholds
- Baseline: 62%
- Final: 62%
- Delta: +0 pts
- Variants: 3
Objective
What we set out to improve
Tune the confidence threshold for suggesting that a source is shared across family members, without over-suggesting on weak signal.
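As a rough illustration of what this threshold controls, the sketch below gates a shared-source hint on a confidence score. The `SourceCandidate` type, the `confidence` field, and the choice of `>=` at the boundary are assumptions made for the example; the report does not describe how the score is computed or compared.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceCandidate:
    """Hypothetical record: a source and the score that it is shared."""
    source_id: str
    confidence: float  # assumed to lie in [0, 1]


def suggest_shared(candidates: List[SourceCandidate], threshold: float = 0.7) -> List[str]:
    """Return the ids of sources whose confidence meets the threshold.

    Treating the boundary as inclusive (>=) is an assumption.
    """
    return [c.source_id for c in candidates if c.confidence >= threshold]


# Made-up scores, used only to show how the three thresholds differ.
candidates = [
    SourceCandidate("a", 0.55),
    SourceCandidate("b", 0.72),
    SourceCandidate("c", 0.85),
]

at_baseline = suggest_shared(candidates, threshold=0.7)
at_lower = suggest_shared(candidates, threshold=0.5)
at_higher = suggest_shared(candidates, threshold=0.8)
```

A lower threshold surfaces more hints, including weaker ones, while a higher threshold suppresses all but the strongest signal. That trade-off is what the objective is trying to balance.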
Inconclusive
The eval stage failed before producing a verdict because the holdout set could not be assembled, so no threshold change was promoted. The baseline threshold of 0.7 was retained and the experiment is queued for a clean re-run.
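The outcome above follows a common guard: a variant is promoted only when the eval stage has actually produced a verdict. A minimal sketch of that decision, assuming a hypothetical `decide` function and a made-up minimum gain of 1 point (neither appears in the report):

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PROMOTE = "promote"
    KEEP_BASELINE = "keep_baseline"
    INCONCLUSIVE = "inconclusive"


def decide(
    baseline_pct: float,
    best_variant_pct: Optional[float],
    eval_succeeded: bool,
    min_gain_pts: float = 1.0,  # hypothetical promotion margin
) -> Verdict:
    """Pick an outcome for the run.

    If the eval stage failed, no score comparison is trusted and the
    run is reported as inconclusive, leaving the baseline in place.
    """
    if not eval_succeeded or best_variant_pct is None:
        return Verdict.INCONCLUSIVE
    if best_variant_pct - baseline_pct >= min_gain_pts:
        return Verdict.PROMOTE
    return Verdict.KEEP_BASELINE


# This run: baseline 62%, best variant 62%, eval stage failed.
this_run = decide(62.0, 62.0, eval_succeeded=False)
```

Checking the eval status before comparing scores means a failed holdout assembly is never mistaken for a measured "no improvement" result.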
Iterations
Variants we tried
Each variant and its coarse objective metric. The kept variant is marked; bars are relative to the best run.
- 1. Baseline: current 0.7 threshold · Low · 62%
- 2. Variant A: lower 0.5 threshold · Medium · 60%
- 3. Variant B: 0.8 threshold + recency boost · Medium · 62%
Run
Stages
- baseline: Succeeded · 2.0s
- variant run: Succeeded · 7.1s
- eval: Failed · 640ms
Output
Artifacts and what shipped
Redaction-safe artifact previews, diffs, metric tables, and prompt variants with sensitive text removed.
- Metric table: Hint precision by threshold (eval incomplete)
- Report: Inconclusive, the eval stage failed before a verdict