Stale · Inconclusive · Medium band · Family Manager

Shared-source hint confidence thresholds

Baseline: 62%
Final: 62%
Delta: +0 pts
Variants: 3

Objective

What we set out to improve

Tune the confidence threshold for suggesting that a source is shared across family members, without over-suggesting on weak signal.
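A minimal sketch of the kind of gate being tuned, assuming each candidate source carries a confidence score in [0, 1] and a hint is shown when the score meets the threshold. The function name `should_suggest_shared`, the inclusive comparison, and the sample scores are illustrative assumptions, not taken from Family Manager itself.

```python
def should_suggest_shared(confidence: float, threshold: float = 0.7) -> bool:
    """Return True when the score is high enough to surface a shared-source hint.

    Hypothetical helper: the inclusive comparison (>=) is an assumption.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return confidence >= threshold


# Made-up scores, used only to show how the threshold changes what is suggested.
sample_scores = [0.45, 0.55, 0.72, 0.91]

for threshold in (0.5, 0.7, 0.8):
    suggested = [s for s in sample_scores if should_suggest_shared(s, threshold)]
    print(threshold, suggested)
```

A lower threshold surfaces more hints, including weaker ones, which is the over-suggestion risk the objective names.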

Inconclusive

The eval stage failed before producing a verdict because the holdout set could not be assembled, so no threshold change was promoted. The baseline threshold of 0.7 was retained, and the experiment is queued for a clean re-run.
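The decision rule described here can be sketched as follows: a candidate threshold is promoted only when the eval stage completes and returns a favorable verdict, and the baseline is kept otherwise. The `EvalResult` type, the verdict strings, and `choose_threshold` are hypothetical names introduced for illustration, not the Lab's actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EvalResult:
    """Hypothetical outcome of the eval stage."""
    succeeded: bool
    verdict: Optional[str] = None  # assumed values: "improved" or "regressed"


def choose_threshold(baseline: float, candidate: float,
                     result: EvalResult) -> Tuple[float, str]:
    """Return the threshold to keep and a coarse outcome label."""
    # A failed eval, or one without a verdict, never promotes a change.
    if not result.succeeded or result.verdict is None:
        return baseline, "inconclusive"
    if result.verdict == "improved":
        return candidate, "promoted"
    return baseline, "rejected"


# The eval stage failed, so the 0.7 baseline is retained.
print(choose_threshold(0.7, 0.8, EvalResult(succeeded=False)))
# prints (0.7, 'inconclusive')
```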

Iterations

Variants we tried

Each variant and its coarse objective metric. The kept variant is marked; bars are relative to the best run.

1. Baseline: current 0.7 threshold (kept) · Low · 62%
2. Variant A: lower 0.5 threshold · Medium · 60%
3. Variant B: 0.8 threshold + recency boost · Medium · 62%
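For reference, the point deltas and the relative bar lengths follow directly from the three reported percentages. The snippet below is only a worked calculation; the dictionary keys are shorthand labels, and the bar scaling (score divided by the best score) is an assumption about how "relative to the best run" is computed.

```python
# Coarse objective metric per variant, in percent, as reported in the list.
runs = {
    "Baseline (0.7)": 62,
    "Variant A (0.5)": 60,
    "Variant B (0.8 + recency boost)": 62,
}

baseline_score = runs["Baseline (0.7)"]
best_score = max(runs.values())

# Delta in percentage points against the baseline run.
deltas = {name: score - baseline_score for name, score in runs.items()}

# Assumed bar scaling: each score as a fraction of the best run.
relative = {name: score / best_score for name, score in runs.items()}

for name in runs:
    print(f"{name}: {deltas[name]:+d} pts, bar {relative[name]:.3f}")
```

Neither variant beats the baseline on this metric, which is consistent with the reported delta of +0 pts.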

Run

Stages

baseline: Succeeded · 2.0s
variant run: Succeeded · 7.1s
eval: Failed · 640ms

Output

Artifacts and what shipped

Redaction-safe artifact previews, diffs, metric tables, and prompt variants with sensitive text removed.

Metric table: Hint precision by threshold (eval incomplete)
Report: Inconclusive (the eval stage failed before a verdict)

What you can see, and what is hidden

Every projection on this page is redaction-safe by construction. Redaction level: Sample content (curated, public-safe excerpts only).

Shown

Identifiers & counts
Closed-enum statuses
Coarse quality / resource bands
Timestamps & freshness

Intentionally hidden

Raw prompts
Raw documents
Raw tool logs
Raw trace spans
Embedding vectors
Free-text feedback
Auth internals & secrets
