AI / Data / Software · Silent failure window: Short (weeks) · Detectability: Hard
LLM hallucination risk assumed to decrease monotonically
A team shipped an internal assistant with the assumption that model updates would steadily reduce hallucinations.
“Silence is not stability.”
Decision summary
- Year: 2023
- Failure mode: Changing error shape — fluency increased faster than grounding and governance.
- Silent failure window: 2–3 weeks. Errors were rare enough to escape attention but impactful enough to cause downstream rework.
The original logic
Benchmarks improved across releases, user satisfaction increased, and a light human-in-the-loop review was judged sufficient for the initial launch scope.
Key assumptions
- Model upgrades would monotonically reduce hallucination rates in our domain. (Confidence at decision: Medium · Expected lifetime: Weeks)
- Prompting and retrieval would bound outputs to approved sources. (Confidence at decision: Medium · Expected lifetime: 1–3 months)
- Users would treat the assistant as a draft, not an authority. (Confidence at decision: Low · Expected lifetime: Weeks)
What changed
A model update improved general reasoning but changed the failure modes: the assistant became more fluent in incorrect specifics. Retrieval coverage was incomplete for edge cases, and as the assistant sounded more confident, users trusted it more.
Outcome
A small number of high-impact incorrect recommendations made it into customer-facing materials, forcing retraction, process changes, and tighter governance controls.
Early warning signals (missed)
- A shift from “obvious” hallucinations to plausible-but-wrong citations
- Increased user copy-paste into external documents
- RAG coverage gaps (queries with low retrieval confidence) not surfaced to users
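The last signal above is mechanical enough to sketch. A minimal illustration of surfacing low retrieval confidence rather than answering silently — the threshold, the `score` field, and the `assess_coverage` helper are all hypothetical, not the team's actual pipeline:

```python
# Illustrative sketch: flag queries whose retrieval scores fall below a
# cutoff so the UI can show a warning instead of a confident answer.
# LOW_CONFIDENCE and the chunk schema ({"score": float}) are assumptions.

LOW_CONFIDENCE = 0.35  # hypothetical similarity cutoff for "covered"

def assess_coverage(retrieved_chunks: list[dict]) -> dict:
    """Classify a query's retrieval coverage from chunk similarity scores."""
    if not retrieved_chunks:
        return {"covered": False, "top_score": 0.0,
                "banner": "No approved sources found."}
    top = max(chunk["score"] for chunk in retrieved_chunks)
    covered = top >= LOW_CONFIDENCE
    banner = None if covered else "Low source coverage: treat this answer as a draft."
    return {"covered": covered, "top_score": top, "banner": banner}
```

Had a banner like this shipped with every low-coverage answer, the plausible-but-wrong citations would have carried their own warning label.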
How AssureAI would have helped
- Assumption half-life tracking for “model update improves safety,” requiring explicit re-validation post-upgrade.
- Signals: retrieval confidence + citation coverage tracked as drift signals with thresholds.
- Audit exports: every recommendation includes sources, confidence, and “review required” triggers.
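The second bullet — tracking retrieval confidence and citation coverage as thresholded drift signals — can be sketched with a rolling mean per metric. This is an illustrative pattern, not AssureAI's actual API; the metric names, floors, and window size are assumptions:

```python
# Hedged sketch: each signal keeps a rolling mean and "breaches" when that
# mean drops below a floor, triggering re-validation of the assumption
# "model update improves safety". Floors and window are illustrative.

from collections import deque
from statistics import mean

class DriftSignal:
    """Rolling mean of a grounding metric with a breach floor."""

    def __init__(self, name: str, floor: float, window: int = 50):
        self.name = name
        self.floor = floor
        self.values: deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record one observation; return True if the rolling mean breaches."""
        self.values.append(value)
        return mean(self.values) < self.floor

retrieval_conf = DriftSignal("retrieval_confidence", floor=0.6)
citation_cov = DriftSignal("citation_coverage", floor=0.8)

# After an upgrade, a fluency gain can mask falling grounding metrics:
for conf, cov in [(0.72, 0.90), (0.55, 0.70), (0.41, 0.60)]:
    if retrieval_conf.observe(conf) or citation_cov.observe(cov):
        print("Re-validate assumption: grounding drift detected")
```

The point of the threshold is that no single answer looks alarming; only the trend does — which is exactly the shape of failure the team missed.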
Non-obvious lessons
- Model improvement is not monotonic; it is multi-dimensional, and one dimension (fluency) can advance while another (grounding) regresses.
- Fluency is a risk amplifier when governance is weak.
- If the tool feels authoritative, the process must be authoritative too.