AI / Data / Software · Silent failure window: Short (weeks) · Detectability: Hard
LLM hallucination risk assumed to decrease monotonically
A team shipped an internal assistant with the assumption that model updates would steadily reduce hallucinations.
“Silence is not stability.”
Decision summary
- Year: 2023
- Failure mode: Changing error shape — fluency increased faster than grounding and governance.
- Silent failure window: 2–3 weeks. Errors were rare enough to escape attention but impactful enough to cause downstream rework.
The original logic
Benchmarks improved across releases, user satisfaction increased, and a light human-in-the-loop review was judged sufficient for the initial launch scope.
Key assumptions
- Model upgrades would monotonically reduce hallucination rates in our domain. (Confidence at decision: Medium · Expected lifetime: Weeks)
- Prompting and retrieval would bound outputs to approved sources. (Confidence at decision: Medium · Expected lifetime: 1–3 months)
- Users would treat the assistant as a draft, not an authority. (Confidence at decision: Low · Expected lifetime: Weeks)
What changed
A model update improved general reasoning but changed the failure modes: the assistant became more fluent in incorrect specifics. Retrieval coverage was incomplete for edge cases, and as the assistant sounded more confident, users trusted it more.
Outcome
A small number of high-impact incorrect recommendations made it into customer-facing materials, forcing retraction, process changes, and tighter governance controls.
Early warning signals (missed)
- A shift from “obvious” hallucinations to plausible-but-wrong citations
- Increased user copy-paste into external documents
- RAG coverage gaps (queries with low retrieval confidence) not surfaced to users
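The last signal above is mechanical enough to sketch. A minimal illustration of surfacing low retrieval confidence rather than answering silently — the threshold, the `score` field, and the `assess_coverage` helper are all hypothetical, not the team's actual pipeline:

```python
# Illustrative sketch: flag queries whose retrieval scores fall below a
# cutoff so the UI can show a warning instead of a confident answer.
# LOW_CONFIDENCE and the chunk schema ({"score": float}) are assumptions.

LOW_CONFIDENCE = 0.35  # hypothetical similarity cutoff for "covered"

def assess_coverage(retrieved_chunks: list[dict]) -> dict:
    """Classify a query's retrieval coverage from chunk similarity scores."""
    if not retrieved_chunks:
        return {"covered": False, "top_score": 0.0,
                "banner": "No approved sources found."}
    top = max(chunk["score"] for chunk in retrieved_chunks)
    covered = top >= LOW_CONFIDENCE
    banner = None if covered else "Low source coverage: treat this answer as a draft."
    return {"covered": covered, "top_score": top, "banner": banner}
```

Had a banner like this shipped with every low-coverage answer, the plausible-but-wrong citations would have carried their own warning label.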
How AssureAI would have helped
- Assumption half-life tracking for “model update improves safety,” requiring explicit re-validation post-upgrade.
- Signals: retrieval confidence + citation coverage tracked as drift signals with thresholds.
- Audit exports: every recommendation includes sources, confidence, and “review required” triggers.
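The second bullet — tracking retrieval confidence and citation coverage as thresholded drift signals — can be sketched with a rolling mean per metric. This is an illustrative pattern, not AssureAI's actual API; the metric names, floors, and window size are assumptions:

```python
# Hedged sketch: each signal keeps a rolling mean and "breaches" when that
# mean drops below a floor, triggering re-validation of the assumption
# "model update improves safety". Floors and window are illustrative.

from collections import deque
from statistics import mean

class DriftSignal:
    """Rolling mean of a grounding metric with a breach floor."""

    def __init__(self, name: str, floor: float, window: int = 50):
        self.name = name
        self.floor = floor
        self.values: deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record one observation; return True if the rolling mean breaches."""
        self.values.append(value)
        return mean(self.values) < self.floor

retrieval_conf = DriftSignal("retrieval_confidence", floor=0.6)
citation_cov = DriftSignal("citation_coverage", floor=0.8)

# After an upgrade, a fluency gain can mask falling grounding metrics:
for conf, cov in [(0.72, 0.90), (0.55, 0.70), (0.41, 0.60)]:
    if retrieval_conf.observe(conf) or citation_cov.observe(cov):
        print("Re-validate assumption: grounding drift detected")
```

The point of the threshold is that no single answer looks alarming; only the trend does — which is exactly the shape of failure the team missed.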
Non-obvious lessons
- Model improvement is not monotonic; it is multi-dimensional, and one dimension (fluency) can advance while another (grounding) regresses.
- Fluency is a risk amplifier when governance is weak.
- If the tool feels authoritative, the process must be authoritative too.