AI / Data / Software | Medium (months) | Detectability: Moderate

Credit scoring models trained pre-COVID

A lender relied on a credit scoring model trained on pre-2020 consumer behavior to drive approvals and limits.

Decision summary

Year: 2020
Failure mode: Label lag plus distribution shift. The model failed quietly before the outcomes caught up.
Silent failure window: ~4 months. Approval rates held, but risk separation degraded on new applicants.

The original logic

The model had strong historical AUC and stable calibration, and it was supported by monitoring of aggregate delinquency and approval rates.

Key assumptions

  • The relationship between observed features (income, employment, utilization) and risk remained stable.
    Confidence at decision: High
    Expected lifetime: 6–12 months
  • Macro shocks would be captured quickly enough by lagging outcome metrics.
    Confidence at decision: Medium
    Expected lifetime: 3–6 months
  • Operational interventions (forbearance, payment holidays) wouldn’t materially distort labels.
    Confidence at decision: Low
    Expected lifetime: Weeks

What changed

Consumer spending patterns and employment stability changed abruptly, while policy responses distorted delinquency labels. The model continued to look “fine” on delayed metrics but was making different errors on new cohorts.

Outcome

Increased charge-offs concentrated in segments the model historically considered low-risk; emergency policy overlays were added and the model required rapid re-training and governance review.

Early warning signals (missed)

  • Population stability index (PSI) shifts in key features on new applicants
  • Growing reliance on manual overrides and exception policies
  • Cohort-level performance drift (recent vintages) masked by portfolio aggregates
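The first signal above, PSI on new applicants, is cheap to compute because it needs only feature values, not outcomes. A minimal sketch, assuming fixed bin edges chosen from the training population; the function name `psi`, the `eps` floor for empty bins, and the sample utilization values are illustrative and not taken from the lender's actual monitoring:

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population stability index of `actual` against `expected`.

    `edges` are ascending bin boundaries; values below the first edge fall in
    bin 0 and values at or above the last edge fall in the final bin.
    `eps` is an assumed floor that keeps empty bins from producing log(0).
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges that are <= v.
            counts[sum(1 for e in edges if v >= e)] += 1
        return [max(c / len(values), eps) for c in counts]

    p = shares(expected)  # training-era (baseline) bin shares
    q = shares(actual)    # recent-applicant bin shares
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical credit-utilization ratios, for illustration only.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
recent = [0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95]
edges = [0.25, 0.5, 0.75]

print(psi(baseline, baseline, edges))  # identical samples give zero shift
print(psi(baseline, recent, edges))    # a large positive value signals drift
```

A common rule of thumb treats PSI above roughly 0.1 as worth a look and above roughly 0.25 as a material shift, though any threshold should be calibrated to the portfolio rather than taken from this sketch.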

How AssureAI would have helped

  • Explicit assumption tracking for “stationarity” with required evidence and a re-validation cadence.
  • Early drift signals (PSI, cohort calibration) tied to the decision record, not buried in dashboards.
  • Governance-ready export showing what changed and when the confidence expired.
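To make the idea of an assumption whose confidence expires more concrete, here is a rough sketch of a decision record with a re-validation check. It does not reflect AssureAI's actual data model or API, which the case study does not describe; the `Assumption` class, its fields, the recording date, and the mapping of "Weeks" to 28 days are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Assumption:
    statement: str
    confidence: str      # "High", "Medium", or "Low" at decision time
    recorded_on: date    # when the decision was made (hypothetical here)
    lifetime_days: int   # upper bound of the expected lifetime

    def expires_on(self) -> date:
        return self.recorded_on + timedelta(days=self.lifetime_days)

    def is_expired(self, today: date) -> bool:
        return today >= self.expires_on()

def due_for_revalidation(assumptions, today):
    """Return the assumptions whose expected lifetime has run out."""
    return [a for a in assumptions if a.is_expired(today)]

decided = date(2020, 1, 1)  # assumed decision date, not stated in the case
record = [
    Assumption("Feature-to-risk relationship remains stable", "High", decided, 365),
    Assumption("Lagging outcome metrics capture macro shocks in time", "Medium", decided, 180),
    Assumption("Forbearance and payment holidays do not distort labels", "Low", decided, 28),
]

for a in due_for_revalidation(record, today=date(2020, 3, 1)):
    print(f"Re-validate ({a.confidence} confidence, expired {a.expires_on()}): {a.statement}")
```

Under these assumed dates, only the label-distortion assumption has expired by 1 March 2020, which matches the case: it was the lowest-confidence, shortest-lived assumption and the one that needed evidence first.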

Non-obvious lessons

  • Historical performance is an argument about the past, not a warranty.
  • Portfolio averages are comfort food.
  • If outcomes lag, your validation cadence must lead.
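The point about portfolio averages can be illustrated with a small calculation. The sketch below compares observed defaults to predicted defaults per origination vintage and for the portfolio as a whole. The function name, the record layout, and every number are made up for illustration; none of them come from the lender in this case.

```python
from collections import defaultdict

def calibration_by_vintage(records):
    """Ratio of observed to predicted defaults, per vintage and overall.

    `records` is an iterable of (vintage, predicted_pd, defaulted) tuples.
    A ratio near 1.0 means predictions match outcomes; well above 1.0 means
    the model is under-predicting risk for that group.
    """
    predicted = defaultdict(float)
    observed = defaultdict(int)
    for vintage, pd_hat, defaulted in records:
        for key in (vintage, "ALL"):
            predicted[key] += pd_hat
            observed[key] += int(defaulted)
    return {k: observed[k] / predicted[k] for k in predicted if predicted[k] > 0}

# Illustrative data: 900 seasoned accounts and 100 recent ones, all scored at 2% PD.
records = [("2019", 0.02, i < 18) for i in range(900)]       # 18 defaults
records += [("2020-Q2", 0.02, i < 6) for i in range(100)]    # 6 defaults

for group, ratio in calibration_by_vintage(records).items():
    print(f"{group}: observed/predicted = {ratio:.2f}")
```

With these invented numbers the portfolio ratio is about 1.2, which is easy to dismiss, while the recent vintage runs at about 3.0. Because outcomes lag, a check like this only works on accounts old enough to have matured labels, which is why it complements feature-level drift checks instead of replacing them.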