2025 AI & ML / Research

Bank Failure Prediction

93%
XGBoost AUC
Classification performance
6
Quarter Lead Time
Early warning prediction window
Key Indicators
Selected features after RFE
CCSC
Conference
Presented April 2025
01

Research Overview

Bank failures don't happen without warning. The warning signs are buried in quarterly FDIC filings that traditional models can't read fast enough, or interpret well enough, to act on in time.

This research — presented at CCSC Central Plains 2025 — tests whether machine learning can identify at-risk institutions up to six quarters before collapse, using a HEALTH score framework that transforms raw financial ratios into interpretable risk signals. The core finding: XGBoost reaches AUC 0.93 on the classification task; Transformers reach 0.892 while capturing the temporal deterioration patterns that tree models miss.

Collaborative research with Coleman Pagac, Rediet Ayalew, and Braedon Stapelman, supervised by Dr. Eric Manley and Dr. Sean Severe at Drake University.

02

HEALTH Score Framework

Capital Adequacy

Measured as equity minus goodwill over total assets. The most direct signal of whether a bank has enough of a buffer to absorb losses before becoming insolvent.

Asset Quality

Lending portfolio stability and exposure to non-performing assets. Deterioration here typically precedes the capital adequacy problem by several quarters.

Liquidity

Ability to meet short-term obligations without fire sales. SHAP analysis puts this at 0.23 importance — the single strongest predictor in the HEALTH framework.

Profitability

Sustainable earnings and return on equity relative to risk. SHAP importance 0.19. A bank can be temporarily unprofitable without failing — but sustained losses erode capital.
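The component definitions above can be sketched numerically. Only the capital-adequacy formula (equity minus goodwill over total assets) and the SHAP importances for liquidity (0.23) and profitability (0.19) come from the text; the remaining weights, the field values, and the weighted-composite form of the score are illustrative assumptions.

```python
# Sketch of HEALTH-style component ratios. The capital-adequacy formula
# follows the text; all other values and the composite weighting are
# hypothetical placeholders, not the study's actual scoring rule.

def capital_adequacy(equity: float, goodwill: float, total_assets: float) -> float:
    """Tangible-equity buffer: (equity - goodwill) / total assets."""
    return (equity - goodwill) / total_assets

def health_score(components: dict, weights: dict) -> float:
    """Weighted composite of normalized component ratios."""
    return sum(weights[k] * components[k] for k in weights)

ratios = {
    "capital_adequacy": capital_adequacy(120.0, 15.0, 1500.0),
    "asset_quality": 0.80,   # placeholder normalized score
    "liquidity": 0.65,       # placeholder normalized score
    "profitability": 0.55,   # placeholder normalized score
}
weights = {  # liquidity 0.23 and profitability 0.19 from SHAP; rest illustrative
    "capital_adequacy": 0.30,
    "asset_quality": 0.28,
    "liquidity": 0.23,
    "profitability": 0.19,
}
score = health_score(ratios, weights)
```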

03

Methodology

Traditional Methods

XGBoost: AUC 0.93
Random Forest: ensemble baseline
Logistic Regression: interpretability
SVM: margin classifier

Deep Learning

Transformer: AUC 0.892
LSTM: sequential patterns
GRU: temporal modeling
CNN-RNN Hybrid: multi-scale features

Feature Engineering

SMOTE: class balancing
RFE: feature selection
SHAP: explainability
Z-score: normalization
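A minimal version of the tabular pipeline, using scikit-learn stand-ins: RFE for feature selection, z-score normalization via StandardScaler, and GradientBoostingClassifier in place of XGBoost (the study used XGBoost itself; the data and hyperparameters here are synthetic and illustrative).

```python
# Illustrative tabular pipeline with scikit-learn stand-ins. Synthetic
# data mimics the rare-positive structure of bank-quarter observations;
# GradientBoostingClassifier substitutes for XGBoost here.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rare positive class (~5% here; real failure rates are far lower).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_tr)              # z-score normalization
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

base = GradientBoostingClassifier(random_state=0)
rfe = RFE(base, n_features_to_select=8).fit(X_tr, y_tr)  # keep 8 features

model = GradientBoostingClassifier(random_state=0)
model.fit(X_tr[:, rfe.support_], y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te[:, rfe.support_])[:, 1])
```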
04

Research Challenges

Severe Class Imbalance

Banks almost never fail. In decades of FDIC data, failures account for a tiny fraction of all bank-quarter observations. A model that never predicts failure is trivially accurate — and completely useless for the actual problem.

Fix. Weighted loss functions that penalize missed failures more than false alarms, combined with SMOTE to synthesize minority-class examples. Exponential rescaling on the HEALTH score to amplify sensitivity at the low end of the distribution where early deterioration happens.
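The SMOTE step can be illustrated mechanically: synthesize a minority-class point by interpolating between a real minority example and one of its nearest minority neighbors. Production work would use imbalanced-learn's SMOTE implementation; this NumPy sketch shows only the idea, and the neighbor count k is an assumption.

```python
# SMOTE-style oversampling sketch: new minority points are linear
# interpolations between a minority example and a random one of its
# k nearest minority neighbors. Not the imbalanced-learn implementation.
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    rng = rng or np.random.default_rng(0)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbors)
        lam = rng.random()                   # interpolation weight in [0, 1)
        new_points.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new_points)

minority = np.random.default_rng(1).normal(size=(20, 4))
synthetic = smote_like(minority, n_new=40)
```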

Temporal Dependencies in Financial Data

A single quarter's financials don't predict failure well. The signal is in the trajectory — how capital ratios declined over five quarters, how loan quality degraded while earnings compensated. Cross-sectional models miss this entirely.

Fix. A fixed four-quarter input window (quarters 6 through 9 prior to the prediction point) that captures the early deterioration trajectory while still delivering predictions six full quarters before failure. Transformers use attention over this sequence to pick up deterioration patterns that LSTM and GRU architectures partially miss.
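Under the offsets described above, a four-quarter window ending six quarters before the prediction point covers quarters t-9 through t-6. Window construction might look like the sketch below; the array layout is an assumption.

```python
# Build fixed four-quarter input windows from one bank's history.
# For prediction point t, the window covers quarters t-9 .. t-6,
# i.e. it ends `lead` quarters before t.
import numpy as np

def make_windows(series, window=4, lead=6):
    """series: (n_quarters, n_features), oldest quarter first.
    Returns (n_windows, window, n_features)."""
    n_quarters = series.shape[0]
    out = []
    for t in range(lead + window - 1, n_quarters):
        start = t - lead - window + 1        # t - 9 with defaults
        out.append(series[start:start + window])
    return np.array(out)

history = np.arange(12 * 3, dtype=float).reshape(12, 3)  # 12 quarters, 3 features
windows = make_windows(history)
```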

Regulatory Interpretability Requirement

A regulator who receives an alert from a black-box model can't act on it. "The model said so" isn't a basis for a supervisory action against a financial institution. The prediction needs to come with an explanation.

Fix. SHAP value analysis on every prediction to show which HEALTH components drove the score and by how much. Recursive feature elimination trims the feature set to the variables that actually matter, making the explanations precise rather than diffuse.
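The study computed SHAP values against XGBoost (tree SHAP via the shap library). As a self-contained illustration of what a per-prediction attribution is, the linear-model case has a closed form: feature i contributes w_i * (x_i - E[x_i]), and the contributions sum to the gap between this prediction and the background average. The weights and data below are illustrative placeholders.

```python
# Exact SHAP values for a linear model f(x) = w @ x: each feature's
# contribution is w_i * (x_i - E[x_i]) relative to a background sample.
# Tree SHAP on XGBoost generalizes this idea to tree ensembles.
import numpy as np

def linear_shap(w, x, background):
    """Per-feature contributions; they sum to f(x) - f(E[x])."""
    return w * (x - background.mean(axis=0))

w = np.array([0.23, 0.19, 0.30, 0.28])   # illustrative component weights
background = np.random.default_rng(0).normal(size=(100, 4))
x = np.array([1.0, -0.5, 0.2, 0.0])
phi = linear_shap(w, x, background)       # one attribution per feature
```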

05

Key Findings

Six quarters is a long time in finance. If regulators had a reliable signal that far in advance, intervention becomes a policy choice rather than a crisis response.

01

XGBoost outperforms deep learning on this task (AUC 0.93 vs 0.892) because the feature relationships are structured and tabular, not sequential. Architecture choice should follow data structure.

02

SHAP values changed what the research was about. We started trying to predict failure. SHAP showed us that the more useful output is knowing which specific ratio to watch for each individual bank.

03

Class imbalance in rare-event prediction isn't just a technical problem. Choosing the wrong metric — accuracy instead of AUC or recall — would have led us to declare the model good while it was systematically missing actual failures.
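The metric point is easy to demonstrate: on a dataset where failures are rare, a model that never predicts failure posts near-perfect accuracy with zero recall. The 1% failure rate below is illustrative.

```python
# Accuracy misleads on rare events: the "never predicts failure" model
# is 99% accurate on a 1%-failure dataset and catches zero failures.
import numpy as np

y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1                      # 1% of observations are failures
y_pred = np.zeros(1000, dtype=int)   # degenerate model: always "healthy"

accuracy = (y_pred == y_true).mean()   # 0.99, looks excellent
recall = y_pred[y_true == 1].mean()    # 0.0, misses every failure
```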

04

Collaborative research with four people across finance and CS required explicit conventions around data versioning and model checkpointing from day one, not after the first reproducibility problem.