Machine Learning for Bank Failure Prediction: Enhancing Early Warning Systems
Research project investigating AI-driven approaches to financial stability monitoring
⚠️ Research presented at CCSC Central Plains Conference - April 2025
Conference Presentation
Our research team recently presented this work at the Consortium for Computing Sciences in Colleges (CCSC) Central Plains Conference. The presentation highlighted our novel approach to bank failure prediction using machine learning techniques and received valuable feedback from the academic community.
While this research is ongoing, we're excited to share our preliminary findings and methodological approach with the broader computer science and financial analytics communities.
Project Overview
This collaborative research project focuses on improving early warning systems for bank failures using machine learning techniques. Traditional statistical models face significant challenges due to sparse failure events and highly non-linear financial indicators. Our approach examines how advanced machine learning models can identify early warning signs of financial instability, potentially providing regulators with valuable lead time for intervention before crisis points emerge.
Research Team
This project is a collaborative effort between:
- Khalid Mohammed
- Coleman Pagac
- Rediet Ayalew
- Braedon Stapelman
Under the supervision of Dr. Eric Manley and Dr. Sean Severe at Drake University.
Research Challenges
Key Challenges
- Class Imbalance: Bank failures are rare events, creating significant training challenges
- Temporal Dependencies: Financial indicators show complex time-varying relationships
- Feature Interaction: Non-linear relationships between financial metrics
- Interpretability: Balancing predictive power with regulatory transparency
Feature Engineering
Our preprocessing begins with z-score normalization and log transformation of skewed indicators to stabilize their distributions. We then emphasize the lower HEALTH ranges via exponential rescaling, which improves model sensitivity to early signs of deterioration.
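A minimal sketch of these preprocessing steps in plain Python might look like the following. The function names, the choice of log1p for the log transform, and the rescaling form exp(-k * HEALTH) with k = 20 are illustrative assumptions, not the project's actual implementation.

```python
import math
from statistics import mean, pstdev


def log_transform(values):
    """Compress right-skewed, non-negative indicators with log(1 + x)."""
    return [math.log1p(v) for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def emphasize_low_health(health, k=20.0):
    """Hypothetical exponential rescaling of HEALTH values.

    exp(-k * h) changes fastest when h is small, so differences between
    weak banks are stretched while differences between well-capitalized
    banks are compressed. The constant k is an assumed tuning parameter.
    """
    return [math.exp(-k * h) for h in health]
```

With this form, the gap between HEALTH values of 0.02 and 0.04 is several times larger after rescaling than the gap between 0.10 and 0.12, which is one way to make a model more sensitive to the low end of the range.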
Class imbalance, inherent in the rare-event nature of bank failures, is addressed with weighted loss functions and SMOTE augmentation. We use a fixed four-quarter input window (quarters 6-9 prior to the prediction quarter), so our models make predictions six quarters in advance from a full year of historical financial data.
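The windowing and class-weighting ideas can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, each bank's history is assumed to be a list of quarterly records ordered oldest to newest, and the weights use the common inverse-frequency ("balanced") scheme, which may differ from the weighting used in the project. SMOTE is not shown.

```python
from collections import Counter


def input_window(quarterly_rows, target_idx, lead=6, length=4):
    """Return the records for quarters 6-9 before the target quarter.

    With lead=6 and length=4 this selects indices target_idx - 9 through
    target_idx - 6 (oldest first). Returns None if the bank does not have
    enough history before the target quarter.
    """
    start = target_idx - (lead + length - 1)
    end = target_idx - lead  # inclusive
    if start < 0:
        return None
    return quarterly_rows[start:end + 1]


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Example: 95 healthy banks (0) and 5 at-risk banks (1).
weights = balanced_class_weights([0] * 95 + [1] * 5)
# The rare class receives a weight of 10.0, the common class about 0.53.
```

Weights like these are typically passed to a loss function so that misclassifying a rare at-risk bank costs more than misclassifying a healthy one.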
Recursive feature elimination combined with SHAP value analysis helps select high-impact, interpretable features without introducing overfitting or collinearity, resulting in a final set of 24 key indicators. Feature importance analyses highlight liquidity ratios (SHAP 0.23) and capital adequacy metrics (SHAP 0.19) as key predictors.
HEALTH Score Framework
We've developed a composite financial metric called HEALTH (Holistic Evaluation of Asset Liquidity, Transparency, and Holder stability), defined as (Equity - Goodwill) / Total Assets. This ratio serves as a comprehensive proxy for bank financial well-being.
HEALTH Score Components
The HEALTH score captures core dimensions of banking performance:
- Capital adequacy - Measures financial resilience
- Asset quality - Evaluates lending portfolio stability
- Liquidity - Assesses ability to meet obligations
- Profitability - Examines sustainable earnings
Following Wheelock and Wilson (2000), banks falling below a HEALTH threshold of 0.02 are flagged as at-risk.
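As a small worked illustration of the definition and threshold above, the sketch below computes HEALTH as (Equity - Goodwill) / Total Assets and flags values under 0.02. The function and constant names are hypothetical, and the balance sheet figures in the example are made up for demonstration.

```python
AT_RISK_THRESHOLD = 0.02  # threshold stated in the text


def health_score(equity, goodwill, total_assets):
    """HEALTH = (Equity - Goodwill) / Total Assets."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return (equity - goodwill) / total_assets


def is_at_risk(health, threshold=AT_RISK_THRESHOLD):
    """Flag a bank whose HEALTH falls below the threshold."""
    return health < threshold


# Made-up figures (in millions): equity 30, goodwill 15, total assets 1000.
score = health_score(30.0, 15.0, 1000.0)  # 0.015
flagged = is_at_risk(score)               # True, since 0.015 < 0.02
```

The same score can be used directly as a regression target or thresholded into a binary label, matching the two modeling contexts described in this section.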
Predictive Advantage
Unlike binary failure labels, this continuous target enables both granular risk assessments and earlier detection of deterioration trends. HEALTH is employed in both classification (thresholded) and regression (direct prediction) contexts, allowing for rich interpretability and dynamic modeling flexibility across varying regulatory needs.
Our approach aims to identify at-risk institutions up to six quarters before traditional warning signs appear, providing regulators crucial lead time for intervention.
Visualizations
ROC Curves for Model Performance
XGBoost Model Performance
Methodological Approach
Our research employs a multi-model comparative framework to identify the most effective techniques for bank failure prediction:
Traditional Methods
Benchmark statistical and tree-based models providing interpretability and baseline performance:
- Logistic Regression
- Random Forest
- Support Vector Machines
- Decision Trees
Recurrent Networks
Temporal deep learning models capturing sequential patterns in financial data over time:
- LSTM Networks
- GRU Networks
- Bidirectional RNNs
- Sequence-to-One Models
Advanced Architectures
State-of-the-art neural network designs leveraging attention mechanisms and hybrid approaches:
- Transformer Networks
- Attention Mechanisms
- Ensemble Methods
- Hybrid CNN-RNN
Conclusions & Future Work
Current Findings
Our preliminary findings support ML architectures as valuable tools for bank failure prediction. The HEALTH framework supports both risk flagging and continuous monitoring. XGBoost delivers the strongest classification results (AUC 0.93), while transformer architectures (AUC 0.892) excel at capturing deterioration patterns.
Liquidity ratios and capital adequacy emerge as critical predictors. ML-enhanced systems can identify at-risk institutions up to six quarters before traditional warning signs, providing regulators crucial lead time for intervention.
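For readers less familiar with the AUC metric reported above, the following sketch computes it from scratch using its pairwise interpretation: the probability that a randomly chosen at-risk bank receives a higher score than a randomly chosen healthy one. This is an illustrative implementation with a hypothetical function name and toy data, not the evaluation code used in the project.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: two healthy banks (0) and two at-risk banks (1).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. Because it depends only on how examples are ranked, AUC is less distorted by class imbalance than raw accuracy.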
Next Steps
Following the CCSC Conference presentation, our team is continuing to refine our methodologies and evaluate different model architectures to identify the most effective approach to bank failure prediction. Full methodological details and final results will be published following the completion of our research.
For more information about this ongoing research project, feel free to contact me at [email protected].
CCSC Poster
The poster shown above was presented in the 2025 CCSC Student Poster Contest.
CCSC Central Plains Conference References
References
- FDIC. (2019). Bank Financial Health Metrics: A Comprehensive Rating System. FDIC Quarterly, 13(2), 37–51.
- Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
- Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Wheelock, D. C., & Wilson, P. W. (2000). Why do banks disappear? Review of Economics and Statistics, 82(1), 127–138.
⚠️ Research ongoing - Check back for complete findings and implementation details