2025 AI & ML / Research

Bank Failure Prediction

93%
XGBoost AUC
Classification performance
6
Quarter Lead Time
Early warning prediction window
Key Indicators
Selected features after RFE
CCSC
Conference
Presented April 2025
01

Research Overview

Bank failures don't happen without warning. The warning signs are buried in quarterly FDIC filings that traditional models can't read fast enough, or interpret well enough, to act on in time.

This research — presented at CCSC Central Plains 2025 — tests whether machine learning can identify at-risk institutions up to six quarters before collapse, using a HEALTH score framework that transforms raw financial ratios into interpretable risk signals. The core finding: XGBoost reaches AUC 0.93 on the classification task; Transformers reach 0.892 while capturing the temporal deterioration patterns that tree models miss.

Collaborative research with Coleman Pagac, Rediet Ayalew, and Braedon Stapelman, supervised by Dr. Eric Manley and Dr. Sean Severe at Drake University.

02

HEALTH Score Framework

Capital Adequacy

Measured as equity minus goodwill over total assets. The most direct signal of whether a bank has enough of a buffer to absorb losses before becoming insolvent.

Asset Quality

Lending portfolio stability and exposure to non-performing assets. Deterioration here typically precedes the capital adequacy problem by several quarters.

Liquidity

Ability to meet short-term obligations without fire sales. SHAP analysis puts this at 0.23 importance — the single strongest predictor in the HEALTH framework.

Profitability

Sustainable earnings and return on equity relative to risk. SHAP importance 0.19. A bank can be temporarily unprofitable without failing — but sustained losses erode capital.
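The component definitions above can be sketched numerically. Only the capital-adequacy formula (equity minus goodwill over total assets) and the SHAP importances for liquidity (0.23) and profitability (0.19) come from the text; the remaining weights, the field values, and the weighted-composite form of the score are illustrative assumptions.

```python
# Sketch of HEALTH-style component ratios. The capital-adequacy formula
# follows the text; all other values and the composite weighting are
# hypothetical placeholders, not the study's actual scoring rule.

def capital_adequacy(equity: float, goodwill: float, total_assets: float) -> float:
    """Tangible-equity buffer: (equity - goodwill) / total assets."""
    return (equity - goodwill) / total_assets

def health_score(components: dict, weights: dict) -> float:
    """Weighted composite of normalized component ratios."""
    return sum(weights[k] * components[k] for k in weights)

ratios = {
    "capital_adequacy": capital_adequacy(120.0, 15.0, 1500.0),
    "asset_quality": 0.80,   # placeholder normalized score
    "liquidity": 0.65,       # placeholder normalized score
    "profitability": 0.55,   # placeholder normalized score
}
weights = {  # liquidity 0.23 and profitability 0.19 from SHAP; rest illustrative
    "capital_adequacy": 0.30,
    "asset_quality": 0.28,
    "liquidity": 0.23,
    "profitability": 0.19,
}
score = health_score(ratios, weights)
```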

03

Methodology

Traditional Methods

XGBoost: AUC 0.93
Random Forest: ensemble baseline
Logistic Regression: interpretability
SVM: margin classifier

Deep Learning

Transformer: AUC 0.892
LSTM: sequential patterns
GRU: temporal modeling
CNN-RNN Hybrid: multi-scale features

Feature Engineering

SMOTE: class balancing
RFE: feature selection
SHAP: explainability
Z-score: normalization
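A minimal version of the tabular pipeline, using scikit-learn stand-ins: RFE for feature selection, z-score normalization via StandardScaler, and GradientBoostingClassifier in place of XGBoost (the study used XGBoost itself; the data and hyperparameters here are synthetic and illustrative).

```python
# Illustrative tabular pipeline with scikit-learn stand-ins. Synthetic
# data mimics the rare-positive structure of bank-quarter observations;
# GradientBoostingClassifier substitutes for XGBoost here.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rare positive class (~5% here; real failure rates are far lower).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_tr)              # z-score normalization
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

base = GradientBoostingClassifier(random_state=0)
rfe = RFE(base, n_features_to_select=8).fit(X_tr, y_tr)  # keep 8 features

model = GradientBoostingClassifier(random_state=0)
model.fit(X_tr[:, rfe.support_], y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te[:, rfe.support_])[:, 1])
```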
04

Research Challenges

Severe Class Imbalance

Banks almost never fail. In decades of FDIC data, failures account for a tiny fraction of all bank-quarter observations. A model that never predicts failure is trivially accurate — and completely useless for the actual problem.

Fix. Weighted loss functions that penalize missed failures more than false alarms, combined with SMOTE to synthesize minority-class examples. Exponential rescaling on the HEALTH score to amplify sensitivity at the low end of the distribution where early deterioration happens.
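The SMOTE step can be illustrated mechanically: synthesize a minority-class point by interpolating between a real minority example and one of its nearest minority neighbors. Production work would use imbalanced-learn's SMOTE implementation; this NumPy sketch shows only the idea, and the neighbor count k is an assumption.

```python
# SMOTE-style oversampling sketch: new minority points are linear
# interpolations between a minority example and a random one of its
# k nearest minority neighbors. Not the imbalanced-learn implementation.
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    rng = rng or np.random.default_rng(0)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbors)
        lam = rng.random()                   # interpolation weight in [0, 1)
        new_points.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new_points)

minority = np.random.default_rng(1).normal(size=(20, 4))
synthetic = smote_like(minority, n_new=40)
```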

Temporal Dependencies in Financial Data

A single quarter's financials don't predict failure well. The signal is in the trajectory — how capital ratios declined over five quarters, how loan quality degraded while earnings compensated. Cross-sectional models miss this entirely.

Fix. A fixed four-quarter input window (quarters 6 through 9 prior to the prediction point) that captures the early deterioration trajectory while still delivering predictions six full quarters before failure. Transformers use attention over this sequence to pick up deterioration patterns that LSTM and GRU architectures partially miss.
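Under the offsets described above, a four-quarter window ending six quarters before the prediction point covers quarters t-9 through t-6. Window construction might look like the sketch below; the array layout is an assumption.

```python
# Build fixed four-quarter input windows from one bank's history.
# For prediction point t, the window covers quarters t-9 .. t-6,
# i.e. it ends `lead` quarters before t.
import numpy as np

def make_windows(series, window=4, lead=6):
    """series: (n_quarters, n_features), oldest quarter first.
    Returns (n_windows, window, n_features)."""
    n_quarters = series.shape[0]
    out = []
    for t in range(lead + window - 1, n_quarters):
        start = t - lead - window + 1        # t - 9 with defaults
        out.append(series[start:start + window])
    return np.array(out)

history = np.arange(12 * 3, dtype=float).reshape(12, 3)  # 12 quarters, 3 features
windows = make_windows(history)
```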

Regulatory Interpretability Requirement

A regulator who receives an alert from a black-box model can't act on it. "The model said so" isn't a basis for a supervisory action against a financial institution. The prediction needs to come with an explanation.

Fix. SHAP value analysis on every prediction to show which HEALTH components drove the score and by how much. Recursive feature elimination trims the feature set to the variables that actually matter, making the explanations precise rather than diffuse.
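The study computed SHAP values against XGBoost (tree SHAP via the shap library). As a self-contained illustration of what a per-prediction attribution is, the linear-model case has a closed form: feature i contributes w_i * (x_i - E[x_i]), and the contributions sum to the gap between this prediction and the background average. The weights and data below are illustrative placeholders.

```python
# Exact SHAP values for a linear model f(x) = w @ x: each feature's
# contribution is w_i * (x_i - E[x_i]) relative to a background sample.
# Tree SHAP on XGBoost generalizes this idea to tree ensembles.
import numpy as np

def linear_shap(w, x, background):
    """Per-feature contributions; they sum to f(x) - f(E[x])."""
    return w * (x - background.mean(axis=0))

w = np.array([0.23, 0.19, 0.30, 0.28])   # illustrative component weights
background = np.random.default_rng(0).normal(size=(100, 4))
x = np.array([1.0, -0.5, 0.2, 0.0])
phi = linear_shap(w, x, background)       # one attribution per feature
```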

05

Key Findings

Six quarters is a long time in finance. If regulators had a reliable signal that far in advance, intervention becomes a policy choice rather than a crisis response.

01

XGBoost outperforms deep learning on this task (AUC 0.93 vs 0.892) because the feature relationships are structured and tabular, not sequential. Architecture choice should follow data structure.

02

SHAP values changed what the research was about. We started trying to predict failure. SHAP showed us that the more useful output is knowing which specific ratio to watch for each individual bank.

03

Class imbalance in rare-event prediction isn't just a technical problem. Choosing the wrong metric — accuracy instead of AUC or recall — would have led us to declare the model good while it was systematically missing actual failures.
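The metric point is easy to demonstrate: on a dataset where failures are rare, a model that never predicts failure posts near-perfect accuracy with zero recall. The 1% failure rate below is illustrative.

```python
# Accuracy misleads on rare events: the "never predicts failure" model
# is 99% accurate on a 1%-failure dataset and catches zero failures.
import numpy as np

y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1                      # 1% of observations are failures
y_pred = np.zeros(1000, dtype=int)   # degenerate model: always "healthy"

accuracy = (y_pred == y_true).mean()   # 0.99, looks excellent
recall = y_pred[y_true == 1].mean()    # 0.0, misses every failure
```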

04

Collaborative research with four people across finance and CS required explicit conventions around data versioning and model checkpointing from day one, not after the first reproducibility problem.