
Smoothing risk: A Machine Learning pipeline for early debt recovery (executive summary)

30/01/2024




At BBVA AI Factory we develop predictive models for debt mitigation that anticipate potentially undesirable situations for our customers and offer them timely solutions.

The core business of the bank is lending money. Through loans repaid in installments, financial institutions enable people to buy homes and cars or to start their own businesses. However, when clients face adversity and fall behind on their payments, an unfavorable situation can arise for both the bank and its customers.

At AI Factory, the global hub where BBVA builds its major AI-based solutions, we develop predictive models for early debt recovery, applying state-of-the-art supervised learning¹ techniques to tabular data. The goal is to help clients quickly overcome this unfavorable situation or keep it from worsening. Thanks to these models, the bank can offer early solutions, such as refinancing or adjusting installments to affordable amounts.

Specifically, we have created a pipeline to automate various standard processes that we apply to the different debt recovery models we develop. In the context of Machine Learning, a pipeline is a structured sequence of data processing and modeling steps that automates the end-to-end process. One can understand an ML pipeline as a recipe for creating and deploying models: it guides the process from gathering and cleaning data to training a model, evaluating its performance, and putting it to use. It enhances workflow organization, reproducibility, and scalability in Data Science projects.
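As an illustration only, the recipe described above (gather and clean data, train, evaluate, put the model to use) can be sketched with scikit-learn's Pipeline class. This is not BBVA's implementation: the synthetic data from make_classification, the simulated missing values, the median imputation, the scaler, and the logistic regression are all assumptions chosen to show how the steps chain together.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline() -> Pipeline:
    """Chain cleaning and modeling steps so they are fitted and applied as one unit."""
    return Pipeline(steps=[
        ("impute", SimpleImputer(strategy="median")),  # clean: fill missing values
        ("scale", StandardScaler()),                   # put features on a common scale
        ("model", LogisticRegression(max_iter=1000)),  # binary classifier
    ])


def run(seed: int = 0) -> float:
    """Gather -> clean -> train -> evaluate, returning the test-set AUC."""
    # Gather: synthetic tabular data stands in for real customer features (assumption)
    X, y = make_classification(n_samples=2000, n_features=20, random_state=seed)
    rng = np.random.default_rng(seed)
    X[rng.random(X.shape) < 0.05] = np.nan  # simulate ~5% missing values

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    pipe = build_pipeline().fit(X_train, y_train)  # train: imputer and scaler fit on train only
    scores = pipe.predict_proba(X_test)[:, 1]      # put to use: score unseen customers
    return roc_auc_score(y_test, scores)           # evaluate


if __name__ == "__main__":
    print(f"Test AUC: {run():.3f}")
```

Because the imputer and scaler live inside the pipeline, they are fitted on the training split only, which keeps information from the test split from leaking into preprocessing; the same object can then be reused unchanged on new data.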

In this case, the ML pipeline we introduce combines traditional risk analysis methods with the latest in supervised learning libraries.

Our Use Cases: Different ML models to address different debt states

Customers can be in various states regarding their debt payments, and we apply different models depending on the state. All these models help BBVA's managers decide as early as possible which actions to take.

Debt status and associated models:

Up-to-date on payments
  1. Model for predicting entry into irregular payment status.

Irregular payment status (some installments overdue for less than three months)
  2. Model for predicting exit from irregular payment status.

Default (one or more installments unpaid for three months or more)
  3. Model for predicting exit from default in a short period (45 days).
  4. Model for predicting prolonged default (not exiting default within two years).
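A minimal sketch of how the debt states above and the four binary targets could be encoded. It assumes that "three months" is approximated as 90 days past due and that "exit" means moving to a less severe state; the names debt_status, binary_target, and POPULATION are hypothetical. It also simplifies each label to a comparison of two snapshots (at observation time and at the end of the model's horizon, 45 days for model 3 and two years for model 4), whereas the real definitions may track the whole window.

```python
from enum import IntEnum


class DebtStatus(IntEnum):
    """Debt states from the table, ordered by severity."""
    UP_TO_DATE = 0
    IRREGULAR = 1  # some installments overdue for less than three months
    DEFAULT = 2    # one or more installments unpaid for three months or more


# Assumption: "three months" is approximated as 90 days past due.
DEFAULT_THRESHOLD_DAYS = 90

# Hypothetical mapping: which starting state each model scores
POPULATION = {
    1: DebtStatus.UP_TO_DATE,  # entry into irregular payment status
    2: DebtStatus.IRREGULAR,   # exit from irregular payment status
    3: DebtStatus.DEFAULT,     # exit from default within 45 days
    4: DebtStatus.DEFAULT,     # prolonged default (no exit within two years)
}


def debt_status(days_past_due: int) -> DebtStatus:
    """Map days past due of the oldest unpaid installment to a debt state."""
    if days_past_due < 0:
        raise ValueError("days_past_due must be non-negative")
    if days_past_due == 0:
        return DebtStatus.UP_TO_DATE
    if days_past_due < DEFAULT_THRESHOLD_DAYS:
        return DebtStatus.IRREGULAR
    return DebtStatus.DEFAULT


def binary_target(model_id: int, dpd_start: int, dpd_end: int) -> int:
    """Binary label for one customer, comparing days past due at observation
    time (dpd_start) with days past due at the end of the horizon (dpd_end).
    Hypothetical rule: "exit" means moving to a less severe state."""
    if model_id not in POPULATION:
        raise ValueError(f"unknown model {model_id}")
    start, end = debt_status(dpd_start), debt_status(dpd_end)
    if start != POPULATION[model_id]:
        raise ValueError(f"model {model_id} only scores {POPULATION[model_id].name} customers")
    if model_id == 1:
        return int(end > start)  # fell behind: entered irregular status (or worse)
    if model_id in (2, 3):
        return int(end < start)  # exited to a less severe state
    return int(end == DebtStatus.DEFAULT)  # model 4: still in default at the horizon
```

Ordering the states with an IntEnum lets "entry" and "exit" be expressed as simple severity comparisons, and the population check prevents, for example, an up-to-date customer from being labeled by an exit-from-default model.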

Why an ML pipeline for debt management?

Traditional mathematical models, such as logistic regression, are still applied in the field of risk analysis. These models are simple and highly interpretable; however, they sometimes fall short of the performance that non-linear ML methods can achieve. At AI Factory, we aim to strike a balance between the traditional methodologies standardized in risk, which give us a reference starting point, and innovative techniques that let us build production-ready models with greater predictive power, comparing the two approaches against each other.

In addressing various debt recovery problems with ML models, we realized that the same steps were repeated again and again. That's why we decided to unify these phases into a single debt management modeling pipeline, which makes us more agile and lets us reuse code and automate processes across projects.

We start from some premises:

Our models focus on supervised learning with tabular datasets to predict target variables, which are usually binary.
We need to handle a large number of variables (more than 1,800 in some cases), including behavioral, sociodemographic, transactional, and debt-level data. The diversity and volume of the data demand a consistent dimensionality reduction process².
We face tight deadlines and must build and validate effective models quickly. An automated pipeline significantly accelerates this process.
It’s essential to create explainable models that allow us to interpret their results, thereby translating them into business language.
Score segmentation is necessary for precise evaluation and adaptation to different contexts.
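Two of these premises, dimensionality reduction and score segmentation, can be illustrated with a short pandas sketch. The greedy correlation filter, the 0.95 threshold, the variance cutoff, the quantile-based bands, and the column names in the demo are assumptions; the actual pipeline may use different and more sophisticated feature-selection and segmentation methods.

```python
import numpy as np
import pandas as pd


def reduce_features(df: pd.DataFrame, corr_threshold: float = 0.95,
                    min_variance: float = 1e-8) -> list[str]:
    """Greedy filter: drop near-constant columns, then the later column of
    every pair whose absolute Pearson correlation exceeds corr_threshold."""
    kept = df.loc[:, df.var() > min_variance]
    corr = kept.corr().abs()
    # Keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = {col for col in upper.columns if (upper[col] > corr_threshold).any()}
    return [col for col in kept.columns if col not in to_drop]


def segment_scores(scores, target, n_bands: int = 5) -> pd.DataFrame:
    """Split model scores into quantile bands and report the observed
    event rate (share of positive targets) in each band."""
    frame = pd.DataFrame({"score": scores, "target": target})
    frame["band"] = pd.qcut(frame["score"], q=n_bands, labels=False, duplicates="drop")
    return frame.groupby("band").agg(
        customers=("target", "size"),
        event_rate=("target", "mean"),
        min_score=("score", "min"),
        max_score=("score", "max"),
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=1000)
    features = pd.DataFrame({           # hypothetical column names
        "balance": base,
        "balance_scaled": base * 1000,  # redundant: perfectly correlated with balance
        "constant_flag": 1.0,           # carries no information
        "n_transactions": rng.normal(size=1000),
    })
    print(reduce_features(features))    # -> ['balance', 'n_transactions']
    scores = rng.random(1000)
    target = (rng.random(1000) < scores).astype(int)  # higher score, more events
    print(segment_scores(scores, target))
```

Checking that the event rate rises monotonically from the lowest to the highest band is a quick way to confirm that the score ranks customers sensibly before choosing cut-offs for different business contexts.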

Standardizing the pipeline also reduces our time-to-value (the time it takes us to deliver value) whenever we start a new project with the same business area. A more extensive article, published recently, explains our ML pipeline in depth, along with the state-of-the-art libraries we used.

Conclusions: Tradition-innovation symbiosis to create productizable models

Innovative ML methods allow us to streamline the creation of more efficient and accurate predictive models. In debt management, these models help us prevent undesirable situations for our customers regarding their debt status and offer solutions to mitigate them. The ML pipeline we propose can be applied to any use case involving supervised learning.

We achieve greater precision and efficiency by applying new state-of-the-art libraries to the phases of the traditional ML product development cycle: a perfect symbiosis between tradition and innovation.