Michał Smyczek
CFO
Cledar helped a for-profit data partner for financial institutions analyze the underlying data of securitization products to better understand the probability and factors affecting credit default, achieving prediction accuracy of over 98%.
Country
Germany, Frankfurt
Industry
Financial Services
Duration
August 2019 to February 2020
European Data Warehouse is a for-profit institution that gathers data and sells it to financial institutions, which use it to inform their investment decisions and to create and sell new products and services.
Cledar was invited to analyze the data held by the European Data Warehouse on European securitization products and combine it with macroeconomic data and trends to understand and predict the level of risk associated with each loan and, thus, with each securitization product. This credit risk analysis project could empower our client and its customers to make data-informed decisions about how to price and assess securitization products.
For this study, Cledar used historical loan-level data from 208 asset-backed securities (ABS) products. The anonymized data sets available for analysis covered 2013-2019 and included payment history, account balance, delinquencies, defaults, and property value.
The goal was to analyze this data alongside external macroeconomic data, such as unemployment, labor force, house price index, and average annual income. To improve accuracy, we added statistics such as average delinquency rate, average debt-to-income ratio, original loan value, current balance, pre-payments (if any), and balance in arrears.
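The enrichment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the field names, the `enrich` helper, the join key of country and quarter, the definition of debt to income as current balance divided by annual income, and every value shown are hypothetical.

```python
# Hypothetical loan-level records; field names and values are invented.
loans = [
    {"loan_id": "A1", "country": "ES", "period": "2018Q4",
     "current_balance": 120000.0, "annual_income": 40000.0,
     "balance_in_arrears": 0.0},
    {"loan_id": "B2", "country": "ES", "period": "2018Q4",
     "current_balance": 90000.0, "annual_income": 30000.0,
     "balance_in_arrears": 1500.0},
]

# Hypothetical macroeconomic indicators keyed by (country, period).
macro = {
    ("ES", "2018Q4"): {"unemployment_rate": 0.145, "house_price_index": 118.2},
}


def enrich(loan, macro_table):
    """Attach macro indicators and a derived ratio to one loan record."""
    row = dict(loan)  # copy so the original record is left untouched
    row.update(macro_table.get((loan["country"], loan["period"]), {}))
    # Assumed definition: outstanding balance relative to annual income.
    row["debt_to_income"] = loan["current_balance"] / loan["annual_income"]
    return row


rows = [enrich(loan, macro) for loan in loans]

# Portfolio-level statistic: share of loans with any balance in arrears.
delinquency_rate = sum(r["balance_in_arrears"] > 0 for r in rows) / len(rows)
```

Each enriched row then carries loan-level fields, macro indicators for its country and period, and derived statistics in a single record, which is the shape a tabular model expects.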
By combining this data and deploying the right data analytics and machine learning tools, it would be possible to predict the likelihood of default with greater accuracy, increase the value of the data, and open the possibility of creating new products and services.
To start, we decided on two approaches: looking at the individual mortgages, and then at the overall values of the whole portfolio. For individual mortgages, we looked at the probability of default in the near term (3, 6, and 12 months) based on the last four payments, combined with the other data available to us. For this, we used the Explainable Boosting Machine (EBM), a generalized additive model in which each feature contributes to the prediction through its own learned term, so its effect on the overall outcome can be examined separately. With this, it was possible to pinpoint the most important factors that could influence default probability.
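The additive idea behind this kind of model can be illustrated with a small, self-contained Python sketch. It is not the EBM implementation or the model trained in the study: the feature names, bin edges, per-bin scores, and intercept below are all invented, and a real EBM learns its per-feature terms from data by boosting. The sketch only shows how per-feature terms add up to a default probability and how their sizes can be used to rank factors for a single loan.

```python
import math
from bisect import bisect_right

# Hypothetical per-feature terms: bin edges plus one log-odds score per bin.
# All numbers are invented for illustration.
SHAPE_FUNCTIONS = {
    "months_in_arrears": {"edges": [1, 3], "scores": [-0.8, 0.6, 1.9]},
    "debt_to_income": {"edges": [0.3, 0.5], "scores": [-0.4, 0.1, 0.7]},
    "unemployment_rate": {"edges": [0.10, 0.20], "scores": [-0.2, 0.2, 0.5]},
}
INTERCEPT = -3.0  # hypothetical baseline log-odds of default


def contributions(loan):
    """Return each feature's additive contribution to the log-odds."""
    result = {}
    for name, shape in SHAPE_FUNCTIONS.items():
        bin_index = bisect_right(shape["edges"], loan[name])
        result[name] = shape["scores"][bin_index]
    return result


def default_probability(loan):
    """Sum the intercept and per-feature terms, then apply the logistic link."""
    log_odds = INTERCEPT + sum(contributions(loan).values())
    return 1.0 / (1.0 + math.exp(-log_odds))


loan = {"months_in_arrears": 4, "debt_to_income": 0.45, "unemployment_rate": 0.15}
parts = contributions(loan)

# Rank features by the absolute size of their contribution for this loan.
ranked = sorted(parts, key=lambda name: abs(parts[name]), reverse=True)
```

Because the prediction is a plain sum of per-feature terms, the `parts` dictionary doubles as an explanation: here the hypothetical `months_in_arrears` term dominates the score for this loan.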
To estimate the general performance of the portfolio, we focused on metrics such as delinquency rate and default balance. In addition to historical data for the portfolio, we also used country-level economic indicators for Spain. Using EBM for regression, we obtained considerably more accurate results than with the commonly used linear regression.
The results were impressive. Using our approach, we achieved a default prediction accuracy of over 98%. In addition, our approach improved our understanding of which factors were most likely to contribute to default. We were also able to capture the impact of macroeconomic trend reversals on default likelihood, as well as on the timing of defaults. These results demonstrated the value of our solution and its ability to help investors improve their risk management and make better investment decisions.