
Machine learning approaches for the prediction of credit risk

Abstract: Predicting the possible occurrence of a future event, which may never happen, is a fundamental problem that arises naturally in most scientific and industrial fields. This problem, commonly referred to as survival analysis after its canonical application in epidemiology, has long been one of the classical problems in statistics, and contributions to it have enabled immeasurable advances in the natural sciences. More recently, advances in machine learning have allowed those same scientific fields and industrial applications to achieve significant leaps forward by exploiting large amounts of high-dimensional data with highly flexible estimators. In this thesis we try to reconcile both approaches and show how to make the best use of highly flexible machine learning approaches in the survival analysis setting, in a principled and motivated way. We show how the classical empirical risk minimization (ERM) framework can be adapted to the survival analysis setting by introducing a reweighted objective called the Kaplan-Meier ERM. We derive non-asymptotic error bounds for it without parametric assumptions on the true generating process, effectively bringing to survival analysis the results one has come to expect in machine learning. We also show how to construct highly flexible estimators of the survival function, one of the key building blocks of our Kaplan-Meier ERM framework.
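The abstract does not spell out the Kaplan-Meier ERM objective. One standard way to reweight an empirical risk under right censoring, assumed here purely for illustration, is to weight each observation by the jump of the Kaplan-Meier estimator at its observed time (Stute weights). With observations sorted by time, this weight is W_(i) = Ŝ(Y_(i)−) δ_(i) / (n − i + 1). Censored points get weight zero, and their mass is pushed onto later uncensored points. The sketch below computes these weights and plugs them into a weighted least-squares fit of the observed times. The names `kaplan_meier_weights` and `km_weighted_least_squares`, the squared loss and the linear model are all hypothetical choices, not the thesis's implementation.

```python
import numpy as np


def kaplan_meier_weights(times, events):
    """Jump of the Kaplan-Meier estimator at each observed time (Stute weights).

    times  : observed times Y_i = min(T_i, C_i)
    events : 1 if the event time T_i was observed, 0 if censored
    Returns one non-negative weight per observation, in the input order.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    n = len(times)
    # Sort by time; on ties, uncensored observations come before censored ones.
    order = np.lexsort((-events, times))
    d = events[order]
    w_sorted = np.zeros(n)
    surv = 1.0  # Kaplan-Meier survival just before the current sorted time
    for i in range(n):
        at_risk = n - i
        w_sorted[i] = surv * d[i] / at_risk
        surv *= 1.0 - d[i] / at_risk
    weights = np.empty(n)
    weights[order] = w_sorted  # map back to the original order
    return weights


def km_weighted_least_squares(X, times, events):
    """Hypothetical reweighted ERM: minimise sum_i W_i * (Y_i - a - X_i . b)^2."""
    w = kaplan_meier_weights(times, events)
    design = np.column_stack([np.ones(len(w)), np.asarray(X, dtype=float)])
    sw = np.sqrt(w)
    # Weighted least squares = ordinary least squares on rows scaled by sqrt(W_i).
    target = np.asarray(times, dtype=float) * sw
    coef, *_ = np.linalg.lstsq(design * sw[:, None], target, rcond=None)
    return coef  # [intercept, slope_1, ..., slope_d]


# Toy usage: the censored observation at time 2 gets zero weight.
print(kaplan_meier_weights([1, 2, 3, 4], [1, 0, 1, 1]))  # weights 0.25, 0, 0.375, 0.375
```

The squared loss is only a placeholder. Any per-sample loss can be weighted the same way, which is what makes this kind of reweighting compatible with flexible machine learning estimators.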
We formulate survival estimation as a normalizing flow problem and introduce a novel conditional normalizing flow estimator of the survival density that is tractable, easy to sample from, and highly expressive. To reduce the complexity of the two previous approaches, we introduce an estimator of the gradient of a black-box function and show how to use it for variable selection, a simple yet highly effective method for dimensionality reduction. Finally, we apply the methods developed here to a particular instance of the survival problem: predicting company defaults. We show how to use estimators of the probability of default to build optimal portfolios, and how to make efficient use of small datasets through hierarchical methods.
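The abstract does not say how the gradient estimator is built. The sketch below illustrates only the downstream idea: rank the inputs of a black-box model by how strongly it varies along each of them, then keep the top ones. For that purpose it swaps in plain central finite differences and assumes the model can be queried at arbitrary points. This is not the estimator introduced in the thesis. The names `finite_difference_gradients` and `select_variables`, the step `h`, and the mean-squared-gradient score are hypothetical choices.

```python
import numpy as np


def finite_difference_gradients(f, X, h=1e-4):
    """Central finite-difference estimate of grad f at each row of X.

    f : vectorised black box mapping an (m, d) array to an (m,) array
    X : (n, d) array of points at which to estimate the gradient
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    grads = np.empty((n, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = h  # perturb only input j
        grads[:, j] = (f(X + step) - f(X - step)) / (2.0 * h)
    return grads


def select_variables(f, X, k, h=1e-4):
    """Keep the k inputs with the largest mean squared partial derivative."""
    grads = finite_difference_gradients(f, X, h)
    scores = np.mean(grads ** 2, axis=0)  # importance score of each input
    selected = np.sort(np.argsort(scores)[::-1][:k])
    return selected, scores


def black_box(Z):
    """Toy model that depends only on inputs 0 and 2."""
    return np.sin(Z[:, 0]) + 3.0 * Z[:, 2]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
selected, scores = select_variables(black_box, X, k=2)
print(selected)  # [0 2]
```

In a real pipeline `f` would be the prediction function of a fitted survival or default model. Inputs with near-zero scores are candidates for removal before refitting.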
Contributor: ABES STAR
Submitted on: Monday, February 21, 2022 - 1:08:08 PM
Last modification on: Tuesday, February 22, 2022 - 3:06:05 AM
Long-term archiving on: Sunday, May 22, 2022 - 6:42:06 PM


Version validated by the jury (STAR)


HAL Id: tel-03582658, version 1



Guillaume Ausset. Machine learning approaches for the prediction of credit risk. Machine Learning [stat.ML]. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAT034⟩. ⟨tel-03582658⟩


