This case study uses exploratory data analysis (EDA) to identify patterns that indicate whether a client is likely to have difficulty paying their installments. These patterns can inform actions such as denying the loan, reducing the loan amount, or lending to risky applicants at a higher interest rate, while helping to ensure that consumers who are capable of repaying the loan are not rejected.
In other words, the company wants to understand the driving factors (or driver variables) behind loan default, i.e. the variables that are strong indicators of default. The company can utilise this knowledge in managing its portfolio and assessing risk.
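One simple way to screen a candidate driver variable is to compare the default rate within each of its categories against the overall default rate. Categories whose rate differs markedly from the overall rate are candidates for closer investigation. The sketch below is a minimal illustration under assumptions: each client record is a dict, the target column is assumed to be called `TARGET` (1 = payment difficulties, 0 = otherwise), and `INCOME_TYPE` is a hypothetical categorical column. Check `columns_description.csv` for the real names and encoding.

```python
from collections import defaultdict


def default_rate_by_category(rows, feature, target="TARGET"):
    """Return {category: default rate} for one categorical feature.

    Assumes `target` holds 1 for clients with payment difficulties and 0
    otherwise (an assumed encoding; confirm it against the data dictionary).
    """
    totals = defaultdict(int)    # number of clients per category
    defaults = defaultdict(int)  # number of defaulting clients per category
    for row in rows:
        category = row[feature]
        totals[category] += 1
        defaults[category] += int(row[target])
    return {cat: defaults[cat] / totals[cat] for cat in totals}


def overall_default_rate(rows, target="TARGET"):
    """Share of all clients flagged as having payment difficulties."""
    rows = list(rows)
    return sum(int(r[target]) for r in rows) / len(rows)


if __name__ == "__main__":
    # Toy records for illustration only; not taken from the real dataset.
    clients = [
        {"INCOME_TYPE": "Working", "TARGET": "1"},
        {"INCOME_TYPE": "Working", "TARGET": "0"},
        {"INCOME_TYPE": "Pensioner", "TARGET": "0"},
        {"INCOME_TYPE": "Pensioner", "TARGET": "0"},
    ]
    print(overall_default_rate(clients))                       # 0.25
    print(default_rate_by_category(clients, "INCOME_TYPE"))    # {'Working': 0.5, 'Pensioner': 0.0}
```

A raw rate comparison like this ignores category size, so very small categories can show extreme rates by chance; it is a screening step, not a test of significance.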
To develop your understanding of the domain, you are advised to do some independent research on risk analytics; understanding the types of variables and their significance should be enough.
Download the dataset files from the links below:
https://drive.google.com/file/d/1nBXmamiJoy_Z-AVY610ufqnhlYbemgXp/view?usp=sharing
https://drive.google.com/file/d/1hvQM5ChyARZCZxM_5gdQUChf834hPTmT/view?usp=sharing
https://drive.google.com/file/d/1AZJjGBUCCIZi6RnqYiJZtVmoHuv_55qo/view?usp=sharing
The dataset consists of three files, described below:
- 'application_data.csv' contains all the information about the client at the time of application, including whether the client has payment difficulties.
- 'previous_application.csv' contains information about the client’s previous loan applications, including whether each previous application was Approved, Cancelled, Refused or an Unused offer.
- 'columns_description.csv' is the data dictionary, which describes the meaning of the variables.
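As a starting point for EDA, the two data files can be read and linked per client. The sketch below uses only Python's standard `csv` module and rests on assumptions: both files are assumed to share a client identifier column called `SK_ID_CURR`, and `previous_application.csv` is assumed to record each application's outcome in a column called `NAME_CONTRACT_STATUS`. Verify both names in `columns_description.csv`. The helper `attach_previous_history` and the derived field `PREV_REFUSED` are hypothetical names introduced for illustration; they count, for each current applicant, how many previous applications were refused.

```python
import csv
import io
from collections import Counter, defaultdict


def read_rows(file_obj):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(file_obj))


def previous_status_counts(previous_rows, id_col="SK_ID_CURR",
                           status_col="NAME_CONTRACT_STATUS"):
    """Map each client ID to a Counter of previous-application statuses.

    `id_col` and `status_col` are assumed column names; check the data dictionary.
    """
    counts = defaultdict(Counter)
    for row in previous_rows:
        counts[row[id_col]][row[status_col]] += 1
    return counts


def attach_previous_history(app_rows, previous_rows, id_col="SK_ID_CURR",
                            status_col="NAME_CONTRACT_STATUS"):
    """Add a hypothetical 'PREV_REFUSED' count to each current application.

    Works like a left join: every current application is kept, and clients
    with no previous applications get a count of 0.
    """
    counts = previous_status_counts(previous_rows, id_col, status_col)
    for row in app_rows:
        # .get avoids creating empty entries in the defaultdict.
        row["PREV_REFUSED"] = counts.get(row[id_col], Counter())["Refused"]
    return app_rows


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files (illustrative data only).
    app_csv = io.StringIO("SK_ID_CURR,TARGET\n1,0\n2,1\n")
    prev_csv = io.StringIO(
        "SK_ID_CURR,NAME_CONTRACT_STATUS\n1,Approved\n2,Refused\n2,Refused\n"
    )
    apps = attach_previous_history(read_rows(app_csv), read_rows(prev_csv))
    print([(r["SK_ID_CURR"], r["PREV_REFUSED"]) for r in apps])
    # [('1', 0), ('2', 2)]
```

With the downloaded files, replace the `io.StringIO` objects with `open("application_data.csv", newline="")` and `open("previous_application.csv", newline="")`. For files of this size, pandas (`read_csv` followed by `merge`) is usually more convenient; the standard-library version is shown only to make the join logic explicit.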