AI can do more than automate the credit-decision process; it can also make that process more efficient and accurate. Machine learning algorithms can be trained on historical loan data to predict the likelihood that a loan will be approved or denied. This helps underwriters make more informed decisions and cuts down on the significant time and resources that a manual review requires.

Additionally, AI can identify patterns and trends in the data that would take human underwriters much longer to spot, which can improve the accuracy of the credit-decisioning process. However, it’s important to note that a model is only as good as the data it’s trained on, so a clean, diverse dataset is crucial for accurate results.

This blog will cover the three topics listed below:

1. Synthetic datasets for loan approval applications

2. Deep learning and its components (such as the neural network)

3. How to use LIME (Local Interpretable Model-agnostic Explanations) to interpret such a model and identify the features that drive its predictions

Deep Learning and its Components

For the purpose of this blog, I have used a synthetic dataset found online to demonstrate a deep learning use case.

The dataset contains the following columns:

  • Application ID: The unique identifier of an application
  • Gender: The applicant’s gender
  • Married: The applicant’s marital status
  • Dependents: Whether the applicant has any dependents
  • Education: Whether the applicant is a graduate
  • Self Employed: Whether the applicant is self-employed
  • Credit History: Whether the applicant has any previous credit history
  • Property Area: Whether the property in question is urban, semi-urban, or rural
  • Income: Whether the applicant’s income is low, medium, or high
  • Application Status: The target variable, indicating whether the application was approved
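A neural network needs numeric inputs, so the categorical columns above must be encoded before training. The sketch below builds a tiny, made-up sample with the same columns and one-hot encodes it with pandas. The underscore column names, the category values (such as “Y”/“N” for the target), and the rows themselves are assumptions for illustration, not the actual dataset.

```python
import pandas as pd

# Hypothetical sample rows: column names and category values are assumptions
# that mirror the column descriptions above, not the real dataset.
raw = pd.DataFrame({
    "Application_ID": ["A001", "A002", "A003", "A004"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Married": ["Yes", "No", "Yes", "No"],
    "Dependents": ["Yes", "No", "No", "Yes"],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
    "Self_Employed": ["No", "Yes", "No", "No"],
    "Credit_History": ["Yes", "Yes", "No", "Yes"],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
    "Income": ["High", "Low", "Medium", "Medium"],
    "Application_Status": ["Y", "N", "N", "Y"],
})

def prepare_features(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Split off the target and one-hot encode the categorical features."""
    # The ID carries no predictive signal, so drop it along with the target.
    features = df.drop(columns=["Application_ID", "Application_Status"])
    # One 0/1 column per category value, e.g. Property_Area_Urban.
    X = pd.get_dummies(features, dtype=int)
    # Encode the target as 1 = approved, 0 = not approved (assumed "Y"/"N" labels).
    y = (df["Application_Status"] == "Y").astype(int)
    return X, y

X, y = prepare_features(raw)
print(X.shape)
print(y.tolist())
```

One-hot encoding avoids implying an order between values such as “Urban” and “Rural”; for the ordered Income column, an ordinal encoding (low < medium < high) would be a reasonable alternative.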

For the sake of simplicity, I have not performed fairness bias testing on the gender attribute, even though it is present in the data. In practice, however, it is recommended to examine how attributes such as gender are distributed, and how outcomes differ across them, to check for bias in the data. Bias can creep into source data unnoticed and can have serious legal implications.
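A first pass at the check recommended above is to compare approval rates across groups. The sketch below computes the approval rate per gender and the ratio of the lowest rate to the highest. The data is made up, and this single metric is only a starting point, not a full fairness audit.

```python
import pandas as pd

def approval_rates(df: pd.DataFrame, group_col: str, target_col: str) -> pd.Series:
    """Share of approved applications (target == 1) within each group."""
    return df.groupby(group_col)[target_col].mean()

def disparate_impact_ratio(rates: pd.Series) -> float:
    """Lowest group approval rate divided by the highest one (1.0 = parity)."""
    return float(rates.min() / rates.max())

# Hypothetical, made-up applications: 1 = approved, 0 = not approved.
apps = pd.DataFrame({
    "Gender":   ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"],
    "Approved": [1, 1, 1, 0, 1, 1, 0, 0],
})

rates = approval_rates(apps, "Gender", "Approved")
print(rates.to_dict())
print(round(disparate_impact_ratio(rates), 3))
```

Here the rates are 0.75 for “Male” and 0.50 for “Female”, giving a ratio of about 0.67. A ratio well below 1.0 (for example, under the 0.8 “four-fifths” rule of thumb used in US employment contexts) is a signal to investigate further, not proof of bias on its own.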

What is Deep Learning?

Deep learning is a subfield of machine learning that trains artificial neural networks on large datasets, loosely mimicking the pattern-recognition abilities of the human brain. The “deep” refers to the multiple layers of neurons stacked in these networks. Deep learning aims to build more accurate predictive models and improve on the performance of traditional machine learning algorithms. Deep learning models are often trained on labeled datasets, as in the loan-approval example used here.

What is a Neural Network?

Neural networks are algorithms that attempt to simulate the way the human brain processes and recognizes patterns in data. They are composed of interconnected units called neurons, which process and transmit information.
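To make this concrete, the sketch below shows what a single artificial neuron computes: a weighted sum of its inputs plus a bias term, passed through an activation function (sigmoid here). The bias term, the input values, and the weights are illustrative assumptions, not values from the loan model.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a real number into [0, 1], written to avoid math.exp overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: weighted sum of inputs plus bias, then sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Illustrative values only, e.g. three encoded applicant features.
output = neuron(inputs=[1.0, 0.0, 1.0], weights=[0.8, -0.5, 0.3], bias=-0.1)
print(round(output, 4))
```

The weighted sum here is 0.8 + 0.3 - 0.1 = 1.0, so the neuron outputs sigmoid(1.0), roughly 0.73. The second input is 0, so its weight has no effect on this particular output.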

The diagram below depicts the architecture of a simple neural network. It has three main layers (input, hidden, and output), and weights and activation functions determine how information flows between them:

  • Input Layer: This layer receives the input data, typically with one neuron per input feature. A neuron is the basic unit of a neural network (the dark blue circles on the left of the diagram).
  • Activation Function: The activation function transforms the weighted sum of a neuron’s inputs into the neuron’s output, determining how strongly the neuron “fires” and allowing the network to learn non-linear patterns. There are multiple kinds of activation functions, such as sigmoid and tanh.
  • Weights: They control the strength of the connection between two neurons. In other words, a weight decides how much influence the input will have on the output.
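The two activation functions named above behave differently: sigmoid maps any input into (0, 1), while tanh maps it into (-1, 1) and is centered on zero. The sketch below evaluates both with NumPy on a few sample weighted sums; the sample values are arbitrary.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic sigmoid: maps each value into (0, 1); sigmoid(0) = 0.5."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z: np.ndarray) -> np.ndarray:
    """Hyperbolic tangent: maps each value into (-1, 1); tanh(0) = 0."""
    return np.tanh(z)

# Arbitrary weighted sums (pre-activation values) for illustration.
z = np.array([-2.0, 0.0, 2.0])
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh)]:
    print(name, np.round(fn(z), 3))
```

The two functions are closely related: tanh(z) = 2 · sigmoid(2z) - 1. Sigmoid’s (0, 1) range is why it is a natural choice for an output neuron that represents a probability, such as the probability of approval.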

To break down the concept of “weight” further, consider making coffee. A standard coffee beverage requires just three ingredients: coffee, milk, and sugar. These ingredients play the role of the input neurons, because they are the starting point of the process. The amount of each ingredient represents its weight. Once the ingredients are mixed, they are transformed into something new, the finished drink; this transformation is analogous to “activation.”

Credit: Investopedia.com
  • Hidden Layer: The hidden layer takes the outputs of the input layer and performs the calculations needed to generate the output. It is called “hidden” because its values are intermediate results that the user never sees directly.
  • Output Layer: The results of the hidden layer’s calculations are passed to the output layer, which produces the final prediction that the user sees.
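Putting the layers together, the sketch below runs a single forward pass through a tiny network: 4 input neurons, 3 hidden neurons with tanh, and 1 output neuron with sigmoid that yields an approval probability. The layer sizes, the random weights, and the input vector are assumptions for illustration; a real model would learn its weights from data.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes: list[int], seed: int = 0) -> list[tuple[np.ndarray, np.ndarray]]:
    """Random weights and zero biases for each pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x: np.ndarray, layers: list[tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Input layer -> hidden layer(s) with tanh -> output neuron with sigmoid."""
    a = x
    for W, b in layers[:-1]:
        a = np.tanh(a @ W + b)           # hidden activations, invisible to the user
    W_out, b_out = layers[-1]
    return sigmoid(a @ W_out + b_out)    # approval probability, in (0, 1)

# Hypothetical sizes: 4 input features, 3 hidden neurons, 1 output neuron.
layers = init_layers([4, 3, 1])
x = np.array([[1.0, 0.0, 1.0, 0.0]])    # one made-up encoded application
p = forward(x, layers)
print(p.shape, float(p[0, 0]))
```

Passing a longer list such as `[4, 8, 8, 1]` to the hypothetical `init_layers` helper stacks two hidden layers instead of one, which is what makes a network “deep.”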

What is Classification?

Classification is a supervised machine learning task whose goal is to predict class labels from input data. There are several types of classification, including binary classification, which predicts one of two class labels (such as “approved” or “not approved”). Here, we use binary classification to predict whether a loan application is approved.
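To show what training a binary classifier involves, the sketch below fits a single sigmoid output neuron with plain gradient descent on binary cross-entropy, then thresholds the predicted probability at 0.5 to get “approved” (1) or “not approved” (0). This single neuron is equivalent to logistic regression, not a deep network. The toy features, learning rate, and epoch count are made-up assumptions; frameworks such as TensorFlow compute these gradients automatically for deeper networks.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.5,
                   epochs: int = 2000) -> tuple[np.ndarray, float]:
    """Fit weights and a bias by gradient descent on mean binary cross-entropy."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)            # predicted approval probabilities
        grad_z = (p - y) / len(y)         # gradient of the mean loss w.r.t. each logit
        w -= lr * (X.T @ grad_z)
        b -= lr * grad_z.sum()
    return w, b

def predict(X: np.ndarray, w: np.ndarray, b: float,
            threshold: float = 0.5) -> np.ndarray:
    """Map probabilities to class labels: 1 = approved, 0 = not approved."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Made-up encoded features: [has_credit_history, high_income]
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]], dtype=float)
y = np.array([1, 1, 0, 0, 1, 0], dtype=float)

w, b = train_logistic(X, y)
print(predict(X, w, b))
```

In this toy data, approval depends only on credit history, so the learned weight for the first feature ends up much larger than the weight for income.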

You can click through here to see a live example of how to solve this task with a TensorFlow classification model.

Before you proceed, make sure TensorFlow is installed on your system. Here is a link that can help you with that.
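If you are unsure whether TensorFlow is already available, the quick check below uses only the Python standard library to look up the installed distribution’s version without importing TensorFlow itself. It assumes the package was installed under the PyPI name `tensorflow`; variants such as `tensorflow-cpu` would need to be checked by their own name.

```python
from importlib import metadata
from typing import Optional

def installed_version(dist_name: str = "tensorflow") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version()
if version is None:
    print("TensorFlow not found; install it with: pip install tensorflow")
else:
    print(f"TensorFlow {version} is installed")
```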


Supreet Kaur
AVP at Morgan Stanley