From c9ed838c9d5b47a3de8501494c11705443e89a7d Mon Sep 17 00:00:00 2001 From: ph_ Date: Tue, 2 Jun 2020 15:54:22 -0400 Subject: [PATCH 1/3] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 596e8f7..d155306 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ ![A responsible machine learning workingflow](/img/rml_diagram_no_hilite.png) -A Responsible Machine Learning Workflow Diagram. **Source:** [*Information*, 11(3) (March 2020)](https://www.mdpi.com/2078-2489/11/3) +A Responsible Machine Learning Workflow Diagram. **Source:** [*Information*, 11(3) (March 2020)](https://www.mdpi.com/2078-2489/11/3). ## GWU_DNSC 6290: Course Outline From 30f56194e8e3a28e3a2fd084c566c18419df7671 Mon Sep 17 00:00:00 2001 From: patrickh Date: Wed, 3 Jun 2020 21:11:56 -0400 Subject: [PATCH 2/3] lecture 3 --- lecture_3.ipynb | 7816 +++++++++++++++++++++++++++++++++++++++++++++ rmltk/debug.py | 73 + rmltk/evaluate.py | 107 +- 3 files changed, 7995 insertions(+), 1 deletion(-) create mode 100644 lecture_3.ipynb diff --git a/lecture_3.ipynb b/lecture_3.ipynb new file mode 100644 index 0000000..0f851af --- /dev/null +++ b/lecture_3.ipynb @@ -0,0 +1,7816 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## License \n", + "\n", + "Copyright 2020 Patrick Hall (jphall@gwu.edu)\n", + "\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + "\n", + " http://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**DISCLAIMER:** This notebook is not legal compliance advice." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "***" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Testing a Constrained Model for Discrimination and Remediating Discovered Discrimination\n", + "\n", + "Fairness is a difficult and complex topic. So much so that leading scholars have yet to agree on a strict definition. However, there is a practical way to discuss and handle observational fairness, or how your model predictions affect different groups of people. This procedure is often known as disparate impact analysis (DIA). This example DIA notebook starts by loading a pre-trained constrained, monotonic gradient boosting machine (GBM) classifier. A probability cutoff for making credit decisions is selected by maximizing Youden's J statistic and confusion matrices are generated to summarize the GBM’s decisions across male and female customers. A basic DIA procedure is then conducted using the information stored in the confusion matrices and several traditional fair lending measures are also calculated.\n", + "\n", + "Because DIA only considers groups of people, it's also important to look for any local, or individual, discrimination that would not be flagged in the group fairness quantities. This notebook closes by illustrating a basic search for cases of individual discrimination." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Global hyperparameters" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "SEED = 12345 # global random seed for better reproducibility" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Python imports and inits" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.\n", + "Attempting to start a local H2O server...\n", + " Java Version: openjdk version \"1.8.0_252\"; OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09); OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)\n", + " Starting server from /home/patrickh/Workspace/GWU_rml/env_rml/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar\n", + " Ice root: /tmp/tmpwbigsuw9\n", + " JVM stdout: /tmp/tmpwbigsuw9/h2o_patrickh_started_from_python.out\n", + " JVM stderr: /tmp/tmpwbigsuw9/h2o_patrickh_started_from_python.err\n", + " Server is running at http://127.0.0.1:54321\n", + "Connecting to H2O server at http://127.0.0.1:54321 ... successful.\n", + "Warning: Your H2O cluster version is too old (9 months and 10 days)! Please download and install the latest version from http://h2o.ai/download/\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
H2O cluster uptime:00 secs
H2O cluster timezone:America/New_York
H2O data parsing timezone:UTC
H2O cluster version:3.26.0.3
H2O cluster version age:9 months and 10 days !!!
H2O cluster name:H2O_from_python_patrickh_8fev5r
H2O cluster total nodes:1
H2O cluster free memory:1.879 Gb
H2O cluster total cores:24
H2O cluster allowed cores:24
H2O cluster status:accepting new members, healthy
H2O connection url:http://127.0.0.1:54321
H2O connection proxy:None
H2O internal security:False
H2O API Extensions:Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4
Python version:3.6.9 final
" + ], + "text/plain": [ + "-------------------------- ---------------------------------------------------\n", + "H2O cluster uptime: 00 secs\n", + "H2O cluster timezone: America/New_York\n", + "H2O data parsing timezone: UTC\n", + "H2O cluster version: 3.26.0.3\n", + "H2O cluster version age: 9 months and 10 days !!!\n", + "H2O cluster name: H2O_from_python_patrickh_8fev5r\n", + "H2O cluster total nodes: 1\n", + "H2O cluster free memory: 1.879 Gb\n", + "H2O cluster total cores: 24\n", + "H2O cluster allowed cores: 24\n", + "H2O cluster status: accepting new members, healthy\n", + "H2O connection url: http://127.0.0.1:54321\n", + "H2O connection proxy:\n", + "H2O internal security: False\n", + "H2O API Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4\n", + "Python version: 3.6.9 final\n", + "-------------------------- ---------------------------------------------------" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from rmltk import debug, evaluate, model # simple module for evaluating, debugging, and training models\n", + "\n", + "# h2o Python API with specific classes\n", + "import h2o \n", + "from h2o.estimators.gbm import H2OGradientBoostingEstimator # for GBM\n", + "\n", + "import numpy as np # array, vector, matrix calculations\n", + "import pandas as pd # DataFrame handling\n", + "\n", + "import matplotlib.pyplot as plt # general plotting\n", + "pd.options.display.max_columns = 999 # enable display of all columns in notebook\n", + "\n", + "# display plots in-notebook\n", + "%matplotlib inline \n", + "\n", + "h2o.init(max_mem_size='2G') # start h2o\n", + "h2o.remove_all() # remove any existing data structures from h2o memory" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. 
Download, Explore, and Prepare UCI Credit Card Default Data\n", + "\n", + "UCI credit card default data: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients\n", + "\n", + "The UCI credit card default data contains demographic and payment information about credit card customers in Taiwan in the year 2005. The data set contains 23 input variables: \n", + "\n", + "* **`LIMIT_BAL`**: Amount of given credit (NT dollar)\n", + "* **`SEX`**: 1 = male; 2 = female\n", + "* **`EDUCATION`**: 1 = graduate school; 2 = university; 3 = high school; 4 = others \n", + "* **`MARRIAGE`**: 1 = married; 2 = single; 3 = others\n", + "* **`AGE`**: Age in years \n", + "* **`PAY_0`, `PAY_2` - `PAY_6`**: History of past payment; `PAY_0` = the repayment status in September, 2005; `PAY_2` = the repayment status in August, 2005; ...; `PAY_6` = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above. \n", + "* **`BILL_AMT1` - `BILL_AMT6`**: Amount of bill statement (NT dollar). `BILL_AMT1` = amount of bill statement in September, 2005; `BILL_AMT2` = amount of bill statement in August, 2005; ...; `BILL_AMT6` = amount of bill statement in April, 2005. \n", + "* **`PAY_AMT1` - `PAY_AMT6`**: Amount of previous payment (NT dollar). `PAY_AMT1` = amount paid in September, 2005; `PAY_AMT2` = amount paid in August, 2005; ...; `PAY_AMT6` = amount paid in April, 2005. \n", + "\n", + "As is common in credit scoring models, demographic variables will not be used as model inputs. However, demographic variables will be used after model training to test for disparate impact." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Import data and clean" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# import XLS file\n", + "path = 'default_of_credit_card_clients.xls'\n", + "data = pd.read_excel(path,\n", + " skiprows=1)\n", + "\n", + "# remove spaces from target column name \n", + "data = data.rename(columns={'default payment next month': 'DEFAULT_NEXT_MONTH'}) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Assign modeling roles" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "y = DEFAULT_NEXT_MONTH\n", + "X = ['LIMIT_BAL', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6']\n" + ] + } + ], + "source": [ + "# assign target and inputs for GBM\n", + "y_name = 'DEFAULT_NEXT_MONTH'\n", + "x_names = [name for name in data.columns if name not in [y_name, 'ID', 'AGE', 'EDUCATION', 'MARRIAGE', 'SEX']]\n", + "print('y =', y_name)\n", + "print('X =', x_names)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Helper function for recoding values in the UCI credict card default data\n", + "This simple function maps longer, more understandable character string values from the UCI credit card default data dictionary to the original integer values of the input variables found in the dataset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Parse progress: |█████████████████████████████████████████████████████████| 100%\n" + ] + } + ], + "source": [ + "def recode_cc_data(frame):\n", + " \n", + " \"\"\" Recodes numeric categorical variables into categorical character variables\n", + " with more transparent values. \n", + " \n", + " Args:\n", + " frame: Pandas DataFrame version of UCI credit card default data.\n", + " \n", + " Returns: \n", + " H2OFrame with recoded values.\n", + " \n", + " \"\"\"\n", + " \n", + " # define recoded values\n", + " sex_dict = {1:'male', 2:'female'}\n", + " education_dict = {0:'other', 1:'graduate school', 2:'university', 3:'high school', \n", + " 4:'other', 5:'other', 6:'other'}\n", + " marriage_dict = {0:'other', 1:'married', 2:'single', 3:'divorced'}\n", + " \n", + " # recode values using apply() and lambda function\n", + " frame['SEX'] = frame['SEX'].apply(lambda i: sex_dict[i])\n", + " frame['EDUCATION'] = frame['EDUCATION'].apply(lambda i: education_dict[i]) \n", + " frame['MARRIAGE'] = frame['MARRIAGE'].apply(lambda i: marriage_dict[i]) \n", + " \n", + " return h2o.H2OFrame(frame)\n", + "\n", + "data = recode_cc_data(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Split data into training and validation partitions\n", + "Fairness metrics will be calculated for the validation data to give a better idea of how explanations will look on future unseen data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Train data rows = 21060, columns = 25\n", + "Validation data rows = 8940, columns = 25\n" + ] + } + ], + "source": [ + "# split into training and validation\n", + "train, valid = data.split_frame([0.7], seed=12345)\n", + "\n", + "# summarize split\n", + "print('Train data rows = %d, columns = %d' % (train.shape[0], train.shape[1]))\n", + "print('Validation data rows = %d, columns = %d' % (valid.shape[0], valid.shape[1]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Load Pre-trained Monotonic GBM\n", + "Load the model known as `mgbm5` from the first lecture." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model Details\n", + "=============\n", + "H2OGradientBoostingEstimator : Gradient Boosting Machine\n", + "Model Key: best_mgbm\n", + "\n", + "\n", + "Model Summary: " + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
number_of_treesnumber_of_internal_treesmodel_size_in_bytesmin_depthmax_depthmean_depthmin_leavesmax_leavesmean_leaves
046.046.06939.03.03.03.05.08.07.369565
\n", + "
" + ], + "text/plain": [ + " number_of_trees number_of_internal_trees model_size_in_bytes \\\n", + "0 46.0 46.0 6939.0 \n", + "\n", + " min_depth max_depth mean_depth min_leaves max_leaves mean_leaves \n", + "0 3.0 3.0 3.0 5.0 8.0 7.369565 " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "ModelMetricsBinomial: gbm\n", + "** Reported on train data. **\n", + "\n", + "MSE: 0.13637719864300343\n", + "RMSE: 0.3692928358945018\n", + "LogLoss: 0.4351274080189972\n", + "Mean Per-Class Error: 0.2913939696264273\n", + "AUC: 0.7716491282246187\n", + "pr_auc: 0.5471826859054356\n", + "Gini: 0.5432982564492375\n", + "\n", + "Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.21968260039166268: " + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01ErrorRate
0013482.02814.00.1727(2814.0/16296.0)
111907.02743.00.4101(1907.0/4650.0)
2Total15389.05557.00.2254(4721.0/20946.0)
\n", + "
" + ], + "text/plain": [ + " 0 1 Error Rate\n", + "0 0 13482.0 2814.0 0.1727 (2814.0/16296.0)\n", + "1 1 1907.0 2743.0 0.4101 (1907.0/4650.0)\n", + "2 Total 15389.0 5557.0 0.2254 (4721.0/20946.0)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Maximum Metrics: Maximum metrics at their respective thresholds\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
metricthresholdvalueidx
0max f10.2196830.537474248.0
1max f20.1278590.630227329.0
2max f0point50.4466990.583033147.0
3max accuracy0.4466990.821493147.0
4max precision0.9502471.0000000.0
5max recall0.0506091.000000395.0
6max specificity0.9502471.0000000.0
7max absolute_mcc0.3251590.413494194.0
8max min_per_class_accuracy0.1775420.698495281.0
9max mean_per_class_accuracy0.2196830.708606248.0
\n", + "
" + ], + "text/plain": [ + " metric threshold value idx\n", + "0 max f1 0.219683 0.537474 248.0\n", + "1 max f2 0.127859 0.630227 329.0\n", + "2 max f0point5 0.446699 0.583033 147.0\n", + "3 max accuracy 0.446699 0.821493 147.0\n", + "4 max precision 0.950247 1.000000 0.0\n", + "5 max recall 0.050609 1.000000 395.0\n", + "6 max specificity 0.950247 1.000000 0.0\n", + "7 max absolute_mcc 0.325159 0.413494 194.0\n", + "8 max min_per_class_accuracy 0.177542 0.698495 281.0\n", + "9 max mean_per_class_accuracy 0.219683 0.708606 248.0" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Gains/Lift Table: Avg response rate: 22.20 %, avg score: 22.00 %\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
groupcumulative_data_fractionlower_thresholdliftcumulative_liftresponse_ratescorecumulative_response_ratecumulative_scorecapture_ratecumulative_capture_rategaincumulative_gain
010.0100740.8139273.6078833.6078830.8009480.8434460.8009480.8434460.0363440.036344260.788259260.788259
120.0203380.7955753.5198083.5634320.7813950.8051530.7910800.8241190.0361290.072473251.980795256.343177
230.0303160.7636793.4053283.5113940.7559810.7839700.7795280.8109050.0339780.106452240.532798251.139446
340.0400080.7151383.2618913.4509540.7241380.7398150.7661100.7936840.0316130.138065226.189099245.095388
450.0500810.6644163.1168693.3837550.6919430.6866950.7511920.7721640.0313980.169462211.686898238.375473
560.1000190.5433842.8594633.1219840.6347990.6017940.6930790.6871010.1427960.312258185.946339212.198445
670.1500050.3662372.2242932.8228490.4937920.4469510.6266710.6070760.1111830.423441122.429306182.284922
780.2056720.2927651.5955102.4906590.3542020.3127770.5529250.5274220.0888170.51225859.551043149.065864
890.3012510.1966481.1745042.0730770.2607390.2344990.4602220.4344850.1122580.62451617.450421107.307684
9100.4000290.1738170.8643271.7746040.1918800.1848440.3939610.3728420.0853760.709892-13.56728477.460410
10110.5002860.1514310.7014181.5595370.1557140.1613350.3462160.3304550.0703230.780215-29.85824955.953665
11120.6003060.1312140.6192371.4028700.1374700.1407090.3114360.2988410.0619350.842151-38.07634240.286982
12130.7006590.1147940.5593141.2820500.1241670.1228170.2846140.2736300.0561290.898280-44.06856828.204987
13140.8008210.1022260.3692931.1678870.0819830.1080620.2592700.2529210.0369890.935269-63.07069716.788724
14150.9045640.0918610.4021521.0800660.0892770.0975240.2397740.2350990.0417200.976989-59.7848088.006633
15161.0000000.0348100.2411121.0000000.0535270.0769890.2219990.2200100.0230111.000000-75.8887830.000000
\n", + "
" + ], + "text/plain": [ + " group cumulative_data_fraction lower_threshold lift \\\n", + "0 1 0.010074 0.813927 3.607883 \n", + "1 2 0.020338 0.795575 3.519808 \n", + "2 3 0.030316 0.763679 3.405328 \n", + "3 4 0.040008 0.715138 3.261891 \n", + "4 5 0.050081 0.664416 3.116869 \n", + "5 6 0.100019 0.543384 2.859463 \n", + "6 7 0.150005 0.366237 2.224293 \n", + "7 8 0.205672 0.292765 1.595510 \n", + "8 9 0.301251 0.196648 1.174504 \n", + "9 10 0.400029 0.173817 0.864327 \n", + "10 11 0.500286 0.151431 0.701418 \n", + "11 12 0.600306 0.131214 0.619237 \n", + "12 13 0.700659 0.114794 0.559314 \n", + "13 14 0.800821 0.102226 0.369293 \n", + "14 15 0.904564 0.091861 0.402152 \n", + "15 16 1.000000 0.034810 0.241112 \n", + "\n", + " cumulative_lift response_rate score cumulative_response_rate \\\n", + "0 3.607883 0.800948 0.843446 0.800948 \n", + "1 3.563432 0.781395 0.805153 0.791080 \n", + "2 3.511394 0.755981 0.783970 0.779528 \n", + "3 3.450954 0.724138 0.739815 0.766110 \n", + "4 3.383755 0.691943 0.686695 0.751192 \n", + "5 3.121984 0.634799 0.601794 0.693079 \n", + "6 2.822849 0.493792 0.446951 0.626671 \n", + "7 2.490659 0.354202 0.312777 0.552925 \n", + "8 2.073077 0.260739 0.234499 0.460222 \n", + "9 1.774604 0.191880 0.184844 0.393961 \n", + "10 1.559537 0.155714 0.161335 0.346216 \n", + "11 1.402870 0.137470 0.140709 0.311436 \n", + "12 1.282050 0.124167 0.122817 0.284614 \n", + "13 1.167887 0.081983 0.108062 0.259270 \n", + "14 1.080066 0.089277 0.097524 0.239774 \n", + "15 1.000000 0.053527 0.076989 0.221999 \n", + "\n", + " cumulative_score capture_rate cumulative_capture_rate gain \\\n", + "0 0.843446 0.036344 0.036344 260.788259 \n", + "1 0.824119 0.036129 0.072473 251.980795 \n", + "2 0.810905 0.033978 0.106452 240.532798 \n", + "3 0.793684 0.031613 0.138065 226.189099 \n", + "4 0.772164 0.031398 0.169462 211.686898 \n", + "5 0.687101 0.142796 0.312258 185.946339 \n", + "6 0.607076 0.111183 0.423441 122.429306 \n", + "7 0.527422 0.088817 0.512258 
59.551043 \n", + "8 0.434485 0.112258 0.624516 17.450421 \n", + "9 0.372842 0.085376 0.709892 -13.567284 \n", + "10 0.330455 0.070323 0.780215 -29.858249 \n", + "11 0.298841 0.061935 0.842151 -38.076342 \n", + "12 0.273630 0.056129 0.898280 -44.068568 \n", + "13 0.252921 0.036989 0.935269 -63.070697 \n", + "14 0.235099 0.041720 0.976989 -59.784808 \n", + "15 0.220010 0.023011 1.000000 -75.888783 \n", + "\n", + " cumulative_gain \n", + "0 260.788259 \n", + "1 256.343177 \n", + "2 251.139446 \n", + "3 245.095388 \n", + "4 238.375473 \n", + "5 212.198445 \n", + "6 182.284922 \n", + "7 149.065864 \n", + "8 107.307684 \n", + "9 77.460410 \n", + "10 55.953665 \n", + "11 40.286982 \n", + "12 28.204987 \n", + "13 16.788724 \n", + "14 8.006633 \n", + "15 0.000000 " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "ModelMetricsBinomial: gbm\n", + "** Reported on validation data. **\n", + "\n", + "MSE: 0.13326994104124376\n", + "RMSE: 0.3650615578792757\n", + "LogLoss: 0.4278285715046422\n", + "Mean Per-Class Error: 0.2856607030196092\n", + "AUC: 0.7776380047998697\n", + "pr_auc: 0.5486322626112021\n", + "Gini: 0.5552760095997393\n", + "\n", + "Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.27397344199105433: " + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01ErrorRate
006093.0975.00.1379(975.0/7068.0)
11863.01123.00.4345(863.0/1986.0)
2Total6956.02098.00.203(1838.0/9054.0)
\n", + "
" + ], + "text/plain": [ + " 0 1 Error Rate\n", + "0 0 6093.0 975.0 0.1379 (975.0/7068.0)\n", + "1 1 863.0 1123.0 0.4345 (863.0/1986.0)\n", + "2 Total 6956.0 2098.0 0.203 (1838.0/9054.0)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Maximum Metrics: Maximum metrics at their respective thresholds\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
metricthresholdvalueidx
0max f10.2739730.549951217.0
1max f20.1478350.634488307.0
2max f0point50.4366200.590736153.0
3max accuracy0.4569630.825271147.0
4max precision0.9470691.0000000.0
5max recall0.0451061.000000397.0
6max specificity0.9470691.0000000.0
7max absolute_mcc0.3472460.429999184.0
8max min_per_class_accuracy0.1815850.709970275.0
9max mean_per_class_accuracy0.2305180.714339240.0
\n", + "
" + ], + "text/plain": [ + " metric threshold value idx\n", + "0 max f1 0.273973 0.549951 217.0\n", + "1 max f2 0.147835 0.634488 307.0\n", + "2 max f0point5 0.436620 0.590736 153.0\n", + "3 max accuracy 0.456963 0.825271 147.0\n", + "4 max precision 0.947069 1.000000 0.0\n", + "5 max recall 0.045106 1.000000 397.0\n", + "6 max specificity 0.947069 1.000000 0.0\n", + "7 max absolute_mcc 0.347246 0.429999 184.0\n", + "8 max min_per_class_accuracy 0.181585 0.709970 275.0\n", + "9 max mean_per_class_accuracy 0.230518 0.714339 240.0" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Gains/Lift Table: Avg response rate: 21.94 %, avg score: 22.52 %\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " group cumulative_data_fraction lower_threshold lift \\\n", + "0 1 0.011155 0.815010 3.295055 \n", + "1 2 0.020543 0.795575 3.700764 \n", + "2 3 0.030042 0.783550 3.604721 \n", + "3 4 0.040093 0.743192 3.005876 \n", + "4 5 0.050033 0.697702 3.444512 \n", + "5 6 0.101281 0.553193 3.104777 \n", + "6 7 0.150320 0.383564 2.187046 \n", + "7 8 0.200022 0.296915 1.580423 \n", + "8 9 0.301303 0.203539 1.133514 \n", + "9 10 0.403468 0.176970 0.961068 \n", + "10 11 0.500221 0.152028 0.655734 \n", + "11 12 0.599956 0.133009 0.555349 \n", + "12 13 0.702231 0.115062 0.492323 \n", + "13 14 0.801966 0.102380 0.353404 \n", + "14 15 0.905346 0.091861 0.379909 \n", + "15 16 1.000000 0.034810 0.297899 \n", + "\n", + " cumulative_lift response_rate score cumulative_response_rate \\\n", + "0 3.295055 0.722772 0.839858 0.722772 \n", + "1 3.480460 0.811765 0.805631 0.763441 \n", + "2 3.519749 0.790698 0.792441 0.772059 \n", + "3 3.390927 0.659341 0.761335 0.743802 \n", + "4 3.401573 0.755556 0.723091 0.746137 \n", + "5 3.251394 0.681034 0.614736 0.713195 \n", + "6 2.904171 0.479730 0.466067 0.637032 \n", + "7 2.575244 0.346667 0.327817 0.564881 \n", + "8 2.090616 0.248637 0.250648 0.458578 \n", + "9 1.804595 0.210811 0.187190 0.395839 \n", + "10 1.582382 0.143836 0.163566 0.347096 \n", + "11 1.411651 0.121816 0.141651 0.309647 \n", + "12 1.277757 0.107991 0.123549 0.280277 \n", + "13 1.162802 0.077519 0.107834 0.255061 \n", + "14 1.073405 0.083333 0.097585 0.235452 \n", + "15 1.000000 0.065344 0.076884 0.219351 \n", + "\n", + " cumulative_score capture_rate cumulative_capture_rate gain \\\n", + "0 0.839858 0.036757 0.036757 229.505549 \n", + "1 0.824217 0.034743 0.071501 270.076417 \n", + "2 0.814170 0.034240 0.105740 260.472142 \n", + "3 0.800925 0.030211 0.135952 200.587630 \n", + "4 0.785461 0.034240 0.170191 244.451158 \n", + "5 0.699075 0.159114 0.329305 210.477654 \n", + "6 0.623061 0.107251 0.436556 118.704581 \n", + "7 0.549698 0.078550 0.515106 
58.042296 \n", + "8 0.449174 0.114804 0.629909 13.351366 \n", + "9 0.382836 0.098187 0.728097 -3.893198 \n", + "10 0.340424 0.063444 0.791541 -34.426603 \n", + "11 0.307381 0.055388 0.846928 -44.465076 \n", + "12 0.280607 0.050352 0.897281 -50.767685 \n", + "13 0.259121 0.035247 0.932528 -64.659594 \n", + "14 0.240675 0.039275 0.971803 -62.009063 \n", + "15 0.225172 0.028197 1.000000 -70.210141 \n", + "\n", + " cumulative_gain \n", + "0 229.505549 \n", + "1 248.045999 \n", + "2 251.974853 \n", + "3 239.092657 \n", + "4 240.157260 \n", + "5 225.139444 \n", + "6 190.417123 \n", + "7 157.524427 \n", + "8 109.061561 \n", + "9 80.459549 \n", + "10 58.238248 \n", + "11 41.165144 \n", + "12 27.775745 \n", + "13 16.280206 \n", + "14 7.340501 \n", + "15 0.000000 " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "Scoring History: " + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " timestamp duration number_of_trees training_rmse \\\n", + "0 2020-05-28 14:33:23 43.415 sec 0.0 0.415591 \n", + "1 2020-05-28 14:33:23 43.443 sec 1.0 0.407822 \n", + "2 2020-05-28 14:33:23 43.467 sec 2.0 0.401483 \n", + "3 2020-05-28 14:33:23 43.489 sec 3.0 0.396471 \n", + "4 2020-05-28 14:33:23 43.515 sec 4.0 0.392442 \n", + "5 2020-05-28 14:33:23 43.535 sec 5.0 0.389141 \n", + "6 2020-05-28 14:33:23 43.570 sec 6.0 0.386399 \n", + "7 2020-05-28 14:33:23 43.592 sec 7.0 0.384191 \n", + "8 2020-05-28 14:33:23 43.614 sec 8.0 0.382341 \n", + "9 2020-05-28 14:33:23 43.639 sec 9.0 0.380701 \n", + "10 2020-05-28 14:33:23 43.668 sec 10.0 0.379202 \n", + "11 2020-05-28 14:33:23 43.697 sec 11.0 0.378052 \n", + "12 2020-05-28 14:33:23 43.729 sec 12.0 0.377043 \n", + "13 2020-05-28 14:33:23 43.762 sec 13.0 0.376137 \n", + "14 2020-05-28 14:33:23 43.796 sec 14.0 0.375357 \n", + "15 2020-05-28 14:33:23 43.848 sec 15.0 0.374699 \n", + "16 2020-05-28 14:33:23 43.903 sec 16.0 0.374098 \n", + "17 2020-05-28 14:33:23 43.949 sec 17.0 0.373534 \n", + "18 2020-05-28 14:33:23 44.004 sec 18.0 0.373121 \n", + "19 2020-05-28 14:33:23 44.054 sec 19.0 0.372722 \n", + "\n", + " training_logloss training_auc training_pr_auc training_lift \\\n", + "0 0.529427 0.500000 0.000000 1.000000 \n", + "1 0.511864 0.716131 0.534717 3.474912 \n", + "2 0.498746 0.744646 0.532172 3.529706 \n", + "3 0.489013 0.748189 0.535621 3.529706 \n", + "4 0.481430 0.750121 0.535358 3.529706 \n", + "5 0.475375 0.750058 0.535198 3.529706 \n", + "6 0.470332 0.756986 0.535024 3.529706 \n", + "7 0.466316 0.757005 0.535418 3.529706 \n", + "8 0.462760 0.761106 0.540176 3.514359 \n", + "9 0.459589 0.762515 0.540880 3.518279 \n", + "10 0.456705 0.762522 0.541424 3.518279 \n", + "11 0.454467 0.761648 0.541505 3.521332 \n", + "12 0.452420 0.762767 0.541658 3.521332 \n", + "13 0.450517 0.764795 0.543264 3.525899 \n", + "14 0.448963 0.765145 0.543113 3.525899 \n", + "15 0.447543 0.766118 0.544037 
3.528417 \n", + "16 0.446341 0.766529 0.543896 3.560713 \n", + "17 0.445115 0.766312 0.544208 3.568370 \n", + "18 0.444171 0.766785 0.544720 3.568370 \n", + "19 0.443360 0.767145 0.545059 3.568370 \n", + "\n", + " training_classification_error validation_rmse validation_logloss \\\n", + "0 0.778001 0.413815 0.526105 \n", + "1 0.236370 0.405538 0.507496 \n", + "2 0.228731 0.398808 0.493698 \n", + "3 0.228636 0.393394 0.483273 \n", + "4 0.210780 0.389030 0.475135 \n", + "5 0.245059 0.385453 0.468630 \n", + "6 0.243961 0.382447 0.463157 \n", + "7 0.243961 0.380045 0.458834 \n", + "8 0.247446 0.378063 0.455049 \n", + "9 0.235654 0.376184 0.451464 \n", + "10 0.235606 0.374583 0.448380 \n", + "11 0.231023 0.373354 0.445973 \n", + "12 0.229972 0.372199 0.443670 \n", + "13 0.234317 0.371369 0.441932 \n", + "14 0.235654 0.370549 0.440335 \n", + "15 0.233219 0.369999 0.439161 \n", + "16 0.229161 0.369390 0.437926 \n", + "17 0.231452 0.368810 0.436669 \n", + "18 0.229352 0.368496 0.435909 \n", + "19 0.226439 0.368047 0.435006 \n", + "\n", + " validation_auc validation_pr_auc validation_lift \\\n", + "0 0.500000 0.000000 1.000000 \n", + "1 0.726731 0.537125 3.444264 \n", + "2 0.752909 0.534588 3.422307 \n", + "3 0.756448 0.535692 3.422307 \n", + "4 0.758511 0.536095 3.422307 \n", + "5 0.758505 0.535659 3.422307 \n", + "6 0.764722 0.536039 3.422307 \n", + "7 0.764634 0.536411 3.422307 \n", + "8 0.770340 0.542043 3.457524 \n", + "9 0.772358 0.543522 3.457524 \n", + "10 0.772982 0.543893 3.457524 \n", + "11 0.772925 0.544553 3.460882 \n", + "12 0.773412 0.543195 3.460882 \n", + "13 0.774161 0.543632 3.448038 \n", + "14 0.774176 0.543202 3.448038 \n", + "15 0.774592 0.543709 3.448038 \n", + "16 0.775021 0.544851 3.424855 \n", + "17 0.774927 0.545957 3.442929 \n", + "18 0.775256 0.545586 3.442929 \n", + "19 0.775474 0.545922 3.442929 \n", + "\n", + " validation_classification_error \n", + "0 0.780649 \n", + "1 0.187652 \n", + "2 0.232825 \n", + "3 0.214491 \n", + "4 0.217915 \n", + 
"5 0.214270 \n", + "6 0.229843 \n", + "7 0.220013 \n", + "8 0.204330 \n", + "9 0.223548 \n", + "10 0.226309 \n", + "11 0.228960 \n", + "12 0.224542 \n", + "13 0.227413 \n", + "14 0.228076 \n", + "15 0.228297 \n", + "16 0.226751 \n", + "17 0.225425 \n", + "18 0.226530 \n", + "19 0.224652 " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "See the whole table with table.as_data_frame()\n", + "\n", + "Variable Importances: " + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " variable relative_importance scaled_importance percentage\n", + "0 PAY_0 2794.444824 1.000000 0.693347\n", + "1 PAY_2 307.237366 0.109946 0.076231\n", + "2 PAY_3 215.152893 0.076993 0.053383\n", + "3 PAY_4 155.434448 0.055623 0.038566\n", + "4 PAY_AMT1 127.986313 0.045800 0.031755\n", + "5 PAY_5 127.538628 0.045640 0.031644\n", + "6 PAY_6 102.351601 0.036627 0.025395\n", + "7 LIMIT_BAL 82.432350 0.029499 0.020453\n", + "8 PAY_AMT2 58.934135 0.021090 0.014623\n", + "9 PAY_AMT4 58.858047 0.021063 0.014604" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# load saved best model from lecture 1 \n", + "best_mgbm = h2o.load_model('best_mgbm')\n", + "\n", + "# display model details\n", + "best_mgbm" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Select a Probability Cutoff by Maximizing Youden's J Statistic" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Bind model predictions to test data for further calculations" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "gbm prediction progress: |████████████████████████████████████████████████| 100%\n" + ] + } + ], + "source": [ + "# cbind predictions to training frame\n", + "# give them a nice name\n", + "yhat_name = 'p_DEFAULT_NEXT_MONTH'\n", + "preds1 = valid['ID'].cbind(best_mgbm.predict(valid).drop(['predict', 'p0']))\n", + "preds1.columns = ['ID', yhat_name]\n", + "valid_yhat = valid.cbind(preds1[yhat_name]).as_data_frame()\n", + "valid_yhat.reset_index(drop=True, inplace=True) # necessary for later searches/joins" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select best cutoff based on Youden's J\n", + "Maximizing Youden's J 
statistic corresponds to selecting the cutoff where the ROC curve is farthest from the baseline (the chance diagonal). This is appropriate for classifiers that were trained to maximize AUC. However, cutoff selection has a strong impact on group fairness measures. Other options for the cutoff are also presented below." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.22\n" + ] + } + ], + "source": [ + "j_frame = evaluate.get_youdens_j(valid_yhat, y_name, yhat_name)\n", + "best_cut = j_frame.loc[j_frame['J'].idxmax(), 'cutoff'] # find cutoff w/ max Youden's J\n", + "### !!! UNCOMMENT LINES BELOW TO REMEDIATE MINOR FAIRNESS ISSUES !!! ###\n", + "#best_cut = 0.31 # lowest cutoff that passes discrimination tests\n", + "#best_cut = 0.456963 # max accuracy\n", + "#best_cut = 0.347246 # max MCC\n", + "print('%.2f' % best_cut)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Plot ROC Curve\n", + "A receiver operating characteristic (ROC) curve is a typical way to visualize the trade-off between true positive rate (TPR) and false negative rate (FNR) for a binomial predictive model. " + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Plot AUROC w/ best cutoff\n", + "title_ = 'AUROC Curve: Cutoff = ' + str(best_cut)\n", + "ax = j_frame.plot(x='FNR', y='TPR', kind='scatter', title=title_, xlim=[0,1])\n", + "_ = ax.axvline(best_cut, color='r')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. 
Report Raw Confusion Matrices\n", + "\n", + "The basic DIA procedure in this notebook is based on measurements found commonly in confusion matrices, so confusion matrices are calculated as a precursor to DIA and to provide a basic summary of the GBM's behavior in general and across men and women." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Overall confusion matrix" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " actual: 1 actual: 0\n", + "predicted: 1 1179 1192\n", + "predicted: 0 824 5745" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "evaluate.get_confusion_matrix(valid_yhat, y_name, yhat_name, cutoff=best_cut)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The general confusion matrix shows that the GBM is more accurate than not: correct predictions (1,179 true positives and 5,745 true negatives) clearly outnumber errors (1,192 false positives and 824 false negatives). The GBM still makes a substantial number of type II errors, or false negative predictions, although at this cutoff false positives are more numerous. False negatives can be a disparity issue because, for complex reasons, many credit scoring and other models tend to over-estimate the likelihood that non-reference groups - typically people other than white males - will default. This is both a sociological discrimination problem and a financial problem if an underprivileged group is not receiving the credit they deserve, in favor of undeserving white males. Deserving people miss out on potentially life-changing credit and lenders incur large write-off costs." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Report confusion matrices by `SEX`\n", + "\n", + "The only values for `SEX` in the dataset are `female` and `male`. " + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['female', 'male']" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sex_levels = list(valid_yhat['SEX'].unique())\n", + "sex_levels" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Confusion matrix for `SEX = male`" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " actual: 1 actual: 0\n", + "predicted: 1 511 482\n", + "predicted: 0 360 2124" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "male_cm = evaluate.get_confusion_matrix(valid_yhat, 'DEFAULT_NEXT_MONTH', 'p_DEFAULT_NEXT_MONTH', by='SEX', level='male', cutoff=best_cut)\n", + "male_cm" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Confusion matrix for `SEX = female`" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " actual: 1 actual: 0\n", + "predicted: 1 668 710\n", + "predicted: 0 464 3621" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "female_cm = evaluate.get_confusion_matrix(valid_yhat, 'DEFAULT_NEXT_MONTH', 'p_DEFAULT_NEXT_MONTH', by='SEX', level='female', cutoff=best_cut)\n", + "female_cm" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Both confusion matrices reflect the global confusion matrix: more accurate than not, with a larger number of false positive predictions (type I errors) than false negative predictions (type II errors)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Disparate Accuracies and Errors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To perform the following basic DIA, many different values reflecting different prediction behavior are calculated from the confusion matrices. These metrics essentially help us understand the GBM's overall performance and how it behaves when predicting:\n", + "\n", + "* Default correctly\n", + "* Non-default correctly\n", + "* Default incorrectly (type I errors)\n", + "* Non-default incorrectly (type II errors)\n", + "\n", + "In a real-life lending scenario, type I errors essentially amount to false accusations of financial impropriety and type II errors result in awarding loans to undeserving customers. Both types of errors can be costly to the lender too. Type I errors likely result in lost interest and fees. Type II errors often result in write-offs." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Calculate and report group fairness metrics" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Prevalence Accuracy True Positive Rate Precision Specificity \\\n", + "female 0.207212 0.785100 0.590106 0.484761 0.836066 \n", + "male 0.250503 0.757837 0.586682 0.514602 0.815042 \n", + "\n", + " Negative Predicted Value False Positive Rate False Discovery Rate \\\n", + "female 0.886414 0.163934 0.515239 \n", + "male 0.855072 0.184958 0.485398 \n", + "\n", + " False Negative Rate False Omissions Rate \n", + "female 0.409894 0.113586 \n", + "male 0.413318 0.144928 " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cm_dict = {'male': male_cm, 'female': female_cm} # group fairness metrics are based on confusion matrices\n", + "metrics_frame, disp_frame = debug.get_metrics_ratios(cm_dict, 'male') # calculate metrics and ratios\n", + "metrics_frame # display results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From eyeballing the raw metrics it appears that the model is treating men and women roughly similarly as groups. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Plot false omissions rate by `SEX`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because the confusion matrices indicated there might be a problem with non-default predictions, false omissions rate will be examined closely. False omissions measures how many customers the model predicted *incorrectly* would not default, out of the customers in the group the model *predicted* would not default. 
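As a minimal worked sketch, the false omissions rate can be recomputed by hand from the cell counts of the two per-group confusion matrices reported above. The dictionary layout and the helper name `false_omissions_rate` are illustrative assumptions and are not part of `rmltk`; the results should agree with the False Omissions Rate column of the metrics table above.

```python
# Cell counts copied from the per-group confusion matrices reported above.
# The dict layout and helper name are hypothetical, not part of rmltk.
cells = {
    'male': {'tp': 511, 'fp': 482, 'fn': 360, 'tn': 2124},
    'female': {'tp': 668, 'fp': 710, 'fn': 464, 'tn': 3621},
}


def false_omissions_rate(cm):
    # share of predicted non-defaults that actually defaulted: FN / (FN + TN)
    return cm['fn'] / (cm['fn'] + cm['tn'])


rates = {level: false_omissions_rate(cm) for level, cm in cells.items()}
for level, rate in rates.items():
    print('%s false omissions rate: %.6f' % (level, rate))

# ratio of the female rate to the male (reference level) rate
for_ratio = rates['female'] / rates['male']
print('female/male ratio: %.6f' % for_ratio)
```

Dividing the female rate by the male rate gives the kind of ratio that can then be compared against a parity threshold such as 0.8.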
" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "_ = metrics_frame['False Omissions Rate'].plot(kind='bar', color='b', title='False Omissions Rate')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Calculate and report disparity\n", + "To calculate disparity we compare the metrics to a user-defined reference level and to user-defined thresholds. In this case, we take the class of people who seem most priviledged as the reference level, i.e. `SEX = male`. (Usually the reference level would be `race = white` or `sex = male`.) According to the four-fifths rule (https://en.wikipedia.org/wiki/Disparate_impact#The_80%_rule) thresholds are set such that metrics 20% lower or higher than the reference level metric will be flagged as disparate. **Technically, the four-fifths rule only applies to the adverse impact ratio, discussed further below, but it will be applied to all other displayed metrics here as a rule of thumb.**" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
 | Prevalence Ratio | Accuracy Ratio | True Positive Rate Ratio | Precision Ratio | Specificity Ratio | Negative Predicted Value Ratio | False Positive Rate Ratio | False Discovery Rate Ratio | False Negative Rate Ratio | False Omissions Rate Ratio\n",
+ "female | 0.827183 | 1.03597 | 1.00584 | 0.94201 | 1.02579 | 1.03665 | 0.886334 | 1.06148 | 0.991716 | 0.783745\n",
+ "male | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "parity_threshold_low = 0.8 # user-defined low threshold value\n", + "parity_threshold_hi = 1.25 # user-defined high threshold value\n", + "\n", + "\n", + "# small utility function to format pandas table output\n", + "def disparate_red(val):\n", + " \n", + "    color = 'blue' if (parity_threshold_low < val < parity_threshold_hi) else 'red'\n", + "    return 'color: %s' % color \n", + "\n", + "# display results\n", + "disp_frame.style.applymap(disparate_red)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For the selected thresholds, the GBM appears to have only one metric ratio that is out of range on the low side: false omissions rate (0.78). The flagged false omissions rate disparity indicates males may be receiving too many loans they cannot pay back, potentially preventing females from receiving these loans." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Plot false omissions rate disparity" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "ax = disp_frame['False Omissions Rate Ratio'].plot(kind='bar', color='b', title='False Omissions Rate Ratio')\n", + "_ = ax.axhline(parity_threshold_low, color='r', linestyle='--')\n", + "_ = ax.axhline(parity_threshold_hi, color='r', linestyle='--')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The model is suffering from a minor disparity problem due to its propensity to make false negative predictions for males. 
To address such discrimination, users could tune the GBM variables, cutoff, or regularization; try new methods for reweighing data prior to model training; try new modeling methods specifically designed for fairness; or post-process the decisions. Before attempting remediation here, more traditional fair lending measures will be assessed, and local, or individual, fairness will also be investigated. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Traditional Fair Lending Measures\n", + "\n", + "Along with adverse impact ratio (AIR), several measures have long-standing legal precedent in fair lending, including marginal effect and standardized mean difference. These measures are calculated and discussed here." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Calculate adverse impact ratio (AIR)\n", + "AIR is perhaps the most well-known discrimination measure. It was first delineated by the U.S. Equal Employment Opportunity Commission (EEOC) and is associated with the convenient 4/5ths, or 0.8, cutoff threshold. AIR values below 0.8 can be considered evidence of illegal discrimination in many lending or employment scenarios in the U.S." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Male proportion accepted: 0.714\n", + "Female proportion accepted: 0.748\n", + "Adverse impact ratio: 1.047\n" + ] + } + ], + "source": [ + "print('Adverse impact ratio: %.3f' % debug.air(cm_dict, 'male', 'female'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Typical desirable ranges of AIR are above the 0.8 marker set by the 4/5ths rule. Here we see an almost ideal result where the protected and reference groups have very similar acceptance rates and AIR is near 1. 
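The AIR reported above can be reproduced by hand from the per-group confusion matrices. The following is only a sketch under stated assumptions: the dictionary layout and the helper names `acceptance_rate` and `adverse_impact_ratio` are hypothetical and are not the `rmltk` implementation of `debug.air`. It assumes a customer is accepted whenever the model predicts non-default.

```python
# Cell counts copied from the per-group confusion matrices reported earlier.
# Names below are hypothetical; this is not the rmltk debug.air implementation.
cells = {
    'male': {'tp': 511, 'fp': 482, 'fn': 360, 'tn': 2124},
    'female': {'tp': 668, 'fp': 710, 'fn': 464, 'tn': 3621},
}


def acceptance_rate(cm):
    # a customer is accepted when the model predicts non-default (FN + TN)
    accepted = cm['fn'] + cm['tn']
    return accepted / sum(cm.values())


def adverse_impact_ratio(cells, reference, protected):
    # protected-group acceptance rate divided by reference-group acceptance rate
    return acceptance_rate(cells[protected]) / acceptance_rate(cells[reference])


air = adverse_impact_ratio(cells, 'male', 'female')
print('Adverse impact ratio: %.3f' % air)
print('Passes four-fifths rule: %s' % (air >= 0.8))
```

Under these assumptions the acceptance rates come out near 0.714 for men and 0.748 for women, in line with the proportions printed above.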
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Calculate marginal effect\n", + "Marginal effect describes the difference between the percent of the reference group awarded a loan and the percent of the protected group awarded a loan under a model. " + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Male accepted: 71.44%\n", + "Female accepted: 74.78%\n", + "Marginal effect: -3.33%\n" + ] + } + ], + "source": [ + "print('Marginal effect: %.2f%%' % debug.marginal_effect(cm_dict, 'male', 'female'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "About 77% of men are awarded a loan by the model. About 79.6% of women are awarded a loan. This results in a marginal effect of -3.33%. Given that the marginal effect is negative, indicating that a higher percentage of individuals in the protected group were awarded a loan than in the reference group, this value would likely not indicate a discrimination problem in most scenarios. The magnitude of the marginal effect is also relatively small, another sign that discrimination concerning SEX is low under the model. Generally, larger marginal effects may be tolerated in newer credit products, whereas smaller marginal effects are expected in established credit products." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Calculate standardized mean difference\n", + "The standardized mean difference (SMD), i.e. Cohen's D, is the mean value of the prediction for the protected group minus the mean prediction for the reference group, all divided by the standard deviation of the prediction. Like AIR, SMD has some prescribed thresholds: 0.2, 0.5, and 0.8 for small, medium, and large differences, respectively. The standardized mean difference can also be used on continuous values like credit limits or loan amounts." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Male mean yhat: 0.23\n", + "Female mean yhat: 0.21\n", + "P_Default_Next_Month std. dev.: 0.18\n", + "Standardized Mean Difference: -0.08\n" + ] + } + ], + "source": [ + "print('Standardized Mean Difference: %.2f' % debug.smd(valid_yhat, 'SEX', yhat_name, 'male', 'female'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this model, in the validation set, men receive a higher average probability of default than do women. This difference is evident even after standardizing with the standard deviation of the predictions. However, the difference is quite small, below the 0.2 threshold for a small difference. SMD also points to low disparity between men and women under this model." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Investigate Individual Disparity " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Similar people can be treated differently by the model, so even if the model is mostly fair for most kinds of people, there could still be people the model treated unfairly. This could occur for multiple reasons, including the functional form of the learned model or because different variables are combined by the model to represent strong signals. If a variable is important in a dataset, model, or problem domain, it's likely that a nonlinear model will find combinations of other variables to act as proxies for the problematic variable -- potentially even different combinations for different rows of data! So by testing only for group fairness, you may miss instances of individual discrimination."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Augment predictions with decisions and logloss residuals for women with false positive predictions\n", + "In this notebook, residuals for false positive predictions for women will be examined in an attempt to locate any individual instances of model discrimination. These are women who the model said would default but who did not default, so they may have experienced some discrimination under the model." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "valid_yhat_female = valid_yhat[valid_yhat['SEX'] == 'female'].copy(deep=True)\n", + "\n", + "\n", + "valid_yhat_female['d_DEFAULT_NEXT_MONTH'] = 0\n", + "valid_yhat_female.loc[valid_yhat_female[yhat_name] > best_cut, 'd_DEFAULT_NEXT_MONTH'] = 1\n", + "\n", + "valid_yhat_female['r_DEFAULT_NEXT_MONTH'] = -valid_yhat_female[y_name]*np.log(valid_yhat_female[yhat_name]) -\\\n", + " (1 - valid_yhat_female[y_name])*np.log(1 - valid_yhat_female[yhat_name]) \n", + " \n", + "valid_yhat_female_fp = valid_yhat_female[(valid_yhat_female[y_name] == 0) &\\\n", + " (valid_yhat_female['d_DEFAULT_NEXT_MONTH'] == 1)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Plot logloss residuals\n", + "Plotting residuals is a common way to visualize the errors of a model in just two dimensions."
+ ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# initialize figure\n", + "fig, ax_ = plt.subplots(figsize=(8, 8)) \n", + "\n", + "ax_.plot(valid_yhat_female_fp[yhat_name],\n", + " valid_yhat_female_fp['r_DEFAULT_NEXT_MONTH'],\n", + " marker='o', linestyle='', alpha=0.3)\n", + "\n", + "# annotate plot\n", + "_ = plt.xlabel(yhat_name)\n", + "_ = plt.ylabel('r_DEFAULT_NEXT_MONTH')\n", + "_ = ax_.legend(loc=4)\n", + "_ = plt.title('Logloss Residuals')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Examine low logloss residual individuals" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Low residual individuals near the probability cutoff are some of the most likely to be treated differently from individuals who received the credit product. Those individuals are displayed below." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML rendering of the table omitted: 150 rows × 28 columns (ID, LIMIT_BAL, SEX, EDUCATION, MARRIAGE, AGE, PAY_0, PAY_2 through PAY_6, BILL_AMT1 through BILL_AMT6, PAY_AMT1 through PAY_AMT6, DEFAULT_NEXT_MONTH, p_DEFAULT_NEXT_MONTH, d_DEFAULT_NEXT_MONTH, r_DEFAULT_NEXT_MONTH); the same rows appear in the text/plain output that follows]
" + ], + "text/plain": [ + " ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 \\\n", + "5735 19327 230000 female university married 40 1 -2 \n", + "7870 26436 160000 female university single 44 1 -2 \n", + "3093 10387 20000 female graduate school single 29 -1 -1 \n", + "2821 9466 50000 female university married 34 0 0 \n", + "151 501 30000 female university married 38 0 0 \n", + "6536 21992 150000 female graduate school single 29 -1 2 \n", + "8244 27652 50000 female graduate school single 23 1 -1 \n", + "625 2058 50000 female high school married 51 0 0 \n", + "175 586 20000 female graduate school single 25 0 0 \n", + "7768 26079 210000 female university single 36 1 -2 \n", + "5682 19140 260000 female university married 37 1 1 \n", + "4293 14369 450000 female graduate school single 54 1 -2 \n", + "8663 29086 220000 female graduate school married 56 1 -2 \n", + "4563 15316 300000 female graduate school single 35 1 -2 \n", + "5792 19497 150000 female graduate school married 49 1 -2 \n", + "5874 19793 320000 female graduate school married 45 1 -2 \n", + "2716 9116 30000 female university single 22 0 0 \n", + "3194 10715 30000 female university married 38 0 0 \n", + "1460 4905 30000 female university single 22 0 0 \n", + "4075 13647 240000 female other single 36 -2 -1 \n", + "3036 10148 20000 female high school single 35 0 0 \n", + "8678 29125 20000 female high school single 56 0 0 \n", + "1805 6061 170000 female graduate school single 36 -1 -1 \n", + "6543 22021 30000 female university single 25 0 0 \n", + "7790 26150 250000 female graduate school single 39 -1 -1 \n", + "4080 13673 180000 female university single 27 0 0 \n", + "3842 12864 220000 female graduate school single 36 0 0 \n", + "697 2334 210000 female graduate school married 38 -1 -1 \n", + "3316 11121 20000 female university single 23 -1 -1 \n", + "8469 28392 50000 female high school married 42 0 0 \n", + "... ... ... ... ... ... ... ... ... 
\n", + "7802 26196 180000 female graduate school single 27 1 -1 \n", + "7904 26540 200000 female high school married 51 1 -1 \n", + "7767 26076 430000 female graduate school married 38 1 -1 \n", + "1900 6369 40000 female graduate school single 26 0 0 \n", + "1506 5050 50000 female high school single 61 0 0 \n", + "4801 16151 200000 female university married 48 0 0 \n", + "1789 6011 450000 female graduate school single 35 1 -2 \n", + "7500 25200 30000 female university single 24 1 -1 \n", + "4540 15245 80000 female university married 37 -1 0 \n", + "124 413 230000 female graduate school married 32 1 -2 \n", + "4561 15312 240000 female high school single 37 1 -2 \n", + "3494 11711 140000 female university single 38 0 0 \n", + "5621 18948 290000 female university married 32 1 2 \n", + "2752 9245 90000 female university married 32 1 -1 \n", + "2118 7108 100000 female graduate school single 28 1 -1 \n", + "4269 14299 30000 female university married 37 0 0 \n", + "434 1390 20000 female university single 22 0 0 \n", + "7922 26576 50000 female high school single 51 -1 -1 \n", + "13 54 180000 female graduate school single 25 1 2 \n", + "263 841 80000 female university single 22 1 2 \n", + "6989 23529 450000 female graduate school single 36 1 -2 \n", + "3508 11755 160000 female university single 27 1 -2 \n", + "1510 5055 200000 female university single 31 1 -2 \n", + "5247 17669 210000 female graduate school married 41 1 -2 \n", + "4457 14901 150000 female graduate school single 37 -1 2 \n", + "5708 19245 100000 female high school married 42 -1 2 \n", + "1306 4347 30000 female high school married 44 -1 -1 \n", + "1919 6420 30000 female university single 27 0 0 \n", + "4315 14441 50000 female graduate school single 33 1 -2 \n", + "6739 22665 450000 female graduate school married 33 1 -1 \n", + "\n", + " PAY_3 PAY_4 PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 \\\n", + "5735 -1 -1 -1 -1 0 0 3504 4272 \n", + "7870 -1 -1 -1 -1 -4 -4 3454 0 \n", + "3093 -1 -1 -1 -1 342 792 
677 -1 \n", + "2821 2 0 0 0 11367 10982 10243 10826 \n", + "151 0 0 2 2 20344 21705 22537 24161 \n", + "6536 -1 -1 0 0 2599 1530 390 1366 \n", + "8244 -1 -2 -2 -2 -697 11361 0 0 \n", + "625 2 0 0 0 44766 48047 46640 40551 \n", + "175 0 0 0 2 14603 15661 16394 16723 \n", + "7768 -2 -2 -1 0 0 0 212 3066 \n", + "5682 -2 -2 -1 0 5917 -15910 -15910 -15910 \n", + "4293 -2 -1 -1 -1 -237 -2400 -2400 3990 \n", + "8663 -2 -2 -1 -1 0 0 0 0 \n", + "4563 -2 -2 -1 -1 0 0 0 0 \n", + "5792 -2 -1 -1 -1 0 0 0 12200 \n", + "5874 -2 -1 -1 -1 0 0 0 370 \n", + "2716 -1 -1 2 2 17358 -36 23114 24727 \n", + "3194 0 0 2 0 7979 9046 10279 10610 \n", + "1460 0 0 2 0 25536 26635 27383 29061 \n", + "4075 2 0 0 -2 -235 1765 871 871 \n", + "3036 0 2 0 0 10704 11352 13297 12418 \n", + "8678 0 2 0 0 11471 12188 15074 14426 \n", + "1805 2 -1 -1 -1 248 832 416 2304 \n", + "6543 2 0 0 0 5293 8107 7738 10179 \n", + "7790 -1 2 2 2 12368 1742 40292 39600 \n", + "4080 2 0 0 0 5319 6531 5446 4853 \n", + "3842 2 0 0 0 13043 4111 1522 5198 \n", + "697 2 2 -1 -1 2360 4707 4465 0 \n", + "3316 -1 -1 -1 -1 390 390 390 390 \n", + "8469 2 0 0 0 28693 31854 31065 31976 \n", + "... ... ... ... ... ... ... ... ... 
\n", + "7802 -1 -1 -1 -2 0 1530 0 1610 \n", + "7904 -1 -2 -2 -2 0 1221 0 0 \n", + "7767 -1 -2 -2 -1 -469 215 -101 -1217 \n", + "1900 0 2 2 2 20936 20647 21561 18438 \n", + "1506 0 2 2 0 36205 38609 42996 43992 \n", + "4801 -1 2 2 2 26465 20435 8475 8162 \n", + "1789 -1 0 0 -2 0 0 3581 3654 \n", + "7500 -1 -2 -2 -1 -5 565 -25 -25 \n", + "4540 0 2 2 2 28652 29869 33071 33531 \n", + "124 -1 -1 -1 -1 0 0 2809 4595 \n", + "4561 -1 -1 -1 -1 -3 -3 2486 2500 \n", + "3494 2 0 0 2 15438 12140 11225 6688 \n", + "5621 0 0 0 0 146853 139756 136012 136466 \n", + "2752 -1 -1 -1 -1 0 323 0 2520 \n", + "2118 -1 -2 -2 -2 290 930 977 0 \n", + "4269 0 2 2 2 26168 27442 30165 29383 \n", + "434 2 0 0 2 11192 15677 15133 15582 \n", + "7922 2 -1 -1 -1 390 780 390 390 \n", + "13 0 0 0 0 41402 41742 42758 43510 \n", + "263 0 0 0 0 79318 79459 79788 29069 \n", + "6989 -1 -1 -1 -2 5909 964 613 1797 \n", + "3508 -1 0 0 -2 0 0 6204 6204 \n", + "1510 -1 -1 -1 -1 0 0 6372 1957 \n", + "5247 -1 -1 -2 -1 -28 -28 3330 0 \n", + "4457 2 -1 -1 -2 180 1582 1402 3244 \n", + "5708 2 -2 -2 -2 7689 2052 1281 3770 \n", + "1306 2 0 0 0 1072 5122 4440 4050 \n", + "1919 2 0 0 0 28392 31439 30394 29994 \n", + "4315 -1 0 0 -1 0 0 12493 28365 \n", + "6739 2 0 0 -1 621 11368 2370 2823 \n", + "\n", + " BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 \\\n", + "5735 2977 1900 0 3504 4272 2977 1900 \n", + "7870 3312 0 0 3458 0 3312 0 \n", + "3093 213 856 792 677 0 214 856 \n", + "2821 11699 10146 3000 1000 1000 1000 2000 \n", + "151 25128 24576 2000 1500 2000 1500 0 \n", + "6536 780 0 0 390 1366 0 0 \n", + "8244 0 0 12058 0 0 0 0 \n", + "625 19398 0 4000 0 811 1000 0 \n", + "175 18056 17618 1600 1300 600 1600 0 \n", + "7768 13206 10583 0 212 3066 13206 212 \n", + "5682 24090 13977 0 0 0 40000 507 \n", + "4293 30050 9993 0 0 6390 30050 9993 \n", + "8663 5889 300 0 0 0 5889 300 \n", + "4563 150000 0 0 0 0 150000 0 \n", + "5792 16961 3000 0 0 12200 16961 3000 \n", + "5874 9301 0 0 0 370 9301 0 \n", + "2716 
24192 25905 0 23150 2000 0 2119 \n", + "3194 10339 8710 1200 1423 754 0 1000 \n", + "1460 28492 29850 1800 1500 2100 0 1800 \n", + "4075 -155 -155 2000 0 0 155 0 \n", + "3036 12581 12305 1517 2852 0 500 700 \n", + "8678 14526 15026 1214 3100 0 500 500 \n", + "1805 323 -1092 1416 0 2304 0 0 \n", + "6543 11902 13109 3000 0 3000 2000 1500 \n", + "7790 21304 1185 1742 39600 7 1185 0 \n", + "4080 3965 3310 2300 0 288 143 246 \n", + "3842 3974 -38 2190 1 3980 20 0 \n", + "697 9306 5855 4465 0 0 9306 5855 \n", + "3316 390 390 390 390 390 390 390 \n", + "8469 29300 31151 3644 0 1766 1036 2700 \n", + "... ... ... ... ... ... ... ... \n", + "7802 0 0 1530 0 1610 0 0 \n", + "7904 0 0 1221 0 0 0 0 \n", + "7767 -2333 2351 1000 0 0 0 5000 \n", + "1900 20619 17582 3000 4000 0 5000 0 \n", + "1506 43155 44436 3000 5064 2000 0 2000 \n", + "4801 10962 7350 1000 8475 0 3000 0 \n", + "1789 0 0 0 3581 73 0 0 \n", + "7500 4897 6685 570 25 0 4922 6788 \n", + "4540 34184 33503 2000 4000 1300 1500 0 \n", + "124 792 3404 0 2809 4606 792 3404 \n", + "4561 580 490 0 2489 2500 580 490 \n", + "3494 6908 6438 2620 0 310 400 0 \n", + "5621 136929 132179 1503 5400 5200 5000 4667 \n", + "2752 1651 0 323 0 2520 1651 0 \n", + "2118 0 0 1534 977 0 0 0 \n", + "4269 32739 33143 2000 3500 0 4000 1150 \n", + "434 16677 16259 5000 0 1000 1500 0 \n", + "7922 390 1320 780 0 390 390 1320 \n", + "13 44420 45319 1300 2010 1762 1762 1790 \n", + "263 26551 27109 2000 2201 2000 1000 1000 \n", + "6989 679 10643 968 615 1834 682 10697 \n", + "3508 0 0 0 6204 0 0 0 \n", + "1510 0 596 0 6372 1957 0 596 \n", + "5247 0 300 0 3358 0 0 300 \n", + "4457 -13 -13 1402 0 3244 0 0 \n", + "5708 2056 5628 0 1281 3786 2056 5634 \n", + "1306 3660 3660 4440 0 0 0 0 \n", + "1919 29834 15960 3500 0 0 0 0 \n", + "4315 6300 999 0 12493 16000 0 999 \n", + "6739 2243 4322 11420 0 2243 0 4322 \n", + "\n", + " PAY_AMT6 DEFAULT_NEXT_MONTH p_DEFAULT_NEXT_MONTH \\\n", + "5735 0 0 0.220037 \n", + "7870 0 0 0.220037 \n", + "3093 0 0 0.220296 
\n", + "2821 2000 0 0.220327 \n", + "151 1200 0 0.220592 \n", + "6536 431 0 0.220832 \n", + "8244 0 0 0.220909 \n", + "625 0 0 0.220939 \n", + "175 800 0 0.220969 \n", + "7768 0 0 0.221026 \n", + "5682 656 0 0.221026 \n", + "4293 0 0 0.221026 \n", + "8663 165 0 0.221026 \n", + "4563 267 0 0.221026 \n", + "5792 493 0 0.221026 \n", + "5874 0 0 0.221026 \n", + "2716 0 0 0.221036 \n", + "3194 3000 0 0.221815 \n", + "1460 1000 0 0.221815 \n", + "4075 0 0 0.222505 \n", + "3036 500 0 0.223772 \n", + "8678 0 0 0.223772 \n", + "1805 0 0 0.223836 \n", + "6543 1000 0 0.223878 \n", + "7790 54416 0 0.223967 \n", + "4080 33 0 0.224009 \n", + "3842 7762 0 0.224009 \n", + "697 9840 0 0.224440 \n", + "3316 390 0 0.225256 \n", + "8469 0 0 0.225287 \n", + "... ... ... ... \n", + "7802 0 0 0.256474 \n", + "7904 0 0 0.256474 \n", + "7767 12000 0 0.256474 \n", + "1900 8000 0 0.256996 \n", + "1506 2000 0 0.257666 \n", + "4801 7505 0 0.257957 \n", + "1789 0 0 0.258259 \n", + "7500 0 0 0.259063 \n", + "4540 1300 0 0.259240 \n", + "124 0 0 0.259361 \n", + "4561 177 0 0.259361 \n", + "3494 1435 0 0.260427 \n", + "5621 5000 0 0.260522 \n", + "2752 0 0 0.260620 \n", + "2118 0 0 0.261220 \n", + "4269 1000 0 0.261636 \n", + "434 700 0 0.262570 \n", + "7922 0 0 0.263146 \n", + "13 1622 0 0.263551 \n", + "263 2000 0 0.263843 \n", + "6989 30451 0 0.264505 \n", + "3508 0 0 0.264719 \n", + "1510 789 0 0.264719 \n", + "5247 0 0 0.264719 \n", + "4457 0 0 0.265787 \n", + "5708 6013 0 0.265848 \n", + "1306 0 0 0.266314 \n", + "1919 0 0 0.266314 \n", + "4315 0 0 0.267487 \n", + "6739 0 0 0.267769 \n", + "\n", + " d_DEFAULT_NEXT_MONTH r_DEFAULT_NEXT_MONTH \n", + "5735 1 0.248509 \n", + "7870 1 0.248509 \n", + "3093 1 0.248841 \n", + "2821 1 0.248881 \n", + "151 1 0.249220 \n", + "6536 1 0.249528 \n", + "8244 1 0.249628 \n", + "625 1 0.249665 \n", + "175 1 0.249705 \n", + "7768 1 0.249778 \n", + "5682 1 0.249778 \n", + "4293 1 0.249778 \n", + "8663 1 0.249778 \n", + "4563 1 0.249778 \n", + "5792 1 0.249778 
\n", + "5874 1 0.249778 \n", + "2716 1 0.249790 \n", + "3194 1 0.250791 \n", + "1460 1 0.250791 \n", + "4075 1 0.251678 \n", + "3036 1 0.253309 \n", + "8678 1 0.253309 \n", + "1805 1 0.253391 \n", + "6543 1 0.253445 \n", + "7790 1 0.253560 \n", + "4080 1 0.253615 \n", + "3842 1 0.253615 \n", + "697 1 0.254171 \n", + "3316 1 0.255223 \n", + "8469 1 0.255263 \n", + "... ... ... \n", + "7802 1 0.296352 \n", + "7904 1 0.296352 \n", + "7767 1 0.296352 \n", + "1900 1 0.297054 \n", + "1506 1 0.297956 \n", + "4801 1 0.298348 \n", + "1789 1 0.298755 \n", + "7500 1 0.299840 \n", + "4540 1 0.300079 \n", + "124 1 0.300241 \n", + "4561 1 0.300241 \n", + "3494 1 0.301682 \n", + "5621 1 0.301810 \n", + "2752 1 0.301943 \n", + "2118 1 0.302756 \n", + "4269 1 0.303319 \n", + "434 1 0.304585 \n", + "7922 1 0.305366 \n", + "13 1 0.305915 \n", + "263 1 0.306312 \n", + "6989 1 0.307211 \n", + "3508 1 0.307503 \n", + "1510 1 0.307503 \n", + "5247 1 0.307503 \n", + "4457 1 0.308956 \n", + "5708 1 0.309039 \n", + "1306 1 0.309674 \n", + "1919 1 0.309674 \n", + "4315 1 0.311274 \n", + "6739 1 0.311659 \n", + "\n", + "[150 rows x 28 columns]" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "valid_yhat_female_fp.sort_values(by='r_DEFAULT_NEXT_MONTH', ascending=True).head(n=150)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The low-residual false positives suggest that the cutoff selected by Youden's J is somewhat too conservative. Many women just above the cutoff have missed 0-2 payments and were only 1-2 months late on the few payments they did miss. This potential discrimination problem can be remediated by increasing the cutoff in cell 9." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Simple adversarial de-biasing approach\n", + "Create a dataset with the protected variable and the predictions of the model."
+ ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Parse progress: |█████████████████████████████████████████████████████████| 100%\n" + ] + } + ], + "source": [ + "adv_valid = h2o.H2OFrame(valid_yhat[['ID', 'SEX', yhat_name]])\n", + "adv_train, adv_valid = adv_valid.split_frame([0.7])\n", + "adv_train" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Train adversarial GBM\n", + "This adversarial GBM tries to predict whether a customer is a man or a woman just from the predictions of `best_mgbm`. The predictions of this adversarial model give a row-by-row measure of whether a prediction is encoding information about `SEX`." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "gbm Grid Build progress: |████████████████████████████████████████████████| 100%\n", + "Adversarial GBM AUC: 0.51\n" + ] + } + ], + "source": [ + "adv_gbm = model.gbm_grid(yhat_name, 'SEX', adv_train, adv_valid, SEED)\n", + "print('Adversarial GBM AUC: %.2f' % adv_gbm.auc(valid=True))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The adversarial GBM's validation AUC of 0.51 is close to the 0.5 expected from random guessing, so it cannot meaningfully predict the `SEX` of the customer from the predictions of `best_mgbm`. This is a good sign that `best_mgbm` is not too discriminatory towards women." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Examine a few predictions from the adversarial GBM" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Parse progress: |█████████████████████████████████████████████████████████| 100%\n", + "gbm prediction progress: |████████████████████████████████████████████████| 100%\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "
[HTML table output: same rows and columns as the text/plain output below]
\n", + "

1378 rows × 27 columns

\n", + "
" + ], + "text/plain": [ + " ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 \\\n", + "7870 26436 160000 female university single 44 1 -2 \n", + "5735 19327 230000 female university married 40 1 -2 \n", + "3093 10387 20000 female graduate school single 29 -1 -1 \n", + "2821 9466 50000 female university married 34 0 0 \n", + "151 501 30000 female university married 38 0 0 \n", + "8558 28732 240000 female university married 35 -1 2 \n", + "4571 15332 180000 female graduate school married 45 -1 2 \n", + "6536 21992 150000 female graduate school single 29 -1 2 \n", + "8244 27652 50000 female graduate school single 23 1 -1 \n", + "3738 12517 50000 female high school single 50 0 0 \n", + "625 2058 50000 female high school married 51 0 0 \n", + "175 586 20000 female graduate school single 25 0 0 \n", + "4563 15316 300000 female graduate school single 35 1 -2 \n", + "5874 19793 320000 female graduate school married 45 1 -2 \n", + "5682 19140 260000 female university married 37 1 1 \n", + "4293 14369 450000 female graduate school single 54 1 -2 \n", + "5792 19497 150000 female graduate school married 49 1 -2 \n", + "8663 29086 220000 female graduate school married 56 1 -2 \n", + "465 1479 210000 female graduate school single 30 1 -2 \n", + "7768 26079 210000 female university single 36 1 -2 \n", + "2716 9116 30000 female university single 22 0 0 \n", + "1460 4905 30000 female university single 22 0 0 \n", + "3194 10715 30000 female university married 38 0 0 \n", + "5224 17599 30000 female university married 30 0 0 \n", + "4075 13647 240000 female other single 36 -2 -1 \n", + "6030 20307 30000 female university married 56 0 0 \n", + "8678 29125 20000 female high school single 56 0 0 \n", + "2893 9668 20000 female university single 48 0 0 \n", + "3036 10148 20000 female high school single 35 0 0 \n", + "4869 16387 160000 female graduate school single 29 -1 -1 \n", + "... ... ... ... ... ... ... ... ... 
\n", + "2414 8115 120000 female university single 26 3 3 \n", + "5594 18868 180000 female university single 28 8 7 \n", + "8269 27731 30000 female university single 25 3 2 \n", + "2314 7744 20000 female university married 42 3 2 \n", + "5605 18911 110000 female university single 29 3 2 \n", + "6755 22725 100000 female university married 38 3 2 \n", + "5740 19345 20000 female high school married 48 2 2 \n", + "4720 15854 60000 female university single 27 3 2 \n", + "2866 9594 20000 female graduate school single 26 2 2 \n", + "5221 17586 20000 female high school married 39 2 3 \n", + "8674 29116 20000 female university married 59 3 2 \n", + "6727 22621 20000 female university married 43 4 4 \n", + "6438 21668 20000 female university single 24 3 2 \n", + "1223 4014 20000 female university single 22 4 3 \n", + "3094 10390 20000 female university married 25 2 2 \n", + "7570 25434 100000 female high school single 28 3 2 \n", + "203 650 20000 female university single 46 8 7 \n", + "5120 17233 10000 female university divorced 46 3 2 \n", + "4875 16401 10000 female high school single 44 2 2 \n", + "5731 19316 110000 female graduate school married 41 3 2 \n", + "4746 15948 30000 female university married 30 3 2 \n", + "8322 27916 50000 female graduate school single 27 3 2 \n", + "1760 5916 110000 female graduate school married 41 2 2 \n", + "359 1147 70000 female graduate school single 31 2 2 \n", + "929 3087 30000 female university single 24 2 2 \n", + "4454 14892 50000 female university married 29 2 2 \n", + "3653 12192 10000 female high school married 55 2 2 \n", + "8362 28055 20000 female university married 32 3 2 \n", + "5270 17757 10000 female high school married 51 3 2 \n", + "1865 6272 10000 female graduate school single 23 2 2 \n", + "\n", + " PAY_3 PAY_4 PAY_5 PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 \\\n", + "7870 -1 -1 -1 -1 -4 -4 3454 0 \n", + "5735 -1 -1 -1 -1 0 0 3504 4272 \n", + "3093 -1 -1 -1 -1 342 792 677 -1 \n", + "2821 2 0 0 0 11367 10982 10243 
10826 \n", + "151 0 0 2 2 20344 21705 22537 24161 \n", + "8558 -1 -1 -1 -1 528 264 264 264 \n", + "4571 -1 -1 -1 -1 1560 316 316 316 \n", + "6536 -1 -1 0 0 2599 1530 390 1366 \n", + "8244 -1 -2 -2 -2 -697 11361 0 0 \n", + "3738 2 0 0 0 47405 50138 25768 26142 \n", + "625 2 0 0 0 44766 48047 46640 40551 \n", + "175 0 0 0 2 14603 15661 16394 16723 \n", + "4563 -2 -2 -1 -1 0 0 0 0 \n", + "5874 -2 -1 -1 -1 0 0 0 370 \n", + "5682 -2 -2 -1 0 5917 -15910 -15910 -15910 \n", + "4293 -2 -1 -1 -1 -237 -2400 -2400 3990 \n", + "5792 -2 -1 -1 -1 0 0 0 12200 \n", + "8663 -2 -2 -1 -1 0 0 0 0 \n", + "465 -2 -2 -1 -1 0 0 0 0 \n", + "7768 -2 -2 -1 0 0 0 212 3066 \n", + "2716 -1 -1 2 2 17358 -36 23114 24727 \n", + "1460 0 0 2 0 25536 26635 27383 29061 \n", + "3194 0 0 2 0 7979 9046 10279 10610 \n", + "5224 0 0 2 0 24061 25156 25949 28478 \n", + "4075 2 0 0 -2 -235 1765 871 871 \n", + "6030 0 2 0 0 10232 11564 14099 13580 \n", + "8678 0 2 0 0 11471 12188 15074 14426 \n", + "2893 0 2 0 0 14218 15247 16986 16418 \n", + "3036 0 2 0 0 10704 11352 13297 12418 \n", + "4869 2 -1 0 -1 1116 2599 1302 1852 \n", + "... ... ... ... ... ... ... ... ... 
\n", + "2414 2 2 3 2 12034 12548 12056 13958 \n", + "5594 6 5 4 3 197231 194309 189981 185559 \n", + "8269 2 2 2 2 9095 10297 9988 11708 \n", + "2314 2 2 2 2 18464 19465 19649 19978 \n", + "5605 2 4 4 3 600 600 600 600 \n", + "6755 2 3 3 3 750 750 750 750 \n", + "5740 2 2 2 2 13836 14805 15865 16062 \n", + "4720 2 2 4 3 56670 57252 55764 64522 \n", + "2866 2 2 2 2 300 300 300 300 \n", + "5221 2 2 3 2 15307 14769 15232 16677 \n", + "8674 3 2 2 4 8803 11137 10672 11201 \n", + "6727 3 2 2 2 28447 27721 27009 27170 \n", + "6438 2 2 2 2 322 322 322 322 \n", + "1223 2 2 2 2 19529 18937 18335 19530 \n", + "3094 4 4 4 4 1650 1650 1650 1650 \n", + "7570 2 5 5 4 1250 1250 1250 1250 \n", + "203 6 5 4 3 21075 20795 20206 19617 \n", + "5120 2 2 2 4 5997 5753 9629 9328 \n", + "4875 2 2 2 2 10422 9775 10964 11153 \n", + "5731 2 7 7 7 150 150 150 150 \n", + "4746 2 7 7 7 2400 2400 2400 2400 \n", + "8322 2 7 7 7 300 300 300 300 \n", + "1760 7 7 7 7 150 150 150 150 \n", + "359 7 7 7 7 2400 2400 2400 2400 \n", + "929 7 7 7 7 300 300 300 300 \n", + "4454 7 7 7 6 2550 2550 2550 2550 \n", + "3653 4 4 4 4 420 420 420 420 \n", + "8362 2 7 7 7 2400 2400 2400 2400 \n", + "5270 2 7 7 7 2400 2400 2400 2400 \n", + "1865 7 7 7 6 2400 2400 2400 2400 \n", + "\n", + " BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 \\\n", + "7870 3312 0 0 3458 0 3312 0 \n", + "5735 2977 1900 0 3504 4272 2977 1900 \n", + "3093 213 856 792 677 0 214 856 \n", + "2821 11699 10146 3000 1000 1000 1000 2000 \n", + "151 25128 24576 2000 1500 2000 1500 0 \n", + "8558 264 414 0 264 264 264 414 \n", + "4571 316 316 0 316 316 316 316 \n", + "6536 780 0 0 390 1366 0 0 \n", + "8244 0 0 12058 0 0 0 0 \n", + "3738 26771 27175 4130 0 1100 1200 1000 \n", + "625 19398 0 4000 0 811 1000 0 \n", + "175 18056 17618 1600 1300 600 1600 0 \n", + "4563 150000 0 0 0 0 150000 0 \n", + "5874 9301 0 0 0 370 9301 0 \n", + "5682 24090 13977 0 0 0 40000 507 \n", + "4293 30050 9993 0 0 6390 30050 9993 \n", + "5792 16961 3000 0 0 
12200 16961 3000 \n", + "8663 5889 300 0 0 0 5889 300 \n", + "465 49525 0 0 0 0 49525 0 \n", + "7768 13206 10583 0 212 3066 13206 212 \n", + "2716 24192 25905 0 23150 2000 0 2119 \n", + "1460 28492 29850 1800 1500 2100 0 1800 \n", + "3194 10339 8710 1200 1423 754 0 1000 \n", + "5224 27754 28186 1800 1500 3000 0 1001 \n", + "4075 -155 -155 2000 0 0 155 0 \n", + "6030 13657 16356 1510 3042 0 600 2942 \n", + "8678 14526 15026 1214 3100 0 500 500 \n", + "2893 16615 16944 1562 2301 0 610 604 \n", + "3036 12581 12305 1517 2852 0 500 700 \n", + "4869 736 3542 2599 0 1852 0 3542 \n", + "... ... ... ... ... ... ... ... \n", + "2414 13468 6144 1000 0 2400 100 0 \n", + "5594 181137 184009 0 0 0 0 6000 \n", + "8269 11223 12079 1500 0 1892 0 1042 \n", + "2314 20512 20831 1600 800 950 1000 800 \n", + "5605 600 300 0 0 0 0 0 \n", + "6755 750 750 0 0 0 0 0 \n", + "5740 15509 16558 1500 1595 750 0 1452 \n", + "4720 62945 61700 2100 0 9700 0 0 \n", + "2866 300 300 0 0 0 0 0 \n", + "5221 16119 16548 0 1000 2000 0 1000 \n", + "8674 12721 11946 2800 0 1000 2000 0 \n", + "6727 26295 34171 0 0 924 0 9648 \n", + "6438 322 322 0 0 0 0 0 \n", + "1223 19076 20444 0 0 1500 0 1700 \n", + "3094 1650 1650 0 0 0 0 0 \n", + "7570 1250 650 0 0 0 0 0 \n", + "203 18737 18148 0 0 0 0 0 \n", + "5120 11411 10652 0 4000 0 2395 0 \n", + "4875 10762 10126 0 2500 1000 400 0 \n", + "5731 150 150 0 0 0 0 0 \n", + "4746 2400 2400 0 0 0 0 0 \n", + "8322 300 300 0 0 0 0 0 \n", + "1760 150 150 0 0 0 0 0 \n", + "359 2400 2400 0 0 0 0 0 \n", + "929 300 300 0 0 0 0 0 \n", + "4454 2550 1950 0 0 0 0 0 \n", + "3653 420 420 0 0 0 0 0 \n", + "8362 2400 2400 0 0 0 0 0 \n", + "5270 2400 2400 0 0 0 0 0 \n", + "1865 2400 1800 0 0 0 0 0 \n", + "\n", + " PAY_AMT6 DEFAULT_NEXT_MONTH p_DEFAULT_NEXT_MONTH p_FEMALE_ADVERSARY \n", + "7870 0 0 0.220037 0.591124 \n", + "5735 0 0 0.220037 0.591124 \n", + "3093 0 0 0.220296 0.591124 \n", + "2821 2000 0 0.220327 0.591124 \n", + "151 1200 0 0.220592 0.591124 \n", + "8558 264 1 0.220832 
0.591124 \n", + "4571 316 1 0.220832 0.591124 \n", + "6536 431 0 0.220832 0.591124 \n", + "8244 0 0 0.220909 0.591124 \n", + "3738 1100 1 0.220939 0.591124 \n", + "625 0 0 0.220939 0.591124 \n", + "175 800 0 0.220969 0.591124 \n", + "4563 267 0 0.221026 0.591124 \n", + "5874 0 0 0.221026 0.591124 \n", + "5682 656 0 0.221026 0.591124 \n", + "4293 0 0 0.221026 0.591124 \n", + "5792 493 0 0.221026 0.591124 \n", + "8663 165 0 0.221026 0.591124 \n", + "465 0 1 0.221026 0.591124 \n", + "7768 0 0 0.221026 0.591124 \n", + "2716 0 0 0.221036 0.591124 \n", + "1460 1000 0 0.221815 0.591124 \n", + "3194 3000 0 0.221815 0.591124 \n", + "5224 2500 1 0.221815 0.591124 \n", + "4075 0 0 0.222505 0.591124 \n", + "6030 600 1 0.223772 0.591124 \n", + "8678 0 0 0.223772 0.591124 \n", + "2893 779 1 0.223772 0.591124 \n", + "3036 500 0 0.223772 0.591124 \n", + "4869 0 1 0.223836 0.591124 \n", + "... ... ... ... ... \n", + "2414 57258 0 0.823498 0.591124 \n", + "5594 0 1 0.824548 0.591124 \n", + "8269 700 1 0.825212 0.591124 \n", + "2314 0 1 0.825360 0.591124 \n", + "5605 0 1 0.825784 0.591124 \n", + "6755 1500 0 0.825784 0.591124 \n", + "5740 0 1 0.826592 0.591124 \n", + "4720 0 1 0.826812 0.591124 \n", + "2866 0 1 0.827260 0.591124 \n", + "5221 1000 1 0.827260 0.591124 \n", + "8674 0 1 0.836850 0.591124 \n", + "6727 2000 1 0.836922 0.591124 \n", + "6438 0 1 0.836922 0.591124 \n", + "1223 0 1 0.836922 0.591124 \n", + "3094 0 1 0.838649 0.591124 \n", + "7570 0 0 0.838691 0.591124 \n", + "203 0 0 0.844051 0.591124 \n", + "5120 0 1 0.846597 0.591124 \n", + "4875 672 1 0.847984 0.591124 \n", + "5731 0 0 0.872702 0.591124 \n", + "4746 0 1 0.874011 0.591124 \n", + "8322 0 1 0.874011 0.591124 \n", + "1760 0 0 0.879777 0.591124 \n", + "359 0 1 0.881023 0.591124 \n", + "929 0 0 0.881023 0.591124 \n", + "4454 0 1 0.881023 0.591124 \n", + "3653 780 1 0.891281 0.591124 \n", + "8362 0 1 0.918990 0.591124 \n", + "5270 0 1 0.947069 0.591124 \n", + "1865 0 1 0.950247 0.591124 \n", + "\n", + "[1378 rows 
x 27 columns]" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "valid_yhat['p_FEMALE_ADVERSARY'] = adv_gbm.predict(h2o.H2OFrame(valid_yhat))['female'].as_data_frame()\n", + "valid_yhat[(valid_yhat['SEX'] == 'female') & \n", + " (valid_yhat['p_FEMALE_ADVERSARY'] > 0.58) & \n", + " (valid_yhat[yhat_name] > best_cut)]\\\n", + " .sort_values(by=yhat_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some of the women in this set also appear to have missed only 0-2 payments, and been late only 1-2 months on the few payments they missed, if any." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Shutdown H2O" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Are you sure you want to shutdown the H2O instance running at http://127.0.0.1:54321 (Y/N)? n\n" + ] + } + ], + "source": [ + "# be careful, this can erase your work!\n", + "h2o.cluster().shutdown(prompt=True)" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/rmltk/debug.py b/rmltk/debug.py index 55728b4..c1f194e 100644 --- a/rmltk/debug.py +++ b/rmltk/debug.py @@ -1,3 +1,5 @@ +import pandas as pd + """ Copyright 2020 - Patrick Hall (jphall@gwu.edu) @@ -31,6 +33,76 @@ # TODO: model documentation # TODO: CV metrics for all measures, not just accuracy +# represent metrics as dictionary for use later +METRIC_DICT = { + +#### overall performance +'Prevalence': '(tp + fn) / (tp + tn 
+fp + fn)', # how much default actually happens for this group +'Accuracy': '(tp + tn) / (tp + tn + fp + fn)', # how often the model predicts default and non-default correctly for this group + +#### predicting default will happen +# (correctly) +'True Positive Rate': 'tp / (tp + fn)', # out of the people in the group *that did* default, how many the model predicted *correctly* would default +'Precision': 'tp / (tp + fp)', # out of the people in the group the model *predicted* would default, how many the model predicted *correctly* would default + +#### predicting default won't happen +# (correctly) +'Specificity': 'tn / (tn + fp)', # out of the people in the group *that did not* default, how many the model predicted *correctly* would not default +'Negative Predicted Value': 'tn / (tn + fn)', # out of the people in the group the model *predicted* would not default, how many the model predicted *correctly* would not default + +#### analyzing errors - type I +# false accusations +'False Positive Rate': 'fp / (tn + fp)', # out of the people in the group *that did not* default, how many the model predicted *incorrectly* would default +'False Discovery Rate': 'fp / (tp + fp)', # out of the people in the group the model *predicted* would default, how many the model predicted *incorrectly* would default + +#### analyzing errors - type II +# costly omissions +'False Negative Rate': 'fn / (tp + fn)', # out of the people in the group *that did* default, how many the model predicted *incorrectly* would not default +'False Omissions Rate':'fn / (tn + fn)' # out of the people in the group the model *predicted* would not default, how many the model predicted *incorrectly* would not default +} + + +def get_metrics_ratios(cm_dict, _control_level): + + """ Calculates confusion matrix metrics in METRIC_DICT for each level of demographic feature. + Tightly coupled to cm_dict. + + :param cm_dict: Dictionary of Pandas confusion matrices, one matrix for each level. 
+ :param _control_level: Control level in cm_dict. + :return: Tuple, Pandas frame of metrics for each level of demographic feature, Pandas frame of ratio metrics for + each level of demographic feature. + + """ + + levels = sorted(list(cm_dict.keys())) + + eps = 1e-20 # for safe numerical operations + + # init return frames + metrics_frame = pd.DataFrame(index=levels) # frame for metrics + + # nested loop through: + # - levels + # - metrics + for level in levels: + + for metric in METRIC_DICT.keys(): + + # parse metric expressions into executable Pandas statements + expression = METRIC_DICT[metric].replace('tp', 'cm_dict[level].iat[0, 0]') \ + .replace('fp', 'cm_dict[level].iat[0, 1]') \ + .replace('fn', 'cm_dict[level].iat[1, 0]') \ + .replace('tn', 'cm_dict[level].iat[1, 1]') + + # dynamically evaluate metrics to avoid code duplication + metrics_frame.loc[level, metric] = eval(expression) + + # calculate metric ratios + ratios_frame = (metrics_frame.loc[:, :] + eps) / (metrics_frame.loc[_control_level, :] + eps) + ratios_frame.columns = [col + ' Ratio' for col in ratios_frame.columns] + + return metrics_frame, ratios_frame + def air(cm_dict, reference, protected): @@ -121,3 +193,4 @@ def smd(valid, x_name, yhat_name, reference, protected): print(yhat_name.title() + ' std. dev.: %.2f' % sigma) return (protected_yhat_mean - reference_yhat_mean) / sigma + diff --git a/rmltk/evaluate.py b/rmltk/evaluate.py index ce20910..b296a65 100644 --- a/rmltk/evaluate.py +++ b/rmltk/evaluate.py @@ -188,6 +188,111 @@ def cv_model_rank_select(valid, seed_, train_results, model_prefix, 'METRICS': best_model_frame} +def get_prauc(frame, y, yhat, pos=1, neg=0, res=0.01): + + """ Calculates precision, recall, and f1 for a pandas dataframe of y and yhat values. + + Args: + frame: Pandas dataframe of actual (y) and predicted (yhat) values. + y: Name of actual value column. + yhat: Name of predicted value column. + pos: Primary target value, default 1. 
+ neg: Secondary target value, default 0. + res: Resolution by which to loop through cutoffs, default 0.01. + + Returns: + Pandas dataframe of precision, recall, and f1 values. + """ + + frame_ = frame.copy(deep=True) # don't destroy original data + dname = 'd_' + str(y) # column for predicted decisions + eps = 1e-20 # for safe numerical operations + + # init precision-recall frame + prauc_frame = pd.DataFrame(columns=['cutoff', 'recall', 'precision', 'f1']) + + # loop through cutoffs to create precision-recall frame + for cutoff in np.arange(0, 1 + res, res): + # binarize decision to create confusion matrix values + frame_[dname] = np.where(frame_[yhat] > cutoff, pos, neg) + + # calculate confusion matrix values + tp = frame_[(frame_[dname] == pos) & (frame_[y] == pos)].shape[0] + fp = frame_[(frame_[dname] == pos) & (frame_[y] == neg)].shape[0] + fn = frame_[(frame_[dname] == neg) & (frame_[y] == pos)].shape[0] + + # calculate precision, recall, and f1 + recall = (tp + eps) / ((tp + fn) + eps) + precision = (tp + eps) / ((tp + fp) + eps) + f1 = 2 / ((1 / (recall + eps)) + (1 / (precision + eps))) + + # add new values to frame + prauc_frame = prauc_frame.append({'cutoff': cutoff, + 'recall': recall, + 'precision': precision, + 'f1': f1}, + ignore_index=True) + + # housekeeping + del frame_ + + return prauc_frame + + +def get_youdens_j(frame, y, yhat, pos=1, neg=0, res=0.01): + + """ Calculates TPR, TNR, and Youden's J for a Pandas DataFrame of actual (y) and predicted (yhat) values + to select best cutoff for AUC-optimized classifier. + + :param frame: Pandas DataFrame of actual (y) and predicted (yhat) values. + :param y: Name of actual value column. + :param yhat: Name of predicted value column. + :param pos: Primary target value, default 1. + :param neg: Secondary target value, default 0. + :param res: Resolution by which to loop through cutoffs, default 0.01. + :return: Pandas DataFrame of sensitivity, specificity, and Youden's J values. 
+ + """ + + frame_ = frame.copy(deep=True) # don't destroy original data + dname = 'd_' + str(y) # column for predicted decisions + eps = 1e-20 # for safe numerical operations + + # init j_frame + j_frame = pd.DataFrame(columns=['cutoff', 'TPR', 'TNR', 'J']) + + # loop through cutoffs to create j_frame + for cutoff in np.arange(0, 1 + res, res): + + # binarize decision to create confusion matrix values + frame_[dname] = np.where(frame_[yhat] > cutoff, pos, neg) + + # calculate confusion matrix values + tp = frame_[(frame_[dname] == pos) & (frame_[y] == pos)].shape[0] + fp = frame_[(frame_[dname] == pos) & (frame_[y] == neg)].shape[0] + tn = frame_[(frame_[dname] == neg) & (frame_[y] == neg)].shape[0] + fn = frame_[(frame_[dname] == neg) & (frame_[y] == pos)].shape[0] + + # calculate TPR, TNR, FNR, and Youden's J + tpr = (tp + eps) / ((tp + fn) + eps) + tnr = (tn + eps) / ((tn + fp) + eps) + fnr = 1 - tpr + j = tpr + tnr - 1 + + # add new values to frame + j_frame = j_frame.append({'cutoff': cutoff, + 'TPR': tpr, + 'TNR': tnr, + 'FNR': fnr, + 'J': j}, + ignore_index=True) + + # housekeeping + del frame_ + + return j_frame + + +def get_confusion_matrix(valid, y_name, yhat_name, by=None, level=None, cutoff=0.5): """ Creates confusion matrix from pandas DataFrame of y and yhat values, can be sliced @@ -199,8 +304,8 @@ def get_confusion_matrix(valid, y_name, yhat_name, by=None, level=None, cutoff=0 :param by: By variable to slice frame before creating confusion matrix, default None. :param level: Value of by variable to slice frame before creating confusion matrix, default None. :param cutoff: Cutoff threshold for confusion matrix, default 0.5. - :return: Confusion matrix as pandas DataFrame. 
+ """ # determine levels of target (y) variable From 9c82319ecb0919ea7d7a927609b6f66aacd84d2e Mon Sep 17 00:00:00 2001 From: patrickh Date: Wed, 3 Jun 2020 21:25:21 -0400 Subject: [PATCH 3/3] add more lecture 3 materials --- README.md | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/README.md b/README.md index d155306..235032d 100644 --- a/README.md +++ b/README.md @@ -15,6 +15,8 @@ Materials for a technical, nuts-and-bolts course about increasing transparency, Corrections or suggestions? Please file a [GitHub issue](https://github.com/jphall663/GWU_rml/issues/new). +*** + ## Lecture 1: Interpretable Machine Learning Models ![Histogram, partial dependence, and ICE for a monotonic GBM and a credit card customer's most recent repayment status](/img/lecture_1.png) @@ -53,6 +55,8 @@ Corrections or suggestions? Please file a [GitHub issue](https://github.com/jpha * [When a Computer Program Keeps You in Jail](https://www.nytimes.com/2017/06/13/opinion/how-computers-are-harming-criminal-justice.html) * [When an Algorithm Helps Send You to Prison](https://www.nytimes.com/2017/10/26/opinion/algorithm-compas-sentencing-bias.html) + + ## Lecture 2: Post-hoc Explanation ![A decision tree surrogate model forms a flow chart of a more complex monotonic GBM](/img/lecture_2.png) @@ -93,6 +97,33 @@ Corrections or suggestions? 
Please file a [GitHub issue](https://github.com/jpha * [Machine Bias](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing) * [Gender Shades](http://gendershades.org/) * [Explainable Neural Networks based on Additive Index Models](https://arxiv.org/pdf/1806.01933.pdf) + + + +## Lecture 3: Discrimination Testing and Remediation + +### Lecture 3 Class Materials + +* [Lecture Notes]() +* [Lecture Video]() +* Software Example: [Testing a Constrained Model for Discrimination and Remediating Discovered Discrimination](https://nbviewer.jupyter.org/github/jphall663/GWU_rml/blob/master/lecture_3.ipynb) + +### Lecture 3 Suggested Software + +* Python: + * [`aequitas`](https://github.com/dssg/aequitas) + * [`AIF360`](https://github.com/IBM/AIF360) + * [`Themis`](https://github.com/LASER-UMASS/Themis) + +### Lecture 3 Suggested Reading + +* **Introduction and Background**: + +* **Post-hoc Explanation Techniques**: + +* **Links from Lecture**: + + ## Using Class Software Resources