diff --git a/.gitignore b/.gitignore
index f4f35fd..b0943e8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
+env
env_rml
.idea
.ipynb_checkpoints
diff --git a/lecture_4.ipynb b/lecture_4.ipynb
new file mode 100644
index 0000000..b9cdaad
--- /dev/null
+++ b/lecture_4.ipynb
@@ -0,0 +1,2390 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## License \n",
+ "\n",
+ "Copyright 2020 Patrick Hall (jphall@gwu.edu)\n",
+ "\n",
+ "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "you may not use this file except in compliance with the License.\n",
+ "You may obtain a copy of the License at\n",
+ "\n",
+ " http://www.apache.org/licenses/LICENSE-2.0\n",
+ "\n",
+ "Unless required by applicable law or agreed to in writing, software\n",
+ "distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "See the License for the specific language governing permissions and\n",
+ "limitations under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**DISCLAIMER:** This notebook is not legal compliance advice."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "***"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Attacking a Constrained Machine Learning Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Global hyperparameters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "SEED = 12345 # global random seed for better reproducibility"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Python imports and inits"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.\n",
+ "Attempting to start a local H2O server...\n",
+ " Java Version: openjdk version \"1.8.0_252\"; OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09); OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)\n",
+ " Starting server from /home/patrickh/Workspace/GWU_rml/env_rml/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar\n",
+ " Ice root: /tmp/tmpwbigsuw9\n",
+ " JVM stdout: /tmp/tmpwbigsuw9/h2o_patrickh_started_from_python.out\n",
+ " JVM stderr: /tmp/tmpwbigsuw9/h2o_patrickh_started_from_python.err\n",
+ " Server is running at http://127.0.0.1:54321\n",
+ "Connecting to H2O server at http://127.0.0.1:54321 ... successful.\n",
+ "Warning: Your H2O cluster version is too old (9 months and 10 days)! Please download and install the latest version from http://h2o.ai/download/\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "H2O cluster status summary (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ "-------------------------- ---------------------------------------------------\n",
+ "H2O cluster uptime: 00 secs\n",
+ "H2O cluster timezone: America/New_York\n",
+ "H2O data parsing timezone: UTC\n",
+ "H2O cluster version: 3.26.0.3\n",
+ "H2O cluster version age: 9 months and 10 days !!!\n",
+ "H2O cluster name: H2O_from_python_patrickh_8fev5r\n",
+ "H2O cluster total nodes: 1\n",
+ "H2O cluster free memory: 1.879 Gb\n",
+ "H2O cluster total cores: 24\n",
+ "H2O cluster allowed cores: 24\n",
+ "H2O cluster status: accepting new members, healthy\n",
+ "H2O connection url: http://127.0.0.1:54321\n",
+ "H2O connection proxy:\n",
+ "H2O internal security: False\n",
+ "H2O API Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4\n",
+ "Python version: 3.6.9 final\n",
+ "-------------------------- ---------------------------------------------------"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from rmltk import debug, evaluate, model # simple module for evaluating, debugging, and training models\n",
+ "\n",
+ "# h2o Python API with specific classes\n",
+ "import h2o \n",
+ "from h2o.estimators.gbm import H2OGradientBoostingEstimator # for GBM\n",
+ "\n",
+ "import numpy as np # array, vector, matrix calculations\n",
+ "import pandas as pd # DataFrame handling\n",
+ "\n",
+ "import matplotlib.pyplot as plt # general plotting\n",
+ "pd.options.display.max_columns = 999 # enable display of all columns in notebook\n",
+ "\n",
+ "# display plots in-notebook\n",
+ "%matplotlib inline \n",
+ "\n",
+ "h2o.init(max_mem_size='2G') # start h2o\n",
+ "h2o.remove_all() # remove any existing data structures from h2o memory"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Download, Explore, and Prepare UCI Credit Card Default Data\n",
+ "\n",
+ "UCI credit card default data: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients\n",
+ "\n",
+ "The UCI credit card default data contains demographic and payment information about credit card customers in Taiwan in the year 2005. The data set contains 23 input variables: \n",
+ "\n",
+ "* **`LIMIT_BAL`**: Amount of given credit (NT dollar)\n",
+ "* **`SEX`**: 1 = male; 2 = female\n",
+ "* **`EDUCATION`**: 1 = graduate school; 2 = university; 3 = high school; 4 = others \n",
+ "* **`MARRIAGE`**: 1 = married; 2 = single; 3 = others\n",
+ "* **`AGE`**: Age in years \n",
+ "* **`PAY_0`, `PAY_2` - `PAY_6`**: History of past payment; `PAY_0` = the repayment status in September, 2005; `PAY_2` = the repayment status in August, 2005; ...; `PAY_6` = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above. \n",
+ "* **`BILL_AMT1` - `BILL_AMT6`**: Amount of bill statement (NT dollar). `BILL_AMT1` = amount of bill statement in September, 2005; `BILL_AMT2` = amount of bill statement in August, 2005; ...; `BILL_AMT6` = amount of bill statement in April, 2005. \n",
+ "* **`PAY_AMT1` - `PAY_AMT6`**: Amount of previous payment (NT dollar). `PAY_AMT1` = amount paid in September, 2005; `PAY_AMT2` = amount paid in August, 2005; ...; `PAY_AMT6` = amount paid in April, 2005. \n",
+ "\n",
+ "As is common in credit scoring models, demographic variables will not be used as model inputs. However, they will be used after model training to test for disparate impact (a rough sketch appears near the end of this notebook)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Import data and clean"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# import XLS file\n",
+ "path = 'default_of_credit_card_clients.xls'\n",
+ "data = pd.read_excel(path,\n",
+ " skiprows=1)\n",
+ "\n",
+ "# remove spaces from target column name \n",
+ "data = data.rename(columns={'default payment next month': 'DEFAULT_NEXT_MONTH'}) "
+ ]
+ },
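+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Optional sanity checks on the raw data\n",
+ "An optional check (a quick sketch): confirm the expected dimensions, look for missing values, and inspect the rate of the target before any modeling."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# optional sanity checks on the raw pandas DataFrame (sketch)\n",
+ "print('rows = %d, columns = %d' % data.shape)                   # expect 30000 rows, 25 columns\n",
+ "print('missing values = %d' % data.isnull().sum().sum())        # expect 0 for this dataset\n",
+ "print(data['DEFAULT_NEXT_MONTH'].value_counts(normalize=True))  # target rate, roughly 0.22"
+ ]
+ },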
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Assign modeling roles"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "y = DEFAULT_NEXT_MONTH\n",
+ "X = ['LIMIT_BAL', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6']\n"
+ ]
+ }
+ ],
+ "source": [
+ "# assign target and inputs for GBM\n",
+ "y_name = 'DEFAULT_NEXT_MONTH'\n",
+ "x_names = [name for name in data.columns if name not in [y_name, 'ID', 'AGE', 'EDUCATION', 'MARRIAGE', 'SEX']]\n",
+ "print('y =', y_name)\n",
+ "print('X =', x_names)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Helper function for recoding values in the UCI credit card default data\n",
+ "This simple function maps the original integer values of the input variables found in the dataset to longer, more understandable character string values from the UCI credit card default data dictionary."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Parse progress: |█████████████████████████████████████████████████████████| 100%\n"
+ ]
+ }
+ ],
+ "source": [
+ "def recode_cc_data(frame):\n",
+ " \n",
+ " \"\"\" Recodes numeric categorical variables into categorical character variables\n",
+ " with more transparent values. \n",
+ " \n",
+ " Args:\n",
+ " frame: Pandas DataFrame version of UCI credit card default data.\n",
+ " \n",
+ " Returns: \n",
+ " H2OFrame with recoded values.\n",
+ " \n",
+ " \"\"\"\n",
+ " \n",
+ " # define recoded values\n",
+ " sex_dict = {1:'male', 2:'female'}\n",
+ " education_dict = {0:'other', 1:'graduate school', 2:'university', 3:'high school', \n",
+ " 4:'other', 5:'other', 6:'other'}\n",
+ " marriage_dict = {0:'other', 1:'married', 2:'single', 3:'divorced'}\n",
+ " \n",
+ " # recode values using apply() and lambda function\n",
+ " frame['SEX'] = frame['SEX'].apply(lambda i: sex_dict[i])\n",
+ " frame['EDUCATION'] = frame['EDUCATION'].apply(lambda i: education_dict[i]) \n",
+ " frame['MARRIAGE'] = frame['MARRIAGE'].apply(lambda i: marriage_dict[i]) \n",
+ " \n",
+ " return h2o.H2OFrame(frame)\n",
+ "\n",
+ "data = recode_cc_data(data)"
+ ]
+ },
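+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Optional check of the recoded levels\n",
+ "An optional check (sketch): tabulate the recoded demographic columns to confirm the character string levels and their counts in the new H2OFrame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# optional: confirm the recoded levels (data is now an H2OFrame)\n",
+ "for col in ['SEX', 'EDUCATION', 'MARRIAGE']:\n",
+ "    data[col].table().show()"
+ ]
+ },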
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Split data into training and validation partitions\n",
+ "Fairness and performance metrics will be calculated on the validation data to give a better idea of how the model will behave on future, unseen data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Train data rows = 21060, columns = 25\n",
+ "Validation data rows = 8940, columns = 25\n"
+ ]
+ }
+ ],
+ "source": [
+ "# split into training and validation\n",
+ "train, valid = data.split_frame([0.7], seed=SEED)\n",
+ "\n",
+ "# summarize split\n",
+ "print('Train data rows = %d, columns = %d' % (train.shape[0], train.shape[1]))\n",
+ "print('Validation data rows = %d, columns = %d' % (valid.shape[0], valid.shape[1]))"
+ ]
+ },
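+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Optional check of the split\n",
+ "Another optional check (sketch): the default rate should be roughly equal in the two partitions if the random split behaved as expected."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# optional: compare the target rate across partitions (sketch)\n",
+ "# pulls the single target column into pandas for a simple mean\n",
+ "train_rate = train[y_name].as_data_frame()[y_name].mean()\n",
+ "valid_rate = valid[y_name].as_data_frame()[y_name].mean()\n",
+ "print('Train default rate = %.4f' % train_rate)\n",
+ "print('Validation default rate = %.4f' % valid_rate)"
+ ]
+ },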
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Load Pre-trained Monotonic GBM\n",
+ "Load the monotonic GBM known as `mgbm5` from the first lecture, saved under the model key `best_mgbm`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Model Details\n",
+ "=============\n",
+ "H2OGradientBoostingEstimator : Gradient Boosting Machine\n",
+ "Model Key: best_mgbm\n",
+ "\n",
+ "\n",
+ "Model Summary: "
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "Model summary (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ " number_of_trees number_of_internal_trees model_size_in_bytes \\\n",
+ "0 46.0 46.0 6939.0 \n",
+ "\n",
+ " min_depth max_depth mean_depth min_leaves max_leaves mean_leaves \n",
+ "0 3.0 3.0 3.0 5.0 8.0 7.369565 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "\n",
+ "ModelMetricsBinomial: gbm\n",
+ "** Reported on train data. **\n",
+ "\n",
+ "MSE: 0.13637719864300343\n",
+ "RMSE: 0.3692928358945018\n",
+ "LogLoss: 0.4351274080189972\n",
+ "Mean Per-Class Error: 0.2913939696264273\n",
+ "AUC: 0.7716491282246187\n",
+ "pr_auc: 0.5471826859054356\n",
+ "Gini: 0.5432982564492375\n",
+ "\n",
+ "Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.21968260039166268: "
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "Training confusion matrix (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ " 0 1 Error Rate\n",
+ "0 0 13482.0 2814.0 0.1727 (2814.0/16296.0)\n",
+ "1 1 1907.0 2743.0 0.4101 (1907.0/4650.0)\n",
+ "2 Total 15389.0 5557.0 0.2254 (4721.0/20946.0)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Maximum Metrics: Maximum metrics at their respective thresholds\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "Training maximum metrics (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ " metric threshold value idx\n",
+ "0 max f1 0.219683 0.537474 248.0\n",
+ "1 max f2 0.127859 0.630227 329.0\n",
+ "2 max f0point5 0.446699 0.583033 147.0\n",
+ "3 max accuracy 0.446699 0.821493 147.0\n",
+ "4 max precision 0.950247 1.000000 0.0\n",
+ "5 max recall 0.050609 1.000000 395.0\n",
+ "6 max specificity 0.950247 1.000000 0.0\n",
+ "7 max absolute_mcc 0.325159 0.413494 194.0\n",
+ "8 max min_per_class_accuracy 0.177542 0.698495 281.0\n",
+ "9 max mean_per_class_accuracy 0.219683 0.708606 248.0"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Gains/Lift Table: Avg response rate: 22.20 %, avg score: 22.00 %\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "Training gains/lift table (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ " group cumulative_data_fraction lower_threshold lift \\\n",
+ "0 1 0.010074 0.813927 3.607883 \n",
+ "1 2 0.020338 0.795575 3.519808 \n",
+ "2 3 0.030316 0.763679 3.405328 \n",
+ "3 4 0.040008 0.715138 3.261891 \n",
+ "4 5 0.050081 0.664416 3.116869 \n",
+ "5 6 0.100019 0.543384 2.859463 \n",
+ "6 7 0.150005 0.366237 2.224293 \n",
+ "7 8 0.205672 0.292765 1.595510 \n",
+ "8 9 0.301251 0.196648 1.174504 \n",
+ "9 10 0.400029 0.173817 0.864327 \n",
+ "10 11 0.500286 0.151431 0.701418 \n",
+ "11 12 0.600306 0.131214 0.619237 \n",
+ "12 13 0.700659 0.114794 0.559314 \n",
+ "13 14 0.800821 0.102226 0.369293 \n",
+ "14 15 0.904564 0.091861 0.402152 \n",
+ "15 16 1.000000 0.034810 0.241112 \n",
+ "\n",
+ " cumulative_lift response_rate score cumulative_response_rate \\\n",
+ "0 3.607883 0.800948 0.843446 0.800948 \n",
+ "1 3.563432 0.781395 0.805153 0.791080 \n",
+ "2 3.511394 0.755981 0.783970 0.779528 \n",
+ "3 3.450954 0.724138 0.739815 0.766110 \n",
+ "4 3.383755 0.691943 0.686695 0.751192 \n",
+ "5 3.121984 0.634799 0.601794 0.693079 \n",
+ "6 2.822849 0.493792 0.446951 0.626671 \n",
+ "7 2.490659 0.354202 0.312777 0.552925 \n",
+ "8 2.073077 0.260739 0.234499 0.460222 \n",
+ "9 1.774604 0.191880 0.184844 0.393961 \n",
+ "10 1.559537 0.155714 0.161335 0.346216 \n",
+ "11 1.402870 0.137470 0.140709 0.311436 \n",
+ "12 1.282050 0.124167 0.122817 0.284614 \n",
+ "13 1.167887 0.081983 0.108062 0.259270 \n",
+ "14 1.080066 0.089277 0.097524 0.239774 \n",
+ "15 1.000000 0.053527 0.076989 0.221999 \n",
+ "\n",
+ " cumulative_score capture_rate cumulative_capture_rate gain \\\n",
+ "0 0.843446 0.036344 0.036344 260.788259 \n",
+ "1 0.824119 0.036129 0.072473 251.980795 \n",
+ "2 0.810905 0.033978 0.106452 240.532798 \n",
+ "3 0.793684 0.031613 0.138065 226.189099 \n",
+ "4 0.772164 0.031398 0.169462 211.686898 \n",
+ "5 0.687101 0.142796 0.312258 185.946339 \n",
+ "6 0.607076 0.111183 0.423441 122.429306 \n",
+ "7 0.527422 0.088817 0.512258 59.551043 \n",
+ "8 0.434485 0.112258 0.624516 17.450421 \n",
+ "9 0.372842 0.085376 0.709892 -13.567284 \n",
+ "10 0.330455 0.070323 0.780215 -29.858249 \n",
+ "11 0.298841 0.061935 0.842151 -38.076342 \n",
+ "12 0.273630 0.056129 0.898280 -44.068568 \n",
+ "13 0.252921 0.036989 0.935269 -63.070697 \n",
+ "14 0.235099 0.041720 0.976989 -59.784808 \n",
+ "15 0.220010 0.023011 1.000000 -75.888783 \n",
+ "\n",
+ " cumulative_gain \n",
+ "0 260.788259 \n",
+ "1 256.343177 \n",
+ "2 251.139446 \n",
+ "3 245.095388 \n",
+ "4 238.375473 \n",
+ "5 212.198445 \n",
+ "6 182.284922 \n",
+ "7 149.065864 \n",
+ "8 107.307684 \n",
+ "9 77.460410 \n",
+ "10 55.953665 \n",
+ "11 40.286982 \n",
+ "12 28.204987 \n",
+ "13 16.788724 \n",
+ "14 8.006633 \n",
+ "15 0.000000 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "\n",
+ "ModelMetricsBinomial: gbm\n",
+ "** Reported on validation data. **\n",
+ "\n",
+ "MSE: 0.13326994104124376\n",
+ "RMSE: 0.3650615578792757\n",
+ "LogLoss: 0.4278285715046422\n",
+ "Mean Per-Class Error: 0.2856607030196092\n",
+ "AUC: 0.7776380047998697\n",
+ "pr_auc: 0.5486322626112021\n",
+ "Gini: 0.5552760095997393\n",
+ "\n",
+ "Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.27397344199105433: "
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "Validation confusion matrix (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ " 0 1 Error Rate\n",
+ "0 0 6093.0 975.0 0.1379 (975.0/7068.0)\n",
+ "1 1 863.0 1123.0 0.4345 (863.0/1986.0)\n",
+ "2 Total 6956.0 2098.0 0.203 (1838.0/9054.0)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Maximum Metrics: Maximum metrics at their respective thresholds\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "Validation maximum metrics (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ " metric threshold value idx\n",
+ "0 max f1 0.273973 0.549951 217.0\n",
+ "1 max f2 0.147835 0.634488 307.0\n",
+ "2 max f0point5 0.436620 0.590736 153.0\n",
+ "3 max accuracy 0.456963 0.825271 147.0\n",
+ "4 max precision 0.947069 1.000000 0.0\n",
+ "5 max recall 0.045106 1.000000 397.0\n",
+ "6 max specificity 0.947069 1.000000 0.0\n",
+ "7 max absolute_mcc 0.347246 0.429999 184.0\n",
+ "8 max min_per_class_accuracy 0.181585 0.709970 275.0\n",
+ "9 max mean_per_class_accuracy 0.230518 0.714339 240.0"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Gains/Lift Table: Avg response rate: 21.94 %, avg score: 22.52 %\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "Validation gains/lift table (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ " group cumulative_data_fraction lower_threshold lift \\\n",
+ "0 1 0.011155 0.815010 3.295055 \n",
+ "1 2 0.020543 0.795575 3.700764 \n",
+ "2 3 0.030042 0.783550 3.604721 \n",
+ "3 4 0.040093 0.743192 3.005876 \n",
+ "4 5 0.050033 0.697702 3.444512 \n",
+ "5 6 0.101281 0.553193 3.104777 \n",
+ "6 7 0.150320 0.383564 2.187046 \n",
+ "7 8 0.200022 0.296915 1.580423 \n",
+ "8 9 0.301303 0.203539 1.133514 \n",
+ "9 10 0.403468 0.176970 0.961068 \n",
+ "10 11 0.500221 0.152028 0.655734 \n",
+ "11 12 0.599956 0.133009 0.555349 \n",
+ "12 13 0.702231 0.115062 0.492323 \n",
+ "13 14 0.801966 0.102380 0.353404 \n",
+ "14 15 0.905346 0.091861 0.379909 \n",
+ "15 16 1.000000 0.034810 0.297899 \n",
+ "\n",
+ " cumulative_lift response_rate score cumulative_response_rate \\\n",
+ "0 3.295055 0.722772 0.839858 0.722772 \n",
+ "1 3.480460 0.811765 0.805631 0.763441 \n",
+ "2 3.519749 0.790698 0.792441 0.772059 \n",
+ "3 3.390927 0.659341 0.761335 0.743802 \n",
+ "4 3.401573 0.755556 0.723091 0.746137 \n",
+ "5 3.251394 0.681034 0.614736 0.713195 \n",
+ "6 2.904171 0.479730 0.466067 0.637032 \n",
+ "7 2.575244 0.346667 0.327817 0.564881 \n",
+ "8 2.090616 0.248637 0.250648 0.458578 \n",
+ "9 1.804595 0.210811 0.187190 0.395839 \n",
+ "10 1.582382 0.143836 0.163566 0.347096 \n",
+ "11 1.411651 0.121816 0.141651 0.309647 \n",
+ "12 1.277757 0.107991 0.123549 0.280277 \n",
+ "13 1.162802 0.077519 0.107834 0.255061 \n",
+ "14 1.073405 0.083333 0.097585 0.235452 \n",
+ "15 1.000000 0.065344 0.076884 0.219351 \n",
+ "\n",
+ " cumulative_score capture_rate cumulative_capture_rate gain \\\n",
+ "0 0.839858 0.036757 0.036757 229.505549 \n",
+ "1 0.824217 0.034743 0.071501 270.076417 \n",
+ "2 0.814170 0.034240 0.105740 260.472142 \n",
+ "3 0.800925 0.030211 0.135952 200.587630 \n",
+ "4 0.785461 0.034240 0.170191 244.451158 \n",
+ "5 0.699075 0.159114 0.329305 210.477654 \n",
+ "6 0.623061 0.107251 0.436556 118.704581 \n",
+ "7 0.549698 0.078550 0.515106 58.042296 \n",
+ "8 0.449174 0.114804 0.629909 13.351366 \n",
+ "9 0.382836 0.098187 0.728097 -3.893198 \n",
+ "10 0.340424 0.063444 0.791541 -34.426603 \n",
+ "11 0.307381 0.055388 0.846928 -44.465076 \n",
+ "12 0.280607 0.050352 0.897281 -50.767685 \n",
+ "13 0.259121 0.035247 0.932528 -64.659594 \n",
+ "14 0.240675 0.039275 0.971803 -62.009063 \n",
+ "15 0.225172 0.028197 1.000000 -70.210141 \n",
+ "\n",
+ " cumulative_gain \n",
+ "0 229.505549 \n",
+ "1 248.045999 \n",
+ "2 251.974853 \n",
+ "3 239.092657 \n",
+ "4 240.157260 \n",
+ "5 225.139444 \n",
+ "6 190.417123 \n",
+ "7 157.524427 \n",
+ "8 109.061561 \n",
+ "9 80.459549 \n",
+ "10 58.238248 \n",
+ "11 41.165144 \n",
+ "12 27.775745 \n",
+ "13 16.280206 \n",
+ "14 7.340501 \n",
+ "15 0.000000 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "\n",
+ "Scoring History: "
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "Scoring history (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ " timestamp duration number_of_trees training_rmse \\\n",
+ "0 2020-05-28 14:33:23 43.415 sec 0.0 0.415591 \n",
+ "1 2020-05-28 14:33:23 43.443 sec 1.0 0.407822 \n",
+ "2 2020-05-28 14:33:23 43.467 sec 2.0 0.401483 \n",
+ "3 2020-05-28 14:33:23 43.489 sec 3.0 0.396471 \n",
+ "4 2020-05-28 14:33:23 43.515 sec 4.0 0.392442 \n",
+ "5 2020-05-28 14:33:23 43.535 sec 5.0 0.389141 \n",
+ "6 2020-05-28 14:33:23 43.570 sec 6.0 0.386399 \n",
+ "7 2020-05-28 14:33:23 43.592 sec 7.0 0.384191 \n",
+ "8 2020-05-28 14:33:23 43.614 sec 8.0 0.382341 \n",
+ "9 2020-05-28 14:33:23 43.639 sec 9.0 0.380701 \n",
+ "10 2020-05-28 14:33:23 43.668 sec 10.0 0.379202 \n",
+ "11 2020-05-28 14:33:23 43.697 sec 11.0 0.378052 \n",
+ "12 2020-05-28 14:33:23 43.729 sec 12.0 0.377043 \n",
+ "13 2020-05-28 14:33:23 43.762 sec 13.0 0.376137 \n",
+ "14 2020-05-28 14:33:23 43.796 sec 14.0 0.375357 \n",
+ "15 2020-05-28 14:33:23 43.848 sec 15.0 0.374699 \n",
+ "16 2020-05-28 14:33:23 43.903 sec 16.0 0.374098 \n",
+ "17 2020-05-28 14:33:23 43.949 sec 17.0 0.373534 \n",
+ "18 2020-05-28 14:33:23 44.004 sec 18.0 0.373121 \n",
+ "19 2020-05-28 14:33:23 44.054 sec 19.0 0.372722 \n",
+ "\n",
+ " training_logloss training_auc training_pr_auc training_lift \\\n",
+ "0 0.529427 0.500000 0.000000 1.000000 \n",
+ "1 0.511864 0.716131 0.534717 3.474912 \n",
+ "2 0.498746 0.744646 0.532172 3.529706 \n",
+ "3 0.489013 0.748189 0.535621 3.529706 \n",
+ "4 0.481430 0.750121 0.535358 3.529706 \n",
+ "5 0.475375 0.750058 0.535198 3.529706 \n",
+ "6 0.470332 0.756986 0.535024 3.529706 \n",
+ "7 0.466316 0.757005 0.535418 3.529706 \n",
+ "8 0.462760 0.761106 0.540176 3.514359 \n",
+ "9 0.459589 0.762515 0.540880 3.518279 \n",
+ "10 0.456705 0.762522 0.541424 3.518279 \n",
+ "11 0.454467 0.761648 0.541505 3.521332 \n",
+ "12 0.452420 0.762767 0.541658 3.521332 \n",
+ "13 0.450517 0.764795 0.543264 3.525899 \n",
+ "14 0.448963 0.765145 0.543113 3.525899 \n",
+ "15 0.447543 0.766118 0.544037 3.528417 \n",
+ "16 0.446341 0.766529 0.543896 3.560713 \n",
+ "17 0.445115 0.766312 0.544208 3.568370 \n",
+ "18 0.444171 0.766785 0.544720 3.568370 \n",
+ "19 0.443360 0.767145 0.545059 3.568370 \n",
+ "\n",
+ " training_classification_error validation_rmse validation_logloss \\\n",
+ "0 0.778001 0.413815 0.526105 \n",
+ "1 0.236370 0.405538 0.507496 \n",
+ "2 0.228731 0.398808 0.493698 \n",
+ "3 0.228636 0.393394 0.483273 \n",
+ "4 0.210780 0.389030 0.475135 \n",
+ "5 0.245059 0.385453 0.468630 \n",
+ "6 0.243961 0.382447 0.463157 \n",
+ "7 0.243961 0.380045 0.458834 \n",
+ "8 0.247446 0.378063 0.455049 \n",
+ "9 0.235654 0.376184 0.451464 \n",
+ "10 0.235606 0.374583 0.448380 \n",
+ "11 0.231023 0.373354 0.445973 \n",
+ "12 0.229972 0.372199 0.443670 \n",
+ "13 0.234317 0.371369 0.441932 \n",
+ "14 0.235654 0.370549 0.440335 \n",
+ "15 0.233219 0.369999 0.439161 \n",
+ "16 0.229161 0.369390 0.437926 \n",
+ "17 0.231452 0.368810 0.436669 \n",
+ "18 0.229352 0.368496 0.435909 \n",
+ "19 0.226439 0.368047 0.435006 \n",
+ "\n",
+ " validation_auc validation_pr_auc validation_lift \\\n",
+ "0 0.500000 0.000000 1.000000 \n",
+ "1 0.726731 0.537125 3.444264 \n",
+ "2 0.752909 0.534588 3.422307 \n",
+ "3 0.756448 0.535692 3.422307 \n",
+ "4 0.758511 0.536095 3.422307 \n",
+ "5 0.758505 0.535659 3.422307 \n",
+ "6 0.764722 0.536039 3.422307 \n",
+ "7 0.764634 0.536411 3.422307 \n",
+ "8 0.770340 0.542043 3.457524 \n",
+ "9 0.772358 0.543522 3.457524 \n",
+ "10 0.772982 0.543893 3.457524 \n",
+ "11 0.772925 0.544553 3.460882 \n",
+ "12 0.773412 0.543195 3.460882 \n",
+ "13 0.774161 0.543632 3.448038 \n",
+ "14 0.774176 0.543202 3.448038 \n",
+ "15 0.774592 0.543709 3.448038 \n",
+ "16 0.775021 0.544851 3.424855 \n",
+ "17 0.774927 0.545957 3.442929 \n",
+ "18 0.775256 0.545586 3.442929 \n",
+ "19 0.775474 0.545922 3.442929 \n",
+ "\n",
+ " validation_classification_error \n",
+ "0 0.780649 \n",
+ "1 0.187652 \n",
+ "2 0.232825 \n",
+ "3 0.214491 \n",
+ "4 0.217915 \n",
+ "5 0.214270 \n",
+ "6 0.229843 \n",
+ "7 0.220013 \n",
+ "8 0.204330 \n",
+ "9 0.223548 \n",
+ "10 0.226309 \n",
+ "11 0.228960 \n",
+ "12 0.224542 \n",
+ "13 0.227413 \n",
+ "14 0.228076 \n",
+ "15 0.228297 \n",
+ "16 0.226751 \n",
+ "17 0.225425 \n",
+ "18 0.226530 \n",
+ "19 0.224652 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "See the whole table with table.as_data_frame()\n",
+ "\n",
+ "Variable Importances: "
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "Variable importances (HTML table omitted; see the text/plain output below)"
+ ],
+ "text/plain": [
+ " variable relative_importance scaled_importance percentage\n",
+ "0 PAY_0 2794.444824 1.000000 0.693347\n",
+ "1 PAY_2 307.237366 0.109946 0.076231\n",
+ "2 PAY_3 215.152893 0.076993 0.053383\n",
+ "3 PAY_4 155.434448 0.055623 0.038566\n",
+ "4 PAY_AMT1 127.986313 0.045800 0.031755\n",
+ "5 PAY_5 127.538628 0.045640 0.031644\n",
+ "6 PAY_6 102.351601 0.036627 0.025395\n",
+ "7 LIMIT_BAL 82.432350 0.029499 0.020453\n",
+ "8 PAY_AMT2 58.934135 0.021090 0.014623\n",
+ "9 PAY_AMT4 58.858047 0.021063 0.014604"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": []
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# load saved best model from lecture 1 \n",
+ "best_mgbm = h2o.load_model('best_mgbm')\n",
+ "\n",
+ "# display model details\n",
+ "best_mgbm"
+ ]
+ },
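+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Sketch: surrogate model extraction attack\n",
+ "A minimal sketch, assuming an attacker who can obtain predictions for rows of their choosing: pair submitted inputs with the returned predictions and fit a surrogate model that mimics `best_mgbm`, exposing its main drivers. The validation frame below simply stands in for data an attacker might simulate; this is an illustration, not a full attack implementation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# minimal sketch of a surrogate model extraction ('model stealing') attack\n",
+ "# the validation data stands in for rows an attacker would simulate and submit for scoring\n",
+ "stolen_preds = best_mgbm.predict(valid)['p1']                       # predictions returned to the attacker\n",
+ "surrogate_frame = valid[x_names].cbind(stolen_preds)                # inputs paired with received predictions\n",
+ "surrogate = H2OGradientBoostingEstimator(ntrees=50, max_depth=3, seed=SEED)\n",
+ "surrogate.train(x=x_names, y='p1', training_frame=surrogate_frame)  # surrogate approximates best_mgbm\n",
+ "print(surrogate.varimp(use_pandas=True).head())                     # surrogate reveals the model's important inputs"
+ ]
+ },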
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
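+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Sketch: disparate impact check with the held-out demographic columns\n",
+ "As noted in the data description, demographic variables can be used after training to test for disparate impact. The cell below is a rough sketch only; the 0.25 decision cutoff is a hypothetical choice for illustration."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# rough sketch of a disparate impact check by SEX on the validation data\n",
+ "# the 0.25 decision cutoff is a hypothetical choice for illustration only\n",
+ "scored = valid.cbind(best_mgbm.predict(valid)['p1'])\n",
+ "scored['decision'] = (scored['p1'] > 0.25).ifelse(1, 0)              # 1 = predicted default\n",
+ "by_sex = scored.group_by('SEX').mean(['p1', 'decision']).get_frame()\n",
+ "by_sex.show()                                                        # compare mean score and decision rate across groups"
+ ]
+ },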
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Shutdown H2O"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Are you sure you want to shutdown the H2O instance running at http://127.0.0.1:54321 (Y/N)? n\n"
+ ]
+ }
+ ],
+ "source": [
+ "# be careful, this can erase your work!\n",
+ "h2o.cluster().shutdown(prompt=True)"
+ ]
+ }
+ ],
+ "metadata": {
+ "anaconda-cloud": {},
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/tex/lecture_4.pdf b/tex/lecture_4.pdf
index 8d12b5d..26805ac 100644
Binary files a/tex/lecture_4.pdf and b/tex/lecture_4.pdf differ
diff --git a/tex/lecture_4.tex b/tex/lecture_4.tex
index 0b5217e..c9249d8 100644
--- a/tex/lecture_4.tex
+++ b/tex/lecture_4.tex
@@ -166,12 +166,11 @@
\frametitle{Backdoors and Watermarks: \textbf{What?}}
\begin{itemize}
- \item Hackers gain unauthorized access to your production scoring code OR ...
- \item Malicious or extorted data science or IT insiders change your production scoring code ...
- \end{itemize}
- \vspace{20pt}
-\hspace{10pt} ... adding a backdoor that can be exploited using water-marked data.
-
+ \Large
+ \item Hackers gain unauthorized access to your production scoring code \\ OR ...
+ \item Malicious or extorted data science or IT insiders change your production scoring code
+ \item Either way, a backdoor is added that can be exploited using watermarked data.
+ \end{itemize}
\end{frame}
\begin{frame}
@@ -208,7 +207,6 @@
\frametitle{Surrogate Model Inversion Attacks: \textbf{What?}}
Due to lax security or a distributed attack on your model API or other model endpoint, hackers or competitors simulate data, submit it, receive predictions, and train a surrogate model between their simulated data and your model predictions. This surrogate can ...
- \vspace{10pt}
\begin{itemize}
\item expose your proprietary business logic, i.e. ``model stealing'' \cite{model_stealing}.
\item reveal sensitive aspects of your training data.
@@ -279,6 +277,7 @@
\frametitle{Membership Inference Attacks: \textbf{Defenses}}
\begin{itemize}
+ \Large
\item See Slide \ref{slide:inversion_defense}.
\item \textbf{Monitor for training data}: Monitor your production scoring queue for data that closely resembles any individual used to train your model. Real-time scoring of rows that are extremely similar or identical to data used in training, validation, or testing should be recorded and investigated.
\end{itemize}
@@ -341,6 +340,7 @@
\frametitle{Impersonation Attacks: \textbf{What?}}
Bad actors learn ...
\begin{itemize}
+ \large
\item by inversion or adversarial example attacks (see Slides \ref{slide:inversion}, \ref{slide:adversary}), the attributes favored by your model and then impersonate them.
\item by disparate impact analysis (see Slide \ref{slide:data_poisoning_defense}), that your model is discriminatory (e.g. \href{https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing}{Propublica and COMPAS}, \href{https://medium.com/@Joy.Buolamwini/response-racial-and-gender-bias-in-amazon-rekognition-commercial-ai-system-for-analyzing-faces-a289222eeced}{Gendershades and Rekognition}), and impersonate your model's privileged class to receive a favorable outcome.\footnote{This presentation makes no claim on the quality of the analysis in Angwin et al. (2016), which has been criticized, but is simply stating that such cracking is possible \cite{angwin16,}, \cite{flores2016false}.}
\end{itemize}
@@ -364,6 +364,7 @@
\frametitle{Impersonation Attacks: \textbf{Defenses}}
\begin{itemize}
+ \Large
\item \textbf{Authentication}: See Slide \ref{slide:inversion_defense}.
\item \textbf{Disparate impact analysis}: See Slide \ref{slide:data_poisoning_defense}.
\item \textbf{Model monitoring}: Watch for too many similar predictions in real-time. Watch for too many similar input rows in real-time.
@@ -372,12 +373,12 @@
\end{frame}
%-------------------------------------------------------------------------------
- \section{General Concerns}
+ \section{General Concerns \& Solutions}
%-------------------------------------------------------------------------------
-
+ \subsection{General Concerns}
\begin{frame}[t, allowframebreaks]
- \frametitle{General concerns}
+ \frametitle{General Concerns}
\footnotesize
\begin{itemize}
\item \textbf{Black-box models}: Over time a motivated, malicious actor could learn more about your own black-box model than you know and use this knowledge imbalance to attack your model \cite{papernot2018marauder}.
@@ -390,10 +391,6 @@
\end{frame}
-%-------------------------------------------------------------------------------
- \section{General Solutions}
-%-------------------------------------------------------------------------------
-
%-------------------------------------------------------------------------------
\subsection{General Solutions}
%-------------------------------------------------------------------------------
@@ -401,14 +398,22 @@
\begin{frame}[t, allowframebreaks]
\frametitle{General Solutions}
- \footnotesize
+ %\footnotesize
+ \small
\begin{itemize}
\item \textbf{Authenticated access and prediction throttling}: for prediction APIs and other model endpoints.
\item \textbf{Benchmark models}: Compare complex model predictions to less complex (and hopefully less hackable) model predictions. For traditional, low signal-to-noise data mining problems, predictions should not be too different. If they are, investigate them.
\item \textbf{Encrypted, differentially private, or federated training data}: Properly implemented, these technologies can thwart many types of attacks. Improperly implemented, they simply create a broader attack surface or hinder forensic efforts.
\item \textbf{Interpretable, fair, or private models}: In addition to models like LFR and PATE, also checkout \href{https://github.com/h2oai/h2o-3/blob/master/h2o-py/demos/H2O_tutorial_gbm_monotonicity.ipynb}{monotonic GBMs}, \href{https://christophm.github.io/interpretable-ml-book/rulefit.html}{Rulefit}, \href{https://github.com/IBM/AIF360}{AIF360}, and the \href{https://users.cs.duke.edu/~cynthia/code.html}{Rudin group} at Duke.
- \framebreak
- \item \textbf{Model documentation, management, and monitoring}:
+ \end{itemize}
+ \end{frame}
+
+ \begin{frame}
+ \frametitle{General Solutions}
+ \normalsize
+ \begin{itemize}
+ %\framebreak
+ \item \textbf{Model documentation, management, and monitoring}:
\begin{itemize}
\item Take an inventory of your predictive models.
\item Document production models well-enough that a new employee can diagnose whether their current behavior is notably different from their intended behavior.
@@ -419,7 +424,6 @@
\item \textbf{System monitoring and profiling}: Use a meta anomaly detection system on your entire production modeling system’s operating statistics — e.g. number of predictions in some time period, latency, CPU, memory and disk loads, number of concurrent users, etc. — then closely monitor for anomalies.
\end{itemize}
\normalsize
-
\end{frame}
%-------------------------------------------------------------------------------