I encountered situations where the returned counterfactuals do not have the desired class. It happens only occasionally, so I had to try different seeds to get a reproducible example. I boiled it down to a simple example based on the getting started notebook.
This is the output the code produces:
Query instance (original outcome : 0)
   age  workclass  education  marital_status    occupation   race  gender  hours_per_week  income
0   32    Private    HS-grad         Married  White-Collar  White    Male              60       0
Diverse Counterfactual set (new outcome: 1)
   age  workclass  education  marital_status    occupation   race  gender  hours_per_week  income
0   61    Private    HS-grad         Married  Professional  White    Male              60       0
1   32    Private  Bachelors         Married  White-Collar  White    Male              60       1
The code to reproduce:
# Sklearn imports
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
# DiCE imports
import dice_ml
from dice_ml.utils import helpers # helper functions
dataset = helpers.load_adult_income_dataset()
dataset = dataset.sample(1000, random_state=1)
y_train = dataset["income"]
x_train = dataset.drop('income', axis=1)
# Step 1: dice_ml.Data
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')
numerical = ["age", "hours_per_week"]
categorical = x_train.columns.difference(numerical)
# We create the preprocessing pipelines for both numeric and categorical data.
numeric_transformer = Pipeline(steps=[("scaler", StandardScaler())])
categorical_transformer = Pipeline(steps=[("onehot", OneHotEncoder(handle_unknown="ignore"))])
transformations = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numerical),
        ("cat", categorical_transformer, categorical),
    ]
)
# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(
    steps=[("preprocessor", transformations), ("classifier", RandomForestClassifier(random_state=1))]
)
model = clf.fit(x_train, y_train)
# Using sklearn backend
m = dice_ml.Model(model=model, backend="sklearn")
# Using method=random for generating CFs
exp = dice_ml.Dice(d, m, method="random")
e1 = exp.generate_counterfactuals(x_train[4:5], total_CFs=2, desired_class="opposite", random_seed=6)
e1.visualize_as_dataframe()
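The mismatch can also be detected programmatically instead of by reading the printed table. The following is a minimal sketch, assuming the counterfactuals are available as a pandas DataFrame with the outcome column included (in DiCE this would be `e1.cf_examples_list[0].final_cfs_df`). The helper name `wrong_class_rows` is hypothetical, and the small DataFrame is a stand-in that only copies a few columns from the output above.

```python
import pandas as pd


def wrong_class_rows(cfs: pd.DataFrame, outcome_name: str, desired_class) -> pd.DataFrame:
    """Return the counterfactual rows whose outcome differs from desired_class.

    Hypothetical helper, not part of DiCE.
    """
    return cfs[cfs[outcome_name] != desired_class]


# Stand-in for e1.cf_examples_list[0].final_cfs_df, built from the two
# counterfactual rows shown in the output above (subset of columns).
cfs = pd.DataFrame(
    {
        "age": [61, 32],
        "education": ["HS-grad", "Bachelors"],
        "occupation": ["Professional", "White-Collar"],
        "income": [0, 1],
    }
)

# The query instance has outcome 0, so the desired ("opposite") class is 1.
bad = wrong_class_rows(cfs, outcome_name="income", desired_class=1)
print(bad)
```

An empty result would mean every returned counterfactual has the desired class. Here the first row is flagged.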
I tested further, and it also happens for method="genetic". It is a bit harder to catch there because random_seed = ... has no effect for methods other than random (which, by the way, is not documented either, so I consider this a bug too). The genetic method still has some randomness, though, so to find occurrences of this bug I run generate_counterfactuals repeatedly until the bug occurs once:
# Sklearn imports
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
# DiCE imports
import dice_ml
from dice_ml.utils import helpers # helper functions
dataset = helpers.load_adult_income_dataset()
dataset = dataset.sample(1000, random_state=1)
y_train = dataset["income"]
x_train = dataset.drop('income', axis=1)
# Step 1: dice_ml.Data
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')
numerical = ["age", "hours_per_week"]
categorical = x_train.columns.difference(numerical)
# We create the preprocessing pipelines for both numeric and categorical data.
numeric_transformer = Pipeline(steps=[("scaler", StandardScaler())])
categorical_transformer = Pipeline(steps=[("onehot", OneHotEncoder(handle_unknown="ignore"))])
transformations = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numerical),
        ("cat", categorical_transformer, categorical),
    ]
)
# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(
    steps=[("preprocessor", transformations), ("classifier", RandomForestClassifier(random_state=1))]
)
model = clf.fit(x_train, y_train)
# Using sklearn backend
m = dice_ml.Model(model=model, backend="sklearn")
# Using method=genetic for generating CFs
exp = dice_ml.Dice(d, m, method="genetic")
for i in range(1000):
    e1 = exp.generate_counterfactuals(x_train[4:5], total_CFs=10, desired_class="opposite")
    print(i)
    # More than one distinct outcome means at least one counterfactual has the wrong class.
    if e1.cf_examples_list[0].final_cfs_df["income"].nunique() > 1:
        e1.visualize_as_dataframe()
        break
If you run this script, it will eventually return a set of counterfactuals in which at least one counterfactual has the wrong class.
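Since random_seed has no effect for method="genetic", one possible way to make a failing iteration replayable is to seed the global random number generators before each call. This is only a sketch: it assumes the genetic method draws from Python's `random` module and NumPy's global generator, which is an assumption about DiCE's internals and not something confirmed here. The helper name `seed_global_rngs` is hypothetical.

```python
import random

import numpy as np


def seed_global_rngs(seed: int) -> None:
    """Seed Python's and NumPy's global random number generators.

    Hypothetical helper; whether this makes DiCE's genetic method
    deterministic depends on which generators it uses internally.
    """
    random.seed(seed)
    np.random.seed(seed)


# Re-seeding with the same value replays the same draws.
seed_global_rngs(6)
first = (random.random(), float(np.random.rand()))
seed_global_rngs(6)
second = (random.random(), float(np.random.rand()))
print(first == second)
```

If the assumption holds, calling `seed_global_rngs(i)` at the top of each loop iteration in the script above would let the iteration that triggers the bug be rerun directly with the same value of `i`.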