Version 0.3.0

@paulbkoch paulbkoch released this 19 Nov 20:49
· 1426 commits to develop since this release

[v0.3.0] - 2022-11-16

Added

  • Full-complexity EBMs with higher-order interactions are now supported: GA3M, GA4M, GA5M, etc.
    3-way and higher-order interactions lose exact global interpretability, but retain exact local explanations.
    Higher-order interactions must be specified explicitly; automatic FAST detection is not yet available for them
  • Mac M1 support
  • support for ordinals
  • merge_ebms now supports merging models with interactions, including higher-level interactions
  • added classic composition option during Differentially Private binning
  • support for different kinds of feature importances (avg_weight, min_max)
  • exposed interaction detection API (FAST algorithm)
  • API to calculate and show the importances of groups of features and terms.
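
The two importance types named above can be illustrated with a small standalone sketch. It assumes that avg_weight means the mean absolute term score weighted by bin weights, and that min_max means the spread between the largest and smallest term score. The helper names and the sample numbers are hypothetical and are not part of the interpret API.

```python
# Standalone sketch: does not import interpret. The definitions below are
# assumptions about what "avg_weight" and "min_max" mean for a single term.

def avg_weight_importance(scores, weights):
    """Mean absolute score of a term, weighted by the weight of each bin."""
    total = sum(weights)
    return sum(abs(s) * w for s, w in zip(scores, weights)) / total


def min_max_importance(scores):
    """Spread between the largest and smallest score of a term."""
    return max(scores) - min(scores)


# Hypothetical term with three bins.
scores = [-0.5, 0.0, 1.5]
weights = [10.0, 30.0, 10.0]

print(avg_weight_importance(scores, weights))
print(min_max_importance(scores))
```

Under these assumptions, a term that is large only in a rarely populated bin ranks high on min_max but low on avg_weight, which is one reason to offer both.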

Changed

  • memory efficiency: About 20x less memory is required during fitting
  • predict-time speed improvements: about 50x faster for pandas CategoricalDtype,
    and varying levels of improvement for other data types
  • handling of the differential privacy DPOther bin, and non-DP unknowns has been unified by having a universal unknown bin
  • bin weights have been changed from per-feature to per-term and are now multi-dimensional
  • improved scikit-learn compliance: We now conform to the scikit-learn 1.0 feature names API by using
    self.feature_names_in_ for the X column names and self.n_features_in_.
    We use the matching self.feature_types_in_ for feature types, and self.term_names_ for the additive term names.

Fixed

  • merge_ebms now distributes bin weights proportionally according to volume when splitting bins
  • DP-EBMs now use sample weights instead of bin counts, which preserves privacy budget
  • improved scikit-learn compliance: The following init attributes are no longer overwritten
    during calls to fit: self.interactions, self.feature_names, self.feature_types
  • better handling of floating point overflows when calculating gain and validation metrics
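
As a rough illustration of the proportional weight distribution mentioned for merge_ebms, the sketch below splits the weight of a single one-dimensional bin across sub-bins according to their width. It assumes that "volume" reduces to interval length in the 1-D case and that the bin has finite edges. The function name is hypothetical, and this is not the library's implementation.

```python
# Standalone sketch: does not import interpret. split_bin_weight is a
# hypothetical helper for the 1-D case with finite bin edges.

def split_bin_weight(weight, low, high, new_cuts):
    """Split one bin's weight across sub-bins in proportion to their width."""
    # Keep only cuts that fall strictly inside the original bin.
    inner = sorted(c for c in new_cuts if low < c < high)
    edges = [low] + inner + [high]
    width = high - low
    return [weight * (b - a) / width for a, b in zip(edges, edges[1:])]


# A bin covering [0, 10) with weight 100 is split at 2 and 5.
parts = split_bin_weight(100.0, 0.0, 10.0, [2.0, 5.0])
print(parts)
```

The total weight is preserved by construction, since the sub-bin widths sum to the width of the original bin.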

Breaking Changes

  • EBMUtils.merge_models function has been renamed to merge_ebms
  • renamed binning type 'quantile_humanized' to 'rounded_quantile'
  • feature type 'categorical' has been specialized into separate 'nominal' and 'ordinal' types
  • EBM models have changed public attributes:
    • feature_groups_ -> term_features_
      global_selector -> n_samples_, unique_val_counts_, and zero_val_counts_
      domain_size_ -> min_target_, max_target_
      additive_terms_ -> term_scores_
      bagged_models_ -> BaseCoreEBM has been deprecated and the only useful attribute has been moved
                        into the main EBM class (bagged_models_.model_ -> bagged_scores_)
      feature_importances_ -> has been changed into the function term_importances(), which can now also 
                              generate different types of importances
      preprocessor_ & pair_preprocessor_ -> attributes have been moved into the main EBM model class (details below)
      
  • EBMPreprocessor attributes have been moved to the main EBM model class
    • col_names_ -> feature_names_in_
      col_types_ -> feature_types_in_
      col_min_ -> feature_bounds_
      col_max_ -> feature_bounds_
      col_bin_edges_ -> bins_
      col_mapping_ -> bins_
      hist_counts_ -> histogram_counts_
      hist_edges_ -> histogram_edges_
      col_bin_counts_ -> bin_weights_ (and is now a per-term tensor)
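
When migrating code written against earlier versions, the one-to-one renames listed above can be captured in a lookup table. The sketch below is a hypothetical helper, not something shipped with interpret. It covers only attributes that map to a single new name, so entries such as global_selector, domain_size_ and feature_importances_ are left out and need manual attention.

```python
# Standalone sketch: the mapping is copied from the rename lists above.
# new_attr_name and find_stale_names are hypothetical helpers.

RENAMED_ATTRS = {
    "feature_groups_": "term_features_",
    "additive_terms_": "term_scores_",
    "col_names_": "feature_names_in_",
    "col_types_": "feature_types_in_",
    "col_min_": "feature_bounds_",
    "col_max_": "feature_bounds_",
    "col_bin_edges_": "bins_",
    "col_mapping_": "bins_",
    "hist_counts_": "histogram_counts_",
    "hist_edges_": "histogram_edges_",
    "col_bin_counts_": "bin_weights_",
}


def new_attr_name(old_name):
    """Return the v0.3.0 name for an attribute, or the name unchanged."""
    return RENAMED_ATTRS.get(old_name, old_name)


def find_stale_names(source_text):
    """List pre-0.3.0 attribute names that appear in a piece of source text."""
    return sorted(name for name in RENAMED_ATTRS if name in source_text)


print(new_attr_name("additive_terms_"))
print(find_stale_names("ebm.additive_terms_[0]; pre.col_names_"))
```

A rename alone is not always enough: several attributes also changed shape or meaning (for example, bin_weights_ is now a per-term tensor), so each hit should be reviewed rather than replaced mechanically.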