`Splitset.plot_stratification()` #107

aiqc · 2022-05-08T15:47:25Z

Problem

We put all of this effort into stratifying [and balancing] samples, but we have no way to visualize that for end users.

This post demonstrates visualizing the distribution of each split: https://towardsdatascience.com/straightforward-stratification-bb0dcfcaf9ef
The sample indices are in `splitset.samples`. You can use those to fetch the label classes with `Label.to_numpy(sample_indices)` and `Label.to_pandas(sample_indices)`.
Don't worry about folds for now.
Reference the lines surrounding `np.issubdtype` related to `bin_count` for help determining whether labels are categorical or continuous.
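
A rough sketch of the idea, not AIQC's actual implementation: the helper names `split_distributions` and `plot_stratification`, the `labels_by_split` dict, and the default `bin_count=5` are all hypothetical. It assumes each split's labels have already been fetched into a 1D array (for example via `Label.to_numpy(sample_indices)`), that every split is non-empty, and that float dtypes mean continuous labels while everything else is treated as categorical. Folds are ignored.

```python
import numpy as np


def split_distributions(labels_by_split, bin_count=5):
    """Summarize the label distribution of each split as proportions.

    labels_by_split: hypothetical dict mapping a split name to a 1D array
    of labels. Returns (categories, {split_name: proportions}).
    """
    arrays = {name: np.asarray(vals).ravel() for name, vals in labels_by_split.items()}
    all_labels = np.concatenate(list(arrays.values()))

    if np.issubdtype(all_labels.dtype, np.floating):
        # Assumption: float labels are continuous, so bin them.
        # The bin edges are shared across splits so the bars are comparable.
        edges = np.histogram_bin_edges(all_labels, bins=bin_count)
        categories = [f"{lo:.2f} to {hi:.2f}" for lo, hi in zip(edges[:-1], edges[1:])]
        counts = {name: np.histogram(arr, bins=edges)[0] for name, arr in arrays.items()}
    else:
        # Assumption: any non-float dtype (ints, strings) is categorical.
        uniques = np.unique(all_labels)
        categories = uniques.tolist()
        counts = {
            name: np.array([(arr == u).sum() for u in uniques])
            for name, arr in arrays.items()
        }

    # Convert counts to proportions so splits of different sizes line up.
    proportions = {name: c / c.sum() for name, c in counts.items()}
    return categories, proportions


def plot_stratification(categories, proportions):
    """Draw one group of bars per category, one bar per split."""
    import matplotlib.pyplot as plt  # lazy import: the summary works without it

    fig, ax = plt.subplots()
    width = 0.8 / len(proportions)
    positions = np.arange(len(categories))
    for i, (name, props) in enumerate(proportions.items()):
        ax.bar(positions + i * width, props, width=width, label=name)
    # Center the tick under each group of bars.
    ax.set_xticks(positions + width * (len(proportions) - 1) / 2)
    ax.set_xticklabels([str(c) for c in categories])
    ax.set_ylabel("Proportion of split")
    ax.legend()
    return fig
```

If stratification worked, the bars within each group should be close to the same height across splits. Proportions are used rather than raw counts because the splits differ in size.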

aiqc self-assigned this May 8, 2022

aiqc changed the title ~~Visualize stratification~~ Splitset.plot_stratification() May 8, 2022