Splitset.plot_stratification() #107

Open
aiqc opened this issue May 8, 2022 · 0 comments

aiqc commented May 8, 2022

Problem

We put all of this effort into stratifying [and balancing] samples, but we have no way to visualize that for end users.

Solution

  • This post demonstrates visualizing the distribution of each split: https://towardsdatascience.com/straightforward-stratification-bb0dcfcaf9ef
  • The sample indices are in splitset.samples. Use them to fetch the label classes with Label.to_numpy(sample_indices) or Label.to_pandas(sample_indices).
  • Don't worry about folds for now.
  • To determine whether the labels are categorical or continuous, reference the lines surrounding np.issubdtype that relate to bin_count.
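
A rough sketch of what this could look like, not tied to the actual aiqc internals. It assumes splitset.samples behaves like a dict mapping split names to lists of sample indices, and that the label array is what Label.to_numpy would return for all samples. The helper names split_distributions and plot_stratification are hypothetical, and so is treating any float dtype as continuous. Continuous labels are binned on edges shared by every split so that the bars are comparable.

```python
import numpy as np


def split_distributions(labels, samples, bin_count=5):
    """Return (categories, {split_name: proportions}) for each split.

    labels:  1D array of every label (stand-in for Label.to_numpy output).
    samples: dict of split name -> sample indices (assumed shape of
             splitset.samples).
    """
    labels = np.asarray(labels).ravel()
    if np.issubdtype(labels.dtype, np.floating):
        # Assumption: float labels are continuous, so bin them on edges
        # computed from all labels and shared across the splits.
        edges = np.histogram_bin_edges(labels, bins=bin_count)
        # Inner edges only, so codes fall in 0..bin_count-1.
        codes = np.digitize(labels, edges[1:-1])
        categories = [
            f"{lo:.2f} to {hi:.2f}" for lo, hi in zip(edges[:-1], edges[1:])
        ]
    else:
        # Categorical: one code per distinct class.
        uniques, codes = np.unique(labels, return_inverse=True)
        categories = [str(u) for u in uniques]

    distributions = {}
    for split_name, indices in samples.items():
        counts = np.bincount(
            codes[np.asarray(indices)], minlength=len(categories)
        )
        distributions[split_name] = counts / counts.sum()
    return categories, distributions


def plot_stratification(categories, distributions):
    """Grouped bar chart: one bar per split within each class or bin."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot

    positions = np.arange(len(categories))
    width = 0.8 / len(distributions)
    fig, ax = plt.subplots()
    for i, (split_name, proportions) in enumerate(distributions.items()):
        ax.bar(positions + i * width, proportions, width=width, label=split_name)
    ax.set_xticks(positions + width * (len(distributions) - 1) / 2)
    ax.set_xticklabels(categories)
    ax.set_ylabel("Proportion of split")
    ax.legend()
    return fig


if __name__ == "__main__":
    # Made-up data for illustration only.
    demo_labels = np.array(["a"] * 6 + ["b"] * 4)
    demo_samples = {"train": [0, 1, 2, 6, 7], "test": [3, 4, 5, 8, 9]}
    cats, dists = split_distributions(demo_labels, demo_samples)
    plot_stratification(cats, dists).savefig("stratification.png")
```

If stratification worked, the bars within each class or bin should be close to the same height across splits. Plotting proportions rather than raw counts keeps a small test split visually comparable to a large train split.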
aiqc self-assigned this May 8, 2022
aiqc changed the title from "Visualize stratification" to "Splitset.plot_stratification()" May 8, 2022