THE helper package for Quick Visualization that you need
qv_helper is a newly designed package to facilitate data visualization. As researchers or analysts, we often need to perform preliminary visualizations before analysis. While the plots may show some seemingly promising effects, a statistical test may reveal otherwise.
Sometimes we want to look at both the plots and the statistical test results. As far as I know, there is currently no single package or function that performs both visualization and tests simultaneously. Having statistical test results generated automatically streamlines exploratory data analysis (EDA) and helps researchers quickly grasp a better sense of the data, with statistics supplementing the plots. This is why I wanted to build qv_helper. With qv_helper, visualizations and statistical tests are no longer separate processes: both are produced with a single line of code.
Because the package is designed for EDA, it is best used in notebook documents rather than in the terminal.
Installation is easy, as the package is published on PyPI.
$ pip install qv_helper
To illustrate the functionalities, the Palmer penguins dataset will be used.
qv_groups takes 1 numeric variable and 1 categorical variable and builds a histogram of the numeric variable and a boxplot of the numeric variable grouped by the categorical variable. When the grouping variable has more than 2 classes, a one-way ANOVA test is performed.
from palmerpenguins import load_penguins
df = load_penguins()
from qv_helper.qv_helper import qv_groups
qv_groups(value='bill_length_mm', group='species', data=df, title='Bill Length in Different Species', xlabel='Bill length (mm)', ylabel='Species')
Null values are dropped in statistical tests.
Test                F       p
-------------  ------  ------
One-way ANOVA  397.30  0.0000
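To make the F column concrete, here is a minimal sketch of how a one-way ANOVA F statistic can be computed by hand. It uses only the standard library and toy numbers; the function name `one_way_anova_f` is hypothetical and this is not necessarily how qv_helper computes its result. The p-value is omitted because it needs the F distribution, which the standard library does not provide.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    Hypothetical helper for illustration; not part of qv_helper.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: three groups of three observations each.
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(samples)
print(f"F = {f_stat:.2f}, df = ({df_b}, {df_w})")
```

A large F means the group means differ by more than the spread within the groups would suggest.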
When there are only 2 groups, t-tests will be performed automatically.
qv_groups(value='bill_length_mm', group='sex', data=df, title='Bill Length in 2 Sex', xlabel='Bill length (mm)', ylabel='Sex')
Null values are dropped in statistical tests.
Test                       t       p
----------------------  ----  ------
Equal var. assumed      0.00  1.0000
Equal var. not assumed  0.00  1.0000
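The two rows correspond to the pooled-variance (Student) and unequal-variance (Welch) forms of the two-sample t statistic. The sketch below shows how the two differ, using only the standard library and toy data. The function name `t_statistics` is hypothetical, and the code is an illustration of the formulas rather than qv_helper's own implementation.

```python
from math import sqrt
from statistics import fmean, variance


def t_statistics(a, b):
    """Return (t_equal_var, t_unequal_var) for two numeric samples.

    Hypothetical helper for illustration; not part of qv_helper.
    """
    na, nb = len(a), len(b)
    mean_diff = fmean(a) - fmean(b)
    var_a, var_b = variance(a), variance(b)  # sample variances (n - 1)

    # Student's t: both groups share one pooled variance estimate.
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t_equal = mean_diff / sqrt(pooled * (1 / na + 1 / nb))

    # Welch's t: each group keeps its own variance estimate.
    t_unequal = mean_diff / sqrt(var_a / na + var_b / nb)
    return t_equal, t_unequal


# With equal sizes and equal variances the two statistics coincide.
t_eq, t_uneq = t_statistics([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"{t_eq:.4f} {t_uneq:.4f}")

# With unequal sizes and variances they diverge.
t_eq2, t_uneq2 = t_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(f"{t_eq2:.4f} {t_uneq2:.4f}")
```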
qv_scatter takes 2 numeric variables as arguments and plots the corresponding scatter plot. Two correlation statistics, Pearson's r and Spearman's r, are printed so that users can pick the one that suits their needs.
qv_scatter(valuex='bill_length_mm', valuey='bill_depth_mm', data=df, title='Relationship between Bill Length and Bill Depth', xlabel='Bill Length (mm)', ylabel='Bill Depth (mm)')
Null values are dropped in statistical tests.
Test                r       p
------------  -------  ------
Pearson's r   -0.2286  0.0000
Spearman's r  -0.2139  0.0001
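For readers unsure which of the two coefficients to read, the following sketch shows how they relate: Spearman's coefficient is Pearson's coefficient applied to ranks. It uses only the standard library and toy data. The helper names are hypothetical and the code is not taken from qv_helper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship (toy data).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)
print(f"Pearson {r_pearson:.4f}, Spearman {r_spearman:.4f}")
```

In this toy case Spearman's coefficient is essentially 1 because the ordering is perfectly preserved, while Pearson's is slightly lower because the relationship is not a straight line.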
qv_2cat takes 2 categorical variables as arguments and plots the corresponding heatmap and a stacked bar chart illustrating the proportion of each class of groupx within groupy. When both categorical variables have exactly 2 classes, Barnard's exact test and Fisher's exact test are also performed.
qv_2cat(groupx='species', groupy='island', data=df, title_heatmap='Count of each Species on each Island',
title_bar='Proportion of each Species on each Island', xlabel='Species', ylabel='Island')
Test              Test statistic      Value    df       p
----------------  ----------------  -------  ----  ------
Chi-squared test  Chi-squared        299.55     4  0.0000
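As a rough illustration of where the Value and df columns come from, the sketch below computes a Pearson chi-squared statistic from a table of counts. It uses only the standard library and a toy table, applies no continuity correction, and the function name `chi_squared` is hypothetical. It is not presented as qv_helper's implementation.

```python
def chi_squared(table):
    """Return (statistic, df) for a contingency table given as rows of counts.

    Hypothetical helper for illustration; no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Toy 2x2 table of counts.
stat, df = chi_squared([[10, 20], [20, 10]])
print(f"chi-squared = {stat:.2f}, df = {df}")
```

The degrees of freedom are (rows - 1) x (columns - 1), so a table with 3 classes on each side gives df = 4, as in the output above.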
qv_count takes 1 categorical variable as argument and plots a bar chart. The count of each class is also printed, along with the number of null values.
qv_count(value='species', data=df, title='Count of each Species', label='Species')
Group        Count
---------  -------
Adelie         152
Gentoo         124
Chinstrap       68
NA               0
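A table of this shape can be reproduced with a few lines of standard-library Python. The sketch below is only an illustration on toy data: the function name `count_with_na` is hypothetical, and treating `None` as the null marker is an assumption made for the example.

```python
from collections import Counter


def count_with_na(values):
    """Return (group, count) rows, most frequent first, with NA listed last.

    Hypothetical helper for illustration; None stands in for a null value.
    """
    counts = Counter(v for v in values if v is not None)
    rows = counts.most_common()
    rows.append(("NA", sum(v is None for v in values)))
    return rows


# Toy data with one missing entry.
rows = count_with_na(["Adelie", "Gentoo", "Adelie", None])
for group, count in rows:
    print(f"{group:<10}{count:>5}")
```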
qv_dist takes 1 numeric variable as argument and plots a histogram. Summary statistics are printed as well.
qv_dist(value='bill_length_mm', data=df, title='Distribution of Bill Length', label='Bill Length (mm)')
Null values are dropped in the chart and statistics.
Statistics      Value
------------  -------
Mean            43.99
Variance        29.82
Sample size    333.00
# of NAs         0.00
Skewness         0.05
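The statistics in this table can be sketched in plain Python as follows. This is an illustration under stated assumptions, not qv_helper's code: the function name `summarize` is hypothetical, `None` stands in for a null value, the variance is the sample variance (dividing by n - 1), and the skewness is the moment-based coefficient m3 / m2^1.5. The package may use different conventions.

```python
def summarize(values):
    """Summary statistics for a list in which None marks a missing value.

    Hypothetical helper for illustration. Needs at least two non-missing
    values that are not all identical.
    """
    clean = [v for v in values if v is not None]
    n = len(clean)
    mean = sum(clean) / n

    # Sample variance (assumed convention: divide by n - 1).
    variance = sum((v - mean) ** 2 for v in clean) / (n - 1)

    # Moment-based skewness (assumed convention: m3 / m2 ** 1.5).
    m2 = sum((v - mean) ** 2 for v in clean) / n
    m3 = sum((v - mean) ** 3 for v in clean) / n
    skewness = m3 / m2 ** 1.5

    return {
        "Mean": mean,
        "Variance": variance,
        "Sample size": n,
        "# of NAs": len(values) - n,
        "Skewness": skewness,
    }


# Toy data: one missing value and one large value on the right.
summary = summarize([1, 2, 3, 4, 10, None])
for name, value in summary.items():
    print(f"{name:<12}{value:>8.2f}")
```

A positive skewness, as in this toy sample, indicates a longer tail on the right side of the histogram.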
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
qv_helper was created by Morris Chan. It is licensed under the terms of the MIT license.
qv_helper was created with cookiecutter and the py-pkgs-cookiecutter template.
Special thanks go to @MNBhat, @Lorraine97, and @austin-shih, who completed an academic project, prelim_eda_helper, with me. Quick Visualization Helper is inspired by prelim_eda_helper. All authors of prelim_eda_helper have agreed to the development of the current project.