A visualization tool designed to help data scientists with their datasets

Datajacker/Data-Science-Helper

Data Science Helper

A visualization tool designed to help data scientists better examine their data sets.

Installation (it's on PyPI!)

pip install dshelper

Usage

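import pandas as pd
import dshelper
df = pd.read_csv("your_data.csv")  # hypothetical path; any pandas DataFrame works
dshelper.dshelp(df)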

Features

  • Default view showing the raw data, DataFrame info, and describe() statistics
  • Drag column headers to rearrange columns
  • Left-click in the right panel to show or hide columns
  • Various plots
  • Buttons in the bottom-right corner to hide panels and focus on the dataset
  • Dockerized, with make commands
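The three parts of the default view map onto standard pandas calls. As a rough console equivalent (a sketch assuming only pandas, not dshelper's own code), the same raw data, info, and describe output can be produced like this; the small Titanic-style frame is made up for illustration:

```python
import io

import pandas as pd

# A tiny, made-up Titanic-style frame used only for illustration.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 1],
    "Age": [22.0, 38.0, 26.0, None],
    "Sex": ["male", "female", "female", "male"],
})

# Raw data: the first rows of the frame.
print(df.head())

# DataFrame info: dtypes and non-null counts. info() prints by default,
# so capture its output in a buffer to keep it as text.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)

# Describe: count, mean, std, min, quartiles and max of the numeric columns.
summary = df.describe()
print(summary)
```

By default describe() only summarizes numeric columns, so "Sex" appears in the info output but not in the summary table.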

Plots

  • Histogram
  • Heatmap
  • Correlation
  • Scatter Plot
  • Box Plot
  • Violin Plot
  • Pair Plot
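The heatmap and correlation plots can be read as pictures of a correlation matrix. As an illustration only (assuming pandas and Pearson correlation; dshelper's own computation may differ), the matrix such a heatmap would color can be computed like this:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],      # perfectly correlated with x
    "z": [4.0, 3.0, 2.0, 1.0],      # perfectly anti-correlated with x
    "label": ["a", "b", "c", "d"],  # non-numeric, excluded below
})

# Keep only numeric columns, then compute pairwise Pearson correlation.
corr = df.select_dtypes(include="number").corr(method="pearson")
print(corr.round(2))

# With seaborn installed, seaborn.heatmap(corr, annot=True) would draw
# this matrix as a heatmap.
```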

Demo with Titanic data

In the default view, the main panel displays the dataset and the bottom panel displays its statistics. The right panel has two tabs: the first displays statistics for every column, and the second displays the system logs.

Main View
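The exact statistics shown in the right panel are not listed here. As an assumption about the kind of per-column summary it contains, a similar table can be built in pandas with a hypothetical `column_stats` helper (not part of dshelper):

```python
import pandas as pd


def column_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column of df with basic statistics."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })


df = pd.DataFrame({
    "Age": [22.0, None, 26.0],
    "Sex": ["male", "female", "female"],
})
stats = column_stats(df)
print(stats)
```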

The bottom and right panels can be hidden by clicking the buttons in the bottom-right corner of the window, letting data scientists focus on the dataset and plots.

Focus View

You can also drag and drop column headers to change the column order, and click entries in the column tab of the right panel to hide those columns in the main view.

Hide View
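In pandas terms, rearranging and hiding columns correspond to selecting columns in a new order and dropping some of them from a view. The sketch below is only an analogy with made-up data, not how the GUI is implemented; note that neither step modifies the original frame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B"], "Age": [22, 38], "Fare": [7.25, 71.28]})

# Rearrange: select the columns in the desired order (returns a new frame).
reordered = df[["Age", "Name", "Fare"]]

# Hide: drop columns from the view; df itself keeps all its columns.
visible = reordered.drop(columns=["Fare"])
print(visible)
```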

Below are a few example plots:

Screenshots: Histogram; Heatmap; Heatmap with correlation plot; Scatter plot; Box and violin plots; Pair plots

Dependencies

  • wxpython
  • matplotlib
  • seaborn
  • pandas
  • numpy
  • scikit-learn
  • scipy
  • statsmodels

How to run locally

  • git clone git@github.com:zmcddn/Data-Science-Helper.git
  • conda create -n py36 python=3.6 (or use virtualenv or pipenv)
  • activate py36 (Windows) or source activate py36 (macOS, Linux)
  • conda install --yes --file requirements.txt or pip install -r requirements.txt
  • If PyPubSub is not installed by conda, install it with pip install PyPubSub
  • cd dshelper
  • python dshelper.py (Windows, Linux) or pythonw dshelper.py (macOS)
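Before launching, it can help to confirm that every dependency is importable. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper, not part of the project, and the mapping from package names to import names (for example `wx` for wxPython, `sklearn` for scikit-learn, `pubsub` for PyPubSub) is an assumption based on the usual import names:

```python
import importlib.util

# Assumed import names for the packages listed under Dependencies (plus PyPubSub).
# Several import names differ from the pip/conda package names.
REQUIRED = {
    "wxpython": "wx",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "statsmodels": "statsmodels",
    "PyPubSub": "pubsub",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import name cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

find_spec only locates modules without importing them, so the check stays fast and does not open any wxPython windows.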

To explore any DataFrame, follow these steps:

  • import dshelper
  • dshelper.dshelp(df)

Run with Docker

  • make build to build the project
  • make runlinux to run in Linux
  • macOS support is a work in progress

To-do

  • Next version
    • Sort by columns
    • Import files (CSV, Excel)
    • Add a menu
    • Export files
    • Ability to edit cells
    • Standalone version
  • Next major version
    • Correlation analysis
    • Feature importance
    • Support for large files (sampling)
  • The major version after that
    • Support for multi-level indexes (MultiIndex)
    • Time series analysis
    • Optimization
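Sampling for large files is only planned. One possible approach, shown as a sketch assuming pandas and a hypothetical `sample_for_display` helper with an arbitrary 10,000-row cap, is to display a reproducible random sample instead of the full frame:

```python
import pandas as pd


def sample_for_display(df, max_rows=10_000, seed=0):
    """Hypothetical helper: return df unchanged if small, else a random sample.

    max_rows and seed are arbitrary illustrative defaults, not project settings.
    """
    if len(df) <= max_rows:
        return df
    # random_state makes the sample repeatable across runs;
    # sort_index restores the original row order of the sampled rows.
    return df.sample(n=max_rows, random_state=seed).sort_index()


big = pd.DataFrame({"value": range(50_000)})
view = sample_for_display(big)
print(len(view))
```

A fixed seed keeps the displayed rows stable between sessions, which makes plots of the sample comparable.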

If you like this project, please share it and star it so more people can see it. Suggestions and contributions are very welcome.

ALL RIGHTS RESERVED

Languages

  • Python 98.6%
  • Dockerfile 1.1%
  • Makefile 0.3%