
💥 📈 A curated list of data science, analysis and visualization tools


quantmind/awesome-data-science-viz


Data Science & Visualization Awesome

A curated list of data science, machine learning and visualization tools with emphasis on python, d3 and web applications.

CONTRIBUTING

Contents

Machine Learning

Resources

Frameworks

  • Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
  • TensorFlow library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
  • Keras Deep Learning library for Theano, TensorFlow and CNTK.
  • Caffe deep learning framework made with expression, speed, and modularity in mind. Written in C++ and has python bindings.
  • Torch provides several tools for fast tensor mathematics, storage interfaces and machine learning models. Written in C with Lua interface.
  • PyTorch tensors and dynamic neural networks in Python with strong GPU acceleration
  • Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. Written in C++ with bindings for python and other languages.
  • Scikit Learn is a Python module for machine learning built on top of SciPy
  • CNTK computational network toolkit. A C++ library by Microsoft Research.
  • XGBoost an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. Written in C++ with python integration.
  • Tpot is a python tool that automatically creates and optimizes machine learning pipelines using genetic programming.
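The data flow graph idea mentioned in the TensorFlow entry can be illustrated with a minimal sketch in plain Python. This is not TensorFlow's API: the `Node` class and `constant` helper are hypothetical names invented for the illustration, and scalars stand in for tensors.

```python
import operator


class Node:
    """A graph node: an operation applied to the outputs of its input nodes."""

    def __init__(self, op, *inputs):
        self.op = op          # callable representing the mathematical operation
        self.inputs = inputs  # edges: upstream nodes whose values flow into this one

    def evaluate(self):
        # Evaluate the upstream nodes first, then apply this node's operation.
        return self.op(*(node.evaluate() for node in self.inputs))


def constant(value):
    """Hypothetical helper: a node with no inputs that always yields `value`."""
    return Node(lambda: value)


# Build the graph for (x + y) * z, then run it.
x, y, z = constant(2.0), constant(3.0), constant(4.0)
total = Node(operator.add, x, y)
result = Node(operator.mul, total, z)
print(result.evaluate())
```

Building the graph and evaluating it are separate steps here, which is the property that lets a real framework optimize or distribute the computation before running it.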

Neural networks

  • Brainforge A Neural Networking library based on NumPy only
  • deeplearn.js a neural network library for the web
  • OpenNN a neural network C++ library

Reinforcement Learning

  • Keras-rl Deep Reinforcement Learning for Keras.
  • Gym A toolkit for developing and comparing reinforcement learning algorithms. Written in Python.
  • TFLearn is a deep learning library featuring a higher-level API for TensorFlow.
  • Tensorforce a TensorFlow library for applied reinforcement learning

Examples

NLP

Natural language processing benefits from recurrent neural network algorithms.

Analysis

  • huggingface/transformers State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
  • Natural Language Toolkit (NLTK) is a suite of python modules, data sets and tutorials supporting research and development in NLP. Some of its modules are out of date, but it is still a useful resource.
  • SpaCy is a powerful, production-ready NLP library for python
  • fastText a C++ library for sentence classification
  • TextBlob is a python library for processing textual data. It provides a simple API for diving into common NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
  • simhash a python implementation of Simhash Algorithm for detecting near-duplicate web documents
  • langdetect is a port of Google's language-detection library to Python.
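As a rough illustration of the near-duplicate detection that the simhash entry refers to, here is a minimal sketch of the Simhash idea using only the standard library. It is not the API of the listed package: the function names, whitespace tokenization, MD5 token hashing and fixed 64-bit fingerprint are all assumptions made for the example.

```python
import hashlib

BITS = 64  # fingerprint width (an assumption for this sketch)


def simhash(text):
    """Return a 64-bit fingerprint; similar texts tend to share most bits."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumped over the lazy dog"
print(hamming_distance(simhash(doc_a), simhash(doc_b)))
```

Two documents are typically treated as near-duplicates when the Hamming distance between their fingerprints falls below a small threshold; the right threshold depends on the corpus and has to be tuned.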

Tools

  • inflect.py Correctly generate plurals, ordinals, indefinite articles; convert numbers to words
  • dataprofiler The DataProfiler is a Python library designed to make data analysis, monitoring and sensitive data detection easy. NLP processing is accomplished using a character-level CNN.

Resources

Images

Resources

  • Convolutional neural network In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that have been successfully applied to analyzing visual imagery.

Frameworks

  • tesseract-ocr well tested OCR engine written in C++
  • OpenCV computer vision and machine learning software library. The library has more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc. Written in C++ with bindings for most languages including python.
  • SimpleCV is a framework for machine vision, using OpenCV and Python. It provides a concise, readable interface for cameras, image manipulation, feature extraction, and format conversion.
  • match makes it easy to search for images that look similar to each other
  • Noteshrink Convert scans of handwritten notes to beautiful, compact PDFs
  • srez Image super-resolution through deep learning
  • ConvNetJS train Convolutional Neural Networks (or ordinary ones) in the browser

Data

Sources

Aggregators

  • pyspider a web crawler system in python.
  • Newspaper News, full-text, and article metadata extraction in Python 3.

Explore

  • Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser.

Storage

  • pytables a package for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data. It is built on top of the HDF5 library and the NumPy package.

Visualization

Resources

JavaScript Libraries

  • Chart.js HTML5 Charts using the canvas tag
  • G2 is a visualization grammar, a data-driven visual language with a high level of usability and scalability
  • plotly.js charting library built on top of d3 and stack.gl
  • frappe/charts Simple, responsive, modern SVG Charts with zero dependencies
  • GraphicsJS A lightweight JavaScript graphics library with an intuitive API, based on SVG/VML technology.

Python Libraries

  • bokeh an interactive visualization library that targets modern web browsers for presentation
  • bqplot plotting library for IPython/Jupyter notebooks - front-end in d3
  • dash Dash is a Python framework for building analytical web applications
  • Altair declarative statistical visualization library for Python, based on Vega and Vega-Lite

D3 based libraries

Digital Art

Languages

Python

  • Awesome Python A curated list of awesome Python frameworks, libraries, software and resources.
  • Interactive coding challenges which focus on algorithms and data structures that are typically found in coding interviews

JavaScript

License

CC0

To the extent possible under law, Quantmind has waived all copyright and related or neighboring rights to this work.
