# Forked from minimaxir/automl-gs
from setuptools import setup
long_description = '''
Give automl-gs an input CSV file and a target field you want to predict, and get a trained, high-performing machine learning or deep learning model plus native code pipelines that let you integrate that model into any prediction workflow. No black box: you can see *exactly* how the data is processed and how the model is constructed, and you can make tweaks as necessary.

automl-gs is an AutoML tool which, unlike Microsoft's [NNI](https://github.com/Microsoft/nni), Uber's [Ludwig](https://github.com/uber/ludwig), and [TPOT](https://github.com/EpistasisLab/tpot), offers a *zero code/model definition interface* for getting an optimized model and data transformation pipeline in multiple popular ML/DL frameworks, with minimal Python dependencies (pandas + scikit-learn + your framework of choice). automl-gs is designed for citizen data scientists and engineers without a deep statistical background, under the philosophy that you don't need to know modern data preprocessing and machine learning engineering techniques to create a powerful prediction workflow.

Nowadays, the cost of computing many different models and hyperparameters is much lower than the opportunity cost of a data scientist's time. automl-gs is a Python 3 module designed to abstract away the common approaches to transforming tabular data, architecting machine learning/deep learning models, and performing random hyperparameter searches to identify the best-performing model. This frees data scientists and researchers to spend their time on optimizing model performance.

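
The random hyperparameter search mentioned above can be illustrated with a minimal plain-Python sketch. It is not automl-gs's actual code: `SEARCH_SPACE`, the `score` stand-in for model training, and `random_search` are hypothetical names chosen for illustration. Each trial samples one value per hyperparameter independently, and its result is written to a CSV row as soon as the trial finishes, so stopping the search early keeps every completed trial.

```python
import csv
import random

# Hypothetical search space; automl-gs's real hyperparameters are not shown here.
SEARCH_SPACE = {
    "learning_rate": [0.001, 0.01, 0.1],
    "num_layers": [1, 2, 3],
    "dropout": [0.0, 0.25, 0.5],
}


def score(params):
    # Stand-in for training a model and returning a validation metric
    # (higher is better); a real search would fit and evaluate a model here.
    return (
        -abs(params["learning_rate"] - 0.01)
        - 0.1 * abs(params["num_layers"] - 2)
        - params["dropout"]
    )


def random_search(num_trials, out_file, seed=0):
    rng = random.Random(seed)
    writer = csv.DictWriter(out_file, fieldnames=list(SEARCH_SPACE) + ["score"])
    writer.writeheader()
    best = None
    for _ in range(num_trials):
        # Sample each hyperparameter uniformly from its candidate values.
        params = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        result = score(params)
        # Write the row immediately so an interrupted search keeps finished trials.
        writer.writerow({**params, "score": result})
        out_file.flush()
        if best is None or result > best[1]:
            best = (params, result)
    return best
```

To log trials to disk, pass a file opened with `open("trials.csv", "w", newline="")` as `out_file`; the returned tuple holds the best parameters and their score.
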
* Generates native Python code; no platform lock-in, and no need to use automl-gs after the model script is created.
* Train model configurations super-fast *for free* using a **TPU** in Google Colaboratory.
* Handles messy datasets that normally require manual intervention, such as datetime/categorical encoding and spaced/parenthesized column names.
* Each part of the generated model pipeline is its own function w/ docstrings, making it much easier to integrate into production workflows.
* Extremely detailed metrics reporting for every trial stored in a tidy CSV, allowing you to identify and visualize model strengths and weaknesses.
* Correct serialization of data pipeline encoders on disk (i.e. no pickled Python objects!)
* Retrain the generated model on new data without making any code/pipeline changes.
* Quit the hyperparameter search at any time, as the results are saved after each trial.
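
Storing encoders as plain data rather than pickled objects can look like the following sketch. It is hypothetical and not automl-gs's real on-disk format: a categorical encoder is treated as a category-to-integer mapping, saved as JSON, and reloaded without executing any code. It assumes string-valued categories, since JSON object keys are always strings.

```python
import json


# Hypothetical encoder format (not automl-gs's): a dict mapping each
# training category to an integer. Assumes string categories, because
# JSON object keys are always strings after a round trip.
def fit_encoder(values):
    categories = sorted(set(values))
    return {cat: idx for idx, cat in enumerate(categories)}


def save_encoder(encoder, path):
    # Plain JSON is human-readable and safe to load, unlike a pickle.
    with open(path, "w") as f:
        json.dump(encoder, f)


def load_encoder(path):
    with open(path) as f:
        return json.load(f)


def transform(encoder, values, unknown=-1):
    # Categories not seen when fitting map to a sentinel value.
    return [encoder.get(v, unknown) for v in values]
```

Because the saved file is plain JSON, it can also be read by code outside Python.
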

The models generated by automl-gs are intended to give a very strong *baseline* for solving a given problem; they're not the end-all-be-all that often accompanies the AutoML hype, but the resulting code is easily tweakable to improve from the baseline.
'''
setup(
    name='automl_gs',
    packages=['automl_gs'],  # this must be the same as the name above
    version='0.2',
    description='Provide an input CSV and a target field to predict, '
                'generate a model + code to run it.',
    long_description=long_description,
    long_description_content_type='text/markdown',
    author='Max Woolf',
    author_email='[email protected]',
    url='https://github.com/minimaxir/automl-gs',
    keywords=['deep learning', 'tensorflow', 'keras', 'automl', 'xgboost'],
    classifiers=[],
    license='MIT',
    entry_points={
        'console_scripts': ['automl_gs=automl_gs.automl_gs:cmd'],
    },
    python_requires='>=3.5',
    include_package_data=True,
    install_requires=['pandas', 'scikit-learn', 'autopep8', 'tqdm', 'jinja2>=2.8']
)