⬆️ Higher Priority
⬇️ Lower Priority
🔵 In Progress
🔷 On Deck
- Basic implementation of table namespace: C:03/02/2020
- Basic Configuration: C:03/03/2020
- Basic Operator Registration: C:03/04/2020
- Package configuration: C:03/03/2020
- Basic Documentation: C:03/04/2020
- Prettify outputs and update documentation: C:03/04/2020
- Validate mason configuration file using json_schema: C:03/04/2020
- Validate operators according to json_schema C:03/04/2020
- Add logger with log levels. C:03/05/2020
- ⬆️ Validate client compatibility with operators C:03/06/2020
- ⬆️ Catch up old rest api interface (migrate https://github.com/samtecspg/data/tree/master/catalog/api to operators): C: 03/09/2020
- ⬆️ Move over tests and mocks. C: 03/11/2020
- ⬆️ More test coverage on basic functionality:
- Parameters C: 03/11/2020
- Configurations C:03/11/2020
- Operators C: 03/12/2020
- Engines C: 03/13/2020
- Clients C: 03/23/2020
- CLI (started, progress made)
- ⬆️ Clean up rest api implementation C:03/13/2020
- create "mason run" cli command C: 03/09/2020
- Pull rest api responses through to swagger spec (200 status example): won't do, redid rest api interface to not need this
- ⬆️ Advanced Operator Registration Documentation
- ⬆️ New Client Documentation
- ⬇️ New Engine Documentation
- ⬆️ Dockerize mason implementation C: 03/09/2020
- Build and refine "Engines" first order concept C: 03/06/2020
- Establish Docker-style SHA registration for installed operators to fix version conflicts
- Explore GraphQL for the api? Won't do: found a way around this for now
- Generalize Engines to be "registerable" and serial
- Support multiple clients for a single engine type.
- Establish common interfaces for metastore engine objects (metastore engine models, e.g., Table, Database, Schedule, etc.) C: 03/20/2020
- Allow operator definitions to have valid "engine set" configurations C:03/22/2020
- Allow for multiple configurations C: 04/08/2020
- Clean up multiple configurations -> add id, don't use enumerate. C:05/10/2020
- Allow operators to only be one level deep, i.e., not have a namespace (both in definition and folder configuration): not going to do right now
- ⬆️ Consolidate response.add_ actions and logger._ actions into one command C: 04/13/2020
- Interpolate environment variables into config and have that affect config validation: C: 04/23/2020
- Clean up mock implementations: C: 04/23/2020
- Consolidate all AWS response error parsing methods.
- Improve performance by moving around imports. C: 06/15/2020
- Version checking in installed operators
- Replace operator installation method with something more robust (Done, kind of)
- Parameter type inference and checking
- Parameter aliases, e.g., database_name -> bucket_name
- Malformed Parameters, extraneous ":". Improve parameter specification. Make docs more explicit C: 03/11/2020
- Extraneous parameters showing up incorrectly in the "required parameters" error response. C: 03/11/2020
- Better error messages for permission errors C: 03/13/2020
- Look into using calcite or coral to extend spark operators to presto and hive (***)
- Look into using protos to communicate metastore schema to execution engine or possibly look into other serialization formats (avro)
- 'job_proxy' execution client which hits a mason client running against a job queue for requests
- Look into datahub internal schema representation
- Validated Infer Workflow (5-step)
- Infer Workflow (1-step)
- Glue Support C: 05/10/2020
- Athena Support with local scheduler (this ended up just being a local instantiation of the underlying infer operator)
- Athena Support with airflow scheduler
- Allow run flag to trigger existing workflow
- table summary operator
- Infer operator
- Glue Support: C: long time ago
- Schema merge operator C:09/04/2020
- JSON explode operator
- S3 -> ES egress operator
- 🔵 Table Format operator (reformats and repartitions data)
- 🔷 Table "join" operator (on set of columns)
- Dedupe Operator
- Table Operators
- Query (requires metastore and execution engine) C:04/28/2020
- Delete. C: 04/29/2020
- Delete Database
- Separate out database operator?
- Metastore Database operator
- List databases (~= s3 list buckets)
- Jobs operators (scheduler):
- Get C: 04/08/2020
- List
- Scheduler operators:
- Delete C: 04/29/2020
- Create
- List
- ⬇️ Smart cast operator: if all partitions but one have Int and that one has String, cast the String partition
- Basic setup. C: 4/04/2020
- Fix conflicting schemas error with differing partition data. C: 04/2020
- ⬇️ Basic setup
- ⬆️ Basic Setup C: 3/20/2020
- Schema implementations
- ParquetSchema C: 3/17/2020
- CSV Schema C: 4/16/2020
- JsonL schema
- Json schema
- ⬇️ Avro schema
- ⬇️ Msgpack schema
- DDL Generation
- Add partitioning concepts to DDL generation
- ⬆️ Basic setup
- ⬆️ Basic setup
- ⬆️ Papermill integration
- Basic setup C: 04/26/2020
- Basic setup
- Kubernetes Operator Runner C: 4/06/2020
- EMR Runner
- Local Runner
- Check that file format is supported
- ⬆️ Basic setup
- Basic Setup
- Kubernetes Runner
- Multiple step workflow implementation
- DAG validation (validate that it is a directed acyclic graph, not that it's valid; that's already done)
- Basic setup
- Workflow Implementation
- Basic setup
- Basic setup: C: 06/10/2020
- Move some metastore concepts over here like "paths"
- Basic Setup
- Redshift
- Elasticsearch
- Remove Samtec-specific examples from the examples/ files; use public examples instead
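The "DAG validation" item in this list comes down to cycle detection over a workflow's step graph. Below is a minimal sketch using Kahn's algorithm (topological sort), assuming a hypothetical representation in which each step name maps to the list of steps it depends on; mason's actual workflow definition format may differ, and `is_dag` is an illustrative name, not an existing mason function.

```python
from collections import deque
from typing import Dict, List


def is_dag(dependencies: Dict[str, List[str]]) -> bool:
    """Return True if the step graph contains no cycles (Kahn's algorithm).

    `dependencies` maps each step to the steps it depends on. This is a
    hypothetical representation, not mason's actual workflow schema.
    """
    # Collect every node, including steps that only appear as dependencies.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    # in_degree[n] counts how many dependencies n is still waiting on.
    in_degree = {n: 0 for n in nodes}
    dependents: Dict[str, List[str]] = {n: [] for n in nodes}
    for step, deps in dependencies.items():
        for dep in deps:
            in_degree[step] += 1
            dependents[dep].append(step)

    # Repeatedly remove steps whose dependencies have all been removed.
    ready = deque(n for n in nodes if in_degree[n] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for child in dependents[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    # Any step never removed lies on, or downstream of, a cycle.
    return removed == len(nodes)
```

For example, `is_dag({"infer": ["ingest"], "summarize": ["infer"]})` is `True`, while `is_dag({"a": ["b"], "b": ["a"]})` is `False`. Kahn's algorithm is one choice among several; a DFS with three-color marking would detect the same cycles, and on Python 3.9+ the standard library's `graphlib.TopologicalSorter` raises `CycleError` for the same condition.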