This is the first release that is usable as a fully open source package. That being said, it is still a beta! To celebrate its independence, we have named it after the extremely independent artist "Nicolas Jaar".
IMPORTANT!
- This release got a huge version bump and breaks all functionality of previously installed `sheetwork`
- Check documentation for Installation and Configuration
- Refactor your jobs according to Usage (see documentation)
- Solid management of credentials, target schemas, and database interaction with `Project`, `Config`, and `Profile` concepts. INSERT LINK TO DOC. (#81, #76, #91, #80, #73)
- Ability to convert `CamelCased` columns in the original sheet to `snake_case` (#112)
- Ability to call sheetwork from anywhere on disk, provided paths to the config, profiles, and project files are provided at runtime (#82)
- Adds support for short flag names for commonly used arguments (#70)
- Adds a `sheetwork init` task to automatically set up a new sheetwork project and folders for you. INSERT LINKS TO DOC
- Checks for duplicate columns in the sheet (#145, #151)
- Checks that columns marked for exclusion are present in the DataFrame, otherwise throws an error (#145)
- Improves interactive cleanup (#156)
- Fixes a bug with default target schemas (#155)
- Fixes `--help` formatting (#147)
- Connects to Snowflake via SQLAlchemy (#102)
- Implements its own logger and logs to file (#98, #121)
- Is case insensitive, except when referring to columns in the sheet via `identifier:` for renames (#63)
- Implements an Adaptor/Plugin design to allow for adapting to other databases (#173)
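The `CamelCased` to `snake_case` column conversion (#112) can be illustrated with a small, stdlib-only sketch. The function name and regexes here are illustrative assumptions, not sheetwork's actual implementation:

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a CamelCased column name to snake_case.

    Illustrative only: sheetwork's real conversion logic may differ.
    """
    # Split between a capitalised word and whatever precedes it,
    # then between a lowercase letter/digit and an uppercase letter,
    # and finally lowercase everything.
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()


print(camel_to_snake("CustomerName"))  # customer_name
print(camel_to_snake("orderID"))       # order_id
```

Names that are already snake_case pass through unchanged, so the conversion is safe to apply to a whole header row.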
Documentation: missed and was not usable.
This is the very first version of sheetwork. It is named after the German electronic pioneers "Kraftwerk", as it is the pioneer (first version) of the package that will set and pave the way for future generations to come, and we will all live happily ever after...
It loads Google Sheets into Snowflake from the command line and avoids the fast multiplication of `<insert_non_creative_name>_sheet_importer.py`-type scripts.
- Loads a Google Sheet into a pandas DataFrame. #9
- Performs basic cleaning. #10
- Pushes data to Snowflake using `data_tools.push_pandas_to_snowflake()`. #9
- `--dry_run` functionality skips pushes to the database and offers a preview of the datatypes and the head of the DataFrame that would be uploaded to the database. #12
- `--mode dev` overrides the target schema to "sand" to avoid pushing the wrong data to a production table, or to allow for full-flow testing while behind user permissions. #11
- `--force` can be added when running in `dev` mode to force a push to the otherwise overridden target schema (see bullet above). #12
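The interplay between `--mode dev`, `--force`, and the configured target schema described above can be sketched in plain Python. The function and schema names are illustrative assumptions, not sheetwork's actual code:

```python
def resolve_target_schema(target_schema: str, mode: str = "prod", force: bool = False) -> str:
    """Pick the schema data is pushed to (illustrative sketch).

    Behaviour as described in the release notes:
    - in dev mode, the target schema is overridden to "sand",
    - unless --force is passed, which restores the configured target.
    """
    if mode == "dev" and not force:
        return "sand"
    return target_schema


print(resolve_target_schema("analytics"))                          # analytics
print(resolve_target_schema("analytics", mode="dev"))              # sand
print(resolve_target_schema("analytics", mode="dev", force=True))  # analytics
```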
This release focuses on making sheetwork a lot more flexible and robust. The main addition is the ability to read from a config file which columns' data types need to be changed when the table is created in the database.
- Is able to read configuration info from a configuration file (`sheets.yml`) which contains (amongst other things) column typing, often required after pulling a sheet from Google, as the data loaded by pandas is often interpreted as strings. #20
- Allows for an interactive CLI prompt on whether to run default cleanups when passing `--i`. This cleanup was previously applied to all sheets by default. #17
- Under the hood: validates the config file in a pretty strict way, checking for missing tags which are required when reading from config, and for unallowable tags which could potentially break things down the line. #24