Skip to content

Analyticsphere/flatteningRequests

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

flatteningRequests

A place to communicate about the team flattened data needs.

Use the issues tools to make and manage requests.

The flattening workflow:

flowchart LR
    subgraph BQ [BigQuery]
        SRC[[Source Table]]:::table
        QUERY(Scheduled Query):::table
        FLAT[[Flattened Table]]:::table
    end
    class BQ subgraphTitle;

    SCH[/schema.json/]:::file
    DEV[/dev.sql/]:::file
    STG[/stg.sql/]:::file
    PROD[/prod.sql/]:::file

    subgraph LOC ["flatten.Rmd"]
        FETCH("fetch_schema()"):::function
        SCH_CSV[/"schema.csv"/]:::file
        FILT("sort_variables()"):::function
        ARR[/arrays.json/]:::file
        VAR[/variables.csv/]:::file
        GEN("generate_queries()"):::function
    end
    class LOC subgraphTitle;

    SRC --- |API| SCH:::link
    SCH --> FETCH:::link
    FETCH --> SCH_CSV:::link
    SCH_CSV --> FILT:::link
    FILT --> ARR:::link & VAR:::link
    ARR & VAR --> GEN:::link
    GEN --> DEV:::link & STG:::link & PROD:::link
    DEV & STG & PROD --> |API| QUERY:::link
    QUERY --> FLAT:::link

    classDef table link stroke:#000,stroke-width:1.5px;
    classDef schema fill:#ff9,stroke:#333,stroke-width:2px;
    classDef function fill:#ffc,stroke:#333,stroke-width:2px;
    classDef file fill:#cfc,stroke:#333,stroke-width:2px;
    classDef subgraphTitle font-size:24px;
Loading

flatten.Rmd is an R notebook that simplifies the process of flattening the Connect4Cancer source tables:

  1. Retrieve the schema of the source table using the BigQuery API.
  2. Convert the schema from JSON to CSV.
  3. Sort the variables in the source table by their data structure: arrays and non-array variables and stores them in appropriate files.
  4. Generate SQL queries for each environment.
  5. Update the Scheduled Query which generates the flattened table daily.

Notes:

  • The user can ignore the bash/, python/ and js/ directories which store legacy code that could be referenced later.
  • The core functions of this workflow are in teh r/ directory

Query configurations

The query_configs/ directory stores a metadata for each table to be flattened and is referenced by the flatten.Rmd notebook. To set up a new table for flattening, a new config folder needs to be setup for the new table with meta data about the table.

Here is an example for the PROMIS survey table:

query_configs/promis_v1/
├── dev
│   ├── FlatConnect.promis_v1_JP-dev.sql
│   ├── promis_v1-lists.json
│   ├── promis_v1-variables.csv
│   └── query-config.json
├── prod
│   ├── FlatConnect.promis_v1_JP-prod.sql
│   ├── promis_v1-lists.json
│   ├── promis_v1-variables.csv
│   └── query-config.json
└── stg
    ├── FlatConnect.promis_v1_JP-stg.sql
    ├── promis_v1-lists.json
    ├── promis_v1-variables.csv
    └── query-config.json

Each tier dev, stg and prod has a *.sql, *-lists.json, *-variables.csv, and a query-config.json file.

  • query-config.json: Specifies key information about the table, including the name of the source table, the name of the resulting SQL query, the name of target table and the resource_name (containing the transfer_config_id) of the scheduled query to be updated. These bit of information must be updated manually.
  • *-lists.json: A JSON file key/value pairs are the names of "array variables" and an array of possible responses for these variables, respectively.
  • *-variables.csv: A CSV file listing the names of the variables for which a single value can be given.
  • *.sql: The resulting SQL query that is generated by this workflow, which is delivered to BigQuery via the BQ Transfer Config API.

About

A place to communicate about the team flattened data needs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •