-<p align="center">
- <img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="50%" />
+<p align="left">
+ <img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="30%" />
 </p>

-<h1 align="center">
-data-diff
+<h1 align="left">
+data-diff: compare datasets fast, within or across SQL databases
 </h1>

-<h2 align="center">
-Develop dbt models faster by testing as you code.
-</h2>
-<h4 align="center">
-See how every change to dbt code affects the data produced in the modified model and downstream.
-</h4>
 <br>

-## What is `data-diff`?

-data-diff is an open source package that you can use to see the impact of your dbt code changes on your dbt models as you code.
+# Use cases

-<div align="center">
+## Data Migration & Replication Testing
+Compare source to target and check for discrepancies when moving data between systems:
+- Migrating to a new data warehouse (e.g., Oracle > Snowflake)
+- Converting SQL to a new transformation framework (e.g., stored procedures > dbt)
+- Continuously replicating data from an OLTP DB to OLAP DWH (e.g., MySQL > Redshift)

-

-</div>
+Install `data-diff` with specific database adapters, e.g.:

-<br>
-
-:eyes: **Watch 4-min demo video [here](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)**
-
-## Getting Started
-
-**Install `data-diff`**
-
-Install `data-diff` with the command that is specific to the database you use with dbt.
-
-### Snowflake
 ```
-pip install data-diff 'data-diff[snowflake,dbt]' -U
+pip install data-diff 'data-diff[postgresql,snowflake]' -U
 ```
-
-### BigQuery
+Run `data-diff` with connection URIs to compare tables:
 ```
-pip install data-diff 'data-diff[dbt]' google-cloud-bigquery -U
+data-diff \
+  postgresql://<username>:'<password>'@localhost:5432/<database> \
+  <table> \
+  "snowflake://<username>:<password>@<ACCOUNT>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
+  <TABLE> \
+  -k activity_id \
+  -c activity \
+  -w "event_timestamp < '2022-10-10'"
 ```
+Check out the [documentation](https://docs.datafold.com/reference/open_source/cli) for the full command reference.

-### Redshift
-```
-pip install data-diff 'data-diff[redshift,dbt]' -U
-```
+## Data Development Testing
+Test SQL code and preview changes by comparing development/staging environment data to production:
+1. Make a change to some SQL code
+2. Run the SQL code to create a new dataset
+3. Compare the dataset with its production version or another iteration

-### Postgres
-```
-pip install data-diff 'data-diff[postgres,dbt]' -U
-```
+ <p align="left">
+ <img alt="dbt" src="https://seeklogo.com/images/D/dbt-logo-E4B0ED72A2-seeklogo.com.png" width="10%" />
+ </p>
+
+`data-diff` integrates with dbt Core and dbt Cloud to seamlessly compare local development to production datasets.

-### Databricks
-```
-pip install data-diff 'data-diff[databricks,dbt]' -U
-```
-
-### DuckDB
-```
-pip install data-diff 'data-diff[duckdb,dbt]' -U
-```
-
-**Update a few lines in your `dbt_project.yml`**.
-```
-#dbt_project.yml
-vars:
-  data_diff:
-    prod_database: my_database
-    prod_schema: my_default_schema
-```
-
-**Run your first data diff!**
-
-```
-dbt run && data-diff --dbt
-```
+:eyes: **Watch [4-min demo video](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)**

-We recommend you get started by walking through [our simple setup instructions](https://docs.datafold.com/development_testing/open_source) which contain examples and details.
+**[Get started with data-diff & dbt](https://docs.datafold.com/development_testing/open_source)**

-Please reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) if you have any trouble whatsoever getting started!
+Reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) for advice and support.

-<br><br>
+## Supported databases

-### Diffing between databases
+- PostgreSQL >=10
+- MySQL
+- Snowflake
+- BigQuery
+- Redshift
+- Oracle
+- Presto
+- Databricks
+- Trino
+- Clickhouse
+- Vertica
+- DuckDB >=0.6
+- SQLite (coming soon)

-Check out our [documentation](https://docs.datafold.com/reference/open_source/cli) if you're looking to compare data across databases (for example, between Postgres and Snowflake).

 <br>
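The added CLI example compares `<table>` in PostgreSQL with `<TABLE>` in Snowflake. It matches rows on `activity_id` (`-k`), compares the `activity` column (`-c`), and restricts both sides with a SQL filter (`-w`). As a toy illustration of those semantics only, the sketch below runs the same kind of comparison on in-memory rows with a hypothetical `diff_rows` helper. Rows from the first side that are missing or different on the second are emitted with `-`, and rows from the second side with `+`. It assumes both tables fit in memory, uses a Python predicate as a stand-in for the SQL `-w` expression, and is not data-diff's implementation, which is designed to avoid loading full tables.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def diff_rows(
    source: Iterable[Row],
    target: Iterable[Row],
    key: str,
    columns: List[str],
    where: Callable[[Row], bool] = lambda row: True,
) -> List[Tuple[str, tuple]]:
    """Toy key-based diff (hypothetical helper, not data-diff's algorithm).

    Returns ("-", row) for source rows that are absent from or differ in target,
    and ("+", row) for target rows that are absent from or differ in source.
    """

    def project(rows: Iterable[Row]) -> Dict[object, tuple]:
        # Apply the filter (stand-in for -w), then keep the key column (-k)
        # plus the compared columns (-c), indexed by key.
        return {r[key]: tuple(r[c] for c in [key, *columns]) for r in rows if where(r)}

    src, tgt = project(source), project(target)
    result: List[Tuple[str, tuple]] = []
    for k in sorted(src.keys() | tgt.keys()):
        a, b = src.get(k), tgt.get(k)
        if a == b:
            continue  # same key and same compared values: not a difference
        if a is not None:
            result.append(("-", a))
        if b is not None:
            result.append(("+", b))
    return result


# Hypothetical sample data; ISO date strings compare correctly as text.
postgres_rows = [
    {"activity_id": 1, "activity": "signup", "event_timestamp": "2022-10-01"},
    {"activity_id": 2, "activity": "login", "event_timestamp": "2022-10-02"},
    {"activity_id": 3, "activity": "logout", "event_timestamp": "2022-10-03"},
    {"activity_id": 4, "activity": "login", "event_timestamp": "2022-10-11"},
]
snowflake_rows = [
    {"activity_id": 1, "activity": "signup", "event_timestamp": "2022-10-01"},
    {"activity_id": 2, "activity": "LOGIN", "event_timestamp": "2022-10-02"},
    {"activity_id": 5, "activity": "purchase", "event_timestamp": "2022-10-05"},
]


def before_cutoff(row: Row) -> bool:
    return row["event_timestamp"] < "2022-10-10"


for sign, row in diff_rows(postgres_rows, snowflake_rows, "activity_id", ["activity"], before_cutoff):
    print(sign, row)
# - (2, 'login')
# + (2, 'LOGIN')
# - (3, 'logout')
# + (5, 'purchase')
```

Row 4 never appears in the output: the filter removes it before the comparison, even though it has no counterpart on the other side.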
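In the added shell example, the Snowflake URI is wrapped in double quotes because `?` and `&` are special to the shell. If the comparison is scripted from Python, passing the arguments as a list to `subprocess.run` avoids shell quoting entirely. The sketch below assembles that list with a hypothetical `build_data_diff_argv` helper. It uses only the positional arguments and the `-k`, `-c`, and `-w` flags shown in the diff, and it assumes `-c` may be repeated once per extra column. The URIs, table names, and credentials are placeholders.

```python
import shlex
import subprocess
from typing import List, Optional


def build_data_diff_argv(
    uri1: str,
    table1: str,
    uri2: str,
    table2: str,
    key: str,
    columns: Optional[List[str]] = None,
    where: Optional[str] = None,
) -> List[str]:
    """Assemble a data-diff command line as an argv list (hypothetical helper)."""
    argv = ["data-diff", uri1, table1, uri2, table2, "-k", key]
    for column in columns or []:
        argv += ["-c", column]  # assumption: one -c per extra column
    if where is not None:
        argv += ["-w", where]  # passed verbatim; no shell quoting needed
    return argv


def run_data_diff(argv: List[str]) -> int:
    """Run the command without a shell; requires data-diff and drivers installed."""
    return subprocess.run(argv, check=False).returncode


argv = build_data_diff_argv(
    "postgresql://user:secret@localhost:5432/analytics",
    "activities",
    "snowflake://user:secret@my_account/ANALYTICS/PUBLIC?warehouse=WH&role=ANALYST",
    "ACTIVITIES",
    key="activity_id",
    columns=["activity"],
    where="event_timestamp < '2022-10-10'",
)

# Shell-safe rendering of the same command, e.g. for logging.
print(shlex.join(argv))
```

Calling `run_data_diff(argv)` executes the comparison. `shlex.join` is only used to produce a copy-pasteable log line and is not needed when the list goes straight to `subprocess.run`.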