This repository was archived by the owner on May 17, 2024. It is now read-only.

Commit 530d70a

Update README.md to cover migration use case
1 parent d5946a7 commit 530d70a

1 file changed: +49 -67 lines changed

README.md

Lines changed: 49 additions & 67 deletions
@@ -1,93 +1,75 @@
1-
<p align="center">
2-
<img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="50%" />
1+
<p align="left">
2+
<img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="30%" />
33
</p>
44

5-
<h1 align="center">
6-
data-diff
5+
<h1 align="left">
6+
data-diff: compare datasets fast, within or across SQL databases
77
</h1>
88

9-
<h2 align="center">
10-
Develop dbt models faster by testing as you code.
11-
</h2>
12-
<h4 align="center">
13-
See how every change to dbt code affects the data produced in the modified model and downstream.
14-
</h4>
159
<br>
1610

17-
## What is `data-diff`?
1811

19-
data-diff is an open source package that you can use to see the impact of your dbt code changes on your dbt models as you code.
12+
# Use cases
2013

21-
<div align="center">
14+
## Data Migration & Replication Testing
15+
Compare source to target and check for discrepancies when moving data between systems:
16+
- Migrating to a new data warehouse (e.g., Oracle > Snowflake)
17+
- Converting SQL to a new transformation framework (e.g., stored procedures > dbt)
18+
- Continuously replicating data from an OLTP DB to OLAP DWH (e.g., MySQL > Redshift)
2219

23-
![development_testing_gif](https://user-images.githubusercontent.com/1799931/236354286-d1d044cf-2168-4128-8a21-8c8ca7fd494c.gif)
2420

25-
</div>
21+
Install `data-diff` with specific database adapters, e.g.:
2622

27-
<br>
28-
29-
:eyes: **Watch 4-min demo video [here](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)**
30-
31-
## Getting Started
32-
33-
**Install `data-diff`**
34-
35-
Install `data-diff` with the command that is specific to the database you use with dbt.
36-
37-
### Snowflake
3823
```
39-
pip install data-diff 'data-diff[snowflake,dbt]' -U
24+
pip install data-diff 'data-diff[postgresql,snowflake]' -U
4025
```
41-
42-
### BigQuery
26+
Run `data-diff` with connection URIs to compare tables:
4327
```
44-
pip install data-diff 'data-diff[dbt]' google-cloud-bigquery -U
28+
data-diff \
29+
postgresql://<username>:'<password>'@localhost:5432/<database> \
30+
<table> \
31+
"snowflake://<username>:<password>@<account>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
32+
<TABLE> \
33+
-k activity_id \
34+
-c activity \
35+
-w "event_timestamp < '2022-10-10'"
4536
```
37+
Check out the [documentation](https://docs.datafold.com/reference/open_source/cli) for the full command reference.
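
To make the `-k`, `-c`, and `-w` options in the command above concrete, here is a minimal conceptual sketch in Python of a key-based comparison. It does not show how `data-diff` is implemented and does not use its API. Two in-memory SQLite databases stand in for the source and target, and the `events` table, its rows, and the `make_db` and `diff_by_key` helpers are all hypothetical.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory SQLite database with a hypothetical `events` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "activity_id INTEGER PRIMARY KEY, activity TEXT, event_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def diff_by_key(source, target, table, key, column, where):
    """Return (sign, key, value) tuples for rows that differ between two tables.

    `key`, `column`, and `where` play the roles of -k, -c, and -w. They are
    interpolated into the SQL text directly, so only pass trusted values.
    """
    query = f"SELECT {key}, {column} FROM {table} WHERE {where}"
    src = dict(source.execute(query).fetchall())
    tgt = dict(target.execute(query).fetchall())

    missing = object()  # sentinel for "key not present on this side"
    diff = []
    for k in sorted(src.keys() | tgt.keys()):
        s, t = src.get(k, missing), tgt.get(k, missing)
        if s == t:
            continue
        if s is not missing:
            diff.append(("-", k, s))  # row as it exists in the source only
        if t is not missing:
            diff.append(("+", k, t))  # row as it exists in the target only
    return diff


source = make_db([
    (1, "login", "2022-10-01"),
    (2, "purchase", "2022-10-05"),
    (3, "logout", "2022-10-09"),
    (4, "login", "2022-10-12"),
])
target = make_db([
    (1, "login", "2022-10-01"),
    (2, "refund", "2022-10-05"),
    (4, "signup", "2022-10-12"),
])

changes = diff_by_key(
    source, target, "events",
    key="activity_id", column="activity",
    where="event_timestamp < '2022-10-10'",
)
for sign, key, value in changes:
    print(sign, key, value)
```

In this sketch, row 4 differs between the two sides but is not reported, because the filter excludes it from both queries. Narrowing a comparison with a filter works the same way in spirit: only rows matching the condition are compared.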
4638

47-
### Redshift
48-
```
49-
pip install data-diff 'data-diff[redshift,dbt]' -U
50-
```
39+
## Data Development Testing
40+
Test SQL code and preview changes by comparing development/staging environment data to production:
41+
1. Make a change to some SQL code
42+
2. Run the SQL code to create a new dataset
43+
3. Compare the dataset with its production version or another iteration
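
The three steps above can be sketched in Python with an in-memory SQLite database. This only illustrates the workflow and is not `data-diff` itself. The `orders` table, both model queries, and the `prod_model` and `dev_model` names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, "paid"), (2, 25, "refunded"), (3, 40, "paid")],
)

# Production version of the model: revenue per order, regardless of status.
PROD_SQL = "SELECT order_id, amount AS revenue FROM orders"

# Step 1: change the SQL (refunded orders now count as zero revenue).
DEV_SQL = (
    "SELECT order_id, "
    "CASE WHEN status = 'refunded' THEN 0 ELSE amount END AS revenue "
    "FROM orders"
)

# Step 2: run both versions to materialize the datasets.
conn.execute(f"CREATE TABLE prod_model AS {PROD_SQL}")
conn.execute(f"CREATE TABLE dev_model AS {DEV_SQL}")

# Step 3: compare. EXCEPT keeps rows found in the first dataset but not the second.
removed = conn.execute(
    "SELECT * FROM prod_model EXCEPT SELECT * FROM dev_model ORDER BY 1"
).fetchall()
added = conn.execute(
    "SELECT * FROM dev_model EXCEPT SELECT * FROM prod_model ORDER BY 1"
).fetchall()

print("only in production: ", removed)
print("only in development:", added)
```

Only order 2 shows up on either side, which confirms that the code change affected refunded orders and nothing else.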
5144

52-
### Postgres
53-
```
54-
pip install data-diff 'data-diff[postgres,dbt]' -U
55-
```
45+
<p align="left">
46+
<img alt="dbt" src="https://seeklogo.com/images/D/dbt-logo-E4B0ED72A2-seeklogo.com.png" width="10%" />
47+
</p>
48+
49+
`data-diff` integrates with dbt Core and dbt Cloud to seamlessly compare local development to production datasets.
5650

57-
### Databricks
58-
```
59-
pip install data-diff 'data-diff[databricks,dbt]' -U
60-
```
61-
62-
### DuckDB
63-
```
64-
pip install data-diff 'data-diff[duckdb,dbt]' -U
65-
```
66-
67-
**Update a few lines in your `dbt_project.yml`**.
68-
```
69-
#dbt_project.yml
70-
vars:
71-
data_diff:
72-
prod_database: my_database
73-
prod_schema: my_default_schema
74-
```
75-
76-
**Run your first data diff!**
77-
78-
```
79-
dbt run && data-diff --dbt
80-
```
51+
:eyes: **Watch [4-min demo video](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)**
8152

82-
We recommend you get started by walking through [our simple setup instructions](https://docs.datafold.com/development_testing/open_source) which contain examples and details.
53+
**[Get started with data-diff & dbt](https://docs.datafold.com/development_testing/open_source)**
8354

84-
Please reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) if you have any trouble whatsoever getting started!
55+
Reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) for advice and support.
8556

86-
<br><br>
57+
## Supported databases
8758

88-
### Diffing between databases
59+
- PostgreSQL >=10
60+
- MySQL
61+
- Snowflake
62+
- BigQuery
63+
- Redshift
64+
- Oracle
65+
- Presto
66+
- Databricks
67+
- Trino
68+
- ClickHouse
69+
- Vertica
70+
- DuckDB >=0.6
71+
- SQLite (coming soon)
8972

90-
Check out our [documentation](https://docs.datafold.com/reference/open_source/cli) if you're looking to compare data across databases (for example, between Postgres and Snowflake).
9173

9274
<br>
9375
