Generate fake/test/demo/sample data directly from dbt. dbt_faker is a Python model generator that creates data within a dbt project using the Python Faker library.
Include in packages.yml:
packages:
  - git: "https://github.com/dbt-labs/dbt_faker.git"
    revision: main
Requirement: dbt version >= 1.3
Create the file macro/dbt_faker_source_override.sql that looks like this:
{% macro source(source_name, table_name) %}
  {{ return(dbt_faker.dbt_faker_source(source_name, table_name)) }}
{% endmacro %}
Activate the faker_enabled variable in your dbt_project.yml:
vars:
  faker_enabled: true
Define your sources in sources.yml, including columns and faker_providers, and add the meta config faker_enabled: true.
version: 2
sources:
  - name: tpch
    meta:
      faker_enabled: true
  - name: fake_tpch
    tables:
      - name: orders
        meta:
          faker_enabled: true
          faker_rows: 250
        columns:
          - name: o_orderkey
            meta:
              faker_provider: pyint
          - name: o_order_date
            meta:
              faker_provider: date
Execute the command dbt run-operation generate_faker_model
Create a file (e.g. dbt_faker.py) with the code generated from step #2
Run the generated model, for example dbt run -m dbt_faker.py. This will create a table called fake__source_table for each source you have defined as fake-able.
Run the models depending on the fake sources and be amazed
dbt_faker relies on Faker's robust data providers. To use them, include the name of the provider in the faker_provider meta tag. A full list of providers is available in the Faker documentation. Some examples you can use:
- faker_provider: address (48764 Howard Forge Apt. 421 Vanessaside, PA 19763)
- faker_provider: name (Diego Maradona)
- faker_provider: pyint (1234)
If a faker_provider has not been defined for a column, dbt_faker will generate a string by default.
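To make the provider lookup concrete, here is a minimal sketch of how a faker_provider name could be resolved to a Faker method and used to build rows. It is not dbt_faker's actual implementation: the helpers fake_value, fake_rows and demo are hypothetical, the o_comment column is made up for illustration, and using Faker's pystr as the default string generator is an assumption.

```python
from typing import Any, Dict, List, Optional


def fake_value(fake: Any, provider: Optional[str] = None) -> Any:
    """Call the named provider method on a Faker-like object.

    Falls back to `pystr` when no provider is given (assumption: the
    package's real default string generator may differ).
    """
    method = getattr(fake, provider or "pystr", None)
    if method is None or not callable(method):
        raise ValueError(f"Unknown faker_provider: {provider!r}")
    return method()


def fake_rows(fake: Any, columns: List[Dict[str, Any]], n_rows: int) -> List[Dict[str, Any]]:
    """Build n_rows dicts, one value per column, from the column meta."""
    return [
        {
            col["name"]: fake_value(fake, col.get("meta", {}).get("faker_provider"))
            for col in columns
        }
        for _ in range(n_rows)
    ]


def demo() -> None:
    # Requires `pip install Faker`; imported here so the helpers above
    # can be used (and tested) without it.
    from faker import Faker

    fake = Faker()
    columns = [
        {"name": "o_orderkey", "meta": {"faker_provider": "pyint"}},
        {"name": "o_order_date", "meta": {"faker_provider": "date"}},
        {"name": "o_comment"},  # hypothetical column, no provider: default string
    ]
    for row in fake_rows(fake, columns, 3):
        print(row)
```

Calling demo() prints three random rows. An unknown provider name raises a ValueError instead of silently producing a string, which is a design choice of this sketch rather than documented package behavior.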
You should check that your sources have:
- columns defined in the sources.yml
- the meta field faker_enabled: true set at either the source level or the source table level
- the meta field faker_enabled: false not set at the source table level
You may not be running dbt 1.3 or later, which is required to execute dbt Python models.
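The source checklist above can be expressed as a small check over the parsed sources.yml content. This is a sketch under assumptions, not the package's own logic: the function name faker_enabled_tables is hypothetical, and it assumes that a table-level faker_enabled: false overrides a source-level faker_enabled: true.

```python
from typing import Any, Dict, List


def faker_enabled_tables(sources: List[Dict[str, Any]]) -> List[str]:
    """Return "source.table" names that pass the checklist."""
    enabled = []
    for source in sources:
        source_flag = source.get("meta", {}).get("faker_enabled")
        for table in source.get("tables", []):
            table_flag = table.get("meta", {}).get("faker_enabled")
            if table_flag is False:
                continue  # explicitly disabled at the table level
            if not (source_flag is True or table_flag is True):
                continue  # not enabled at either level
            if not table.get("columns"):
                continue  # no columns defined for this table
            enabled.append(f"{source['name']}.{table['name']}")
    return enabled


# The same structure as the sources.yml example, written as Python data.
sources = [
    {
        "name": "fake_tpch",
        "tables": [
            {
                "name": "orders",
                "meta": {"faker_enabled": True, "faker_rows": 250},
                "columns": [
                    {"name": "o_orderkey", "meta": {"faker_provider": "pyint"}},
                    {"name": "o_order_date", "meta": {"faker_provider": "date"}},
                ],
            }
        ],
    }
]
print(faker_enabled_tables(sources))
```

A table missing from the returned list fails at least one of the three source checks.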