Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added info about how to reorder the columns to adjust a schema #1364

Merged
merged 6 commits into from
May 23, 2024
26 changes: 26 additions & 0 deletions docs/website/docs/walkthroughs/adjust-a-schema.md
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,32 @@ Do not rename the tables or columns in the yaml file. `dlt` infers those from th
You can [adjust the schema](../general-usage/resource.md#adjust-schema) in Python before resource is loaded.
:::

### Reorder columns
To reorder the columns in your dataset, follow these steps:

1. Initial Run: Execute the pipeline to obtain the import and export schemas.
1. Modify Export Schema: Adjust the column order as desired in the export schema.
1. Sync Import Schema: Ensure that these changes are mirrored in the import schema to maintain consistency.
1. Delete Dataset: Remove the existing dataset to prepare for the reload.
1. Reload Data: Reload the data. The dataset should now reflect the new column order as specified in the import YAML.

These steps ensure that the column order in your dataset matches your specifications.

**Another approach** to reorder columns is to use the `add_map` function. For instance, to rearrange ‘column1’, ‘column2’, and ‘column3’, you can proceed as follows:

```py
# Define the data source and reorder columns using add_map
data_source = resource().add_map(lambda row: {
'column3': row['column3'],
'column1': row['column1'],
'column2': row['column2']
})

# Run the pipeline
load_info = pipeline.run(data_source)
```

In this example, the `add_map` function reorders columns by defining a new mapping. The lambda function specifies the desired order by rearranging the key-value pairs. When the pipeline runs, the data will load with the columns in the new order.

### Load data as json instead of generating child table or columns from flattened dicts

Expand Down
Loading