Update readme #52

110 changes: 82 additions & 28 deletions README.md
@@ -140,13 +140,16 @@ logging.info(ci.configuration.parameters[SOME_PARAMETER])

## Processing input tables - Manifest vs I/O mapping

Input and output tables specified by user are listed in the [configuration file](/extend/common-interface/config-file/).
Apart from that, all input tables provided by user also include manifest file with additional metadata.
All input tables provided by the user include a manifest file with additional metadata.

Tables and their manifest files are represented by the `keboola.component.dao.TableDefinition` object and may be loaded
using the convenience method `get_input_tables_definitions()`. The result object contains all metadata about the table,
such as manifest file representations, system path and name.

Apart from that, input and output tables specified by the user are listed in
the [configuration file](/extend/common-interface/config-file/).
In most cases, this option is not recommended because it is not compatible with Processors and component chaining.

### Manifest & input folder content

```python
@@ -167,7 +170,10 @@ logging.info(f'The first table has following columns defined in the manifest {fi

```

### Using I/O mapping
### (Alternative) Using I/O mapping

**NOTE:** This is a legacy option; using the manifest files is recommended instead. It is useful only when
you need to access the user configuration of the input or output mapping, e.g. in transformation components.

```python
import csv
@@ -193,7 +199,7 @@ for table in tables:
outDestination = ci.configuration.tables_output_mapping[j]['destination']
```
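
To make the shape of the I/O mapping concrete, the following is a minimal, standard-library-only sketch that reads the
input table mapping straight from a `config.json` file. It is an illustration under assumptions, not how the library
does it: the `read_input_mapping` helper, the sample configuration and the bucket and table names are hypothetical,
and only the `storage.input.tables` layout with its `source` and `destination` keys is taken from the Common Interface
configuration file format.

```python
import json
import os
import tempfile


def read_input_mapping(config_path):
    """Return (source, destination) pairs from the input table mapping."""
    with open(config_path, encoding='utf-8') as config_file:
        config = json.load(config_file)
    # a missing section simply yields an empty mapping
    tables = config.get('storage', {}).get('input', {}).get('tables', [])
    return [(table['source'], table['destination']) for table in tables]


# hypothetical configuration used only for illustration
sample_config = {
    'storage': {
        'input': {
            'tables': [
                {'source': 'in.c-main.orders', 'destination': 'orders.csv'},
                {'source': 'in.c-main.customers', 'destination': 'customers.csv'},
            ]
        }
    },
    'parameters': {},
}

with tempfile.TemporaryDirectory() as data_dir:
    config_path = os.path.join(data_dir, 'config.json')
    with open(config_path, 'w', encoding='utf-8') as config_file:
        json.dump(sample_config, config_file)
    input_mapping = read_input_mapping(config_path)

for source, destination in input_mapping:
    print(f'{source} -> {destination}')
```

Because this mapping reflects only what the user configured, it says nothing about tables produced by a preceding
Processor, which is why the manifest-based approach is preferred.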

## I/O table manifests and processing results
## Output Table Manifests and storing tabular results

The component may define
output [manifest files](https://developers.keboola.com/extend/common-interface/manifest-files/#dataouttables-manifests)
@@ -222,7 +228,8 @@ from keboola.component import dao
ci = CommonInterface()

# create container for the result
result_table = ci.create_out_table_definition('my_new_result_table', primary_key=['id'], incremental=True)
result_table = ci.create_out_table_definition('my_new_result_table', primary_key=['id'], incremental=True,
                                              write_always=False)

# write some content
with open(result_table.full_path, 'w') as result:
@@ -240,51 +247,72 @@ result_table.table_metadata.add_column_data_type('id', dao.SupportedDataTypes.ST
ci.write_manifest(result_table)
```

### Get input table by name
### Retrieve raw manifest file definition (CommonInterface compatible)

To retrieve the manifest file representation that is compliant with the Keboola Connection Common Interface, use
the `table_def.get_manifest_dictionary()` method.

```python
from keboola.component import CommonInterface
from keboola.component import dao, CommonInterface

# init the interface
ci = CommonInterface()
table_def = ci.get_input_table_definition_by_name('input.csv')
table_def = ci.create_out_table_definition('test.csv')

# get the manifest file representation
manifest_dict = table_def.get_manifest_dictionary()

```

### Initializing TableDefinition object from the manifest file
## Input Table Manifests and working with input tables

```python
from keboola.component import dao
All input tables can be retrieved via the `get_input_tables_definitions()` method. The result is a list of
`keboola.component.dao.TableDefinition` objects that contain all metadata about the table, such as manifest file
representations, system path and name. Some input tables may not have a manifest file; in such cases, the
`TableDefinition` object is initialized with default values.

table_def = dao.TableDefinition.build_from_manifest('data/in/tables/table.csv.manifest')
**NOTE:**

# print table.csv full-path if present:
Input table manifests differ from output table manifests: input table manifests are
generated automatically by Keboola Connection, whereas output table manifests are created by the component or a user.

print(table_def.full_path)
The `CommonInterface.write_manifest` method can be used to write an input table manifest into the out stage;
however, the `stage` attribute must be explicitly changed to `out`.
Some attributes specific to input manifests are not supported and will be ignored when stored in the out stage.

# rows count
## Get all input tables

print(table_def.rows_count)
```
All input tables can be retrieved via the `get_input_tables_definitions()` method. The result is a list of
`keboola.component.dao.TableDefinition` objects that contain all metadata about the table, such as manifest file
representations, system path and name. Some input tables may not have a manifest file; in such cases, the
`TableDefinition` object is initialized with default values.

### Retrieve raw manifest file definition (CommonInterface compatible)
```python
from keboola.component import CommonInterface

To retrieve the manifest file representation that is compliant with Keboola Connection Common Interface use
the `table_def.get_manifest_dictionary()` method.
# init the interface
ci = CommonInterface()
input_tables = ci.get_input_tables_definitions()

```python
from keboola.component import dao
for table in input_tables:
    print(table.full_path)

table_def = dao.TableDefinition.build_from_manifest('data/in/tables/table.csv.manifest')
```
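
The fallback for tables without a manifest can be pictured with a small standard-library sketch. This is not the
library's implementation: the `load_table_metadata` helper, the file names and the default values are hypothetical and
only illustrate the idea that a table file is still returned when its `.manifest` file is missing.

```python
import json
import os
import tempfile


def load_table_metadata(tables_dir):
    """Map each table file name to its manifest content, or to defaults."""
    result = {}
    for file_name in sorted(os.listdir(tables_dir)):
        if file_name.endswith('.manifest'):
            continue
        manifest_path = os.path.join(tables_dir, file_name + '.manifest')
        if os.path.exists(manifest_path):
            with open(manifest_path, encoding='utf-8') as manifest_file:
                metadata = json.load(manifest_file)
        else:
            # hypothetical defaults, not the library's actual default values
            metadata = {'name': file_name, 'columns': [], 'primary_key': []}
        result[file_name] = metadata
    return result


with tempfile.TemporaryDirectory() as tables_dir:
    # one table with a manifest and one without
    for name in ('orders.csv', 'raw.csv'):
        with open(os.path.join(tables_dir, name), 'w', encoding='utf-8') as table_file:
            table_file.write('id,amount\n1,10\n')
    with open(os.path.join(tables_dir, 'orders.csv.manifest'), 'w', encoding='utf-8') as manifest_file:
        json.dump({'name': 'orders', 'columns': ['id', 'amount'], 'primary_key': ['id']}, manifest_file)
    table_metadata = load_table_metadata(tables_dir)

for file_name, metadata in table_metadata.items():
    print(file_name, metadata['columns'])
```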

# get the manifest file representation
manifest_dict = table_def.get_manifest_dictionary()
### Get input table by name

```python
from keboola.component import CommonInterface

# init the interface
ci = CommonInterface()
table_def = ci.get_input_table_definition_by_name('input.csv')

```

## Processing input files
## Input File Manifests and working with input files

Similarly as tables, files and their manifest files are represented by the `keboola.component.dao.FileDefinition` object
Similarly to tables, files and their manifest files are represented by the `keboola.component.dao.FileDefinition` object
and may be loaded using the convenience method `get_input_files_definitions()`. The result object contains all metadata
about the file, such as manifest file representations, system path and name.

@@ -333,7 +361,33 @@ logging.info(input_files_by_name['image.jpg'])

```

## Processing state files
## Output File Manifests and storing files

The component may also store results in [File Storage](https://help.keboola.com/storage/files/).
The library provides methods that simplify manifest file creation and allow defining the export options
and metadata of the result file using the helper object `FileDefinition`.

## Storing files

```python
from keboola.component import CommonInterface

# init the interface
ci = CommonInterface()

# create metadata container for the result. This file will be stored temporarily with tags 'my_tag' and 'my_tag2'
result_file = ci.create_out_file_definition('my_new_result_file.dat', tags=['my_tag', 'my_tag2'], is_public=False,
                                            is_permanent=False)

with open(result_file.full_path, 'w+') as result:
    result.write('something')

ci.write_manifest(result_file)


```
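
For orientation, here is a standard-library sketch of roughly what ends up on disk after such a call: the result file
plus a `<file name>.manifest` JSON file next to it. The `write_file_with_manifest` helper is hypothetical, and the
manifest keys (`tags`, `is_public`, `is_permanent`) are assumed from the Common Interface file manifest format; the
exact content written by `write_manifest` may differ.

```python
import json
import os
import tempfile


def write_file_with_manifest(files_dir, file_name, content, tags, is_public=False, is_permanent=False):
    """Write a result file and a '<file_name>.manifest' JSON file next to it."""
    file_path = os.path.join(files_dir, file_name)
    with open(file_path, 'w', encoding='utf-8') as result:
        result.write(content)

    # assumed manifest keys; the library may write additional attributes
    manifest = {'tags': tags, 'is_public': is_public, 'is_permanent': is_permanent}
    manifest_path = file_path + '.manifest'
    with open(manifest_path, 'w', encoding='utf-8') as manifest_file:
        json.dump(manifest, manifest_file)
    return manifest_path


with tempfile.TemporaryDirectory() as files_dir:
    manifest_path = write_file_with_manifest(files_dir, 'my_new_result_file.dat', 'something',
                                             tags=['my_tag', 'my_tag2'])
    with open(manifest_path, encoding='utf-8') as manifest_file:
        written_manifest = json.load(manifest_file)

print(os.path.basename(manifest_path))
print(written_manifest)
```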

## Component state and State Files

[State files](https://developers.keboola.com/extend/common-interface/config-file/#state-file) can be easily written and
loaded using the `get_state_file()` and `write_state_file()` methods: