- Plugin type: filter
This plugin allows users to translate rows flexibly by SQL queries specified by them.
This plugin allows translating rows by SQL queries in Pages received from input plugin and sending the query results to next filter or output plugin as modified Pages. It uses Apache Calcite, which is the foundation for your next high-performance database and enbles executing SQL queries to customized storage by the custom adaptor. The plugin applies Page storage adaptor to Apache Calcite and then enables executing SQL queries to Pages via JDBC Driver provided.
Here is Embulk config example for this plugin:
filters:
- type: calcite
query: SELECT * FROM $PAGES
Users can define SELECT
query as query option in the filter config section. $PAGES
represents Pages that input plugin creates and sends. $PAGES
schema is Embulk input schema given. On the other hand, the output schema of the plugin is built from the metadata of query result. Embulk types are converted into Apache Calcite types internally. This is type mapping between Embulk and Apache Calcite.
Embulk type | Apache Calcite type | JDBC type |
---|---|---|
boolean | BOOLEAN | java.lang.Boolean |
long | BIGINT | java.lang.Long |
double | DOUBLE | java.lang.Double |
timestamp | TIMESTAMP | java.sql.Timestamp |
string | VARCHAR | java.lang.String |
json | VARCHAR | java.lang.String |
Data types by Apache Calcite: https://calcite.apache.org/docs/reference.html#data-types
- query: SQL to run (string, required)
- default_timezone: Configure timezone that is used for JDBC connection properties and Calcite engine. This option is one of JDBC connect parameters provided by Apache Calcite. java.util.TimeZone's AvailableIDs can be specified. (string, default: 'UTC')
- options: extra JDBC properties. See JDBC connect parameters. (hash, default: {})
This config enables removing rows not associated to id 1 and 2 from Pages.
filters:
- type: calcite
query: SELECT * FROM $PAGES WHERE id IN (1, 2)
The following is an example by LIKE operator and enables removing rows not matched at a specified pattern from Pages.
filters:
- type: calcite
query: SELECT * FROM $PAGES WHERE message LIKE '%EMBULK%'
This enables adding new column and inserting the value combined 2 string column values.
filters:
- type: calcite
query: SELECT first_name || last_name AS name, * FROM $PAGES
Adds the new column by CURRENT_TIMESTAMP function.
filters:
- type: calcite
query: SELECT CURRENT_TIMESTAMP, * FROM $PAGES
default_timezone: 'America/Los_Angeles'
SQL language provided by Apache Calcite: https://calcite.apache.org/docs/reference.html
$ ./gradlew gem
$ ./gradlew gemPush