feat: introduce Iceberg table type using metadata file #758

Merged: 5 commits, Jan 8, 2025
24 changes: 24 additions & 0 deletions proto/substrait/algebra.proto
@@ -114,6 +114,7 @@ message ReadRel {
    LocalFiles local_files = 6;
    NamedTable named_table = 7;
    ExtensionTable extension_table = 8;
    IcebergTable iceberg_table = 9;
  }

  // A base table. The list of string is used to represent namespacing (e.g., mydb.mytable).
@@ -123,6 +124,29 @@ message ReadRel {
    substrait.extensions.AdvancedExtension advanced_extension = 10;
  }

  // Read an Iceberg Table
  message IcebergTable {
    oneof table_type {
      MetadataFileRead direct = 1;
      // future: add catalog table types (e.g. rest api, latest metadata in path, etc)
    }

    // Read an Iceberg table using a metadata file. Implicit assumption: required credentials are already known by plan consumer.
    message MetadataFileRead {
      // the specific uri of a metadata file (e.g. s3://mybucket/mytable/<ver>-<uuid>.metadata.json)
      string metadata_uri = 1;

      // snapshot options. if none set, uses the current snapshot listed in the metadata file
      oneof snapshot {
        // the snapshot id to read.
Review comment (Member): Nit: Other sections capitalize the first letter of a comment.

        string snapshot_id = 2;

        // the timestamp that should be used to select the snapshot (Time passed in microseconds since 1970-01-01 00:00:00.000000 in UTC)
        int64 snapshot_timestamp = 3;
      }
    }
  }

  // A table composed of expressions.
  message VirtualTable {
    repeated Expression.Literal.Struct values = 1 [deprecated = true];
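
The `snapshot` oneof in `MetadataFileRead` means a producer sets `snapshot_id`, or `snapshot_timestamp`, or neither. The following is a minimal Python sketch of that constraint, not generated protobuf code: the `MetadataFileRead` dataclass and the `to_epoch_micros` helper are hypothetical names introduced here for illustration. The helper converts a timezone-aware datetime into microseconds since 1970-01-01 00:00:00.000000 UTC, the unit the proto comment specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class MetadataFileRead:
    """Hypothetical mirror of the proto message, for illustration only."""

    metadata_uri: str
    snapshot_id: Optional[str] = None
    # Microseconds since the Unix epoch, in UTC.
    snapshot_timestamp: Optional[int] = None

    def __post_init__(self) -> None:
        if not self.metadata_uri:
            raise ValueError("metadata_uri is required")
        # Emulate the proto `oneof snapshot`: at most one member may be set.
        if self.snapshot_id is not None and self.snapshot_timestamp is not None:
            raise ValueError("set only one of snapshot_id or snapshot_timestamp")


def to_epoch_micros(moment: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Dividing two timedeltas with // yields an exact integer.
    return (moment - epoch) // timedelta(microseconds=1)


# The URI below is the placeholder pattern from the proto comment, not a real file.
read = MetadataFileRead(
    metadata_uri="s3://mybucket/mytable/<ver>-<uuid>.metadata.json",
    snapshot_timestamp=to_epoch_micros(datetime(2025, 1, 8, tzinfo=timezone.utc)),
)
```

Leaving both snapshot fields unset corresponds to reading the current snapshot listed in the metadata file.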
14 changes: 14 additions & 0 deletions site/docs/relations/logical_relations.md
@@ -95,6 +95,20 @@ possible approach is that a chunk should only be read if the midpoint of the chu
%%% proto.algebra.ReadRel %%%
```

#### Iceberg Table Type

An Iceberg Table is a table built on [Apache Iceberg](https://iceberg.apache.org/). Iceberg tables can be read either by directly reading a [metadata file](https://iceberg.apache.org/spec/#table-metadata) or by consulting a [catalog](https://iceberg.apache.org/concepts/catalog/).

##### Metadata File Reading

Points to an [Iceberg metadata file](https://iceberg.apache.org/spec/#table-metadata) and uses it as the starting point for reading an Iceberg table. This is the simplest form of Iceberg table access, but it should be used only for reads, since writes often also need to update an external catalog.

| Property           | Description                                                      | Required |
| ------------------ | ---------------------------------------------------------------- | -------- |
| metadata_uri | A URI for an Iceberg metadata file. The current snapshot is read from this file unless a snapshot option is set. | Required |
| snapshot_id | The id of the snapshot to read. If not provided, the current snapshot is read. Only one of snapshot_id or snapshot_timestamp should be set. | Optional |
| snapshot_timestamp | The timestamp used to select the snapshot to read, in microseconds since 1970-01-01 00:00:00.000000 UTC. If not provided, the current snapshot is read. Only one of snapshot_id or snapshot_timestamp should be set. | Optional |
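
As an illustration of how a consumer might apply these properties, here is a minimal Python sketch that picks a snapshot from already-parsed table metadata. It is a sketch under stated assumptions, not behavior defined by this change: `resolve_snapshot` is a hypothetical helper, `snapshot_id` is assumed to be the decimal string form of the Iceberg snapshot id, and `snapshot_timestamp` is assumed to select the latest snapshot at or before the given time. The JSON field names (`current-snapshot-id`, `snapshots`, `snapshot-id`, `timestamp-ms`) come from the Iceberg table metadata format, which records snapshot times in milliseconds, so the microsecond value is converted before comparing.

```python
import json
from typing import Optional


def resolve_snapshot(
    metadata: dict,
    snapshot_id: Optional[str] = None,
    snapshot_timestamp: Optional[int] = None,
) -> dict:
    """Return the snapshot entry to read (hypothetical helper)."""
    if snapshot_id is not None and snapshot_timestamp is not None:
        raise ValueError("set only one of snapshot_id or snapshot_timestamp")

    snapshots = metadata.get("snapshots", [])

    if snapshot_timestamp is not None:
        # Assumption: "as of" semantics, latest snapshot at or before the cutoff.
        cutoff_ms = snapshot_timestamp // 1000  # microseconds -> milliseconds
        candidates = [s for s in snapshots if s["timestamp-ms"] <= cutoff_ms]
        if not candidates:
            raise LookupError("no snapshot at or before the requested timestamp")
        return max(candidates, key=lambda s: s["timestamp-ms"])

    # Fall back to the current snapshot when no option is set.
    wanted = snapshot_id
    if wanted is None:
        wanted = str(metadata.get("current-snapshot-id"))
    for snapshot in snapshots:
        # Assumption: the string id is the decimal form of the stored id.
        if str(snapshot["snapshot-id"]) == wanted:
            return snapshot
    raise LookupError(f"snapshot {wanted} not found in metadata")


# Tiny made-up metadata document, reduced to the fields used above.
SAMPLE_METADATA = json.loads("""
{
  "current-snapshot-id": 3,
  "snapshots": [
    {"snapshot-id": 1, "timestamp-ms": 1000},
    {"snapshot-id": 2, "timestamp-ms": 2000},
    {"snapshot-id": 3, "timestamp-ms": 3000}
  ]
}
""")

current = resolve_snapshot(SAMPLE_METADATA)
by_id = resolve_snapshot(SAMPLE_METADATA, snapshot_id="2")
as_of = resolve_snapshot(SAMPLE_METADATA, snapshot_timestamp=2_500_000)
```

A real consumer would first fetch and parse the file at `metadata_uri` using credentials it already holds; that step is omitted here.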


## Filter Operation
