
feat: DH-18143: Improve handling of sort order for Iceberg tables #6646

Open

wants to merge 13 commits into main
Conversation

@malhotrashivam (Contributor) commented Feb 13, 2025

Related to DH-18143

@malhotrashivam added this to the 0.38.0 milestone Feb 13, 2025
@malhotrashivam self-assigned this Feb 13, 2025
@malhotrashivam marked this pull request as draft February 13, 2025 21:54
@malhotrashivam force-pushed the nightly/sm-sort-order branch 2 times, most recently from fe2f89f to 67d1003 on February 14, 2025 19:04
@malhotrashivam removed the request for review from devinrsmith February 14, 2025 19:04
@malhotrashivam (Contributor, Author) commented:

This PR has three commits as follows:

  • Commit 1: If the manifest file indicates that the data file is sorted by specific columns, we should return those column names from TableLocation::getSortedColumns. This can later help with predicate-pushdown-style filtering (see the sketch at the end of this comment).
  • Commit 2: If a Deephaven table being written is sorted on a column, we should set the corresponding sort order in the manifest file.
  • Commit 3: If the table has a default sort order, we should allow the user to opt into or out of sorting on the default sort columns during writing.

I am not a fan of Commit 2, for two reasons (please review the PR so the following makes more sense):

  • Commit 2 adds extra complexity to the Iceberg writing code, which is not needed right now.
  • Commit 2 can only add a single column name to the data file, since we have a constraint (https://deephaven.atlassian.net/browse/DH-18700). That single column name is also added to the parquet file as a sort column. Therefore, even if we don't add the column name to the data file, Deephaven would still read it as a sort column through the parquet reading code, and all tests would pass. However, other Iceberg reading tools would not know about such sort columns if we don't add them to the data file.

So I think we can remove Commit 2 from this PR.
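
A minimal sketch (my own illustration, not part of the PR; canUseSortedSearch and filterColumn are hypothetical names, and the import paths are approximate) of how a downstream consumer might use the sorted-column hint from Commit 1:

import java.util.List;
import io.deephaven.api.SortColumn;
import io.deephaven.engine.table.impl.locations.TableLocation; // approximate package

// Hypothetical consumer: a pushdown-style filter could binary-search a location's data
// when the leading sort column reported by the location matches the filter column.
static boolean canUseSortedSearch(final TableLocation location, final String filterColumn) {
    final List<SortColumn> sorted = location.getSortedColumns();
    return !sorted.isEmpty() && sorted.get(0).column().name().equals(filterColumn);
}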

@malhotrashivam malhotrashivam marked this pull request as ready for review February 14, 2025 20:54
final Schema schema = sortOrder.schema();
final List<SortColumn> sortColumns = new ArrayList<>(sortOrder.fields().size());
for (final SortField field : sortOrder.fields()) {
    final ColumnName columnName = ColumnName.of(schema.findColumnName(field.sourceId()));
Member:

This might throw an InvalidNameException; we might need to wait for some Resolver work I'm doing in https://deephaven.atlassian.net/browse/DH-18365 to land so we can properly map field ids.
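
A small defensive sketch (my own illustration, not code from this PR; treating the failure as "stop reporting sort columns" is an assumption) of how the invalid-name case could be tolerated until the Resolver work lands:

import io.deephaven.api.ColumnName;

// Sketch only: tolerate Iceberg column names that are not legal Deephaven column names.
private static ColumnName tryMapColumnName(final String icebergColumnName) {
    try {
        return ColumnName.of(icebergColumnName);
    } catch (RuntimeException e) {
        // Assumed: ColumnName.of rejects illegal names with a RuntimeException
        // (e.g. InvalidNameException); a caller would stop collecting sort columns here.
        return null;
    }
}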

Member:

I would not mark this as resolved yet

Comment on lines +75 to +79
if (field.nullOrder() == NullOrder.NULLS_FIRST && field.direction() == SortDirection.ASC) {
    sortColumn = SortColumn.asc(columnName);
} else if (field.nullOrder() == NullOrder.NULLS_LAST && field.direction() == SortDirection.DESC) {
    sortColumn = SortColumn.desc(columnName);
} else {
Member:

We should raise the issue of nulls-first vs. nulls-last with the engine team. Arguably, this is something we should want to support.

Additionally, we may need to hold off on handling any floating-point columns:

-NaN < -Infinity < -value < -0 < 0 < value < Infinity < NaN (https://iceberg.apache.org/spec/#sorting)

The -NaN vs. NaN distinction is something I have not seen before, but it is another issue to raise with the engine team.

In the meantime, I think the strategy of breaking and returning what we have so far should be OK.
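
A minimal sketch (my own illustration; sortColumnsFromIceberg is a hypothetical helper name, and the types are those already used in the excerpt above) of the break-and-return strategy: map each Iceberg SortField to a Deephaven SortColumn and stop at the first field whose direction/null-order combination cannot be represented, returning the prefix mapped so far.

import java.util.ArrayList;
import java.util.List;
import org.apache.iceberg.NullOrder;
import org.apache.iceberg.Schema;
import org.apache.iceberg.SortDirection;
import org.apache.iceberg.SortField;
import org.apache.iceberg.SortOrder;
import io.deephaven.api.ColumnName;
import io.deephaven.api.SortColumn;

static List<SortColumn> sortColumnsFromIceberg(final SortOrder sortOrder) {
    final Schema schema = sortOrder.schema();
    final List<SortColumn> sortColumns = new ArrayList<>(sortOrder.fields().size());
    for (final SortField field : sortOrder.fields()) {
        final ColumnName columnName = ColumnName.of(schema.findColumnName(field.sourceId()));
        if (field.nullOrder() == NullOrder.NULLS_FIRST && field.direction() == SortDirection.ASC) {
            sortColumns.add(SortColumn.asc(columnName));
        } else if (field.nullOrder() == NullOrder.NULLS_LAST && field.direction() == SortDirection.DESC) {
            sortColumns.add(SortColumn.desc(columnName));
        } else {
            // Combinations such as NULLS_LAST ascending (or NaN-aware float ordering)
            // cannot be expressed as a Deephaven SortColumn today; keep only the prefix.
            break;
        }
    }
    return sortColumns;
}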

 * @return A stream of {@link DataFile} objects.
 */
public static Stream<DataFile> allDataFiles(@NotNull final Table table, @NotNull ManifestFile manifestFile) {
    return toStream(ManifestFiles.read(manifestFile, table.io()));
Member:

I recently learned that the files themselves may have metadata, i.e., org.apache.iceberg.ManifestReader#spec. It makes me cautious about extending these helper methods too far. While we aren't passing along ManifestReader#spec today, we may need to in the future and might need to model it appropriately.
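
A minimal sketch (my own illustration; allDataFilesWithSpec and the pair shape are hypothetical, not part of this PR) of what surfacing ManifestReader#spec alongside the files could look like:

import java.util.Map;
import java.util.stream.Stream;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.ManifestFiles;
import org.apache.iceberg.ManifestReader;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Table;

// Hypothetical helper: keep the manifest reader's partition spec instead of discarding it.
public static Map.Entry<PartitionSpec, Stream<DataFile>> allDataFilesWithSpec(
        final Table table, final ManifestFile manifestFile) {
    final ManifestReader<DataFile> reader = ManifestFiles.read(manifestFile, table.io());
    return Map.entry(reader.spec(), toStream(reader)); // toStream as in the existing helper above
}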

Comment on lines 59 to 68
private List<SortColumn> computeSortedColumns() {
    final Integer sortOrderId = dataFile.sortOrderId();
    // If sort order ID is missing, unknown or unsorted, we fall back to reading sort columns from the parquet file
    if (sortOrderId == null) {
        return super.getSortedColumns();
    }
    final SortOrder sortOrder = tableAdapter.icebergTable().sortOrders().get(sortOrderId);
    if (sortOrder == null || sortOrder.isUnsorted()) {
        return super.getSortedColumns();
    }
Member:

It's an interesting question: if the metadata exists on the file itself, should we prefer it? I can imagine a case where we are setting more specific sort column information in the file itself.

For example, maybe Iceberg knows this table is sorted on columns [A, B], but the parquet metadata gives us more information that it is sorted on columns [A, B, C].

There's also an argument to be made that we should completely ignore the metadata from the file itself, and only rely on Iceberg. This saves us from needing to materialize the parquet file metadata (at least from this code path). In particular, if Iceberg explicitly gives us back sortOrder.isUnsorted(), maybe we should be okay just returning an empty list?

It's also possible that we want this to be configurable... it's not obvious to me what the best course of action is.
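
A minimal sketch (my own illustration; the enum and method names are hypothetical) of what making this configurable could look like, covering the three options weighed above:

import java.util.List;
import io.deephaven.api.SortColumn;

// Hypothetical preference for which sort metadata wins when both sources are available.
enum SortMetadataSource {
    ICEBERG_ONLY,   // trust only the Iceberg sort order; never open the parquet metadata
    PARQUET_ONLY,   // ignore Iceberg; rely on the parquet footer (the current fallback path)
    PREFER_PARQUET  // use parquet metadata when present, otherwise the Iceberg sort order
}

static List<SortColumn> resolveSortedColumns(
        final SortMetadataSource source,
        final List<SortColumn> fromIceberg,   // possibly empty, e.g. when sortOrder.isUnsorted()
        final List<SortColumn> fromParquet) { // what the parquet-based path reports today
    switch (source) {
        case ICEBERG_ONLY:
            return fromIceberg;
        case PARQUET_ONLY:
            return fromParquet;
        case PREFER_PARQUET:
        default:
            return fromParquet.isEmpty() ? fromIceberg : fromParquet;
    }
}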

Comment on lines +42 to +50
public List<SortColumn> getSortedColumns() {
    return sortedColumns == null ? super.getSortedColumns() : sortedColumns;
}

@Nullable
private static List<SortColumn> computeSortedColumns(
        @NotNull final IcebergTableAdapter tableAdapter,
        @NotNull final DataFile dataFile) {
    final Integer sortOrderId = dataFile.sortOrderId();
Member:

Need to think about behavior when unsorted (either because the sort order ID is null or because it is explicitly set to unsorted)...

devinrsmith previously approved these changes Mar 6, 2025
@classmethod
def from_sort_id(cls, sort_order_id: int) -> 'SortOrderProvider':
    """
    Use the sort order with the given ID to sort new data while writing to the iceberg table.
Member:

I have no idea what any of this means. It needs more description, e.g., why would a sort order have an ID? How would I know what the IDs are? etc.

Contributor (Author):

Sort order is actually an Iceberg concept (https://iceberg.apache.org/spec/#sorting), and I don't want to add too much detail about it in our documentation.
So I have added a link to the spec in two places. Let me know if this makes it any better.
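
A minimal usage sketch (my own illustration; the module path deephaven.experimental.iceberg is an assumption, not verbatim from this PR):

from deephaven.experimental.iceberg import SortOrderProvider  # assumed module path

# Sort new data using the Iceberg table's sort order with ID 1; valid IDs are the
# sort orders recorded in the Iceberg table's own metadata (see the spec link above).
provider = SortOrderProvider.from_sort_id(1)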

Comment on lines 332 to 335
Returns a sort order provider that delegates to this provider for computing the columns to sort on, but writes a
different sort order ID to the iceberg table.
For example, this provider might return fields {A, B, C} to sort on, but the ID written to iceberg corresponds
to sort order with fields {A, B}.
Member:

Not clear enough. See earlier comments.

to sort order with fields {A, B}.

Args:
    sort_order_id (int): the sort order ID to write to the iceberg table.
Member:

Tables have sort order IDs stored with them? This is the kind of thing that isn't documented well enough. No part of this documentation would have suggested such a thing to me. There is a lot of assumed knowledge that needs to be made explicit in the docs for the new code.

Comment on lines 401 to 404
sort_order_provider: Optional[SortOrderProvider]: Used to provide SortOrder to be used for sorting new data
while writing to an iceberg table using this writer. Note that we select the sort order of the Table at
the time the writer is constructed, and it does not change if the table's sort order changes. Defaults
to `None`, which means use the table's default sort order.
Member:

See earlier comments on needing a more readable, clear, and expanded docstring.
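
A minimal sketch (my own illustration; TableParquetWriterOptions, my_table_definition, and the keyword names other than sort_order_provider are assumptions rather than verbatim API from this PR) of where this option would plug in:

from deephaven.experimental.iceberg import SortOrderProvider, TableParquetWriterOptions  # assumed imports

# Hypothetical: the sort order is captured when the writer options are built, as the
# docstring above notes, and new data is sorted by the table's sort order with ID 1.
writer_options = TableParquetWriterOptions(
    table_definition=my_table_definition,                    # hypothetical definition
    sort_order_provider=SortOrderProvider.from_sort_id(1),
)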

Comment on lines 274 to 276
class SortOrderProvider(JObjectWrapper):
    """
    :class:`.SortOrderProvider` is used for providing SortOrder to be used for sorting new data while writing to an
Contributor:

  1. I don't see any test for the new class.
  2. The name SortOrderProvider seems to suggest that it might be used to provide a SortOrder object to be used when working with Iceberg data, but in fact all the factory methods return an instance of itself. I am not opposed to this pattern per se, but the docstrings on the methods could probably be more explicit.

Contributor (Author):

We still don't have unit testing for Iceberg. We have an open ticket for that (DH-18261), but it hasn't been prioritised yet.
