[BUG](python): The order of the table was changed after executing the `compact_files` operation #3465

ZhuochengZhang98 · 2025-02-20T02:26:58Z

Lance Version

pylance 0.20.0

What happened

The order of the dataset was changed after executing the compact_files operation.

My code is as follows:

import lance
from lance.dataset import DatasetOptimizer

DB_PATH = "<path-to-my-dataset>"

def main():
    dataset = lance.dataset(DB_PATH)
    print(dataset.take([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))  # show the first ten rows

    # compact the dataset
    optim = DatasetOptimizer(dataset)
    optim.compact_files(num_threads=8)
    print(dataset.take([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))  # the first ten rows have changed
    return

if __name__ == "__main__":
    main()

My environment

I am using an Ubuntu server with 64 cores and 512 GB of memory.
The dataset has 5 columns: title(str), section(str), text(str), id(str), and vector(list[float]).

How to reproduce

This dataset has 38 million records, each with a 768-dimensional vector and a payload. I'm not sure if it's feasible to share the dataset.


westonpace · 2025-02-20T02:57:25Z

If there are enough files to justify multiple concurrent compaction tasks (by default this would mean at least 2Mi uncompacted rows) then we run compaction tasks in parallel.

I'm not sure whether or not we sequence the results but this seems a likely candidate for the reordering.
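To illustrate the hypothesis above, here is a toy model in plain Python. It is not Lance's implementation: `compact_fragment`, the fragment contents, and the delays are all made up for the sketch. It only shows that collecting results from parallel tasks in completion order can reorder rows, while collecting them in submission order cannot.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for fragments; NOT Lance's data structures.
FRAGMENTS = {0: ["a", "b"], 1: ["c", "d"], 2: ["e", "f"]}
# Made-up per-task durations so that tasks finish out of submission order.
DELAYS = {0: 0.3, 1: 0.1, 2: 0.2}


def compact_fragment(frag_id, rows):
    """Pretend to compact one fragment; sleeps to simulate uneven work."""
    time.sleep(DELAYS[frag_id])
    return list(rows)


def run_compaction(sequenced):
    """Run all tasks in parallel and concatenate their output rows."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(compact_fragment, frag_id, rows)
            for frag_id, rows in FRAGMENTS.items()
        ]
        if sequenced:
            # Collect in submission order: original row order is preserved.
            results = [f.result() for f in futures]
        else:
            # Collect in completion order: faster tasks land first.
            results = [f.result() for f in as_completed(futures)]
    return [row for rows in results for row in rows]


sequenced_rows = run_compaction(sequenced=True)
unsequenced_rows = run_compaction(sequenced=False)
print("sequenced:  ", sequenced_rows)
print("unsequenced:", unsequenced_rows)
```

With these delays the unsequenced run will typically start with the rows of the fastest task rather than with `"a"`, which is the kind of reordering reported in this issue.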

wjones127 · 2025-02-20T03:04:29Z

In general, we don’t guarantee that the order of rows stays the same. Rows will also change order if you update them.

ZhuochengZhang98 · 2025-02-20T06:34:32Z

Thank you for your reply. I think I have to find another way to retrieve the rows I need.
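One possible way to do that, since positional order is not guaranteed, is to look rows up by a key column instead of by position. The sketch below assumes the `id` column mentioned earlier is unique. `take_by_ids` is a hypothetical helper, not part of pylance; it relies only on the dataset's `to_table(filter=...)` method with an SQL-style filter string.

```python
def take_by_ids(dataset, ids, id_column="id"):
    """Fetch rows by key value and return them in the order of `ids`.

    Hypothetical helper: assumes `dataset.to_table(filter=...)` accepts an
    SQL-style filter and that `id_column` holds unique string values.
    """
    if not ids:
        return []
    # Quote each id as a SQL string literal, doubling embedded single quotes.
    quoted = ", ".join("'" + str(i).replace("'", "''") + "'" for i in ids)
    table = dataset.to_table(filter=f"{id_column} IN ({quoted})")
    # The scan may return rows in any order, so re-order them by the request.
    by_id = {row[id_column]: row for row in table.to_pylist()}
    return [by_id[i] for i in ids if i in by_id]
```

Calling `take_by_ids(lance.dataset(DB_PATH), ["<id-1>", "<id-2>"])` should then return the same rows before and after compaction, whatever their physical position. On a dataset of this size a scalar index on the key column may be needed to keep such lookups fast.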
