Commit 5fbea37

Authored Dec 3, 2024
Early return for num_shards==0 in the Beam pipeline. (#778)
1 parent f533cee commit 5fbea37

File tree

1 file changed: +4 -0 lines changed
  • python/mlcroissant/mlcroissant/_src/operation_graph

python/mlcroissant/mlcroissant/_src/operation_graph/execute.py

@@ -195,6 +195,10 @@ def execute_operations_in_beam(
         enumerate(files)
     )
     num_shards = len(files)
+    if not num_shards:
+        raise ValueError(
+            f"Empty {record_set=}. No files found for filters={json.dumps(filters)}"
+        )
 
     # We don't know in advance the number of records per shards. So we just allocate the
     # maximum number which is `sys.maxsize // num_shards`. Taking the practical case of
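
The context comment mentions `sys.maxsize // num_shards`, which suggests why the new guard matters: with zero files, that division would fail with a less informative `ZeroDivisionError`. The following is a minimal standalone sketch of that pattern, not the mlcroissant code itself. The function name `max_records_per_shard` and its parameters are hypothetical and exist only for illustration.

```python
import json
import sys


def max_records_per_shard(files, filters=None):
    """Hypothetical helper (not part of mlcroissant) showing the guard pattern."""
    num_shards = len(files)
    if not num_shards:
        # Fail early with a descriptive message instead of letting the
        # integer division below raise ZeroDivisionError.
        raise ValueError(f"No files found for filters={json.dumps(filters)}")
    # Upper bound on records per shard, as described in the diff's comment.
    return sys.maxsize // num_shards
```

Calling `max_records_per_shard([], {"split": "train"})` raises `ValueError`, while a non-empty list returns the per-shard bound.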
