-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] feat: add materialize_scan_results
arrow utility
#621
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #621 +/- ##
==========================================
- Coverage 83.45% 83.44% -0.01%
==========================================
Files 74 75 +1
Lines 16877 16893 +16
Branches 16877 16893 +16
==========================================
+ Hits 14084 14097 +13
+ Misses 2135 2133 -2
- Partials 658 663 +5 ☔ View full report in Codecov by Sentry. |
let batches: Vec<RecordBatch> = | ||
materialize_scan_results(table_changes_scan.execute(engine.clone())?).try_collect()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would work better to define a materialize_scan_result
function that takes DeltaResult<ScanResult>
and returns DeltaResult<RecordBatch>
? It would be a simpler type signature, and would also allow a more natural use at the call site:
let batches: Vec<RecordBatch> = | |
materialize_scan_results(table_changes_scan.execute(engine.clone())?).try_collect()?; | |
let batches: Vec<_> = table_changes_scan | |
.execute(engine.clone())? | |
.map(materialize_scan_result) | |
.try_collect()?; |
(more readable IMO even tho it technically has more lines of code)
let scan_res = res.and_then(|res| Ok((res.full_mask(), res.raw_data?))); | ||
let (mask, data) = scan_res?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let scan_res = res.and_then(|res| Ok((res.full_mask(), res.raw_data?))); | |
let (mask, data) = scan_res?; | |
let res = res? | |
let (mask, data) = (res.full_mask(), res.raw_data?); |
Ok(match mask { | ||
Some(mask) => filter_record_batch(&record_batch, &mask.into())?, | ||
None => record_batch, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is updating a mut record_batch
potentially simpler?
Ok(match mask { | |
Some(mask) => filter_record_batch(&record_batch, &mask.into())?, | |
None => record_batch, | |
}) | |
if let Some(mask) = mask { | |
record_batch = filter_record_batch(&record_batch, &mask.into())?; | |
} | |
Ok(record_batch) |
} | ||
}) | ||
.try_collect()?; | ||
let batches: Vec<RecordBatch> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Type annotation not needed, read_cdf_for_table
return value fully constrains it.
(probably a bunch of other call sites can also simplify -- even e.g. arrow print_batches
above, that takes a slice, still constrains the inner type so we can annotate as Vec<_>
)
What changes are proposed in this pull request?
arrow-compute
that pulls in arrow (for arrow-compute) TODO rename?ScanResult
s into an iterator ofRecordBatch
s by using arrow-compute'sfilter_record_batch
How was this change tested?
existing
resolves #592