`RecordBatch` normalization (flattening) #6758
base: main
… iterative function for `RecordBatch`. Not sure which one is better currently.
I have some questions about the implementation, since the one example from PyArrow doesn't clarify the edge cases. Normalizing the `Schema` seems fairly straightforward to me; I'm just not sure about:
- Whether the iterative or the recursive approach is better (or whether there is something I missed)
- Whether `DataType::Struct` is the only `DataType` that requires flattening. To me, it looks like it's the only one that can contain nested `Field`s.
(I'm also not sure if I'm missing something with unwrapping a type like `List<Struct>`.)
Any feedback/help would be appreciated!
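For reference, the recursive option can be sketched on a stripped-down model of the schema. This is only an illustration: the `DataType` and `Field` types below are simplified stand-ins rather than the arrow-rs types, and `normalize_fields` and `flatten_into` are hypothetical names, not the functions in this PR. It assumes that `Struct` is the only variant carrying child fields and that nested names are joined with a caller-supplied separator.

```rust
// Simplified stand-ins for arrow's `DataType` and `Field` (assumption: not the real types).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }
}

/// Flattens every `Struct` field into its leaf fields, joining names with `separator`.
/// Hypothetical helper; the name and signature are illustrative only.
fn normalize_fields(fields: &[Field], separator: &str) -> Vec<Field> {
    let mut out = Vec::new();
    for field in fields {
        flatten_into(field, None, separator, &mut out);
    }
    out
}

/// Recursive step: descend into struct children, emit everything else as a leaf.
fn flatten_into(field: &Field, prefix: Option<&str>, separator: &str, out: &mut Vec<Field>) {
    // Build the fully qualified name for this field.
    let name = match prefix {
        Some(p) => format!("{}{}{}", p, separator, field.name),
        None => field.name.clone(),
    };
    match &field.data_type {
        DataType::Struct(children) => {
            for child in children {
                flatten_into(child, Some(&name), separator, out);
            }
        }
        other => out.push(Field::new(&name, other.clone())),
    }
}
```

In this sketch a struct with no children contributes no output field, which is one of the edge cases the real implementation would need to decide on.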
let field_name = field.name().as_str();
new_fields = [
new_fields,
Self::normalizer(
Not sure if it's better to have it be recursive or iterative.
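For comparison, an iterative version could replace the recursion with an explicit stack. The sketch below uses simplified stand-in types (not the arrow-rs `Field` and `DataType`) and a hypothetical function name, `normalize_fields_iterative`. Children are pushed in reverse so that the output keeps the original field order.

```rust
// Simplified stand-ins for arrow's `DataType` and `Field` (assumption: not the real types).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Iterative flattening with an explicit stack of (qualified name, field) pairs.
/// Hypothetical helper; the name and signature are illustrative only.
fn normalize_fields_iterative(fields: &[Field], separator: &str) -> Vec<Field> {
    let mut out = Vec::new();
    // Reverse the input so that the first field is popped first.
    let mut stack: Vec<(String, &Field)> = fields
        .iter()
        .rev()
        .map(|f| (f.name.clone(), f))
        .collect();

    while let Some((name, field)) = stack.pop() {
        match &field.data_type {
            DataType::Struct(children) => {
                // Push children in reverse to preserve their declared order.
                for child in children.iter().rev() {
                    let child_name = format!("{}{}{}", name, separator, child.name);
                    stack.push((child_name, child));
                }
            }
            // Any non-struct type is treated as a leaf column.
            other => out.push(Field { name, data_type: other.clone() }),
        }
    }
    out
}
```

The explicit stack avoids deep call chains on heavily nested schemas, at the cost of the reversal bookkeeping; for typical schema depths either form is probably fine.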
(c.clone(), Arc::new(c_field.clone()) as ArrayRef),
]));

/*let exclamation_field = Arc::new(StructArray::from(vec![
It looks like this should work to me, based on the schema, but I get an error when trying to construct this `Field`, and I'm not sure what I'm missing here.
error[E0277]: the trait bound `std::sync::Arc<struct_array::StructArray>: array::Array` is not satisfied
    --> arrow-array/src/record_batch.rs:1353:27
     |
1353 |             (one.clone(), Arc::new(one_field.clone()) as ArrayRef),
     |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `array::Array` is not implemented for `std::sync::Arc<struct_array::StructArray>`
     |
     = help: the trait `array::Array` is implemented for `std::sync::Arc<(dyn array::Array + 'static)>`
     = note: required for the cast from `std::sync::Arc<std::sync::Arc<struct_array::StructArray>>` to `std::sync::Arc<(dyn array::Array + 'static)>`
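One reading of the cast note is that `one_field` is already an `Arc<StructArray>`, so `Arc::new(one_field.clone())` builds an `Arc<Arc<StructArray>>`, which cannot be cast to `ArrayRef`. The sketch below reproduces that situation with minimal stand-ins for the `Array` trait, `StructArray`, and `ArrayRef` (they are not the arrow-rs definitions, and `num_rows` and `to_array_ref` are hypothetical names). Under that assumption, cloning the existing `Arc` and coercing it directly should be enough.

```rust
use std::sync::Arc;

// Minimal stand-ins for arrow's `Array`, `StructArray`, and `ArrayRef` (assumptions).
trait Array {
    fn num_rows(&self) -> usize;
}

struct StructArray {
    rows: usize,
}

impl Array for StructArray {
    fn num_rows(&self) -> usize {
        self.rows
    }
}

type ArrayRef = Arc<dyn Array>;

/// Converts an already reference-counted struct array into a trait object.
fn to_array_ref(one_field: &Arc<StructArray>) -> ArrayRef {
    // `Arc::new(one_field.clone()) as ArrayRef` would wrap the value twice
    // (`Arc<Arc<StructArray>>`), and `Arc<StructArray>` does not implement
    // `Array` here, so that cast is rejected with E0277.
    // Cloning the existing `Arc` only bumps the reference count.
    one_field.clone() as ArrayRef
}
```

If `one_field` were a plain `StructArray` instead, the original `Arc::new(one_field.clone()) as ArrayRef` form would be the one that compiles.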
Which issue does this PR close?
Closes #6369.
Rationale for this change
Adds normalization (flattening) for `RecordBatch`, with normalization via `Schema`. Based on pandas/pola-rs.
What changes are included in this PR?
Are there any user-facing changes?