feat: support datatype struct reorder #4962
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,7 +41,7 @@ use std::sync::Arc; | |
#[derive(Clone, Eq, PartialEq, Ord, PartialOrd, Hash)] | ||
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))] | ||
#[cfg_attr(feature = "serde", serde(transparent))] | ||
- pub struct Fields(Arc<[FieldRef]>); | ||
+ pub struct Fields(Arc<Vec<FieldRef>>); | ||
|
||
impl std::fmt::Debug for Fields { | ||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
|
@@ -50,9 +50,13 @@ impl std::fmt::Debug for Fields { | |
} | ||
|
||
impl Fields { | ||
+ pub fn new(fields: Arc<Vec<FieldRef>>) -> Self { | ||
We provide |
||
+ Self(fields) | ||
+ } | ||
|
||
/// Returns a new empty [`Fields`] | ||
pub fn empty() -> Self { | ||
- Self(Arc::new([])) | ||
+ Self(Arc::new(vec![])) | ||
} | ||
|
||
/// Return size of this instance in bytes. | ||
|
@@ -83,6 +87,16 @@ impl Fields { | |
.zip(other.iter()) | ||
.all(|(a, b)| Arc::ptr_eq(a, b) || a.contains(b)) | ||
} | ||
|
||
+ pub fn reverse(&mut self) { | ||
+ let new_fields: Vec<FieldRef> = self.iter().rev().map(|f| f.clone() as FieldRef).collect(); | ||
+ self.0 = Arc::new(new_fields); | ||
+ } | ||
Comment on lines +91 to +94
This only supports reverse; should we provide more functions like this? |
||
|
||
+ pub fn push(&mut self, field: Field) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will be extremely inefficient, performing multiple allocations each time, I would recommend using https://docs.rs/arrow-schema/latest/arrow_schema/struct.SchemaBuilder.html instead There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The proposal in this PR will add an additional pointer indirection to field access, whilst still requiring atomics for every call to push. I think the better question is, why not use SchemaBuilder, it will be faster, less complicated, and avoid having to make changes to Fields? TLDR is I don't think any of the changes in this PR should be necessary for #4908 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The problem is that the transformation of Fields into SchemaBuilder appears to be performance intensive. SchemaBuilder::from(self.clone()); The result could be something like this: pub fn push(&mut self, field: Field) {
let mut new_fields = SchemaBuilder::from(self.clone());
new_fields.push(field);
*self = new_fields.finish().fields;
} I am not sure why There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm suggesting not adding these methods to Schema and instead using SchemaBuilder instead where you were intending to use them. As you've discovered Schema, like the arrays, is not designed to be mutated in place, especially given it is normally wrapped in an Arc itself as a SchemaRef There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Really thanks for you. ❤️ So we need an function like pub fn push(&mut self, builder: &mut SchemaBuilder, field: Field) { } When the user want to push a new field, should take the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, they'd just use https://docs.rs/arrow-schema/latest/arrow_schema/struct.SchemaBuilder.html#method.push I'm suggesting not trying to make Schema support in place mutation, it isn't designed for it, SchemaBuilder is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. SchemaBuilder has support There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nothing, I am suggesting we close this PR, and you work on implementing #4908 using SchemaBuilder |
||
+ let fields = Arc::make_mut(&mut self.0); | ||
+ fields.push(Arc::new(field)); | ||
+ } | ||
} | ||
|
||
impl Default for Fields { | ||
|
@@ -99,7 +113,7 @@ impl FromIterator<Field> for Fields { | |
|
||
impl FromIterator<FieldRef> for Fields { | ||
fn from_iter<T: IntoIterator<Item = FieldRef>>(iter: T) -> Self { | ||
- Self(iter.into_iter().collect()) | ||
+ Self(Arc::new(iter.into_iter().map(|f| f as FieldRef).collect())) | ||
} | ||
} | ||
|
||
|
@@ -117,13 +131,13 @@ impl From<Vec<FieldRef>> for Fields { | |
|
||
impl From<&[FieldRef]> for Fields { | ||
fn from(value: &[FieldRef]) -> Self { | ||
- Self(value.into()) | ||
+ Self(Arc::new(value.to_vec())) | ||
} | ||
} | ||
|
||
impl<const N: usize> From<[FieldRef; N]> for Fields { | ||
fn from(value: [FieldRef; N]) -> Self { | ||
- Self(Arc::new(value)) | ||
+ Self(Arc::new(value.to_vec())) | ||
} | ||
} | ||
|
||
|
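The `push` thread above argues for staging edits in a builder rather than mutating `Fields` in place. A minimal std-only sketch of that pattern follows. `Field`, `Fields`, `FieldsBuilder`, and `field_names` are hypothetical stand-ins written for this example, not the arrow-schema types, and `FieldsBuilder` only approximates the role `SchemaBuilder` plays in the discussion.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for illustration only; NOT the arrow-schema types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}

type FieldRef = Arc<Field>;

/// Immutable, cheaply clonable field list backed by a single shared slice.
#[derive(Debug, Clone, PartialEq)]
struct Fields(Arc<[FieldRef]>);

/// Mutable staging area: every edit happens on a plain Vec.
struct FieldsBuilder {
    fields: Vec<FieldRef>,
}

impl FieldsBuilder {
    /// Copies the Arc handles out of an existing list (the Field values are not cloned).
    fn from_fields(fields: &Fields) -> Self {
        Self { fields: fields.0.to_vec() }
    }

    fn push(&mut self, field: Field) {
        self.fields.push(Arc::new(field));
    }

    fn reverse(&mut self) {
        self.fields.reverse();
    }

    /// Converts the Vec into a shared slice once all edits are done.
    fn finish(self) -> Fields {
        Fields(self.fields.into())
    }
}

/// Helper used only to inspect the result in this example.
fn field_names(fields: &Fields) -> Vec<String> {
    fields.0.iter().map(|f| f.name.clone()).collect()
}

fn main() {
    let original = Fields(
        vec![
            Arc::new(Field { name: "a".to_string() }),
            Arc::new(Field { name: "b".to_string() }),
        ]
        .into(),
    );

    let mut builder = FieldsBuilder::from_fields(&original);
    builder.push(Field { name: "c".to_string() });
    builder.reverse();
    let reordered = builder.finish();

    println!("{:?}", field_names(&reordered)); // ["c", "b", "a"]
    println!("{:?}", field_names(&original)); // ["a", "b"]
}
```

In this sketch all mutation happens on the `Vec`, the shared `Arc<[FieldRef]>` is created once in `finish`, and the original `Fields` value is left untouched.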
I don't think changing `[FieldRef]` to `Vec<FieldRef>` is necessary.

Rust's `[]` is fixed-size at compile time; if we want to push a field, shouldn't we use `Vec`?

This is not necessary, as a result of unsized coercion - https://doc.rust-lang.org/std/ops/trait.CoerceUnsized.html
https://docs.rs/arrow-schema/latest/arrow_schema/struct.SchemaBuilder.html provides an easy-to-use interface on top of this, but `Fields` also implements `FromIterator` and so can be collected into directly.
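
As a rough illustration of the unsized-coercion point, the std-only sketch below builds an `Arc<[T]>` from a fixed-size array and from an iterator. The element type `&'static str` and the function names are assumptions chosen for this example; they stand in for `FieldRef` and are not part of arrow-schema.

```rust
use std::sync::Arc;

/// A fixed-size array behind an Arc coerces to an Arc'd slice (unsized coercion),
/// so the slice length does not have to be known at compile time by callers.
fn from_fixed_array() -> Arc<[&'static str]> {
    Arc::new(["a", "b", "c"])
}

/// Arc<[T]> implements FromIterator, so any iterator can be collected into it.
fn from_iterator<I: IntoIterator<Item = &'static str>>(iter: I) -> Arc<[&'static str]> {
    iter.into_iter().collect()
}

/// "Reordering" without in-place mutation: build a new shared slice from the old one.
fn reversed(fields: &Arc<[&'static str]>) -> Arc<[&'static str]> {
    fields.iter().rev().copied().collect()
}

fn main() {
    let fixed = from_fixed_array();
    let collected = from_iterator(vec!["a", "b", "c"]);
    assert_eq!(fixed, collected);

    let rev = reversed(&fixed);
    println!("{:?}", rev); // ["c", "b", "a"]
}
```

Because the slice can be produced from an array, a `Vec`, or any iterator, growing or reordering a field list can be done by collecting a new slice rather than by storing a `Vec` inside the `Arc`.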