Add API for Creating Variant Values #7452
base: main
Conversation
Force-pushed from 44ddac2 to c6c570c
Thank you so much for this @PinkCrow007 -- I have seen it and plan to review it, but may not have a chance for another day or two. So exciting!
Impressive work! Thanks @PinkCrow007 👍
arrow-variant/src/encoder/mod.rs
Outdated
/// Encodes a date value (days since epoch)
pub fn encode_date(value: i32, output: &mut Vec<u8>) {
    // Use primitive + date type
    let header = primitive_header(VariantPrimitiveType::Date as u8);
    output.push(header);
    output.extend_from_slice(&value.to_le_bytes());
}

/// Encodes a timestamp value (milliseconds since epoch)
pub fn encode_timestamp(value: i64, output: &mut Vec<u8>) {
    // Use primitive + timestamp type
    let header = primitive_header(VariantPrimitiveType::Timestamp as u8);
    output.push(header);
    output.extend_from_slice(&value.to_le_bytes());
}

/// Encodes a timestamp without timezone value (milliseconds since epoch)
pub fn encode_timestamp_ntz(value: i64, output: &mut Vec<u8>) {
    // Use primitive + timestamp_ntz type
    let header = primitive_header(VariantPrimitiveType::TimestampNTZ as u8);
    output.push(header);
    output.extend_from_slice(&value.to_le_bytes());
}

/// Encodes a time without timezone value (milliseconds)
pub fn encode_time_ntz(value: i64, output: &mut Vec<u8>) {
    // Use primitive + time_ntz type
    let header = primitive_header(VariantPrimitiveType::TimeNTZ as u8);
    output.push(header);
    output.extend_from_slice(&value.to_le_bytes());
}
These functions are quite similar. To reduce duplication, could we create a more general encoder function, like encode_general(type_id: VariantPrimitiveType, value: i64, output: &mut Vec<u8>)? Or create an encoder trait for VariantPrimitiveType?
That is a good point.
I also think they don't need to be pub (maybe we could start with pub(crate)) as the main API people would use is the builder, I think.
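For illustration, here is a minimal sketch of the generalized helper suggested above, reusing the primitive_header helper and VariantPrimitiveType enum already in this PR (the name encode_primitive_i64 is hypothetical):

// Hypothetical sketch only: one helper for all i64-backed primitive types.
fn encode_primitive_i64(type_id: VariantPrimitiveType, value: i64, output: &mut Vec<u8>) {
    // Write the primitive header byte for the given type, then the little-endian payload.
    output.push(primitive_header(type_id as u8));
    output.extend_from_slice(&value.to_le_bytes());
}

// The per-type functions then become thin pub(crate) wrappers, e.g.:
pub(crate) fn encode_timestamp(value: i64, output: &mut Vec<u8>) {
    encode_primitive_i64(VariantPrimitiveType::Timestamp, value, output);
}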
Thanks @PinkCrow007 -- will check it out tomorrow
Thank you @PinkCrow007 -- this is very very nice work and I think it will form the basis of a great API to create variant values
I had some structural suggestions which I left, but the biggest suggestion is that I think the wonderful tests you have written would be improved significantly if we can change them to read back the Variant values that were written in the builders
Here is what I would like to propose to move forward:
- I'll make a new PR with the scaffolding for a parquet-variant crate
- I'll port some of the code for reading variant values there

Then I think we could take the builder code and tests you have in this PR and add them to the other crate.
But really this is great work
arrow-schema/src/error.rs
Outdated
@@ -60,6 +60,8 @@ pub enum ArrowError {
    DictionaryKeyOverflowError,
    /// Error when the run end index in a REE array is bigger than the array length
    RunEndIndexOverflowError,
    /// Error during Variant operations in `arrow-variant`.
If we add a new variant to this enum, it will be a "breaking API change", as downstream projects would potentially have to update their code to handle the new variant.
We make releases with API changes every three months:
https://github.com/apache/arrow-rs?tab=readme-ov-file#release-versioning-and-schedule
So in other words, it would be great to remove this change from the PR so we can merge it faster.
@@ -37,6 +37,8 @@ mod uuid;
pub use uuid::Uuid;
mod variable_shape_tensor;
pub use variable_shape_tensor::{VariableShapeTensor, VariableShapeTensorMetadata};
mod variant;
I recommend we postpone adding the canonical extension type classes until we get farther along in the process and are in a better position to write tests.
In other words, I recommend removing the changes in arrow-schema/src/extension/ as well in this PR.
arrow-array = { workspace = true }
arrow-buffer = { workspace = true }
arrow-cast = { workspace = true, optional = true }
arrow-data = { workspace = true }
arrow-schema = { workspace = true, features = ["canonical_extension_types"] }
serde = { version = "1.0", default-features = false }
serde_json = { version = "1.0", default-features = false, features = ["std"] }
indexmap = "2.0.0"
I don't think any of these dependencies are used, so we can remove them:
arrow-array = { workspace = true }
arrow-buffer = { workspace = true }
arrow-cast = { workspace = true, optional = true }
arrow-data = { workspace = true }
serde = { version = "1.0", default-features = false }
serde_json = { version = "1.0", default-features = false, features = ["std"] }
arrow-variant/src/builder/mod.rs
Outdated
// Verify metadata contains all keys
let keys = get_metadata_keys(&metadata_buffer);
assert_eq!(keys.len(), 11, "Should have 11 keys in metadata");
assert!(keys.contains(&"null".to_string()), "Missing 'null' key");
assert!(
    keys.contains(&"bool_true".to_string()),
    "Missing 'bool_true' key"
);
assert!(keys.contains(&"string".to_string()), "Missing 'string' key");

// Verify object has the correct number of entries
// First byte after header is the number of fields (if small object)
assert!(value_buffer.len() > 1, "Value buffer too small");
let num_fields = value_buffer[1];
assert_eq!(num_fields as usize, 11, "Object should have 11 fields");
Rather than testing these "internal" fields, I think the tests would be better if they tested that the resulting value is a readable Variant value. See my next comment below.
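For example, a sketch only, reusing the Variant::try_new and as_value APIs quoted later in this review (the buffer names follow the test above):

// Hypothetical round-trip style assertion: decode what the builder wrote
// instead of inspecting raw header bytes.
let variant = Variant::try_new(&metadata_buffer, &value_buffer).unwrap();
let json = variant.as_value().unwrap();
assert_eq!(json.as_object().unwrap().len(), 11, "Object should have 11 fields");
assert!(json.as_object().unwrap().contains_key("string"), "Missing 'string' key");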
assert!(!variant.value().is_empty());
}

// =========================================================================
this is a very impressive set of test cases 👌
arrow-variant/src/lib.rs
Outdated
// specific language governing permissions and limitations
// under the License.

//! [`arrow-variant`] contains utilities for working with the [Arrow Variant][format] binary format.
❤️ 📖
arrow-variant/src/lib.rs
Outdated
/// Builder API for creating variant values
pub mod builder;
/// Encoder module for converting values to Variant binary format
pub mod encoder;
I think we should start with a minimal API surface area (only expose the Builder and Variant types directly).
Suggested change:
  /// Builder API for creating variant values
- pub mod builder;
+ mod builder;
  /// Encoder module for converting values to Variant binary format
- pub mod encoder;
+ mod encoder;
Thank you @Weijun-H for the review as well.
Thank you @PinkCrow007 -- this is (again) quite amazing. The code you have sketched out in this PR I think will form the basis for all variant processing going forward. Very impressive
What I suggest we should do now is start merging this PR piece by piece into the repo. I have created a PR here to add a skeleton structure we can work on
That PR also updates the datafusion-testing submodule so it contains binary examples we can use to test interoperability with Spark.
I think the next PR into the repo should add the Variant struct type along with some basic "read the existing binary values" type tests.
use arrow_schema::ArrowError;
use std::fmt;

/// A Variant value in the Arrow binary format
Technically Variant is part of the Parquet spec, not part of the Arrow spec 🤷
Suggested change:
- /// A Variant value in the Arrow binary format
+ /// A Variant value in the Parquet binary format
/// A Variant value in the Arrow binary format
#[derive(Debug, Clone, PartialEq)]
pub struct Variant<'a> {
This is a good start as it does not copy the values. However I think it may have a few issues:
- The lifetimes ('a) are the same for the Value and the Metadata, which I think will make sharing metadata across multiple variants potentially tricky
- There is no way to use match effectively to switch on variant type. Instead, I think it needs methods like is_object or is_array

As written I think these structures follow a pattern that is more common in Java or C++ (which is fine, but if we are going to make a native Rust library I think it is worth following standard Rust idioms).
I wonder if you considered the structure contemplated here: #7423
Specifically this structure:
/// Variant value. May contain references to metadata and value
/// 'a is lifetime for metadata
/// 'b is lifetime for value
pub enum Variant<'a, 'b> {
    Null,
    Int8,
    ...
    // strings are stored in the value and thus have references to that value
    String(&'b str),
    ShortString(&'b str),
    // Objects and Arrays need the metadata and values, so store both.
    Object(VariantObject<'a, 'b>),
    Array(VariantArray<'a, 'b>),
}

/// Wrapper over Variant Metadata
pub struct VariantMetadata<'a> {
    metadata: &'a [u8],
    // perhaps access to header fields like dict length and is_sorted
}

/// Represents a Variant Object with references to the underlying metadata
/// and value fields
pub struct VariantObject<'a, 'b> {
    // pointer to metadata
    metadata: VariantMetadata<'a>,
    // pointer to value
    value: &'b [u8],
}
A few notes:
- Can we use 'm and 'v as self-documenting lifetimes? String(&'m str) and ShortString(&'v str) have different lifetimes
- The enum variants for most types need args. It's probably nicer to track decoded values (i32, f64, etc) rather than slices of little-endian bytes?
- Decimal will need some kind of design?
- UUID would be handled by Uuid([u8; 16]) (because a slice would also take 16 bytes)?
Possible VariantDecimal type?
// NOTE: This should be a sealed trait
trait UnscaledDecimalValue: Copy {
    const MAX_SCALE: u8;
}
impl UnscaledDecimalValue for i32 {
    const MAX_SCALE: u8 = 9; // 31*log10(2)
}
impl UnscaledDecimalValue for i64 {
    const MAX_SCALE: u8 = 18; // 63*log10(2)
}
impl UnscaledDecimalValue for i128 {
    const MAX_SCALE: u8 = 38; // 127*log10(2)
}

pub struct VariantDecimal<U: UnscaledDecimalValue> {
    scale: u8,
    unscaled_value: U,
}

impl<U: UnscaledDecimalValue> VariantDecimal<U> {
    pub fn try_new(scale: u8, unscaled_value: U) -> Result<Self, ArrowError> {
        if scale <= U::MAX_SCALE {
            Ok(Self { scale, unscaled_value })
        } else {
            Err(...)
        }
    }
    pub fn scale(&self) -> u8 {
        self.scale
    }
    pub fn unscaled_value(&self) -> U {
        self.unscaled_value
    }
}

pub enum Variant<'m, 'v> {
    ...
    Decimal4(VariantDecimal<i32>),
    Decimal8(VariantDecimal<i64>),
    Decimal16(VariantDecimal<i128>),
    ...
}
/// Converts the variant value to a serde_json::Value
pub fn as_value(&self) -> Result<serde_json::Value, ArrowError> {
    let keys = crate::decoder::parse_metadata_keys(self.metadata)?;
    crate::decoder::decode_value(self.value, &keys)
I think we should try and avoid using the JSON representation when decoding Variant values for such a low level API because:
- It is likely inefficient (e.g. to access a Variant value it needs to be converted from bytes --> JSON --> Variant)
- It can't represent the types fully (e.g. there is no way to represent the difference between Int and Float in JSON; all numbers are floats)

Here are my suggested next steps:
if i >= i32::MIN as i64 && i <= i32::MAX as i64 {
    return Ok(i as i32);
Why not i32::try_from(i) with a map_err or similar?
I agree using try_from would be a nicer pattern.
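A small sketch of that pattern, assuming the surrounding function already returns Result<i32, ArrowError> (the error message is illustrative):

// Range check and conversion in one step via TryFrom.
i32::try_from(i).map_err(|_| {
    ArrowError::InvalidArgumentError(format!("value {i} out of range for i32"))
})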
Very incomplete review, but hopefully some useful ideas.
value: &[u8],
metadata: &[u8],
key: &str,
) -> Result<Option<(usize, usize)>, ArrowError> {
Why not return an actual Range<usize>?
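A hypothetical signature sketch of that suggestion (function and parameter names are illustrative, not the PR's actual code):

use std::ops::Range;

// Return the byte range of the matched value so callers can slice with &value[range].
fn find_field_range(
    value: &[u8],
    metadata: &[u8],
    key: &str,
) -> Result<Option<Range<usize>>, ArrowError> {
    // Same lookup logic as in the PR, but returning Some(start..end)
    // instead of Some((start, end)). Body elided here.
    let _ = (value, metadata, key);
    Ok(None)
}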
impl<'a> Variant<'a> {
    /// Creates a new Variant with metadata and value bytes
    pub fn new(metadata: &'a [u8], value: &'a [u8]) -> Self {
This should be new_unchecked?
(but if we made it an enum as suggested above, the method will just go away -- internal code can directly create the desired enum variant if it knows all invariants hold)
/// Creates a Variant by parsing binary metadata and value
pub fn try_new(metadata: &'a [u8], value: &'a [u8]) -> Result<Self, ArrowError> {
    // Validate that the binary data is a valid Variant
    decoder::validate_variant(value, metadata)?;
If we make this an enum, then the constructor itself will naturally do most of the validation?
pub fn try_new(metadata: &'m [u8], value: &'v [u8]) -> Result<Self, ArrowError> {
    use Variant::*;
    let Some(header) = value.get(0) else {
        return Err(...);
    };
    let basic_type = header & 0b11;
    let value_header = header >> 2;
    let result = match basic_type {
        0 => match value_header {
            0 => Null,
            1 => True,
            2 => False,
            ...
            6 => Int64(i64::try_from_le_bytes(value[1..])?),
            7 => Double(f64::try_from_le_bytes(value[1..])?),
            8 => Decimal4(VariantDecimal4::try_new(metadata, value[1..])?),
            ...
            20 => Uuid(value[1..].try_into_array()?),
            _ => return Err(...),
        },
        1 => {
            let len = usize::from(value_header);
            let value = &value[1..];
            if value.len() != len {
                return Err(...);
            }
            ShortString(str::from_utf8(value)?)
        }
        2 => Object(VariantObject::try_new(metadata, value[1..])?),
        3 => Array(VariantArray::try_new(metadata, value[1..])?),
        _ => return Err(...),
    };
    Ok(result)
}
with helpers:
// Helper that converts TryFromSliceError into ArrowError
fn try_into_array<const N: usize>(bytes: &[u8]) -> Result<[u8; N], ArrowError> {
    bytes.try_into().map_err(|_| ...)
}

// Expose the existing family of primitive `from_le_bytes` methods as a trait
trait TryFromLittleEndianBytes<const N: usize>: Sized {
    fn try_from_le_bytes(bytes: &[u8]) -> Result<Self, ArrowError> {
        Ok(Self::from_le_bytes(try_into_array(bytes)?))
    }
    fn from_le_bytes(bytes: [u8; N]) -> Self;
}

macro_rules! TryFromLittleEndianBytes {
    ($ty:ty) => {
        const _: () = {
            const N: usize = std::mem::size_of::<$ty>();
            impl TryFromLittleEndianBytes<N> for $ty {
                fn from_le_bytes(bytes: [u8; N]) -> $ty {
                    <$ty>::from_le_bytes(bytes)
                }
            }
        };
    };
}
TryFromLittleEndianBytes!(i64);
TryFromLittleEndianBytes!(f64);
I have not reviewed the code carefully at all yet, and what follows is a general observation based on the inherent nature of variant data and Rust notions of safety:
It will be really tempting to have "efficient" code that e.g. uses from_utf8_unchecked to extract a &str from a &[u8], or to use indexing operations like v[10] to extract bytes. But variant data is generally untrusted user input, and whatever Variant struct/enum we define will become the first -- and often only -- line of defense against malicious or malformed input.
Hopefully we can code carefully, with the goal that sizes and/or contents of metadata and value slices will never cause a panic?
Additionally, it seems like we have a few choices for values such as strings and decimals, where even a right-sized byte slice can contain invalid values:
- Return obviously unvalidated values, e.g. &[u8] instead of &str for strings, and &[u8] instead of whatever VariantDecimal struct we might otherwise define -- leaving the user responsible to finish the conversion as (un)safely as they deem prudent.
- Return ostensibly validated values, with (safe) checked and (unsafe) unchecked constructors and/or getters that let the user choose the one they deem appropriate.
I personally favor the latter approach (safe and easy to use, even if not always the absolutely max efficient), but the topic probably needs a wider discussion.
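As an illustration of the second option, a minimal sketch (names are hypothetical, not part of this PR):

use arrow_schema::ArrowError;

/// Checked accessor: validates UTF-8 on every call (safe).
fn variant_str_checked(bytes: &[u8]) -> Result<&str, ArrowError> {
    std::str::from_utf8(bytes).map_err(|e| ArrowError::InvalidArgumentError(e.to_string()))
}

/// Unchecked accessor: skips validation for callers that already validated once.
///
/// # Safety
/// `bytes` must be valid UTF-8, e.g. checked when the Variant was constructed.
unsafe fn variant_str_unchecked(bytes: &[u8]) -> &str {
    std::str::from_utf8_unchecked(bytes)
}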
I agree these are great points -- I moved this comment to #7423 (comment) and will reply there to try and get wider distribution
}

/// Converts the variant value to a i64.
pub fn as_i64(&self) -> Result<i64, ArrowError> {
Once we introduce an enum version of Variant, we open up the question of type conversions. I guess we would want support for automatic type widening? e.g. something like:
pub fn as_i64(&self) -> Result<Option<i64>, ArrowError> {
    use Variant::*;
    let val = match self {
        Null => return Ok(None),
        Int64(val) => val,
        Int32(val) => val.into(),
        Int16(val) => val.into(),
        Int8(val) => val.into(),
        Decimal4(d) if d.scale() == 0 => d.unscaled_value().into(),
        Decimal8(d) if d.scale() == 0 => d.unscaled_value(),
        _ => return Err(...),
    };
    Ok(Some(val))
}

pub fn as_f64(&self) -> Result<Option<f64>, ArrowError> {
    use Variant::*;
    let val = match self {
        Null => return Ok(None),
        Int32(val) => val.into(),
        Int16(val) => val.into(),
        Int8(val) => val.into(),
        Decimal4(d) => d.unscaled_value().into(),
        Float(val) => val.into(),
        _ => return Err(...),
    };
    Ok(Some(val))
}

pub fn as_decimal16(&self, scale: u8) -> Result<Option<VariantDecimal16>, ArrowError> {
    use Variant::*;
    let (old_scale, unscaled_value) = match self {
        Null => return Ok(None),
        Decimal16(d) if d.scale() <= scale => (d.scale(), d.unscaled_value()),
        Decimal8(d) if d.scale() <= scale => (d.scale(), d.unscaled_value().into()),
        Decimal4(d) if d.scale() <= scale => (d.scale(), d.unscaled_value().into()),
        Int64(val) => (0, val.into()),
        Int32(val) => (0, val.into()),
        Int16(val) => (0, val.into()),
        Int8(val) => (0, val.into()),
        _ => return Err(...),
    };
    Ok(Some(VariantDecimal16::try_new(scale, old_scale, unscaled_value)?))
}
The above assumes something like:
impl VariantDecimal16 {
    fn try_new(scale: u8, current_scale: u8, unscaled_value: i128) -> Result<Self, ArrowError> {
        if scale > 38 || current_scale > 38 || scale < current_scale {
            return Err(...);
        }
        let exponent = u32::from(scale - current_scale);
        let (unscaled_value, false) = unscaled_value.overflowing_pow(exponent) else {
            return Err(...);
        };
        Self { scale, unscaled_value }
    }
}
A couple corrections to the above:
- We can't actually widen Decimal16 to Decimal16 of a different scale without also knowing the precision (which variant doesn't track).
- The suggested VariantDecimal16::try_new had a bug:

    let exponent = u32::from(scale - current_scale);
    let (unscaled_value, false) = unscaled_value.overflowing_pow(exponent) else {
        return Err(...);
    };

  should be something like:

    let exponent = u32::from(scale - current_scale);
    let (exponent, false) = i128::overflowing_pow(10, exponent) else {
        return Err(...);
    };
    unscaled_value *= exponent;
}

/// Converts the variant value to a f64.
pub fn as_f64(&self) -> Result<f64, ArrowError> {
Question about casting/converting values -- what semantics do we want when requesting a specific type such as f64? From strictest to loosest:
- type extraction - only return values that were actually encoded as variant Double
- type widening casts - additionally convert and return values of narrower types (e.g. variant Float or Int8)
- value widening casts - additionally convert narrower values of wider types (e.g. f64 can exactly represent Int64(1), but not Int64(18014398509481985))
- narrowing casts - additionally convert wider values of wider types, with information loss (e.g. Int64(18014398509481985) becomes 1.8014398509481984e+16)
- converting casts - additionally attempt to convert/parse values of unrelated types (e.g. ShortString("10") becomes Double(10.0)).

Using an enum Variant already allows extraction by matching. Narrowing and converting casts seem a bit too dangerous/unpredictable to build in at such a low level. That leaves type widening and value widening. Type widening is pretty cheap and convenient; value widening is even more convenient but not as cheap (requires branching, because some values cannot be converted while others can).
This PR is currently using value widening, which arguably matches JSON parsing most closely given that JSON only defines one numeric type. The RFC actually recommends treating all numbers as IEEE 754 double, which includes limiting integer precision to 54-bit signed... but we can do a bit better using a combination of Int64, Decimal16, and Double.
Do others have thoughts/opinions here?
I personally think this is the least surprising and most Rust-idiomatic:

type extraction - only return values that were actually encoded as variant Double

If someone wants to widen the types they could pretty easily implement functions to do so, like

fn variant_as_f64(var: &Variant) -> Result<f64> {
    match var {
        Variant::F32(f) => Ok(*f as f64),
        Variant::F64(f) => Ok(*f),
        Variant::I64(f) => Ok(*f as f64),
        ...
    }
}

Though I could see the value of providing such a function on the Variant enum directly 🤔
FYI @PinkCrow007, @mapleFU points out some interesting design ideas for Variant builders in discussions here:
Which issue does this PR close?
Variant: Rust API to Create Variant Values #7424
Rationale for this change
This PR implements a builder-style API for creating Variant values in Rust, following the Variant binary encoding specification. It supports reusing metadata within a single builder session, as discussed in the issue.
What changes are included in this PR?
VariantBuilder, ObjectBuilder, and ArrayBuilder for constructing Variant values.
Are there any user-facing changes?
This PR adds a new public API for programmatically creating Variant-encoded values.
No breaking changes.
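For context, a hypothetical usage sketch of the builder-style API described above (method names are illustrative and may not match the PR exactly):

// Illustrative only: exact builder method names in this PR may differ.
let mut metadata_buffer = Vec::new();
let mut value_buffer = Vec::new();

let mut builder = VariantBuilder::new(&mut metadata_buffer);
let mut object = builder.new_object(&mut value_buffer);
object.append_value("name", "Alice");
object.append_value("age", 30i64);
object.finish();
builder.finish();

// metadata_buffer / value_buffer now hold the Variant metadata and value bytes,
// which could be read back with something like Variant::try_new(&metadata_buffer, &value_buffer).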
CC: @alamb for visibility