Skip to content

fix: Panic in pretty_format function when displaying DurationSecondsA… #7534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 26, 2025

Conversation

zhuqi-lucas
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas commented May 20, 2025

…rray with i64::MIN / i64::MAX

Which issue does this PR close?

Closes #7533

Rationale for this change

Avoid panic when printing invalid durations

Also added testing

 #[test]
    fn duration_pretty_and_iso_extremes() {
        // Build [MIN, MAX, 3661, NULL]
        let arr = DurationSecondArray::from(vec![Some(i64::MIN), Some(i64::MAX), Some(3661), None]);
        let array: ArrayRef = Arc::new(arr);

        // Pretty formatting
        let opts = FormatOptions::default().with_null("<NULL>");
        let opts = opts.with_duration_format(DurationFormat::Pretty);
        let pretty = pretty_format_columns_with_options("pretty", &[array.clone()], &opts)
            .unwrap()
            .to_string();
        assert_eq!(pretty.matches("<invalid>").count(), 2);
        assert!(pretty.contains("0 days 1 hours 1 mins 1 secs"));
        assert!(pretty.contains("<NULL>"));

        // ISO8601 formatting
        let opts_iso = FormatOptions::default()
            .with_null("<NULL>")
            .with_duration_format(DurationFormat::ISO8601);
        let iso = pretty_format_columns_with_options("iso", &[array], &opts_iso)
            .unwrap()
            .to_string();
        assert_eq!(iso.matches("<invalid>").count(), 2);
        assert!(iso.contains("PT3661S"));
        assert!(iso.contains("<NULL>"));
    }

What changes are included in this PR?

  1. Avoid panics when printing invalid durations
  2. Deprecate functions that internally panic, provide functions that return Option instead

Support panic handling and print invalid result.

Are there any user-facing changes?

Support panic handling and print invalid result.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuqi-lucas for this contribution ❤️

@@ -215,28 +215,28 @@ pub(crate) fn split_second(v: i64, base: i64) -> (i64, u32) {
(v.div_euclid(base), v.rem_euclid(base) as u32)
}

/// converts a `i64` representing a `duration(s)` to [`Duration`]
/// converts a `i64` representing a `duration(s)` to [`Option<Duration>`]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these methods are part of the public API : https://docs.rs/arrow/latest/arrow/array/temporal_conversions/fn.duration_s_to_duration.html

Thus we can't change them until the next major release (July 2025)

Option 1:

We could leave the original signatures and deprecate the two that are actually fallible per https://github.com/apache/arrow-rs?tab=readme-ov-file#deprecation-guidelines

And we can then add new functions like try_ for the two fallible conversions:

#[inline]
#[deprecated(since = "55.2.0", note = "Use `try_duration_s_to_duration` instead")]
pub fn duration_s_to_duration(v: i64) -> Duration {
    Duration::try_seconds(v).unwrap()
}

#[inline]
pub fn try_duration_s_to_duration(v: i64) -> Option<Duration> {
    Duration::try_seconds(v)
}

And similarly for milliseconds

Option 2: Reexport Duration

Since these are thin wrappers over Chrono anyways, we could simply re-export Duration 🤔

// Reexport Duration for use by downstream libraries
pub use chrono::Duration;

#[inline]
#[deprecated(since = "55.2.0", note = "Use `Duration::try_seconds` instead")]
pub fn duration_s_to_duration(v: i64) -> Duration {
    Duration::try_seconds(v).unwrap()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alamb for good suggestions! Addressed the option1 in latest PR, i think it's more clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuqi-lucas -- I have a suggestion here zhuqi-lucas#2

I think it would be nice to avoid having to use try_duration_us_to_duration and handle a None when that function can not return None. Let me know what you think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alamb , i tried this way, but it will make the following function failed with match fmt with Some():

macro_rules! duration_display {
    ($convert:ident, $t:ty, $scale:tt) => {
        impl<'a> DisplayIndexState<'a> for &'a PrimitiveArray<$t> {
            type State = DurationFormat;

            fn prepare(&self, options: &FormatOptions<'a>) -> Result<Self::State, ArrowError> {
                Ok(options.duration_format)
            }

            fn write(&self, fmt: &Self::State, idx: usize, f: &mut dyn Write) -> FormatResult {
                let v = self.value(idx);
                match fmt {
                    DurationFormat::ISO8601 => match $convert(v) {
                        Some(td) => write!(f, "{}", td)?,
                        None => write!(f, "<invalid>")?,
                    },
                    DurationFormat::Pretty => match $convert(v) {
                        Some(_) => duration_fmt!(f, v, $scale)?,
                        None => write!(f, "<invalid>")?,
                    },
                }
                Ok(())
            }
        }
    };
}

The original one will always pass option:

duration_display!(try_duration_s_to_duration, DurationSecondType, 0);
duration_display!(try_duration_ms_to_duration, DurationMillisecondType, 3);
duration_display!(try_duration_us_to_duration, DurationMicrosecondType, 6);
duration_display!(try_duration_ns_to_duration, DurationNanosecondType, 9);

Or, we can make the macro support both cases, if we want to remove option for those don't need:

macro_rules! duration_display {
    //–– converters that return Option<Duration> ––
    ($convert:ident, $t:ty, $scale:tt, option) => {
        impl<'a> DisplayIndexState<'a> for &'a PrimitiveArray<$t> {
            type State = DurationFormat;

            fn prepare(&self, options: &FormatOptions<'a>) -> Result<Self::State, ArrowError> {
                Ok(options.duration_format)
            }

            fn write(&self, fmt: &Self::State, idx: usize, f: &mut dyn Write) -> FormatResult {
                let v = self.value(idx);
                match fmt {
                    DurationFormat::ISO8601 => {
                        if let Some(td) = $convert(v) {
                            write!(f, "{}", td)?;
                        } else {
                            write!(f, "<invalid>")?;
                        }
                    }
                    DurationFormat::Pretty => {
                        if $convert(v).is_some() {
                            duration_fmt!(f, v, $scale)?;
                        } else {
                            write!(f, "<invalid>")?;
                        }
                    }
                }
                Ok(())
            }
        }
    };

    //–– converters that return plain Duration ––
    ($convert:ident, $t:ty, $scale:tt) => {
        impl<'a> DisplayIndexState<'a> for &'a PrimitiveArray<$t> {
            type State = DurationFormat;

            fn prepare(&self, options: &FormatOptions<'a>) -> Result<Self::State, ArrowError> {
                Ok(options.duration_format)
            }

            fn write(&self, fmt: &Self::State, idx: usize, f: &mut dyn Write) -> FormatResult {
                let v = self.value(idx);
                match fmt {
                    DurationFormat::ISO8601 => {
                        write!(f, "{}", $convert(v))?;
                    }
                    DurationFormat::Pretty => {
                        duration_fmt!(f, v, $scale)?;
                    }
                }
                Ok(())
            }
        }
    };
}

And call it by:

// those four lines at the top of your file become:

// these two return Option<Duration>:
duration_display!(try_duration_s_to_duration, DurationSecondType, 0, option);
duration_display!(try_duration_ms_to_duration, DurationMillisecondType, 3, option);

// these two return Duration:
duration_display!(duration_us_to_duration, DurationMicrosecondType, 6);
duration_display!(duration_ns_to_duration, DurationNanosecondType, 9);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, we can make the macro support both cases, if we want to remove option for those don't need:

This is a great idea. I have tried it in :zhuqi-lucas#3

This comment was marked as outdated.

@@ -37,6 +37,7 @@ all-features = true

[features]
prettyprint = ["comfy-table"]
default = ["prettyprint"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could just put the tests in the prettyprint module and avoid having to change any feature flags

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in latest PR, thanks @alamb .

let array: ArrayRef = Arc::new(arr);

// Pretty formatting
let opts = FormatOptions::default().with_null("<NULL>");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps these tests can be put into the pretty module directly which would mean we don't have to adjust feature flags

Comment on lines 1265 to 1267
assert_eq!(pretty.matches("<invalid>").count(), 2);
assert!(pretty.contains("0 days 1 hours 1 mins 1 secs"));
assert!(pretty.contains("<NULL>"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be easier to validate this test if you just compared the output entirely

Perhaps you can follow the model of the existing pretty print tests:

let table = pretty_format_batches(&[batch]).unwrap().to_string();
let expected = vec![
"+--------------------------------------------+",
"| d1 |",
"+--------------------------------------------+",
"| 68656c6c6f |",
"| |",
"| 6c6f6e676572207468616e203132206279746573 |",
"| 616e6f74686572207468616e203132206279746573 |",
"| |",
"| 736d616c6c |",
"+--------------------------------------------+",
];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in latest PR, thanks @alamb!

@zhuqi-lucas
Copy link
Contributor Author

Thank you @zhuqi-lucas for this contribution ❤️

Thank you @alamb for review and good suggestions! Addressed in latest PR.

alamb and others added 2 commits May 21, 2025 08:39
Avoid API changes for infallible functions, try 2
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @zhuqi-lucas

@@ -1186,4 +1186,56 @@ mod tests {
let actual: Vec<&str> = batch.lines().collect();
assert_eq!(expected_table, actual, "Actual result:\n{batch}");
}

#[test]
fn duration_pretty_and_iso_extremes() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that would be good would be to extend these tests to cover the other duration TimeUnits (like milliseconds, microseconds and nanoseconds)

However I don't think it is required

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alamb , I think it's a good point, because it seems no other testing cover it, i will create a follow-up PR to improve the test.

@alamb
Copy link
Contributor

alamb commented May 21, 2025

I'll plan to merge this tomorrow unless there are any concerns with

@alamb alamb merged commit 1c380b8 into apache:main May 26, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants