feat: add decimal argument support to round function #713

andrew-coleman · 2024-09-26T15:43:15Z

The round function has a number of variants to support different numeric types. This commit adds support for rounding decimals. This is required for the spark module.

EpsilonPrime · 2024-10-01T06:44:10Z

extensions/functions_rounding.yaml

+              and this value cannot be exactly represented, this specifies how
+              to round it.
+
+                - TIE_TO_EVEN: round to nearest value; if exactly halfway, tie


Can this happen with decimal representations? I'd argue all of the floating point handling stuff here does not apply.

andrew-coleman · 2024-10-15

Why would these not apply? For example, the input value 2.5 can be represented exactly in a decimal type, but rounding it to the nearest integer gives 2 if the rounding mode is TIE_TO_EVEN or 3 if the mode is TIE_AWAY_FROM_ZERO.
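The 2.5 example can be checked with Python's `decimal` module, used here only as a stand-in for a decimal backend. This is a minimal sketch: the mapping of TIE_TO_EVEN to `ROUND_HALF_EVEN` and of TIE_AWAY_FROM_ZERO to `ROUND_HALF_UP` (which breaks ties away from zero in Python) is an assumption, and `round_to_integer` is a hypothetical helper, not part of the extension.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Assumed mapping from the rounding option names to Python's decimal modes.
MODES = {
    "TIE_TO_EVEN": ROUND_HALF_EVEN,
    "TIE_AWAY_FROM_ZERO": ROUND_HALF_UP,
}

def round_to_integer(value: Decimal, mode: str) -> Decimal:
    """Round an exactly represented decimal to the nearest integer."""
    return value.quantize(Decimal("1"), rounding=MODES[mode])

for mode in MODES:
    # 2.5 is exact in decimal, yet the two modes disagree on the result.
    print(mode, round_to_integer(Decimal("2.5"), mode))
```

The input is exactly representable, so the rounding option matters because of the tie, not because of any representation error.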

EpsilonPrime · 2024-10-01T06:45:15Z

extensions/functions_rounding.yaml

@@ -268,3 +268,43 @@ scalar_functions:
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: fp64?
+      - args:


For all other decimal functionality we have placed them in _decimal.yaml files. Not sure if we want to have just this one function in a file by itself though.

EpsilonPrime · 2024-10-01T06:46:56Z

extensions/functions_rounding.yaml

+
+              When `s` is a negative number, the rounding is
+              performed to the left side of the decimal point
+              as specified by `s`.


Does this operation affect the scale? We should probably clarify that here.

andrew-coleman · 2024-10-15

I guess this function could return a different decimal type (i.e. reduce the precision and the scale parameters), but I was working on the assumption that it would just return a different value. I'm not sure if that is what you are asking.

jacques-n · 2024-10-16

Let's not assume. Let's add some expected behaviors here. Once we get tests inside core, we can transplant those into test cases. And if this is the behavior of Spark we're trying to match, we should probably just put this in a Spark function file (or name it spark_round here). Decimal behavior is often quite different between different systems.

andrew-coleman · 2025-02-06T09:45:14Z

Just revisiting this. I've updated the parameters of the decimal return type to match the logic used by the Spark round functions:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1492

I note the comment here: #671 (comment)
Does this still apply? If so, how would you suggest I work around this since the output decimal scale depends on that second argument?

Thanks!

EpsilonPrime

I don't think we can use any constants in the calculation. The `s` parameter of round might not be a constant, although I suspect some backends will only allow constants here. If it is not a constant, then we don't know much about the result. If the number turns out to be negative, the scale could even increase. Because that second argument could change the value in any manner, we likely have to return the maximum scale and precision here.

I will take it as an action to try running these tests against a backend (as the consumer testing does) to see if the tests are functional/correct.

EpsilonPrime · 2025-02-14T07:02:35Z

tests/cases/rounding_decimal/round_decimal.test

+
+# negative_rounding: Examples with negative rounding
+round(2::dec<2,0>, -2::i32) = 0::dec<2,0>
+round(123::dec<2,0>, -2::i32) = 100::dec<2,0>


There are three digits here, so the precision needs to be 3.

The last two lines should be:

round(123::dec<3,0>, -2::i32) = 100::dec<3,0>
round(8793::dec<4,0>, -2::i32) = 8800::dec<4,0>
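The precision point and the corrected expectations can be sanity-checked with Python's `decimal` module. This is a sketch under assumptions: `round_decimal` and `fits` are hypothetical helpers, and `ROUND_HALF_UP` is an arbitrary choice of mode (none of these cases is a tie, so the mode does not affect them).

```python
from decimal import Decimal, ROUND_HALF_UP

def round_decimal(value: Decimal, s: int) -> Decimal:
    """Round to `s` decimal places; a negative `s` rounds left of the point."""
    # Decimal(1).scaleb(-s) is 10**-s, e.g. 1E+2 for s = -2.
    return value.quantize(Decimal(1).scaleb(-s), rounding=ROUND_HALF_UP)

def fits(value: Decimal, precision: int, scale: int) -> bool:
    """True if `value` is representable as dec<precision, scale>."""
    unscaled = value.scaleb(scale)  # value * 10**scale
    is_integral = unscaled == unscaled.to_integral_value()
    return is_integral and abs(unscaled) < 10 ** precision

print(fits(Decimal(123), 2, 0))               # False: three digits need precision 3
print(fits(Decimal(123), 3, 0))               # True
print(int(round_decimal(Decimal(2), -2)))     # 0
print(int(round_decimal(Decimal(123), -2)))   # 100
print(int(round_decimal(Decimal(8793), -2)))  # 8800
```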

EpsilonPrime · 2025-02-21T02:44:48Z

tests/cases/rounding_decimal/ceil_decimal.test

Since this file is in the rounding_decimal directory, I'd name this file ceil.test.

EpsilonPrime · 2025-02-21T02:47:54Z

tests/cases/rounding_decimal/round_decimal.test

+
+# negative_rounding: Examples with negative rounding
+round(2::dec<2,0>, -2::i32) = 0::dec<2,0>
+round(123::dec<2,0>, -2::i32) = 100::dec<2,0>


The last two lines should be:

round(123::dec<3,0>, -2::i32) = 100::dec<3,0>
round(8793::dec<4,0>, -2::i32) = 8800::dec<4,0>

EpsilonPrime · 2025-02-21T03:58:36Z

tests/cases/rounding_decimal/ceil_decimal.test

+### SUBSTRAIT_INCLUDE: '/extensions/functions_rounding_decimal.yaml'
+
+# basic: Basic examples without any special cases
+ceil(2.25::dec<8,2>) = 3::dec<7,0>


FWIW, DuckDB returns decimal<8,0> for the first two and decimal<2,0> for the last one (and decimal<8,0> for the floor tests too). There may be some variation between systems here.
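To make the variation concrete, here is a small Python sketch of why both result types can hold the answer. It assumes, based only on the test case above, that the expected type follows a rule like `dec<P - S + 1, 0>`; the helper names are hypothetical and no system's actual type derivation is modeled.

```python
from decimal import Decimal, ROUND_CEILING

def ceil_decimal(value: Decimal) -> Decimal:
    """Smallest integer not less than `value`."""
    return value.to_integral_value(rounding=ROUND_CEILING)

def fits(value: Decimal, precision: int, scale: int) -> bool:
    """True if `value` is representable as dec<precision, scale>."""
    unscaled = value.scaleb(scale)  # value * 10**scale
    is_integral = unscaled == unscaled.to_integral_value()
    return is_integral and abs(unscaled) < 10 ** precision

result = ceil_decimal(Decimal("2.25"))
print(result)               # 3
print(fits(result, 7, 0))   # True: the type in the test case
print(fits(result, 8, 0))   # True: the type DuckDB reports

# The largest integral part of a dec<8,2> input can carry into a seventh digit.
carried = ceil_decimal(Decimal("999999.01"))
print(carried)              # 1000000
print(fits(carried, 6, 0))  # False
print(fits(carried, 7, 0))  # True
```

Under that assumption, `dec<7,0>` is the tightest type that always holds the result for a `dec<8,2>` input, while `dec<8,0>` is simply a wider type that also works.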

EpsilonPrime · 2025-02-21T04:02:02Z

tests/cases/rounding_decimal/round_decimal.test

+### SUBSTRAIT_INCLUDE: '/extensions/functions_rounding_decimal.yaml'
+
+# basic: Basic examples without any special cases
+round(2::dec<2,0>, 2::i32) = 2::dec<3,0>


DuckDB returns:

2,0
8,1
2,0
3,0
4,0

andrew-coleman · 2025-02-25T14:58:18Z

Thanks @EpsilonPrime, that's helpful. It looks like the scale of the decimal returned by DuckDB is also a function of the second parameter of round() as is the case with Spark.

Given the following query:

select num, floor(num), ceil(num),
     round(num, -2), round(num, -1), round(num, 0),
     round(num, 1), round(num, 2), round(num, 3)
from (values (0.5), (-0.5), (999.9), (-999.9), (2.75)) as table(num)

Spark produces the following output and type schema:

+-------+----------+---------+--------------+--------------+-------------+-------------+-------------+-------------+
|    num|FLOOR(num)|CEIL(num)|round(num, -2)|round(num, -1)|round(num, 0)|round(num, 1)|round(num, 2)|round(num, 3)|
+-------+----------+---------+--------------+--------------+-------------+-------------+-------------+-------------+
|   0.50|         0|        1|             0|             0|            1|          0.5|         0.50|         0.50|
|  -0.50|        -1|        0|             0|             0|           -1|         -0.5|        -0.50|        -0.50|
| 999.90|       999|     1000|          1000|          1000|         1000|        999.9|       999.90|       999.90|
|-999.90|     -1000|     -999|         -1000|         -1000|        -1000|       -999.9|      -999.90|      -999.90|
|   2.75|         2|        3|             0|             0|            3|          2.8|         2.75|         2.75|
+-------+----------+---------+--------------+--------------+-------------+-------------+-------------+-------------+

root
 |-- num: decimal(5,2) (nullable = false)
 |-- FLOOR(num): decimal(4,0) (nullable = true)
 |-- CEIL(num): decimal(4,0) (nullable = true)
 |-- round(num, -2): decimal(4,0) (nullable = true)
 |-- round(num, -1): decimal(4,0) (nullable = true)
 |-- round(num, 0): decimal(4,0) (nullable = true)
 |-- round(num, 1): decimal(5,1) (nullable = true)
 |-- round(num, 2): decimal(6,2) (nullable = true)
 |-- round(num, 3): decimal(6,2) (nullable = true)
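The schema above is consistent with a simple rule in which the result type depends on the second argument. The following Python sketch is reconstructed from this output only; that it mirrors the linked Spark logic is an assumption, and `round_result_type` is a hypothetical name.

```python
from typing import Tuple

MAX_PRECISION = 38  # assumed maximum decimal precision

def round_result_type(p: int, s: int, round_scale: int) -> Tuple[int, int]:
    """Result (precision, scale) of round(decimal<p, s>, round_scale)."""
    # One extra integral digit covers carries such as 999.90 -> 1000.
    integral_digits = p - s + 1
    if round_scale < 0:
        # Rounding left of the point needs at least -round_scale + 1 digits.
        precision = max(integral_digits, -round_scale + 1)
        return (min(precision, MAX_PRECISION), 0)
    # Rounding never adds decimal places beyond the input scale.
    new_scale = min(s, round_scale)
    return (min(integral_digits + new_scale, MAX_PRECISION), new_scale)

for round_scale in (-2, -1, 0, 1, 2, 3):
    print(round_scale, round_result_type(5, 2, round_scale))
```

For the `decimal(5,2)` input this yields (4,0), (4,0), (4,0), (5,1), (6,2) and (6,2), matching the six `round` columns in the schema.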

I propose, then, that given an input of type decimal<P, S>, the return type expression should be:

        return: |-
          precision = min(P + 1, 38)
          decimal?<precision, S>

This is not necessarily what it actually returns, but it is the maximum precision/scale of what it could return (taking into account your earlier comment).

How does that sound?
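As a quick check of the proposal, the Python sketch below computes the proposed type for the `decimal(5,2)` input and verifies that every rounded value in the Spark table fits in it. The helper names are hypothetical and the values are copied from the table above.

```python
from decimal import Decimal

def proposed_return_type(p: int, s: int):
    """Proposed bound: one extra digit of precision, same scale."""
    return (min(p + 1, 38), s)

def fits(value: Decimal, precision: int, scale: int) -> bool:
    """True if `value` is representable as dec<precision, scale>."""
    unscaled = value.scaleb(scale)  # value * 10**scale
    is_integral = unscaled == unscaled.to_integral_value()
    return is_integral and abs(unscaled) < 10 ** precision

precision, scale = proposed_return_type(5, 2)
print(precision, scale)  # 6 2

# Distinct rounded values appearing in the Spark output.
rounded = ["0", "1", "-1", "1000", "-1000", "3", "0.5", "-0.5",
           "999.9", "-999.9", "2.8", "0.50", "-0.50", "999.90", "-999.90", "2.75"]
print(all(fits(Decimal(v), precision, scale) for v in rounded))  # True

# The carry case is why the input type itself is not enough.
print(fits(Decimal("1000"), 5, 2))  # False
```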

The round function has a number of variants to support different numeric types. This commit adds support for rounding decimals.

The precision of the resultant decimal type is one greater than the precision of the input decimal to allow for rounding up to the next decimal digit. The scale of the resultant decimal type is the same as the input type since the result of rounding cannot add any further decimal places.

Signed-off-by: Andrew Coleman <[email protected]>

andrew-coleman requested review from jacques-n, cpcloud, westonpace, EpsilonPrime and vbarua as code owners September 26, 2024 15:43

EpsilonPrime reviewed Oct 1, 2024

EpsilonPrime self-assigned this Dec 11, 2024

andrew-coleman force-pushed the round branch 2 times, most recently from 97b20bd to 1dbdb4b Compare February 6, 2025 09:43

andrew-coleman requested a review from EpsilonPrime February 12, 2025 10:36

EpsilonPrime reviewed Feb 14, 2025

EpsilonPrime reviewed Feb 21, 2025

andrew-coleman force-pushed the round branch from 1dbdb4b to 92c964b Compare February 25, 2025 15:33

andrew-coleman requested a review from EpsilonPrime February 28, 2025 07:38
