Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add decimal argument support to round function #713

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

andrew-coleman
Copy link
Contributor

The round function has a number of variants to support different numeric types. This commit adds support for rounding decimals. This is required for the spark module.

and this value cannot be exactly represented, this specifies how
to round it.

- TIE_TO_EVEN: round to nearest value; if exactly halfway, tie
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this happen with decimal representations? I'd argue all of the floating point handling stuff here does not apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would these not apply? For example, the input value 2.5 could be represented exactly in a decimal type, but rounding it to the nearest integer would result in 2 if the rounding mode is TIE_TO_EVEN or 3 if the mode is TIE_AWAY_FROM_ZERO.

@@ -268,3 +268,43 @@ scalar_functions:
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
nullability: DECLARED_OUTPUT
return: fp64?
- args:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all other decimal functionality we have placed them in _decimal.yaml files. Not sure if we want to have just this one function in a file by itself though.


When `s` is a negative number, the rounding is
performed to the left side of the decimal point
as specified by `s`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this operation affect the scale? We should probably clarify that here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this function could return a different decimal type (i.e. reduce the precision and the scale parameters), but I was working on the assumption that it would just return a different value. I'm not sure if that is what you are asking.

Copy link
Contributor

@jacques-n jacques-n Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not assume. Let's add some expected behaviors here. Once we get tests inside core, we can transplant those into test cases. And if this is the behaviors of spark we're trying to match, we shouldn't probably just put this in a spark function file (or name it spark_round here). Decimal behavior is often quite different between different systems.

@EpsilonPrime EpsilonPrime self-assigned this Dec 11, 2024
@andrew-coleman andrew-coleman force-pushed the round branch 2 times, most recently from 97b20bd to 1dbdb4b Compare February 6, 2025 09:43
@andrew-coleman
Copy link
Contributor Author

Just revisiting this. I've updated the parameters of the decimal return type to match the logic used by the Spark round functions:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1492

I note the comment here: #671 (comment)
Does this still apply? If so, how would you suggest I work around this since the output decimal scale depends on that second argument?

Thanks!

Copy link
Member

@EpsilonPrime EpsilonPrime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can use any constants in the calculation. The s parameter in round could be not a constant although I suspect some backends only will allow constants here. If it is not a constant then we don't know a lot. If the number turns out to be negative the scale could even increase. Because that second value could change the value in any manner we likely have to return maximum scale and precision here.

I will take it as an action to try running these tests against a backend (as the consumer testing does) to see if the tests are functional/correct.


# negative_rounding: Examples with negative rounding
round(2::dec<2,0>, -2::i32) = 0::dec<2,0>
round(123::dec<2,0>, -2::i32) = 100::dec<2,0>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are three digits here so the precision needs to be 3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last two lines should be:

round(123::dec<3,0>, -2::i32) = 100::dec<3,0>
round(8793::dec<4,0>, -2::i32) = 8800::dec<4,0>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this file is the rounding_decimal directory I'd name this file ceil.test.


# negative_rounding: Examples with negative rounding
round(2::dec<2,0>, -2::i32) = 0::dec<2,0>
round(123::dec<2,0>, -2::i32) = 100::dec<2,0>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last two lines should be:

round(123::dec<3,0>, -2::i32) = 100::dec<3,0>
round(8793::dec<4,0>, -2::i32) = 8800::dec<4,0>

### SUBSTRAIT_INCLUDE: '/extensions/functions_rounding_decimal.yaml'

# basic: Basic examples without any special cases
ceil(2.25::dec<8,2>) = 3::dec<7,0>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, DuckDB returns decimal<8,0> for the first two and decimal<2,0> for the last one (and decimal<8,0> for the floor tests too). There may be some variation between systems here.

### SUBSTRAIT_INCLUDE: '/extensions/functions_rounding_decimal.yaml'

# basic: Basic examples without any special cases
round(2::dec<2,0>, 2::i32) = 2::dec<3,0>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DuckDB returns:

2,0
8,1
2,0
3,0
4,0

@andrew-coleman
Copy link
Contributor Author

Thanks @EpsilonPrime, that's helpful. It looks like the scale of the decimal returned by DuckDB is also a function of the second parameter of round() as is the case with Spark.

Given the following query:

select num, floor(num), ceil(num),
     round(num, -2), round(num, -1), round(num, 0),
     round(num, 1), round(num, 2), round(num, 3)
from (values (0.5), (-0.5), (999.9), (-999.9), (2.75)) as table(num)

Spark produces the following output and type schema:

+-------+----------+---------+--------------+--------------+-------------+-------------+-------------+-------------+
|    num|FLOOR(num)|CEIL(num)|round(num, -2)|round(num, -1)|round(num, 0)|round(num, 1)|round(num, 2)|round(num, 3)|
+-------+----------+---------+--------------+--------------+-------------+-------------+-------------+-------------+
|   0.50|         0|        1|             0|             0|            1|          0.5|         0.50|         0.50|
|  -0.50|        -1|        0|             0|             0|           -1|         -0.5|        -0.50|        -0.50|
| 999.90|       999|     1000|          1000|          1000|         1000|        999.9|       999.90|       999.90|
|-999.90|     -1000|     -999|         -1000|         -1000|        -1000|       -999.9|      -999.90|      -999.90|
|   2.75|         2|        3|             0|             0|            3|          2.8|         2.75|         2.75|
+-------+----------+---------+--------------+--------------+-------------+-------------+-------------+-------------+

root
 |-- num: decimal(5,2) (nullable = false)
 |-- FLOOR(num): decimal(4,0) (nullable = true)
 |-- CEIL(num): decimal(4,0) (nullable = true)
 |-- round(num, -2): decimal(4,0) (nullable = true)
 |-- round(num, -1): decimal(4,0) (nullable = true)
 |-- round(num, 0): decimal(4,0) (nullable = true)
 |-- round(num, 1): decimal(5,1) (nullable = true)
 |-- round(num, 2): decimal(6,2) (nullable = true)
 |-- round(num, 3): decimal(6,2) (nullable = true)

I propose, then, that the given an input of type decimal<P, S>, the return type expression should be:

        return: |-
          precision = min(P + 1, 38)
          decimal?<precision, S>

Which is not necessarily what it actually returns, but is the maximum precision/scale of what it could return (taking into account your earlier comment).

How does that sound?

The round function has a number of variants to support different numeric types.  This commit adds support for rounding decimals.

The precision of the resultant decimal type is one greater than
the precision of the input decimal to allow for rounding up to the
next decimal digit.
The scale of the resultant decimal type is the same as the input type since the result of rounding cannot add any further decimal places.

Signed-off-by: Andrew Coleman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants