
fix(spark): remove internal functions MakeDecimal and UnscaledValue #386


Merged
1 commit merged into substrait-io:main on Apr 11, 2025

Conversation

andrew-coleman (Contributor)
These two functions are inserted by the Catalyst optimizer for queries that involve aggregation (sum & average) of decimal values.
Approximately 50% of the TPC-DS tests rely on these internal functions, which makes the resulting plans non-interchangeable with other query processors. This commit reverses that particular optimisation before conversion to Substrait, and removes MakeDecimal and UnscaledValue from the spark.yaml file.
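As a rough illustration (not the code from this PR), the reversal amounts to pattern-matching the optimized shape Alias(MakeDecimal(Sum(UnscaledValue(e)))) in the aggregate's result expressions and rebuilding the aggregate over the original decimal column. The helper name undoDecimalAggregates below is hypothetical, and only the Sum shape is shown; the Average shape would need an analogous branch:

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, MakeDecimal, NamedExpression, UnscaledValue}
import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Sum}

// Hypothetical sketch: rewrite
//   Alias(MakeDecimal(Sum(UnscaledValue(e)), prec, scale), name)
// back to an aggregate over the original decimal column e, so the plan no
// longer references the two internal functions before conversion to Substrait.
def undoDecimalAggregates(ne: NamedExpression): NamedExpression = ne match {
  case alias @ Alias(md: MakeDecimal, _) =>
    md.child match {
      case ae: AggregateExpression =>
        ae.aggregateFunction match {
          case sum: Sum =>
            sum.child match {
              case uv: UnscaledValue =>
                // Rebuild Sum over the original child, dropping MakeDecimal/UnscaledValue
                val restoredSum = sum.withNewChildren(Seq(uv.child)).asInstanceOf[Sum]
                val restoredAgg = ae.copy(aggregateFunction = restoredSum)
                alias.withNewChildren(Seq(restoredAgg)).asInstanceOf[NamedExpression]
              case _ => ne
            }
          case _ => ne
        }
      case _ => ne
    }
  case other => other
}
```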

@@ -198,8 +221,7 @@ class ToSubstraitRel extends AbstractLogicalPlanVisitor with Logging {
   override def visitWindow(window: Window): relation.Rel = {
     val windowExpressions = window.windowExpressions.map {
       case w: WindowExpression => fromWindowCall(w, window.child.output)
-      case a: Alias if a.child.isInstanceOf[WindowExpression] =>
-        fromWindowCall(a.child.asInstanceOf[WindowExpression], window.child.output)
+      case Alias(w: WindowExpression, _) => fromWindowCall(w, window.child.output)
andrew-coleman (Contributor, Author):

Same condition, just more scala-like!

val actualResultExprs = agg.aggregateExpressions.map {
// eliminate the internal MakeDecomal and UnscaledValue functions by undoing the spark optimisation:
// https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L2223
case Alias(expr, name) =>
Contributor:

hm, an alternative would be to disable the rule 🤔 that'd be a bit nicer maybe since there's no reason to optimize something just to unoptimize it, and also if someone wants to use substrait-spark for transferring optimized spark plans to other spark instance, they could then do that (I think we could still remove these functions from the spark.yml in this repo, but it'd be easier to then add them back). But I don't know if we have any good way of excluding rules, I guess it'd need to be like a readme thing saying "please exclude these rules before sending the plan over to substrait-spark"
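For reference, Spark has a config for excluding optimizer rules by class name. A minimal usage sketch, assuming the rule linked above is DecimalAggregates and that Spark treats it as excludable; `spark` is a SparkSession:

```scala
// Assumption: DecimalAggregates is the rule that introduces MakeDecimal/UnscaledValue,
// and it is not on Spark's non-excludable rule list.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.DecimalAggregates")
```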

andrew-coleman (Contributor, Author):

I agree that would be the cleanest solution (for us). But it would throw the problem back to the end user and make the API harder to use. Since we can handle the situation within this library, we don't have to bother our users with these nasty internal issues - it will "just work"! 😁

@@ -133,7 +133,30 @@ class ToSubstraitRel extends AbstractLogicalPlanVisitor with Logging {
    */
   override def visitAggregate(agg: Aggregate): relation.Rel = {
     val input = visit(agg.child)
-    val actualResultExprs = agg.aggregateExpressions
+    val actualResultExprs = agg.aggregateExpressions.map {
+      // eliminate the internal MakeDecomal and UnscaledValue functions by undoing the spark optimisation:
Contributor:
Suggested change:
-      // eliminate the internal MakeDecomal and UnscaledValue functions by undoing the spark optimisation:
+      // eliminate the internal MakeDecimal and UnscaledValue functions by undoing the spark optimisation:

vbarua (Member) left a comment:

Looks reasonable to me, though mostly deferring to @Blizzara's review.

vbarua merged commit 7a689e9 into substrait-io:main on Apr 11, 2025
12 checks passed
andrew-coleman deleted the remove_make_decimal branch on April 14, 2025 08:06