Skip to content

Function overload resolution

Evan Machusak edited this page Mar 18, 2024 · 10 revisions

This article describes how all aspects of how function overload resolution works in CQL.

Intro

Function overload resolution is the process by which a set of arguments are bound to a function's operands. When two functions exist which have the same name but differ only by its list of operands, also known as its signature, we must determine which of these functions is the best possible match.

For example, consider these two overloads:

define function Foo(x Integer)

define function Foo(x String)

define "Call with Integer": Foo(1)

define "Call with String": Foo('hello')

In the above example, x is the operand of Foo. In the first overload, the type of the operand is Integer; in the second, it is String.

The value 1 as it appears in Foo(1) is an argument. This is the value that will be bound to the operand x in Foo.

These terms will be used throughout this article and it is critical to keep them straight. These are also the terms we use in the SDK codebase.

Algorithm

There are three main aspects to overload resolution:

  • Computing the cost of a coercion
  • Inferring the value of generic arguments
  • Comparing compatible overloads to pick the best one

We will discuss each one of these in detail.

Coercion cost

Coercion is the process of binding an argument of one type to an operand of another type. This allows usage like this:

define function TakesDecimal(x Decimal)

define CallWithInteger: TakesDecimal(1)

If CQL didn't implement automatic coercion during overload resolution, then CallWithInteger would be an error. The author would have been required to write this instead:

define CallWithInteger: TakesDecimal(ToDecimal(1))

This would make CQL unwieldy to use. Virtually every modern language that any programmer will ever use implements argument-to-operand type coercion, so this concept should be familiar to you.

Consider this scenario:

define function Foo(x Decimal): 'decimal'

define function Foo(x Integer): 'integer'

define CallWithInteger: Foo(1)

In this example, your programming instincts will tell you that CallWithInteger will return integer because it will call the overload that takes an Integer - the language will not decide to convert 1 to Decimal implicitly, like it does in the prior example, because a "better" overload exists.

The reason Foo(x Integer) is a better signature than Foo(x Decimal) is because the coercion is cheaper.

CQL defines the relative cost of the coercion it supports in §4.9. Conversion Precedence.

We implement this functionality through a CoercionProvider class, which does two things:

  • Computes the cost of a conversion, returning a CoercionCost enum value
  • Creates the coercion by changing the ELM of the argument

Coercion cost algorithm

This function has a signature of:

CoercionCost GetCoercionCost(Expression from, TypeSpecifier to)

from is as an ELM Expression for the argument which we are coercing. to is the type of the operand to which we are coercing the argument.

For example, in this ELM:

define function TakesDecimal(x Decimal)

define CallWithInteger: TakesDecimal(1)

We would be passing a Literal expression whose value is 1 as from, and a NamedTypeSpecifier for the system Decimal type as to.

This function returns a CoercionCost enumeration whose values are as follows:

ExactMatch = 1
Subtype = 2
Compatible = 3
Cast = 4
ImplicitToSimpleType = 5
ImplicitToClassType = 6
IntervalPromotion = 7
ListDemotion = 8
IntervalDemotion = 9
ListPromotion = 10
Incompatible = 1000

This order corresponds to the language of the specification.

The function follows these steps in order.

  1. Check if from's type is an exact match with from. If true, return ExactMatch, else continue.

Exact match compares type specifiers according to §4.3 Type Testing. In particular, it performs an Identity check.

  1. If to is an Interval whose point type is not valid, return Incompatible, else continue.

This is an undocumented rule that does not appear in the specification language, but is established through tests. Without this check, a number of the published specification tests will result in ambiguities.

It comes in this order because if from is Interval<Any> and to is Interval<Any>, this should constitute an exact match. If from is an Interval<T>, then the subtype check that comes immediately after this one will return true because Interval<T> is always a subtype of Interval<Any> for all T.

  1. Check if from's type is a subtype of to. If true, return Subtype, else continue.

Subtype relationships are established in the model info, and are determined using it. All types are subtypes of the System.Any subtype, regardless of whether the model establishes this or not (e.g., List<T> and Interval<T> are not declared in models, but are subtypes of Any).

  1. Check if from's type is Any. If true, return Compatible, else continue.

The language from the specification for Compatible says this:

Compatible – If the invocation type is compatible with the declared type of the argument (e.g., the invocation type is Any)

The specification here uses invocation type to mean the type of the argument.

The purpose of this is primarily to allow this kind of usage:

define function TakesInteger(x Integer): x

define "Call with null": TakesInteger(null)
  1. Check if from's type can be cast to to. If true, return Cast, else continue.

Specifically, we check whether to is a subtype of from. For example, the conversion from Any to Integer requires a cast.

We also consider list element types and interval point types here, allowing List<Any> to be cast to List<Integer> and Interval<Any> to be cast to Interval<Integer>.

We also check that when from is a Choice type:

  • If to is also a Choice type, then there must be a pair (typeInFrom, typeInTo) where typeInFrom can be cast to typeInTo
  • Else, there must be a pair (typeInFrom, to) where typeInFrom can be cast to to.

Otherwise, if to is a Choice type, then there must be a pair (from, typeInTo) where from can be cast to typeInTo.

  1. Check if from can be implicitly cast to to. If true, then if to is a simple type, return ImplicitToSimpleType, els return ImplicitToClassType, else continue.

Implicit conversions are defined in §4.6.3. The determination of whether a type is a simple type or a class type is based on how it is declared in the model information.

The system types that are class types are:

  • Quantity
  • Ratio
  • Code
  • Concept
  • Vocabulary
  • ValueSet
  • CodeSystem

All other types are simple types.

  1. If EnableIntervalPromotion is configured, to is an Interval type, and from can be promoted, return IntervalPromotion, else continue.

An interval can be promoted if the cost of converting from to the point type of to is not Incompatible.

  1. If EnableListDemotion is configured, from is an List type and can be demoted, return ListDemotion, else continue.

A list can be demoted if the cost of converting its element type to to is not Incompatible.

  1. If EnableIntervalDemotion is configured, from is an Interval type, and from can be demoted, return IntervalDemotion, else continue.

An interval can be demoted if the cost of converting its point type to to is not Incompatible.

  1. If EnableListPromotion is configured, to is a List type, and from can be promoted, return ListPromotion, else continue.

A type can be promoted to a list if the cost of converting from to the element type of to is not Incompatible.

  1. Return Incompatible.

If none of the options for coercing from to to result in success, then from cannot be coerced to to.

Coercing

Arguments are always coerced when bound to a function's operands. In the trivial case where the argument and the operand are exactly the same type, no coercion takes place and the argument is bound directly to the operand. In all other cases, some coercion takes place, which generates new ELM nodes.

CoercionResult<Expression> Coerce(Expression expression, TypeSpecifier to)

CoercionResult<T> is a simple record type that stores the resulting expression, the cost, and provides a Success property that checks that cost is not Incompatible.

The first step is to compute the coercion cost. Once done, we switch on that cost as follows:

  • ExactMatch: the result is expression.
  • Subtype: the result is converted with an As node.
  • Compatible: the result is converted with an As node.
  • Cast: the result is converted with an As node.
  • ImplicitToSimpleType: the result is converted using the To* nodes, e.g. ToDecimal. When one does not exist, it is converted with an As node. The following conversions are used: ToBoolean, ToInteger, ToLong, ToDecimal, and ToString.
  • ImplicitToClassType: the result is converted using the To* nodes, e.g. ToQuantity. When one does not exist, it is converted with an As node. The following conversions are used: ToQuantity, ToDate, ToDateTime, ToTime, ToConcept, and ToRatio.
  • IntervalPromotion: a new closed point interval is created whose low and high values are expression converted to to's point type via As.
  • ListDemotion: the result is a SingletonFrom whose operand is expression, converted with an As node.
  • IntervalDemotion: the result is a Start whose operand is expression, converted with an As node.
  • ListPromotion: the result is a ToList node whose operand is expression converted to to's element type with an As node.
  • Incompatible: the result is expression.

Note that for Incompatible costs, an error annotation is added to the node downstream of this function.

Generic type inference

When a function is generic, CQL infers the type of the generic type argument based on the arguments provided. This inference is done *statically.

Note the following constraints, true as of the time of this writing:

  • Generic functions can have only one type argument, T;
  • Generic type arguments cannot have constraints applied to them;
  • Users cannot define their own generic functions.

This problem has two variants:

  1. Determining whether a set of arguments can match the signature of a generic function by inferring the generic type T;
  2. Choosing from a list of overloads which overload, if any, has the cheapest coercion cost.

The first step in signature matching is argument count matching.

Although user defined functions cannot define optional operands, system functions can. An example is the Date function, which can be invoked with 1, 2, or 3 parameters depending on the precision of the date, e.g.:

Date(2024)
Date(2024, 3)
Date(2024, 3, 18)

For system functions, we only declare how many of the operands are required, so for the Date function, we say that at least 1 argument must be specified.

Many functions have an optional _precision_ operand. For those functions, 1 (for unary) or 2 (for binary) operands are required with the 2nd or 3rd operand (precision) being optional.

Matching a generic signature

The algorithm works as follows:

lowestCost := max(ConversionCost)
bestCoercion := {}
for each (argument, operand) do
   inference := infer_generic(argument.resultTypeSpecifier, operand.operandTypeSpecifier)
   if (inference is not empty) then
      replaced := replace_generic_arguments(arguments, operands, inference)
      totalCost := 0
      for each r in replaced do
        totalCost := totalCost + r.Cost
      end
      if (totalCost < lowestCost) then
        lowestCost := totalCost
        bestCoercion := replaced
      end
   end
end

We go through the argument list one by one, and assume the generic argument T is the type of the argument. With that assumption, we replace the generic type argument T in all operands with the argument type, and then compute that coercion cost. We treat the total cost of binding arguments to operands as the sum of the coercion cost of each argument to its operand. Thus:

define function Add(x Integer, y Integer): x + y

Calling Add(1,2) would have a cost of 2, because binding 1 to x and 2 to y both have a cost of ExactMatch (1).

The final generic type inference is the one which has the lowest total cost. It is possible that two different inferences for type T could result in the same total cost. When that happens, the result is ambiguous, and the arguments cannot be bound to those operands. This happens most often with null arguments.

Inferring generic types

We infer the generic type T by comparing the argument's type and the operand's type, taking into account whether the argument and operand types are scalar type, List types, or Interval types.

  • When the argument is a scalar type and the operand type is T, we infer T to be the argument type.
  • When the argument is a list type and the operand type is List<T>, we infer T to be the argument's element type.
  • When the argument is a scalar type and the operand type is a List<T>, we recursively infer T from the argument type and the operand's element type.
  • When the argument is an interval type and the operand is Interval<T>, we recursively infer T from the argument's point type and the operand's point type.
  • When the argument is a scalar type and the operand is Interval<T>, we recursively infer T from the argument's type and the operand's point type.

We specifically do not allow List<Integer> to match an operand declared simply as T. We require that the operand be declared as List<T>. Thus, an argument of List<Integer> being inferred against List<T> would result in T being inferred as Integer.

In our experimenting, allowing List and Interval types to bind directly to T (instead of List<T> and Interval<T>) causes a lot of specification tests to fail, because we end up with ambiguous signature matches.

Replacing generic types

Once the inferences are made, we replace generic operand types with their inferred types using the following rules:

Given T' to be the inferred type:

  • When the operand type is T, the operand type is replaced with T'
  • When the operand type is List<T>, the operand type is replaced with List<T'>
  • When the operand type is Interval<T>, the operand type is replaced with Interval<T'>

Matching overloaded functions

Consider a function like ProperIncludes which has these overloads:

ProperIncludes(List<T>, T)
ProperIncludes(List<T>, List<T>)
ProperIncludes(Interval<T>, T)
ProperIncludes(Interval<T>, Interval<T>)

In order to find the best match, the algorithm works as follows:

  1. Try to match the arguments against all of the signatures in the overload
  2. Group the results by their total cost, and order them by cheapest first.
  3. If no match is compatible, pick the first overload and add an error condition ("Could not resolve...")
  4. If the cheapest group has only one match, return that match.
  5. If the cheapest group has more than one match, pick the first match and add an error condition ("Call to operator is ambiguous with: ...")