Extension of Erased Values in Scala 3

This is a web version of my last semester project report.

Introduction

In functional programming, it is common that programmers describe the possible side effects of a function in its signature. This can be expressed by returning a monad, or by expressing the capability to use the side effects as a parameter of the function. Scala uses the notion of contextual parameters (“Contextual Parameters, Aka Implicit Parameters” n.d.) for such capabilities, allowing the programmer to delegate to the compiler the tedious work of explicitly writing down an (obvious) instance of the capability when calling that function:

def sqrt(x: Double)(using {*} CanThrow[Exception]) =
    if x >= 0 then builtin.sqrt(x)
    else
        throw new Exception("sqrt of negative number")

However, often, the capabilities exist only as a proof of existence for static type-checking purposes, and is never interacted with during the execution of the program (other than being passed around). Hence, this style of programming creates a runtime overhead that we wish to avoid. Erased definitions allows the programmer to explicitly mark a definition/parameter as unused at runtime, guaranteeing that they are optimized away during compilation. The compiler should also guarantee that erased definitions are not referenced to at runtime, and the observable behaviour of the program does not change after erasure of these definitions. Scala already has erased definitions as an experimental feature (“Erased Definitions” n.d.), however the feature is incomplete and has some soundness issues (“Unsoundness Due to ‘Erased‘ · Issue #4060 · Lampepfl/Dotty” n.d.).

My semester project’s goal was to expand on the original implementation of erased definitions, adding needed functionalities as well as fixing soundness issues, giving it a firm ground to become an official Scala feature. Although the goal was not fully realized, my project has made the following contributions, which will be later described in the report:

Finally, the latter half of 7 discusses some future directions on erased defintions, which include a possible formalization of the feature.

Erased Definitions and “Erased” Functions

Erased Definitions is an experimental feature of Scala 3 (dotty), available behind a language feature import (“Erased Definitions” n.d.). When enabled, the current dotty compiler introduces two new concept, erased definitions and erased functions.

Erased definitions are val or def definitions that are guaranteed to be erased at runtime. They cannot be referenced from other code, except other erased defintions’ right-hand side and passed as erased parameters. Note that the erased definition’s right-hand side itself only type-checked by the compiler: in fact, right after type-checking, the right-hand side of the definition gets erased. This creates a problem if the right-hand side itself is a diverging computation, and the assigned type being uninhabited and/or not meant to be easily constructed. We discuss this problem in 6 and 7.1.

Erased functions, as of the start of the project, are functions that takes erased parameters: all parameters are erased at runtime, and the actual function at runtime does not take any (runtime) parameters.

// before erasure
def fn(erased x: Int, y: Int) = 5

erased val x = 5
val t = fn(x, x)
// after erasure
def fn() = 5

val t = fn()

Erased functions have a different signature to their non-erased counterpart, represented by a different method companion (Erased[Contextual][Implicit]Method). This disallows overriding an erased method with a non-erased method and vice-versa, avoiding a conflict in signatures post-erasure.

Individual Erased Parameters

The above representation of method types also has a severe limitation: all parameters have to be erased, or none are erased. A major part of the semester project is to overcome this limitation, by allowing the erased keyword on every parameter before every parameter, and only erasing them during erasure phase.

To support this, several parts of the compiler needs to be changed: – Adding support for erased flags in the parser. – Introduce a special annotation for erased parameters in method types, and change the rules in how we compare method types with erased parameters. – Change how we erase method parameters in erasure. This is rather simple, as this is just an extension of the current work: we only erased erased parameters, instead of all of them in an erased method.

We shall go into more details of the first two changes.

Parser changes

The main challenge with the parser is how to handle erased as a soft keyword, in the following cases:

def f1(using inline Int)
def f2(using inline: Int)

def f3(using erased x: Int)

def f4(using erased inline  Int)
def f5(using erased inline: Int)

Scala allows contextual parameters to be declared unnamed (so they are mainly used as contextual parameters of other function calls in its body). However, by the time the parser reaches the identifier following using keyword, it wants to know whether we are parsing the parameters named or unnamed (since the whole parameter list has to be of the same syntax).

The original parser before erased parameters only allowed inline as a parameter modifier and a soft keyword. Therefore, we can differentiate cases f1 and f2 by checking the lookahead when we see inline: If the lookahead is :, we know that inline is an identifier, otherwise it is a keyword. Furthermore, Scala does not allow implicit inline parameters without an identifier (possibly because of this parsing ambiguity), so f3 is not valid Scala with the inline keyword.

A rather simple approach works here: we eagerly parse all the soft modifiers of the first parameters (erased for f3 and f4, and both erased and inline for f5); figure out if the following syntax is named or unnamed, parse them, and finally apply the modifiers back to the first parameter. To figure out whether the current erased or inline is a soft modifier or not, we simply look at the lookahead, and assume that they are if the lookahead is not :.

The same eager modifier parsing approach is used when parsing lambdas with erased parameters, whereas previously no modifiers are allowed in lambda parameters.

Erased Method Types, Erased Method Companions

We implement method types with erased parameters as normal method types: there are no distinction of MethodCompanions between the former and latter. Erased parameter are instead annotated with an internal annotation, ErasedParam. When comparing method types, we require that the parameters match this annotation: for two method types to have a subtype relationship, each pair of parameters in their respective positions have to both have the annotation, or both not have the annotation.

That is, given

m1 = MethodType([Int, Int @ErasedParam], Int) // (Int, erased Int) => Int
m2 = MethodType([Int, Any @ErasedParam], Int) // (Int, erased Any) => Int
m3 = MethodType([Int, Int], Int)              // (Int, Int) => Int
m4 = MethodType([Int @ErasedParam, Int], Int) // (erased Int, Int) => Int
m5 = MethodType([Int], Int)                   // (Int) => Int

We consider m2 <  : m1 by usual subtyping rules. However, m3, m4 and m5 does not have any subtyping relationship with any other types listed. Interestingly, although m1, m2 and m4 all become m5 after erasure, during typing we do not consider them equal nor having any subtyping relationship. This is to avoid “casting away” the erased parameter, defeating the purpose of having them in the first place.

The approach requires no change to the representation of the method types, nor the TASTy output format. In fact, the special logic for the Erased[Contextual][Implicit]Method companions are now unneeded and removed, making method types handling less cumbersome. It does, however, require some changes in the type comparer to treat the annotation differently while doing method parameter comparisons. This messes with the usual logic of annotations, where they do not matter during comparison; hopefully, since this is a tightened restriction, the implementation shall not introduce unsoundness here.

Another approach was also considered, which does not require this special annotation treatment: adopt the Erased[Contextual][Implicit]Method companions to also carry a boolean flag list indicating erased/non-erased parameters. Method type comparison includes MethodCompanion comparison, and method types with differing companions are naturally considered not comparable. However, this further complicates the already exponentially growing set of method companions (with mixed Impure, Contextual, Implicit); and introduces a non-backward-compatible change to the TASTy representation of the method type. This is considered both difficult to maintain and changing TASTy format is undesirable, so the former approach was preferred.

Function Types with Erased Parameters

Function traits are how Scala represent lambdas/closures in the type system. Typically, a function of arity N with parameter types T1, …, TN and returning type R is assigned the type FunctionN[T1, …, TN, R] with its syntax sugar (T1, …, TN) =>R. The trait itself is defined as

trait FunctionN[-T1, ..., -TN, +R] {
    def apply(x1: T1, ..., xN: TN): R
}

Similarly, contextual function types are represented as ContextualFunctionN, with a similar trait definition (with using added to the parameter list).

Function traits are important during typing and during erasure (into JVM types):

Previously, a function can only have the entire parameter list erased, and hence can be represented by a simple set of traits ErasedFunctionN:

trait ErasedFunctionN[-T1, ..., -TN, +R] {
    def apply(erased x1: T1, ..., erased xN: TN): R
}

The ErasedFunctionN[T1, …, TN, R] traits are then transformed into Function0[R] during erasure. This is a simple representation, however we cannot simply extend this to allow individual erased parameters.

An initial idea for extension was to simply do the same thing as method types, and annotate the parameter types T1, …, TN with ErasedParam annotations. This, however, does not work when adding the same treatment of annotations during ErasedFunctions comparison. This is due to the fact that ErasedFunctionN are just normal traits, which can be aliased or extended:

trait EraseSecondParam[T1, T2, R]
    extends ErasedFunction2[T2, erased T1, R] { }
// is EraseSecondParam[Int, Int, Int] =:= ErasedFunction2[Int, erased Int, Int]?

Note that in the example, EraseSecondParam annotates parameters for ErasedFunction2, but it can shuffle the parameters (or use any type expressions) freely. This makes type-parameter annotation matching impossible.

Erased function types with refined ErasedFunction trait

The final approach involves defining an empty trait, ErasedFunction, and defining a function type with erased parameters as a refinement of this trait:

// (Int, erased Int) => Int
type IeII = ErasedFunction {
    def apply(x1: Int, erased x2: Int): Int
}

With this approach, we simply delegate both typing and erasure to the underlying MethodType (with erased parameters) of the refined apply method:

From this delegation we also get contextual/implicit function types with erased parameters for free.

The implementation mostly focuses on adding support for other phases of the compiler to recognize refined ErasedFunction types as function types. Luckily, this is very similar to the implementation of polymorphic function types (“Polymorphic Function Types” n.d.). However, since the compiler treat polymorphic function types themselves as a separate function type, support for polymorphic functions with erased parameters does not come “for free”, and hence are left out of the scope of the project.

Lambdas with Erased Parameters

Just like methods and function types, lambdas themselves can declare a parameter as erased:

type T = (Int, erased Int) => Int // syntax sugar for refined ErasedFunction trait

val v1    = (x: Int, erased y: Int) => x + 1
val v2: T = (x, erased y) => x + 1
val v3: T = (x, y) => x + 1 // error: expected T, got (Int, Int) => Int

Of note is the special case for v3, where we know that y has to be erased, but the compiler rejects the definition anyway. This is intentional: it is hard to tell whether a parameter is inferred as erased (especially with the introduction of erased classes in 5), and runtime references to the parameters can be surprising. However, if we explicitly mark y as erased, the compiler will accept the defintion, as in v2.

Adding support for a soft keyword in lambda parameter list, however, is another challenge for the parser. Scala has support for placeholder syntax for lambdas (“Placeholder Syntax for Anonymous Functions Scala Language Reference” n.d.), but this introduces an ambiguity in the parser:

val v1 = (_ + 1, _ + 2)           // (Int, Int) => (Int, Int)
val v2 = (x, y) => (x + 1, y + 2) // (Int, Int) => (Int, Int)

val v3 = (x, erased y)      // error: invalid syntax
val v4 = (x, erased y) => x // ok

Note that while parsing the right-hand side of v1 and v2, we cannot distinguish between a tuple expression and a parameter list until we have parsed the entire “tuple” and see the following arrow. The Scala parser overcomes this ambiguity by parsing the parameter list/tuple as a tuple itself, and re-interpret the tuple as a parameter list if it is followed by an arrow. However, this is a problem for erased, since parameter modifiers in a tuple is not valid syntax.

To mitigate this, we allow tuples to have a temporarily invalid ValDef as a tuple value (with modifiers). During parsing, if the tuple turns out to be an actual tuple (and not a parameter list), we check for ValDefs and reject them.

Erased Classes

Erased classes are a syntax sugar for marking every defintion/parameter of the defined class as erased automatically. Classes that extend an erased class has to be erased as well.

erased class A(x: Int)
erased class B extends A(5)
val a = new A(5) // a is erased
def f(x: A, y: Int) = y + 1 // x is an erased parameter
val v = (x: A, y: Int) => y + 1 // (erased x: A, y: Int) => Int

def f1[T <: A](x: T) = 5 // x is NOT erased

During typing, we mark all definitions and parameters whose types declared or inferred to be a concrete erased class as erased. Erased classes are otherwise treated exactly as normal classes. This means that, even if by subtyping rules we can deduce that a type parameter is of an erased class, we don’t infer erased from type parameters (as in f1).

These limitations mean that we are not creating a separate hiararchy of “erased” types, similar to phantom types (Stucki, Biboudis, and Odersky 2017). Instead, erased classes are purely syntactic sugar, and does not change the type system in any way.

Soundness of Erasure

Scala supports path-dependent types, where one type can refer to another type defined as a member of another variable/path, and traits can give a lower and upper bound of its values’ type members. Any bounds can be defined on traits, but a concrete type has to be given when constructing a value. Hence, the existence of a value implies a subtyping relationship between the lower and upper bound. The call-by-value semantics of Scala ensures that an expression with impossible bounds cannot reduce successfully to a value (it can throw an exception, or loop forever). However, this is not ensured for erased values, whose runtime semantics do not exist.

trait A {
    type T >: Any
}
def upcast(erased x: A, y: Any): x.T = y
erased val v: A { type T <: Nothing } = ???
def coerce(x: Any): Int = upcast(v, x)

The above program type-checks if v and the parameter x are not erased, but the program would crash once v’s right-hand side is executed. However, we cannot allow the program to type-check if these defintions are erased, since the right hand side of v is then never executed (in fact, v does not exist at runtime, and we have an unchecked cast in coerce).

Note that at the point of v’s definition, we know that v’s type is uninhabited: there is no type T that is a subtype of Nothing and a supertype of Any. Normally, any attempt to use the path-dependent type v.T are subject to uninhabited type checks (i.e. a realizability check), but it is fine to pass it to a function. This gives us an approach to check for this unsoundness: at the point of passing an expression as an erased parameter, we perform a realizability check on that expression’s type. This is implemented in (“Ensure Erased Vals Pass CheckRealizable · Pull Request #16111 · Lampepfl/Dotty” n.d.).

There are two notable points about this implementation:

Future Work

Interaction with Capabilities

Right now, safer exceptions (“CanThrow Capabilities” n.d.) uses erased CanThrow[E] capabilities to control the types of exceptions allowed to be thrown. The CanThrow instances themselves are synthesized with the right-hand side scala.compiletime. erasedValue[T], which is a special alias to ???, the “undefined” value. Since the erased values are erased at compile-time, this does not cause a crash at runtime.

However, this also means that the programmer themself can summon their own instance of CanThrow with scala.compiletime.erasedValue[T], or another similar expression (such as an endless loop). This completely defeats the purpose of having CanThrow instances as capability tokens. We are in a hard position: we wish to prove that erased defintions are terminating, yet it is an impossible task.

One of the approach that is proposed is to limit the erased expressions themselves. After macro expansions and inlining, erased expressions should be either a constant, or a constructor call. Note that no function calls are allowed; however, we don’t attempt to check that constructors are terminating: it should be the programmer’s responsibility that capability tokens are successfully created, erased or not.

The realization of this approach, including the restriction implementation and a new approach to synthesized erased values, is early work in progress.

Formalization

A possible formalization of erased definitions and parameters, including a proof of semantics-preserving erasure, is highly desired for future work. With the restriction in 7.1, it might also be possible to prove that erased values are side-effect-free.

Idris 2 has compile-time only parameters, as part of its quantitative type theory (Brady 2021). Its explicit annotation and checking for usage counts of the parameters within the function body help make compile-time erasure semantics of dependent type values obvious and easily understandable. It will be interesting to look into the underlying formalisms of quantitative type theory, to apply to our own future formalization.

Dafny has ghost defintions (“Dafny Documentation” n.d.), which are defined to be specification-only variables. Due to a strict distinction between pure and impure functions, pure functions can take both ghost and runtime values and return either; while impure functions are the same as Scala’s methods with erased parameters.

References