Extension of Erased Values in Scala 3
This is a web version of my last semester project report.
Introduction
In functional programming, it is common that programmers describe the possible side effects of a function in its signature. This can be expressed by returning a monad, or by expressing the capability to use the side effects as a parameter of the function. Scala uses the notion of contextual parameters (“Contextual Parameters, Aka Implicit Parameters” n.d.) for such capabilities, allowing the programmer to delegate to the compiler the tedious work of explicitly writing down an (obvious) instance of the capability when calling that function:
def sqrt(x: Double)(using {*} CanThrow[Exception]) =
if x >= 0 then builtin.sqrt(x)
else
throw new Exception("sqrt of negative number")
However, often, the capabilities exist only as a proof of existence for static type-checking purposes, and is never interacted with during the execution of the program (other than being passed around). Hence, this style of programming creates a runtime overhead that we wish to avoid. Erased definitions allows the programmer to explicitly mark a definition/parameter as unused at runtime, guaranteeing that they are optimized away during compilation. The compiler should also guarantee that erased definitions are not referenced to at runtime, and the observable behaviour of the program does not change after erasure of these definitions. Scala already has erased definitions as an experimental feature (“Erased Definitions” n.d.), however the feature is incomplete and has some soundness issues (“Unsoundness Due to ‘Erased‘ · Issue #4060 · Lampepfl/Dotty” n.d.).
My semester project’s goal was to expand on the original implementation of erased definitions, adding needed functionalities as well as fixing soundness issues, giving it a firm ground to become an official Scala feature. Although the goal was not fully realized, my project has made the following contributions, which will be later described in the report:
- Implemented support for mixed erased/non-erased parameters in functions (“Implement Individual Erased Parameters · Pull Request #16507 · Lampepfl/Dotty” n.d.). This fundamentally changes how erased functions are represented and treated in the compiler, which will be cover in 2, 3 and 4.
- Extend support for erased classes, a syntax sugar for classes with only erased definitions. The work is preliminary, however future ideas on expansions as well as possible challenges are described in 5.
- Fixed (“Ensure Erased Vals Pass CheckRealizable · Pull Request #16111 · Lampepfl/Dotty” n.d.) the aforementioned unsoundness bug (“Unsoundness Due to ‘Erased‘ · Issue #4060 · Lampepfl/Dotty” n.d.), which uses a pessimisstic type check on erased definitions to prevent bad bounds. We describe the details in 6.
- Explored the interaction between erased definitions and capabilities, where the ability to create a capability at will is a problem. We present a way forward, by restricting the right-hand sides of the erased defintions to certain forms. The details are on 7.1.
Finally, the latter half of 7 discusses some future directions on erased defintions, which include a possible formalization of the feature.
Erased Definitions and “Erased” Functions
Erased Definitions is an experimental feature of Scala 3 (dotty), available behind a language feature import (“Erased Definitions” n.d.). When enabled, the current dotty compiler introduces two new concept, erased definitions and erased functions.
Erased definitions are val
or def
definitions that are guaranteed to be erased at runtime. They cannot be referenced from other code, except other erased defintions’ right-hand side and passed as erased parameters. Note that the erased definition’s right-hand side itself only type-checked by the compiler: in fact, right after type-checking, the right-hand side of the definition gets erased. This creates a problem if the right-hand side itself is a diverging computation, and the assigned type being uninhabited and/or not meant to be easily constructed. We discuss this problem in 6 and 7.1.
Erased functions, as of the start of the project, are functions that takes erased parameters: all parameters are erased at runtime, and the actual function at runtime does not take any (runtime) parameters.
// before erasure
def fn(erased x: Int, y: Int) = 5
erased val x = 5
val t = fn(x, x)
// after erasure
def fn() = 5
val t = fn()
Erased functions have a different signature to their non-erased counterpart, represented by a different method companion (Erased[Contextual][Implicit]Method
). This disallows overriding an erased method with a non-erased method and vice-versa, avoiding a conflict in signatures post-erasure.
Individual Erased Parameters
The above representation of method types also has a severe limitation: all parameters have to be erased, or none are erased. A major part of the semester project is to overcome this limitation, by allowing the erased
keyword on every parameter before every parameter, and only erasing them during erasure phase.
To support this, several parts of the compiler needs to be changed:
– Adding support for erased
flags in the parser.
– Introduce a special annotation for erased parameters in method types, and change the rules in how we compare method types with erased parameters.
– Change how we erase method parameters in erasure. This is rather simple, as this is just an extension of the current work: we only erased erased
parameters, instead of all of them in an erased
method.
We shall go into more details of the first two changes.
Parser changes
The main challenge with the parser is how to handle erased
as a soft keyword, in the following cases:
def f1(using inline Int)
def f2(using inline: Int)
def f3(using erased x: Int)
def f4(using erased inline Int)
def f5(using erased inline: Int)
Scala allows contextual parameters to be declared unnamed (so they are mainly used as contextual parameters of other function calls in its body). However, by the time the parser reaches the identifier following using
keyword, it wants to know whether we are parsing the parameters named or unnamed (since the whole parameter list has to be of the same syntax).
The original parser before erased
parameters only allowed inline
as a parameter modifier and a soft keyword. Therefore, we can differentiate cases f1
and f2
by checking the lookahead when we see inline
: If the lookahead is :
, we know that inline
is an identifier, otherwise it is a keyword. Furthermore, Scala does not allow implicit inline parameters without an identifier (possibly because of this parsing ambiguity), so f3
is not valid Scala with the inline
keyword.
A rather simple approach works here: we eagerly parse all the soft modifiers of the first parameters (erased
for f3
and f4
, and both erased
and inline
for f5
); figure out if the following syntax is named or unnamed, parse them, and finally apply the modifiers back to the first parameter. To figure out whether the current erased
or inline
is a soft modifier or not, we simply look at the lookahead, and assume that they are if the lookahead is not :
.
The same eager modifier parsing approach is used when parsing lambdas with erased parameters, whereas previously no modifiers are allowed in lambda parameters.
Erased Method Types, Erased Method Companions
We implement method types with erased parameters as normal method types: there are no distinction of MethodCompanion
s between the former and latter. Erased parameter are instead annotated with an internal annotation, ErasedParam
. When comparing method types, we require that the parameters match this annotation: for two method types to have a subtype relationship, each pair of parameters in their respective positions have to both have the annotation, or both not have the annotation.
That is, given
m1 = MethodType([Int, Int @ErasedParam], Int) // (Int, erased Int) => Int
m2 = MethodType([Int, Any @ErasedParam], Int) // (Int, erased Any) => Int
m3 = MethodType([Int, Int], Int) // (Int, Int) => Int
m4 = MethodType([Int @ErasedParam, Int], Int) // (erased Int, Int) => Int
m5 = MethodType([Int], Int) // (Int) => Int
We consider m2
< : m1
by usual subtyping rules. However, m3
, m4
and m5
does not have any subtyping relationship with any other types listed. Interestingly, although m1
, m2
and m4
all become m5
after erasure, during typing we do not consider them equal nor having any subtyping relationship. This is to avoid “casting away” the erased parameter, defeating the purpose of having them in the first place.
The approach requires no change to the representation of the method types, nor the TASTy output format. In fact, the special logic for the Erased[Contextual][Implicit]Method
companions are now unneeded and removed, making method types handling less cumbersome. It does, however, require some changes in the type comparer to treat the annotation differently while doing method parameter comparisons. This messes with the usual logic of annotations, where they do not matter during comparison; hopefully, since this is a tightened restriction, the implementation shall not introduce unsoundness here.
Another approach was also considered, which does not require this special annotation treatment: adopt the Erased[Contextual][Implicit]Method
companions to also carry a boolean flag list indicating erased/non-erased parameters. Method type comparison includes MethodCompanion
comparison, and method types with differing companions are naturally considered not comparable. However, this further complicates the already exponentially growing set of method companions (with mixed Impure
, Contextual
, Implicit
); and introduces a non-backward-compatible change to the TASTy representation of the method type. This is considered both difficult to maintain and changing TASTy format is undesirable, so the former approach was preferred.
Function Types with Erased Parameters
Function traits are how Scala represent lambdas/closures in the type system. Typically, a function of arity N with parameter types T1, …, TN and returning type R is assigned the type Function
N[
T1, …, TN, R]
with its syntax sugar (
T1, …, TN) =>
R. The trait itself is defined as
trait FunctionN[-T1, ..., -TN, +R] {
def apply(x1: T1, ..., xN: TN): R
}
Similarly, contextual function types are represented as ContextualFunction
N, with a similar trait definition (with using
added to the parameter list).
Function traits are important during typing and during erasure (into JVM types):
- During typing, we want to distinguish between different function types (
Function1[Int, Int]
andFunction2[Int, Int, Int]
shoud not have a subtyping relation, nor doFunction1[Int, Int]
andContextualFunction1[Int, Int]
; however,Function1[Any, Int]
< :Function1[Int, Int]
). - During erasure, we want to convert function types into the JVM equivalent. Furthermore, Scala-specific function types like
ContextualFunction
s do not have a JVM equivalent, they will need to be converted into normalFunction
s.
Previously, a function can only have the entire parameter list erased, and hence can be represented by a simple set of traits ErasedFunction
N:
trait ErasedFunctionN[-T1, ..., -TN, +R] {
def apply(erased x1: T1, ..., erased xN: TN): R
}
The ErasedFunction
N[
T1, …, TN, R]
traits are then transformed into Function0[
R]
during erasure. This is a simple representation, however we cannot simply extend this to allow individual erased parameters.
An initial idea for extension was to simply do the same thing as method types, and annotate the parameter types T1, …, TN with ErasedParam
annotations. This, however, does not work when adding the same treatment of annotations during ErasedFunction
s comparison. This is due to the fact that ErasedFunctionN
are just normal traits, which can be aliased or extended:
trait EraseSecondParam[T1, T2, R]
extends ErasedFunction2[T2, erased T1, R] { }
// is EraseSecondParam[Int, Int, Int] =:= ErasedFunction2[Int, erased Int, Int]?
Note that in the example, EraseSecondParam
annotates parameters for ErasedFunction2
, but it can shuffle the parameters (or use any type expressions) freely. This makes type-parameter annotation matching impossible.
Erased function types with refined ErasedFunction
trait
The final approach involves defining an empty trait, ErasedFunction
, and defining a function type with erased parameters as a refinement of this trait:
// (Int, erased Int) => Int
type IeII = ErasedFunction {
def apply(x1: Int, erased x2: Int): Int
}
With this approach, we simply delegate both typing and erasure to the underlying MethodType
(with erased parameters) of the refined apply
method:
- During type comparison, the type comparer naturally compares the
apply
refined methods, which performs the comparison of the underlyingMethodType
s, so the logic implemented in 3 kicks in automatically. - During erasure, we erase all refined types of which the base type is derived from
ErasedFunction
(to support aliased/extended traits of it). Erasure looks at the refinedapply
method type, and erases the value into aFunction
M value, where M is the number of non-erased parameters.
From this delegation we also get contextual/implicit function types with erased parameters for free.
The implementation mostly focuses on adding support for other phases of the compiler to recognize refined ErasedFunction
types as function types. Luckily, this is very similar to the implementation of polymorphic function types (“Polymorphic Function Types” n.d.). However, since the compiler treat polymorphic function types themselves as a separate function type, support for polymorphic functions with erased parameters does not come “for free”, and hence are left out of the scope of the project.
Lambdas with Erased Parameters
Just like methods and function types, lambdas themselves can declare a parameter as erased:
type T = (Int, erased Int) => Int // syntax sugar for refined ErasedFunction trait
val v1 = (x: Int, erased y: Int) => x + 1
val v2: T = (x, erased y) => x + 1
val v3: T = (x, y) => x + 1 // error: expected T, got (Int, Int) => Int
Of note is the special case for v3
, where we know that y has to be erased, but the compiler rejects the definition anyway. This is intentional: it is hard to tell whether a parameter is inferred as erased (especially with the introduction of erased classes in 5), and runtime references to the parameters can be surprising. However, if we explicitly mark y as erased
, the compiler will accept the defintion, as in v2
.
Adding support for a soft keyword in lambda parameter list, however, is another challenge for the parser. Scala has support for placeholder syntax for lambdas (“Placeholder Syntax for Anonymous Functions Scala Language Reference” n.d.), but this introduces an ambiguity in the parser:
val v1 = (_ + 1, _ + 2) // (Int, Int) => (Int, Int)
val v2 = (x, y) => (x + 1, y + 2) // (Int, Int) => (Int, Int)
val v3 = (x, erased y) // error: invalid syntax
val v4 = (x, erased y) => x // ok
Note that while parsing the right-hand side of v1
and v2
, we cannot distinguish between a tuple expression and a parameter list until we have parsed the entire “tuple” and see the following arrow. The Scala parser overcomes this ambiguity by parsing the parameter list/tuple as a tuple itself, and re-interpret the tuple as a parameter list if it is followed by an arrow. However, this is a problem for erased
, since parameter modifiers in a tuple is not valid syntax.
To mitigate this, we allow tuples to have a temporarily invalid ValDef
as a tuple value (with modifiers). During parsing, if the tuple turns out to be an actual tuple (and not a parameter list), we check for ValDef
s and reject them.
Erased Classes
Erased classes are a syntax sugar for marking every defintion/parameter of the defined class as erased
automatically. Classes that extend an erased class has to be erased as well.
erased class A(x: Int)
erased class B extends A(5)
val a = new A(5) // a is erased
def f(x: A, y: Int) = y + 1 // x is an erased parameter
val v = (x: A, y: Int) => y + 1 // (erased x: A, y: Int) => Int
def f1[T <: A](x: T) = 5 // x is NOT erased
During typing, we mark all definitions and parameters whose types declared or inferred to be a concrete erased class as erased
. Erased classes are otherwise treated exactly as normal classes. This means that, even if by subtyping rules we can deduce that a type parameter is of an erased class, we don’t infer erased
from type parameters (as in f1
).
These limitations mean that we are not creating a separate hiararchy of “erased” types, similar to phantom types (Stucki, Biboudis, and Odersky 2017). Instead, erased classes are purely syntactic sugar, and does not change the type system in any way.
Soundness of Erasure
Scala supports path-dependent types, where one type can refer to another type defined as a member of another variable/path, and traits can give a lower and upper bound of its values’ type members. Any bounds can be defined on traits, but a concrete type has to be given when constructing a value. Hence, the existence of a value implies a subtyping relationship between the lower and upper bound. The call-by-value semantics of Scala ensures that an expression with impossible bounds cannot reduce successfully to a value (it can throw an exception, or loop forever). However, this is not ensured for erased values, whose runtime semantics do not exist.
trait A {
type T >: Any
}
def upcast(erased x: A, y: Any): x.T = y
erased val v: A { type T <: Nothing } = ???
def coerce(x: Any): Int = upcast(v, x)
The above program type-checks if v and the parameter x are not erased
, but the program would crash once v’s right-hand side is executed. However, we cannot allow the program to type-check if these defintions are erased
, since the right hand side of v is then never executed (in fact, v does not exist at runtime, and we have an unchecked cast in coerce
).
Note that at the point of v’s definition, we know that v’s type is uninhabited: there is no type T that is a subtype of Nothing
and a supertype of Any
. Normally, any attempt to use the path-dependent type v.T
are subject to uninhabited type checks (i.e. a realizability check), but it is fine to pass it to a function. This gives us an approach to check for this unsoundness: at the point of passing an expression as an erased parameter, we perform a realizability check on that expression’s type. This is implemented in (“Ensure Erased Vals Pass CheckRealizable · Pull Request #16111 · Lampepfl/Dotty” n.d.).
There are two notable points about this implementation:
- The realizability check itself is not a perfect check, but rather a pessimisstic check: types passing the check are guaranteed to be inhabited, but not vice versa.
- We only perform checks if the function being called uses the path-dependent type in its parameters or result type. This seems to be enough to prevent the unsoundness bug, but it is future work to formally prove that this is a sufficient condition.
Future Work
Interaction with Capabilities
Right now, safer exceptions (“CanThrow Capabilities” n.d.) uses erased CanThrow[E]
capabilities to control the types of exceptions allowed to be thrown. The CanThrow
instances themselves are synthesized with the right-hand side scala.compiletime.
erasedValue[T]
, which is a special alias to ???
, the “undefined” value. Since the erased values are erased at compile-time, this does not cause a crash at runtime.
However, this also means that the programmer themself can summon their own instance of CanThrow
with scala.compiletime.erasedValue[T]
, or another similar expression (such as an endless loop). This completely defeats the purpose of having CanThrow
instances as capability tokens. We are in a hard position: we wish to prove that erased defintions are terminating, yet it is an impossible task.
One of the approach that is proposed is to limit the erased expressions themselves. After macro expansions and inlining, erased expressions should be either a constant, or a constructor call. Note that no function calls are allowed; however, we don’t attempt to check that constructors are terminating: it should be the programmer’s responsibility that capability tokens are successfully created, erased or not.
The realization of this approach, including the restriction implementation and a new approach to synthesized erased values, is early work in progress.
Formalization
A possible formalization of erased definitions and parameters, including a proof of semantics-preserving erasure, is highly desired for future work. With the restriction in 7.1, it might also be possible to prove that erased values are side-effect-free.
Related Works
Idris 2 has compile-time only parameters, as part of its quantitative type theory (Brady 2021). Its explicit annotation and checking for usage counts of the parameters within the function body help make compile-time erasure semantics of dependent type values obvious and easily understandable. It will be interesting to look into the underlying formalisms of quantitative type theory, to apply to our own future formalization.
Dafny has ghost defintions (“Dafny Documentation” n.d.), which are defined to be specification-only variables. Due to a strict distinction between pure and impure functions, pure functions can take both ghost and runtime values and return either; while impure functions are the same as Scala’s methods with erased parameters.
References
- Brady, Edwin. 2021. “Idris 2: Quantitative Type Theory in Practice.” arXiv. https://doi.org/10.48550/arXiv.2104.00480.
- “CanThrow Capabilities.” n.d. Accessed January 29, 2023. https://docs.scala-lang.org/scala3/reference/experimental/canthrow.html#.
- “Contextual Parameters, Aka Implicit Parameters.” n.d. Scala Documentation. Accessed January 25, 2023. https://docs.scala-lang.org/tour/implicit-parameters.html.
- “Dafny Documentation.” n.d. Dafny Documentation. Accessed January 29, 2023. https://dafny-lang.github.io/dafny/QuickReference.html.
- “Ensure Erased Vals Pass CheckRealizable · Pull Request #16111 · Lampepfl/Dotty.” n.d. GitHub. Accessed January 29, 2023. https://github.com/lampepfl/dotty/pull/16111.
- “Erased Definitions.” n.d. Accessed January 25, 2023. https://docs.scala-lang.org/scala3/reference/experimental/erased-defs.html#.
- “Implement Individual Erased Parameters · Pull Request #16507 · Lampepfl/Dotty.” n.d. GitHub. Accessed January 29, 2023. https://github.com/lampepfl/dotty/pull/16507.
- “Placeholder Syntax for Anonymous Functions Scala Language Reference.” n.d. Accessed January 29, 2023. https://scala-lang.org/files/archive/spec/2.13/06-expressions.html#placeholder-syntax-for-anonymous-functions.
- “Polymorphic Function Types.” n.d. Accessed January 29, 2023. https://docs.scala-lang.org/scala3/reference/new-types/polymorphic-function-types.html.
- Stucki, Nicolas Alexander, Aggelos Biboudis, and Martin Odersky, eds. 2017. Dotty Phantom Types.
- “Unsoundness Due to ‘Erased‘ · Issue #4060 · Lampepfl/Dotty.” n.d. GitHub. Accessed January 25, 2023. https://github.com/lampepfl/dotty/issues/4060.