Language Specification [blog/lti-english/language][edit]
Language Specification [blog/lti-english/language][edit]
Before rigorously discussing the topic of type inference, we must first clearly define its inference target—an unambiguous, fully annotated internal language. This language serves as the “true” representation within the compiler and is the final form into which the external language—written by programmers and allowing omitted annotations—is translated.
The internal language used in this article originates from the well-known calculus System $F_\leq$ proposed by Cardelli and Wegner, which incorporates subtyping and impredicative polymorphism. However, we extend it by adding the $\bot$ (Bottom) type. To ensure the existence of supremum and infimum, this algebraic structure’s completeness is fundamental for the concise and deterministic operation of the constraint-solving algorithm in subsequent chapters. Additionally, it can serve as the result type for expressions that never return (e.g., functions that throw exceptions). We first define the formal language as follows
(Note: Overline denotes sequence notation $\overline{X} = X_1, X_2, \dots, X_n$; similarly, $\overline{x} : \overline{T} = x_1 : T_1, \dots, x_n : T_n$):
$$ \begin{array}{ll} T ::= X & \text{type variable} \\ \quad~\mid~ \top & \text{maximal type} \\ \quad~\mid~ \bot & \text{minimal type} \\ \quad~\mid~ \forall \overline{X} . \overline{T} \rightarrow T & \text{function type} \\[1em] e ::= x & \text{variable} \\ \quad~\mid~ \text{fun}[ \overline{X} ]\, ( \overline{x} : \overline{T} )\, e & \text{abstraction} \\ \quad~\mid~ e[ \overline{T} ]\, ( \overline{e} ) & \text{application} \\[1em] \Gamma ::= \bullet & \text{empty context} \\ \quad~\mid~ \Gamma, x : T & \text{variable binding} \\ \quad~\mid~ \Gamma, X & \text{type variable binding} \end{array} $$
We can simply map the above formal language to the following MoonBit code,
where Var implements type variables:
pub(all) enum Type {
TyVar(Var)
TyTop
TyBot
TyFun(Array[Var], Array[Type], Type)
} derive(Eq)
pub(all) enum Expr {
EVar(Var)
EApp(Expr, Array[Type], Array[Expr])
EAbs(Array[Var], Array[(Var, Type)], Expr)
}
The subtyping relation, denoted as $S \lt: T$ (read as “$S$ is a subtype of $T$”), is the core of this formal system. It defines a “substitutability” between types: wherever a term of type $T$ is required, it is safe to substitute a term of type $S$. Unlike definitions in many theoretical works, this article deliberately adopts an algorithmic approach to present the subtyping relation. This means the definition includes only a minimal core set of rules directly usable for implementation of the decision procedure, while properties like transitivity—typically treated as axioms—become provable lemmas in this system. This style brings the definition closer to a specification for a type-checking algorithm: $$
\begin{array}{ll}
X \lt: X & \text{(S-Refl)} \\[1em]
T \lt: \top & \text{(S-Top)} \\[1em]
\bot \lt: T & \text{(S-Bot)} \\[1em]
\dfrac{
\overline{T} \lt: \overline{R} \quad S \le: U
}{
\forall \overline{X}. \overline{R} \rightarrow S \lt: \forall \overline{X}. \overline{T} \rightarrow U
}
& \text{(S-Fun)}
\end{array}
$$ where $\overline{T} \lt: \overline{R}$ holds if and only if $\text{len}(T) = \text{len}(S) \land \forall 1 \leq i \leq \text{len}(S). S_i \lt: T_i$. To support subsequent constraint-solving algorithms, the system must compute the least upper bound (join, denoted $\lor$) and greatest lower bound (meet, denoted $\land$) for any two types. Thanks to $\bot$ and $\top$, these operations are total functions in this system—bounds exist for every type pair. We define them as follows: Least Upper Bound $S \vee T$ Greatest Lower Bound $S \wedge T$ Simple structural induction proves these operations satisfy: Readers are encouraged to implement this in code; solutions can be viewed by expanding the snippet below. In MoonBit code, we implement $S \vee T$ as 1. Subtyping Relation [blog/lti-english/subtype][edit]
The S-Fun rule captures the key characteristics of function subtyping: contravariant in argument types (reversed subtyping direction) and covariant in return type (preserved subtyping direction). Readers may attempt to translate this formalism into a MoonBit predicate function or refer to the folded code snippet below. 1.1. Solution [blog/lti-english/subtype_code][edit]
/// l <: r
struct Subtype {
l : Type
r : Type
}
pub fn Subtype::subtype(self : Subtype) -> Bool {
match (self.l, self.r) {
(TyVar(v1), TyVar(v2)) => v1 == v2
(_, TyTop) => true
(TyBot, _) => true
(TyFun(xs1, r, s), TyFun(xs2, t, u)) if xs1 == xs2 =>
subtypes(t, r) && { l: s, r: u }.subtype()
_ => false
}
}
1.2. Solution [blog/lti-english/subtype_code2][edit]
s | t and $S \wedge T$ as s & t. The definitions of these two operators are as follows:impl BitAnd for Type with land(s, t) {
match (s, t) {
(TyFun(xs1, v, p), TyFun(xs2, w, q)) if xs1 == xs2 => {
guard v.length() == w.length() else { TyBot }
TyFun(xs1, v.zip(w).map(z => z.0 | z.1), p & q)
}
(s, t) if s.subtype(t) => s
(s, t) if t.subtype(s) => t
_ => TyBot
}
}
impl BitOr for Type with lor(s, t) {
match (s, t) {
(TyFun(xs1, v, p), TyFun(xs2, w, q)) if xs1 == xs2 => {
guard v.length() == w.length() else { TyTop }
TyFun(xs1, v.zip(w).map(z => z.0 & z.1), p | q)
}
(s, t) if s.subtype(t) => t
(s, t) if t.subtype(s) => s
_ => TyTop
}
}