我能在遗忘线性代数和 Leibniz 律的情况下通关矩阵求逆和符号微分吗 [leibniz-amnesia][edit]

kokic
2025-10-29

1. 引入 [intro][edit]

kokic
2025-10-29

本文旨在展现一种不依赖于经典自动微分程序硬编码线性性和 Leibniz 律的微分手段，且此手段同时可用于矩阵求逆。我们将在动机部分简单地阐述此手段的数学原理，而其他部分则不涉及任何中学以上的数学推导。

首先回顾经典自动微分程序中对线性性和 Leibniz 律的实现，以下 Haskell 代码片段引自 Automatic Differentiation is Trivial in Haskell.

instance Num Dual where
  (+) (Dual u u') (Dual v v') = Dual (u + v) (u' + v')
  (*) (Dual u u') (Dual v v') = Dual (u * v) (u' * v + u * v')
  (-) (Dual u u') (Dual v v') = Dual (u - v) (u' - v')
  ...

instance Fractional Dual where
  (/) (Dual u u') (Dual v v') = Dual (u / v) ((u' * v - u * v') / v ** 2)
  ...

此实现虽然在性能和简洁方面取得了良好的平衡，但不免使人追问：这四条微分规则是否都必须写出？答案自然是否定的，不过在具体编码环节要如何实践这一点，许多读者也许并不能立刻想到。

另一个也常被冠以 Functional Pearl 之名的主题是，利用星半环的性质实现矩阵的种种运算，如求逆。对此不熟悉的读者可阅读 Stephen Dolan 的 Fun with semirings: a functional pearl on the abuse of linear algebra.

Exegesis 2. 复健 [review][edit]

kokic
2025-10-29

$\gdef\spaces#1{~ #1 ~}$ $\gdef\Mat{\operatorname{Mat}}$

考虑一个 $2 \times 2$ 可逆矩阵 $M \in \Mat_{2 \times 2}(R)$, 也即

$$ M \spaces= \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad ad - bc \spaces\neq 0 $$

为了储存 $(a,b,c,d) \in R^4$ 的信息，我们可以准备一个如下形式的结构体以及一个简化的构造器 Mat2x2::mk $: R^4 \to \Mat_{2 \times 2}(R)$

struct Mat2x2[R] {
  a : R
  b : R
  c : R
  d : R
} derive(Show, Eq)

fn[R] Mat2x2::mk(a : R, b : R, c : R, d : R) -> Mat2x2[R] {
  { a, b, c, d }
}

我们知道，为了保证 Mat2x2[R] 能有与我们印象中的矩阵相符的性质，此处矩阵元素类型至少是一个半环. 随后可得 $0_{\Mat_{2 \times 2}}$, $1_{\Mat_{2 \times 2}}$ 和 Mat2x2[R] 上典范的加法与乘法。

impl[R : Semiring] HasNil for Mat2x2[R] with nil() {
  Mat2x2::mk(R::nil(), R::nil(), R::nil(), R::nil())
}

$$ 0 \in {\small 半环} ~ R \spaces\implies 0_{\Mat_{2 \times 2}} \spaces= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \spaces\in \Mat_{2 \times 2}(R) $$

impl[R : Semiring] HasOne for Mat2x2[R] with one() {
  Mat2x2::mk(R::one(), R::nil(), R::nil(), R::one())
}

$$ 0,1 \in {\small 半环} ~ R \spaces\implies 1_{\Mat_{2 \times 2}} \spaces= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \spaces\in \Mat_{2 \times 2}(R) $$

impl[R : Add] Add for Mat2x2[R] with add(u : Mat2x2[R], v : Mat2x2[R]) {
  Mat2x2::mk(u.a + v.a, u.b + v.b, u.c + v.c, u.d + v.d)
}

$$ +: R^2 \to R \spaces\implies +_{\Mat_{2 \times 2}}: \Mat_{2 \times 2}(R)^2 \to \Mat_{2 \times 2}(R) $$

impl[R : Add + Mul] Mul for Mat2x2[R] with mul(u : Mat2x2[R], v : Mat2x2[R]) {
  {
    a: u.a * v.a + u.b * v.c,
    b: u.a * v.b + u.b * v.d,
    c: u.c * v.a + u.d * v.c,
    d: u.c * v.b + u.d * v.d,
  }
}

$$ \times: R^2 \to R \spaces\implies \times_{\Mat_{2 \times 2}}: \Mat_{2 \times 2}(R)^2 \to \Mat_{2 \times 2}(R) $$

于是我们得到了一个确实可计算 $+, \times$ 的 $\Mat_{2 \times 2}(R)$ 结构，且立刻可以写下：

impl[R : Semiring] Semiring for Mat2x2[R]

从而使 $\Mat_{2 \times 2}(R)$ 也构成一个半环. 我们的目标是使 $\Mat_{2 \times 2}(R)$ 更进一步成为星半环, 也就是配备了 Kleene 星运算的半环.

Exegesis 3. 正则魔术 [kira][edit]

kokic
2025-10-29

$\gdef\spaces#1{~ #1 ~}$ $\gdef\Mat{\operatorname{Mat}}$

大部分念过中学数学的读者想来是熟悉等比数列的，即

$$ 1 ~(= q^0), ~~ q^1, ~~ q^2, ~~ q^3, \spaces\cdots, ~ q^n $$

此数列的和不难求出，为 $\frac{1-q^n}{1-q}$. 特别地，我们发现，只要 $|q|<1$, 那 $n \to \infty$ 时一定有 $q^n = 0$, 从而

$$ 1 + q + q^2 + q^3 + \cdots \spaces= \frac{1}{1-q} \tag{1} $$

现在，如果读者还稍微了解过正则表达式，将 $+$ 理解为匹配中的或组合子 |, 将 $q$ 理解为字符 "q", 那么 "q" 的字符闭包 q* 就匹配空串 "" $(= q^0)$ 或 "q" 的若干次重复 "qqq...". 即

q* := "" | q | qq | qqq | ...

如果我们沿用这个 Kleene 星记号，则式 $(1)$ 就重写为

$$ q^* \spaces= \frac{1}{1-q} \quad \xRightarrow{~ q ~\leadsto~ 1-q ~} \quad q^{-1} \spaces= (1-q)^* $$

请读者暂且记住 $q^{-1} = (1-q)^*$ 这样一种关系，而忽略其完全形式的推导。

对于一个矩阵 $M$, 我们已经在复健中实现了 $+$ 和 $\times$, 为了能真正计算矩阵的 Kleene 闭包 $\square^*_{\Mat_{2 \times 2}}: \Mat_{2 \times 2}(R) \to \Mat_{2 \times 2}(R)$, 我们需要从自动机的图景中窥探 $M^*$ 之等价表达，此表达应只涉及星半环 $R$ 可提供的 $+_R, \times_R, \square^*_R$ 运算。如下所示

$(a,b,c,d)$ 照例是 $2 \times 2$ 矩阵 $M$ 的四个分量。固定下标 $i,j$, 分量 $(M^*)_{ij}$ 的值会等于图 $G$ 中刻画所有 $i$ 到 $j$ 的路径之正则表达式。请读者亲自动手验证：

$$ \begin{aligned} 1 \longrightarrow 1 &: \quad (a+b d^*c)^* \\ 1 \longrightarrow 2 &: \quad (a+b d^*c)^*b d^* \\ 2 \longrightarrow 1 &: \quad d^* c(a+b d^*c)^* \\ 2 \longrightarrow 2 &: \quad d^* + d^*c(a+b d^*c)^*b d^* \\ \end{aligned} $$

简单起见记 $\alpha = (a+b d^*c)^* \in R$, 我们毫不费力就得到

$$ M^* \spaces= \begin{pmatrix} \alpha & \alpha b d^* \\ d^* c \alpha & \quad d^* + d^*c \alpha b d^* \end{pmatrix} $$

此处对 $R$ 元素 $\square$ 进行的 Kleene 星运算都可以通过 $\frac{1}{1-\square}$ 具体算出。

fn[R : HasOne + Add + Neg + Inverse] star(x : R) -> R {
  (R::one() + x.neg()).inverse()
}

顺水推舟，我们用 $R$ 上的乘法逆 $\frac{1}{\square}$ 定义了 $R$ 上的 Kleene 星 $\square^*_R$, 然后用 $R$ 上的 Kleene 星 $\square^*_R$ 定义了 $\Mat_{2 \times 2}(R)$ 上的 Kleene 星 $\square^*_{\Mat_{2 \times 2}}$, 最后用 $\Mat_{2 \times 2}(R)$ 上的 Kleene 星 $\square^*_{\Mat_{2 \times 2}}$ 定义了 $\Mat_{2 \times 2}(R)$ 上的乘法逆 $\frac{1}{\square}$. 于是我们有了一个简单的逆矩阵计算实现。

impl[R : StarSemiring] StarSemiring for Mat2x2[R] with star(u : Mat2x2[R]) {
  let (a, b, c, d) = (u.a, u.b, u.c, u.d)
  let dalt = d.star()
  let balt = b * dalt
  let aalt = (a + balt * c).star()
  let calt = dalt * c * aalt
  Mat2x2::mk(aalt, aalt * balt, calt, dalt + calt * balt)
}

fn[R : StarSemiring + Neg] Mat2x2::inverse(u : Mat2x2[R]) -> Mat2x2[R] {
  (Mat2x2::one() + u.neg()).star()
}

Example 4. `Mat2x2[Float]` [float][edit]

kokic
2025-10-29

$\gdef\spaces#1{~ #1 ~}$

作为一个小小的测试，我们用此 MoonBit 程序在 Mat2x2[Float] 中计算 $A^{-1}$, 这里

$$ A \spaces= \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} $$

首先使 Float 成为星半环. 严格来说对 $q=0$ 按 $q^* = \frac{1}{1-q}$ 计算是有问题的，我们实际上需要添加了形式无穷的紧化版本 Float. 这对我们试图接近的理想实数 $\R$ 也是一样，非负扩展实数集 $\R_{\ge 0} \cup \{\infty\}$ 也即 $\R_{\ge 0}$ 的单点紧化连同实数的通常加法和乘法才能构成闭半环。我们不在此进一步展开。

impl HasNil for Float with nil() {
  0
}

impl HasOne for Float with one() {
  1
}

impl Inverse for Float with inverse(x : Float) {
  1 / x
}

impl Semiring for Float

impl StarSemiring for Float with star(x : Float) {
  star(x)
}

现在可以用 Mat2x2::mk(1.F, 2, 3, 4).inverse() 计算 $A^{-1}$ 了：

test "float matrix inverse" {
  assert_eq(Mat2x2::mk(1.F, 2, 3, 4).inverse(), Mat2x2::mk(-2, 1, 1.5, -0.5))
}

Exegesis 5. 回到 Leibniz [derive][edit]

kokic
2025-10-29

为了方便进行符号微分，我们先定义一个表达式类型 Expr, 其数据显然是树状的，所以我们使用 enum 刻画此归纳结构。

enum Expr {
  Symbol(String)
  Const(Float)
  Neg(Expr)
  Add(Expr, Expr)
  Mul(Expr, Expr)
  Div(Expr, Expr)
} derive(Eq, Show)

然后为了打印和借助其他工具简化结果，我们为 Expr 定义一个 to_sexp 方法。

fn Expr::to_sexp(e : Expr) -> String {
  match e {
    Symbol(x) => x
    Const(c) => c.to_string()
    Neg(a) => "(- 0 \{a.to_sexp()})"
    Add(a, b) => "(+ \{a.to_sexp()} \{b.to_sexp()})"
    Mul(a, b) => "(* \{a.to_sexp()} \{b.to_sexp()})"
    Div(a, b) => "(/ \{a.to_sexp()} \{b.to_sexp()})"
  }
}

与例子 Mat2x2[Float] 类似，可以为 Expr 实现星半环特质。

impl HasNil for Expr with nil() {
  Const(0)
}

impl HasOne for Expr with one() {
  Const(1)
}

impl Neg for Expr with neg(x : Expr) {
  Neg(x)
}

impl Inverse for Expr with inverse(x : Expr) {
  Div(Const(1), x)
}

impl Add for Expr with add(a : Expr, b : Expr) {
  Add(a, b)
}

impl Mul for Expr with mul(a : Expr, b : Expr) {
  Mul(a, b)
}

impl Semiring for Expr

impl StarSemiring for Expr with star(x : Expr) {
  star(x)
}

到这一步，进行符号微分所需的全部定义就结束了。以下只是一些简化调用的函数。

fn shear(a : Expr, b : Expr) -> Mat2x2[Expr] {
  Mat2x2::mk(a, b, Expr::nil(), a)
}

let symbol : (String) -> Mat2x2[Expr] = name => shear(Symbol(name), Const(1))

fn function(name : String) -> Mat2x2[Expr] {
  shear(Symbol(name), Symbol("\{name}'"))
}

let extract : (Mat2x2[Expr]) -> String = u => u.b.to_sexp()

现在，微分的线性性和 Leibniz 律将自然地被 Mat2x2[Expr] 上的加法和乘法导出。

test "leibniz rule" {
  let (f, g) = (function("f"), function("g"))
  assert_eq(extract(f * g), "(+ (* f g') (* f' g))")
}

而 $(f/g)'$ 按 $(f g^{-1})'$ 计算即可。再次强调，此处 f 和 g 的类型是 Mat2x2[Expr], 换言之 $f,g$ 实际上是两个矩阵，$g^{-1}$ 按前文实现的矩阵逆计算。

impl[R : StarSemiring + Neg] Div for Mat2x2[R] with div(
  u : Mat2x2[R],
  v : Mat2x2[R],
) {
  u * v.inverse()
}

test "leibniz rule" {
  assert_eq(
    extract(f * g.inverse()),
    "(+ (* f (* (/ 1 (+ 1 (- 0 (+ (+ 1 (- 0 g)) (* (* (+ 0 (- 0 g')) (/ 1 (+ 1 (- 0 (+ 1 (- 0 g)))))) (+ 0 (- 0 0))))))) (* (+ 0 (- 0 g')) (/ 1 (+ 1 (- 0 (+ 1 (- 0 g)))))))) (* f' (+ (/ 1 (+ 1 (- 0 (+ 1 (- 0 g))))) (* (* (* (/ 1 (+ 1 (- 0 (+ 1 (- 0 g))))) (+ 0 (- 0 0))) (/ 1 (+ 1 (- 0 (+ (+ 1 (- 0 g)) (* (* (+ 0 (- 0 g')) (/ 1 (+ 1 (- 0 (+ 1 (- 0 g)))))) (+ 0 (- 0 0)))))))) (* (+ 0 (- 0 g')) (/ 1 (+ 1 (- 0 (+ 1 (- 0 g))))))))))",
  )
}

读者可以用 egg 或其他符号化简工具简化此处 extract(f * g.inverse()) 的结果，一个可能的形式是：

(+ (* f (* (- g') (pow g -2))) (* (pow g -1) f'))

这当然就是 $(f'g - fg')/g^2 = (f/g)'$. 此方法对多变量情形也完全适用。

test "differentiation" {
  let x = symbol("x")
  let y = symbol("y")
  assert_eq(extract(x), "1")
  assert_eq(extract(x - y), "(+ 1 (- 0 1))")
  assert_eq(
    extract(x / y),
    "(+ (* x (* (/ 1 (+ 1 (- 0 (+ (+ 1 (- 0 y)) (* (* (+ 0 (- 0 1)) (/ 1 (+ 1 (- 0 (+ 1 (- 0 y)))))) (+ 0 (- 0 0))))))) (* (+ 0 (- 0 1)) (/ 1 (+ 1 (- 0 (+ 1 (- 0 y)))))))) (* 1 (+ (/ 1 (+ 1 (- 0 (+ 1 (- 0 y))))) (* (* (* (/ 1 (+ 1 (- 0 (+ 1 (- 0 y))))) (+ 0 (- 0 0))) (/ 1 (+ 1 (- 0 (+ (+ 1 (- 0 y)) (* (* (+ 0 (- 0 1)) (/ 1 (+ 1 (- 0 (+ 1 (- 0 y)))))) (+ 0 (- 0 0)))))))) (* (+ 0 (- 0 1)) (/ 1 (+ 1 (- 0 (+ 1 (- 0 y))))))))))",
  )
}

最后的 extract(x / y) 可以简化为 (+ (pow y -1) (* x (* -1 (pow y -2)))), 即 $\frac{1}{y}-\frac{x}{y^2}$. 读者可以通过为 Expr 类型添加诸如 Cos, Sin, Exp, Log 此类其他构造器从而使本程序支持带有其他函数的微分运算，这点与经典自动微分程序的做法是一致的。完整的 MoonBit 源代码可以在此处查看。

Appendix 6. 特质声明 [traits][edit]

kokic
2025-10-29

trait HasNil {
  nil() -> Self
}

trait HasOne {
  one() -> Self
}

trait AddMonoid: Add + HasNil {}

trait MulMonoid: Mul + HasOne {}

trait Semiring: AddMonoid + MulMonoid {}

trait StarSemiring: Semiring {
  star(Self) -> Self
}

trait Inverse {
  inverse(Self) -> Self
}

Appendix 7. 数学方面的动机 [motive][edit]

kokic
2025-10-29

$\gdef\CC{\mathbb{C}}$

历史上 Mathworks 的创始人 Cleve Moler 曾发现过利用复数的数值微分方法，称为 Complex step 微分法，但其方法也完全适用于符号微分，并且思路和现在称为自动微分的做法完全相同，都是意识到需要分离 $R^R = R \to R$ 和 $\{f' : f \in R^R \}$. 另一方面，光滑函数 $f$ 的 Taylor 级数的有限项截断可视为多项式，所有次数不超过 $n$ 的多项式构成的 $n+1$ 维 $k$-向量空间 $k[x]^{\le n}$, 其基为 $\{1,x,\cdots,x^n\}$, 我们取 $n=1$ 处的截断，此多项式就是 $f$ 的切线，已完整记录 $f$ 导数的信息，这意味着 $f,f'$ 本就是两个维度的对象。

Complex step 微分法依赖这样一条性质：复数域 $\CC = \R[x]/(x^2+1)$ 可以看成维数为 $2$, 典范基为 $\{1,\sqrt{-1}\}$ 的 $\R$-向量空间。而自动微分用的是对偶数或曰抛物复数 $R[x]/(x^2) \cong R^2$. 完全相同的线性代数道理告诉我们使用双曲复数 $\R[x]/(x^2-1) \cong R^2$ 一样可以做到自动微分.

Iwasawa 分解对所有半单李群有效，所以任意实二阶可逆矩阵群 $\operatorname{GL}(2,\R)$ 被分解为 “复数部分”、“双曲部分” 和 “抛物部分”, 并一齐给出了复数的矩阵表示。完全自然地，我们可以考虑对应着抛物部分的对偶数 $R[x]/(x^2)$ 的矩阵表示，并通过这种矩阵表示进行符号微分。顺着这个思路，想必读者就能自行发现本文的方法了。