Affine Arithmetic LayerNorm for ML

This file bridges the ML Transformer module with the Affine Arithmetic module. By preserving correlations between variables, it gives tighter bounds for LayerNorm than standard interval arithmetic.

The Problem with Standard Interval Arithmetic

Standard interval arithmetic loses correlations between variables. In LayerNorm:

μ = mean(x) depends on all x_i
x_i - μ is small whenever the inputs vary together
But interval arithmetic treats x_i and μ as independent!

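Example: suppose every x_i ∈ [0.9, 1.1] and the inputs vary together, for instance x_i = 1 + 0.1·ε₁ for one shared unknown ε₁ ∈ [-1, 1]:

Standard: μ ∈ [0.9, 1.1], so x_i - μ ∈ [-0.2, 0.2], even though x_i - μ is always 0
Affine: μ = 1 + 0.1·ε₁ shares ε₁ with every x_i, so x_i - μ = 0 exactly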
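The example can be worked out exactly for n inputs with the same nominal range [0.9, 1.1] and noise symbols ε ∈ [-1, 1]. The derivation below compares two scenarios: all inputs share one symbol, or each input has its own symbol. The independent case is an illustrative assumption about how the inputs were produced. It is not a claim about how this module assigns noise symbols.

```latex
\begin{aligned}
&\text{Shared symbol, } x_i = 1 + 0.1\,\varepsilon_1 \ \forall i:
  && \mu = 1 + 0.1\,\varepsilon_1, \qquad x_i - \mu = 0 \\
&\text{Independent symbols, } x_i = 1 + 0.1\,\varepsilon_i:
  && x_i - \mu = 0.1\Bigl(1 - \tfrac{1}{n}\Bigr)\varepsilon_i - \tfrac{0.1}{n}\textstyle\sum_{j \ne i} \varepsilon_j
     \in \Bigl[-\tfrac{0.2(n-1)}{n},\ \tfrac{0.2(n-1)}{n}\Bigr] \\
&\text{Interval arithmetic (either case):}
  && \mu \in [0.9, 1.1] \;\Rightarrow\; x_i - \mu \in [-0.2, 0.2]
\end{aligned}
```

In the shared case the interval bound is loose by an unbounded factor. In the independent case it is loose by a factor of n/(n - 1), which is 2 for n = 2. Because x_i - μ is linear in independent symbols, the affine range in that case is exact.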

Solution: Affine Arithmetic

Affine forms track symbolic dependencies via noise symbols:

x_i = c₀ + c₁·ε₁ + c₂·ε₂ + ... + [-r, r], where every noise symbol εⱼ ranges over [-1, 1]
Noise symbols shared across variables preserve their correlations
When we compute x - mean(x), the terms common to x and mean(x) cancel!
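The following minimal Lean 4 sketch uses only core Lean to make the cancellation concrete. It is a hypothetical stand-in, not LeanCert's Engine.Affine.AffineForm. The structure AF, its fields, and the helpers padTo, AF.add, AF.scale, AF.sub, AF.mean and AF.toInterval are all invented for illustration. The sketch also uses Float, so rounding errors are not tracked and the results are not rigorous enclosures.

```lean
/-- Hypothetical affine form over `Float` (not LeanCert's `AffineForm`):
    `center + Σⱼ coeffs[j] * εⱼ + [-err, err]`, with every `εⱼ ∈ [-1, 1]`.
    `Float` rounding is not tracked, so this is an illustration, not a proof. -/
structure AF where
  center : Float
  coeffs : List Float
  err    : Float
deriving Repr

/-- Pad a coefficient list with zeros up to length `n`. -/
def padTo (n : Nat) (cs : List Float) : List Float :=
  cs ++ List.replicate (n - cs.length) 0.0

/-- The constant form 0 (no noise symbols). -/
def AF.zero : AF := { center := 0.0, coeffs := [], err := 0.0 }

/-- Addition is linear: coefficients of the same noise symbol simply add. -/
def AF.add (a b : AF) : AF :=
  let n := max a.coeffs.length b.coeffs.length
  { center := a.center + b.center,
    coeffs := List.zipWith (· + ·) (padTo n a.coeffs) (padTo n b.coeffs),
    err := a.err + b.err }

/-- Scaling by a constant is also linear. -/
def AF.scale (k : Float) (a : AF) : AF :=
  { center := k * a.center,
    coeffs := a.coeffs.map (k * ·),
    err := Float.abs k * a.err }

def AF.sub (a b : AF) : AF := AF.add a (AF.scale (-1.0) b)

/-- Mean of a nonempty list of forms: (x₁ + ... + xₙ) / n. -/
def AF.mean (xs : List AF) : AF :=
  AF.scale (1.0 / xs.length.toFloat) (xs.foldl AF.add AF.zero)

/-- Enclosing interval: center ± (Σⱼ |cⱼ| + err). -/
def AF.toInterval (a : AF) : Float × Float :=
  let rad := a.coeffs.foldl (fun acc c => acc + Float.abs c) 0.0 + a.err
  (a.center - rad, a.center + rad)

/-- Input `i` of the example: 1 + 0.1 · ε_{i+1}, each with its own symbol. -/
def indepInput (i : Nat) : AF :=
  { center := 1.0, coeffs := List.replicate i 0.0 ++ [0.1], err := 0.0 }

/-- An input 1 + 0.1 · ε₁ that will be shared by all four positions. -/
def x0 : AF := { center := 1.0, coeffs := [0.1], err := 0.0 }

-- All four inputs equal 1 + 0.1·ε₁: centering cancels ε₁ entirely.
#eval (AF.sub x0 (AF.mean (List.replicate 4 x0))).toInterval
-- Four independent symbols: the centered radius is about 0.15.
#eval (AF.sub (indepInput 0) (AF.mean ((List.range 4).map indepInput))).toInterval
```

With a shared noise symbol the centered form has center 0 and a zero coefficient, so its interval collapses to a point. With four independent symbols the radius comes out at about 0.15, which matches 0.2·(n - 1)/n for n = 4.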

Main Definitions

LayerNormParams.forwardAffine - LayerNorm using affine arithmetic
LayerNormParams.forwardIntervalTight - Convert affine result back to intervals

References

de Figueiredo & Stolfi, "Affine Arithmetic: Concepts and Applications", 2004

Affine LayerNorm Forward Pass

def LeanCert.ML.Transformer.LayerNormParams.forwardAffine (params : LayerNormParams) (Is : IntervalVector) : Engine.Affine.AffineVector

Forward pass using affine arithmetic for tight bounds.

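This converts the input intervals to affine forms (each IntervalDyadic is first converted to an IntervalRat) and then computes LayerNorm on those forms, preserving correlations. The result stays in affine form. forwardIntervalTight extracts intervals from it.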

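Key advantage: the centering step x - μ is linear, so affine arithmetic computes it without approximation and keeps the correlation between x_i and μ. This gives tighter bounds than standard interval arithmetic, especially when the inputs share noise symbols.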

Equations

params.forwardAffine Is = (LeanCert.Engine.Affine.AffineVector.ofIntervals (List.map LeanCert.Core.IntervalDyadic.toIntervalRat Is)).layerNorm params.gamma params.beta params.epsilon
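The derivation below sketches the standard LayerNorm formula that this pipeline encloses. Each step is annotated as either linear, and therefore computed without approximation in affine arithmetic, or nonlinear, and therefore a source of a fresh error term. The 1/n variance normalization is the common convention and is an assumption here, because this page does not show the definition of layerNormReal.

```latex
\begin{aligned}
\mu      &= \tfrac{1}{n}\textstyle\sum_{i=1}^{n} x_i    && \text{linear: exact}\\
c_i      &= x_i - \mu                                   && \text{linear: shared terms cancel}\\
\sigma^2 &= \tfrac{1}{n}\textstyle\sum_{i=1}^{n} c_i^2  && \text{products: new error term}\\
s        &= \sqrt{\sigma^2 + \epsilon}                  && \text{nonlinear: linearized plus error}\\
y_i      &= \gamma_i \, c_i \, s^{-1} + \beta_i         && \text{inverse and product: error terms; scale and add exact}
\end{aligned}
```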

def LeanCert.ML.Transformer.LayerNormParams.forwardIntervalTight (params : LayerNormParams) (Is : IntervalVector) (prec : ℤ := -53) : IntervalVector

Convert affine output back to intervals.

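This extracts conservative interval bounds from the affine forms. Each form's range is converted to a dyadic interval with IntervalDyadic.ofIntervalRat at precision prec (default -53). These bounds are typically tighter than those of forwardInterval because correlations were preserved during the computation.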

Equations

params.forwardIntervalTight Is prec = List.map (fun (af : LeanCert.Engine.Affine.AffineForm) => LeanCert.Core.IntervalDyadic.ofIntervalRat af.toInterval prec) (params.forwardAffine Is)
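Extracting an interval from an affine form takes its center plus or minus the total radius, c₀ ± (Σⱼ|cⱼ| + r). Converting that interval to a dyadic one must then round outward: the lower bound goes down and the upper bound goes up, or the result would no longer enclose the true values. The core-Lean sketch below shows only that rounding step, on rational endpoints given as integer numerators and positive denominators. The name roundOutward is hypothetical. Reading prec = -53 as a grid of 2^-53 is an assumption about IntervalDyadic, not something this page states.

```lean
/-- Hypothetical sketch (not LeanCert's `IntervalDyadic.ofIntervalRat`):
    round a rational interval [loN/loD, hiN/hiD] outward to the grid 2^(-k).
    Returns mantissas (mLo, mHi) meaning [mLo · 2^(-k), mHi · 2^(-k)].
    The denominators `loD` and `hiD` must be positive. -/
def roundOutward (k : Nat) (loN : Int) (loD : Nat) (hiN : Int) (hiD : Nat) : Int × Int :=
  let scale : Int := ((2 ^ k : Nat) : Int)
  -- Lean's `Int` division `/` is Euclidean (`Int.ediv`); for a positive divisor
  -- it rounds toward negative infinity, i.e. it is floor division.
  let mLo := (loN * scale) / (loD : Int)
  -- ceil(a / d) = -floor(-a / d), so the upper bound never moves down.
  let mHi := -((-(hiN * scale)) / (hiD : Int))
  (mLo, mHi)

-- [1/3, 2/3] on the grid 1/16 becomes [5/16, 11/16], which still contains it.
#eval roundOutward 4 1 3 2 3  -- (5, 11)
```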

Comparison: Interval vs Affine Bounds

def LeanCert.ML.Transformer.LayerNormParams.compareBounds (params : LayerNormParams) (Is : IntervalVector) (prec : ℤ := -53) : IntervalVector × IntervalVector

Compute both interval and affine bounds for comparison.

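Returns the pair (forwardInterval Is prec, forwardIntervalTight Is prec): the interval bounds and the affine bounds for the same input. The affine bounds are expected to be tighter, since the centering step keeps its correlations.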

Equations

params.compareBounds Is prec = (params.forwardInterval Is prec, params.forwardIntervalTight Is prec)

def LeanCert.ML.Transformer.LayerNormParams.tightnessRatio (params : LayerNormParams) (Is : IntervalVector) (prec : ℤ := -53) : List ℚ

Measure the tightness improvement from affine arithmetic.

Returns, for each output dimension, the width of the interval bound divided by the width of the affine bound. Values > 1 indicate that the affine bound is tighter.

Equations

One or more equations did not get rendered due to their size.
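The defining equation is not rendered, but the docstring describes the ratio per output dimension as interval width over affine width. A core-Lean sketch of that formula might look like this. It works over Float pairs rather than LeanCert's IntervalVector and ℚ, and the name tightness is hypothetical. Returning none for a zero-width affine bound is also an assumption, because the page does not show how the real definition handles that case.

```lean
/-- Hypothetical sketch of a per-dimension tightness ratio over `Float` pairs
    `(lo, hi)`: interval width divided by affine width. Returning `none` for a
    zero-width affine bound is an assumption made for this sketch. -/
def tightness (ivl aff : List (Float × Float)) : List (Option Float) :=
  List.zipWith
    (fun (i a : Float × Float) =>
      let wI := i.2 - i.1
      let wA := a.2 - a.1
      if wA > 0.0 then some (wI / wA) else none)
    ivl aff

-- Interval bound [-0.2, 0.2] vs affine bound [-0.1, 0.1]: ratio 2, so affine is tighter.
#eval tightness [(-0.2, 0.2)] [(-0.1, 0.1)]
```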

Soundness Theorem

theorem LeanCert.ML.Transformer.mem_forwardAffine {xs : List ℝ} {Is : IntervalVector} (params : LayerNormParams)
    (eps : Engine.Affine.AffineForm.NoiseAssignment)
    (hvalid : Engine.Affine.AffineForm.validNoise eps)
    (hmem : Engine.Affine.AffineVector.mem xs (Engine.Affine.AffineVector.ofIntervals (List.map Core.IntervalDyadic.toIntervalRat Is)) eps)
    (hne : ¬List.isEmpty (Engine.Affine.AffineVector.ofIntervals (List.map Core.IntervalDyadic.toIntervalRat Is)) = true)
    (hlen_eps : List.length eps = ((Engine.Affine.AffineVector.ofIntervals (List.map Core.IntervalDyadic.toIntervalRat Is)).variance.add (Engine.Affine.AffineForm.const params.epsilon)).coeffs.length)
    (hlen_sqrt : List.length eps = ((Engine.Affine.AffineVector.ofIntervals (List.map Core.IntervalDyadic.toIntervalRat Is)).variance.add (Engine.Affine.AffineForm.const params.epsilon)).sqrt.coeffs.length)
    (hsqrt_pos : 0 < ((Engine.Affine.AffineVector.ofIntervals (List.map Core.IntervalDyadic.toIntervalRat Is)).variance.add (Engine.Affine.AffineForm.const params.epsilon)).sqrt.toInterval.lo) :
    have ys := layerNormReal xs params.gamma params.beta params.epsilon; Engine.Affine.AffineVector.mem ys (params.forwardAffine Is) eps

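The affine LayerNorm is sound. Suppose the real inputs xs lie in the affine forms built from Is under the noise assignment eps. Then the real LayerNorm outputs, layerNormReal xs params.gamma params.beta params.epsilon, lie in params.forwardAffine Is under the same eps.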

This follows from composition of:

AffineVector.mem_centered - centering preserves membership
AffineVector.mem_variance - variance is sound
AffineForm.mem_add - addition is sound
AffineForm.mem_sqrt - square root is sound
AffineForm.mem_inv - inversion is sound
AffineForm.mem_mul, AffineForm.mem_scale, AffineForm.mem_add - final combination
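Each lemma in the list above has the same shape. Real inputs that lie in their affine forms under one fixed noise assignment ε give a real output that lies in the resulting form under that same ε. The block below states this shape under the usual affine-arithmetic reading of membership, where validNoise presumably requires every εⱼ ∈ [-1, 1]. The exact preconditions of the LeanCert lemmas may differ. The side conditions shown for sqrt and inv are the typical ones, not quoted from the source, and the positivity condition for inv mirrors hsqrt_pos.

```latex
\begin{aligned}
&x \in \hat{x} \text{ under } \varepsilon
  \iff \exists e,\ |e| \le r \,\wedge\, x = c_0 + \textstyle\sum_j c_j\,\varepsilon_j + e,
  \qquad \text{all } \varepsilon_j \in [-1, 1] \\
&x \in \hat{x},\ y \in \hat{y} \implies x + y \in \hat{x} \oplus \hat{y}
  \ \text{ and } \ x\,y \in \hat{x} \otimes \hat{y} \\
&x \in \hat{x},\ x \ge 0 \implies \sqrt{x} \in \operatorname{sqrt}(\hat{x}) \\
&x \in \hat{x},\ \operatorname{lo}(\hat{x}) > 0 \implies x^{-1} \in \operatorname{inv}(\hat{x})
\end{aligned}
```

Because ε is fixed across all steps, the lemmas compose. The conclusion of one step is the membership hypothesis of the next, which is how the argument reaches layerNormReal xs.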

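Beyond the membership hypotheses hvalid and hmem, the theorem takes side conditions that the caller must discharge:
- The input vector is nonempty (hne).
- The noise assignment eps has the same length as the coefficient lists of variance + epsilon and of its square root (hlen_eps, hlen_sqrt).
- The square-root enclosure has a positive lower bound (hsqrt_pos), which makes the inversion step well defined.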