Quantum Finite Automata and the Stinespring Dilation Theorem in Lean
1 Introduction
In this project we formalize parts of the article Unbounded length minimal synchronizing words for quantum channels over qutrits [ 5 ] and notions in quantum information theory.
In particular we formalize the notion of a measure-once one-way general quantum finite automaton (MO-1gQFA) introduced by Li, Qiu, Zou, Li, Wu, Mateus [ 6 ] in 2012. As an alternative to “MO-1gQFA” we refer to these simply as Kraus operator automata. In particular we formalize cut-point languages as in [ 9 ] who call the automata “RT-QFA” (real-time QFA).
The paper addresses two needs or request by other researchers:
Marco David and Jens Palsberg posted a list of Top 100 Quantum Theorems on March 11, 2026, at https://marcodavid.net/top100. We include a matrix version of #54 on their list, the Stinespring Dilation Theorem.
Meiburg, Lessa and Soldati [ 8 ] formalized a result from Hayashi and Yamasaki [ 2 ] in the proof assistant Lean, giving a corrected proof of the Generalized Quantum Stein’s Lemma. The latter is an important result in operational quantum information theory going back to Hiai and Petz [ 3 ] .
In the course of formalizing this monumental result they build a library Lean-QuantumInfo [ 7 ] containing large parts of quantum information theory.
One notable missing and sought-after ingredient in Lean-Quantuminfo is the theory of quantum finite automata. In the project Kraus [ 4 ] we partially remedy this. In particular, we formalize the language of a measure-once real-time QFA [ 9 , 6 ] in terms of Mathlib’s probability mass functions (PMF) and compute some examples related to the Kraus operators in this paper and in Grudka et al.’s paper [ 1 ] . Most results are proved for the two fields \(\mathbb R\) and \(\mathbb C\), encapsulated in Lean’s mathematics library Mathlib4 by the RCLike typeclass.
The list collects 100 important theorems in quantum physics, quantum information and quantum computing. It was inspired by the list of the top 100 mathematical theorems of which Freek Wiedijk tracks the formalization progress (Figure 1).
3 55
The Spectral Theorem
Eigenstates in an infinite square well
The Quantum Harmonic Oscillator
Ehrenfest’s Theorem
Quantum tunneling through a finite potential barrier
Conservation of the probability (4-)current
Eigenstates of a δ-distribution well
Heisenberg Uncertainty Relation
Baker–Campbell–Hausdorff Formula
Trotter Product Formula (Lie–Trotter–Kato)
Simultaneous diagonalization of commuting observables
Noether’s Theorem
Wigner’s Theorem
Stone–von Neumann Theorem
Spectrum of the hydrogen atom (Coulomb potential)
Addition of angular momentum (e.g. Clebsch–Gordan)
Wigner–Eckart Theorem
Power-series spectrum of a perturbed time-independent Hamiltonian
Dyson series for time-dependent perturbation theory
Correctness of the Rayleigh–Ritz method
Correctness of the Born–Oppenheimer approximation
The Adiabatic Theorem
Berry phase (Geometric phase)
Superdense coding
Quantum teleportation with shared entanglement
No-Cloning Theorem / No-Broadcast Theorem / No-Teleportation Theorem
No-Communication Theorem
No-Deleting Theorem
Schrödinger–HJW Theorem (purification of mixed states)
Schmidt decomposition for Hilbert spaces
Bell’s Theorem (QM violates CHSH inequality)
Monogamy of entanglement
Tsirelson bound
Correctness of the Deutsch–Jozsa algorithm
Correctness of Shor’s algorithm
Correctness of Grover’s algorithm
The Bennett–Bernstein–Brassard–Vazirani Theorem (BBBV)
Quantum key distribution by BB84
Impossibility of quantum bit commitment (Mayers–Lo–Chau)
Solovay–Kitaev Theorem
Universality of sets of two-qubit quantum gates
Universality of the Toffoli and the Hadamard
Five two-qubit gates are necessary and sufficient to implement a Toffoli gate
Six two-qubit linear nearest-neighbor gates are necessary and sufficient to implement a Toffoli gate
3n CX gates are necessary to implement a multi-control Toffoli gate with n-1 controls
Adiabatic and gate-based quantum computing are equivalent up to polynomial factors
The Threshold Theorem (quantum fault-tolerance)
Petz recovery map
Choi’s Theorem on completely positive maps
Choi–Jamiołkowski isomorphism (channel-state duality)
Gelfand–Naimark–Segal construction
Gelfand–Naimark Theorem
Krein–Milman Theorem
Stinespring’s Factorization Theorem / Naimark’s Dilation Theorem
Continuous Functional Calculus
Gleason’s Theorem
Holevo’s Theorem
Pusey–Barrett–Rudolph Theorem
Kochen–Specker Theorem
The Spectral Theorem for PVMs
Strong subadditivity of quantum entropy
Entanglement-assisted classical capacity of quantum channels
Quantum state discrimination for two states (Ivanović–Dieks–Peres Limit)
The Lloyd–Shor–Devetak Theorem for quantum channel capacity
Eastin–Knill Theorem
Gottesman–Knill Theorem
Magic state distillation
Correctness of distillation of Bell pairs
1D gapped Hamiltonians have area-law ground states
The Lieb–Robinson bound
Onsager’s solution to the 2D Ising model
MIP* = RE
BQP \(\subseteq \) PP
QIP = PSPACE
Quantum Stein Lemma
Quantum 3-SAT is QMA1-complete
Computing the Jones polynomial at roots of unity is BQP-hard
Uniqueness of 4-dimensional representations of the Clifford algebra (up to unitary equivalence)
Spin-statistics Theorem
Reeh–Schlieder Theorem
Wick’s Theorem
Elitzur’s Theorem
2D TQFTs are equivalent to Frobenius algebras
Chern–Simons theory can compute the Jones polynomial
The CFT central charge is its entanglement entropy
Osterwalder–Schrader Reconstruction Theorem
4D Gaussian free field theory satisfies the Osterwalder-Schrader axioms
Bisognano–Wichmann Theorem
Wood–Spekkens–Bell Theorem
Correctness of block encoding
Correctness of Hamiltonian simulation by Trotterization
Correctness of Hamiltonian simulation by Linear Combination of Unitaries
Correctness of Hamiltonian simulation by Qubitization
Qubit reuse is NP-complete
Correctness of a quantum circuit optimizer
Correctness of the toric code
Correctness of entanglement-assisted stabilizer codes
The Knill–Laflamme conditions
Correctness of approximate circuit synthesis
Correctness of Simon’s algorithm
2 Formalizing QFAs
The following definitions and lemmas were generated by the Gemini 3 agent when asked to translate our Lean code from [ 4 ] into LaTeX and English.
First, operator “sandwiches”.
Let \(R\) be a star-ring with an additive commutative monoid structure. Given a family of matrices \(\{ K_i\} _{i=1}^r \in M_q(R)\), the Kraus application map \(\Phi : M_q(R) \to M_q(R)\) is defined as the completely positive map:
where \(K_i^\dagger \) denotes the conjugate transpose of the \(i\)-th Kraus operator \(K_i\).
Let \(\rho \in M_q(R)\) be a positive semi-definite matrix (\(\rho \ge 0\)). Then for any Kraus operators \(\{ K_i\} \), the result of the map
is also positive semi-definite.
For each \(i\), since \(\rho \ge 0\), the conjugation \(K_i \rho K_i^\dagger \) is positive semi-definite. Because the sum of positive semi-definite matrices is also positive semi-definite, it follows that \(\sum _{i} K_i \rho K_i^\dagger \ge 0\).
Quantum State and Operation Definitions
A set of Kraus operators \(\{ K_i\} _{i=1}^r \subset M_q(\mathbb {C})\) defines a quantum channel (or a trace-preserving map) if they satisfy the completeness relation:
where \(I_q\) is the \(q \times q\) identity matrix.
A set of Kraus operators \(\{ K_i\} _{i=1}^r \subset M_q(\mathbb {C})\) defines a quantum operation if they satisfy:
In this context, \(\le \) denotes the Loewner order, meaning \(I_q - \sum K_i^\dagger K_i\) is a positive semi-definite matrix.
A matrix \(\rho \in M_q(\mathbb {C})\) is a density matrix (or a normalized state) if it satisfies:
\(\rho \ge 0\) (Positive semi-definite)
\(\operatorname {Tr}(\rho ) = 1\) (Normalization)
A matrix \(\rho \in M_q(\mathbb {C})\) is a sub-normalized density matrix if it satisfies:
\(\rho \ge 0\) (Positive semi-definite)
\(\operatorname {Tr}(\rho ) \le 1\)
These often represent states after a non-deterministic measurement.
Let \(\Phi : M_q(\mathbb {C}) \to M_q(\mathbb {C})\) be a map defined by a set of Kraus operators \(\{ K_i\} _{i=1}^r\) satisfying the quantum channel condition \(\sum _i K_i^\dagger K_i = I_q\). Then for any \(\rho \in M_q(\mathbb {C})\), the trace is preserved:
By the linearity and cyclic property of the trace, \(\operatorname {Tr}(\sum _i K_i \rho K_i^\dagger ) = \sum _i \operatorname {Tr}(K_i^\dagger K_i \rho ) = \operatorname {Tr}((\sum _i K_i^\dagger K_i) \rho ) = \operatorname {Tr}(I_q \rho ) = \operatorname {Tr}(\rho )\).
Let \(\Phi \) be a quantum channel defined by Kraus operators \(\{ K_i\} \) such that \(\sum _i K_i^\dagger K_i = I\). If \(\rho \) is a normalized matrix with \(\operatorname {Tr}(\rho ) = 1\), then the output of the channel is also normalized:
Let \(\mathcal{D}_q\) denote the set of \(q \times q\) density matrices (positive semi-definite matrices with unit trace). A quantum channel \(\Phi \) induces a well-defined map:
mapping \(\rho \mapsto \sum _i K_i \rho K_i^\dagger \). This map is well-defined because it preserves both the positive semi-definite property and the unit trace of the input state.
Let \(\Sigma \) be an alphabet and \(\mathcal{K} : \Sigma \to (\text{Fin } r \to M_q(R))\) be a function mapping each symbol to a Kraus family. For a word \(w\) of length \(n\), the iterated application map \(\delta ^* : (\text{Fin } n \to \Sigma ) \to M_q(R) \to M_q(R)\) is defined recursively by:
where \(\Phi _{a}(\sigma ) = \sum _{i \in \text{Fin } r} K_{a,i} \sigma K_{a,i}^\dagger \) is the completely positive map associated with the symbol \(a \in \Sigma \), and \(\text{init}(w)\) denotes the prefix of the word of length \(n-1\).
Let \(\Sigma \) be an alphabet where each symbol \(a \in \Sigma \) is associated with a quantum channel \(\Phi _a\) (satisfying \(\sum _i K_{a,i}^\dagger K_{a,i} = I\)). For any word \(w \in \Sigma ^n\) and any initial density matrix \(\rho \), the state after processing the word, \(\rho _w = \delta ^*(w, \rho )\), satisfies:
\(\rho _w \ge 0\) (Positive Semi-definiteness)
\(\operatorname {Tr}(\rho _w) = 1\) (Trace Preservation)
In other words, \(\delta ^*(w, \cdot )\) is a well-defined map from \(\mathcal{D}_q\) to \(\mathcal{D}_q\).
By induction on the length of the word \(n\):
Base case (\(n=0\)): \(\delta ^*(\epsilon , \rho ) = \rho \), which is a density matrix by hypothesis.
Inductive step: Assume \(\rho _{w'}\) is a density matrix for a word of length \(m\). For a word \(w = w'a\), \(\rho _w = \Phi _a(\rho _{w'})\). Since \(\Phi _a\) is a quantum channel, it preserves both the PSD property and the unit trace of \(\rho _{w'}\).
Proceeding this way, eventually we get to defining positive operator valued measure in terms of a PMF (probability mass function) where the probability of outcome \(e_i\) is the trace of the product of the density matrix and the corresponding pure state:
Let \(\rho \in M_k(\mathbb {C})\) be a density matrix (\(\rho \ge 0\) and \(\operatorname {Tr}(\rho ) = 1\)). Let \(\{ \mathbf{e}_i\} _{i=1}^k\) be the standard basis vectors in \(\mathbb {C}^k\), where \(\mathbf{e}_i\) has a 1 at index \(i\) and 0 elsewhere. The pure state projection associated with outcome \(i\) is defined as:
The resulting probability mass function (PMF) over the set of outcomes \(\{ 1, \dots , k\} \) is defined by the assignment:
Next we turn this into a Bernoulli probability measure by declaring that a specific \(e_{\text{acc}}\) is the accepted subspace:
Let \(\rho \in M_k(\mathbb {R})\) be a density matrix, and let \(\text{acc} \in \{ 1, \dots , k\} \) be the designated acceptance index. The probability distribution over the outcomes \(\{ 0, 1\} \) (Reject, Accept) is given by the PMF:
where \(P_{\text{acc}} = \mathbf{e}_{\text{acc}} \mathbf{e}_{\text{acc}}^\top \) is the projection onto the accepting basis state.
Finally we get to the measure-once language accepted by our quantum finite automaton, with cutpoint 1/2:
Let \(\mathcal{A} = (\Sigma , \mathcal{K}, \rho _0, P_{\text{acc}})\) be a Measure-Once Quantum Finite Automaton. The language accepted by \(\mathcal{A}\) with cutpoint \(\lambda = \frac{1}{2}\) is the set of words:
where:
\(\delta ^*(w, \rho _0)\) is the state after processing word \(w\).
\(P_{\text{acc}} = \mathbf{e}_{\text{acc}} \mathbf{e}_{\text{acc}}^\top \) is the projection onto the accepting state.
The index \(1\) in the probability mass function represents the “Accept” outcome.
A probability mass function on two orthogonal projections \(P, P^{\bot }\), with measure equal to the trace of the density matrix \(\rho \) times the projection.
Transition function \(\delta ^*\) corresponding to a word over an alphabet, where each symbol is mapped to a completely positive map in Kraus form, of rank at most \(r\).
Some other formal lemmas, that we only describe stenographically, include:
A projection-valued measure, with measure equal to the trace of the density matrix \(\rho \) times the projection.
If \(A\) and \(B\) are PSD then \(AB\) has nonnegative trace.
If \(P\) is a projection and \(\rho \) is PSD then the trace of \(P\rho \) is nonnegative.
Projection-valued measure.
Quantum channel.
Kraus operators preserve the PSD property.
An automaton based on a quantum channel maps density matrices to density matrices while reading a single letter.
An automaton based on a quantum channel maps density matrices to density matrices while reading a word.
A basis state density matrix has trace one.
A pure state is PSD.
3 Stinespring dilation
We now switch gears to consider the Stinespring dilation, in matrix form.
Stinespring Dilation and Partial Trace
Let \(\rho \in M_{mn}(\mathbb {C})\) be a density matrix on a composite system \(\mathcal{H}_A \otimes \mathcal{H}_B\). The partial trace over the right subsystem (\(\mathcal{H}_B\)) is the map \(\text{Tr}_B : M_{mn}(\mathbb {C}) \to M_m(\mathbb {C})\) defined by the components:
Given a Kraus family \(\{ K_i\} _{i=1}^r \subset M_m(\mathbb {C})\), the Stinespring operator \(V \in M_{mr \times m}(\mathbb {C})\) is defined as:
where \(\mathbf{e}_i\) are the standard basis vectors of the environment space \(\mathbb {C}^r\).
The Stinespring dilation of a state \(\rho \) relative to the operator \(V\) is the matrix in the enlarged space:
It follows that the original Kraus map is recovered by \(\Phi (\rho ) = \text{Tr}_B(V \rho V^\dagger )\).
Here is the main result towards answering Question #54 of David and Palsberg.
Let \(\{ K_i\} _{i=1}^r \subset M_m(\mathbb {C})\) be a family of Kraus operators and \(\rho \in M_m(\mathbb {C})\) be a density matrix. Let \(V = \sum _{i} K_i \otimes \mathbf{e}_i\) be the Stinespring operator. Then:
where \(\operatorname {Tr}_B\) denotes the partial trace over the environment space \(\mathbb {C}^r\).
By the definition of the Stinespring operator \(V\), we have:
Applying the partial trace \(\operatorname {Tr}_B\) to each term in the sum:
Since \(\operatorname {Tr}(\mathbf{e}_i \mathbf{e}_j^\top ) = \delta _{ij}\) (the Kronecker delta), the double sum collapses to:
which is exactly the Kraus representation \(\Phi (\rho )\).
We end with a sophisticated construction addressing a common problem in quantum information: if you have a Quantum Operation (which is trace-decreasing, \(\sum K_i^\dagger K_i \le I\)), how do you “complete” it into a full Quantum Channel (which is trace-preserving, \(\sum K_j^\dagger K_j = I\))?
Let \(\{ K_i\} _{i=1}^r \subset M_m(\mathbb {C})\) be a family of operators satisfying the trace-non-increasing (TNI) condition \(\sum _i K_i^\dagger K_i \le I\). The orthogonal CPTP completion is the operator \(V_{comp} \in M_{m(r+1) \times m}(\mathbb {C})\) defined block-wise as:
In Lean notation, this is represented by branching on the index \(x.2\): using the Stinespring operator for the first \(r\) indices and the deficit root \(\Delta \) for the last index.
If the initial operators satisfy \(\sum _i K_i^\dagger K_i \le I\), then the completion \(V_{comp}\) is an isometry, i.e.,
4 Conclusion
We have chosen a somewhat concrete approach to formalization of quantum information theory using matrices. This has the advantage that it is reasonably feasible to construct examples (such as those used to obtain synchronizing words in [ 5 ] ). On the other hand, for eventual inclusion in Lean’s Mathlib, a higher level of generality is needed. A natural next step for the Stinespring dilation is to formalize the unitary extension, also known as “going to the church of the larger Hilbert space”. For quantum finite automata there is a lot that can be done including formalizing a version of quantum automatic complexity.
Acknowledgments
This work was partially supported by grants from the Simons Foundation (#704836 to Bjørn Kjos-Hanssen) and Majesco Inc. (University of Hawai‘i Foundation Account #129-4770-4).
Bibliography
- 1
Andrzej Grudka, Marcin Karczewski, Paweł Kurzyński, Jedrzej Stempin, Jan Wójcik, and Antoni Wójcik. Quantum synchronizing words: Resetting and preparing qutrit states, 2025.
- 2
Masahito Hayashi and Hayata Yamasaki. Generalized quantum Stein’s lemma and second law of quantum resource theories, 2025.
- 3
Fumio Hiai and Dénes Petz. The proper formula for relative entropy and its asymptotics in quantum probability. Communications in Mathematical Physics, 143(1):99–114, 1991.
- 4
Bjørn Kjos-Hanssen. Kraus: formalization of quantum finite automata, 2026. Project website (https://bjoernkjoshanssen.github.io/kraus/).
- 5
Bjørn Kjos-Hanssen and Swarnalakshmi Lakshmanan. Unbounded length minimal synchronizing words for quantum channels over qutrits, 2026.
- 6
Lvzhou Li, Daowen Qiu, Xiangfu Zou, Lvjun Li, Lihua Wu, and Paulo Mateus. Characterizations of one-way general quantum finite automata. Theoret. Comput. Sci., 419:73–91, 2012.
- 7
Alex Meiburg. Quantum Information in Lean. https://github.com/Meiburg/Lean-QuantumInfo, 2024.
- 8
Alex Meiburg, Leonardo A. Lessa, and Rodolfo R. Soldati. A formalization of the generalized quantum Stein’s lemma in Lean, 2025.
- 9
Abuzer Yakaryılmaz and A. C. Cem Say. Unbounded-error quantum computation with small space bounds. Inform. and Comput., 209(6):873–892, 2011.