Automatic Complexity

1 First steps in automatic complexity

The Kolmogorov complexity of a finite word \(w\) is, roughly speaking, the length of the shortest description \(w^*\) of \(w\) in a fixed formal language. The description \(w^*\) can be thought of as an optimally compressed version of \(w\). Motivated by the non-computability of Kolmogorov complexity, Shallit and Wang  [ 15 ] studied a deterministic finite automaton analogue.

Their notion of automatic complexity is an automata-based and length-conditional analogue of Sipser’s distinguishing complexity \(CD\) ( [ 16 ] , [ 12 , Definition 7.1.4 ] ). This was pointed out by Mia Minnes in a review in the Bulletin of Symbolic Logic from 2012. Another precursor is the length-conditional Kolmogorov complexity [ 12 , Definition 2.2.2 ] .

The automatic complexity of Shallit and Wang is the minimal number of states of an automaton accepting only a given word among its equal-length peers. Finding such an automaton is analogous to the protein folding problem where one looks for a minimum-energy configuration. The protein folding problem may be NP-complete  [ 3 ] , depending on how one formalizes it as a mathematical problem. For automatic complexity, the computational complexity is not known, but a certain generalization to equivalence relations gives an NP-complete decision problem  [ 9 ] .

In this chapter we start to develop the properties of automatic complexity.

1.1 Words

The set of all natural numbers is \(\mathbb {N} =\{ 0,1,2,\dots \} \). Following the von Neumann convention, each natural number \(n\in \mathbb {N} \) is considered to be the set of its predecessors:

\[ 0=\emptyset ,\quad 1=\{ 0\} , \text{ and in general}\quad n=\{ 0,1,\dots ,n-1\} . \]

The power set \(\mathcal P(X)\) of a set \(X\) is the set of all the subsets of \(X\):

\[ \mathcal P(X) = \{ A \mid A \subseteq X\} . \]

If \(\Sigma \) is an alphabet (a set), a word (sometimes called string) is a sequence of elements of \(\Sigma \).

For computer implementations it is often convenient to use an alphabet that is an interval \([0,b)\) in \(\mathbb {N} \). When we want to emphasize that 0, 1, etc. are playing the role of symbols rather than numbers, we often typeset them as \(\mathtt{0}, \mathtt{1}\), etc., respectively.

We denote concatenation of words by \(x\mathbin {{+}{+}}y\) or by juxtaposition \(xy\). In the word \(w=xyz\), \(y\) is called a subword or factor of \(w\). Infinite words are denoted in boldface. For example, there is a unique infinite word \(\mathbf{w}\) such that \(\mathbf w = 0 \mathbf w\), and we write \(\mathbf w = 0^{\infty }\).

Let \(\preceq \) denote the prefix relation, so that \(a\preceq b\) iff \(a\) is a prefix of \(b\), iff there is a word \(c\) such that \(b=ac\).

The concatenation of a word \(x\) and a symbol \(a\) is written \(x\mathbin {{:}{;}}a\) if \(a\) is appended on the right, and \(a\mathbin {{:}{:}}x\) if \(a\) is appended on the left. It may seem most natural to define \(\Sigma ^*\) by induction using \(x\mathbin {{:}{;}}a\), at least for speakers of languages where one reads from left to right. The Lean proof assistant  [ 4 ] (version 3) uses \(a \mathbin {{:}{:}}x\). That approach fits well with co-induction, if infinite words are ordered in order type \(\mathbb {N} \)  [ 10 ] .

For \(n\in \mathbb {N} \), \(\Sigma ^n\) is the set of words of length \(n\) over \(\Sigma \). We may view \(\sigma \in \Sigma ^n\) as a function with domain \(n\) and range \(\Sigma \).

We view functions \(f:A\to B\) as subsets of the cartesian product \(A\times B\),

\[ f=\{ (x,y)\mid y=f(x)\} . \]

The set \(\Sigma ^n\) is both the set of functions from \(n\) to \(\Sigma \) and the cartesian product \((\Sigma ^{n-1})\times \Sigma \) if \(n{\gt}0\). The empty word is denoted \(\varepsilon \) and the first symbol in a word \(x\) is denoted \(x(0)\).

We can define \(\Sigma ^*=\bigcup _{n\in \mathbb {N} }\Sigma ^n\). More properly, the set \(\Sigma ^*\) is defined recursively by the rule that \(\varepsilon \in \Sigma ^*\), and whenever \(s\in \Sigma ^*\) and \(a\in \Sigma \), then \(s\mathbin {{:}{;}}a\in \Sigma ^*\). We define concatenation by structural induction: for \(s,t\in \Sigma ^*\),

\begin{eqnarray*} t\mathbin {{+}{+}}\varepsilon & =& t,\\ t\mathbin {{+}{+}}(s \mathbin {{:}{;}}a) & =& (t\mathbin {{+}{+}}s) \mathbin {{:}{;}}a. \end{eqnarray*}
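
To make the inductive definition concrete, here is a minimal Python sketch (our own illustration; the names `snoc` and `concat` are not from the text): a word is modelled as a tuple of symbols, `snoc` plays the role of \(x\mathbin {{:}{;}}a\), and `concat` recurses on the structure of its second argument exactly as in the two clauses above.

```python
# A word over an alphabet is modelled as a tuple of symbols; () is the empty word.

def snoc(s, a):
    """Append the symbol a on the right of the word s (written s :; a in the text)."""
    return s + (a,)

def concat(t, s):
    """Concatenation t ++ s, defined by structural induction on s."""
    if s == ():                                 # t ++ epsilon = t
        return t
    *init, a = s                                # s = s' :; a
    return snoc(concat(t, tuple(init)), a)      # t ++ (s' :; a) = (t ++ s') :; a

assert concat((0,), (1, 0)) == (0, 1, 0)        # compare Example 2 below
```
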
Definition 1

The length of a word \(s\in \Sigma ^*\) is defined by induction:

\begin{eqnarray*} \lvert \varepsilon \rvert & =& 0\\ \lvert s \mathbin {{:}{;}}a\rvert & =& \lvert s\rvert + 1. \end{eqnarray*}

If \(A\subseteq B\) and \(f\subseteq B\times C\) then \(f\upharpoonright A = \{ (x,y)\in f\mid x\in A\} \) is the restriction of \(f\) to \(A\).

The word \(\sigma \) is also denoted \(\langle \sigma (0),\sigma (1),\dots ,\sigma (\lvert \sigma \rvert -1)\rangle \). By convention, instead of \(\langle \mathtt0, \mathtt1, \mathtt0\rangle \) we write simply \(\mathtt0\mathtt1\mathtt0\).

Example 2

We have

\begin{eqnarray*} \langle \mathtt0\rangle \mathbin {{+}{+}}\langle \mathtt1, \mathtt0\rangle & =& \langle \mathtt0, \mathtt1, \mathtt0\rangle \\ & =& \mathtt0 \mathbin {{:}{:}}\langle \mathtt1, \mathtt0\rangle \\ & =& \langle \mathtt0, \mathtt1\rangle \mathbin {{:}{;}}\mathtt0. \end{eqnarray*}

1.1.1 Occurrences and powers

In this subsection we state some results from the subject “combinatorics on words” that will be used frequently.

The statement \(\mathrm{occurs}(x,k,y)\), that \(x\) occurs in position \(k\) within \(y\), may be defined by induction on \(k\):

\begin{eqnarray*} \mathrm{occurs}(x,0,y) & \iff & \exists z,\, x \mathbin {{+}{+}}z =y,\\ \mathrm{occurs}(x,k+1,y)& \iff & \exists a\in \Sigma ,\, \mathrm{occurs}(a\mathbin {{:}{:}}x,k,y). \end{eqnarray*}
Example 3

We have \(\mathrm{occurs}(\mathrm{na}, 2, \mathrm{banana})\) and \(\mathrm{occurs}(\mathrm{na},4,\mathrm{banana})\).
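
A direct Python transcription of this inductive definition might look as follows (a sketch; words are ordinary strings, and the quantifier over \(a\in \Sigma \) ranges over the symbols occurring in \(y\), which suffices here).

```python
def occurs(x, k, y):
    """Does x occur at position k in y?  Transcribes the inductive definition above."""
    if k == 0:
        return y.startswith(x)                  # exists z with x ++ z = y
    return any(occurs(a + x, k - 1, y)          # exists a with a :: x at position k - 1
               for a in set(y))

assert occurs("na", 2, "banana")
assert occurs("na", 4, "banana")
assert not occurs("na", 3, "banana")
```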

The number of occurrences can be defined, without defining a notion of “occurrence”, as the cardinality of \(\{ k\in \mathbb {N} : \mathrm{occurs}(x,k,y)\} \). To define disjoint occurrences, however, we do need a notion of “occurrence”.

Definition 4

Two occurrences of words \(a\) (starting at position \(i\)) and \(b\) (starting at position \(j\)) in a word \(x\) are disjoint if \(x=uavbw\) where \(u,v,w\) are words and \(\lvert u\rvert =i\), \(\lvert uav\rvert =j\).

Type-theoretically  [ 2 ] we may say that an occurrence of \(x\) in \(y\) is a pair \((k,h)\) where \(h\) is a proof that \(\mathrm{occurs}(x,k,y)\). Of course, we could also say that the occurrence is simply the number \(k\), or even the triple \((x,k,y)\) but in that case the object does not have its defining property within it, so to speak: the number \(k\), and the triple \((x,k,y)\), exist even when \(x\) does not occur at position \(k\) in \(y\).

To naturally speak of disjoint occurrences we make Definition 5. Note that we primarily use zero-based words in this book, i.e., other things being equal we prefer to call the first letter of a word \(x_0\), rather than \(x_1\).

Definition 5

A word \(x=x_0\dots x_{n-1}\), or an infinite word \(x=x_0x_1\dots \), with each \(x_i\in \Sigma \), is viewed as a function \(f_x:n\to \Sigma \) (respectively \(f_x:\mathbb {N} \to \Sigma \)) with \(f_x(i)=x_i\). An occurrence of \(y=y_0\dots y_{m-1}\) in \(x\) is a function \(f_x\upharpoonright [a,a+m-1]\) such that \(f_x(a+i)=y_i\) for each \(0\le i{\lt} m\).

For \(a,b\in \mathbb {N} \), let \([a,b]=\{ x\in \mathbb {N} \mid a\le x\le b\} \). Two occurrences \(f_x\upharpoonright [a,b]\), \(f_x\upharpoonright [c,d]\) are disjoint if \([a,b]\cap [c,d]=\emptyset \).

If moreover \([a,b+1]\cap [c,d]=\emptyset \) then the occurrences are strongly disjoint.

A subword that occurs at least twice in a word \(w\) is a repeated subword of \(w\). Let \(k\in \mathbb {N} \), \(k\ge 1\). A word \(x=x_0\dots x_{n-1}\), \(x_i\in \Sigma \), is \(k\)-rainbow if it has no repeated subword of length \(k\): there are no \(0\le i{\lt}j\le n-k\) with \(x\upharpoonright [i,i+k-1]=x\upharpoonright [j,j+k-1]\). A 1-rainbow word is also known simply as rainbow.
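
Testing the \(k\)-rainbow property by brute force is straightforward; the following Python sketch (the name `is_k_rainbow` is ours) simply checks that all length-\(k\) subwords are distinct.

```python
def is_k_rainbow(x, k):
    """True iff the word x has no repeated subword of length k."""
    subwords = [x[i:i + k] for i in range(len(x) - k + 1)]
    return len(subwords) == len(set(subwords))

assert is_k_rainbow("0123", 1)       # a rainbow word
assert not is_k_rainbow("0120", 1)   # the symbol 0 repeats
assert is_k_rainbow("00110", 2)      # length 2^2 + 2 - 1, maximal by Theorem 7 below
```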

In particular, the empty word \(\varepsilon \) occurs everywhere in every word. However, each word has exactly one occurrence of \(\varepsilon \), since all empty functions are considered to be equal (in set theory, at any rate).

Having properly defined “occurrence” in Definition 5, we can state and prove the trivial Lemma 6.

Lemma 6

Suppose \(n,t\) are positive integers with \(t\le n+1\). A word of length \(n\) has \(n+1-t\) occurrences of subwords of length \(t\).

Proof

Let \(x\) be a word of length \(n\). The occurrences of subwords of length \(t\) are

\[ f_x\upharpoonright [0,t-1], f_x\upharpoonright [1,t], \dots , f_x\upharpoonright [n-t,n-t+(t-1)]. \]
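
As a quick machine check of Lemma 6, the following Python sketch (the name `occurrence_count` is ours) counts occurrences of length-\(t\) subwords by their starting positions.

```python
def occurrence_count(x, t):
    """Number of occurrences of length-t subwords of the word x."""
    return len([x[i:i + t] for i in range(len(x) - t + 1)])

x = "banana"                                   # n = 6
assert occurrence_count(x, 2) == 6 + 1 - 2     # ba, an, na, an, na
```
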
Theorem 7

Let \(k,t\in \mathbb {N} \) with \(k\ge 1\). In an alphabet of cardinality \(k\), a \(t\)-rainbow word has length at most \(k^t+t-1\).

Proof

There are \(k^t\) words of length \(t\). Thus, by Lemma 6, a \(t\)-rainbow word of length \(n\) satisfies \(n+1-t \le k^t\), that is, \(n\le k^t+t-1\).

Theorem 7 has a converse, Theorem 9. To prove it we shall require the notion of a de Bruijn word.

Definition 8

A de Bruijn word of order \(n\) over an alphabet \(\Sigma \) is a sequence \(y\) such that every \(x\in \Sigma ^n\) occurs exactly once as a cyclic substring of \(y\).

Theorem 9

Let \(k,t\in \mathbb {N} \) with \(k\ge 1\). In an alphabet of cardinality \(k\), there exists a \(t\)-rainbow word of length \(k^t+t-1\).

Proof

Case \(t=0\): the empty word is a 0-rainbow word of length \(0=k^0+0-1\). Case \(t=1\): A 1-rainbow word of length \(k\) exists, namely any permutation of the symbols in \(\Sigma \) (Definition 5). For \(t{\gt}1\), let \(x\) be a de Bruijn word \(B(k,t)\) of length \(k^t\) and let \(w= x^2\upharpoonright (k^t+t-1)\). The length-\(t\) subwords of \(w\) are exactly the cyclic length-\(t\) subwords of \(x\), each occurring exactly once, so \(w\) is a \(t\)-rainbow word of length \(k^t+t-1\).
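
The construction in the proof can be carried out explicitly. The Python sketch below (the function name `de_bruijn` is ours) builds a de Bruijn word by the standard Lyndon-word (FKM) algorithm and verifies, for a small example, that the prefix of \(x^2\) of length \(k^t+t-1\) is \(t\)-rainbow.

```python
def de_bruijn(k, t):
    """A de Bruijn word B(k, t) over {0, ..., k-1}, as a list of symbols:
    every word of length t occurs exactly once as a cyclic subword.
    Standard Lyndon-word (FKM) construction."""
    a = [0] * (k * t)
    seq = []

    def db(pos, period):
        if pos > t:
            if t % period == 0:
                seq.extend(a[1:period + 1])
        else:
            a[pos] = a[pos - period]
            db(pos + 1, period)
            for j in range(a[pos - period] + 1, k):
                a[pos] = j
                db(pos + 1, pos)

    db(1, 1)
    return seq

k, t = 2, 3
x = de_bruijn(k, t)                              # e.g. [0, 0, 0, 1, 0, 1, 1, 1]
w = (x + x)[:k**t + t - 1]                       # w = x^2 restricted to length k^t + t - 1
subwords = [tuple(w[i:i + t]) for i in range(len(w) - t + 1)]
assert len(subwords) == k**t == len(set(subwords))   # all k^t subwords of length t are distinct
```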

Definition 10

Let \(\alpha \) be a word of length \(n\), and let \(\alpha _i\) be the \(i^{\text{th}}\) letter of \(\alpha \) for \(1\le i\le n\). We define the \(u^{\text{th}}\) power of \(\alpha \) for certain values of \(u\in \mathbb Q_{\ge 0}\) (the set of nonnegative rational numbers) as follows. For \(u\in \mathbb {N} \):

\begin{eqnarray*} \alpha ^0 & =& \varepsilon ,\\ \alpha ^{m+1} & =& \alpha ^m \mathbin {{+}{+}}\alpha . \end{eqnarray*}
  • If \(u=v+k/n\) where \(v\in \mathbb {N} \) and \(k\) is an integer with \(0{\lt}k{\lt}n\), then \(\alpha ^u\) denotes \(\alpha ^v \alpha _1\dots \alpha _k\) and is called a \(u\)-power.

The word \(w\) is \(u\)-power-free if no nonempty \(v\)-power, \(v\ge u\), occurs (Definition 5) in \(w\). In particular, 2-power-free is called square-free and 3-power-free is called cube-free. Let \(\mathbf{w}\) be an infinite word over the alphabet \(\Sigma \), and let \(x\) be a finite word over \(\Sigma \). Let \(u{\gt}0\) be a rational number. The word \(x\) is said to occur in \(\mathbf{w}\) with exponent \(u\) if \(x^{u}\) occurs in \(\mathbf{w}\) (Definition 5).

The reader may note that the definition of \(u\)-power-free is perhaps not obvious. For example, the word \(aba\) is not \(1.49\)-power-free: while it contains no \(1.49\)-power, it contains a \(1.5\)-power. This way of defining things enables Theorem 12 and goes back at least to Krieger  [ 11 , page 71 ] .
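
The following Python sketch (the names `power` and `is_u_power_free` are ours; exponents are represented by `fractions.Fraction` to avoid rounding) implements Definition 10 and the power-freeness convention just discussed, including the \(\mathtt{aba}\) example.

```python
from fractions import Fraction

def power(alpha, u):
    """The u-th power of the word alpha (Definition 10), for u = v + k/|alpha|."""
    u = Fraction(u)
    v, rem = divmod(u, 1)                  # integer part v, fractional part rem
    k = rem * len(alpha)                   # number of extra letters alpha_1 ... alpha_k
    assert k.denominator == 1, "u must have the form v + k/|alpha|"
    return alpha * int(v) + alpha[:int(k)]

assert power("0110", Fraction(3, 2)) == "011001"
assert power("01", 3) == "010101"

def is_u_power_free(w, u):
    """True iff no nonempty v-power with v >= u occurs in w."""
    u = Fraction(u)
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            y = w[i:j]                     # a nonempty subword of w
            for p in range(1, len(y) + 1): # candidate period p: is y a (|y|/p)-power?
                v = Fraction(len(y), p)
                if v >= u and y == power(y[:p], v):
                    return False
    return True

assert not is_u_power_free("aba", Fraction(149, 100))   # aba contains the 3/2-power aba
assert is_u_power_free("aba", 2)                        # aba is square-free
```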

Definition 11

The critical exponent \(\mathrm{ce}(\mathbf{w})\) of an infinite word \(\mathbf{w}\) is defined by

\[ \mathrm{ce}(\mathbf{w}) = \sup \{ \alpha \in \mathbb Q \mid \mathbf{w} \text{ contains some $\alpha $-power}\} . \]
Theorem 12 Krieger  [ 11 ]

The critical exponent of \(\mathbf w\) is equal to

\[ \inf \{ \alpha \in \mathbb Q \mid \mathbf{w} \text{ is $\alpha $-power-free}\} . \]
Proof

Let

\begin{eqnarray*} S& =& \{ \alpha \in \mathbb Q \mid \mathbf{w} \text{ is $\alpha $-power-free}\} ,\\ T& =& \{ \alpha \in \mathbb Q \mid \mathbf{w} \text{ contains some $\alpha $-power}\} . \end{eqnarray*}

Paying careful attention to Definition 10, the word \(\mathbf w\) is \(\alpha \)-power-free iff for all \(\beta \ge \alpha \), \(\mathbf w\) contains no \(\beta \)-power. Therefore, \(S\) is upward closed. On the other hand, \(T\) is an upward dense subset of the complement of \(S\): if \(\alpha \notin S\) then \(\mathbf w\) contains a \(\beta \)-power for some \(\beta \ge \alpha \), so \(\beta \in T\). Therefore, \(\mathrm{ce}(\mathbf{w})=\sup T=\inf S\).

As an example of Definition 10, we have \(\mathtt{0110}^{3/2}=\mathtt{011001}\). Note that the expected Power Rule for Exponents fails for word exponentiation. In general, \((x^a)^b\ne x^{ab}\), for instance

\[ (01)^3=010101\ne 010010= ((01)^{3/2})^2. \]

Fix an alphabet \(\Sigma \), and let \(\Sigma ^+\) denote the set of nonempty words over \(\Sigma \).

Lemma 13 Lyndon and Schützenberger  [ 13 ] ; see  [ 14 , Theorem 2.3.2 ]

Let \(c,d,e\in \Sigma ^+\). The equation \(cd=de\) holds iff there exist a nonempty word \(u\), a word \(v\), and a natural number \(p\) such that \(c=uv\), \(d=(uv)^p u\), and \(e=vu\).
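
Readers who like to check such identities mechanically can brute-force Lemma 13 over short words. In the Python sketch below (the helper names `words` and `satisfies_rhs` are ours), both sides of the equivalence are compared for all triples of nonempty binary words of length at most 3.

```python
from itertools import product

def words(sigma, max_len):
    """All nonempty words over the alphabet sigma of length at most max_len."""
    return ["".join(p) for n in range(1, max_len + 1) for p in product(sigma, repeat=n)]

def satisfies_rhs(c, d, e):
    """Is there a nonempty u, a word v and p >= 0 with c = uv, d = (uv)^p u, e = vu?"""
    for i in range(1, len(c) + 1):
        u, v = c[:i], c[i:]
        if e != v + u:
            continue
        rest = d                       # d must be c^p u: strip copies of c, then compare with u
        while True:
            if rest == u:
                return True
            if not rest.startswith(c):
                break
            rest = rest[len(c):]
    return False

for c, d, e in product(words("ab", 3), repeat=3):
    assert (c + d == d + e) == satisfies_rhs(c, d, e)
```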

Theorem 14 Lyndon and Schützenberger  [ 13 ] ; see [ 14 , Theorem 2.3.3 ]

Let \(x,y\in \Sigma ^+\). Then the following four conditions are equivalent:

  1. \(xy=yx\).

  2. There exist \(z\in \Sigma ^+\) and integers \(k,l{\gt}0\) such that \(x=z^k\) and \(y=z^l\).

  3. There exist integers \(i,j{\gt}0\) such that \(x^i=y^j\).