Having derived strong results about the Fourier spectrum of small DNFs and CNFs, we will now extend to the case of *constant-depth circuits*. We begin by describing how Håstad applied his Switching Lemma to constant-depth circuits. We then describe some Fourier-theoretic consequences coming from a very early (1989) work in analysis of boolean functions by Linial, Mansour, and Nisan (LMN).

To define constant-depth circuits it is best to start with a picture. Here is an example of a depth-$3$ circuit:

This circuit computes the function \[ x_1x_2\ \wedge\ (\overline{x}_1 x_3\ \vee\ x_3 x_4)\ \wedge\ (x_3 x_4\ \vee\ \overline{x}_2), \] where we suppressed the $\wedge$ in concatenated literals. To be precise:

Definition 26For an integer $d \geq 2$, we define adepth-$d$ circuitover boolean variables $x_1, \dots, x_n$ as follows: It is a directed acyclic graph in which the nodes (“gates”) are arranged in $d+1$ layers, with all arcs (“wires”) going from layer $j-1$ to layer $j$ for some $j \in [d]$. There are exactly $2n$ nodes in layer $0$ (the “inputs”) and exactly $1$ node in layer $d$ (the “output”). The nodes in layer $0$ are labelled by the $2n$ literals. The nodes in layers $1$, $3$, $5$, etc. have the same label, either $\wedge$ or $\vee$, and the nodes in layers $2$, $4$, $6$, etc. have the other label. Each node “computes” a function $\{-1,1\}^n \to \{-1,1\}$: the literals compute themselves and the $\wedge$ (respectively, $\vee$) nodes compute the logical AND (respectively, OR) of the functions computed by their incoming nodes. The circuit itself is said to compute the function computed by its output node.

In particular, DNFs and CNFs are depth-$2$ circuits. We extend the definitions of size and width appropriately:

Definition 27Thesizeof a depth-$d$ circuit is defined to be the number of nodes in layers $1$ through $d-1$. Itswidthis the maximum in-degree of any node at layer $1$. (As with DNFs and CNFs, we insist that no node at layer $1$ is connected to a variable or its negation more than once.)

The layering we assume in our definition of depth-$d$ circuits can be achieved with a factor-$2d$ size overhead for any “unbounded fan-in AND/OR/NOT circuit”. We will not discuss any other type of boolean circuit in this chapter.

We now show that Håstad’s Switching Lemma can be usefully applied not just to DNFs and CNFs but more generally to constant-depth circuits:

Lemma 28Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a depth-$d$ circuit of size $s$ and width $w$, and let $\epsilon \in (0,1]$. Set \[ \delta = \frac{1}{10w} \left(\frac{1}{10k}\right)^{d-2}, \quad \text{where } k = \log(2s/\epsilon). \] Then if $(\boldsymbol{J} \mid \boldsymbol{z})$ is a $\delta$-random restriction, $\mathop{\bf Pr}[{\mathrm{DT}}(f_{\boldsymbol{J} \mid \boldsymbol{z}}) \geq \log(2/\epsilon)] \leq \epsilon$.

*Proof:* The $d = 2$ case is immediate from Håstad’s Switching Lemma, so we assume $d \geq 3$.

The first important observation is that random restrictions “compose”. I.e., making a $\delta_1$-random restriction followed by a $\delta_2$-random restriction to the free coordinates is equivalent to making a $\delta_1\delta_2$-random restriction. Thus we can think of $(\boldsymbol{J} \mid \boldsymbol{z})$ as being produced as follows:

- make a $\frac{1}{10w}$-random restriction;
- make $d-3$ subsequent $\frac{1}{10k}$-random restrictions;
- make a final $\frac{1}{10k}$-random restriction.

Without loss of generality, assume the nodes at layer $2$ of the circuit are labelled $\vee$. Thus any node $g$ at layer $2$ computes a DNF of width at most $w$. By Håstad’s Switching Lemma, after the initial $\frac{1}{10w}$-random restriction $g$ can be replaced by a decision tree of depth at most $k$ except with probability at most $2^{-k}$. In particular, it can be replaced by a CNF of width at most $k$, using Proposition 5. If we write $s_2$ for the number of nodes at layer $2$, a union bound lets us conclude: \begin{equation} \label{eqn:bad-switch} \mathop{\bf Pr}_{\substack{\frac{1}{10w}\text{-random} \\\text{restriction}}}[\text{not all nodes at layer $2$ replaceable by width-$k$ CNFs}] \leq s_2 \cdot 2^{-k}. \end{equation}

We now come to the second important observation: if all nodes at layer $2$ can be switched to width-$k$ CNFs then layers $2$ and $3$ can be “compressed”, producing a depth-$(d-1)$ circuit of width at most $k$. More precisely, we can form an equivalent circuit by shortening all length-$2$ paths from layer $1$ to layer $3$ into single arcs, and then deleting the nodes at layer $2$. We give an illustration of this in the following figure:

Assuming the event in \eqref{eqn:bad-switch} does not occur, the initial $\frac{1}{10w}$-random restriction reduces the circuit to having depth-$(d-1)$ and width at most $k$. The number of $\wedge$-nodes at the new layer $2$ is at most $s_3$, the number of nodes at layer $3$ in the

*original*circuit.

Next we make a $\frac{1}{10k}$-random restriction. As before, by Håstad’s Switching Lemma this reduces all width-$k$ CNFs at the new layer $2$ to depth-$k$ decision trees (hence width-$k$ DNFs), except with probability at most $s_3 \cdot 2^{-k}$. We may then compress layers and reduce depth again.

Proceeding for all $\frac{1}{10k}$-random restrictions except the final one, a union bound gives \begin{multline*} \smash{\mathop{\bf Pr}_{\substack{\frac{1}{10w}\left(\frac{1}{10k}\right)^{d-3}\text{-random} \\\text{restriction}}}[\text{circuit does not reduce to depth $2$ and width $k$}]} \\ \leq s_2 \cdot 2^{-k} + s_3 \cdot 2^{-k} + \cdots + s_{d-1} \cdot 2^{-k} \leq s \cdot 2^{-k} = \epsilon/2. \end{multline*} Assuming the event above does not occur, Håstad’s Switching Lemma tells us that the final $\frac{1}{10k}$-random restriction reduces the circuit to a decision tree of depth less than $\log(2/\epsilon)$ except with probability at most $\epsilon/2$. This completes the proof. $\Box$

We may now obtain the main theorem of Linial, Mansour, and Nisan:

LMN TheoremLet $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a depth-$d$ circuit of size $s > 1$ and let $\epsilon \in (0, 1/2]$. Then $f$’s Fourier spectrum is $\epsilon$-concentrated up to degree $O(\log(s/\epsilon))^{d-1} \cdot \log(1/\epsilon)$.

*Proof:* If the circuit for $f$ also had width at most $w$, we could deduce $3\epsilon$-concentration up to degree $30w \cdot (10 \log(2s/\epsilon))^{d-2} \cdot \log(2/\epsilon)$ by combining Lemma 28 with Lemma 21. But if we simply delete all layer-$1$ nodes of width at most $\log(s/\epsilon)$, the resulting circuit computes a function which is $\epsilon$-close to $f$, as in the proof of Proposition 9. Thus (using Exercise 3.16) $f$’s spectrum is $O(\epsilon)$-concentrated up to degree $O(\log(2s/\epsilon))^{d-1} \cdot \log(2/\epsilon)$, and the result follows by adjusting constants. $\Box$

Remark 29Håstad[Has01a]has slightly sharpened the degree in the LMN Theorem to $O(\log(s/\epsilon))^{d-2} \cdot \log(s) \cdot \log(1/\epsilon)$.

In the exercises you are asked to use a simpler version of this proof, along the lines of Theorem 20, to show the following:

Theorem 30Let $f : \{-1,1\}^n \to \{-1,1\}$ computable by a depth-$d$ circuit of size $s$. Then $\mathbf{I}[f] \leq O(\log s)^{d-1}$.

These rather strong Fourier concentration results for constant-depth circuits have several applications. By introducing the Low-Degree Algorithm for learning, Linial–Mansour–Nisan gave as their main application:

Theorem 31Let $\mathcal{C}$ be the class of functions $f : \{-1,1\}^n \to \{-1,1\}$ computable depth-$d$ $\mathrm{poly}(n)$-size circuits. Then $\mathcal{C}$ can be learned from random examples with error any $\epsilon = 1/\mathrm{poly}(n)$ in time $n^{O(\log n)^d}$.

In complexity theory the class of poly-size, constant-depth circuits is referred to as $\mathsf{AC}^0$. Thus the above theorem may be summarized as “$\mathsf{AC}^0$ is learnable in quasipolynomial time”. In fact, under a strong enough assumption about the intractability of factoring certain integers, it is known that quasipolynomial time is *required* to learn $\mathsf{AC}^0$ circuits, even with query access. **[Kha93]**

The original motivation of the line of work leading to Håstad’s Switching Lemma was to show that the parity function $\chi_{[n]}$ cannot be computed in $\mathsf{AC}^0$. Håstad even showed that $\mathsf{AC}^0$ cannot even approximately compute parity. We can derive this result from the LMN Theorem:

Corollary 32Fix any constant $\epsilon_0 > 0$. Suppose $C$ is a depth-$d$ circuit over $\{-1,1\}^n$ with $\mathop{\bf Pr}_{{\boldsymbol{x}}}[C({\boldsymbol{x}}) = \chi_{[n]}(x)] \geq 1/2 + \epsilon_0$. Then the size of $C$ is at least $2^{\Omega(n^{1/(d-1)})}$.

*Proof:* The hypothesis on $C$ implies $\widehat{C}([n]) \geq 2\epsilon_0$. The result then follows by taking $\epsilon = 2\epsilon_0^2$ in the LMN Theorem. $\Box$

This corollary is close to being tight, since the parity $\chi_{[n]}$ *can* be computed by a depth-$d$ circuit of size $n2^{n^{1/(d-1)}}$ for any $d \geq 2$ (see the exercises). The simpler result Theorem 30 is often handier for showing that certain functions can’t be computed by $\mathsf{AC}^0$ circuits. For example, we know that $\mathbf{I}[\mathrm{Maj}_n] = \Theta(\sqrt{n})$; hence any constant-depth circuit computing $\mathrm{Maj}_n$ must have size at least $2^{n^{\Omega(1)}}$.

Finally, Linial, Mansour, and Nisan gave an application to cryptography. Informally, a function $f : \{-1,1\}^m \times \{-1,1\}^n \to \{-1,1\}$ is said to be a “pseudorandom function generator with seed length $m$” if, for any efficient algorithm $A$, \[ \left|\mathop{\bf Pr}_{\boldsymbol{s} \sim \{-1,1\}^m}[A(f(\boldsymbol{s}, \cdot)) = \text{"accept"}] – \mathop{\bf Pr}_{\boldsymbol{g} \sim \{-1,1\}^{\{-1,1\}^n}}[A(\boldsymbol{g}) = \text{"accept"}]\right| \leq 1/n^{\omega(1)}. \] Here the notation $A(h)$ means that $A$ has query access to target function $h$, and $\boldsymbol{g} \sim \{-1,1\}^{\{-1,1\}^n}$ means that $\boldsymbol{g}$ is a uniformly random $n$-bit function. In other words, for almost all “seeds” $\boldsymbol{s}$ the function $f(\boldsymbol{s}, \cdot) : \{-1,1\}^n \to \{-1,1\}$ is nearly indistinguishable (to efficient algorithms) from a truly random function. Theorem 30 shows that pseudorandom function generators cannot be computed by $\mathsf{AC}^0$ circuits. To see this, consider the algorithm $A(h)$ which chooses ${\boldsymbol{x}} \sim \{-1,1\}^n$ and $\boldsymbol{i} \in [n]$ uniformly at random, queries $h({\boldsymbol{x}})$ and $h({\boldsymbol{x}}^{\oplus \boldsymbol{i}})$, and accepts if these values are unequal. If $h$ is a uniformly random function, $A(h)$ will accept with probability $1/2$. In general, $A(h)$ accepts with probability $\mathbf{I}[h]/n$. Thus Theorem 30 implies that if $h$ is computable in $\mathsf{AC}^0$ then $A(h)$ accepts with probability at most $\mathrm{polylog}(n)/n \ll 1/2$.

Two minor comment about proof of Lemma 28: 1) $s_2+s_3+…+ s_{d-1} \leq s_1+s_2+…+s_{d-1}=s$ and hence should it not be 2^{-k}.s_2+…+2^{-k}.s_{d-1}

\leq2^{-k}.s ?2) For someone who is referring to the statement of the switching lemma in the previous chapter, the variable k may probably appear somewhat overloaded.

Thanks for the correction! As for k, let me leave it as-is for now, but I’ll think about it: ‘most’ uses of k in the proof of Lemma 28 use the same k as in the Switching Lemma.

Typo: Just after the proof of Corollary 32, isn’t $Range(f)=\{-1,1\}$?

Yes, thanks.