Let’s further investigate how random restrictions can simplify DNF formulas.

Suppose $f$ is computable by a DNF formula of width $w$ and we apply to it a $\delta$-random restriction with $\delta \ll 1/w$. For each term $T$ in the DNF, one of three things may happen to it under the random restriction. First and by far most likely, one of its literals may be fixed to $\mathsf{False}$, allowing us to delete it. If this doesn’t happen, the second possibility is that all of $T$’s literals are made $\mathsf{True}$, in which case the whole DNF reduces to the constantly $\mathsf{True}$ function. With $\delta \ll 1/w$, this is in turn much more likely than the third possibility, which is that at least one of $T$’s literals is left free, but all the fixed literals are made $\mathsf{True}$. Only in this third case is $T$ not trivialized by the random restriction.
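Making this case analysis quantitative (a routine computation spelling out the claim above): each coordinate is left free with probability $\delta$ and otherwise fixed to each of the two values with probability $(1-\delta)/2$, so for a term $T$ with exactly $w$ literals,
\[ \begin{aligned} \mathop{\bf Pr}[\text{some literal of } T \text{ fixed to } \mathsf{False}] &= 1 - \Bigl(\tfrac{1+\delta}{2}\Bigr)^{w},\\ \mathop{\bf Pr}[\text{all literals fixed to } \mathsf{True}] &= \Bigl(\tfrac{1-\delta}{2}\Bigr)^{w},\\ \mathop{\bf Pr}[\text{some literal free, all fixed literals } \mathsf{True}] &= \Bigl(\tfrac{1+\delta}{2}\Bigr)^{w} - \Bigl(\tfrac{1-\delta}{2}\Bigr)^{w} \approx 2\delta w \cdot 2^{-w}, \end{aligned} \]
where the approximation holds when $\delta w \ll 1$. In particular, the third case is a factor of roughly $2\delta w \ll 1$ less likely than the second, as claimed.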

This reasoning might suggest that $f$ is likely to become a constant function under the random restriction. Indeed this is true, as the following theorem shows:

**Baby Switching Lemma** Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a DNF or CNF of width at most $w$ and let $(\boldsymbol{J} \mid \boldsymbol{z})$ be a $\delta$-random restriction. Then \[ \mathop{\bf Pr}[f_{\boldsymbol{J} \mid \boldsymbol{z}} \text{ is not a constant function}] \leq 5 \delta w. \]
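The bound can be sanity-checked empirically. The sketch below (the particular DNF, the parameters, and all helper names are illustrative assumptions, not from the text) applies many independent $\delta$-random restrictions to a width-$3$ DNF and compares the fraction of non-constant restrictions against $5\delta w$:

```python
import itertools
import random

random.seed(0)

# Empirical check of the Baby Switching Lemma bound 5*delta*w on a small,
# hypothetical width-3 DNF over n = 9 variables.
n, w, delta = 9, 3, 0.02
# Each term is a list of (variable index, sign): sign=+1 wants x_i True,
# sign=-1 wants x_i False.
dnf = [[(0, 1), (1, 1), (2, -1)],
       [(3, 1), (4, -1), (5, 1)],
       [(6, -1), (7, 1), (8, 1)]]

def eval_dnf(x):
    """Evaluate the DNF on a 0/1 assignment x."""
    return any(all(x[i] == (1 if s == 1 else 0) for i, s in term) for term in dnf)

def restriction_is_constant():
    """Draw one delta-random restriction (each coordinate left free with
    probability delta, otherwise fixed uniformly at random) and brute-force
    whether the restricted function is constant."""
    free = [i for i in range(n) if random.random() < delta]
    x = [random.randint(0, 1) for _ in range(n)]  # values for the fixed coordinates
    values = set()
    for bits in itertools.product([0, 1], repeat=len(free)):
        for i, b in zip(free, bits):
            x[i] = b
        values.add(eval_dnf(x))
    return len(values) == 1

trials = 20000
frac_not_constant = sum(not restriction_is_constant() for _ in range(trials)) / trials
print(frac_not_constant, "<=", 5 * delta * w)  # bound is 0.3
```

With these parameters the bound $5\delta w = 0.3$ is quite loose; the empirical fraction comes out far smaller, consistent with the case analysis above.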

This is in fact the $k=1$ case of the following much more powerful theorem:

**Håstad’s Switching Lemma** Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a DNF or CNF of width at most $w$ and let $(\boldsymbol{J} \mid \boldsymbol{z})$ be a $\delta$-random restriction. Then for any $k \in {\mathbb N}$, \[ \mathop{\bf Pr}[{\mathrm{DT}}(f_{\boldsymbol{J} \mid \boldsymbol{z}}) \geq k] \leq (5 \delta w)^k. \]

What is remarkable about this result is that it has no dependence on the *size* of the DNF, nor on $n$. In words, Håstad’s Switching Lemma says that when $\delta \ll 1/w$, a $\delta$-random restriction converts (“switches”) a width-$w$ DNF to a decision tree of depth less than $k$, except with probability exponentially small in $k$. The result is called a “lemma” for historical reasons; in fact, its proof requires some work. You are asked to prove the Baby Switching Lemma in the exercises; for Håstad’s Switching Lemma please consult Håstad’s original proof **[Has87]** or the alternative proof due to Razborov **[Raz93,Bea94]**.

Since we have strong results about the Fourier spectra of decision trees (Proposition 3.16), and since we know random restrictions interact nicely with Fourier coefficients (Proposition 17), Håstad’s Switching Lemma allows us to prove some strong results about Fourier concentration of narrow DNF formulas. We start with an intermediate result which will be of use:

**Lemma 21** Let $f : \{-1,1\}^n \to \{-1,1\}$ and let $(\boldsymbol{J} \mid \boldsymbol{z})$ be a $\delta$-random restriction, $\delta > 0$. Fix $k \in {\mathbb N}^+$ and write $\epsilon = \mathop{\bf Pr}[{\mathrm{DT}}(f_{\boldsymbol{J} \mid \boldsymbol{z}}) \geq k]$. Then the Fourier spectrum of $f$ is $3\epsilon$-concentrated on degree up to $3k/\delta$.

*Proof:* The key observation is that ${\mathrm{DT}}(f_{\boldsymbol{J} \mid \boldsymbol{z}}) < k$ implies $\deg(f_{\boldsymbol{J} \mid \boldsymbol{z}}) < k$ (Proposition 3.16), in which case the Fourier weight of $f_{\boldsymbol{J} \mid \boldsymbol{z}}$ at degree $k$ and above is $0$. Since this weight is at most $1$ in all cases, we conclude \[ \mathop{\bf E}_{(\boldsymbol{J} \mid \boldsymbol{z})}\Bigl[\sum_{\substack{S \subseteq [n] \\ |S| \geq k}} \widehat{f_{\boldsymbol{J} \mid \boldsymbol{z}}}(S)^2 \Bigr] \leq \epsilon. \] Using Proposition 17 we have \[ \mathop{\bf E}_{(\boldsymbol{J} \mid \boldsymbol{z})}\Bigl[\sum_{\substack{S \subseteq [n] \\ |S| \geq k}} \widehat{f_{\boldsymbol{J} \mid \boldsymbol{z}}}(S)^2 \Bigr] = \sum_{\substack{S \subseteq [n] \\ |S| \geq k}} \mathop{\bf E}_{(\boldsymbol{J} \mid \boldsymbol{z})}[\widehat{f_{\boldsymbol{J} \mid \boldsymbol{z}}}(S)^2] = \sum_{U \subseteq [n]} \mathop{\bf Pr}_{(\boldsymbol{J} \mid \boldsymbol{z})}[|U \cap \boldsymbol{J}| \geq k] \cdot \widehat{f}(U)^2. \] The random variable $|U \cap \boldsymbol{J}|$ is distributed as $\text{Binomial}(|U|,\delta)$. When $|U| \geq 3k/\delta$ this random variable has mean at least $3k$, and a Chernoff bound shows $\mathop{\bf Pr}[|U \cap \boldsymbol{J}| < k] \leq \exp(-\frac23 k) \leq 2/3$, the last inequality using $k \geq 1$. Thus \[ \epsilon \geq \sum_{U \subseteq [n]} \mathop{\bf Pr}_{(\boldsymbol{J} \mid \boldsymbol{z})}[|U \cap \boldsymbol{J}| \geq k] \cdot \widehat{f}(U)^2\geq \sum_{|U| \geq 3k/\delta} (1-2/3) \cdot \widehat{f}(U)^2 \] and hence $\sum_{|U| \geq 3k/\delta} \widehat{f}(U)^2 \leq 3\epsilon$ as claimed. $\Box$

We can now improve the dependence on $\epsilon$ in Corollary 8’s low-degree spectral concentration for DNFs:

**Theorem 22** Suppose $f : \{-1,1\}^n \to \{-1,1\}$ is computable by a DNF of width $w$. Then $f$’s Fourier spectrum is $\epsilon$-concentrated on degree up to $O(w \log(1/\epsilon))$.

*Proof:* This follows immediately from Håstad’s Switching Lemma and Lemma 21, taking $\delta = \frac{1}{10w}$ and $k = C \log(1/\epsilon)$ for a sufficiently large constant $C$. $\Box$
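In more detail (a routine check of the parameters): with $\delta = \frac{1}{10w}$ we have $5\delta w = \frac12$, so Håstad’s Switching Lemma gives \[ \epsilon’ := \mathop{\bf Pr}[{\mathrm{DT}}(f_{\boldsymbol{J} \mid \boldsymbol{z}}) \geq k] \leq 2^{-k} \leq \epsilon/3 \quad \text{once } k \geq \log_2(3/\epsilon), \] and then Lemma 21 yields $3\epsilon’ \leq \epsilon$ concentration on degree up to $3k/\delta = 30wk = O(w \log(1/\epsilon))$.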

In Lemma 21, instead of using the fact that depth-$k$ decision trees have no Fourier weight above degree $k$, we could have used the fact that their Fourier $1$-norm is at most $2^k$. As you are asked to show in the exercises, this would yield:

**Lemma 23** Let $f : \{-1,1\}^n \to \{-1,1\}$ and let $(\boldsymbol{J} \mid \boldsymbol{z})$ be a $\delta$-random restriction. Then \[ \sum_{U \subseteq [n]} \delta^{|U|} \cdot |\widehat{f}(U)| \leq \mathop{\bf E}_{(\boldsymbol{J} \mid \boldsymbol{z})}[2^{{\mathrm{DT}}(f_{\boldsymbol{J} \mid \boldsymbol{z}})}]. \]

We can combine this with the Switching Lemma to deduce that width-$w$ DNFs have small Fourier $1$-norm at low degree:

**Theorem 24** Suppose $f : \{-1,1\}^n \to \{-1,1\}$ is computable by a DNF of width $w$. Then for any $k$, \[ \sum_{|U| \leq k} |\widehat{f}(U)| \leq 2 \cdot (20w)^k. \]

*Proof:* Apply Håstad’s Switching Lemma to $f$ with $\delta = \frac{1}{20 w}$ to deduce \[ \mathop{\bf E}_{(\boldsymbol{J} \mid \boldsymbol{z})}[2^{{\mathrm{DT}}(f_{\boldsymbol{J} \mid \boldsymbol{z}})}] \leq \sum_{d=0}^\infty \bigl(\tfrac{5}{20}\bigr)^d \cdot 2^d = 2. \] Thus from Lemma 23 we get \[ 2 \geq \sum_{U \subseteq [n]} \bigl(\tfrac{1}{20 w}\bigr)^{|U|} \cdot |\widehat{f}(U)| \geq \bigl(\tfrac{1}{20 w}\bigr)^{k} \cdot \sum_{|U| \leq k} |\widehat{f}(U)|, \] as needed. $\Box$
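Theorem 24 can be verified exactly on a small example by brute-force Fourier computation. In the sketch below the DNF and all names are illustrative assumptions; with $n = 6$ and $k = 2$ the bound $2 \cdot (20w)^k$ is far from tight, so this is only a sanity check:

```python
import itertools
from math import prod

n, w, k = 6, 2, 2

def f(x):
    """A width-2 DNF on {-1,1}^6, with -1 encoding True (one common convention)."""
    t = lambda b: b == -1  # does this bit encode True?
    sat = (t(x[0]) and t(x[1])) or (t(x[2]) and not t(x[3])) or (t(x[4]) and t(x[5]))
    return -1 if sat else 1

cube = list(itertools.product([-1, 1], repeat=n))

def fourier_coeff(U):
    """hat{f}(U) = E_x[f(x) * prod_{i in U} x_i], computed by brute force."""
    return sum(f(x) * prod(x[i] for i in U) for x in cube) / len(cube)

coeffs = {U: fourier_coeff(U)
          for r in range(n + 1)
          for U in itertools.combinations(range(n), r)}

# Parseval check: a +/-1-valued function has total Fourier weight exactly 1.
assert abs(sum(c * c for c in coeffs.values()) - 1) < 1e-9

# Fourier 1-norm at degree <= k, versus the Theorem 24 bound.
low_degree_l1 = sum(abs(c) for U, c in coeffs.items() if len(U) <= k)
print(low_degree_l1, "<=", 2 * (20 * w) ** k)
```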

Our two theorems about the Fourier structure of DNFs are *almost* enough to prove Mansour’s Conjecture:

**Theorem 25** Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a DNF of width $w \geq 2$. Then for any $\epsilon \in (0,1/2]$, the Fourier spectrum of $f$ is $\epsilon$-concentrated on a collection $\mathcal{F}$ with $|\mathcal{F}| \leq w^{O(w \log(1/\epsilon))}$.

*Proof:* Let $k = C w \log(4/\epsilon)$ and let $g = f^{\leq k}$. If $C$ is a large enough constant then Theorem 22 tells us that $\|f - g\|_2^2 \leq \epsilon/4$. Furthermore, Theorem 24 gives $\hat{\lVert} g \hat{\rVert}_1 \leq w^{O(w \log(1/\epsilon))}$. By Exercise 3.15, $g$ is $(\epsilon/4)$-concentrated on some collection $\mathcal{F}$ with $|\mathcal{F}| \leq 4\hat{\lVert} g \hat{\rVert}_1^2/\epsilon \leq w^{O(w \log(1/\epsilon))}$. And so by Exercise 3.16, $f$ is $\epsilon$-concentrated on this same collection. $\Box$

For the interesting case of DNFs of width $O(\log n)$ and constant $\epsilon$, we get concentration on a collection of cardinality $O(\log n)^{O(\log n)} = n^{O(\log \log n)}$, nearly polynomial. Using Proposition 9 (and Exercise 3.16) we get the same deduction for DNFs of size $\mathrm{poly}(n)$; more generally, for size $s$ we have $\epsilon$-concentration on a collection of cardinality at most $(s/\epsilon)^{O(\log \log(s/\epsilon) \log(1/\epsilon))}$.
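The cardinality arithmetic here is simply \[ O(\log n)^{O(\log n)} = 2^{O(\log n \cdot \log \log n)} = n^{O(\log \log n)}, \] using $a^b = 2^{b \log_2 a}$.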

Hi Ryan,

I really like this section.

Maybe it is worth mentioning that the bound in Theorem 22 is tight, by an example of Håstad (which I read about in Mansour’s paper).

Yeah, I should… though it would require sitting down with the Mansour proof for a little while, because he is not precise about what range of epsilon his lower bound holds for. (E.g., it can’t be correct for epsilon as small as 1/n.) Tribes should at least be a tight example for any “constant” epsilon; I’ll try to look into it.

I think he needs w*log(1/epsilon) <= n in order for his example to work.

Looks that way; I’ll try to verify carefully. Thanks!