# §6.5: Highlight: Fooling ${\mathbb F}_2$-polynomials

Recall that a density $\varphi$ is said to be $\epsilon$-biased if its correlation with every ${\mathbb F}_2$-linear function $f$ is at most $\epsilon$ in magnitude. In the lingo of pseudorandomness, one says that $\varphi$ fools the class of ${\mathbb F}_2$-linear functions:

Definition 46 Let $\varphi : {\mathbb F}_2^n \to {\mathbb R}^{\geq 0}$ be a density function and let $\mathcal{C}$ be a class of functions ${\mathbb F}_2^n \to {\mathbb R}$. We say that $\varphi$ $\epsilon$-fools $\mathcal{C}$ if $\Bigl|\mathop{\bf E}_{\boldsymbol{y} \sim \varphi}[f(\boldsymbol{y})] – \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \leq \epsilon$ for all $f \in \mathcal{C}$.

Theorem 30 implies that using just $O(\log(n/\epsilon))$ independent random bits, one can generate a density which $\epsilon$-fools the class of $f : {\mathbb F}_2^n \to \{-1,1\}$ with $\deg_{{\mathbb F}_2}(f) \leq 1$. A natural problem in the field of derandomization is: How many independent random bits are needed to generate a density which $\epsilon$-fools all functions of ${\mathbb F}_2$-degree at most $d$? A naive hope might be that $\epsilon$-biased densities automatically fool functions of ${\mathbb F}_2$-degree $d > 1$. The next example shows that this hope fails badly, even for $d = 2$:

Example 47 Recall the inner product mod $2$ function, $\mathrm{IP}_{n} : {\mathbb F}_2^n \to \{0,1\}$, which has ${\mathbb F}_2$-degree $2$. Let $\varphi : {\mathbb F}_2^n \to {\mathbb R}^{\geq 0}$ be the density of the uniform distribution on the support of $\mathrm{IP}_n$. Now $\mathrm{IP}_n$ is an extremely regular function (see Examples 4), and indeed $\varphi$ is a roughly $2^{-n/2}$-biased density (see the exercises). But $\varphi$ is very bad at fooling at least one function of ${\mathbb F}_2$-degree $2$, namely $\mathrm{IP}_n$ itself: $\mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[\mathrm{IP}_n({\boldsymbol{x}})] \approx 1/2, \qquad \mathop{\bf E}_{\boldsymbol{y} \sim \varphi}[\mathrm{IP}_n(\boldsymbol{y})] = 1.$

The problem of using few random bits to fool $n$-bit, ${\mathbb F}_2$-degree-$d$ functions was first taken up by Luby, Veličković and Wigderson [LVW93]. They showed how to generate a fooling distribution using $\exp(O(\sqrt{d \log(n/d) + \log(1/\epsilon)}))$ independent random bits. There was no improvement on this for $14$ years, at which point Bogdanov and Viola [BV07] achieved $O(\log(n/\epsilon))$ random bits for $d = 2$ and $O(\log n) + \exp(\mathrm{poly}(1/\epsilon))$ random bits for $d = 3$. In general they suggested that ${\mathbb F}_2$-degree-$d$ might be fooled by the sum of $d$ independent draws from a small-bias distribution. Soon thereafter Lovett [Lov08] showed that a sum of $2^d$ independent draws from a small-bias distribution suffices, implying that ${\mathbb F}_2$-degree-$d$ functions can be fooled using just $2^{O(d)} \cdot \log(n/\epsilon)$ random bits. More precisely, if $\varphi$ is any $\epsilon$-biased density on ${\mathbb F}_2^n$ he showed that $\Bigl|\mathop{\bf E}_{\boldsymbol{y}^{(1)}, \dots, \boldsymbol{y}^{(2^d)} \sim \varphi}[f(\boldsymbol{y}^{(1)} + \cdots + \boldsymbol{y}^{(2^d)})] – \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \leq O(\epsilon^{1/4^d}).$ In other words, the $2^d$-fold convolution $\varphi^{* 2^d}$ density fools functions of ${\mathbb F}_2$-degree $d$.

The current state of the art for this problem is Viola’s Theorem, which show that the original idea from [BV07] works: summing $d$ independent draws from an $\epsilon$-biased distribution fools ${\mathbb F}_2$-degree-$d$ polynomials.

Viola’s Theorem Let $\varphi$ be any $\epsilon$-biased density on ${\mathbb F}_2^n$, $0 \leq \epsilon \leq 1$. Let $d \in {\mathbb N}^+$ and define $\epsilon_d = 9\epsilon^{1/2^{d-1}}$. Then the class of all $f : {\mathbb F}_2^n \to \{-1,1\}$ with $\deg_{{\mathbb F}_2}(f) \leq d$ is $\epsilon_d$-fooled by the $d$-fold convolution $\varphi^{* d}$. I.e., $\Bigl|\mathop{\bf E}_{\boldsymbol{y}^{(1)}, \dots, \boldsymbol{y}^{(d)} \sim \varphi}[f(\boldsymbol{y}^{(1)} + \cdots + \boldsymbol{y}^{(d)})] – \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \leq 9\epsilon^{1/2^{d-1}}.$

In light of Theorem 30, Viola’s Theorem implies that one can $\epsilon$-fool $n$-bit functions of ${\mathbb F}_2$-degree $d$ using only $O(d\log n) + O(d 2^d \log(1/\epsilon))$ independent random bits.

The proof of Viola’s Theorem is an induction on $d$. To reduce the case of degree $d+1$ to degree $d$, Viola makes use of a simple concept: directional derivatives.

Definition 48 Let $f : {\mathbb F}_2^n \to {\mathbb F}_2$ and let $y \in {\mathbb F}_2^n$. The directional derivative $\Delta_y f : {\mathbb F}_2^n \to {\mathbb F}_2$ is defined by $\Delta_y f(x) = f(x+y) - f(x).$ Over ${\mathbb F}_2$ we may equivalently write $\Delta_y f(x) = f(x+y) + f(x)$.

As expected, taking a derivative reduces degree by $1$:

Fact 49 For any $f : {\mathbb F}_2^n \to {\mathbb F}_2$ and $y \in {\mathbb F}_2^n$ we have $\deg_{{\mathbb F}_2}(\Delta_y f) \leq \deg_{{\mathbb F}_2}(f) -1$.

In fact, we’ll prove a slightly stronger statement:

Proposition 50 Let $f : {\mathbb F}_2^n \to {\mathbb F}_2$ have $\deg_{{\mathbb F}_2}(f) = d$ and fix $y, y’ \in {\mathbb F}_2^n$. Define $g :{\mathbb F}_2^n \to {\mathbb F}_2$ by $g(x) = f(x+y) – f(x+y’)$. Then $\deg_{{\mathbb F}_2}(g) \leq d-1$.

Proof: In passing from the ${\mathbb F}_2$-polynomial representation of $f(x)$ to that of $g(x)$, each monomial $x^S$ of maximal degree $d$ is replaced by $(x+y)^S – (x+y’)^S$. Upon expansion the monomials $x^S$ cancel, leaving a polynomial of degree at most $d-1$. $\Box$

We are now ready to give the proof of Viola’s Theorem.

Proof of Viola’s Theorem: The proof is by induction on $d$. The $d = 1$ case is immediate (even without the factor of $9$) because $\varphi$ is $\epsilon$-biased. Assume that the theorem holds for general $d \geq 1$ and let $f : {\mathbb F}_2^n \to \{-1,1\}$ have $\deg_{{\mathbb F}_2}(f) \leq d+1$. We split into two cases, depending on whether the bias of $f$ is large or small.

Case 1: $\mathop{\bf E}[f]^2 > \epsilon_d$. In this case, \begin{align*} &\quad \sqrt{\epsilon_d} \cdot \Bigl|\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{*(d+1)}}[f(\boldsymbol{z})] – \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \\ &< |\mathop{\bf E}[f]| \cdot \Bigl|\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{*(d+1)}}[f(\boldsymbol{z})] – \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \\ &= \Bigl|\mathop{\bf E}_{{\boldsymbol{x}}’ \sim {\mathbb F}_2^n, \boldsymbol{z} \sim \varphi^{*(d+1)}}[f({\boldsymbol{x}}')f(\boldsymbol{z})] – \mathop{\bf E}_{{\boldsymbol{x}}’, {\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}}')f({\boldsymbol{x}})]\Bigr| \\ &= \Bigl|\mathop{\bf E}_{\boldsymbol{y} \sim {\mathbb F}_2^n, \boldsymbol{z} \sim \varphi^{*(d+1)}}[f(\boldsymbol{z} + \boldsymbol{y})f(\boldsymbol{z})] – \mathop{\bf E}_{\boldsymbol{y}, {\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}} + \boldsymbol{y})f({\boldsymbol{x}})]\Bigr| \\ &= \Bigl|\mathop{\bf E}_{\boldsymbol{y} \sim {\mathbb F}_2^n, \boldsymbol{z} \sim \varphi^{*(d+1)}}[\Delta_{\boldsymbol{y}}f(\boldsymbol{z})] – \mathop{\bf E}_{\boldsymbol{y}, {\boldsymbol{x}} \sim {\mathbb F}_2^n}[\Delta_{\boldsymbol{y}}f({\boldsymbol{x}})]\Bigr| \\ &\leq \mathop{\bf E}_{\boldsymbol{y} \sim {\mathbb F}_2^n}\Bigl[\Bigl|\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{*(d+1)}}[\Delta_{\boldsymbol{y}}f(\boldsymbol{z})] – \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[\Delta_{\boldsymbol{y}}f({\boldsymbol{x}})\Bigr|\Bigr]. \end{align*} For each outcome $\boldsymbol{y}=y$ the directional derivative $\Delta_{y} f$ has ${\mathbb F}_2$-degree at most $d$ (Fact 49). By induction we know that $\varphi^{* d}$ $\epsilon_d$-fools any such polynomial and it follows (exercise) that $\varphi^{* (d+1)}$ does too. Thus each quantity in the expectation over $\boldsymbol{y}$ is at most $\epsilon_d$ and we conclude $\Bigl|\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{*(d+1)}}[f(\boldsymbol{z})] – \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \leq \frac{\epsilon_d}{\sqrt{\epsilon_d}} = \sqrt{\epsilon_d} = \tfrac13 \epsilon_{d+1} \leq \epsilon_{d+1}.$

Case 2: $\mathop{\bf E}[f]^2 \leq \epsilon_d$. In this case we want to show that $\mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{*(d+1)}}[f(\boldsymbol{w})]^2$ is nearly as small. By Cauchy–Schwarz, \begin{multline*} \mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{*(d+1)}}[f(\boldsymbol{w})]^2 = \mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}\Bigl[\mathop{\bf E}_{\boldsymbol{y} \sim \varphi}[f(\boldsymbol{z}+\boldsymbol{y})]\Bigr]^2 \leq \mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}\Bigl[\mathop{\bf E}_{\boldsymbol{y} \sim \varphi}[f(\boldsymbol{z}+\boldsymbol{y})]^2\Bigr] \\= \mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}\Bigl[\mathop{\bf E}_{\boldsymbol{y}, \boldsymbol{y}' \sim \varphi}[f(\boldsymbol{z}+\boldsymbol{y})f(\boldsymbol{z}+\boldsymbol{y}')]\Bigr] = \mathop{\bf E}_{\boldsymbol{y}, \boldsymbol{y}’ \sim \varphi}\Bigl[\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}[f(\boldsymbol{z}+\boldsymbol{y})f(\boldsymbol{z}+\boldsymbol{y}')]\Bigr]. \end{multline*} For each outcome of $\boldsymbol{y}=y$, $\boldsymbol{y}’=y’$, the function $f(\boldsymbol{z}+y)f(\boldsymbol{z}+y’)$ is of ${\mathbb F}_2$-degree at most $d$ in the variables $\boldsymbol{z}$, by Proposition 50. Hence by induction we have \begin{align*} \mathop{\bf E}_{\boldsymbol{y}, \boldsymbol{y}’ \sim \varphi}\Bigl[\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}[f(\boldsymbol{z}+\boldsymbol{y})f(\boldsymbol{z}+\boldsymbol{y}')]\Bigr] & \leq \mathop{\bf E}_{\boldsymbol{y}, \boldsymbol{y}’ \sim \varphi}\Bigl[\mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}}+\boldsymbol{y})f({\boldsymbol{x}}+\boldsymbol{y}')]\Bigr] + \epsilon_d\\ &= \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[\varphi * f({\boldsymbol{x}})^2] +\epsilon_d\\ &= \sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} \widehat{\varphi}(\gamma)^2 \widehat{f}(\gamma)^2 +\epsilon_d\\ &\leq \widehat{f}(0)^2 + \epsilon^2 \sum_{\gamma \neq 0} \widehat{f}(\gamma)^2 + \epsilon_d\\ &\leq 2\epsilon_d + \epsilon^2, \end{align*} where the last step used the hypothesis of Case $2$. We have thus shown $\mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{*(d+1)}}[f(\boldsymbol{w})]^2 \leq 2\epsilon_d + \epsilon^2 \leq 3\epsilon_d \leq 4 \epsilon_d,$ and hence $|\mathop{\bf E}[f(\boldsymbol{w})]| \leq 2 \sqrt{\epsilon_d}$. Since we are in Case 2, $|\mathop{\bf E}[f]| \leq \sqrt{\epsilon_d}$, and so $\Bigl| \mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{*(d+1)}}[f(\boldsymbol{w})] – \mathop{\bf E}[f] \Bigr| \leq 3 \sqrt{\epsilon_d} = \epsilon_{d+1},$ as needed. $\Box$

We end this section by discussing the tightness of parameters in Viola’s Theorem. First, if we ignore the error parameter then the result is sharp: Lovett and Tzur [LT09] showed that the $d$-fold convolution of $\epsilon$-biased densities cannot in general fool functions of ${\mathbb F}_2$-degree $d+1$. More precisely, for any $d \in {\mathbb N}^+$, $\ell \geq 2d+1$ they give an explicit $\frac{\ell}{2^n}$-biased density on ${\mathbb F}_2^{(\ell+1)n}$ and an explicit function $f : {\mathbb F}_2^{(\ell+1)n} \to \{-1,1\}$ of degree $d+1$ for which $\Bigl|\mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{* d}}[f(\boldsymbol{w})] – \mathop{\bf E}[f]\Bigr| \geq 1-\frac{2d}{2^n}.$ Regarding the error parameter in Viola’s Theorem, it is not known whether the quantity $\epsilon^{1/2^{d-1}}$ can be improved, even in the case $d = 2$. However obtaining even a modest improvement to $\epsilon^{1/1.99^d}$ (for $d$ as large as $\log n$) would constitute a major advance since it would imply progress on the notorious problem of “correlation bounds for polynomials” — see [Vio09a].

### 10 comments to §6.5: Highlight: Fooling ${\mathbb F}_2$-polynomials

• Avishay Tal

In definition 48, it seems that it should be written “\Delta_y f(x) = f(x+y)-f(x)”.

• Ryan O'Donnell

Absolutely, thanks.

• Chung Hoi Wong

Hello, seems that in definition 48, all the z’s should be changed to y.

• Ryan O'Donnell

Thanks, fixed.

• Noam Lifshitz

In example 47, “(see )” is written, without any references.

• Ryan O'Donnell

Thanks! It should point to exercise 7 of this chapter.

PS: one more correction and you’ll have caught up to the mighty LYT! Keep them coming • Gautam Kamath

The word “the” is duplicated in the statement of Viola’s Theorem

• Ryan O'Donnell

Thanks!

• Matt Franklin

There might be a small typo in Case 2 of the proof of Viola’s Theorem
(next to last line on p. 152 of the book).
Should there be an extra pair of parentheses around $\phi \star f(x)$

• Ryan O'Donnell

Thanks — I think that kind of parenthesis-free notation for convolution is okay; on the other hand, it can’t really hurt if I add it.