Recall that a density $\varphi$ is said to be $\epsilon$-biased if its correlation with every ${\mathbb F}_2$-linear function $f$ is at most $\epsilon$ in magnitude. In the lingo of pseudorandomness, one says that $\varphi$ fools the class of ${\mathbb F}_2$-linear functions:
Definition 46 Let $\varphi : {\mathbb F}_2^n \to {\mathbb R}^{\geq 0}$ be a density function and let $\mathcal{C}$ be a class of functions ${\mathbb F}_2^n \to {\mathbb R}$. We say that $\varphi$ $\epsilon$-fools $\mathcal{C}$ if \[ \Bigl|\mathop{\bf E}_{\boldsymbol{y} \sim \varphi}[f(\boldsymbol{y})] - \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \leq \epsilon \] for all $f \in \mathcal{C}$.
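For very small $n$ this definition can be checked by brute force. Here is a minimal Python sketch (not from the text; the names fooling_error and prob are just illustrative), representing the distribution associated with $\varphi$ as a probability vector on ${\mathbb F}_2^n$:

    from itertools import product

    def fooling_error(prob, fns, n):
        # Largest |E_{y ~ phi}[f(y)] - E_{x uniform}[f(x)]| over f in fns.
        # prob: dict mapping each point of F_2^n (an n-tuple of 0/1) to its probability.
        # fns:  a list of real-valued functions on F_2^n, e.g. the class to be fooled.
        points = list(product((0, 1), repeat=n))
        err = 0.0
        for f in fns:
            mean_phi = sum(prob.get(x, 0.0) * f(x) for x in points)
            mean_unif = sum(f(x) for x in points) / 2 ** n
            err = max(err, abs(mean_phi - mean_unif))
        return err

Taking fns to be all nonzero characters $\chi_\gamma(x) = (-1)^{\gamma \cdot x}$ makes this quantity exactly the bias of $\varphi$.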
Theorem 30 implies that using just $O(\log(n/\epsilon))$ independent random bits, one can generate a density which $\epsilon$-fools the class of $f : {\mathbb F}_2^n \to \{-1,1\}$ with $\deg_{{\mathbb F}_2}(f) \leq 1$. A natural problem in the field of derandomization is: How many independent random bits are needed to generate a density which $\epsilon$-fools all functions of ${\mathbb F}_2$-degree at most $d$? A naive hope might be that $\epsilon$-biased densities automatically fool functions of ${\mathbb F}_2$-degree $d > 1$. The next example shows that this hope fails badly, even for $d = 2$:
Example 47 Recall the inner product mod $2$ function, $\mathrm{IP}_{n} : {\mathbb F}_2^n \to \{0,1\}$, which has ${\mathbb F}_2$-degree $2$. Let $\varphi : {\mathbb F}_2^n \to {\mathbb R}^{\geq 0}$ be the density of the uniform distribution on the support of $\mathrm{IP}_n$. Now $\mathrm{IP}_n$ is an extremely regular function (see Examples 4), and indeed $\varphi$ is a roughly $2^{-n/2}$-biased density (see exercise 7). But $\varphi$ is very bad at fooling at least one function of ${\mathbb F}_2$-degree $2$, namely $\mathrm{IP}_n$ itself: \[ \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[\mathrm{IP}_n({\boldsymbol{x}})] \approx 1/2, \qquad \mathop{\bf E}_{\boldsymbol{y} \sim \varphi}[\mathrm{IP}_n(\boldsymbol{y})] = 1. \]
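The claims in Example 47 can be checked directly for, say, $n = 8$ (four pairs of coordinates). A small self-contained Python sanity check, assuming the convention $\mathrm{IP}_n(x) = x_1x_2 + x_3x_4 + \cdots \pmod 2$; the variable names are just illustrative:

    from itertools import product

    n = 8
    points = list(product((0, 1), repeat=n))
    ip = lambda x: sum(x[2 * i] * x[2 * i + 1] for i in range(n // 2)) % 2

    support = [x for x in points if ip(x) == 1]      # phi = uniform on {IP_n = 1}
    prob = {x: 1.0 / len(support) for x in support}

    # bias of phi: max over nonzero gamma of |E_{y ~ phi}[chi_gamma(y)]|
    chi = lambda g, x: (-1) ** sum(a * b for a, b in zip(g, x))
    bias = max(abs(sum(p * chi(g, x) for x, p in prob.items()))
               for g in points if any(g))

    print(bias)                                      # about 0.067, close to 2**(-n/2) = 0.0625
    print(sum(ip(x) for x in points) / 2 ** n)       # E_uniform[IP_n] = 0.46875
    print(sum(p * ip(x) for x, p in prob.items()))   # E_phi[IP_n] = 1.0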
The problem of using few random bits to fool $n$-bit, ${\mathbb F}_2$-degree-$d$ functions was first taken up by Luby, Veličković and Wigderson [LVW93]. They showed how to generate a fooling distribution using $\exp(O(\sqrt{d \log(n/d) + \log(1/\epsilon)}))$ independent random bits. There was no improvement on this for $14$ years, at which point Bogdanov and Viola [BV07] achieved $O(\log(n/\epsilon))$ random bits for $d = 2$ and $O(\log n) + \exp(\mathrm{poly}(1/\epsilon))$ random bits for $d = 3$. In general they suggested that ${\mathbb F}_2$-degree-$d$ functions might be fooled by the sum of $d$ independent draws from a small-bias distribution. Soon thereafter Lovett [Lov08] showed that a sum of $2^d$ independent draws from a small-bias distribution suffices, implying that ${\mathbb F}_2$-degree-$d$ functions can be fooled using just $2^{O(d)} \cdot \log(n/\epsilon)$ random bits. More precisely, if $\varphi$ is any $\epsilon$-biased density on ${\mathbb F}_2^n$, he showed that \[ \Bigl|\mathop{\bf E}_{\boldsymbol{y}^{(1)}, \dots, \boldsymbol{y}^{(2^d)} \sim \varphi}[f(\boldsymbol{y}^{(1)} + \cdots + \boldsymbol{y}^{(2^d)})] - \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \leq O(\epsilon^{1/4^d}) \] for every $f$ of ${\mathbb F}_2$-degree at most $d$. In other words, the $2^d$-fold convolution density $\varphi^{* 2^d}$ fools functions of ${\mathbb F}_2$-degree $d$.
The current state of the art for this problem is Viola’s Theorem, which shows that the original idea from [BV07] works: summing $d$ independent draws from an $\epsilon$-biased distribution fools ${\mathbb F}_2$-degree-$d$ polynomials.
Viola’s Theorem Let $\varphi$ be any $\epsilon$-biased density on ${\mathbb F}_2^n$, $0 \leq \epsilon \leq 1$. Let $d \in {\mathbb N}^+$ and define $\epsilon_d = 9\epsilon^{1/2^{d-1}}$. Then the class of all $f : {\mathbb F}_2^n \to \{-1,1\}$ with $\deg_{{\mathbb F}_2}(f) \leq d$ is $\epsilon_d$-fooled by the $d$-fold convolution $\varphi^{* d}$. I.e., \[ \Bigl|\mathop{\bf E}_{\boldsymbol{y}^{(1)}, \dots, \boldsymbol{y}^{(d)} \sim \varphi}[f(\boldsymbol{y}^{(1)} + \cdots + \boldsymbol{y}^{(d)})] - \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \leq 9\epsilon^{1/2^{d-1}}. \]
In light of Theorem 30, Viola’s Theorem implies that one can $\epsilon$-fool $n$-bit functions of ${\mathbb F}_2$-degree $d$ using only $O(d\log n) + O(d 2^d \log(1/\epsilon))$ independent random bits.
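In generator form the theorem is simple to apply: given any black-box sampler for an $\epsilon$-biased distribution on ${\mathbb F}_2^n$ (for instance one built from Theorem 30, whose construction is not reproduced here), a draw from $\varphi^{* d}$ is just the coordinate-wise sum mod $2$ of $d$ independent draws. A hedged Python sketch, with the sampler passed in as an assumed argument:

    def fool_degree_d(sample_eps_biased, d):
        # One draw from the d-fold convolution phi^{*d}.
        # sample_eps_biased: zero-argument function returning one draw from an
        # eps-biased distribution on F_2^n as a length-n tuple of 0/1 bits.
        # By Viola's Theorem the output 9*eps**(1/2**(d-1))-fools every
        # f of F_2-degree at most d.
        draws = [sample_eps_biased() for _ in range(d)]
        return tuple(sum(bits) % 2 for bits in zip(*draws))

With sample_eps_biased drawing from a fixed $\epsilon$-biased multiset, the total seed length is $d$ times that of a single draw, which is how the bit count above arises.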
The proof of Viola’s Theorem is an induction on $d$. To reduce the case of degree $d+1$ to degree $d$, Viola makes use of a simple concept: directional derivatives.
Definition 48 Let $f : {\mathbb F}_2^n \to {\mathbb F}_2$ and let $y \in {\mathbb F}_2^n$. The directional derivative $\Delta_y f : {\mathbb F}_2^n \to {\mathbb F}_2$ is defined by \[ \Delta_y f(x) = f(x+y) - f(x). \] Over ${\mathbb F}_2$ we may equivalently write $\Delta_y f(x) = f(x+y) + f(x)$.
As expected, taking a derivative reduces degree by $1$:
Fact 49 For any $f : {\mathbb F}_2^n \to {\mathbb F}_2$ and $y \in {\mathbb F}_2^n$ we have $\deg_{{\mathbb F}_2}(\Delta_y f) \leq \deg_{{\mathbb F}_2}(f) -1$.
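Before we prove this (in the slightly stronger form below), Fact 49 is easy to confirm numerically for small $n$: compute the algebraic normal form of $f$ and of $\Delta_y f$ via the standard Möbius transform and compare degrees. A small Python sketch with an illustrative choice of $f$ and $y$ (not from the text):

    from itertools import product

    def f2_degree(f, n):
        # F_2-degree of f : F_2^n -> {0,1}, via the in-place Moebius (ANF) transform.
        coeff = {x: f(x) for x in product((0, 1), repeat=n)}
        for i in range(n):
            for x in coeff:
                if x[i] == 1:
                    coeff[x] ^= coeff[x[:i] + (0,) + x[i + 1:]]
        return max((sum(x) for x, c in coeff.items() if c), default=0)

    n = 5
    f = lambda x: (x[0] * x[1] * x[2] + x[3] * x[4]) % 2        # F_2-degree 3
    y = (1, 0, 1, 0, 0)
    df = lambda x: (f(tuple((a + b) % 2 for a, b in zip(x, y))) + f(x)) % 2
    print(f2_degree(f, n), f2_degree(df, n))                    # prints 3 2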
In fact, we’ll prove a slightly stronger statement:
Proposition 50 Let $f : {\mathbb F}_2^n \to {\mathbb F}_2$ have $\deg_{{\mathbb F}_2}(f) = d$ and fix $y, y' \in {\mathbb F}_2^n$. Define $g :{\mathbb F}_2^n \to {\mathbb F}_2$ by $g(x) = f(x+y) - f(x+y')$. Then $\deg_{{\mathbb F}_2}(g) \leq d-1$.
Proof: In passing from the ${\mathbb F}_2$-polynomial representation of $f(x)$ to that of $g(x)$, each monomial $x^S$ of maximal degree $d$ is replaced by $(x+y)^S - (x+y')^S$. Upon expansion the monomials $x^S$ cancel, leaving a polynomial of degree at most $d-1$. $\Box$
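For a concrete instance of the cancellation (our own small example, not from the text): take $f(x) = x_1 x_2$ and fixed $y, y' \in {\mathbb F}_2^2$. Then \[ g(x) = (x_1+y_1)(x_2+y_2) - (x_1+y_1')(x_2+y_2') = (y_2+y_2')\,x_1 + (y_1+y_1')\,x_2 + (y_1y_2 + y_1'y_2'), \] the two copies of $x_1x_2$ having cancelled, so $\deg_{{\mathbb F}_2}(g) \leq 1$.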
We are now ready to give the proof of Viola’s Theorem.
Proof of Viola’s Theorem: The proof is by induction on $d$. The $d = 1$ case is immediate (even without the factor of $9$) because $\varphi$ is $\epsilon$-biased. Assume that the theorem holds for general $d \geq 1$ and let $f : {\mathbb F}_2^n \to \{-1,1\}$ have $\deg_{{\mathbb F}_2}(f) \leq d+1$. We split into two cases, depending on whether the bias of $f$ is large or small.
Case 1: $\mathop{\bf E}[f]^2 > \epsilon_d$. In this case, \begin{align*} &\quad \sqrt{\epsilon_d} \cdot \Bigl|\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{*(d+1)}}[f(\boldsymbol{z})] - \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \\ &< |\mathop{\bf E}[f]| \cdot \Bigl|\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{*(d+1)}}[f(\boldsymbol{z})] - \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \\ &= \Bigl|\mathop{\bf E}_{{\boldsymbol{x}}' \sim {\mathbb F}_2^n, \boldsymbol{z} \sim \varphi^{*(d+1)}}[f({\boldsymbol{x}}')f(\boldsymbol{z})] - \mathop{\bf E}_{{\boldsymbol{x}}', {\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}}')f({\boldsymbol{x}})]\Bigr| \\ &= \Bigl|\mathop{\bf E}_{\boldsymbol{y} \sim {\mathbb F}_2^n, \boldsymbol{z} \sim \varphi^{*(d+1)}}[f(\boldsymbol{z} + \boldsymbol{y})f(\boldsymbol{z})] - \mathop{\bf E}_{\boldsymbol{y}, {\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}} + \boldsymbol{y})f({\boldsymbol{x}})]\Bigr| \\ &= \Bigl|\mathop{\bf E}_{\boldsymbol{y} \sim {\mathbb F}_2^n, \boldsymbol{z} \sim \varphi^{*(d+1)}}[\Delta_{\boldsymbol{y}}f(\boldsymbol{z})] - \mathop{\bf E}_{\boldsymbol{y}, {\boldsymbol{x}} \sim {\mathbb F}_2^n}[\Delta_{\boldsymbol{y}}f({\boldsymbol{x}})]\Bigr| \\ &\leq \mathop{\bf E}_{\boldsymbol{y} \sim {\mathbb F}_2^n}\Bigl[\Bigl|\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{*(d+1)}}[\Delta_{\boldsymbol{y}}f(\boldsymbol{z})] - \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[\Delta_{\boldsymbol{y}}f({\boldsymbol{x}})]\Bigr|\Bigr]. \end{align*} (Here, since $f$ is $\pm 1$-valued, $\Delta_{\boldsymbol{y}}f$ denotes the function $x \mapsto f(x+\boldsymbol{y})f(x)$, the multiplicative analogue of Definition 48.) For each outcome $\boldsymbol{y}=y$ the directional derivative $\Delta_{y} f$ has ${\mathbb F}_2$-degree at most $d$ (Fact 49). By induction we know that $\varphi^{* d}$ $\epsilon_d$-fools any such polynomial and it follows (exercise) that $\varphi^{* (d+1)}$ does too. Thus each quantity in the expectation over $\boldsymbol{y}$ is at most $\epsilon_d$ and we conclude \[ \Bigl|\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{*(d+1)}}[f(\boldsymbol{z})] - \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}})]\Bigr| \leq \frac{\epsilon_d}{\sqrt{\epsilon_d}} = \sqrt{\epsilon_d} = \tfrac13 \epsilon_{d+1} \leq \epsilon_{d+1}. \]
Case 2: $\mathop{\bf E}[f]^2 \leq \epsilon_d$. In this case we want to show that $\mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{*(d+1)}}[f(\boldsymbol{w})]^2$ is nearly as small. By Cauchy–Schwarz, \begin{multline*} \mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{*(d+1)}}[f(\boldsymbol{w})]^2 = \mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}\Bigl[\mathop{\bf E}_{\boldsymbol{y} \sim \varphi}[f(\boldsymbol{z}+\boldsymbol{y})]\Bigr]^2 \leq \mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}\Bigl[\mathop{\bf E}_{\boldsymbol{y} \sim \varphi}[f(\boldsymbol{z}+\boldsymbol{y})]^2\Bigr] \\= \mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}\Bigl[\mathop{\bf E}_{\boldsymbol{y}, \boldsymbol{y}' \sim \varphi}[f(\boldsymbol{z}+\boldsymbol{y})f(\boldsymbol{z}+\boldsymbol{y}')]\Bigr] = \mathop{\bf E}_{\boldsymbol{y}, \boldsymbol{y}' \sim \varphi}\Bigl[\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}[f(\boldsymbol{z}+\boldsymbol{y})f(\boldsymbol{z}+\boldsymbol{y}')]\Bigr]. \end{multline*} For each outcome of $\boldsymbol{y}=y$, $\boldsymbol{y}'=y'$, the function $f(\boldsymbol{z}+y)f(\boldsymbol{z}+y')$ is of ${\mathbb F}_2$-degree at most $d$ in the variables $\boldsymbol{z}$, by Proposition 50. Hence by induction we have \begin{align*} \mathop{\bf E}_{\boldsymbol{y}, \boldsymbol{y}' \sim \varphi}\Bigl[\mathop{\bf E}_{\boldsymbol{z} \sim \varphi^{* d}}[f(\boldsymbol{z}+\boldsymbol{y})f(\boldsymbol{z}+\boldsymbol{y}')]\Bigr] & \leq \mathop{\bf E}_{\boldsymbol{y}, \boldsymbol{y}' \sim \varphi}\Bigl[\mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[f({\boldsymbol{x}}+\boldsymbol{y})f({\boldsymbol{x}}+\boldsymbol{y}')]\Bigr] + \epsilon_d\\ &= \mathop{\bf E}_{{\boldsymbol{x}} \sim {\mathbb F}_2^n}[(\varphi * f)({\boldsymbol{x}})^2] +\epsilon_d\\ &= \sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} \widehat{\varphi}(\gamma)^2 \widehat{f}(\gamma)^2 +\epsilon_d\\ &\leq \widehat{f}(0)^2 + \epsilon^2 \sum_{\gamma \neq 0} \widehat{f}(\gamma)^2 + \epsilon_d\\ &\leq 2\epsilon_d + \epsilon^2. \end{align*} Here the passage to the Fourier side uses $\widehat{\varphi * f} = \widehat{\varphi}\,\widehat{f}$ together with Parseval; the next inequality uses $\widehat{\varphi}(0) = 1$ and $|\widehat{\varphi}(\gamma)| \leq \epsilon$ for all $\gamma \neq 0$ (since $\varphi$ is $\epsilon$-biased); and the last step uses Parseval again ($\sum_\gamma \widehat{f}(\gamma)^2 = 1$) along with the hypothesis of Case $2$ ($\widehat{f}(0)^2 = \mathop{\bf E}[f]^2 \leq \epsilon_d$). We have thus shown \[ \mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{*(d+1)}}[f(\boldsymbol{w})]^2 \leq 2\epsilon_d + \epsilon^2 \leq 3\epsilon_d \leq 4 \epsilon_d, \] and hence $|\mathop{\bf E}[f(\boldsymbol{w})]| \leq 2 \sqrt{\epsilon_d}$. Since we are in Case 2, $|\mathop{\bf E}[f]| \leq \sqrt{\epsilon_d}$, and so \[ \Bigl| \mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{*(d+1)}}[f(\boldsymbol{w})] - \mathop{\bf E}[f] \Bigr| \leq 3 \sqrt{\epsilon_d} = \epsilon_{d+1}, \] as needed. $\Box$
We end this section by discussing the tightness of parameters in Viola’s Theorem. First, if we ignore the error parameter then the result is sharp: Lovett and Tzur [LT09] showed that the $d$-fold convolution of $\epsilon$-biased densities cannot in general fool functions of ${\mathbb F}_2$-degree $d+1$. More precisely, for any $d \in {\mathbb N}^+$ and $\ell \geq 2d+1$ they give an explicit $\frac{\ell}{2^n}$-biased density $\varphi$ on ${\mathbb F}_2^{(\ell+1)n}$ and an explicit function $f : {\mathbb F}_2^{(\ell+1)n} \to \{-1,1\}$ of ${\mathbb F}_2$-degree $d+1$ for which \[ \Bigl|\mathop{\bf E}_{\boldsymbol{w} \sim \varphi^{* d}}[f(\boldsymbol{w})] - \mathop{\bf E}[f]\Bigr| \geq 1-\frac{2d}{2^n}. \] Regarding the error parameter in Viola’s Theorem, it is not known whether the quantity $\epsilon^{1/2^{d-1}}$ can be improved, even in the case $d = 2$. However, obtaining even a modest improvement to $\epsilon^{1/1.99^d}$ (for $d$ as large as $\log n$) would constitute a major advance, since it would imply progress on the notorious problem of “correlation bounds for polynomials”; see [Vio09a].