§9.1: Low-degree polynomials are reasonable

As anyone who has worked in probability knows, a random variable can sometimes behave in rather “unreasonable” ways. It may be never close to its expectation. It might exceed its expectation almost always, or almost never. It might have finite $1$st, $2$nd, and $3$rd moments, but an infinite $4$th moment. All of this poor behaviour can cause a lot of trouble — wouldn’t it be nice to have a class of “reasonable” random variables?

A very simple condition on a random variable that guarantees some good behaviour is that its $4$th moment is not too large compared to its $2$nd moment.

Definition 1 For a real number $B \geq 1$, we say that the real random variable $\boldsymbol{X}$ is $B$-reasonable if $\mathop{\bf E}[\boldsymbol{X}^4] \leq B\mathop{\bf E}[\boldsymbol{X}^2]^2$. (Equivalently, if $\|\boldsymbol{X}\|_4 \leq B^{1/4} \|\boldsymbol{X}\|_2$.)

The smaller $B$ is, the more “reasonable” $\boldsymbol{X}$ is. This definition is scale-invariant (i.e., $c \boldsymbol{X}$ is $B$-reasonable if and only if $\boldsymbol{X}$ is, for $c \neq 0$) but not translation-invariant ($c + \boldsymbol{X}$ and $\boldsymbol{X}$ may not be equally reasonable). The latter fact can sometimes be awkward, a point we’ll address further in Section 3. Indeed, we’ll later encounter a few alternative conditions that also capture “reasonableness”. For example, in Chapter 12 we’ll consider the analogous $3$rd moment condition, $\mathop{\bf E}[|\boldsymbol{X}|^3] \leq \beta \mathop{\bf E}[\boldsymbol{X}^2]^{3/2}$. Strictly speaking, the $4$th moment condition is stronger: if $\boldsymbol{X}$ is $B$-reasonable then \[ \mathop{\bf E}[|\boldsymbol{X}|^3] = \mathop{\bf E}[|\boldsymbol{X}| \cdot \boldsymbol{X}^2] \leq \sqrt{\mathop{\bf E}[\boldsymbol{X}^2]}\sqrt{\mathop{\bf E}[\boldsymbol{X}^4]} \leq \sqrt{B}\mathop{\bf E}[\boldsymbol{X}^2]^{3/2}; \] on the other hand, there exist random variables with finite $3$rd moment and infinite $4$th moment. However such unusual random variables almost never arise for us, and morally speaking the $4$th and $3$rd moment conditions are about equally good proxies for reasonableness.

Examples 2 If ${\boldsymbol{x}} \sim \{-1,1\}$ is uniformly random then ${\boldsymbol{x}}$ is $1$-reasonable. If $\boldsymbol{g} \sim \mathrm{N}(0,1)$ is a standard Gaussian then $\mathop{\bf E}[\boldsymbol{g}^4] = 3$, so $\boldsymbol{g}$ is $3$-reasonable. If $\boldsymbol{u} \sim [-1,1]$ is uniform then you can calculate that it is $\frac95$-reasonable. In all of these examples $B$ is a “small” constant, and we think of these random variables simply as “reasonable”. An example of an “unreasonable” random variable would be highly biased Bernoulli random variable; say, $\mathop{\bf Pr}[\boldsymbol{y} = 1] = 2^{-n}$, $\mathop{\bf Pr}[\boldsymbol{y} = 0] = 1-2^{-n}$, where $n$ is large. This $\boldsymbol{y}$ is not $B$-reasonable unless $B \geq 2^n$.

Let’s give a few illustrations of why reasonable random variables are nice to work with. First, they have slightly better tail bounds than what you would get out of the Chebyshev inequality:

Proposition 3 Let $\boldsymbol{X} \not \equiv 0$ be $B$-reasonable. Then $\mathop{\bf Pr}[|\boldsymbol{X}| \geq t \|\boldsymbol{X}\|_2] \leq B/t^4$ for all $t > 0$.


Proof: This is immediate from Markov’s inequality: \[ \mathop{\bf Pr}[|\boldsymbol{X}| \geq t\|\boldsymbol{X}\|_2] = \mathop{\bf Pr}[\boldsymbol{X}^4 \geq t^4\|\boldsymbol{X}\|_2^4] \leq \frac{\mathop{\bf E}[\boldsymbol{X}^4]}{t^4 \mathop{\bf E}[\boldsymbol{X}^2]^2} \leq \frac{B}{t^4}. \qquad \Box \]

More interestingly, they also satisfy anticoncentration bounds; e.g., you can upper-bound the probability that they are near $0$.

Proposition 4 Let $\boldsymbol{X} \not \equiv 0$ be $B$-reasonable. Then $\mathop{\bf Pr}[|\boldsymbol{X}| > t \|\boldsymbol{X}\|_2] \geq (1-t^2)^2/B$ for all $t \in [0,1]$.


Proof: Applying the Paley–Zygmund inequality (also called the “second moment method”) to $\boldsymbol{X}^2$, we get \[ \mathop{\bf Pr}[|\boldsymbol{X}| \geq t \|\boldsymbol{X}\|_2] = \mathop{\bf Pr}[\boldsymbol{X}^2 \geq t^2 \mathop{\bf E}[\boldsymbol{X}^2]] \geq (1-t^2)^2\frac{\mathop{\bf E}[\boldsymbol{X}^2]^2}{\mathop{\bf E}[\boldsymbol{X}^4]} \geq \frac{(1-t^2)^2}{B}. \qquad \Box \]

For a generalization of this proposition, see the exercises.

For a discrete random variable $\boldsymbol{X}$, a simple condition which guarantees reasonableness is that $\boldsymbol{X}$ takes on each of its values with nonnegligible probability:

Proposition 5 Let $\boldsymbol{X}$ be a discrete random variable with probability mass function $\pi$. Write \[ \lambda = \min(\pi) = \min_{x \in \mathrm{range}(\boldsymbol{X})}\{\mathop{\bf Pr}[\boldsymbol{X} = x]\}. \] Then $\boldsymbol{X}$ is $(1/\lambda)$-reasonable.


Proof: Let $M = \|\boldsymbol{X}\|_\infty$. Since $\mathop{\bf Pr}[|\boldsymbol{X}| = M] \geq \lambda$ we get \[ \mathop{\bf E}[\boldsymbol{X}^2] \geq \lambda M^2 \quad\Rightarrow\quad M^2 \leq \mathop{\bf E}[\boldsymbol{X}^2]/\lambda. \] On the other hand, \[ \mathop{\bf E}[\boldsymbol{X}^4] = \mathop{\bf E}[\boldsymbol{X}^2 \cdot \boldsymbol{X}^2] \leq M^2 \cdot \mathop{\bf E}[\boldsymbol{X}^2], \] and thus $\mathop{\bf E}[\boldsymbol{X}^4] \leq (1/\lambda)\mathop{\bf E}[\boldsymbol{X}^2]^2$ as required. $\Box$

The converse to Proposition 5 is certainly not true. For example, if $\boldsymbol{X} = \frac{1}{\sqrt{n}} {\boldsymbol{x}}_1 + \cdots + \frac{1}{\sqrt{n}} {\boldsymbol{x}}_n$ where ${\boldsymbol{x}} \sim \{-1,1\}^n$, then $\boldsymbol{X}$ is very close to a standard Gaussian random variable (for $n$ large) and is, unsurprisingly, $3$-reasonable. On the other hand, the “$\lambda$” for this $\boldsymbol{X}$ is tiny, $2^{-n}$.

This discussion raises the issue of how you might try to construct an unreasonable random variable out of independent uniform $\pm 1$ bits. By Proposition 5, at the very least you must use a lot of them. Furthermore, it also seems that they must be combined in a high-degree way. For example, to construct the unreasonable random variable $\boldsymbol{y}$ from Examples 2 requires degree $n$: $\boldsymbol{y} = (1+{\boldsymbol{x}}_1)(1+{\boldsymbol{x}}_2)\cdots(1+{\boldsymbol{x}}_n)/2^n$.

Indeed, the idea that high degree is required for unreasonableness is correct, as the following crucial result shows:

The Bonami Lemma For each $k$, if $f : \{-1,1\}^n \to {\mathbb R}$ has degree at most $k$ and ${\boldsymbol{x}}_1, \dots, {\boldsymbol{x}}_n$ are independent, uniformly random $\pm 1$ bits, then the random variable $f({\boldsymbol{x}})$ is $9^k$-reasonable. I.e., \[ \mathop{\bf E}[f^4] \leq 9^k \mathop{\bf E}[f^2]^2 \quad\Leftrightarrow\quad \|f\|_4 \leq \sqrt{3}^k \|f\|_2. \]

In other words, low-degree polynomials of independent uniform $\pm 1$ bits are reasonable. As we will explain later, the Bonami Lemma is a special case of more general results in the theory of “hypercontractivity”. However, many key theorems using hypercontractivity — e.g., the KKL Theorem, the Invariance Principle — really only need the simple Bonami Lemma. (We should also note that the name “Bonami Lemma” is not standard; however, the result was first proved by Bonami and it’s often used as a lemma, so the name fits. See the discussion at the end of this chapter.)

One pleasant thing about the Bonami Lemma is that once you decide to prove it by induction on $n$, the proof practically writes itself. The only “non-automatic” step is an application of Cauchy–Schwarz.


Proof of the Bonami Lemma: We assume $k \geq 1$ as otherwise $f$ must be constant and the claim is trivial. The proof is by induction on $n$. Again, if $n = 0$ then $f$ must be constant and the claim is trivial. For $n \geq 1$ we can use the decomposition $f(x) = x_n \mathrm{D}_n f(x) + \mathrm{E}_n f(x)$ (Proposition 2.23), where $\deg(\mathrm{D}_n f) \leq k-1$, $\deg(\mathrm{E}_n f) \leq k$, and the polynomials $\mathrm{D}_n f(x)$ and $\mathrm{E}_n f(x)$ don’t depend on $x_n$. For brevity we write $\boldsymbol{f} = f({\boldsymbol{x}})$, $\boldsymbol{d} = \mathrm{D}_nf({\boldsymbol{x}})$, and $\boldsymbol{e} = \mathrm{E}_n f({\boldsymbol{x}})$. Now \begin{align*} \mathop{\bf E}[\boldsymbol{f}^4] &= \mathop{\bf E}[({\boldsymbol{x}}_n \boldsymbol{d} + \boldsymbol{e})^4] \\ &= \mathop{\bf E}[{\boldsymbol{x}}_n^4 \boldsymbol{d}^4] + 4 \mathop{\bf E}[{\boldsymbol{x}}_n^3 \boldsymbol{d}^3 \boldsymbol{e}] + 6 \mathop{\bf E}[{\boldsymbol{x}}_n^2 \boldsymbol{d}^2 \boldsymbol{e}^2] + 4 \mathop{\bf E}[{\boldsymbol{x}}_n \boldsymbol{d} \boldsymbol{e}^3] + \mathop{\bf E}[\boldsymbol{e}^4] \\ &= \mathop{\bf E}[{\boldsymbol{x}}_n^4] \mathop{\bf E}[\boldsymbol{d}^4] + 4 \mathop{\bf E}[{\boldsymbol{x}}_n^3] \mathop{\bf E}[\boldsymbol{d}^3 \boldsymbol{e}] + 6 \mathop{\bf E}[{\boldsymbol{x}}_n^2] \mathop{\bf E}[\boldsymbol{d}^2 \boldsymbol{e}^2] + 4 \mathop{\bf E}[{\boldsymbol{x}}_n]\mathop{\bf E}[\boldsymbol{d} \boldsymbol{e}^3] + \mathop{\bf E}[\boldsymbol{e}^4]. \end{align*} In the last step we used the fact that ${\boldsymbol{x}}_n$ is independent of $\boldsymbol{d}$ and $\boldsymbol{e}$, since $\mathrm{D}_nf$ and $\mathrm{E}_n f$ do not depend on $x_n$. We now use $\mathop{\bf E}[{\boldsymbol{x}}_n] = \mathop{\bf E}[{\boldsymbol{x}}_n^3] = 0$ and $\mathop{\bf E}[{\boldsymbol{x}}_n^2] = \mathop{\bf E}[{\boldsymbol{x}}_n^4] = 1$ to deduce \begin{equation} \label{eqn:bonami-induct} \mathop{\bf E}[\boldsymbol{f}^4] = \mathop{\bf E}[\boldsymbol{d}^4] + 6 \mathop{\bf E}[\boldsymbol{d}^2 \boldsymbol{e}^2] + \mathop{\bf E}[\boldsymbol{e}^4]. \end{equation} A similar (and simpler) sequence of steps shows that \begin{equation} \label{eqn:bonami-last} \mathop{\bf E}[\boldsymbol{f}^2] = \mathop{\bf E}[\boldsymbol{d}^2] + \mathop{\bf E}[\boldsymbol{e}^2]. \end{equation} To upper-bound \eqref{eqn:bonami-induct}, recall that $\boldsymbol{d} = \mathrm{D}_nf({\boldsymbol{x}})$ where $\mathrm{D}_n f$ is a multilinear polynomial of degree at most $k-1$ depending on $n-1$ variables. Thus we can apply the induction hypothesis to deduce $\mathop{\bf E}[\boldsymbol{d}^4] \leq 9^{k-1} \mathop{\bf E}[\boldsymbol{d}^2]^2$. Similarly, $\mathop{\bf E}[\boldsymbol{e}^4] \leq 9^{k} \mathop{\bf E}[\boldsymbol{e}^2]^2$ since $\deg(\mathrm{E}_nf) \leq k$. To bound $\mathop{\bf E}[\boldsymbol{d}^2 \boldsymbol{e}^2]$ we apply Cauchy–Schwarz, getting $\sqrt{\mathop{\bf E}[\boldsymbol{d}^4]}\sqrt{\mathop{\bf E}[\boldsymbol{e}^4]}$ and letting us use induction again. Thus we have \begin{align*} \mathop{\bf E}[\boldsymbol{f}^4] &\leq 9^{k-1} \mathop{\bf E}[\boldsymbol{d}^2]^2 + 6 \sqrt{9^{k-1}\mathop{\bf E}[\boldsymbol{d}^2]^2} \sqrt{9^{k}\mathop{\bf E}[\boldsymbol{e}^2]^2} + 9^{k} \mathop{\bf E}[\boldsymbol{e}^2]^2 \\ &\leq 9^k \Bigl(\mathop{\bf E}[\boldsymbol{d}^2]^2 + 2 \mathop{\bf E}[\boldsymbol{d}^2]\mathop{\bf E}[\boldsymbol{e}^2] + \mathop{\bf E}[\boldsymbol{e}^2]^2\Bigr) = 9^k \Bigl(\mathop{\bf E}[\boldsymbol{d}^2] + \mathop{\bf E}[\boldsymbol{e}^2]\Bigr)^2, \end{align*} where we used $9^{k-1} \mathop{\bf E}[\boldsymbol{d}^2]^2 \leq 9^k \mathop{\bf E}[\boldsymbol{d}^2]^2$. In light of \eqref{eqn:bonami-last}, this completes the proof. $\Box$

Some aspects of the sharpness of the Bonami Lemma are explored in the exercises. Here we make one more observation. At the end of the proof we used the wasteful-looking inequality $9^{k-1} \mathop{\bf E}[\boldsymbol{d}^2]^2 \leq 9^k \mathop{\bf E}[\boldsymbol{d}^2]^2$. Tracing back through the proof, it’s easy to see that it would still be valid even if we just had $\mathop{\bf E}[{\boldsymbol{x}}_i^4] \leq 9$ rather than $\mathop{\bf E}[{\boldsymbol{x}}_i^4] = 1$. For example, the Bonami Lemma holds not just if the ${\boldsymbol{x}}_i$’s are random bits, but if they are standard Gaussians, or are uniform on $[-1,1]$, or there are some of each. We leave the following as an exercise.

Corollary 6 Let ${\boldsymbol{x}}_1, \dots, {\boldsymbol{x}}_n$ be independent, not necessarily identically distributed, random variables satisfying $\mathop{\bf E}[{\boldsymbol{x}}_i] = \mathop{\bf E}[{\boldsymbol{x}}_i^3] = 0$. (This holds if, e.g., each $-{\boldsymbol{x}}_i$ has the same distribution as ${\boldsymbol{x}}_i$.) Assume also that each ${\boldsymbol{x}}_i$ is $B$-reasonable. Let $\boldsymbol{f} = F({\boldsymbol{x}}_1, \dots, {\boldsymbol{x}}_n)$, where $F$ is a multilinear polynomial of degree at most $k$. Then $\boldsymbol{f}$ is $\max(B,9)^k$-reasonable.

As a first application of the Bonami Lemma, let us combine it with Proposition 4 to show that a low-degree function is not too concentrated around its mean:

Theorem 7 Let $f : \{-1,1\}^n \to {\mathbb R}$ be a nonconstant function of degree at most $k$; write $\mu = \mathop{\bf E}[f]$ and $\sigma = \sqrt{\mathop{\bf Var}[f]}$. Then \[ \mathop{\bf Pr}_{{\boldsymbol{x}} \sim \{-1,1\}}[|f({\boldsymbol{x}}) - \mu| > \tfrac{1}{2} \sigma] \geq \tfrac{1}{16} 9^{1-k}. \]


Proof: Let $g = \frac{1}{\sigma}(f – \mu)$, a function of degree at most $k$ satisfying $\|g\|_2 = 1$. By the Bonami Lemma, $g$ is $9^k$-reasonable. The result now follows by applying Proposition 4 to $g$ with $t = \tfrac{1}{2}$. $\Box$

Using this theorem, we can give a short proof of the FKN Theorem from Chapter 2.5: if $f : \{-1,1\}^n \to \{-1,1\}$ has $\mathbf{W}^{1}[f] = 1-\delta$ then $f$ is $O(\delta)$-close to $\pm \chi_i$ for some $i \in [n]$.


Proof of the FKN Theorem: Write $\ell = f^{=1}$, so $\mathop{\bf E}[\ell^2] = 1-\delta$ by assumption. We may assume without loss of generality that $\delta \leq \frac{1}{1600}$. The goal of the proof is to show that $\mathop{\bf Var}[\ell^2]$ is small; specifically we’ll show that $\mathop{\bf Var}[\ell^2] \leq 6400\delta$. This will complete the proof because \[ \tfrac{1}{2}\mathop{\bf Var}[\ell^2] = \sum_{i \neq j} \widehat{f}(i)^2\widehat{f}(j)^2 = \Bigl(\mathop{{\textstyle \sum}}_{i=1}^n \widehat{f}(i)^2\Bigr)^2 – \mathop{{\textstyle \sum}}_{i=1}^n \widehat{f}(i)^4 = (1-\delta)^2 – \mathop{{\textstyle \sum}}_{i=1}^n \widehat{f}(i)^4 \geq (1-2\delta) – \mathop{{\textstyle \sum}}_{i=1}^n \widehat{f}(i)^4 \] (the first equality above is an exercise) and hence $\mathop{\bf Var}[\ell^2] \leq 6400\delta$ implies \[ 1 - 3202\delta \leq \mathop{{\textstyle \sum}}_{i=1}^n \widehat{f}(i)^4 \leq \max_i\{\widehat{f}(i)^2\} \mathop{{\textstyle \sum}}_{i=1}^n \widehat{f}(i)^2 \leq \max_i\{\widehat{f}(i)^2\} \leq \max_i\{|\widehat{f}(i)|\}, \] as required.

To bound $\mathop{\bf Var}[\ell^2]$ we first apply Theorem 7 to the degree-$2$ function $\ell^2$; this yields \[ \mathop{\bf Pr}\Bigl[\bigl|\ell^2 - (1-\delta)\bigr| \geq \tfrac{1}{2} \sqrt{\mathop{\bf Var}[\ell^2]}\Bigr] \geq \tfrac{1}{16} 9^{1-2} = \tfrac{1}{144}. \] Now suppose by way of contradiction that $\mathop{\bf Var}[\ell^2] > 6400\delta$; then the above implies \begin{equation} \label{eqn:fkn1} \tfrac{1}{144} \leq \mathop{\bf Pr}\Bigl[\bigl|\ell^2 - (1-\delta)\bigr| > 40\sqrt{\delta}\Bigr] \leq \mathop{\bf Pr}\Bigl[\bigl|\ell^2 - 1\bigr| > 39\sqrt{\delta}\Bigr]. \end{equation} This says that $|\ell|$ is frequently far from $1$. Since $|f| = 1$ always, we can deduce that $|f – \ell|^2$ is frequently large. More precisely, a short calculation (exercise) shows that $(f-\ell)^2 \geq 169\delta$ whenever $|\ell^2 – 1| > 39\sqrt{\delta}$. But now \eqref{eqn:fkn1} implies $\mathop{\bf E}[(f-\ell)^2] \geq \frac{1}{144} \cdot 169 \delta > \delta$, a contradiction since $\mathop{\bf E}[(f-\ell)^2] = 1- \mathbf{W}^{1}[f] = \delta$ by assumption. $\Box$

4 comments to §9.1: Low-degree polynomials are reasonable

Leave a Reply

  

  

  

You can use most standard LaTeX, as well as these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>