§11.6: The Invariance Principle

Let’s summarize the Variant Berry–Esseen Theorem and proof from the preceding section, using slightly different notation. (Specifically, we’ll rewrite $\boldsymbol{X}_i = a_i {\boldsymbol{x}}_i$ where $\mathop{\bf Var}[{\boldsymbol{x}}_i] = 1$, so $a_i = \pm \sigma_i$.)

We showed that if ${\boldsymbol{x}}_1, \dots, {\boldsymbol{x}}_n, \boldsymbol{y}_1, \dots, \boldsymbol{y}_n$ are independent mean-$0$, variance-$1$ random variables, reasonable in the sense of having third absolute moment at most $B$, and if $a_1, \dots, a_n$ are real constants assumed for normalization to satisfy $\sum_i a_i^2 = 1$, then \begin{gather*} \label{eqn:be-generalizes} a_1 {\boldsymbol{x}}_1 + \cdots + a_n {\boldsymbol{x}}_n \approx a_1 \boldsymbol{y}_1 + \cdots + a_n \boldsymbol{y}_n, \\ \text{with error bound proportional to } B \max\{|a_i|\}. \end{gather*} We think of this as saying that the linear form $a_1 x_1 + \cdots + a_n x_n$ is (roughly) invariant to what independent mean-$0$, variance-$1$, reasonable random variables are substituted for the $x_i$’s, so long as all $|a_i|$’s are “small” (compared to the overall variance). In this section we generalize this statement to degree-$k$ multilinear polynomial forms, $\sum_{|S| \leq k} a_S\,x^S$. The appropriate generalization of the condition that “all $|a_i|$’s are small” is the condition that all “influences” $\sum_{S \ni i} a_S^2$ are small. We refer to these nonlinear generalizations of Berry–Esseen as Invariance Principles.

In this section we’ll develop the most basic Invariance Principle, which involves replacing bits by Gaussians for a single Boolean function $f$. We’ll show that this doesn’t change the distribution of $f$ much provided $f$ has small influences and provided that $f$ is of “constant degree” — or at least, provided $f$ is uniformly noise-stable so that it’s “close to having constant degree”. Invariance Principles in much more general settings are possible — for example the exercises describe variants which handle several functions applied to correlated inputs, and functions on general product spaces. Here we’ll just focus on the simplest possible Invariance Principle, which is already sufficient for the proof of the Majority Is Stablest Theorem in Section 7.

Let’s begin with some notation.

Definition 63 Let $F$ be a formal multilinear polynomial over the sequence of indeterminates $x = (x_1, \dots, x_n)$: \[ F(x) = \sum_{S \subseteq [n]} \widehat{F}(S) \prod_{i \in S} x_i, \] where the coefficients $\widehat{F}(S)$ are real numbers. We introduce the notation \[ \mathop{\bf Var}[F] = \sum_{S \neq \emptyset} \widehat{F}(S)^2, \qquad \mathbf{Inf}_i[F] = \sum_{S \ni i} \widehat{F}(S)^2. \]

Remark 64 To justify this notation, we remark that we’ll always consider $F$ applied to a sequence $\boldsymbol{z} = (\boldsymbol{z}_1, \dots, \boldsymbol{z}_n)$ independent random variables satisfying $\mathop{\bf E}[\boldsymbol{z}_i] = 0$, $\mathop{\bf E}[\boldsymbol{z}_i^2] = 1$. Under these circumstances the collection of monomial random variables $\prod_{i \in S} \boldsymbol{z}_i$ is orthonormal and so it’s easy to see (cf. Chapter 8.2) that \[ \mathop{\bf E}[F(\boldsymbol{z})] = \widehat{F}(\emptyset), \quad \mathop{\bf E}[F(\boldsymbol{z})^2] = \sum_{S \subseteq [n]} \widehat{F}(S)^2, \quad \mathop{\bf Var}[F(\boldsymbol{z})] = \mathop{\bf Var}[F] = \sum_{S \neq \emptyset} \widehat{F}(S)^2. \] We also have $\mathop{\bf E}[\mathop{\bf Var}_{\boldsymbol{z}_i}[F(\boldsymbol{z})]] = \mathbf{Inf}_i[F] = \sum_{S \ni i} \widehat{F}(S)^2$, though we won’t use this.

As in the Berry–Esseen Theorem, to get good error bounds we’ll need our random variables $\boldsymbol{z}_i$ to be “reasonable”. Sacrificing generality for simplicity in this section, we’ll take the bounded $4$th-moment notion from Definition 9.1 which will allow us to use the basic Bonami Lemma (more precisely, Corollary 9.6):

Hypothesis The random variable $\boldsymbol{z}_i$ satisfies $\mathop{\bf E}[\boldsymbol{z}_i] = 0$, $\mathop{\bf E}[\boldsymbol{z}_i^2] = 1$, $\mathop{\bf E}[\boldsymbol{z}_i^3] = 0$, and is “$9$-reasonable” in the sense of Definition 9.1; i.e., $\mathop{\bf E}[\boldsymbol{z}_i^4] \leq 9$.

The main examples we have in mind are that each $\boldsymbol{z}_i$ is either a uniform $\pm 1$ random bit or a standard Gaussian. (There are other possibilities, though; e.g., $\boldsymbol{z}_i$ could be uniform on the interval $[-\sqrt{3}, \sqrt{3}]$.)

We can now prove the most basic Invariance Principle, for low-degree multilinear polynomials of random variables:

Basic Invariance Principle Let $F$ be a formal $n$-variate multilinear polynomial of degree at most $k \in {\mathbb N}$, \[ F(x) = \sum_{S \subseteq [n], |S| \leq k} \widehat{F}(S) \prod_{i \in S} x_i. \] Let ${\boldsymbol{x}} = ({\boldsymbol{x}}_1, \dots, {\boldsymbol{x}}_n)$ and $\boldsymbol{y} = (\boldsymbol{y}_1, \dots, \boldsymbol{y}_n)$ be sequences of independent random variables, each satisfying the above Hypothesis. Assume $\psi : {\mathbb R} \to {\mathbb R}$ is $\mathcal{C}^4$ with $\|\psi^{\prime\prime\prime\prime}\|_\infty \leq C$. Then \begin{equation} \label{eqn:basic-inv-goal} \left|\mathop{\bf E}[\psi(F({\boldsymbol{x}}))] – \mathop{\bf E}[\psi(F(\boldsymbol{y}))]\right| \leq \tfrac{C}{12}\cdot 9^k \cdot \sum_{t=1}^n \mathbf{Inf}_t[F]^2. \end{equation}

Remark 65 The proof will be very similar to the one we used for Berry–Esseen except that we’ll take a $3$rd-order Taylor expansion rather than a $2$nd-order one (so that we can use the easy Bonami Lemma). As you are asked to show in the exercises, had we only required that $\psi$ be $\mathcal{C}^3$ and that the ${\boldsymbol{x}}_i$’s and $\boldsymbol{y}_i$’s be $(2,3,\rho)$-hypercontractive with $2$nd moment equal to $1$, then we could obtain \[ \left|\mathop{\bf E}[\psi(F({\boldsymbol{x}}))] – \mathop{\bf E}[\psi(F(\boldsymbol{y}))]\right| \leq \tfrac{\|\psi^{\prime\prime\prime}\|_\infty}{3}\cdot (1/\rho)^{3k} \cdot \sum_{t=1}^n \mathbf{Inf}_t[F]^{3/2}. \]

Proof: The proof uses the Replacement Method. For $0 \leq t \leq n$ we define \[ \boldsymbol{H}_t = F(\boldsymbol{y}_1, \dots, \boldsymbol{y}_{t}, {\boldsymbol{x}}_{t+1}, \dots, {\boldsymbol{x}}_{n}), \] so $F({\boldsymbol{x}}) = \boldsymbol{H}_0$ and $F(\boldsymbol{y}) = \boldsymbol{H}_n$. We will show that \begin{equation} \label{eqn:basic-inv-step} \left|\mathop{\bf E}[\psi(\boldsymbol{H}_{t-1}) - \psi(\boldsymbol{H}_t)]\right| \leq \tfrac{C}{12}\cdot 9^k \cdot \mathbf{Inf}_t[F]^2; \end{equation} as in our proof of the Berry–Esseen Theorem, this will complete the proof after summing over $t$ and using the triangle inequality. To analyze \eqref{eqn:basic-inv-step} we separate out the part of $F(x)$ that depends on $x_t$; i.e., we write $F(x) = \mathrm{E}_t F(x) + x_t \mathrm{D}_t F(x)$, where the formal polynomials $\mathrm{E}_t F$ and $\mathrm{D}_t F$ are defined by \[ \mathrm{E}_tF(x) = \sum_{S \not \ni t} \widehat{F}(S) \prod_{i \in S} x_i, \qquad \mathrm{D}_tF(x) = \sum_{S \ni t} \widehat{F}(S) \prod_{i \in S \setminus \{t\}} x_i. \] Note that neither $\mathrm{E}_t F$ nor $\mathrm{D}_t F$ depends on the indeterminate $x_t$; thus we can define \begin{align*} \boldsymbol{U}_t &= \mathrm{E}_t F(\boldsymbol{y}_1, \dots, \boldsymbol{y}_{t-1}, \cdot, {\boldsymbol{x}}_{t+1}, \dots, {\boldsymbol{x}}_n), \\ \boldsymbol{\Delta}_t &= \mathrm{D}_t F(\boldsymbol{y}_1, \dots, \boldsymbol{y}_{t-1}, \cdot, {\boldsymbol{x}}_{t+1}, \dots, {\boldsymbol{x}}_n), \end{align*} so that \[ \boldsymbol{H}_{t-1} = \boldsymbol{U}_t + \boldsymbol{\Delta}_t{\boldsymbol{x}}_t, \qquad \boldsymbol{H}_{t} = \boldsymbol{U}_t + \boldsymbol{\Delta}_t\boldsymbol{y}_t. \] We now use a $3$rd-order Taylor expansion to bound \eqref{eqn:basic-inv-step}: \begin{align*} \psi(\boldsymbol{H}_{t-1}) &= \psi(\boldsymbol{U}_t) + \psi’(\boldsymbol{U}_t) \boldsymbol{\Delta}_t {\boldsymbol{x}}_t + \tfrac{1}{2} \psi^{\prime\prime}(\boldsymbol{U}_t) \boldsymbol{\Delta}_t^2 {\boldsymbol{x}}_t^2 + \tfrac16 \psi^{\prime\prime\prime}(\boldsymbol{U}_t) \boldsymbol{\Delta}_t^3 {\boldsymbol{x}}_t^3 + \tfrac{1}{24} \psi^{\prime\prime\prime\prime}(\boldsymbol{U}_t^*) \boldsymbol{\Delta}_t^4 {\boldsymbol{x}}_t^4 \\ \psi(\boldsymbol{H}_{t}) &= \psi(\boldsymbol{U}_t) + \psi’(\boldsymbol{U}_t) \boldsymbol{\Delta}_t \boldsymbol{y}_t + \tfrac{1}{2} \psi^{\prime\prime}(\boldsymbol{U}_t) \boldsymbol{\Delta}_t^2 \boldsymbol{y}_t^2 + \tfrac16 \psi^{\prime\prime\prime}(\boldsymbol{U}_t) \boldsymbol{\Delta}_t^3 \boldsymbol{y}_t^3 + \tfrac{1}{24} \psi^{\prime\prime\prime\prime}(\boldsymbol{U}_t^{**}) \boldsymbol{\Delta}_t^4 \boldsymbol{y}_t^4 \end{align*} for some random variables $\boldsymbol{U}_t^*$ and $\boldsymbol{U}_t^{**}$. As in the proof of the Berry–Esseen Theorem, when we subtract these and take the expectation there are significant simplifications. The $0$th-order terms cancel. As for the $1$st-order terms, \[ \mathop{\bf E}[\psi'(\boldsymbol{U}_t) \boldsymbol{\Delta}_t {\boldsymbol{x}}_t - \psi'(\boldsymbol{U}_t) \boldsymbol{\Delta}_t \boldsymbol{y}_t] = \mathop{\bf E}[\psi'(\boldsymbol{U}_t) \boldsymbol{\Delta}_t \cdot( {\boldsymbol{x}}_t - \boldsymbol{y}_t)] = \mathop{\bf E}(\psi’(\boldsymbol{U}_t) \boldsymbol{\Delta}_t] \cdot \mathop{\bf E}[{\boldsymbol{x}}_t - \boldsymbol{y}_t] = 0. \] The second equality here crucially uses the fact that ${\boldsymbol{x}}_t,\ \boldsymbol{y}_t$ are independent of $\boldsymbol{U}_t,\ \boldsymbol{\Delta}_t$. The final equality only uses the fact that ${\boldsymbol{x}}_t$ and $\boldsymbol{y}_t$ have matching $1$st moments (and not the stronger assumption that both of these $1$st moments are $0$). The $2$nd- and $3$rd-order terms will similarly cancel, using the fact that ${\boldsymbol{x}}_t$ and $\boldsymbol{y}_t$ have matching $2$nd and $3$rd moments. Finally, for the “error” term we’ll just use $|\psi^{\prime\prime\prime\prime}(\boldsymbol{U}_t^*)|, |\psi^{\prime\prime\prime\prime}(\boldsymbol{U}_t^{**})| \leq C$ and the triangle inequality; we thus obtain \[ \left|\mathop{\bf E}[\psi(\boldsymbol{H}_{t-1}) - \psi(\boldsymbol{H}_t)]\right| \leq \tfrac{C}{24} \cdot (\mathop{\bf E}[(\boldsymbol{\Delta}_t{\boldsymbol{x}}_t)^4] + \mathop{\bf E}[(\boldsymbol{\Delta}_t\boldsymbol{y}_t)^4]). \] To complete the proof of \eqref{eqn:basic-inv-step} we now just need to bound \[ \mathop{\bf E}[(\boldsymbol{\Delta}_t{\boldsymbol{x}}_t)^4],\ \mathop{\bf E}[(\boldsymbol{\Delta}_t\boldsymbol{y}_t)^4] \leq 9^k \cdot \mathbf{Inf}_t[F]^2, \] which we’ll do using the Bonami Lemma. We’ll give the proof for $\mathop{\bf E}[(\boldsymbol{\Delta}_t{\boldsymbol{x}}_t)^4]$, the case of $\mathop{\bf E}[(\boldsymbol{\Delta}_t\boldsymbol{y}_t)^4]$ being identical. We have \[ \boldsymbol{\Delta}_t {\boldsymbol{x}}_t = \mathrm{L}_t F(\boldsymbol{y}_1, \dots, \boldsymbol{y}_{t-1}, {\boldsymbol{x}}_t, {\boldsymbol{x}}_{t+1}, \dots, {\boldsymbol{x}}_n), \] where \[ \mathrm{L}_t F(x) = x_t \mathrm{D}_t F(x) = \sum_{S \ni t} \widehat{F}(S) \prod_{i \in S} x_i. \] Since $\mathrm{L}_t F$ has degree at most $k$ we can apply the Bonami Lemma (more precisely, Corollary 9.6) to obtain \[ \mathop{\bf E}[(\boldsymbol{\Delta}_t {\boldsymbol{x}}_t)^4] \leq 9^k \mathop{\bf E}[\mathrm{L}_t F(\boldsymbol{y}_1, \dots, \boldsymbol{y}_{t-1}, {\boldsymbol{x}}_t, {\boldsymbol{x}}_{t+1}, \dots, {\boldsymbol{x}}_n)^2]^2. \] But since $\boldsymbol{y}_1, \dots, \boldsymbol{y}_{t-1}, {\boldsymbol{x}}_{t}, \dots, {\boldsymbol{x}}_{n}$ are independent with mean $0$ and $2$nd moment $1$, we have (see Remark 64) \[ \mathop{\bf E}[\mathrm{L}_t F(\boldsymbol{y}_1, \dots, \boldsymbol{y}_{t-1}, {\boldsymbol{x}}_t, {\boldsymbol{x}}_{t+1}, \dots, {\boldsymbol{x}}_n)^2] = \sum_{S \subseteq [n]} \widehat{\mathrm{L}_t F}(S)^2 = \sum_{S \ni t} \widehat{F}(S)^2 = \mathbf{Inf}_t[F]. \] Thus we indeed have $\mathop{\bf E}[(\boldsymbol{\Delta}_t {\boldsymbol{x}}_t)^4] \leq 9^k \cdot \mathbf{Inf}_t[F]^2$, and the proof is complete. $\Box$

Corollary 66 In the setting of the preceding theorem, if we furthermore have $\mathop{\bf Var}[F] \leq 1$ and $\mathbf{Inf}_t[F] \leq \epsilon$ for all $t \in [n]$, then \[ \left|\mathop{\bf E}[\psi(F({\boldsymbol{x}}))] – \mathop{\bf E}[\psi(F(\boldsymbol{y}))]\right| \leq \tfrac{C}{12}\cdot k9^k \cdot \epsilon. \]

Proof: We have $\sum_t \mathbf{Inf}_t[F]^2 \leq \epsilon \sum_t \mathbf{Inf}_t[F] \leq \sum_{S} |S| \widehat{F}(S)^2 \leq k \mathop{\bf Var}[F]$. $\Box$

Corollary 67 In the setting of the preceding corollary, if we merely have that $\psi : {\mathbb R} \to {\mathbb R}$ is $c$-Lipschitz (rather than $\mathcal{C}^4$), then \[ \left|\mathop{\bf E}[\psi(F({\boldsymbol{x}}))] – \mathop{\bf E}[\psi(F(\boldsymbol{y}))]\right| \leq O(c) \cdot 2^{k} \epsilon^{1/4}. \]

Proof: Just as in the proof of Corollary 59, by using $\widetilde{\psi}_\eta$ from Proposition 58 (which has $\|\widetilde{\psi}_\eta^{\prime\prime\prime\prime}\|_\infty \leq O(c/\eta^3)$) we obtain \[ \left|\mathop{\bf E}[\psi(F({\boldsymbol{x}}))] – \mathop{\bf E}[\psi(F(\boldsymbol{y}))]\right| \leq O(c) \cdot (\eta + k 9^k \epsilon/\eta^3). \] The proof is completed by taking $\eta = \sqrt[4]{k 9^k \epsilon} \leq 2^k \epsilon^{1/4}$. $\Box$

Let’s connect this last corollary back to the study of Boolean functions. Suppose $f : \{-1,1\}^n \to {\mathbb R}$ has $\epsilon$-small influences (in the sense of Definition 6.9) and degree at most $k$. Letting $\boldsymbol{g} = (\boldsymbol{g}_1, \dots, \boldsymbol{g}_n)$ be a sequence of independent standard Gaussians, Corollary 67 tells us that for any Lipschitz $\psi$ we have \begin{equation} \label{eqn:cor-simple-inv} \left|\mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[\psi(f({\boldsymbol{x}}))] – \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\psi(f(\boldsymbol{g}))]\right| \leq O(2^{k} \epsilon^{1/4}). \end{equation} Here the expression “$f(\boldsymbol{g})$” is an abuse of notation indicating that the real numbers $\boldsymbol{g}_1, \dots, \boldsymbol{g}_n$ are substituted into $f$’s Fourier expansion (multilinear polynomial representation).

At first it may seem peculiar to substitute arbitrary real numbers into the Fourier expansion of a Boolean function. Actually, if all the numbers being substituted are in the range $[-1,1]$ then there’s a natural interpretation: as you were asked to show in Exercise 1.5, if $\mu \in [-1,1]^n$, then $f(\mu) = \mathop{\bf E}[f(\boldsymbol{y})]$ where $\boldsymbol{y} \sim \{-1,1\}^n$ is drawn from the product distribution in which $\mathop{\bf E}[\boldsymbol{y}_i] = \mu_i$. On the other hand, there doesn’t seem to be any obvious meaning when real numbers outside the range $[-1,1]$ are substituted into $f$’s Fourier expansion, as may certainly occur when we consider $f(\boldsymbol{g})$.

Nevertheless, \eqref{eqn:cor-simple-inv} says that when $f$ is a low-degree, small-influence function, the distribution of the random variable $f(\boldsymbol{g})$ will be close to that of $f({\boldsymbol{x}})$. Now suppose $f : \{-1,1\}^n \to \{-1,1\}$ is Boolean-valued and unbiased. Then \eqref{eqn:cor-simple-inv} might seem impossible; how could the continuous random variable $f(\boldsymbol{g})$ essentially be $-1$ with probability $1/2$ and $+1$ with probability $1/2$? The solution to this mystery is that there are no low-degree, small-influence, unbiased Boolean-valued functions. This is a consequence of the OSSS Inequality — more precisely, Exercise 40(b) — which shows that in this setting we will always have $\epsilon \geq 1/k^3$ in \eqref{eqn:cor-simple-inv}, rendering the bound very weak. If the Aaronson–Ambainis Conjecture holds (see the notes for Chapter 8), a similar statement is true even for functions with range $[-1,1]$.

The reason \eqref{eqn:cor-simple-inv} is still useful is that we can apply it to small-influence, low-degree functions which are almost $\{-1,1\}$-valued, or $[-1,1]$-valued. Such functions can arise from truncating a very noise-stable Boolean-valued function to a large but constant degree. For example, we might profitably apply \eqref{eqn:cor-simple-inv} to $f = \mathrm{Maj}_n^{\leq k}$ and then deduce some consequences for $\mathrm{Maj}_n({\boldsymbol{x}})$ using the fact that $\mathop{\bf E}[(\mathrm{Maj}_n^{\leq k}({\boldsymbol{x}}) - \mathrm{Maj}_n({\boldsymbol{x}}))^2] = \mathbf{W}^{> k}[\mathrm{Maj}_n] \leq O(1/\sqrt{k})$ (Corollary 5.20). Let’s consider this sort of idea more generally:

Corollary 68 Let $f : \{-1,1\}^n \to {\mathbb R}$ have $\mathop{\bf Var}[f] \leq 1$. Let $k \geq 0$ and suppose $f^{\leq k}$ has $\epsilon$-small influences. Then for any $c$-Lipschitz $\psi : {\mathbb R} \to {\mathbb R}$ we have \begin{equation} \label{eqn:simple-inv-trunc1} \left|\mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[\psi(f({\boldsymbol{x}}))] – \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\psi(f(\boldsymbol{g}))]\right| \leq O(c) \cdot \bigl(2^{k} \epsilon^{1/4} + \|f^{> k}\|_2\bigr). \end{equation} In particular, suppose $h : \{-1,1\}^n \to {\mathbb R}$ has $\mathop{\bf Var}[h] \leq 1$ and no $(\epsilon,\delta)$-notable coordinates (we assume $\epsilon \leq 1$, $\delta \leq \frac{1}{20}$). Then \[ \left|\mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[\psi(\mathrm{T}_{1-\delta} h({\boldsymbol{x}}))] – \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\psi(\mathrm{T}_{1-\delta} h(\boldsymbol{g}))]\right| \leq O(c) \cdot \epsilon^{\delta/3}. \]

Proof: For the first statement we simply decompose $f = f^{\leq k} + f^{> k}$. Then the left-hand side of \eqref{eqn:simple-inv-trunc1} can be written as \begin{multline*} \left|\mathop{\bf E}[\psi(f^{\leq k}({\boldsymbol{x}}) + f^{> k}({\boldsymbol{x}}))] – \mathop{\bf E}[\psi(f^{\leq k}(\boldsymbol{g}) + f^{> k}(\boldsymbol{g}))]\right| \\ \leq \left|\mathop{\bf E}[\psi(f^{\leq k}({\boldsymbol{x}}))] – \mathop{\bf E}[\psi(f^{\leq k}(\boldsymbol{g}))]\right| + c\mathop{\bf E}[|f^{>k}({\boldsymbol{x}})|] + c\mathop{\bf E}[|f^{>k}(\boldsymbol{g})|], \end{multline*} using the fact that $\psi$ is $c$-Lipschitz. The first quantity is at most $O(c) \cdot 2^{k} \epsilon^{1/4}$, by Corollary 67 (even if $k$ is not an integer). As for the other two quantities, Cauchy–Schwarz implies \[ \mathop{\bf E}[|f^{>k}({\boldsymbol{x}})|] \leq \sqrt{\mathop{\bf E}[f^{>k}({\boldsymbol{x}})^2]} = \sqrt{\sum_{|S| > k} \widehat{f}(S)^2} = \|f^{> k}\|_2, \] and the same bound also holds for $\mathop{\bf E}[|f^{>k}(\boldsymbol{g})|]$; this uses the fact that $\mathop{\bf E}[f^{>k}(\boldsymbol{g})^2] = \sum_{|S| > k} \widehat{f}(S)^2$ just as in Remark 64. This completes the proof of \eqref{eqn:simple-inv-trunc1}.

As for the second statement of the corollary, let $f = \mathrm{T}_{1-\delta} h$. The assumptions on $h$ imply that $\mathop{\bf Var}[f] \leq 1$ and that $f^{\leq k}$ has $\epsilon$-small influences for any $k$; the latter is true because \[ \mathbf{Inf}_i[f^{\leq k}] = \sum_{|S| \leq k, S \ni i} (1-\delta)^{2|S|} \widehat{h}(S)^2 \leq \sum_{S \ni i} (1-\delta)^{|S| – 1} \widehat{h}(S)^2 = \mathbf{Inf}_i^{(1-\delta)}[h] \leq \epsilon \] since $h$ has no $(\epsilon, \delta)$-notable coordinate. Furthermore, \[ \|f^{ > k}\|_2^2 = \sum_{|S| > k} (1-\delta)^{2|S|} \widehat{h}(S)^2 \leq (1-\delta)^{2k} \mathop{\bf Var}[h] \leq (1-\delta)^{2k} \leq \exp(-2k\delta) \] for any $k \geq 1$; i.e., $\|f^{ > k}\|_2 \leq \exp(-k\delta)$. So applying the first part of the corollary gives \begin{equation} \label{eqn:weird-balance} \left|\mathop{\bf E}[\psi(f({\boldsymbol{x}}))] – \mathop{\bf E}[\psi(f(\boldsymbol{g}))]\right| \leq O(c) \cdot \bigl(2^{k} \epsilon^{1/4} + \exp(-k\delta)\bigr) \end{equation} for any $k \geq 0$. Choosing $k = \frac13 \ln(1/\epsilon)$, the right-hand side of \eqref{eqn:weird-balance} becomes \[ O(c) \cdot \left(\epsilon^{-(1/3)\ln 2} \epsilon^{1/4} + \epsilon^{\delta/3}\right) \leq O(c) \cdot \epsilon^{\delta/3}, \] where the inequality uses the assumption $\delta \leq \frac{1}{20}$ (numerically, $\frac14 – \frac13 \ln 2 \approx \frac{1}{53}$). This completes the proof of the second statement of the corollary. $\Box$

Finally, if we think of the Basic Invariance Principle as the nonlinear analogue of our Variant Berry–Esseen Theorem, it’s natural to ask for the nonlinear analogue of the Berry–Esseen Theorem itself, i.e., a statement showing cdf-closeness of $F({\boldsymbol{x}})$ and $F(\boldsymbol{g})$. It’s straightforward to obtain a Lévy distance bound just as in the degree-$1$ case, Corollary 61; in the exercises you are asked to show the following:

Corollary 69 In the setting of Corollary 66 we have the Lévy distance bound $d_{L}(F({\boldsymbol{x}}),F(\boldsymbol{y})) \leq O(2^k\epsilon^{1/5})$. In the setting of Remark 65 we have the bound $d_{L}(F({\boldsymbol{x}}),F(\boldsymbol{y})) \leq (1/\rho)^{O(k)} \epsilon^{1/8}$.

Suppose we now want actual cdf-closeness in the case that $\boldsymbol{y} \sim \mathrm{N}(0,1)^n$. In the degree-$1$ (Berry–Esseen) case we used the fact that degree-$1$ polynomials of independent Gaussians have good anticoncentration. The analogous statement for higher-degree polynomials of Gaussians is not so easy to prove; however, Carbery and Wright [CW01] have obtained the following essentially optimal result:

Carbery–Wright Theorem Let $p : {\mathbb R}^n \to {\mathbb R}$ be a polynomial (not necessarily multilinear) of degree at most $k$, let $\boldsymbol{g} \sim \mathrm{N}(0,1)^n$, and assume $\mathop{\bf E}[p(\boldsymbol{g})^2] = 1$. Then for all $\epsilon > 0$, \[ \mathop{\bf Pr}[|p(\boldsymbol{g})| \leq \epsilon] \leq O(k \epsilon^{1/k}), \] where the $O(\cdot)$ hides a universal constant.

Using this theorem it’s not hard (exercise) to obtain:

Theorem 70 Let $f : \{-1,1\}^n \to {\mathbb R}$ be of degree at most $k$, with $\epsilon$-small influences and $\mathop{\bf Var}[f] = 1$. Then for all $u \in {\mathbb R}$, \[ \left| \mathop{\bf Pr}[f({\boldsymbol{x}}) \leq u] – \mathop{\bf Pr}[f(\boldsymbol{g}) \leq u] \right| \leq O(k) \cdot \epsilon^{1/(4k+1)}, \] where the $O(\cdot)$ hides a universal constant.

2 comments to §11.6: The Invariance Principle

Leave a Reply




You can use most standard LaTeX, as well as these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>