§5.2: Majority, and the Central Limit Theorem

Majority is one of the more important functions in boolean analysis and its study motivates the introduction of one of the more important tools: the Central Limit Theorem (CLT).

In this section we will show how the CLT can be used to estimate the total influence and the noise stability of $\mathrm{Maj}_n$. Though we already determined $\mathbf{I}[\mathrm{Maj}_n] \sim \sqrt{2/\pi}\sqrt{n}$ in Exercise 2.18 using binomial coefficients and Stirling’s formula, computations using the CLT are more flexible and extend to other linear threshold functions.

We begin with a reminder about the Central Limit Theorem. Suppose $\boldsymbol{X}_1, \dots, \boldsymbol{X}_n$ are independent random variables and $\boldsymbol{S} = \boldsymbol{X}_1 + \cdots + \boldsymbol{X}_n$. Roughly speaking, the CLT says that so long as no $\boldsymbol{X}_i$ is too dominant in terms of variance, the distribution of $\boldsymbol{S}$ is close to that of a Gaussian random variable with the same mean and variance. We give a precise statement below in the form of the Berry–Esseen Theorem. The CLT also extends to the multidimensional case (sums of independent random vectors); we give a precise statement in the exercises. In Chapter 13 we will show one way to prove such Central Limit Theorems.

Let’s see how we can use the CLT to obtain the estimate $\mathbf{I}[\mathrm{Maj}_n] \sim \sqrt{2/\pi}\sqrt{n}$. Recall the proof of Theorem 2.32, which shows that $\mathrm{Maj}_n$ maximizes $\sum_{i=1}^n \widehat{f}(i)$ among all $f : \{-1,1\}^n \to \{-1,1\}$. In it we saw that \begin{equation} \label{eqn:sum-deg-1-maj} \mathbf{I}[\mathrm{Maj}_n] = \sum_{i=1}^n \widehat{\mathrm{Maj}_n}(i) = \mathop{\bf E}_{{\boldsymbol{x}}}[\mathrm{Maj}_n({\boldsymbol{x}})(\mathop{{\textstyle \sum}}_i {\boldsymbol{x}}_i)] = \mathop{\bf E}_{{\boldsymbol{x}}}[|\mathop{{\textstyle \sum}}_i {\boldsymbol{x}}_i|]. \end{equation} When using the Central Limit Theorem, it’s convenient to define majority (equivalently) as \[ \mathrm{Maj}_n(x) = \mathrm{sgn}\Bigl(\mathop{{\textstyle \sum}}_{i=1}^n \tfrac{1}{\sqrt{n}} x_i\Bigr). \] This motivates writing \eqref{eqn:sum-deg-1-maj} as \begin{equation} \label{eqn:sum-deg-1-maj2} \mathbf{I}[\mathrm{Maj}_n] = \sqrt{n} \cdot \mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[|\mathop{{\textstyle \sum}}_i \tfrac{1}{\sqrt{n}}{\boldsymbol{x}}_i|]. \end{equation} If we introduce $\boldsymbol{S} = \sum_{i=1}^n \tfrac{1}{\sqrt{n}} {\boldsymbol{x}}_i$, then $\boldsymbol{S}$ has mean $0$ and variance $\sum_i (1/\sqrt{n})^2 = 1$. Thus the CLT tells us that the distribution of $\boldsymbol{S}$ is close (for large $n$) to that of a standard Gaussian, $\boldsymbol{Z} \sim \mathrm{N}(0,1)$. So as $n \to \infty$ we have \begin{equation} \label{eqn:abs-first-moment-gaussian} \mathop{\bf E}_{{\boldsymbol{x}}}[|\boldsymbol{S}|] \sim \mathop{\bf E}_{\boldsymbol{Z} \sim \mathrm{N}(0,1)}[|\boldsymbol{Z}|] = 2\int_{0}^\infty z \cdot \tfrac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = -\sqrt{2/\pi} e^{-z^2/2}\Bigm|_{0}^\infty = \sqrt{2/\pi}, \end{equation} which when combined with \eqref{eqn:sum-deg-1-maj2} gives us the estimate $\mathbf{I}[\mathrm{Maj}_n] \sim \sqrt{2/\pi} \sqrt{n}$. To make this kind of estimate more precise we state the Berry–Esseen Theorem, which is a strong version of the CLT giving explicit error bounds rather than just limiting statements.

Berry–Esseen (Central Limit) Theorem Let $\boldsymbol{X}_1, \dots, \boldsymbol{X}_n$ be independent random variables with $\mathop{\bf E}[\boldsymbol{X}_i] = 0$ and $\mathop{\bf Var}[\boldsymbol{X}_i] = \sigma_i^2$, and assume $\sum_{i=1}^n \sigma_i^2 = 1$. Let $\boldsymbol{S} = \sum_{i=1}^n \boldsymbol{X}_i$ and let $\boldsymbol{Z} \sim \mathrm{N}(0,1)$ be a standard Gaussian. Then for all $u \in {\mathbb R}$, \[ |\mathop{\bf Pr}[\boldsymbol{S} \leq u] – \mathop{\bf Pr}[\boldsymbol{Z} \leq u]| \leq B \beta, \] where \[ \beta = \sum_{i=1}^n \|\boldsymbol{X}_i\|_3^3 \] and $B$ is a universal constant. (For definiteness, $B = .56$ is acceptable.)

Remark 12 If all of the $\boldsymbol{X}_i$’s satisfy $|\boldsymbol{X}_i| \leq \epsilon$ with probability $1$ then we can use the bound \[ \beta = \sum_{i=1}^n \mathop{\bf E}[|\boldsymbol{X}_i|^3] \leq \epsilon \cdot \sum_{i=1}^n \mathop{\bf E}[|\boldsymbol{X}_i|^2] = \epsilon \cdot \sum_{i=1}^n \sigma_i^2 = \epsilon. \]

The exercises contain additional observations about this theorem.

Our most frequent use of the Berry–Esseen Theorem will be in analyzing random sums \[ \boldsymbol{S} = \sum_{i=1}^n a_i {\boldsymbol{x}}_i, \] where ${\boldsymbol{x}} \sim \{-1,1\}^n$ and the constants $a_i \in {\mathbb R}$ are normalized so that $\sum_i a_i^2 = 1$. For majority, all of the $a_i$’s were equal to $\tfrac{1}{\sqrt{n}}$. But from Remark 12 we see that $\boldsymbol{S}$ is close in distribution to a standard Gaussian so long as each $|a_i|$ is small. For example, in the exercises you are asked to show the following:

Theorem 13 Let $a_1, \dots, a_n \in {\mathbb R}$ satisfy $\sum_i a_i^2 = 1$ and $|a_i| \leq \epsilon$ for all $i$. Then \[ \left|\mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[|\mathop{{\textstyle \sum}}_i a_i {\boldsymbol{x}}_i|] – \sqrt{2/\pi}\right| \leq C \epsilon, \] where $C$ is a universal constant.

Theorem 13 justifies \eqref{eqn:abs-first-moment-gaussian} with an error bound of $O(1/\sqrt{n})$, yielding the more precise estimate $\mathbf{I}[\mathrm{Maj}_n] = \sqrt{2/\pi}\sqrt{n} \pm O(1)$ (cf. Exercise 2.18 which gives an even better error bound).

Now let’s turn to the noise stability of majority. Theorem 2.44 stated the formula \begin{equation} \label{eqn:re-maj-stab} \lim_{n \to \infty} \mathbf{Stab}_\rho[\mathrm{Maj}_n] = \tfrac{2}{\pi} \arcsin \rho. \end{equation} Let’s now spend some time justifying this using the multidimensional CLT. (For complete details, see the exercises.) By definition, \begin{equation} \label{eqn:stab-maj-justify1} \mathbf{Stab}_\rho[\mathrm{Maj}_n] = \mathop{\bf E}_{\substack{({\boldsymbol{x}}, \boldsymbol{y}) \\ {\text{ $\rho$-correlated}}}}[\mathrm{Maj}_n({\boldsymbol{x}}) \cdot \mathrm{Maj}_n(\boldsymbol{y})] = \mathop{\bf E}_{{\substack{({\boldsymbol{x}}, \boldsymbol{y}) \\ {\text{ $\rho$-correlated}}}}}[\mathrm{sgn}(\mathop{{\textstyle \sum}}_i \tfrac{1}{\sqrt{n}} {\boldsymbol{x}}_i) \cdot \mathrm{sgn}(\mathop{{\textstyle \sum}}_i \tfrac{1}{\sqrt{n}} \boldsymbol{y}_i)]. \end{equation} For each $i \in [n]$ let’s stack $\tfrac{1}{\sqrt{n}} {\boldsymbol{x}}_i$ and $\tfrac{1}{\sqrt{n}} \boldsymbol{y}_i$ into a $2$-dimensional vector and then write \begin{equation} \label{eqn:Svec-cov} \vec{\boldsymbol{S}} = \sum_{i=1}^n \begin{bmatrix} \tfrac{1}{\sqrt{n}}{\boldsymbol{x}}_i \\ \tfrac{1}{\sqrt{n}}\boldsymbol{y}_i \end{bmatrix} \in {\mathbb R}^2. \end{equation} We are summing $n$ independent random vectors, so the multidimensional Central Limit Theorem tells us that the distribution of $\vec{\boldsymbol{S}}$ is close to that of a $2$-dimensional Gaussian $\vec{\boldsymbol{Z}}$ with the same mean and covariance matrix, namely \[ \vec{\boldsymbol{Z}} \sim \mathrm{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right). \] Continuing from \eqref{eqn:stab-maj-justify1}, \begin{align*} \mathbf{Stab}_\rho[\mathrm{Maj}_n] &= \mathop{\bf E}[\mathrm{sgn}(\vec{\boldsymbol{S}}_1)\cdot \mathrm{sgn}(\vec{\boldsymbol{S}}_2)] \\ &= \mathop{\bf Pr}[\mathrm{sgn}(\vec{\boldsymbol{S}}_1) = \mathrm{sgn}(\vec{\boldsymbol{S}}_2)] – \mathop{\bf Pr}[\mathrm{sgn}(\vec{\boldsymbol{S}}_1) \neq \mathrm{sgn}(\vec{\boldsymbol{S}}_2)] \\ &= 2\mathop{\bf Pr}[\mathrm{sgn}(\vec{\boldsymbol{S}}_1) = \mathrm{sgn}(\vec{\boldsymbol{S}}_2)] – 1 = 4\mathop{\bf Pr}[\vec{\boldsymbol{S}} \in Q_{++}] – 1, \end{align*} where $Q_{++}$ denotes the upper-right quadrant of ${\mathbb R}^2$ and the last step uses the symmetry $\mathop{\bf Pr}[\vec{\boldsymbol{S}} \in Q_{++}] = \mathop{\bf Pr}[\vec{\boldsymbol{S}} \in Q_{--}]$. The $2$-dimensional Central Limit Theorem lets us deduce \[ \lim_{n \to \infty} \mathop{\bf Pr}[\vec{\boldsymbol{S}} \in Q_{++}] = \mathop{\bf Pr}[\vec{\boldsymbol{Z}} \in Q_{++}]. \] So to justify the noise stability formula \eqref{eqn:re-maj-stab} for majority, it remains to verify \[ 4\mathop{\bf Pr}[\vec{\boldsymbol{Z}} \in Q_{++}] – 1 = \tfrac{2}{\pi} \arcsin \rho \quad\Leftrightarrow\quad \mathop{\bf Pr}[\vec{\boldsymbol{Z}} \in Q_{++}] = \tfrac14 + \tfrac{1}{2\pi} \arcsin \rho. \] And this in turn is an old identity known as Sheppard’s Formula (which you are asked to prove in the exercises).

Sheppard’s Formula Let $\boldsymbol{z}_1$, $\boldsymbol{z}_2$ be standard Gaussian random variables with correlation $\mathop{\bf E}[\boldsymbol{z}_1 \boldsymbol{z}_2] = \rho \in [-1,1]$. Then \[ \mathop{\bf Pr}[\boldsymbol{z}_1 > 0, \boldsymbol{z}_2 > 0] = \tfrac14 + \tfrac{1}{2\pi} \arcsin \rho. \]

This completes the justification of the formula $\mathbf{Stab}_\rho[\mathrm{Maj}_n] \to \tfrac{2}{\pi} \arcsin \rho$. You may have noticed that once we applied the $2$-dimensional Central Limit Theorem to \eqref{eqn:stab-maj-justify1}, the remainder of the derivation had nothing to do with majority. In fact, the same analysis works for any linear threshold function $\mathrm{sgn}(a_1 x_1 + \cdots + a_n x_n)$, the only difference being the “error term” arising from the CLT. As in Theorem 13, this error is small so long as no coefficient $a_i$ is too dominant:

Theorem 14 Let $f : \{-1,1\}^n \to \{-1,1\}$ be an unbiased LTF, $f(x) = \mathrm{sgn}(a_1 x_1 + \cdots + a_n x_n)$ with $\sum_i a_i^2 = 1$ and $|a_i| \leq \epsilon$ for all $i$. Then for any $\rho \in (-1,1)$, \[ \Bigl|\mathbf{Stab}_\rho[f] – \tfrac{2}{\pi} \arcsin \rho\Bigr| \leq O\Bigl(\tfrac{\epsilon}{\sqrt{1-\rho^2}}\Bigr). \]

You are asked to prove Theorem 14 in the exercises. In the particular case of $\mathrm{Maj}_n$ where $a_i = \tfrac{1}{\sqrt{n}}$ for all $i$ we can make a slightly stronger claim (again, see the exercises):

Theorem 15 For any $\rho \in [0,1)$, $\mathbf{Stab}_\rho[\mathrm{Maj}_n]$ is a decreasing function of $n$, with \[ \tfrac{2}{\pi} \arcsin \rho \leq \mathbf{Stab}_\rho[\mathrm{Maj}_n] \leq \tfrac{2}{\pi} \arcsin \rho + O\Bigl(\tfrac{1}{\sqrt{1-\rho^2}\sqrt{n}}\Bigr). \]

We end this section by mentioning another way in which the majority function is extremal — among all unbiased functions with small influences, it has (essentially) the largest noise stability:

Majority Is Stablest Theorem (simplified) Fix $\rho \in (0,1)$. Then for any $f : \{-1,1\}^n \to \{-1,1\}$ with $\mathop{\bf E}[f] = 0$ and $\mathbf{MaxInf}[f] \leq \tau$, \[ \mathbf{Stab}_\rho[f] \leq \tfrac{2}{\pi} \arcsin \rho + o_\tau(1). \]

For sufficiently small $\rho$, we’ll prove this in Section 4 of this chapter. The proof of the full Majority Is Stablest Theorem will have to wait until Chapter 13.

6 comments to §5.2: Majority, and the Central Limit Theorem

Leave a Reply

  

  

  

You can use most standard LaTeX, as well as these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>