§5.5: Highlight: Peres’s Theorem

Theorem 14 says that if $f$ is an unbiased linear threshold function $f(x) = \mathrm{sgn}(a_1 x_1 + \cdots + a_n x_n)$ in which all $a_i$’s are “small” then the noise stability $\mathbf{Stab}_\rho[f]$ is at least (roughly) $\frac{2}{\pi} \arcsin \rho$. Rephrasing in terms of noise sensitivity, this means $\mathbf{NS}_\delta[f]$ is at most (roughly) $\tfrac{2}{\pi} \sqrt{\delta} + O(\delta^{3/2})$ (see the statement of Theorem 2.44).

On the other hand, if some $a_i$ were particularly large then $f$ would be pushed in the direction of the dictator function $\chi_i$, which has $\mathbf{NS}_\delta[\chi_i] = \delta \ll \sqrt{\delta}$. This observation suggests that all unbiased LTFs $f$ should have $\mathbf{NS}_\delta[f] \leq O(\sqrt{\delta})$. The unbiasedness assumption also seems inessential, since biasing a function should tend to decrease its noise sensitivity.

Indeed the idea here is correct, as was shown by Peres in 1999:

Peres’s Theorem Let $f : \{-1,1\}^n \to \{-1,1\}$ be any linear threshold function. Then $\mathbf{NS}_\delta[f] \leq O(\sqrt{\delta})$.

Pleasantly, the proof is quite simple and uses no heavy tools like the Central Limit Theorem. Before getting to it, let’s remark on an application to learning theory.

Peres’s Theorem immediately gives that LTFs have their Fourier spectrum $\epsilon$-concentrated up to degree $O(1/\epsilon^2)$ (Proposition 3.3) and hence the class of LTFs is learnable from random examples with error $\epsilon$ in time $n^{O(1/\epsilon^2)}$ (Corollary 3.34). The latter result is not too impressive since it’s been long known that LTFs are learnable in time $\mathrm{poly}(n, 1/\epsilon)$ using linear programming. However the noise sensitivity approach is much more flexible. Consider the concept class \[ \mathcal{C} = \{h = g(f_1, \dots, f_s) \mid \text{$f_1, \dots, f_s : \{-1,1\}^n \to \{-1,1\}$ are LTFs}\}. \] For each $h : \{-1,1\}^n \to \{-1,1\}$ in $\mathcal{C}$, Peres’s Theorem and a union bound (Exercise 2.38) imply that $\mathbf{NS}_\delta[h] \leq O(s \sqrt{\delta})$. Thus from Corollary 3.34 we get that the class $\mathcal{C}$ is learnable in time $n^{O(s^2/\epsilon^2)}$. This is the only known way of showing even that an AND of two LTFs is learnable with error $.01$ in time $\mathrm{poly}(n)$.

The trick for proving Peres’s Theorem is to employ a fairly general technique for bounding noise sensitivity using average sensitivity (total influence):

Theorem 31 Let $\delta \in (0,1/2]$ and let $A : {\mathbb N}^+ \to {\mathbb R}$. Let $\mathcal{C}$ be a class of boolean-valued functions closed under negation and identification of input variables. Suppose that each $f \in \mathcal{C}$ with domain $\{-1,1\}^n$ has $\mathbf{I}[f] \leq A(n)$. Then each $f \in \mathcal{C}$ has $\mathbf{NS}_\delta[f] \leq \frac{1}{m}A(m)$, where $m = \lfloor 1/\delta \rfloor$.

Proof: Fix any $f : \{-1,1\}^n \to \{-1,1\}$ from $\mathcal{C}$. Since noise sensitivity is an increasing function of the noise parameter (see the discussion surrounding Proposition 2.50) we may replace $\delta$ by $1/m$. Thus our task is to upper-bound $\mathbf{NS}_{1/m}[f] = \mathop{\bf Pr}[f({\boldsymbol{x}}) \neq f(\boldsymbol{y})]$ where ${\boldsymbol{x}} \sim \{-1,1\}^n$ is uniformly random and $\boldsymbol{y} \in \{-1,1\}^n$ is formed from ${\boldsymbol{x}}$ by negating each bit independently with probability $1/m$. The rough idea of the proof is that this is equivalent to randomly partitioning ${\boldsymbol{x}}$’s bits into $m$ parts and then negating a randomly chosen part.

More precisely, let $z \in \{-1,1\}^n$ and let $\pi : [n] \to [m]$ be a partition of $[n]$ into $m$ parts. Define \[ g_{z, \pi} : \{-1,1\}^m \to \{-1,1\}, \quad g_{z, \pi}(w) = f(z \circ w^{\pi}), \] where $\circ$ denotes entry-wise multiplication and $w^{\pi} = (w_{\pi(1)}, \dots, w_{\pi(n)}) \in \{-1,1\}^n$. Since $g_{z, \pi}$ is derived from $f$ by negating and identifying input variables it follows that $g_{z,\pi} \in \mathcal{C}$. So by assumption $g_{z,\pi}$ has total influence $\mathbf{I}[g_{z,\pi}] \leq A(m)$ and hence average influence $\pmb{\mathcal E}[g_{z,\pi}] \leq \frac{1}{m} A(m)$ (see Exercise 2.37).

Now suppose $\boldsymbol{z} \sim \{-1,1\}^n$ and $\boldsymbol{\pi} : [n] \to [m]$ are chosen uniformly at random. We certainly have \[ \mathop{\bf E}_{\boldsymbol{z}, \boldsymbol{\pi}}[\pmb{\mathcal E}[g_{\boldsymbol{z},\boldsymbol{\pi}}]] \leq \tfrac{1}{m}A(m). \] To complete the proof we will show that the left-hand side above is precisely $\mathbf{NS}_{1/m}[f]$. Recall that in the experiment for average influence $\pmb{\mathcal E}[g]$ we choose $\boldsymbol{w} \sim \{-1,1\}^m$ and $\boldsymbol{j} \sim [m]$ uniformly at random and check if $g(\boldsymbol{w}) \neq g(\boldsymbol{w}^{\oplus \boldsymbol{j}})$. Thus \[ \mathop{\bf E}_{\boldsymbol{z}, \boldsymbol{\pi}}[\pmb{\mathcal E}[g_{\boldsymbol{z},\boldsymbol{\pi}}]] = \mathop{\bf Pr}_{\boldsymbol{z}, \boldsymbol{\pi}, \boldsymbol{w}, \boldsymbol{j}}[g_{\boldsymbol{z}, \boldsymbol{\pi}}(\boldsymbol{w}) \neq g_{\boldsymbol{z}, \boldsymbol{\pi}}(\boldsymbol{w}^{\oplus \boldsymbol{j}})] = \mathop{\bf Pr}_{\boldsymbol{w}, \boldsymbol{\pi}, \boldsymbol{j}, \boldsymbol{z}}[f(\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}) \neq f(\boldsymbol{z} \circ (\boldsymbol{w}^{\oplus \boldsymbol{j}})^{\boldsymbol{\pi}})]. \] It is not hard to see that the joint distribution of $\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}$, $\boldsymbol{z} \circ (\boldsymbol{w}^{\oplus \boldsymbol{j}})^{\boldsymbol{\pi}}$ is the same as that of ${\boldsymbol{x}}$, $\boldsymbol{y}$. To be precise, define $\boldsymbol{J} = \boldsymbol{\pi}^{-1}(\boldsymbol{j})$, distributed as a random subset of $[n]$ in which each coordinate is included with probability $1/m$; and, define $\boldsymbol{\lambda} \in \{-1,1\}^n$ by $\boldsymbol{\lambda}_i = -1$ if and only if $i \in \boldsymbol{J}$. Then \[ \mathop{\bf Pr}_{\boldsymbol{w}, \boldsymbol{\pi}, \boldsymbol{j}, \boldsymbol{z}}[f(\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}) \neq f(\boldsymbol{z} \circ (\boldsymbol{w}^{\oplus \boldsymbol{j}})^{\boldsymbol{\pi}})] = \mathop{\bf Pr}_{\boldsymbol{w}, \boldsymbol{\pi}, \boldsymbol{j}, \boldsymbol{z}}[f(\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}) \neq f(\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}} \circ \boldsymbol{\lambda})]. \] But for every outcome of $\boldsymbol{w}$, $\boldsymbol{\pi}$, $\boldsymbol{j}$ (and hence $\boldsymbol{J}$, $\boldsymbol{\lambda}$), we may replace $\boldsymbol{z}$ with $\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}$ since they have the same distribution, namely uniform on $\{-1,1\}^n$. Then the above becomes \[ \mathop{\bf Pr}_{\boldsymbol{w}, \boldsymbol{\pi}, \boldsymbol{j}, \boldsymbol{z}}[f(\boldsymbol{z}) \neq f(\boldsymbol{z} \circ \boldsymbol{\lambda})] = \mathbf{NS}_{1/m}[f], \] as claimed. $\Box$

Peres’s Theorem is now a simple corollary of Theorem 31.

Proof of Peres’s Theorem: Let $\mathcal{C}$ be the class of all linear threshold functions. This class is indeed closed under negating and identifying variables. Since each linear threshold function on $m$ bits is unate (i.e., monotone up to negation of some input coordinates, see Exercises 2.5, 2.6), its total influence is at most $\sqrt{m}$ (see Exercise 2.19). Applying Theorem 31 we get that for any LTF $f$ and any $\delta \in (0,1/2]$, \begin{align*} \mathbf{NS}_\delta[f] \leq \tfrac{1}{m}\sqrt{m} &= 1/\sqrt{m} \qquad \text{(for $m = \lfloor 1/\delta \rfloor$)} \\ &\leq O(\sqrt{\delta}). \qquad \qquad \qquad \qquad \qquad\Box \end{align*}

Remark 32 Our proof of Peres’s Theorem attained the upper-bound $\sqrt{1/\lfloor 1/\delta \rfloor}$. This is at most $\sqrt{3/2} \sqrt{\delta}$ for all $\delta \in (0,1/2]$ and it’s also $\sqrt{\delta} + O(\delta^{3/2})$ for small $\delta$. To further improve the constant we can use Theorem 2.32 in place of Exercise 2.19; it implies that all unate $m$-bit functions have total influence at most $\sqrt{2/\pi}\sqrt{m} + O(m^{-1/2})$. This lets us obtain the bound $\mathbf{NS}_\delta[f] \leq \sqrt{2/\pi} \sqrt{\delta} + O(\delta^{3/2})$ for all LTF $f$.

Recall from Theorem 2.44 that $\mathbf{NS}_\delta[\mathrm{Maj}_n] \sim \tfrac{2}{\pi} \sqrt{\delta}$ for large $n$. Thus the constant $\sqrt{2/\pi}$ in the bound from Remark 32 is fairly close to optimal. It seems quite likely that majority’s $\tfrac{2}{\pi}$ is the correct constant here. There is still slack in Peres’s proof because the random functions $g_{\boldsymbol{z}, \boldsymbol{\pi}}$ arising in Theorem 31 are unlikely to be majorities, even if $f$ is. The most elegant possible result in this direction would be to prove the following conjecture of Benjamini, Kalai, and Schramm:

Majority Is Least Stable Conjecture Let $f : \{-1,1\}^n \to \{-1,1\}$ be a linear threshold function, $n$ odd. Then for all $\rho \in [0,1]$, $\mathbf{Stab}_\rho[f] \geq \mathbf{Stab}_\rho[\mathrm{Maj}_n]$.

(This is a precise statement about majority’s noise stability within the class of LTFs; the Majority Is Stablest Theorem refers to its noise stability within the class of small-influence functions.) A very important problem in this area is to extend Peres’s Theorem to polynomial threshold functions. Let \[ \mathrm{PTF}_{n,k} = \{f : \{-1,1\}^n \to \{-1,1\} \mid f \text{ is a PTF of degree at most $k$}\}. \] The goal here is to prove that $\mathbf{NS}_\delta[f] \leq O_k(1) \sqrt{\delta}$ for all $f \in \mathrm{PTF}_{n,k}$. This is currently not known even for $k = 2$. Since $\bigcup_n \mathrm{PTF}_{n,k}$ is closed under negating and identifying variables, a natural approach is to use Theorem 31; it would suffice to show that $\mathbf{I}[f] \leq O_k(1) \sqrt{n}$ for all $f \in \mathrm{PTF}_{n,k}$. In fact, such a total influence bound is also necessary; see the exercises. The desired total influence bound here is a very old conjecture of Gotsman and Linial:

Gotsman–Linial Conjecture Let $f \in \mathrm{PTF}_{n,k}$. Then $\mathbf{I}[f] \leq O_k(1) \sqrt{n}$. More strongly, $\mathbf{I}[f] \leq O(k) \sqrt{n}$. Most strongly, the $f \in \mathrm{PTF}_{n,k}$ of maximal total influence is the symmetric one $f(x) = \mathrm{sgn}(p(x_1 + \cdots + x_n))$, where $p$ is a degree-$k$ univariate polynomial which alternates sign on the $k+1$ values of $x_1 + \cdots + x_n$ closest to $0$.

The strongest form of the Gotsman–Linial Conjecture for $k = 1$ is true by Theorem 2.32. The best results towards the conjecture for higher $k$ are the upper bounds $2n^{1-1/2^k}$ (good for small $k$; see the exercises) and $2^{O(k)} n^{1 – 1/(Ck)}$ (better for large $k$), where $C$ is a fairly small universal constant. Note that the latter implies $\mathbf{NS}_{\delta}[f] \leq 2^{O(k)} \delta^{1/(Ck)}$ for all $f \in \mathrm{PTF}_{n,k}$, a bound which is at least independent of $n$.

5 comments to §5.5: Highlight: Peres’s Theorem

Leave a Reply to Noam Lifshitz Cancel reply




You can use most standard LaTeX, as well as these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>