<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Analysis of Boolean Functions</title>
	<atom:link href="http://www.contrib.andrew.cmu.edu/%7Eryanod/index.php?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.contrib.andrew.cmu.edu/~ryanod</link>
	<description>Ryan O&#039;Donnell  (ryan@analysisofbooleanfunctions.org)</description>
	<lastBuildDate>Mon, 14 May 2012 19:08:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Hiatus</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=977</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=977#comments</comments>
		<pubDate>Mon, 14 May 2012 19:07:25 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=977</guid>
		<description><![CDATA[<p>This blog will go on hiatus for the summer (except for responding to comments). See you in September.</p> ]]></description>
			<content:encoded><![CDATA[<p>This blog will go on hiatus for the summer (except for responding to comments).  See you in September.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=977</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 5 notes</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=968</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=968#comments</comments>
		<pubDate>Mon, 14 May 2012 19:06:46 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[Chapter 5: Majority and threshold functions]]></category>
		<category><![CDATA[Chapter notes]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=968</guid>
		<description><![CDATA[<p>Chow&#8217;s Theorem was proved by independently by Chow [Cho61] and by Tannenbaum [Tan61] in 1961; see also [Elg61]. The generalization to PTFs (Theorem 6) is due to Bruck [Bru90], as is Theorem 8 and Exercise 13. Theorems 2 and 7 are from [GL94] and may be called the Gotsman&#8211;Linial Theorems; this work also contains the [...]]]></description>
			<content:encoded><![CDATA[<p>Chow&#8217;s Theorem was proved by independently by Chow <b>[Cho61]</b> and by Tannenbaum <b>[Tan61]</b> in 1961; see also <b>[Elg61]</b>. The generalization to PTFs (Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#thmptf-chow" target="_blank">6</a>) is due to Bruck <b>[Bru90]</b>, as is Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#thmbruck" target="_blank">8</a> and Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=917#exbruck-separation" target="_blank">13</a>. Theorems <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#thmgotsman-linial1" target="_blank">2</a> and <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#thmgotsman-linial2" target="_blank">7</a> are from <b>[GL94]</b> and may be called the Gotsman&#8211;Linial Theorems; this work also contains the Gotsman&#8211;Linial Conjecture and Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=917#exno-beat-gl" target="_blank">11</a>. The Conjecture <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#conjltf-weight-1" target="_blank">Conjecture</a> following Theorem 1 should be considered folklore. Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#corbruck-smolensky" target="_blank">11</a> was proved by Bruck and Smolensky <b>[BS92]</b>; they also essentially proved Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#thmbruck-smolensky" target="_blank">10</a> (but see <b>[SB91]</b>). Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=917#exkrause-pudlak" target="_blank">13</a> is usually credited to Krause and Pudl&aacute;k <b>[KP97]</b>. The upper bound in Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=917#exchow-counting" target="_blank">4</a> is asymptotically sharp <b>[Zue89]</b>. Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#exPTF-extremal1" target="_blank">15</a> is from <b>[OS08a]</b>.</p>
<p>
Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=368#thmmaj-maximizes-deg-1-sum" target="_blank">2.32</a> and Proposition <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=439#propweight-1-same" target="_blank">2.57</a>, discussed in Section <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#secmajority" target="_blank">2</a>, were essentially proved by Titsworth in 1962 <b>[Tit62]</b> (see also <b>[Tit63]</b>). More precisely, Titsworth solved a version of the problem from Exercise <a href="#extitsworth">7</a>. His motivation was in fact the construction of &#8220;interplanetary ranging systems&#8221; for measuring deep space distances; e.g. the distance from Earth to Venus. The connection between ranging systems and boolean functions was suggested by his advisor, Golomb. Titsworth <b>[Tit62]</b> was also the first to compute the Fourier expansion of $\mathrm{Maj}_n$. His approach involved generating functions and contour integration. Other approaches have used special properties of binomial coefficients <b>[Bra87]</b> or of Kravchuk polynomials <b>[Kal02]</b>. The asymptotics of $\mathbf{W}^{k}[\mathrm{Maj}_n]$ described in Section <a href="#secmaj-coefficients">3</a> may have first appeared in <b>[Kal02]</b>, with the error bounds being from <b>[O'D03]</b>. Kravchuk polynomials were introduced in <b>[Kra29]</b>.</p>
<p>
The Berry&#8211;Esseen Theorem is due independently to Berry <b>[Ber41]</b> and Esseen <b>[Ess42]</b>. Shevtsova <b>[She10a]</b> has the record for the smallest known constant $B$ that works therein: $.56$, as mentioned in our theorem statement. The nonuniform version described in Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=934#more-934#exl1-be" target="_blank">32</a> is due to Bikelis <b>[Bik66]</b>. The multidimensional version Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=934#thmmultidim-BE" target="_blank">33</a> stated in Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=934#exltf-stab-error" target="_blank">34</a> is due to Bentkus <b>[Ben04]</b>. Sheppard proved his formula in 1899 <b>[She99]</b>. The results of Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#thmmaj-stab-precise" target="_blank">15</a> may have appeared first in <b>[O'D04,O'D03]</b>.</p>
<p>
The Level-1 Inequality should probably be considered folklore; it was perhaps first published in <b>[Tal96]</b> and we have followed his proof. The first half of the $\frac{2}{\pi}$ Theorem is from <b>[KKMO07]</b>; the second half is from <b>[MORS10]</b>. The previous best closeness result for the FKN Theorem had $\delta/2$ in place of our $\delta/4$. That result was from <b>[FKN02]</b>; their proof (like ours) relies on having a separate proof of closeness $O(\delta)$. Kindler and Safra <b>[KS02]</b> (see also <b>[Kin02]</b>) gave a self-contained proof of the $\delta/2$ bound relying only on the Hoeffding bound. The result of Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=934#exrefined-level-1" target="_blank">6</a> is from <b>[KKMO07]</b>; Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=934#exrocco-problem" target="_blank">40</a> was suggested by Servedio.</p>
<p>
Peres&#8217;s Theorem was published in 2004 <b>[Per04]</b> but was mentioned by Benjamini, Kalai, and Schramm <b>[BKS99]</b>, who proved the worse upper bound $O(\delta^{1/4})$. The proof we presented is a simplification due to Gopalan, and incorporates an idea of <b>[DHK+10,HKM10a]</b>. These latter two works are responsible for the best known bounds in the direction of the Gotsman&#8211;Linial Conjecture, mentioned at the end of Section <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=898#secperes" target="_blank">5</a> (in particular, for Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=934#exgl-weak" target="_blank">43</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=968</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 5 exercises, continued</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=934</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=934#comments</comments>
		<pubDate>Fri, 04 May 2012 14:39:34 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[Chapter 5: Majority and threshold functions]]></category>
		<category><![CDATA[Exercises]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=934</guid>
		<description><![CDATA[<p></p> Complete the proof of Theorem 19 by showing that $(1-\tfrac{k+1}{n} +\tfrac{k}{n^2})^{-1/2} \leq 1+2k/n$ for all $1 \leq k \leq n/2$. Using just the facts that $\mathbf{Stab}_\rho[\mathrm{Maj}_n] \to \frac{2}{\pi} \arcsin \rho$ for all $\rho \in [-1,1]$ and that $\mathbf{Stab}_\rho[\mathrm{Maj}_n] = \sum_{k \geq 0} \mathbf{W}^{k}[\mathrm{Maj}_n] \rho^k$, deduce that $\lim_{n \to \infty} \mathbf{W}^{k}[\mathrm{Maj}_n] \to [\rho^k](\frac{2}{\pi} \arcsin \rho)$ [...]]]></description>
			<content:encoded><![CDATA[<p><span id="more-934"></span></p>
<ol start="25">
<li> <a name="exmaj-annoying"></a> Complete the proof of Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=877#thmmaj-asymptotic-weight" target="_blank">19</a> by showing that $(1-\tfrac{k+1}{n} +\tfrac{k}{n^2})^{-1/2} \leq 1+2k/n$ for all $1 \leq k \leq n/2$.
<li> <a name="exmaj-stab-series"></a> Using just the facts that $\mathbf{Stab}_\rho[\mathrm{Maj}_n] \to \frac{2}{\pi} \arcsin \rho$ for all $\rho \in [-1,1]$ and that $\mathbf{Stab}_\rho[\mathrm{Maj}_n] = \sum_{k \geq 0} \mathbf{W}^{k}[\mathrm{Maj}_n] \rho^k$, deduce that $\lim_{n \to \infty} \mathbf{W}^{k}[\mathrm{Maj}_n] \to [\rho^k](\frac{2}{\pi} \arcsin \rho)$ for all $k \in {\mathbb N}$. (Hint: by induction on $k$, always taking $\rho$ &#8220;small enough&#8221;.)
<li> <a name="exL1-maj"></a> </p>
<ol type="a">
<li> For $0 \leq j \leq m$ integers, show that $\hat{\lVert} \mathrm{Maj}_{2m+1}^{=2j+1} \hat{\rVert}_1 = \binom{m}{j} \frac{1}{2j+1} \cdot \frac{2m+1}{2^{2m}}\binom{2m}{m}$.
<li> Deduce that $\hat{\lVert} \mathrm{Maj}_{2m+1} \hat{\rVert}_1 = \mathop{\bf E}\left[\frac{1}{2\boldsymbol{X}+1}\right] \cdot \frac{2m+1}{2^{m}}\binom{2m}{m}$, where $\boldsymbol{X} \sim \mathrm{Binomial}(m,1/2)$.
<li> Deduce that $\hat{\lVert} \mathrm{Maj}_n \hat{\rVert}_1 \sim \frac{2}{\sqrt{\pi}} \frac{1}{\sqrt{n}} 2^{n/2}$.
</ol>
<li> <a name="exmaj-asympt-asympt"></a>
<ol type="a">
<li> Show that for each odd $k \in {\mathbb N}$, \[ \left(\tfrac{2}{\pi}\right)^{3/2} k^{-3/2} \leq [\rho^k] (\tfrac{2}{\pi} \arcsin \rho) \leq \left(\tfrac{2}{\pi}\right)^{3/2} k^{-3/2}(1+O(1/k)). \] (Hint: Stirling&#8217;s approximation.)
<li> Prove Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=877#cormaj-asympt-asympt" target="_blank">20</a>. (Hint: for the second statement you&#8217;ll need to approximate the sum $\sum_{\text{odd } j &gt; k} \left(\tfrac{2}{\pi}\right)^{3/2} j^{-3/2}$ by an integral.)
</ol>
<li> <a name="exkravchuk"></a> For integer $0 \leq k \leq n$, define $\mathcal{K}_k : \{-1,1\}^n \to {\mathbb R}$ by $\mathcal{K}_k(x) = \sum_{|S| = k} x^S$. Since $\mathcal{K}_k$ is symmetric, the value $\mathcal{K}_k(x)$ depends only on the number $z$ of $-1$&#8217;s in $x$; or equivalently, on $\sum_{i=1}^n x_i$. Thus we may define $K_k : \{0, 1, \dots, n\} \to {\mathbb R}$ by $K_k(z) = \mathcal{K}_k(x)$ for any $x$ with $\sum_i x_i = n &#8211; 2z$.
<ol type="a">
<li> Show that $K_k(z)$ can be expressed as a degree-$k$ polynomial in $z$. It is called the <em>Kravchuk (<em>or</em> Krawtchouk) polynomial</em> of degree $k$. (The dependence on $n$ is usually implicit.)
<li> Show that $\sum_{k=0}^n \mathcal{K}_k(x) = 2^n \cdot 1_{(1, \dots, 1)}(x)$.
<li> Show for $\rho \in [-1,1]$ that $\sum_{k=0}^n \mathcal{K}_k(x)\rho^k = 2^n\mathop{\bf Pr}[\boldsymbol{y} = (1, \dots, 1)]$, where $\boldsymbol{y} = N_\rho(x)$.
<li> Deduce the generating function identity $K_k(z) = [\rho^k]((1-\rho)^z(1+\rho)^{n-z})$.
</ol>
<li> <a name="excube-weight1"></a> Prove Proposition <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=885#propcube-weight1" target="_blank">21</a>.
<li> <a name="exball-weight1"></a> Prove Proposition <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=885#propball-weight1" target="_blank">22</a> using the Central Limit Theorem. (Hint for $\mathbf{W}^{1}[f_n]$: use symmetry to show it equals the square of $\mathop{\bf E}[f_n({\boldsymbol{x}})\mathop{{\textstyle \sum}} \frac{1}{\sqrt{n}} {\boldsymbol{x}}_i]$.)
<li> <a name="exl1-be"></a> Consider the setting of Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#thml1-be" target="_blank">13</a>. Let $\boldsymbol{S} = \sum_i a_i {\boldsymbol{x}}_i$ where ${\boldsymbol{x}} \sim \{-1,1\}^n$, and let $\boldsymbol{Z} \sim \mathrm{N}(0,1)$. </p>
<ol type="a">
<li> Show that $\mathop{\bf Pr}[|\boldsymbol{S}| \geq t]$, $\mathop{\bf Pr}[|\boldsymbol{Z}| \geq t] \leq 2\exp(-t^2/2)$ for all $t \geq 0$.
<li> Recalling $\mathop{\bf E}[|\boldsymbol{Y}|] = \int_0^\infty \mathop{\bf Pr}[|\boldsymbol{Y}| \geq t]\,dt$ for any random variable $\boldsymbol{Y}$, use the Berry&#8211;Esseen Theorem (and Remark <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#remsimpler-BE" target="_blank">12</a>, Exercise 16) to show \[ \Bigl|\mathop{\bf E}[|\boldsymbol{S}|] &#8211; \mathop{\bf E}[|\boldsymbol{Z}|]\Bigr| \leq O(\epsilon T + \exp(-T^2/2)) \] for any $T \geq 1$.
<li> Deduce $|\mathop{\bf E}[|\boldsymbol{S}|] &#8211; \sqrt{2/\pi}| \leq O(\epsilon\sqrt{\log(1/\epsilon)})$.
<li> Improve $O(\epsilon\sqrt{\log(1/\epsilon)})$ to the bound $O(\epsilon)$ stated in Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#thml1-be" target="_blank">13</a> by using the <em>nonuniform Berry&#8211;Esseen Theorem</em>, which states that the bound $B\beta$ in the Berry&#8211;Esseen Theorem can be improved to $C\beta\cdot\frac{1}{1+|u|^3}$ for some constant $C$.
</ol>
<li> Consider the sequence of LTFs defined in Proposition <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=885#propball-weight1" target="_blank">22</a>. Show that \[ \lim_{n \to \infty} \mathbf{Stab}_\rho[f_n] = \Lambda_\rho(\mu). \] Here $\mu = \overline{\Phi}(t)$ and $\Lambda_\rho(\mu)$ is the <em>Gaussian quadrant probability</em> defined by $\Lambda_\rho(\mu) = \mathop{\bf Pr}[\boldsymbol{z}_1 &gt; t, \boldsymbol{z}_2 &gt; t]$, where $\boldsymbol{z}_1, \boldsymbol{z}_2$ are standard Gaussians with correlation $\mathop{\bf E}[\boldsymbol{z}_1\boldsymbol{z}_2] = \rho$.
<li> <a name="exltf-stab-error"></a> 																												 In this exercise you will complete the justification of Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#thmltf-stab-error" target="_blank">14</a> using the following multidimensional Berry-Esseen Theorem: </p>
<blockquote><p><b>Theorem 33</b> <a name="thmmultidim-BE"></a> Let $\boldsymbol{X}_1, \dots, \boldsymbol{X}_n$ be independent ${\mathbb R}^d$-valued random vectors, each having mean zero. Write $\boldsymbol{S} = \sum_{i=1}^n \boldsymbol{X}_i$ and assume $\Sigma = \mathop{\bf Cov}[\boldsymbol{S}]$ is invertible. Let $\boldsymbol{Z} \sim \mathrm{N}(0,\Sigma)$ be a $d$-dimensional Gaussian with the same mean and covariance matrix as $\boldsymbol{S}$. Then for all convex sets $U \subseteq {\mathbb R}$, \[ |\mathop{\bf Pr}[\boldsymbol{S} \in U] &#8211; \mathop{\bf Pr}[\boldsymbol{Z} \in U]| \leq C d^{1/4} \beta, \] where $C$ is a universal constant, $\beta = \sum_{i=1}^n \mathop{\bf E}[\|\Sigma^{-1/2} \boldsymbol{X}_i\|_2^3]$, and $\|\cdot\|_2$ denotes the Euclidean norm on ${\mathbb R}^d$. </p></blockquote>
<p><ol type="a">
<li> Let $\Sigma = \begin{bmatrix} 1 &#038; \rho \\ \rho &#038; 1 \end{bmatrix}$ where $\rho \in (-1,1)$. Show that \[ \Sigma^{-1} = \begin{bmatrix} 1 &#038; -\rho \\ 0 &#038; 1 \end{bmatrix}\begin{bmatrix} 1 &#038; 0 \\ 0 &#038; (1 -\rho^2)^{-1} \end{bmatrix} \begin{bmatrix} 1 &#038; 0 \\ -\rho &#038; 1 \end{bmatrix}. \]
<li> Compute $y^\top \Sigma^{-1} y$ for $y = \begin{bmatrix} \pm a \\ \pm a \end{bmatrix} \in {\mathbb R}^2$.
<li> Complete the proof of Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#thmltf-stab-error" target="_blank">14</a>.
</ol>
<li> <a name="exfkn-optimality"></a> Show that Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=885#thmFKN-improvement" target="_blank">30</a> is essentially optimal by exhibiting functions $f : \{-1,1\}^n \to \{-1,1\}$ with $\widehat{f}(1) = 1-\delta/2$ and $\mathbf{W}^{1}[f] \geq 1 &#8211; \delta + \Omega(\delta^2 \log(1/\delta))$, for a sequence of $\delta$ tending to $0$.
<li> <a name="exlevel-1-quick"></a> Prove Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=885#corlevel-1-quick" target="_blank">29</a>.
<li> <a name="exFKN-improvement"></a> Fill in the details of the proof of Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=885#thmFKN-improvement" target="_blank">30</a>.
<li> <a name="exgl-equiv"></a> From Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=898#thmns-to-as" target="_blank">31</a> we know that the statement &#8220;$\mathbf{I}[f] \leq O_k(1) \sqrt{n}$ for all $f \in \mathrm{PTF}_n$&#8221; implies the statement &#8220;$\mathbf{NS}_{\delta}[f] \leq O_k(1) \sqrt{\delta}$ for all $f \in \bigcup_n \mathrm{PTF}_{n,k}$&#8221;. Show also the converse implication. (Hint: Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=494#exaverage-influence" target="_blank">2.37</a>.)
<li> Estimate carefully the asymptotics of $\mathbf{I}[f]$, where $f \in \mathrm{PTF}_{n,k}$ is as in the strongest form of the Gotsman&#8211;Linial Conjecture.
<li> <a name="exrocco-problem"></a> Let $A, B \subseteq \{-1,1\}^n$ have cardinality $\alpha 2^n$ each, $\alpha \leq 1/2$. Thinking of $\{-1,1\}^n \subset {\mathbb R}^n$, let $\mu_A, \mu_B \in {\mathbb R}^n$ be the centers of mass of $A$, $B$. Show that these centers are close in Euclidean distance: $\|\mu_A &#8211; \mu_B\|_2 \leq O(\sqrt{\log(1/\alpha)})$.
<li> Show that the Gaussian isoperimetric function satisfies $\mathcal{U} \cdot \mathcal{U}\ \prime\prime = -1$ on $(0,1)$. Deduce that $\mathcal{U}$ is concave.
<li> <a name="exrefined-level-1"></a> Fix $\alpha \in (0, 1/2)$. Let $f : \{-1,1\}^n \to [-1,1]$ satisfy $\mathop{\bf E}[|f|] \leq \alpha$ and $|\widehat{f}(i)| \leq \epsilon$ for all $i \in [n]$. Show that $\mathbf{W}^{1}[f] \leq \mathcal{U}(\alpha) + C \epsilon$, where $\mathcal{U}$ is the Gaussian isoperimetric function and where the constant $C$ may depend on $\alpha$. (Hint: you will need the nonuniform Berry&#8211;Esseen Theorem from Exercise 32.)
<li> <a name="exgl-weak"></a> In this exercise you will show by induction on $k$ that $\mathbf{Inf}[f] \leq 2 n^{1-1/2^k}$ for all $f : \{-1,1\}^n \to \{-1,1\}$ which are degree-$k$ PTFs. The $k = 0$ case is trivial. So for $k &gt; 0$, suppose $f = \mathrm{sgn}(p)$ where $p : \{-1,1\}^n \to {\mathbb R}$ is a degree-$k$ polynomial which is never $0$. </p>
<ol type="a">
<li> Show for $i \in [n]$ that $\mathop{\bf E}[f({\boldsymbol{x}}) {\boldsymbol{x}}_i \mathrm{sgn}(\mathrm{D}_i p({\boldsymbol{x}}))] = \mathbf{Inf}_i[f]$. (Hint: first use the decomposition $f = x_i \mathrm{D}_i f + \mathrm{E}_i f$ to reach $\mathop{\bf E}[\mathrm{D}_i f \cdot \mathrm{sgn}(\mathrm{D}_i p)]$; then show that $\mathrm{D}_if = \mathrm{sgn}(\mathrm{D}_i p)$ whenever $\mathrm{D}_i f \neq 0$.)
<li> Conclude that $\mathbf{I}[f] \leq \mathop{\bf E}[|\sum_i {\boldsymbol{x}}_i \mathrm{sgn}(\mathrm{D}_i p({\boldsymbol{x}}))|]$. Remark: when $k = 2$ and thus each $\mathrm{sgn}(\mathrm{D}_i p)$ is an LTF, it is conjectured that this bound is still $O(\sqrt{n})$.
<li> Apply Cauchy&#8211;Schwarz and deduce \[ \mathbf{I}[f] \leq \sqrt{n + \sum_{i \neq j} \mathop{\bf E}[{\boldsymbol{x}}_i {\boldsymbol{x}}_j \mathrm{sgn}(\mathrm{D}_i p({\boldsymbol{x}})) \mathrm{sgn}(\mathrm{D}_j p({\boldsymbol{x}}))]}. \]
<li> Use Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=465#exxixj" target="_blank">2.15</a> and the AM-GM inequality to obtain $\mathbf{I}[f] \leq \sqrt{n + \sum_i \mathbf{I}[\mathrm{sgn}(\mathrm{D}_i p)]}$.
<li> Complete the induction.
</ol>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=934</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Various notes</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=940</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=940#comments</comments>
		<pubDate>Thu, 03 May 2012 03:30:18 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=940</guid>
		<description><![CDATA[<p>Russell Impagliazzo, Cris Moore, and Alex Russell just posted their new proof of the sharp form of the &#8220;Level 1 Inequality&#8221; (AKA Talagrand&#8217;s Lemma, AKA Chang&#8217;s Lemma). It&#8217;s completely beautiful, and basically 3 lines long. What&#8217;s doubly cool is that they pretty much came up with it on the spot during a lecture on the [...]]]></description>
			<content:encoded><![CDATA[<p>Russell Impagliazzo, Cris Moore, and Alex Russell just posted their <a href ="http://arxiv.org/abs/1205.0263">new proof</a> of the sharp form of the &#8220;<a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=885">Level 1 Inequality</a>&#8221; (AKA Talagrand&#8217;s Lemma, AKA Chang&#8217;s Lemma).  It&#8217;s completely beautiful, and basically 3 lines long.  What&#8217;s doubly cool is that they pretty much came up with it on the spot during a lecture on the topic at <a href = "http://www.cs.mcgill.ca/~denis/barbados.html">Denis Th&eacute;rien&#8217;s 2012 Barbados workshop</a>. </p>
<p>Speaking of which, Li-Yang Tan heroically scribed those Barbados lectures and did a terrific job.  You can find the resulting <a href = "http://arxiv.org/abs/1205.0314">notes on analysis of boolean functions here</a>.</p>
<p>Finally, the <a href = "http://arxiv.org/abs/1204.6447">open problems in analysis of boolean functions</a>, compiled at the Simons Symposium, have been arXived.   Daniel Kane even solved one of them in the meantime!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=940</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Chapter 5 exercises</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=917</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=917#comments</comments>
		<pubDate>Sat, 28 Apr 2012 21:11:54 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[Chapter 5: Majority and threshold functions]]></category>
		<category><![CDATA[Exercises]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=917</guid>
		<description><![CDATA[<p></p> </p> Suppose $f : \{-1,1\}^n \to \{-1,1\}$ is an LTF. Show that it can be expressed as $f(x) = \mathrm{sgn}(a_0 + a_1 x_1 + \cdots a_n x_n)$ where the $a_i$&#8217;s are integers. (Hint: first obtain rational $a_i$&#8217;s by a perturbation.) Show also that a degree-$d$ PTF has a representation in which all of the [...]]]></description>
			<content:encoded><![CDATA[<p><span id="more-917"></span></p>
<ol>
<li> <a name="exLTF-integer"></a> </p>
<ol type="a">
<li> Suppose $f : \{-1,1\}^n \to \{-1,1\}$ is an LTF. Show that it can be expressed as $f(x) = \mathrm{sgn}(a_0 + a_1 x_1 + \cdots a_n x_n)$ where the $a_i$&#8217;s are integers. (Hint: first obtain rational $a_i$&#8217;s by a perturbation.)
<li> Show also that a degree-$d$ PTF has a representation in which all of the degree-$d$ polynomial&#8217;s coefficients are integers.
</ol>
<li> Let $f(x) = \mathrm{sgn}(a_0 + a_1 x_1 + \cdots a_n x_n)$ be an LTF.
<ol type="a">
<li> Show that if $a_0 = 0$ then $\mathop{\bf E}[f] = 0$. (Hint: show that $f$ is in fact an odd function.)
<li> Show that if $a_0 \geq 0$ then $\mathop{\bf E}[f] \geq 0$. Show that the converse need not hold.
<li> Suppose $g : \{-1,1\}^n \to \{-1,1\}$ is an LTF with $\mathop{\bf E}[f] = 0$. Show that $g$ can be represented as $g(x) = \mathrm{sgn}(c_1 x_1 + \cdots + c_n x_n)$.
</ol>
<li> Suppose $f(x) = \mathrm{sgn}(a_0 + a_1 x_1 + \cdots a_n x_n)$ is an LTF with $|a_1| \geq |a_2| \geq \cdots \geq |a_n|$. Show that $\mathbf{Inf}_1[f] \geq \mathbf{Inf}_2[f] \geq \cdots \geq \mathbf{Inf}_n[f]$. (Hint: why does it suffice to prove this for $n=2$?)
<li> <a name="exchow-counting"></a>
<ol type="a">
<li> Show that the number of functions $f : \{-1,1\}^n \to \{-1,1\}$ which are LTFs is at most $2^{n^2 + O(n)}$. (Hint: Chow&#8217;s Theorem.)
<li> More generally, show that the number of functions $f : \{-1,1\}^n \to \{-1,1\}$ which are degree-$k$ PTFs is at most $2^{n^{k+1} + O(n)}$.
</ol>
<li> <a name="exfinish-gotsman-linial1"></a>
<ol type="a">
<li> Suppose $\ell : \{-1,1\}^n \to {\mathbb R}$ is defined by $\ell(x) = a_0 + a_1 x_1 + \cdots a_n x_n$. Define $\widetilde{\ell} : \{-1,1\}^{n+1} \to {\mathbb R}$ by $\widetilde{\ell}(x_0, \dots, x_n) = a_0 x_0 + \ell(x)$. Show that $\|\widetilde{\ell}\|_1 = \|\ell\|_1$ and $\|\widetilde{\ell}\|^2_2 = \|\ell\|_2^2$.
<li> Complete the proof of Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#thmgotsman-linial1" target="_blank">2</a>.
</ol>
<li> <a name="exLTF-KKL"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ be an unbiased linear threshold function. Show that $\mathbf{Inf}_i[f] \geq \frac{1}{\sqrt{2 n}}$ for some $i \in [n]$, improving the KKL Theorem for LTFs.
<li> <a name="extitsworth"></a> Consider the following &#8220;correlation distillation&#8221; problem (cf. Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=494#exnicd">2.49</a>). For each $i \in [n]$ there is a number $\rho_i \in [-1,1]$ and an independent sequence of pairs of $\rho_i$-correlated bits, $(\boldsymbol{a}_1^{(1)}, \boldsymbol{b}_2^{(1)})$, $(\boldsymbol{a}_1^{(2)}, \boldsymbol{b}_2^{(2)})$, $(\boldsymbol{a}_1^{(3)}, \boldsymbol{b}_2^{(3)})$, etc. Party $A$ on Earth has access to the stream of bits $\boldsymbol{a}_1^{(1)}$, $\boldsymbol{a}_1^{(2)}$, $\boldsymbol{a}_1^{(3)}$,$\dots$. and a party $B$ on Venus has access to the stream $\boldsymbol{b}_1^{(1)}$, $\boldsymbol{b}_1^{(2)}$, $\boldsymbol{b}_1^{(3)}$, $\dots$. Neither party knows the numbers $\rho_1, \dots, \rho_n$. The goal is for $B$ to estimate these correlations. To assist in this, $A$ can send a small number of bits to $B$. A reasonable strategy is for $A$ to send $f(\boldsymbol{a}^{(1)})$, $f(\boldsymbol{a}^{(2)})$, $f(\boldsymbol{a}^{(3)})$, $\dots$ to $B$, where $f : \{-1,1\}^n \to \{-1,1\}$ is some boolean function. Using this information $B$ can try to estimate $\mathop{\bf E}[f(\boldsymbol{a}) \boldsymbol{b}_i]$ for each $i$.
<ol type="a">
<li> Show that $\mathop{\bf E}[f(\boldsymbol{a}) \boldsymbol{b}_i] = \widehat{f}(i) \rho_i$.
<li> <a name="extitsworthb"></a> This motivates choosing an $f$ for which all $\widehat{f}(i)$ are large. If we also insist all $\widehat{f}(i)$ be equal, show that majority functions $f$ maximize this common value.
</ol>
<li> <a name="exrandom-Fourier2"></a> For $n \geq 2$, let $\boldsymbol{f} : \{-1,1\}^n \to \{-1,1\}$ be a randomly chosen function (as in Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=301#exrandom-Fourier" target="_blank">1.8</a>). Show that $\hat{\lVert} \boldsymbol{f} \hat{\rVert}_\infty \leq 2\sqrt{n} 2^{-n/2}$ except with probability at most $2^{-n}$.
<li> <a name="exptf-chow"></a> Prove Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#thmptf-chow" target="_blank">6</a>.
<li> <a name="exparity-ptf"></a> </p>
<ol type="a">
<li> Give as simple a proof as you can that the parity function $\chi_{[n]} : \{-1,1\}^n \to \{-1,1\}$ is not a PTF of degree $n-1$.
<li> Show that if $f : \{-1,1\}^n \to \{-1,1\}$ is not $\pm \chi_{[n]}$ then it <em>is</em> a PTF of degree $n-1$. (Hint: consider $f^{\leq n-1}$.)
</ol>
<li> <a name="exno-beat-gl"></a> For each $k \in {\mathbb N}^+$, show that there is a degree-$k$ PTF $f$ with $\mathbf{W}^{\leq k}[f] &lt; 2^{1-k}$.
<li> <a name="exbruck-separation"></a> In this exercise you will show that threshold-of-parities circuits can be effectively simulated by threshold-of-threshold circuits, but not the converse.
<ol type="a">
<li> Let $f : \{-1,1\}^n \to \{-1,1\}$ be a symmetric function. Show that $f$ is computable as the <em>sum</em> of at most $2n$ LTFs, plus a constant.
<li> Deduce that if $f : \{-1,1\}^n \to \{-1,1\}$ is computable by a size-$s$ threshold-of-parities circuit then it is also computable by a size-$2ns$ threshold-of-thresholds circuit.
<li> Show that the complete quadratic function $\mathrm{CQ}_n : {\mathbb F}_2^{n} \to \{-1,1\}$ (see Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=301#excompute-expansions" target="_blank">1.1</a>) is computable by a size-$2n$ threshold-of-thresholds circuit.
<li> Assume $n$ even. Show that any threshold-of-parities circuit for $\mathrm{CQ}_n$ requires size $2^{n/2}$.
</ol>
<li> <a name="exkrause-pudlak"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a DNF of size $s$. Show that $f$ has a PTF representation of sparsity $O(n s^2)$. (Hint: approximate the ANDs using Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#thmbruck-smolensky" target="_blank">10</a>.)
<li> <a name="exbruck-separation2"></a> In contrast to the previous exercise, show that there is a function $f : \{-1,1\}^n \to \{-1,1\}$ computable by a depth-$3$ $\mathsf{AC}^0$ circuit (see Chapter <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=819" target="_blank">4.5</a>) but requiring threshold-of-parities circuits of size at least $n^{\log n}$. (Hint: involve the inner product mod $2$ function and Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=833#excompute-parity" target="_blank">4.12</a>.)
<li> <a name="exPTF-extremal1"></a> Let $\mathcal{F}$ be a nonempty collection of subsets $S \subseteq [n]$. For each $a \in \{-1,1\}^n$, write $1_{\{a\}} : \{-1,1\}^n \to \{0,1\}$ for the indicator of $\{a\}$, write $1_{\{a\}}^\mathcal{F} : \{-1,1\}^n \to {\mathbb R}$ for $\sum_{S \in \mathcal{F}} \widehat{1_{\{a\}}}(S)\,\chi_S$, and write $\psi_a = \frac{2^n}{|\mathcal{F}|} \cdot 1_{\{a\}}^\mathcal{F}$. </p>
<ol type="a">
<li> Show that $\psi_a(a) = 1$ and $\mathop{\bf E}[\psi_a^2] = \tfrac{1}{|\mathcal{F}|}$. Show also that for all $x \in \{-1,1\}^n$, $\psi_a(x) = \psi_x(a)$ and $\sum_{a : a \neq x} \psi_a(x)^2 = \frac{2^n}{|\mathcal{F}|} &#8211; 1$.
<li> Fix $0 &lt; \epsilon &lt; 1$ and suppose that $|\mathcal{F}| \geq (1 &#8211; \tfrac{\epsilon^2}{6n}) 2^n$. Let $\boldsymbol{f} : \{-1,1\}^n \to \{-1,1\}$ be a random function as in Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=301#exrandom-Fourier" target="_blank">1.8</a>. Show that for each $x \in \{-1,1\}^n$, except with probability at most $4^{-n}$ it holds that $|\sum_{a : a \neq x} \boldsymbol{f}(a) \psi_a(x)| &lt; \epsilon$.
<li> Deduce that for all but a $2^{-n}$ fraction of functions $f : \{-1,1\}^n \to \{-1,1\}$, there a multilinear polynomial $q : \{-1,1\}^n \to {\mathbb R}$ supported on the monomials $\{\chi_S : S \in \mathcal{F}\}$ such that $\|f &#8211; q\|_\infty &lt; \epsilon$.
<li> Deduce that all but a $2^{-n}$ fraction of functions $f : \{-1,1\}^n \to \{-1,1\}$ have PTF representation of degree at most $n/2 + O(\sqrt{n \log n})$.
</ol>
<li> <a name="exBE-interval"></a>
<ol type="a">
<li> Show that in the Berry&#8211;Esseen Theorem we can also conclude \[ |\mathop{\bf Pr}[\boldsymbol{S} &lt; u] &#8211; \mathop{\bf Pr}[\boldsymbol{Z} &lt; u]| \leq B \epsilon. \] (Hint: you&#8217;ll need that $\lim_{\delta \to 0^+} \mathop{\bf Pr}[\boldsymbol{Z} \leq u - \delta] = \mathop{\bf Pr}[\boldsymbol{Z} \leq u]$.)
<li> Deduce that if $I \subseteq {\mathbb R}$ is any interval we can also conclude 				\[ 					|\mathop{\bf Pr}[\boldsymbol{S} \in I] &#8211; \mathop{\bf Pr}[\boldsymbol{Z} \in I]| \leq 2B \beta. 				\]
</ol>
<li> <a name="exno-restrict"></a> Show that the assumptions $\mathop{\bf E}[\boldsymbol{X}_i] = 0$ and $\sum_{i=1}^n \mathop{\bf Var}[\boldsymbol{X}_i] = 1$ in the Berry&#8211;Esseen Theorem are not restrictive, as follows. Let $\boldsymbol{X}_1, \dots, \boldsymbol{X}_n$ be independent random variables with finite means and variances. Let $\boldsymbol{S} = \sum_{i=1}^n \boldsymbol{X}_i$ and let $\boldsymbol{Z} \sim \mathrm{N}(\mu,\sigma^2)$, where $\mu = \sum_{i=1}^n \mathop{\bf E}[\boldsymbol{X}_i]$ and $\sigma^2 = \sum_{i=1}^n \mathop{\bf Var}[\boldsymbol{X}_i]$. Assuming $\sigma^2 &gt; 0$, show that for all $u \in {\mathbb R}$, \[ |\mathop{\bf Pr}[\boldsymbol{S} \leq u] &#8211; \mathop{\bf Pr}[\boldsymbol{Z} \leq u]| \leq B \epsilon / \sigma^3, \] where \[ \epsilon = \sum_{i=1}^n \|\boldsymbol{X}_i - \mathop{\bf E}[\boldsymbol{X}_i]\|_3^3. \]
<li> <a name="exarcsin"></a> </p>
<ol type="a">
<li> Use the generalized Binomial Theorem to compute the power series for $(1-z^2)^{-1/2}$, valid for $|z| &lt; 1$.
<li> Integrate to obtain the power series for $\arcsin z$ given in <a href = "http://www.contrib.andrew.cmu.edu/~ryanod/?p=877" target="_blank">Section 3</a>, valid for $|z| &lt; 1$.
<li> Confirm that equality holds also for $z = \pm 1$.
</ol>
<li> <a name="exSvec-cov"></a> Verify that the random vector $\vec{\boldsymbol{S}}$ defined in equation (6) of <a href = "http://www.contrib.andrew.cmu.edu/~ryanod/?p=866" target="_blank">Section 2</a> has $\mathop{\bf E}[\vec{\boldsymbol{S}}_1] = \mathop{\bf E}[\vec{\boldsymbol{S}}_2] = 0$, $\mathop{\bf E}[\vec{\boldsymbol{S}}_1^2] = \mathop{\bf E}[\vec{\boldsymbol{S}}_2^2] = 1$, $\mathop{\bf E}[\vec{\boldsymbol{S}}_1\vec{\boldsymbol{S}}_2] = \rho$; i.e., $\mathop{\bf E}[\vec{\boldsymbol{S}}] = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $\mathop{\bf Cov}[\vec{\boldsymbol{S}}] = \begin{bmatrix} 1 &#038; \rho \\ \rho &#038; 1 \end{bmatrix}$.
<li> <a name="exsheppard"></a> In this exercise you will prove Sheppard&#8217;s Formula. 		</p>
<ol type="a">
<li> Let $\boldsymbol{g}_1, \boldsymbol{g}_2$ be independent standard Gaussians and let 			 \[ 					\boldsymbol{g}_1' = \boldsymbol{g}_1, \qquad \boldsymbol{g}_2' = \rho \boldsymbol{g}_1 + \sqrt{1-\rho^2} \boldsymbol{g}_2. 				\] 				Show that $(\boldsymbol{g}_1&#8242;, \boldsymbol{g}_2&#8242;)$ has the same distribution as the random vector $(\boldsymbol{z}_1, \boldsymbol{z}_2)$ in Sheppard&#8217;s formula.
<li> Assume for now that $\rho \in [0,1)$. Show that $\boldsymbol{g}_1&#8242;, \boldsymbol{g}_2&#8242; &gt; 0$ if and only if $(\boldsymbol{g}_1, \boldsymbol{g}_2)$ is in a certain region $R \subseteq {\mathbb R}^2$ consisting of a quadrant plus a wedge of angle $\arcsin \rho$.
<li> Using the circular symmetry of the random vector $(\boldsymbol{g}_1, \boldsymbol{g}_2)$, deduce Sheppard&#8217;s formula for $\rho \in [0,1)$.
<li> Treat the cases $\rho \in (-1,0)$, $\rho = \pm 1$.
</ol>
<li> <a name="exmaj-coeffs-symm"></a> Prove Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=877#cormaj-coeffs-symm" target="_blank">17</a>.
<li> <a name="exmaj-coeffs-decr"></a> Fix $n$ odd. Using Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=877#thmmaj-coeffs" target="_blank">16</a> show that $|\widehat{\mathrm{Maj}_n}(S)|$ is a decreasing function of $|S|$ for odd $1 \leq |S| \leq \frac{n-1}{2}$. Deduce (using also Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=877#cormaj-coeffs-symm" target="_blank">17</a>) that $\hat{\lVert} \mathrm{Maj}_n \hat{\rVert}_\infty = \mathrm{Maj}_n(\{1\}) \sim \frac{\sqrt{2/\pi}}{\sqrt{n}}$.
<li> <a name="exmaj-weight-decreasing"></a> Prove Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=877#cormaj-weight-decreasing" target="_blank">18</a>.
<li> <a name="exmaj-stab-error2"></a> 							 Prove Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#thmmaj-stab-precise" target="_blank">15</a>. (Hint: Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=877#cormaj-weight-decreasing" target="_blank">18</a>). </p>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=917</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>§5.5: Highlight: Peres&#8217;s Theorem</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=898</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=898#comments</comments>
		<pubDate>Fri, 20 Apr 2012 21:10:42 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[All chapter sections]]></category>
		<category><![CDATA[Book]]></category>
		<category><![CDATA[Chapter 5: Majority and threshold functions]]></category>
		<category><![CDATA[linear threshold function]]></category>
		<category><![CDATA[Majority Is Least Stable Conjecture]]></category>
		<category><![CDATA[noise sensitivity]]></category>
		<category><![CDATA[Peres's Theorem]]></category>
		<category><![CDATA[polynomial threshold function]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=898</guid>
		<description><![CDATA[<p> Theorem 14 says that if $f$ is an unbiased linear threshold function $f(x) = \mathrm{sgn}(a_1 x_1 + \cdots + a_n x_n)$ in which all $a_i$&#8217;s are &#8220;small&#8221; then the noise stability $\mathbf{Stab}_\rho[f]$ is at least (roughly) $\frac{2}{\pi} \arcsin \rho$. Rephrasing in terms of noise sensitivity, this means $\mathbf{NS}_\delta[f]$ is at most (roughly) $\tfrac{2}{\pi} \sqrt{\delta} [...]]]></description>
			<content:encoded><![CDATA[<p> <a name="secperes"></a> Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#thmltf-stab-error" target="_blank">14</a> says that if $f$ is an unbiased linear threshold function $f(x) = \mathrm{sgn}(a_1 x_1 + \cdots + a_n x_n)$ in which all $a_i$&#8217;s are &#8220;small&#8221; then the noise stability $\mathbf{Stab}_\rho[f]$ is at least (roughly) $\frac{2}{\pi} \arcsin \rho$. Rephrasing in terms of noise sensitivity, this means $\mathbf{NS}_\delta[f]$ is at most (roughly) $\tfrac{2}{\pi} \sqrt{\delta} + O(\delta^{3/2})$ (see the statement of Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=406#thmmaj-stab" target="_blank">2.44</a>). <span id="more-898"></span> On the other hand, if some $a_i$ were particularly <em>large</em> then $f$ would be pushed in the direction of the dictator function $\chi_i$, which has $\mathbf{NS}_\delta[\chi_i] = \delta \ll \sqrt{\delta}$. This observation suggests that <em>all</em> unbiased LTFs $f$ should have $\mathbf{NS}_\delta[f] \leq O(\sqrt{\delta})$. The unbiasedness assumption also seems inessential, since biasing a function should tend to decrease its noise sensitivity.</p>
<p>
Indeed the idea here is correct, as was shown by Peres in 1999: </p>
<blockquote><p> <b>Peres&#8217;s Theorem</b>  Let $f : \{-1,1\}^n \to \{-1,1\}$ be any linear threshold function. Then $\mathbf{NS}_\delta[f] \leq O(\sqrt{\delta})$. </p></blockquote>
<p>
Pleasantly, the proof is quite simple and uses no heavy tools like the Central Limit Theorem. Before getting to it, let&#8217;s remark on an application to learning theory.</p>
<p>
Peres&#8217;s Theorem immediately gives that LTFs have their Fourier spectrum $\epsilon$-concentrated up to degree $O(1/\epsilon^2)$ (Proposition <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=535#propns-concentration" target="_blank">3.3</a>) and hence the class of LTFs is learnable from random examples with error $\epsilon$ in time $n^{O(1/\epsilon^2)}$ (Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=574#corlearn-ns" target="_blank">3.34</a>). The latter result is not too impressive since it&#8217;s been long known that LTFs are learnable in time $\mathrm{poly}(n, 1/\epsilon)$ using linear programming. However the noise sensitivity approach is much more flexible. Consider the concept class \[ \mathcal{C} = \{h = g(f_1, \dots, f_s) \mid \text{$f_1, \dots, f_s : \{-1,1\}^n \to \{-1,1\}$ are LTFs}\}. \] For each $h : \{-1,1\}^n \to \{-1,1\}$ in $\mathcal{C}$, Peres&#8217;s Theorem and a union bound (Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=494#exns-union-bound" target="_blank">2.38</a>) imply that $\mathbf{NS}_\delta[h] \leq O(s \sqrt{\epsilon})$. Thus from Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=574#corlearn-ns" target="_blank">3.34</a> we get that the class $\mathcal{C}$ is learnable in time $n^{O(s^2/\epsilon^2)}$. This is the only known way of showing even that an AND of two LTFs is learnable with error $.01$ in time $\mathrm{poly}(n)$.</p>
<p>
The trick for proving Peres&#8217;s Theorem is to employ a fairly general technique for bounding noise sensitivity using <em>average sensitivity</em> (total influence): </p>
<blockquote><p><b>Theorem 31</b> <a name="thmns-to-as"></a> Let $\delta \in (0,1/2]$ and let $A : {\mathbb N}^+ \to {\mathbb R}$. Let $\mathcal{C}$ be a class of boolean-valued functions closed under negation and identification of input variables. Suppose that each $f \in \mathcal{C}$ with domain $\{-1,1\}^n$ has $\mathbf{I}[f] \leq A(n)$. Then each $f \in \mathcal{C}$ has $\mathbf{NS}_\delta[f] \leq \frac{1}{m}A(m)$, where $m = \lfloor 1/\delta \rfloor$. </p></blockquote>
<p> <br/><em>Proof:</em>  Fix any $f : \{-1,1\}^n \to \{-1,1\}$ from $\mathcal{C}$. Since noise sensitivity is an increasing function of the noise parameter (see the discussion surrounding Proposition <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=406#proptinf-is-stab-deriv" target="_blank">2.50</a>) we may replace $\delta$ by $1/m$. Thus our task is to upper-bound $\mathbf{NS}_{1/m}[f] = \mathop{\bf Pr}[f({\boldsymbol{x}}) \neq f(\boldsymbol{y})]$ where ${\boldsymbol{x}} \sim \{-1,1\}^n$ is uniformly random and $\boldsymbol{y} \in \{-1,1\}^n$ is formed from ${\boldsymbol{x}}$ by negating each bit independently with probability $1/m$. The rough idea of the proof is that this is equivalent to randomly partitioning ${\boldsymbol{x}}$&#8217;s bits into $m$ parts and then negating a randomly chosen part.</p>
<p>
 More precisely, let $z \in \{-1,1\}^n$ and let $\pi : [n] \to [m]$ be a partition of $[n]$ into $m$ parts. Define \[ g_{z, \pi} : \{-1,1\}^m \to \{-1,1\}, \quad g_{z, \pi}(w) = f(z \circ w^{\pi}), \] where $\circ$ denotes entry-wise multiplication and $w^{\pi} = (w_{\pi(1)}, \dots, w_{\pi(n)}) \in \{-1,1\}^n$. Since $g_{z, \pi}$ is derived from $f$ by negating and identifying input variables it follows that $g_{z,\pi} \in \mathcal{C}$. So by assumption $g_{z,\pi}$ has total influence $\mathbf{I}[g_{z,\pi}] \leq A(m)$ and hence <em>average</em> influence $\pmb{\mathcal E}[g_{z,\pi}] \leq \frac{1}{m} A(m)$ (see Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=494#exaverage-influence" target="_blank">2.37</a>).</p>
<p>
 Now suppose $\boldsymbol{z} \sim \{-1,1\}^n$ and $\boldsymbol{\pi} : [n] \to [m]$ are chosen uniformly at random. We certainly have \[ \mathop{\bf E}_{\boldsymbol{z}, \boldsymbol{\pi}}[\pmb{\mathcal E}[g_{\boldsymbol{z},\boldsymbol{\pi}}]] \leq \tfrac{1}{m}A(m). \] To complete the proof we will show that the left-hand side above is precisely $\mathbf{NS}_{1/m}[f]$. Recall that in the experiment for average influence $\pmb{\mathcal E}[g]$ we choose $\boldsymbol{w} \sim \{-1,1\}^m$ and $\boldsymbol{j} \sim [m]$ uniformly at random and check if $g(\boldsymbol{w}) \neq g(\boldsymbol{w}^{\oplus \boldsymbol{j}})$. Thus \[ \mathop{\bf E}_{\boldsymbol{z}, \boldsymbol{\pi}}[\pmb{\mathcal E}[g_{\boldsymbol{z},\boldsymbol{\pi}}]] = \mathop{\bf Pr}_{\boldsymbol{z}, \boldsymbol{\pi}, \boldsymbol{w}, \boldsymbol{j}}[g_{\boldsymbol{z}, \boldsymbol{\pi}}(\boldsymbol{w}) \neq g_{\boldsymbol{z}, \boldsymbol{\pi}}(\boldsymbol{w}^{\oplus \boldsymbol{j}})] = \mathop{\bf Pr}_{\boldsymbol{w}, \boldsymbol{\pi}, \boldsymbol{j}, \boldsymbol{z}}[f(\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}) \neq f(\boldsymbol{z} \circ (\boldsymbol{w}^{\oplus \boldsymbol{j}})^{\boldsymbol{\pi}})]. \] It is not hard to see that the joint distribution of $\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}$, $\boldsymbol{z} \circ (\boldsymbol{w}^{\oplus \boldsymbol{j}})^{\boldsymbol{\pi}}$ is the same as that of ${\boldsymbol{x}}$, $\boldsymbol{y}$. To be precise, define $\boldsymbol{J} = \boldsymbol{\pi}^{-1}(\boldsymbol{j})$, distributed as a random subset of $[n]$ in which each coordinate is included with probability $1/m$; and, define $\boldsymbol{\lambda} \in \{-1,1\}^n$ by $\boldsymbol{\lambda}_i = -1$ if and only if $i \in \boldsymbol{J}$. Then \[ \mathop{\bf Pr}_{\boldsymbol{w}, \boldsymbol{\pi}, \boldsymbol{j}, \boldsymbol{z}}[f(\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}) \neq f(\boldsymbol{z} \circ (\boldsymbol{w}^{\oplus \boldsymbol{j}})^{\boldsymbol{\pi}})] = \mathop{\bf Pr}_{\boldsymbol{w}, \boldsymbol{\pi}, \boldsymbol{j}, \boldsymbol{z}}[f(\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}) \neq f(\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}} \circ \boldsymbol{\lambda})]. \] But for every outcome of $\boldsymbol{w}$, $\boldsymbol{\pi}$, $\boldsymbol{j}$ (and hence $\boldsymbol{J}$, $\boldsymbol{\lambda}$), we may replace $\boldsymbol{z}$ with $\boldsymbol{z} \circ \boldsymbol{w}^{\boldsymbol{\pi}}$ since they have the same distribution, namely uniform on $\{-1,1\}^n$. Then the above becomes \[ \mathop{\bf Pr}_{\boldsymbol{w}, \boldsymbol{\pi}, \boldsymbol{j}, \boldsymbol{z}}[f(\boldsymbol{z}) \neq f(\boldsymbol{z} \circ \boldsymbol{\lambda})] = \mathbf{NS}_{1/m}[f], \] as claimed. $\Box$</p>
<p>
Peres&#8217;s Theorem is now a simple corollary of Theorem <a href="#thmns-to-as">31</a>.</p>
<p>
<br/><em>Proof of Peres&#8217;s Theorem:</em>  Let $\mathcal{C}$ be the class of all linear threshold functions. This class is indeed closed under negating and identifying variables. Since each linear threshold function on $m$ bits is <em>unate</em> (i.e., monotone up to negation of some input coordinates, see Exercises <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=465#exdeg-1-vs-inf" target="_blank">2.5</a>, <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=465#exltf-unate" target="_blank">2.6</a>), its total influence is at most $\sqrt{m}$ (see Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=465#exsimple-mono-tinf" target="_blank">2.19</a>). Applying Theorem <a href="#thmns-to-as">31</a> we get that for any LTF $f$ and any $\delta \in (0,1/2]$, \begin{align*} \mathbf{NS}_\delta[f] \leq \tfrac{1}{m}\sqrt{m} &#038;= 1/\sqrt{m} \qquad \text{(for $m = \lfloor 1/\delta \rfloor$)} \\ &#038;\leq O(\sqrt{\delta}). \qquad \qquad \qquad \qquad \qquad\Box \end{align*} </p>
<blockquote><p><b>Remark 32</b> <a name="remperes"></a> Our proof of Peres&#8217;s Theorem attained the upper-bound $\sqrt{1/\lfloor 1/\delta \rfloor}$. This is at most $\sqrt{3/2} \sqrt{\delta}$ for all $\delta \in (0,1/2]$ and it&#8217;s also $\sqrt{\delta} + O(\delta^{3/2})$ for small $\delta$. To further improve the constant we can use Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=368#thmmaj-maximizes-deg-1-sum" target="_blank">2.32</a> in place of Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=465#exsimple-mono-tinf" target="_blank">2.19</a>; it implies that all unate $m$-bit functions have total influence at most $\sqrt{2/\pi}\sqrt{m} + O(m^{-1/2})$. This lets us obtain the bound $\mathbf{NS}_\delta[f] \leq \sqrt{2/\pi} \sqrt{\delta} + O(\delta^{3/2})$ for all LTF $f$. </p></blockquote>
<p><p>
Recall from Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=406#thmmaj-stab" target="_blank">2.44</a> that $\mathbf{NS}_\delta[\mathrm{Maj}_n] \sim \tfrac{2}{\pi} \sqrt{\delta}$ for large $n$. Thus the constant $\sqrt{2/\pi}$ in the bound from Remark <a href="#remperes">32</a> is fairly close to optimal. It seems quite likely that majority&#8217;s $\tfrac{2}{\pi}$ is the correct constant here. There is still slack in Peres&#8217;s proof because the random functions $g_{\boldsymbol{z}, \boldsymbol{\pi}}$ arising in Theorem <a href="#thmns-to-as">31</a> are unlikely to be majorities, even if $f$ is. The most elegant possible result in this direction would be to prove the following conjecture of Benjamini, Kalai, and Schramm: </p>
<blockquote><p> <b>Majority Is Least Stable Conjecture</b> Let $f : \{-1,1\}^n \to \{-1,1\}$ be a linear threshold function, $n$ odd. Then for all $\rho \in [0,1]$, $\mathbf{Stab}_\rho[f] \geq \mathbf{Stab}_\rho[\mathrm{Maj}_n]$.</p></blockquote>
<p>    (This is a precise statement about majority&#8217;s noise stability within the class of LTFs; the Majority Is Stablest Theorem refers to its noise stability within the class of small-influence functions.) A very important problem in this area is to extend Peres&#8217;s Theorem to <em>polynomial threshold functions</em>. Let \[ \mathrm{PTF}_{n,k} = \{f : \{-1,1\}^n \to \{-1,1\} \mid f \text{ is a PTF of degree at most $k$}\}. \] The goal here is to prove that $\mathbf{NS}_\delta[f] \leq O_k(1) \sqrt{\delta}$ for all $f \in \mathrm{PTF}_{n,k}$. This is currently not known even for $k = 2$. Since $\bigcup_n \mathrm{PTF}_{n,k}$ is closed under negating and identifying variables, a natural approach is to use Theorem <a href="#thmns-to-as">31</a>; it would suffice to show that $\mathbf{I}[f] \leq O_k(1) \sqrt{n}$ for all $f \in \mathrm{PTF}_{n,k}$. In fact, such a total influence bound is also <em>necessary</em>; see the exercises. The desired total influence bound here is a very old conjecture of Gotsman and Linial: </p>
<blockquote><p> <b>Gotsman&#8211;Linial Conjecture</b>  Let $f \in \mathrm{PTF}_{n,k}$. Then $\mathbf{I}[f] \leq O_k(1) \sqrt{n}$. More strongly, $\mathbf{I}[f] \leq O(k) \sqrt{n}$. Most strongly, the $f \in \mathrm{PTF}_{n,k}$ of maximal total influence is the symmetric one $f(x) = \mathrm{sgn}(p(x_1 + \cdots + x_n))$, where $p$ is a degree-$k$ univariate polynomial which alternates sign on the $k+1$ values of $x_1 + \cdots + x_n$ closest to $0$.</p></blockquote>
<p>  The strongest form of the Gotsman&#8211;Linial Conjecture for $k = 1$ is true by Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=368#thmmaj-maximizes-deg-1-sum" target="_blank">2.32</a>. The best results towards the conjecture for higher $k$ are the upper bounds $2n^{1-1/2^k}$ (good for small $k$; see the exercises) and $2^{O(k)} n^{1 &#8211; 1/(Ck)}$ (better for large $k$), where $C$ is a fairly small universal constant. Note that the latter implies $\mathbf{NS}_{\delta}[f] \leq 2^{O(k)} \delta^{1/(Ck)}$ for all $f \in \mathrm{PTF}_{n,k}$, a bound which is at least independent of $n$. 	 </p>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=898</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>§5.4: Degree-1 weight</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=885</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=885#comments</comments>
		<pubDate>Fri, 13 Apr 2012 18:15:39 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[All chapter sections]]></category>
		<category><![CDATA[Chapter 5: Majority and threshold functions]]></category>
		<category><![CDATA[2/pi Theorem]]></category>
		<category><![CDATA[Chang's Inequality]]></category>
		<category><![CDATA[degree-1]]></category>
		<category><![CDATA[FKN Theorem]]></category>
		<category><![CDATA[Fourier weight]]></category>
		<category><![CDATA[Gaussian isoperimetry]]></category>
		<category><![CDATA[Level-1 Inequality]]></category>
		<category><![CDATA[Majority Is Stablest Theorem]]></category>
		<category><![CDATA[small-set expansion]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=885</guid>
		<description><![CDATA[<p> In this section we prove two theorems about the degree-$1$ Fourier weight of boolean functions: \[ \mathbf{W}^{1}[f] = \sum_{i=1}^n \widehat{f}(i)^2. \] This important quantity can be given a combinatorial interpretation thanks to the noise stability formula $\mathbf{Stab}_\rho[f] = \sum_{k\geq 0}\rho^k \cdot \mathbf{W}^{k}[f]$: \[ \text{For $f : \{-1,1\}^n \to {\mathbb R}$,} \quad \mathbf{W}^{1}[f] = \frac{d}{d [...]]]></description>
			<content:encoded><![CDATA[<p> <a name="secweight-level-1"></a> In this section we prove two theorems about the degree-$1$ Fourier weight of boolean functions: \[ \mathbf{W}^{1}[f] = \sum_{i=1}^n \widehat{f}(i)^2. \]<br />
<span id="more-885"></span><br />
This important quantity can be given a combinatorial interpretation thanks to the noise stability formula $\mathbf{Stab}_\rho[f] = \sum_{k\geq 0}\rho^k \cdot \mathbf{W}^{k}[f]$: \[ \text{For $f : \{-1,1\}^n \to {\mathbb R}$,} \quad \mathbf{W}^{1}[f] = \frac{d}{d \rho} \mathbf{Stab}_\rho[f]\Bigm|_{\rho = 0}. \] Thinking of $\|f\|_2$ as constant and $\rho \to 0$, the noise stability formula implies \[ \mathbf{Stab}_\rho[f] = \mathop{\bf E}[f]^2 + \mathbf{W}^{1}[f]\rho \pm O(\rho^2), \] or equivalently, \[ \mathop{\bf Cov}_{\substack{({\boldsymbol{x}}, \boldsymbol{y}) \\ \text{ $\rho$-correlated}}}[f({\boldsymbol{x}}),f(\boldsymbol{y})] = \mathbf{W}^{1}[f] \rho \pm O(\rho^2). \] In other words, for $f : \{-1,1\}^n \to \{-1,1\}$ the degree-$1$ weight quantifies the extent to which $\mathop{\bf Pr}[f({\boldsymbol{x}}) = f(\boldsymbol{y})]$ increases when ${\boldsymbol{x}}$ and $\boldsymbol{y}$ go from being uncorrelated to being slightly correlated.</p>
<p>
There is an additional viewpoint if we think of $f$ as the indicator of a subset $A \subseteq \{-1,1\}^n$ and its noise sensitivity $\mathbf{NS}_{\delta}[f]$ as a notion of $A$&#8217;s &#8220;surface area&#8221;, or &#8220;noisy boundary size&#8221;. For nearly maximal noise rates &#8212; i.e., $\delta = \tfrac{1}{2} &#8211; \tfrac{1}{2} \rho$ where $\rho$ is small &#8212; we have that $A$&#8217;s noisy boundary size is &#8220;small&#8221; if and only if $\mathbf{W}^{1}[f]$ is &#8220;large&#8221; (vis-a-vis $A$&#8217;s measure). Two examples suggest themselves when thinking of subsets of the Hamming cube with small &#8220;boundary&#8221;: subcubes and Hamming balls. </p>
<blockquote><p><b>Proposition 21</b> <a name="propcube-weight1"></a> Let $f : {\mathbb F}_2^n \to \{0,1\}$ be the indicator of a subcube of codimension $k \geq 1$ (e.g., the $\mathrm{AND}_k$ function). Then $\mathop{\bf E}[f] = 2^{-k}$, $\mathbf{W}^{1}[f] = k 2^{-2k}$. </p></blockquote>
<blockquote><p><b>Proposition 22</b> <a name="propball-weight1"></a> Fix $t \in {\mathbb R}$. Consider the sequence of LTFs $f_n : \{-1,1\}^n \to \{0,1\}$ defined by $f_n(x) = 1$ if and only if $\sum_{i=1}^n \tfrac{1}{\sqrt{n}} x_i &gt; t$. (I.e., $f_n$ is the indicator of the Hamming ball $\{x : \mathrm{Dist}(x, (1, \dots, 1)) &lt; \frac{n}{2} &#8211; \frac{t}{2}\sqrt{n}\}$.) Then \[ \lim_{n \to \infty} \mathop{\bf E}[f_n] = \overline{\Phi}(t), \qquad \lim_{n \to \infty} \mathbf{W}^{1}[f_n] = \phi(t)^2. \] </p></blockquote>
<p> You are asked to verify these facts in the exercises. Regarding Proposition <a href="#propball-weight1">22</a>, it&#8217;s natural for $\phi(t)$ to arise since $\mathbf{W}^{1}[f_n]$ is related to the influences of $f_n$, and coordinates are influential for $f_n$ if and only if $\sum_{i=1}^n \tfrac{1}{\sqrt{n}} x_i {\approx t}$. If we write $\alpha = \lim_{n \to \infty} \mathop{\bf E}[f_n]$ then this proposition can be thought of as saying that $\mathbf{W}^{1}[f_n] \to \mathcal{U}(\alpha)^2$, where $\mathcal{U}$ is defined as follows: </p>
<blockquote><p><b>Definition 23</b> The <em>Gaussian isoperimetric function</em> $\mathcal{U} : [0,1] \to [0,\frac{1}{\sqrt{2\pi}}]$ is defined by $\mathcal{U} = \phi \circ \Phi^{-1}$. This function is symmetric about $1/2$; i.e., $\mathcal{U} = \phi \circ \overline{\Phi}{}^{-1}$. </p></blockquote>
<p> The name of this function will be explained when we prove the Gaussian isoperimetric inequality in Chapter 11. For now we&#8217;ll just use the following fact: </p>
<blockquote><p><b>Proposition 24</b> <a name="propgiso-asympt"></a> For $\alpha \to 0^+$, $\mathcal{U}(\alpha) \sim \alpha \sqrt{2 \ln(1/\alpha)}$. </p></blockquote>
<p> <br/><em>Proof:</em>  Write $\alpha = \overline{\Phi}(t)$, where $t \to \infty$. We use the fact that $\overline{\Phi}(t) \sim \phi(t)/t$. Thus \begin{gather*} \alpha \sim \tfrac{1}{\sqrt{2\pi} t}\exp(-t^2/2) \quad \Rightarrow \quad t \sim \sqrt{2\ln (1/\alpha)} \\ \phi(t) \sim \overline{\Phi}(t) \cdot t \quad \Rightarrow \quad \mathcal{U}(\alpha) \sim \alpha \cdot t \sim \alpha \sqrt{2\ln(1/\alpha)}. \Box \end{gather*} </p>
<p>
Given Propositions <a href="#propcube-weight1">21</a> and <a href="#propball-weight1">22</a>, let&#8217;s consider the degree-$1$ Fourier weight of subcubes and Hamming balls asymptotically as their &#8220;volume&#8221; $\alpha = \mathop{\bf E}[f]$ tends to $0$. For the subcubes we have $\mathbf{W}^{1}[f] = \alpha^2 \log(1/\alpha)$. For the Hamming balls we have $\mathbf{W}^{1}[f_n] \to \mathcal{U}(\alpha)^2 \sim 2\alpha^2 \ln(1/\alpha)$. So in both cases we have an upper bound of $O(\alpha^2 \log(1/\alpha))$.</p>
<p>
You should think of this upper bound $O(\alpha^2 \log(1/\alpha))$ as being unusually small. The obvious a priori upper bound, given that $f : \{-1,1\}^n \to \{0,1\}$ has $\mathop{\bf E}[f] = \alpha$, is \[ \mathbf{W}^{1}[f] \leq \mathop{\bf Var}[f] = \alpha(1-\alpha) \sim \alpha. \] Yet subcubes and Hamming balls have degree-$1$ weight which is almost quadratically smaller. In fact the first theorem we will show in this section is the following: </p>
<blockquote><p> <b>Level-1 Inequality</b>  Let $f : \{-1,1\}^n \to \{0,1\}$ have mean $\mathop{\bf E}[f] = \alpha \leq 1/2$. Then \[ \mathbf{W}^{1}[f] \leq O(\alpha^2\log(1/\alpha)). \]
</p></blockquote>
<p> (For the case $\alpha \geq 1/2$, replace $f$ by $1-f$.)  Thus <em>all</em> small subsets of $\{-1,1\}^n$ have unusually small $\mathbf{W}^{1}[f]$; or equivalently (in some sense), unusually large &#8220;noisy boundary&#8221;. This is another key illustration of the idea that the Hamming cube is a &#8220;small-set expander&#8221;. </p>
<blockquote><p><b>Remark 25</b> <a name="remimproved-level-1"></a> The bound in the Level-1 Inequality has a sharp form, $\mathbf{W}^{1}[f] \leq (2+o_\alpha(1)) \alpha^2\ln(1/\alpha)$. Thus Hamming balls are in fact the &#8220;asymptotic maximizers&#8221; of $\mathbf{W}^{1}[f]$ among sets of small volume $\alpha$. Also, the inequality holds more generally for $f : \{-1,1\}^n \to [-1,1]$ with $\alpha = \mathop{\bf E}[|f|]$. </p></blockquote>
<blockquote><p><b>Remark 26</b> The name &#8220;Level-1 Inequality&#8221; is not completely standard; e.g., in additive combinatorics the result would be called <em>Chang&#8217;s Inequality</em>. We use this name because we will also generalize to &#8220;Level-$k$ Inequalities&#8221; in Chapter 10. </p></blockquote>
<p><p>
So far we considered maximizing degree-$1$ weight among subsets of the Hamming cube of a fixed small volume, $\alpha$. The second theorem in this section is concerned with what happens when there is no volume constraint. In this case, maximizing examples tend to have volume $\alpha = 1/2$; switching the notation to $f : \{-1,1\}^n \to \{-1,1\}$, this corresponds to $f$ being unbiased ($\mathop{\bf E}[f] = 0$). The unbiased Hamming ball is $\mathrm{Maj}_n$, which we know has $\mathbf{W}^{1}[\mathrm{Maj}_n] \to \tfrac{2}{\pi}$. This is quite large. But unbiased subcubes are just the dictators $\chi_i$ and their negations; these have $\mathbf{W}^{1}[\pm \chi_i] = 1$ which is obviously maximal.</p>
<p>
Thus the question of which $f : \{-1,1\}^n \to \{-1,1\}$ maximizes $\mathbf{W}^{1}[f]$ has a trivial answer. But this answer is arguably unsatisfactory, since dictators (and their negations) are not &#8220;really&#8221; functions of $n$ bits. Indeed, when we studied social choice in Chapter 2 we were motivated to rule out functions $f$ having a coordinate with unfairly large influence. And in fact Proposition <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=439#propweight-1-same" target="_blank">2.57</a> showed that if all $\widehat{f}(i)$ are equal (and hence small) then $\mathbf{W}^{1}[f] \leq \frac{2}{\pi} + o_n(1)$. The second theorem of this section significantly generalizes Proposition <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=439#propweight-1-same" target="_blank">2.57</a>:<br />
<blockquote> <b>The ${\frac{\mathbf 2}{\boldsymbol{\pi}}}$ Theorem</b>  Let $f : \{-1,1\}^n \to \{-1,1\}$ satisfy $|\widehat{f}(i)| \leq \epsilon$ for all $i \in [n]$. Then \begin{equation} \label{eqn:regular-W1-bound} \mathbf{W}^{1}[f] \leq \tfrac{2}{\pi} +O(\epsilon). \end{equation} Further, if $\mathbf{W}^{1}[f] \geq \tfrac{2}{\pi} &#8211; \epsilon$ then $f$ is $O(\sqrt{\epsilon})$-close to the LTF $\mathrm{sgn}(f^{=1})$.</p></blockquote>
<p>  Functions $f$ with $|\widehat{f}(i)| \leq \epsilon$ for all $i \in [n]$ are called <em>$(\epsilon,1)$-regular</em>; see Chapter 6. So the $\frac{2}{\pi}$ Theorem says (roughly speaking) that within the class of $(\epsilon,1)$-regular functions, the maximal degree-$1$ weight is $\frac{2}{\pi}$, and any function achieving this is an unbiased LTF. Further, from Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#thmltf-stab-error" target="_blank">14</a> we know that <em>all</em> unbiased LTFs which are $(\epsilon,1)$-regular achieve this. </p>
<blockquote><p><b>Remark 27</b> Since we have $\mathbf{Stab}_\rho[f] \approx \mathbf{W}^{1}[f] \rho$ and $\frac{2}{\pi} \arcsin \rho \approx \frac{2}{\pi} \rho$ when $\rho$ is small, the $\frac{2}{\pi}$ Theorem gives the Majority Is Stablest Theorem in the limit $\rho \to 0^+$. </p></blockquote>
<p><p>
Let&#8217;s now discuss how we&#8217;ll prove our two theorems about degree-$1$ weight. Let $f : \{-1,1\}^n \to \{0,1\}$ and $\alpha = \mathop{\bf E}[f]$; we think of $\alpha$ as small for the Level-1 Inequality and $\alpha = 1/2$ for the $\frac{2}{\pi}$ Theorem. By Plancherel, $\mathbf{W}^{1}[f] = \mathop{\bf E}[f({\boldsymbol{x}}) L({\boldsymbol{x}})]$, where \[ L(x) = f^{=1}(x) = \widehat{f}(1)x_1 + \cdots + \widehat{f}(n) x_n. \] To upper-bound $\mathop{\bf E}[f({\boldsymbol{x}}) L({\boldsymbol{x}})]$, consider that as ${\boldsymbol{x}}$ varies the real number $L({\boldsymbol{x}})$ may be rather large or small, but $f({\boldsymbol{x}})$ is always $0$ or $1$. Given that $f({\boldsymbol{x}})$ is $1$ on only a $\alpha$ fraction of ${\boldsymbol{x}}$&#8217;s, the &#8220;worst case&#8221; for $\mathop{\bf E}[f({\boldsymbol{x}}) L({\boldsymbol{x}})]$ would be if $f(x)$ were $1$ precisely on the $\alpha$ fraction of $x$&#8217;s where $L(x)$ is largest. In other words, \begin{equation} \label{eqn:talagrand1} \mathbf{W}^{1}[f] = \mathop{\bf E}[f({\boldsymbol{x}}) L({\boldsymbol{x}})] \leq \mathop{\bf E}[\boldsymbol{1}_{\{L({\boldsymbol{x}}) \geq t\}} \cdot L({\boldsymbol{x}})], \end{equation} where $t$ is chosen so that \begin{equation} \label{eqn:what-is-t} \mathop{\bf Pr}[L({\boldsymbol{x}}) \geq t] \approx \alpha. \end{equation} But now we can analyze \eqref{eqn:talagrand1} quite effectively using tools such as Hoeffding&#8217;s bound and the Central Limit Theorem, since $L({\boldsymbol{x}})$ is just a linear combination of independent $\pm 1$ random bits. In particular $L({\boldsymbol{x}})$ has mean $0$ and standard deviation $\sigma = \sqrt{\mathbf{W}^{1}[f]}$ so by the CLT it acts like the Gaussian $\boldsymbol{Z} \sim \mathrm{N}(0,\sigma^2)$, at least if we assume all $|\widehat{f}(i)|$ are small. If we are thinking of $\alpha = 1/2$, then $t = 0$ and we get \[ \sigma^2 = \mathbf{W}^{1}[f] \leq \mathop{\bf E}[\boldsymbol{1}_{\{L({\boldsymbol{x}}) \geq 0\}} \cdot L({\boldsymbol{x}})] \approx \mathop{\bf E}[\boldsymbol{1}_{\{\boldsymbol{Z} \geq 0\}} \cdot \boldsymbol{Z}] = \tfrac{1}{\sqrt{2\pi}} \sigma; \] This implies $\sigma^2 \lessapprox \tfrac{1}{2\pi}$, as claimed in the $\frac{2}{\pi}$ Theorem (after adjusting $f$&#8217;s range to $\{-1,1\}$). If we are instead thinking of $\alpha$ as small then \eqref{eqn:what-is-t} suggest taking $t \sim \sigma \sqrt{2\ln(1/\alpha)}$ so that $\mathop{\bf Pr}[\boldsymbol{Z} \geq t] \approx \alpha$. Then a calculation akin to the one in Proposition <a href="#propgiso-asympt">24</a> implies \[ \mathbf{W}^{1}[f] \leq \mathop{\bf E}[\boldsymbol{1}_{\{L({\boldsymbol{x}}) \geq t\}} \cdot L({\boldsymbol{x}})] \approx \alpha \cdot \sigma \sqrt{2\ln(1/\alpha)}, \] from which the Level-1 Inequality follows. In fact, we don&#8217;t even need all $|\widehat{f}(i)|$ small for this latter analysis; for large $t$ it&#8217;s possible to upper-bound \eqref{eqn:talagrand1} using only Hoeffding&#8217;s bound: </p>
<blockquote><p><b>Lemma 28</b> <a name="lemlinear-tail"></a> Let $\ell(x) = a_1 x_1 + \cdots + a_n x_n$, where $\sum_{i} a_i^2 = 1$. Then for any $s \geq 1$, \[ \mathop{\bf E}[\boldsymbol{1}_{\{|\ell({\boldsymbol{x}})| &gt; s\}} \cdot |\ell({\boldsymbol{x}})|] \leq (2s+2)\exp(-\tfrac{s^2}{2}). \] </p></blockquote>
<p> <br/><em>Proof:</em>  We have \begin{align*} \mathop{\bf E}[\boldsymbol{1}_{\{|\ell({\boldsymbol{x}})| &gt; s\}} \cdot |\ell({\boldsymbol{x}})|] &#038;= s\mathop{\bf Pr}[|\ell({\boldsymbol{x}})| &gt; s] + \int_{s}^\infty \mathop{\bf Pr}[|\ell({\boldsymbol{x}})| &gt; u]\,du \\ &#038;\leq 2s\exp(-\tfrac{s^2}{2}) + \int_{s}^\infty 2\exp(-\tfrac{u^2}{2})\,du, \end{align*} using Hoeffding&#8217;s bound. But for $s \geq 1$, \[ \int_{s}^\infty 2\exp(-\tfrac{u^2}{2})\,du \leq \int_{s}^\infty u \cdot 2\exp(-\tfrac{u^2}{2})\,du = 2\exp(-\tfrac{s^2}{2}). \Box \]</p>
<p>
We now give formal proofs of the two theorems, commenting that rather than $L(x)$ it&#8217;s more convenient to work with \[ \ell(x) = \tfrac{1}{\sigma} f^{=1}(x) = \tfrac{\widehat{f}(1)}{\sigma}x_1 + \cdots + \tfrac{\widehat{f}(n)}{\sigma}x_n. \]</p>
<p>
<br/><em>Proof of the Level-1 Inequality:</em>  Following Remark <a href="#remimproved-level-1">25</a> we let $f : \{-1,1\}^n \to [-1,1]$ and $\alpha = \mathop{\bf E}[|f|]$. We may assume $\sigma = \sqrt{\mathbf{W}^{1}[f]} &gt; 0$. Writing $\ell = \frac{1}{\sigma} f^{=1}$ we have $ \langle f, \ell \rangle = \tfrac{1}{\sigma} \langle f, f^{=1} \rangle = \tfrac{1}{\sigma} \mathbf{W}^{1}[f] = \sigma $ and hence \[ \sigma = \langle f, \ell \rangle = \mathop{\bf E}[\boldsymbol{1}_{\{|\ell({\boldsymbol{x}})| \leq s\}} \cdot f({\boldsymbol{x}}) \ell({\boldsymbol{x}})] + \mathop{\bf E}[\boldsymbol{1}_{\{|\ell({\boldsymbol{x}})| &gt; s\}} \cdot f({\boldsymbol{x}}) \ell({\boldsymbol{x}})] \] holds for any $s \geq 1$. The first expectation above is at most $\mathop{\bf E}[s |f({\boldsymbol{x}})|] = \alpha s$, and the second is at most $(2+2s)\exp(-s^2/2) \leq 4s\exp(-s^2/2)$ by Lemma <a href="#lemlinear-tail">28</a>. Hence \[ \sigma \leq \alpha s + 4s \exp(-s^2/2). \] The optimal choice of $s$ is $s = (\sqrt{2} + o_\alpha(1)) \sqrt{\ln(1/\alpha)}$, yielding \[ \sigma \leq (\sqrt{2} + o(1)) \alpha\sqrt{\ln(1/\alpha)}. \] Squaring this establishes the claim $\sigma^2 \leq (2+o_\alpha(1)) \alpha^2\ln(1/\alpha)$. $\Box$</p>
<p><p>
<br/><em>Proof of the $\frac{2}{\pi}$ Theorem:</em>  We may assume $\sigma = \sqrt{\mathbf{W}^{1}[f]} \geq 1/2$: for the theorem&#8217;s first statement this is because otherwise there is nothing to prove; for the theorem&#8217;s second statement this is because we may assume $\epsilon$ sufficiently small.</p>
<p>
 We start by proving \eqref{eqn:regular-W1-bound}. Let $\ell = \frac{1}{\sigma} f^{=1}$, so $\|\ell\|_2 = 1$ and $\widehat{\ell}(i) \leq 2\epsilon$ for all $i \in [n]$. We have \begin{equation} \label{eqn:ltf-test1} \sigma = \langle f, \ell \rangle \leq \mathop{\bf E}[|\ell|] \leq \sqrt{\tfrac{2}{\pi}} + C \epsilon \end{equation} for some constant $C$, where we used Theorem <a href="#thml1-be">13</a>. Squaring this proves \eqref{eqn:regular-W1-bound}. We observe that \eqref{eqn:regular-W1-bound} therefore holds even for $f : \{-1,1\}^n \to [-1,1]$.</p>
<p>
 Now suppose we also have $\mathbf{W}^{1}[f] \geq \tfrac{2}{\pi} &#8211; \epsilon$; i.e., \[ \sigma \geq \sqrt{\tfrac{2}{\pi} - \epsilon} \geq \sqrt{\tfrac{2}{\pi}} - 2\epsilon. \] Thus the first inequality in \eqref{eqn:ltf-test1} must be close to tight; specifically, \begin{equation} \label{eqn:ltf-test-contra} (C+2)\epsilon \geq \mathop{\bf E}[|\ell|] &#8211; \langle f, \ell\rangle = \mathop{\bf E}[(\mathrm{sgn}(\ell({\boldsymbol{x}})) - f({\boldsymbol{x}})) \cdot \ell({\boldsymbol{x}})]. \end{equation} By the Berry&#8211;Esseen Theorem (and Remark <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#remsimpler-BE" target="_blank">12</a> and an exercise), \[ \mathop{\bf Pr}[|\ell| \leq K\sqrt{\epsilon}] \leq \mathop{\bf Pr}[|\mathrm{N}(0,1)| \leq K\sqrt{\epsilon}] + .56 \cdot 2\epsilon \leq \tfrac{1}{\sqrt{2\pi}} \cdot 2K\sqrt{\epsilon} + 1.12 \epsilon \leq 2K\sqrt{\epsilon} \] for any constant $K \geq 1$. We therefore have the implication \begin{align*} \mathop{\bf Pr}[f \neq \mathrm{sgn}(\ell)] \geq 3K\sqrt{\epsilon} &#038;\Rightarrow \mathop{\bf Pr}[f({\boldsymbol{x}}) \neq \mathrm{sgn}(\ell({\boldsymbol{x}}))\ \wedge\ |\ell({\boldsymbol{x}})| &gt; K\sqrt{\epsilon}] \geq K \sqrt{\epsilon} \\ &#038;\Rightarrow \mathop{\bf E}[(\mathrm{sgn}(\ell({\boldsymbol{x}})) - f({\boldsymbol{x}})) \cdot \ell({\boldsymbol{x}})] \geq K\sqrt{\epsilon} \cdot 2(K\sqrt{\epsilon}) = 2K^2 \epsilon. \end{align*} This contradicts \eqref{eqn:ltf-test-contra} for $K = \sqrt{C+2}$, say. Thus $\mathop{\bf Pr}[f \neq \mathrm{sgn}(\ell)] \leq 3\sqrt{C+2}\sqrt{\epsilon}$, completing the proof. $\Box$</p>
<p>
For an interpolation between these two theorems, see the exercises.</p>
<p>
We conclude this section with an application of the Level-1 Inequality. First, a quick corollary which we leave for the exercises: </p>
<blockquote><p><b>Corollary 29</b> <a name="corlevel-1-quick"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ have $|\mathop{\bf E}[f]| \geq 1 &#8211; \delta \geq 0$. Then $\mathbf{W}^{1}[f] \leq 4\delta^2 \log(2/\delta)$. </p></blockquote>
<p> In Chapter <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=439" target="_blank">2.5</a> we stated the FKN Theorem, which says that if $f : \{-1,1\}^n \to \{-1,1\}$ has $\mathbf{W}^{1}[f] \geq 1 &#8211; \delta$ then it must be $O(\delta)$-close to a dictator or negated-dictator. The following theorem shows that once the FKN Theorem is proved, it can be strengthened to give an essentially optimal (see the exercises) closeness bound: </p>
<blockquote><p><b>Theorem 30</b> <a name="thmFKN-improvement"></a> Suppose the FKN Theorem holds with closeness bound $C\delta$, where $C \geq 1$ is a universal constant. Then in fact it holds with bound $\delta/4 + \eta$, where $\eta = 16C^2 \delta^2 \max(\log(1/C\delta),1)$. </p></blockquote>
<p> <br/><em>Proof:</em>  Suppose $f : \{-1,1\}^n \to \{-1,1\}$ has $\mathbf{W}^{1}[f] \geq 1 &#8211; \delta \geq 0$. By assumption $f$ is $C\delta$-close to $\pm \chi_i$ for some $i \in [n]$, say $i = n$. Thus we have \[ |\widehat{f}(n)| \geq 1 - 2C\delta \] and our task is to show that in fact $|\widehat{f}(n)| \geq 1 &#8211; \delta/2 &#8211; 2\eta$. We may assume $\delta \leq \frac{1}{10C}$ as otherwise $1 &#8211; \delta/2 &#8211; 2\eta &lt; 0$ (exercise) and there is nothing to prove. By employing the trick from Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=494#exFKN-0-and-1" target="_blank">2.42</a> we may also assume $\mathop{\bf E}[f] = 0$.</p>
<p>
 Consider the restriction of $f$ given by fixing coordinate $n$ to $b \in \{-1,1\}$; i.e., $f_{[n-1] \mid b}$. For both choices of $b$ we have $|\mathop{\bf E}[f_{[n-1] \mid b}]| \geq 1 &#8211; 2C\delta$ and so Corollary <a href="#corlevel-1-quick">29</a> implies $\mathbf{W}^{1}[f_{[n-1] \mid b}] \leq 16C^2\delta^2 \log(1/C\delta)$. Thus \[ 16C^2\delta^2 \log(1/C\delta) \geq \mathop{\bf E}_{\boldsymbol{b}}[\mathbf{W}^{1}[f_{[n-1] \mid \boldsymbol{b}}]] = \sum_{j &lt; n} (\widehat{f}(\{j\})^2 + \widehat{f}(\{j, n\})^2) \geq \sum_{j &lt; n} \widehat{f}(j)^2, \] by Corollary <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=560#corexpected-restrict-coeffs" target="_blank">3.21</a>. It follows that \[ \widehat{f}(n)^2 = \mathbf{W}^{1}[f] &#8211; \sum_{j&lt;n}\widehat{f}(j)^2 \geq 1- \delta &#8211; 16C^2\delta^2 \log(1/C\delta), \] and the proof is completed by the fact that \[ 1- \delta - 16C^2\delta^2 \log(1/C\delta) \geq (1-\delta/2 - 2\eta)^2 \] when $\delta \leq \frac{1}{10C}$ (exercise). $\Box$</p>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=885</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>§5.3: The Fourier coefficients of Majority</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=877</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=877#comments</comments>
		<pubDate>Fri, 06 Apr 2012 13:30:54 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[All chapter sections]]></category>
		<category><![CDATA[Chapter 5: Majority and threshold functions]]></category>
		<category><![CDATA[majority]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=877</guid>
		<description><![CDATA[<p> In this section we will analyze the Fourier coefficients of $\mathrm{Maj}_n$. In fact, we give an explicit formula for them in Theorem 16 below. But most of the time this formula is not too useful; instead, it&#8217;s better to understand the Fourier coefficients of $\mathrm{Maj}_n$ asymptotically as $n \to \infty$. </p> <p> Let&#8217;s begin [...]]]></description>
			<content:encoded><![CDATA[<p><a name="secmaj-coefficients"></a> In this section we will analyze the Fourier coefficients of $\mathrm{Maj}_n$. In fact, we give an explicit formula for them in Theorem <a href="#thmmaj-coeffs">16</a> below. But most of the time this formula is not too useful; instead, it&#8217;s better to understand the Fourier coefficients of $\mathrm{Maj}_n$ asymptotically as $n \to \infty$.<br />
<span id="more-877"></span></p>
<p>
Let&#8217;s begin with a few basic observations. First, $\mathrm{Maj}_n$ is a symmetric function and hence $\widehat{\mathrm{Maj}_n}(S)$ only depends on $|S|$ (Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=301#exgolomb" target="_blank">1.29</a>). Second, $\mathrm{Maj}_n$ is an odd function and hence $\widehat{\mathrm{Maj}_n}(S) = 0$ whenever $|S|$ is even (Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=301#exodd-even" target="_blank">1.9</a>). It remains to determine the Fourier coefficients $\widehat{\mathrm{Maj}_n}(S)$ for $|S|$ odd. By symmetry, $\widehat{\mathrm{Maj}_n}(S)^2 = \mathbf{W}^{k}[\mathrm{Maj}_n]/\binom{n}{k}$ for all $|S| = k$, so if we are content to know the magnitudes of $\mathrm{Maj}_n$&#8217;s Fourier coefficients, it suffices to determine the quantities $\mathbf{W}^{k}(\mathrm{Maj}_n)$.</p>
<p>
In fact, for each $k \in {\mathbb N}$ the quantity $\mathbf{W}^{k}(\mathrm{Maj}_n)$ converges to a fixed constant as $n \to \infty$. We can deduce this using our analysis of the noise stability of majority. From the previous section we know that for all $|\rho| \leq 1$, \begin{equation} \label{eqn:maj-stab-series} \lim_{n \to \infty} \mathbf{Stab}_\rho[\mathrm{Maj}_n] = \tfrac{2}{\pi} \arcsin \rho = \tfrac{2}{\pi}\Bigl(\rho + \tfrac16 \rho^3 + \tfrac{3}{40} \rho^5 + \tfrac{5}{112} \rho^7 + \cdots \Bigr), \end{equation} where we have used the power series for $\arcsin$, \begin{equation} \label{eqn:arcsin} \arcsin z = \sum_{k \text{ odd}} \ \frac{2}{k2^k} \binom{k-1}{\frac{k-1}{2}} \cdot z^k, \end{equation} valid for $|\rho| \leq 1$ (see the exercises). Comparing \eqref{eqn:maj-stab-series} with the formula \[ \mathbf{Stab}_\rho[\mathrm{Maj}_n] = \sum_{k \geq 0} \mathbf{W}^{k}[\mathrm{Maj}_n] \cdot \rho^k \] suggests the following: For each fixed $k \in {\mathbb N}$, \begin{equation} \label{eqn:maj-one-function} \lim_{n \to \infty} \mathbf{W}^{k}[\mathrm{Maj}_n] = [\rho^k] (\tfrac{2}{\pi} \arcsin \rho) = \begin{cases} \frac{4}{\pi k2^k} \binom{k-1}{\frac{k-1}{2}} &#038; \text{if $k$ odd,} \\ 0 &#038; \text{if $k$ even.} \end{cases} \end{equation}   (Here $[z^k]F(z)$ denotes the coefficient on $z^k$ in power series $F(z)$.) Indeed, we prove this identity below in Theorem <a href="#thmmaj-asymptotic-weight">19</a>. The noise stability method which suggests it can also be made formal (see the exercises).</p>
<p>
Identity \eqref{eqn:maj-one-function} is one way to formulate precisely the statement that the &#8220;Fourier spectrum of $\mathrm{Maj}_n$ converges&#8221;. Introducing notation such as &#8220;$\mathbf{W}^{k}(\mathrm{Maj})$&#8221; for the quantity in \eqref{eqn:maj-one-function}, we have the further asymptotics \begin{equation} \label{eqn:maj-asympt-asympt} \begin{aligned} \text{for $k$ odd,} \qquad \mathbf{W}^{k}(\mathrm{Maj}) &#038;\sim \left(\tfrac{2}{\pi}\right)^{3/2} k^{-3/2},\\ \mathbf{W}^{&gt;k}(\mathrm{Maj}) &#038;\sim \left(\tfrac{2}{\pi}\right)^{3/2} k^{-1/2} \qquad \text{as $k \to \infty$}. \end{aligned} \end{equation} (You are asked to show the above in the exercises.) The estimates \eqref{eqn:maj-asympt-asympt}, together with the precise value $\mathbf{W}^{1}(\mathrm{Maj}) = \tfrac{2}{\pi}$, are usually all you need to know about the Fourier coefficients of majority.</p>
<p>
Nevertheless, let&#8217;s now compute the Fourier coefficients of $\mathrm{Maj}_n$ exactly. </p>
<blockquote><p><b>Theorem 16</b> <a name="thmmaj-coeffs"></a> If $|S|$ is even then $\widehat{\mathrm{Maj}_n}(S) = 0$. If $|S| = k$ is odd, \[ \widehat{\mathrm{Maj}_n}(S) = (-1)^{\frac{k-1}{2}} \frac{\binom{\frac{n-1}{2}}{\frac{k-1}{2}}}{\binom{n-1}{k-1}} \cdot \tfrac{2}{2^n} {\textstyle \binom{n-1}{\frac{n-1}{2}}}. \] </p></blockquote>
<p> <br/><em>Proof:</em>  The first statement holds because $\mathrm{Maj}_n$ is an odd function; henceforth we assume $|S| = k$ is odd. The trick will be to compute the Fourier expansion of majority&#8217;s <em>derivative</em> $\mathrm{D}_n \mathrm{Maj}_n = \mathrm{Half}_{n-1} : \{-1,1\}^{n-1} \to \{0,1\}$, the $0$-$1$ indicator of the set of $(n-1)$-bit strings with exactly half of their coordinates equal to $-1$. By the derivative formula and the fact that $\mathrm{Maj}_n$ is symmetric, $\widehat{\mathrm{Maj}_n}(S) = \widehat{\mathrm{Half}_{n-1}}(T)$ for any $T \subseteq [n-1]$ with $|T| = k-1$. So writing $n-1 = 2m$ and $k-1 = 2j$, it suffices to show \begin{equation} \label{eqn:half-formula} \widehat{\mathrm{Half}_{2m}}([2j]) = (-1)^{j} \frac{\binom{m}{j}}{\binom{2m}{2j}}\cdot \tfrac{1}{2^{2m}}{\textstyle \binom{2m}{m}}. \end{equation}</p>
<p>
By the probabilistic definition of $\mathrm{T}_\rho$, for any $\rho \in [-1,1]$ we have \[ \mathrm{T}_\rho \mathrm{Half}_{2m}(1, 1, \dots, 1) = \mathop{\bf E}_{{\boldsymbol{x}} \sim N_\rho((1, 1, \dots, 1))}[\mathrm{Half}_{2m}({\boldsymbol{x}})] = \mathop{\bf Pr}[{\boldsymbol{x}} \text{ has $m$ $1$'s and $m$ $-1$'s}], \] where each coordinate of ${\boldsymbol{x}}$ is $1$ with probability $\tfrac{1}{2} + \tfrac{1}{2} \rho$. Thus \begin{equation} \label{eqn:half1} \mathrm{T}_\rho \mathrm{Half}_{2m}(1, 1, \dots, 1) = {\textstyle \binom{2m}{m}}(\tfrac{1}{2} + \tfrac{1}{2} \rho)^{m} (\tfrac{1}{2} &#8211; \tfrac{1}{2} \rho)^{m} = \tfrac{1}{2^{2m}} {\textstyle \binom{2m}{m}} (1-\rho^2)^m. \end{equation} On the other hand, by the Fourier formula for $\mathrm{T}_\rho$ and the fact that $\mathrm{Half}_{2m}$ is symmetric we have \begin{equation} \label{eqn:half2} \mathrm{T}_\rho \mathrm{Half}_{2m}(1, 1, \dots, 1) = \sum_{U \subseteq [2m]} \widehat{\mathrm{Half}_{2m}}(U) \rho^{|U|} = \sum_{i=0}^{2m} {\textstyle \binom{2m}{i}} \widehat{\mathrm{Half}_{2m}}([i]) \rho^{i}. \end{equation} Since we have equality $\eqref{eqn:half1} = \eqref{eqn:half2}$ between two degree-$2m$ polynomials of $\rho$ on all of $[-1,1]$, we can equate coefficients. In particular, for $i = 2j$ we have \[ {\textstyle \binom{2m}{2j}} \widehat{\mathrm{Half}_{2m}}([2j]) = \tfrac{1}{2^{2m}} {\textstyle \binom{2m}{m}} \cdot [\rho^{2j}](1-\rho^2)^m = \tfrac{1}{2^{2m}} {\textstyle \binom{2m}{m}} \cdot (-1)^j {\textstyle \binom{m}{j}}, \] confirming \eqref{eqn:half-formula}. $\Box$</p>
<p>
You are asked to prove the following corollaries in the exercises: </p>
<blockquote><p><b>Corollary 17</b> <a name="cormaj-coeffs-symm"></a> $\widehat{\mathrm{Maj}_n}(S) = \widehat{\mathrm{Maj}_n}(T)$ whenever $|S| + |T| = n+1$. Hence also $\mathbf{W}^{n-k+1}[\mathrm{Maj}_n] = \frac{k}{n-k+1} \mathbf{W}^{k}[\mathrm{Maj}_n]$. </p></blockquote>
<blockquote><p><b>Corollary 18</b> <a name="cormaj-weight-decreasing"></a> For any odd $k$, $\mathbf{W}^{k}[\mathrm{Maj}_n]$ is a strictly decreasing function of $n$ (for $n \geq k$ odd). </p></blockquote>
<p> We can now prove the identity \eqref{eqn:maj-one-function}: </p>
<blockquote><p><b>Theorem 19</b> <a name="thmmaj-asymptotic-weight"></a> For each fixed odd $k$, \[ \mathbf{W}^{k}[\mathrm{Maj}_n] \searrow [\rho^k] (\tfrac{2}{\pi} \arcsin \rho) = \tfrac{4}{\pi k2^k} {\textstyle \binom{k-1}{\frac{k-1}{2}}} \] as $n \geq k$ tends to $\infty$ (through the odd numbers). Further, we have the error bound \begin{equation} \label{eqn:maj-weight-err} [\rho^k] (\tfrac{2}{\pi} \arcsin \rho) \leq \mathbf{W}^{k}[\mathrm{Maj}_n] \leq (1+2k/n) \cdot [\rho^k] (\tfrac{2}{\pi} \arcsin \rho) \end{equation} for all $k &lt; n/2$. (For $k &gt; n/2$ one can use Corollary <a href="#cormaj-coeffs-symm">17</a>.) </p></blockquote>
<p> <br/><em>Proof:</em>  Corollary <a href="#cormaj-weight-decreasing">18</a> tells us that $\mathbf{W}^{k}[\mathrm{Maj}_n]$ is decreasing in $n$; hence we only need to justify \eqref{eqn:maj-weight-err}. Using the formula from Theorem <a href="#thmmaj-coeffs">16</a> we have \[ \frac{\mathbf{W}^{k}[\mathrm{Maj}_n]}{[\rho^k] (\tfrac{2}{\pi} \arcsin \rho)} = \frac{\binom{n}{k}\tfrac{4}{2^{2n}}\binom{n-1}{\frac{n-1}{2}}^2\left.\binom{\frac{n-1}{2}}{\frac{k-1}{2}}^2/\binom{n-1}{k-1}^2\right.}{\frac{4}{\pi k2^k} \binom{k-1}{\frac{k-1}{2}}} = \tfrac{\pi}{2} n \cdot 2^{k-n}{\textstyle \binom{n-k}{\frac{n-k}{2}}} \cdot 2^{1-n}{\textstyle \binom{n-1}{\frac{n-1}{2}}}, \] where the second identity is verified by expanding all binomial coefficients to factorials. By Stirling&#8217;s approximation we have $2^{-m}\binom{m}{m/2} \nearrow \sqrt{\frac{2}{\pi m}}$, meaning that the ratio of the left side to the right side increases to $1$ as $m \to \infty$. Thus \[ \frac{\mathbf{W}^{k}[\mathrm{Maj}_n]}{[\rho^k] (\tfrac{2}{\pi} \arcsin \rho)} \nearrow \frac{n}{\sqrt{n-k}\sqrt{n-1}} = (1-\tfrac{k+1}{n} +\tfrac{k}{n^2})^{-1/2}, \] and the right-hand side is at most $1+2k/n$ for $1 \leq k \leq n/2$ (an exercise). $\Box$</p>
<p> Finally, we can deduce the asymptotics \eqref{eqn:maj-asympt-asympt} from this theorem (as you are asked in the exercises): </p>
<blockquote><p><b>Corollary 20</b> <a name="cormaj-asympt-asympt"></a> Let $k \in {\mathbb N}$ be odd and assume $n = n(k) \geq 2k^2$. Then \begin{align*} \mathbf{W}^{k}(\mathrm{Maj}_n) &#038;= \left(\tfrac{2}{\pi}\right)^{3/2} k^{-3/2} \cdot (1\pm O(1/k)), \\ \mathbf{W}^{&gt;k}(\mathrm{Maj}_n) &#038;= \left(\tfrac{2}{\pi}\right)^{3/2} k^{-1/2} \cdot (1\pm O(1/k)), \end{align*} and hence the Fourier spectrum of $\mathrm{Maj}_n$ is $\epsilon$-concentrated on degree up to $\frac{8}{\pi^3} \epsilon^{-2} + O_{\epsilon}(1)$. </p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=877</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>§5.2: Majority, and the Central Limit Theorem</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=866</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=866#comments</comments>
		<pubDate>Fri, 30 Mar 2012 21:16:30 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[All chapter sections]]></category>
		<category><![CDATA[Chapter 5: Majority and threshold functions]]></category>
		<category><![CDATA[Berry--Esseen Theorem]]></category>
		<category><![CDATA[Central Limit Theorem]]></category>
		<category><![CDATA[majority]]></category>
		<category><![CDATA[Majority Is Stablest Theorem]]></category>
		<category><![CDATA[Sheppard's Formula]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=866</guid>
		<description><![CDATA[<p> Majority is one of the more important functions in boolean analysis and its study motivates the introduction of one of the more important tools: the Central Limit Theorem (CLT). In this section we will show how the CLT can be used to estimate the total influence and the noise stability of $\mathrm{Maj}_n$. Though we [...]]]></description>
			<content:encoded><![CDATA[<p> <a name="secmajority"></a> Majority is one of the more important functions in boolean analysis and its study motivates the introduction of one of the more important tools: the Central Limit Theorem (CLT).<br />
<span id="more-866"></span><br />
In this section we will show how the CLT can be used to estimate the total influence and the noise stability of $\mathrm{Maj}_n$. Though we already determined $\mathbf{I}[\mathrm{Maj}_n] \sim \sqrt{2/\pi}\sqrt{n}$ in Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=465#exmaj-influences" target="_blank">2.18</a> using binomial coefficients and Stirling&#8217;s formula, computations using the CLT are more flexible and extend to other linear threshold functions.</p>
<p>
We begin with a reminder about the Central Limit Theorem.									 Suppose $\boldsymbol{X}_1, \dots, \boldsymbol{X}_n$ are independent random variables and $\boldsymbol{S} = \boldsymbol{X}_1 + \cdots + \boldsymbol{X}_n$. Roughly speaking, the CLT says that so long as no $\boldsymbol{X}_i$ is too dominant in terms of variance, the distribution of $\boldsymbol{S}$ is close to that of a Gaussian random variable with the same mean and variance. We give a precise statement below in the form of the <em>Berry&#8211;Esseen Theorem</em>. The CLT also extends to the <em>multidimensional</em> case (sums of independent random vectors); we give a precise statement in the exercises. In Chapter 13 we will show one way to prove such Central Limit Theorems.</p>
<p>
											Let&#8217;s see how we can use the CLT to obtain the estimate $\mathbf{I}[\mathrm{Maj}_n] \sim \sqrt{2/\pi}\sqrt{n}$. Recall the proof of Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=368#thmmaj-maximizes-deg-1-sum" target="_blank">2.32</a>, which shows that $\mathrm{Maj}_n$ maximizes $\sum_{i=1}^n \widehat{f}(i)$ among all $f : \{-1,1\}^n \to \{-1,1\}$. In it we saw that \begin{equation} \label{eqn:sum-deg-1-maj} \mathbf{I}[\mathrm{Maj}_n] = \sum_{i=1}^n \widehat{\mathrm{Maj}_n}(i) = \mathop{\bf E}_{{\boldsymbol{x}}}[\mathrm{Maj}_n({\boldsymbol{x}})(\mathop{{\textstyle \sum}}_i {\boldsymbol{x}}_i)] = \mathop{\bf E}_{{\boldsymbol{x}}}[|\mathop{{\textstyle \sum}}_i {\boldsymbol{x}}_i|]. \end{equation} When using the Central Limit Theorem, it&#8217;s convenient to define majority (equivalently) as \[ 	\mathrm{Maj}_n(x) = \mathrm{sgn}\Bigl(\mathop{{\textstyle \sum}}_{i=1}^n \tfrac{1}{\sqrt{n}} x_i\Bigr). \] This motivates writing \eqref{eqn:sum-deg-1-maj} as \begin{equation} \label{eqn:sum-deg-1-maj2} \mathbf{I}[\mathrm{Maj}_n] = \sqrt{n} \cdot \mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[|\mathop{{\textstyle \sum}}_i \tfrac{1}{\sqrt{n}}{\boldsymbol{x}}_i|]. \end{equation} If we introduce $\boldsymbol{S} = \sum_{i=1}^n \tfrac{1}{\sqrt{n}} {\boldsymbol{x}}_i$, then $\boldsymbol{S}$ has mean $0$ and variance $\sum_i (1/\sqrt{n})^2 = 1$. Thus the CLT tells us that the distribution of $\boldsymbol{S}$ is close (for large $n$) to that of a standard Gaussian, $\boldsymbol{Z} \sim \mathrm{N}(0,1)$. So as $n \to \infty$ we have \begin{equation} \label{eqn:abs-first-moment-gaussian} \mathop{\bf E}_{{\boldsymbol{x}}}[|\boldsymbol{S}|] \sim \mathop{\bf E}_{\boldsymbol{Z} \sim \mathrm{N}(0,1)}[|\boldsymbol{Z}|] = 2\int_{0}^\infty z \cdot \tfrac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = -\sqrt{2/\pi} e^{-z^2/2}\Bigm|_{0}^\infty = \sqrt{2/\pi}, \end{equation} which when combined with \eqref{eqn:sum-deg-1-maj2} gives us the estimate $\mathbf{I}[\mathrm{Maj}_n] \sim \sqrt{2/\pi} \sqrt{n}$. 											 To make this kind of estimate more precise we state the Berry&#8211;Esseen Theorem, which is a strong version of the CLT giving explicit error bounds rather than just limiting statements. </p>
<blockquote><p><b>Berry&#8211;Esseen (Central Limit) Theorem</b>  Let $\boldsymbol{X}_1, \dots, \boldsymbol{X}_n$ be independent random variables with $\mathop{\bf E}[\boldsymbol{X}_i] = 0$ and $\mathop{\bf Var}[\boldsymbol{X}_i] = \sigma_i^2$, and assume $\sum_{i=1}^n \sigma_i^2 = 1$. Let $\boldsymbol{S} = \sum_{i=1}^n \boldsymbol{X}_i$ and let $\boldsymbol{Z} \sim \mathrm{N}(0,1)$ be a standard Gaussian. Then for all $u \in {\mathbb R}$, \[ |\mathop{\bf Pr}[\boldsymbol{S} \leq u] &#8211; \mathop{\bf Pr}[\boldsymbol{Z} \leq u]| \leq B \beta, \] where \[ \beta = \sum_{i=1}^n \|\boldsymbol{X}_i\|_3^3 \] and $B$ is a universal constant. (For definiteness, $B = .56$ is acceptable.)  </p></blockquote>
<blockquote><p><b>Remark 12</b> <a name="remsimpler-BE"></a> If all of the $\boldsymbol{X}_i$&#8217;s satisfy $|\boldsymbol{X}_i| \leq \epsilon$ with probability $1$ then we can use the bound \[ \beta = \sum_{i=1}^n \mathop{\bf E}[|\boldsymbol{X}_i|^3] \leq \epsilon \cdot \sum_{i=1}^n \mathop{\bf E}[|\boldsymbol{X}_i|^2] = \epsilon \cdot \sum_{i=1}^n \sigma_i^2 = \epsilon. \] </p></blockquote>
<p>The exercises contain additional observations about this theorem.
<p>
Our most frequent use of the Berry&#8211;Esseen Theorem will be in analyzing random sums \[ \boldsymbol{S} = \sum_{i=1}^n a_i {\boldsymbol{x}}_i, \] where ${\boldsymbol{x}} \sim \{-1,1\}^n$ and the constants $a_i \in {\mathbb R}$ are normalized so that $\sum_i a_i^2 = 1$. For majority, all of the $a_i$&#8217;s were equal to $\tfrac{1}{\sqrt{n}}$. But from Remark <a href="#remsimpler-BE">12</a> we see that $\boldsymbol{S}$ is close in distribution to a standard Gaussian so long as each $|a_i|$ is small. For example, in the exercises you are asked to show the following: </p>
<blockquote><p><b>Theorem 13</b> <a name="thml1-be"></a> Let $a_1, \dots, a_n \in {\mathbb R}$ satisfy $\sum_i a_i^2 = 1$ and $|a_i| \leq \epsilon$ for all $i$. Then \[ \left|\mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[|\mathop{{\textstyle \sum}}_i a_i {\boldsymbol{x}}_i|] &#8211; \sqrt{2/\pi}\right| \leq C \epsilon, \] where $C$ is a universal constant. </p></blockquote>
<p> Theorem <a href="#thml1-be">13</a> justifies \eqref{eqn:abs-first-moment-gaussian} with an error bound of $O(1/\sqrt{n})$, yielding the more precise estimate $\mathbf{I}[\mathrm{Maj}_n] = \sqrt{2/\pi}\sqrt{n} \pm O(1)$ (cf. Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=465#exmaj-influences" target="_blank">2.18</a> which gives an even better error bound).</p>
<p>
 Now let&#8217;s turn to the noise stability of majority. Theorem <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=406#thmmaj-stab" target="_blank">2.44</a> stated the formula \begin{equation} \label{eqn:re-maj-stab} 	\lim_{n \to \infty} \mathbf{Stab}_\rho[\mathrm{Maj}_n] = \tfrac{2}{\pi} \arcsin \rho. \end{equation} Let&#8217;s now spend some time justifying this using the multidimensional CLT. (For complete details, see the exercises.) By definition, \begin{equation} \label{eqn:stab-maj-justify1} 	\mathbf{Stab}_\rho[\mathrm{Maj}_n] = \mathop{\bf E}_{\substack{({\boldsymbol{x}}, \boldsymbol{y}) \\ {\text{ $\rho$-correlated}}}}[\mathrm{Maj}_n({\boldsymbol{x}}) \cdot \mathrm{Maj}_n(\boldsymbol{y})] = \mathop{\bf E}_{{\substack{({\boldsymbol{x}}, \boldsymbol{y}) \\ {\text{ $\rho$-correlated}}}}}[\mathrm{sgn}(\mathop{{\textstyle \sum}}_i \tfrac{1}{\sqrt{n}} {\boldsymbol{x}}_i) \cdot \mathrm{sgn}(\mathop{{\textstyle \sum}}_i \tfrac{1}{\sqrt{n}} \boldsymbol{y}_i)]. \end{equation} For each $i \in [n]$ let&#8217;s stack $\tfrac{1}{\sqrt{n}} {\boldsymbol{x}}_i$ and $\tfrac{1}{\sqrt{n}} \boldsymbol{y}_i$ into a $2$-dimensional vector and then write \begin{equation} \label{eqn:Svec-cov} \vec{\boldsymbol{S}} = \sum_{i=1}^n \begin{bmatrix} \tfrac{1}{\sqrt{n}}{\boldsymbol{x}}_i \\ \tfrac{1}{\sqrt{n}}\boldsymbol{y}_i \end{bmatrix} \in {\mathbb R}^2. \end{equation} We are summing $n$ independent random vectors, so the multidimensional Central Limit Theorem tells us that the distribution of $\vec{\boldsymbol{S}}$ is close to that of 											a $2$-dimensional Gaussian $\vec{\boldsymbol{Z}}$ with the same mean and covariance matrix, namely  \[ \vec{\boldsymbol{Z}} \sim \mathrm{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 &#038; \rho \\ \rho &#038; 1 \end{bmatrix}\right). \] Continuing from \eqref{eqn:stab-maj-justify1}, \begin{align*} \mathbf{Stab}_\rho[\mathrm{Maj}_n] &#038;= \mathop{\bf E}[\mathrm{sgn}(\vec{\boldsymbol{S}}_1)\cdot \mathrm{sgn}(\vec{\boldsymbol{S}}_2)] \\ &#038;= \mathop{\bf Pr}[\mathrm{sgn}(\vec{\boldsymbol{S}}_1) = \mathrm{sgn}(\vec{\boldsymbol{S}}_2)] &#8211; \mathop{\bf Pr}[\mathrm{sgn}(\vec{\boldsymbol{S}}_1) \neq \mathrm{sgn}(\vec{\boldsymbol{S}}_2)] \\ &#038;= 2\mathop{\bf Pr}[\mathrm{sgn}(\vec{\boldsymbol{S}}_1) = \mathrm{sgn}(\vec{\boldsymbol{S}}_2)] &#8211; 1 = 4\mathop{\bf Pr}[\vec{\boldsymbol{S}} \in Q_{++}] &#8211; 1, \end{align*} where $Q_{++}$ denotes the upper-right quadrant of ${\mathbb R}^2$ and the last step uses the symmetry $\mathop{\bf Pr}[\vec{\boldsymbol{S}} \in Q_{++}] = \mathop{\bf Pr}[\vec{\boldsymbol{S}} \in Q_{--}]$. The $2$-dimensional Central Limit Theorem lets us deduce \[ \lim_{n \to \infty} \mathop{\bf Pr}[\vec{\boldsymbol{S}} \in Q_{++}] = \mathop{\bf Pr}[\vec{\boldsymbol{Z}} \in Q_{++}]. \] So to justify the noise stability formula \eqref{eqn:re-maj-stab} for majority, it remains to verify \[ 4\mathop{\bf Pr}[\vec{\boldsymbol{Z}} \in Q_{++}] &#8211; 1 = \tfrac{2}{\pi} \arcsin \rho \quad\Leftrightarrow\quad \mathop{\bf Pr}[\vec{\boldsymbol{Z}} \in Q_{++}] = \tfrac14 + \tfrac{1}{2\pi} \arcsin \rho. \] And this is turn is an old identity known as <em>Sheppard&#8217;s Formula</em> (which you are asked to prove in the exercises).<br />
<blockquote><b>Sheppard&#8217;s Formula</b>  Let $\boldsymbol{z}_1$, $\boldsymbol{z}_2$ be standard Gaussian random variables with correlation $\mathop{\bf E}[\boldsymbol{z}_1 \boldsymbol{z}_2] = \rho \in [-1,1]$. Then \[ \mathop{\bf Pr}[\boldsymbol{z}_1 &gt; 0, \boldsymbol{z}_2 &gt; 0] = \tfrac14 + \tfrac{1}{2\pi} \arcsin \rho. \] </p></blockquote>
<p>
This completes the justification of the formula $\mathbf{Stab}_\rho[\mathrm{Maj}_n] \to \tfrac{2}{\pi} \arcsin \rho$. You may have noticed that once we applied the $2$-dimensional Central Limit Theorem to \eqref{eqn:stab-maj-justify1}, the remainder of the derivation had nothing to do with majority. In fact, the same analysis works for <em>any</em> linear threshold function $\mathrm{sgn}(a_1 x_1 + \cdots + a_n x_n)$, the only difference being the &#8220;error term&#8221; arising from the CLT. As in Theorem <a href="#thml1-be">13</a>, this error is small so long as no coefficient $a_i$ is too dominant: </p>
<blockquote><p><b>Theorem 14</b> <a name="thmltf-stab-error"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ be an unbiased LTF, $f(x) = \mathrm{sgn}(a_1 x_1 + \cdots + a_n x_n)$ with $\sum_i a_i^2 = 1$ and $|a_i| \leq \epsilon$ for all $i$. Then for any $\rho \in (-1,1)$, \[ \Bigl|\mathbf{Stab}_\rho[f] &#8211; \tfrac{2}{\pi} \arcsin \rho\Bigr| \leq O\Bigl(\tfrac{\epsilon}{\sqrt{1-\rho^2}}\Bigr). \] </p></blockquote>
<p> You are asked to prove Theorem <a href="#thmltf-stab-error">14</a> in the exercises. In the particular case of $\mathrm{Maj}_n$ where $a_i = \tfrac{1}{\sqrt{n}}$ for all $i$ we can make a slightly stronger claim (again, see the exercises): </p>
<blockquote><p><b>Theorem 15</b> <a name="thmmaj-stab-precise"></a> For any $\rho \in [0,1)$, $\mathbf{Stab}_\rho[\mathrm{Maj}_n]$ is a decreasing function of $n$, with \[ \tfrac{2}{\pi} \arcsin \rho \leq \mathbf{Stab}_\rho[\mathrm{Maj}_n] \leq \tfrac{2}{\pi} \arcsin \rho + O\Bigl(\tfrac{1}{\sqrt{1-\rho^2}\sqrt{n}}\Bigr). \] </p></blockquote>
<p><p>
We end this section by mentioning another way in which the majority function is extremal &#8212; among all unbiased functions with small influences, it has (essentially) the largest noise stability: </p>
<blockquote><p><b>Majority Is Stablest Theorem (simplified)</b>  Fix $\rho \in (0,1)$. Then for any $f : \{-1,1\}^n \to \{-1,1\}$ with $\mathop{\bf E}[f] = 0$ and $\mathbf{MaxInf}[f] \leq \tau$, \[ \mathbf{Stab}_\rho[f] \leq \tfrac{2}{\pi} \arcsin \rho + o_\tau(1). \]</p></blockquote>
<p>    For sufficiently small $\rho$, we&#8217;ll prove this in Section 4 of this chapter. The proof of the full Majority Is Stablest Theorem will have to wait until Chapter 13. 																		 		</p>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=866</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>§5.1: Linear threshold functions and polynomial threshold functions</title>
		<link>http://www.contrib.andrew.cmu.edu/~ryanod/?p=856</link>
		<comments>http://www.contrib.andrew.cmu.edu/~ryanod/?p=856#comments</comments>
		<pubDate>Wed, 21 Mar 2012 15:22:09 +0000</pubDate>
		<dc:creator>Ryan O'Donnell</dc:creator>
				<category><![CDATA[All chapter sections]]></category>
		<category><![CDATA[Chapter 5: Majority and threshold functions]]></category>
		<category><![CDATA[Chow parameters]]></category>
		<category><![CDATA[Gotsman--Linial Theorem]]></category>
		<category><![CDATA[linear threshold function]]></category>
		<category><![CDATA[polynomial threshold function]]></category>

		<guid isPermaLink="false">http://www.contrib.andrew.cmu.edu/~ryanod/?p=856</guid>
		<description><![CDATA[<p> </p> <p> Recall from Chapter 2.1 that a linear threshold function (abbreviated LTF) is a boolean-valued function $f : \{-1,1\}^n \to \{-1,1\}$ that can be represented as \begin{equation} \label{eqn:generic-LTF} f(x) = \mathrm{sgn}(a_0 + a_1 x_1 + \cdots + a_n x_n) \end{equation} for some constants $a_0, a_1, \dots, a_n \in {\mathbb R}$. (For definiteness we&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p> <a name="secltfs-ptfs"></a></p>
<p>
 Recall from Chapter <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=344" target="_blank">2.1</a> that a linear threshold function (abbreviated LTF) is a boolean-valued function $f : \{-1,1\}^n \to \{-1,1\}$ that can be represented as \begin{equation} \label{eqn:generic-LTF} f(x) = \mathrm{sgn}(a_0 + a_1 x_1 + \cdots + a_n x_n) \end{equation} for some constants $a_0, a_1, \dots, a_n \in {\mathbb R}$.<br />
<span id="more-856"></span><br />
(For definiteness we&#8217;ll take $\mathrm{sgn}(0) = 1$. If we&#8217;re using the representation $f : \{-1,1\}^n \to \{0,1\}$ then $f$ is an LTF if it can be represented as $f(x) = 1_{\{a_0 + a_1 x_1 + \cdots + a_n x_n &gt; 0\}}$.) Examples include majority, AND, OR, dictators, and decision lists (Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=602#exdecision-list" target="_blank">3.23</a>). Besides representing &#8220;weighted majority&#8221; voting schemes, LTFs play an important role in learning theory and in circuit complexity.</p>
<p>
There is also a geometric perspective on LTFs. Writing $\ell(x) = a_0 + a_1 x_1 + \cdots + a_n x_n$, we can think of $\ell$ as an affine function ${\mathbb R}^n \to {\mathbb R}$. Then $\mathrm{sgn}(\ell(x))$ is the $\pm 1$-indicator of a <em>halfspace</em> in ${\mathbb R}^n$. A boolean LTF is thus the restriction of such a halfspace-indicator to the discrete cube $\{-1,1\}^n \subset {\mathbb R}^n$. Equivalently, a function $f : \{-1,1\}^n \to \{-1,1\}$ is an LTF if and only if it has a &#8220;linear separator&#8221;; i.e., a hyperplane in ${\mathbb R}^n$ which separates the points $f$ labels $1$ from the points $f$ labels $-1$.</p>
<p>
An LTF $f : \{-1,1\}^n \to \{-1,1\}$ can have several different representations as in \eqref{eqn:generic-LTF} &#8212; in fact it always has infinitely many. This is clear from the geometric viewpoint; any small enough perturbation to a linear separator will not change the way it partitions the discrete cube. Because we can make these perturbations, we may ensure that $a_0 + a_1 x_1 + \cdots + a_n x_n \neq 0$ for every $x \in \{-1,1\}^n$. We&#8217;ll usually insist that LTF representations have this property so that the nuisance of $\mathrm{sgn}(0)$ doesn&#8217;t arise. We also observe that we can scale all of the coefficients in an LTF representation by the same positive constant without changing the LTF. These observations can be used to show it&#8217;s always possible to take the $a_i$&#8217;s to be integers (see the exercises). However we will most often scale so that $\sum_{i=1}^n a_i^2 = 1$; this is convenient when using the Central Limit Theorem. The most elegant result connecting LTFs and Fourier expansions is Chow&#8217;s Theorem, which says that a boolean LTF is completely determined by its degree-$0$ and degree-$1$ Fourier coefficients. In fact, it&#8217;s determined not just within the class of LTFs but within the class of all boolean functions: </p>
<blockquote><p><b>Theorem 1</b> <a name="thmchow"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ be an LTF and let $g : \{-1,1\}^n \to \{-1,1\}$ be any function. If $\widehat{g}(S) = \widehat{f}(S)$ for all $|S| \leq 1$ then $g = f$. </p></blockquote>
<p> <br/><em>Proof:</em>  Let $f(x) = \mathrm{sgn}(\ell(x))$, where $\ell : \{-1,1\}^n \to {\mathbb R}$ has degree at most $1$ and is never $0$ on $\{-1,1\}^n$. For any $x \in \{-1,1\}^n$ we have $f(x)\ell(x) = |\ell(x)| \geq g(x)\ell(x)$, with equality if and only if $f(x) = g(x)$ (here we use $\ell(x) \neq 0$). Using this observation along with Plancherel&#8217;s Theorem (twice) we have \[ \sum_{|S| \leq 1} \widehat{f}(S) \widehat{\ell}(S) = \mathop{\bf E}[f({\boldsymbol{x}})\ell({\boldsymbol{x}})] \geq \mathop{\bf E}[g({\boldsymbol{x}}) \ell({\boldsymbol{x}})] = \sum_{|S| \leq 1} \widehat{g}(S) \widehat{\ell}(S). \] But by assumption, the left-hand and right-hand sides above are equal. Thus the inequality must be an equality for every choice of ${\boldsymbol{x}}$; i.e., $f(x) = g(x)$ $\forall x$. $\Box$</p>
<p>   In light of Chow&#8217;s Theorem, the $n+1$ numbers $\widehat{g}(\emptyset)$, $\widehat{g}(\{1\}), \dots, \widehat{g}(\{n\})$ are sometimes called the <em>Chow parameters</em> of the boolean function $g$.</p>
<p>
As we will show later in this chapter, linear threshold functions are very noise-stable; hence they have a lot of their Fourier weight at low degrees. Here is a simple result along these lines: </p>
<blockquote><p><b>Theorem 2</b> <a name="thmgotsman-linial1"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ be an LTF. Then $\mathbf{W}^{\leq 1}[f] \geq 1/2$. </p></blockquote>
<p> <br/><em>Proof:</em>  Writing $f(x) = \mathrm{sgn}(\ell(x))$ we have \[ \|\ell\|_1 = \mathop{\bf E}[|\ell({\boldsymbol{x}})|] = \langle f, \ell \rangle = \langle f^{\leq 1}, \ell \rangle \leq \|f^{\leq 1}\|_2 \|\ell\|_2 = \sqrt{\mathbf{W}^{\leq 1}[f]} \cdot \|\ell\|_2, \] where the third equality follows from Plancherel and the inequality is Cauchy&#8211;Schwarz. Assume first that $\ell(x) = a_1 x_1 + \cdots + a_n x_n$ (i.e., $\ell(x)$ has no constant term). The Khintchine&#8211;Kahane inequality (Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=494#exlo-kk" target="_blank">2.48</a>) states that $\|\ell\|_1 \geq \frac{1}{\sqrt{2}} \|\ell\|_2$ and hence we deduce \[ \tfrac{1}{\sqrt{2}} \|\ell\|_2 \leq \sqrt{\mathbf{W}^{\leq 1}[f]} \cdot \|\ell\|_2. \] The conclusion $\mathbf{W}^{\leq 1}[f] \geq 1/2$ follows immediately (since $\|\ell\|_2$ cannot be $0$). The case when $\ell(x)$ has a constant term is left for the exercises. $\Box$</p>
<p> From Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=465#exmaj-influences" target="_blank">2.18</a> we know that $\mathbf{W}^{\leq 1}[\mathrm{Maj}_n] = \mathbf{W}^{1}[\mathrm{Maj}_n] \geq 2/\pi$ for all $n$; it is reasonable to conjecture that majority is extremal for Theorem <a href="#thmgotsman-linial1">2</a>. This is an open problem.  </p>
<blockquote><p><b>Conjecture.</b> <a name="conjltf-weight-1"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ be an LTF. Then $\mathbf{W}^{\leq 1}[f] \geq 2/\pi$.  </p></blockquote>
<p>A natural generalization of linear threshold functions is <em>polynomial threshold functions</em>: </p>
<blockquote><p><b>Definition 3</b> A function $f : \{-1,1\}^n \to \{-1,1\}$ is called a <em>polynomial threshold function (PTF)</em> of degree at most $k$ if it is expressible as $f(x) = \mathrm{sgn}(p(x))$ for some real polynomial $p : \{-1,1\}^n \to {\mathbb R}$ of degree at most $k$. </p></blockquote>
<blockquote><p><b>Example 4</b> <a name="egptf"></a> Let $f : \{-1,1\}^4 \to \{-1,1\}$ be the $4$-bit equality function which is $1$ if and only if all input bits are equal. Then $f$ is a degree-$2$ PTF because it has the representation $f(x) = \mathrm{sgn}(-3 + x_1x_2 + x_1x_3 + x_1x_4 + x_2x_3 + x_2x_4 + x_3x_4)$. </p></blockquote>
<p> <em>Every</em> boolean function $f : \{-1,1\}^n \to \{-1,1\}$ is a PTF of degree at most $n$, since we can take the sign of its Fourier expansion. Thus we are usually interested in the case when the degree $k$ is &#8220;small&#8221;; say, $k = O_n(1)$. Low-degree PTFs arise frequently in learning theory; for example, as hypotheses in the Low-Degree Algorithm and many other practical learning algorithms. They also arise in circuit complexity, wherein a PTF representation \[ f(x) = \mathrm{sgn}(\sum_{i=1}^s {a_i} x^{T_i}) \] is thought of as a &#8220;threshold-of-parities circuit&#8221;: i.e., a depth-$2$ circuit with $s$ &#8220;parity gates&#8221; $x^{T_i}$ at layer $1$ and a single &#8220;(linear) threshold gate&#8221; at layer $2$. From this point of view, the size of the circuit corresponds to the <em>sparsity</em> of the PTF representation: </p>
<blockquote><p><b>Definition 5</b> We say a PTF representation $f(x) = \mathrm{sgn}(p(x))$ has <em>sparsity</em> at most $s$ if $p(x)$ is a multilinear polynomial with at most $s$ terms. </p></blockquote>
<p>   For example, the PTF representation of the $4$-bit equality function from Example <a href="#egptf">4</a> has sparsity $7$. Let&#8217;s extend the two theorems about LTFs we proved above to the case of PTFs. The generalization of Chow&#8217;s Theorem is straightforward; its proof is left to the exercises: </p>
<blockquote><p><b>Theorem 6</b> <a name="thmptf-chow"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ be a PTF of degree at most $k$ and let $g : \{-1,1\}^n \to \{-1,1\}$ be any function. If $\widehat{g}(S) = \widehat{f}(S)$ for all $|S| \leq k$ then $g = f$. </p></blockquote>
<p> We also have the following extension of Theorem <a href="#thmgotsman-linial1">2</a>: </p>
<blockquote><p><b>Theorem 7</b> <a name="thmgotsman-linial2"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ be a degree-$k$ PTF. Then $\mathbf{W}^{\leq k}[f] \geq e^{-2k}$. </p></blockquote>
<p> <br/><em>Proof:</em>  Writing $f(x) = \mathrm{sgn}(p(x))$ for $p$ of degree $k$, we again have \[ \|p\|_1 = \mathop{\bf E}[|p({\boldsymbol{x}})|] = \langle f, p \rangle = \langle f^{\leq k}, p \rangle \leq \|f^{\leq k}\|_2 \|p\|_2 = \sqrt{\mathbf{W}^{\leq k}[f]} \cdot \|p\|_2. \] To complete the proof we need the fact that $\|p\|_2 \leq e^k \|p\|_1$ for any degree-$k$ polynomial $p : \{-1,1\}^n \to {\mathbb R}$. We will prove this much later in Chapter 10 on hypercontractivity. $\Box$</p>
<p>   The $e^{-2k}$ in this theorem cannot be improved beyond $2^{1-k}$; see the exercises.</p>
<p>
We close this section by discussing PTF sparsity. We begin with a (simpler) variant of Theorem <a href="#thmgotsman-linial2">7</a> which is useful for proving PTF sparsity lower bounds: </p>
<blockquote><p><b>Theorem 8</b> <a name="thmbruck"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$ be expressible as a PTF over the collection of monomials $\mathcal{F} \subseteq 2^{[n]}$; i.e., $f(x) = \mathrm{sgn}(p(x))$ for some polynomial $p(x) = \sum_{S \in \mathcal{F}} \widehat{p}(S) x^S$. Then $\sum_{S \in \mathcal{F}} |\widehat{f}(S)| \geq 1$. </p></blockquote>
<p> <br/><em>Proof:</em>  Define $g : \{-1,1\}^n \to {\mathbb R}$ by $g(x) = \sum_{S \in \mathcal{F}} \widehat{f}(S)\,x^S$. Since $\hat{\lVert} p \hat{\rVert}_\infty \leq \|p\|_1$ (Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=602#exhausdorff-young" target="_blank">3.8</a>) we have \[ \hat{\lVert} p \hat{\rVert}_\infty \leq \|p\|_1 = \mathop{\bf E}[f({\boldsymbol{x}})p({\boldsymbol{x}})] = \sum_{S \subseteq [n]} \widehat{f}(S) \widehat{p}(S) = \sum_{S \in \mathcal{F}} \widehat{g}(S) \widehat{p}(S) \leq \hat{\lVert} g \hat{\rVert}_1 \hat{\lVert} p \hat{\rVert}_\infty, \] and hence $\hat{\lVert} g \hat{\rVert}_1 \geq 1$ as claimed. $\Box$</p>
<p> We can use this result to show that the &#8220;inner product mod $2$ function&#8221; (see Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=301#excompute-expansions" target="_blank">1.1</a>) requires huge threshold-of-parities circuits: </p>
<blockquote><p><b>Corollary 9</b> <a name="corip-no-sparse-PTF"></a> Any PTF representation of the inner product mod $2$ function $\mathrm{IP}_{2n} : {\mathbb F}_2^{2n} \to \{-1,1\}$ has sparsity at least $2^n$. </p></blockquote>
<p> <br/><em>Proof:</em>  This follows immediately from Theorem <a href="#thmbruck">8</a> and the fact that $|\widehat{\mathrm{IP}}_{2n}(S)| = 2^{-n}$ for all $S \subseteq [2n]$ (Exercise <a href="http://www.contrib.andrew.cmu.edu/~ryanod/?p=301#excompute-expansions" target="_blank">1.1</a>). $\Box$</p>
<p>
We can also show that any function $f : \{-1,1\}^n \to \{-1,1\}$ with small Fourier $1$-norm $\hat{\lVert} f \hat{\rVert}_1$ has a sparse PTF representation. In fact a stronger result holds: such a function can be additively <em>approximated</em> by a sparse polynomial: </p>
<blockquote><p><b>Theorem 10</b> <a name="thmbruck-smolensky"></a> Let $f : \{-1,1\}^n \to {\mathbb R}$ be nonzero, let $\delta &gt; 0$, and let $s \geq 4n\hat{\lVert} f \hat{\rVert}_1^2/\delta^2$ be an integer. Then there is a multilinear polynomial $q : \{-1,1\}^n \to {\mathbb R}$ of sparsity at most $s$ such that $\|f &#8211; q\|_\infty &lt; \delta$. </p></blockquote>
<p> <br/><em>Proof:</em>  The proof is by the probabilistic method. Let $\boldsymbol{T} \subseteq [n]$ be randomly chosen according to the distribution ${\mathop{\bf Pr}[\boldsymbol{T} = T]} = \frac{|\widehat{f}(T)|}{\hat{\lVert} f \hat{\rVert}_1}$. Let $\boldsymbol{T}_1, \dots, \boldsymbol{T}_s$ be independent draws from this distribution and define the multilinear polynomial \[ \boldsymbol{p}(x) = \sum_{i=1}^s \mathrm{sgn}(\widehat{f}(\boldsymbol{T}_i))\,x^{\boldsymbol{T}_i}. \] When $x \in \{-1,1\}^n$ is fixed, each monomial $\mathrm{sgn}(\widehat{f}(\boldsymbol{T}_i)\,x^{\boldsymbol{T}_i}$ becomes a $\pm 1$-valued random variable with expectation \[ \sum_{T \subseteq [n]} \tfrac{|\widehat{f}(T)|}{\hat{\lVert} f \hat{\rVert}_1}\cdot \mathrm{sgn}(\widehat{f}(T))\,x^{T} = \tfrac{1}{\hat{\lVert} f \hat{\rVert}_1} \sum_{T \subseteq [n]} \widehat{f}(T)\,x^T = \tfrac{f(x)}{\hat{\lVert} f \hat{\rVert}_1}. \] Thus by a Chernoff bound , for any $\epsilon &gt; 0$, \[ \mathop{\bf Pr}_{\boldsymbol{T}_1, \dots, \boldsymbol{T}_s}\left[\Bigl|\boldsymbol{p}(x) - \tfrac{f(x)}{\hat{\lVert} f \hat{\rVert}_1} s\Bigr| \geq \epsilon s\right] \leq 2\exp(-\epsilon^2 s/2). \] Selecting $\epsilon = \delta/\hat{\lVert} f \hat{\rVert}_1$ and using $s \geq 4n\hat{\lVert} f \hat{\rVert}_1^2/\delta^2$, the probability is at most $2\exp(-2n) &lt; 2^{-n}$. Taking a union bound over all $2^n$ choices of $x \in \{-1,1\}^n$, we conclude that there exists some $p(x) = \sum_{i=1}^s \mathrm{sgn}(\widehat{f}(T_i))\,x^{T_i}$ such that for all $x \in \{-1,1\}^n$, \[ \Bigl|p(x) - \tfrac{f(x)}{\hat{\lVert} f \hat{\rVert}_1} s\Bigr| &lt; \epsilon s = \tfrac{\delta}{\hat{\lVert} f \hat{\rVert}_1} s \quad\Rightarrow\quad \Bigl|\tfrac{\hat{\lVert} f \hat{\rVert}_1}{s} \cdot p(x) - f(x) \Bigr| &lt; \delta. \] Thus we may take $q = \frac{\hat{\lVert} f \hat{\rVert}_1}{s} \cdot p$. $\Box$</p>
<blockquote><p><b>Corollary 11</b> <a name="corbruck-smolensky"></a> Let $f : \{-1,1\}^n \to \{-1,1\}$. Then $f$ is expressible as a PTF of sparsity at most $s = \lceil 4n\hat{\lVert} f \hat{\rVert}_1\rceil$. Indeed $f$ can be represented as a majority of $s$ parities or negated-parities. </p></blockquote>
<p> <br/><em>Proof:</em>  Apply the previous theorem with $\delta = 1$; we then have $f(x) = \mathrm{sgn}(q(x))$. Since this is also equivalent to $\mathrm{sgn}(p(x))$, the terms $\mathrm{sgn}(\widehat{f}(T_i))\,x^{T_i}$ are the required parities/negated-parities. $\Box$</p>
<p> Though functions computable by small DNFs need not have small Fourier $1$-norm, it is a further easy corollary that they can be computed by sparse PTFs: see the exercises. We also remark that there is no good converse to Corollary <a href="#corbruck-smolensky">11</a>: the $\mathrm{Maj}_n$ function has a PTF (indeed, an LTF) of sparsity $n$ but has exponentially large Fourier $1$-norm (as you will be asked to show in the exercises).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.contrib.andrew.cmu.edu/~ryanod/?feed=rss2&#038;p=856</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

