A small warning: On the download page, I am using a “Google form” to [...]]]>
A small warning: On the download page, I am using a “Google form” to ask for your email address, to which the PDF will be sent. I promise not to send you any other emails; just the one containing the attached PDF. I would also like to warn that I think Google allows at most 100 emails per day, so there is some small chance that we might hit the limit today. If that happens, please try again tomorrow! Finally, the server hosting this website has been a bit flaky lately; I hope it will be fine in the future.
Many thanks to Cambridge University Press for agreeing (several years ago!) to allow the PDF to be freely available. Of course, I’m also very thankful to the heroes who sent in comments and bug-fixes to the blog draft of the book. Please feel free to continue doing so — I’m happy to keep making corrections.
Regarding the differences between the printed book and the electronic version: The printed book is “Version 1.00″ and the PDF is “Version 1.01″. The main difference is that about 40 small typos/bugs have been fixed (nothing major; in almost all cases you can easily guess the correct thing once you notice the mistake). I would also like to add that the numbering of the theorems/definitions/exercises/etc. is identical between the printed book and the PDF, so if you would like to cite something, it doesn’t matter which version you cite. The page numbering is not the same, though, so please try not to cite things by page number.
Once more, thanks again to all the readers of the blog; I hope you enjoy the book!
Added later: We hit 100 emails; however, it appears that the downloads are still working. But in any case, if you have trouble, please try again tomorrow.
]]>In July I was at the (First Annual?) Swedish Summer School in Computer Science. This was wonderfully organized by KTH faculty Per Austrin, Johan Håstad, and Jakob Nordström, and took place [...]]]>
In July I was at the (First Annual?) Swedish Summer School in Computer Science. This was wonderfully organized by KTH faculty Per Austrin, Johan Håstad, and Jakob Nordström, and took place on the scenic island of Djurhamn in the Stockholm archipelago. Boaz Barak and I alternated lectures (he on Sums of Squares) to 50 or so highly motivated graduate students. I was very impressed that many of them worked quite hard on the homework outside of lecture times! Some photos by Talya Abram can be found here, including one of me holding an exquisite table linen gift, with the De–Mossel–Neeman differential equation in the background.
The second summer school (just finished) was held at the Nesin Mathematical Village, an idyllic site in the hills near the Aegean coast of Turkey. The village operates year-round, but its main business is in the summer, when a hundred or so high school, college, and graduate math students come each week for courses. The spirit of the place is charming, and the math village itself is quite beautiful, as you can see from these photos. I lectured here, though I wish I got to lecture here.
By the way, if you ever teach a class on Analysis of Boolean Functions yourself, please drop me a line; I like to keep track of links to them. (So far I know of 31 such courses since 2002.)
]]> We describe a web of connections between the following [...]]]>
We describe a web of connections between the following topics: the mathematical theory of voting and social choice; the computational complexity of the Maximum Cut problem; the Gaussian Isoperimetric Inequality and Borell’s generalization thereof; the Hypercontractive Inequality of Bonami; and, the analysis of Boolean functions. A major theme is the technique of reducing inequalities about Gaussian functions to inequalities about Boolean functions $f : \{-1,1\}^n \to \{-1,1\}$, and then using induction on $n$ to further reduce to inequalities about functions $f : \{-1,1\} \to \{-1,1\}$. We especially highlight De, Mossel, and Neeman’s recent use of this technique to prove the Majority Is Stablest Theorem and Borell’s Isoperimetric Inequality simultaneously.
The article actually includes a few ideas that don’t explicitly appear in the book, so please take a look if you’re interested. I hope the article will make a nice fit for the congress; it has a few connections (some major, some minor) with work being discussed by several other speakers here, including Boaz Barak, Andrei Bulatov, Sourav Chatterjee, Julia Chuzhoy, Alessio Figalli, Monique Laurent, Michel Ledoux, and (last but certainly not least) our newest Nevanlinna Prize winner, Subhash Khot. (Congratulations Subhash!) Indeed, my talk on Saturday will mostly center around a very famous conjecture made by Subhash — I refer, of course, to the Majority Is Stablest Conjecture.
PS: In case you happen to actually be at the congress, you can buy a “special ICM edition” of the book for $30 (or, even cheaper, ₩30,000) at the Cambridge University Press booth.
]]>One small note: In the relatively near future I will post here an electronic version of the book. It will actually be version 1.01, with version 1.00 being the [...]]]>
One small note: In the relatively near future I will post here an electronic version of the book. It will actually be version 1.01, with version 1.00 being the printed book. The electronic version will include fixes for typos found subsequent to the book’s printing.
Due to laziness, however, I will stop incorporating fixes into the old blog posts. So those are now basically deprecated. Feel free to take a look at them till the PDF comes out, of course. But this means there’s a chance that when you find a typo or mistake on the blog pages, I might respond with “Thanks — someone else found that too, and it’s already been corrected.”
Once again, thanks so much to all of you for the corrections!
]]>There will be at least one more post on this blog; I will release a free PDF of the book in its entirety some time around September. Watch this space!
]]>As mentioned in the notes on Chapter 9, the Gaussian Hypercontractivity Theorem is originally due to Nelson [Nel66] and now has many known proofs. The idea behind the proof we presented — first proving the Boolean hypercontractivity result and then deducing the Gaussian case by the Central Limit Theorem — is due to Gross [Gro75] (see also Trotter [Tro58]). Gross actually used the idea to prove his Gaussian Log-Sobolev Inequality, and thereby deduced the Gaussian Hypercontractivity Theorem. Direct proofs of the Gaussian Hypercontractivity Theorem have been given by Neveu [Nev76] (using stochastic calculus), Brascamp and Lieb [BL76a] (using rearrangement [BL76a]), and Ledoux [Led13] (using a variation on Exercises 26–29); direct proofs of the Gaussian Log-Sobolev Inequality have been given by Adams and Clarke [AC79], by Bakry and Emery [BE85a], and by Ledoux [Led92], the latter two using semigroup techniques. Bakry’s survey [Bak94] on these topics is also recommended.
The Gaussian Isoperimetric Inequality was first proved independently by Borell [Bor75] and by Sudakov and Tsirel’son [ST78]. Both works derived the result by taking the isoperimetric inequality on the sphere (due to Lévy [Lev22] and Schmidt [Sch48], see also Figiel, Lindenstrauss, and Milman [FLM77]) and then taking “Poincaré’s limit” — i.e., viewing Gaussian space as a projection of the sphere of radius $\sqrt{n}$ in $n$ dimensions, with $n \to \infty$ (see Lévy [Lev22], McKean [McK73], and Diaconis and Freedman [DF87]). Ehrhard [Ehr83] gave a different proof using a symmetrization argument intrinsic to Gaussian space. This may be compared to the alternate proof of the spherical isoperimetric inequality [Ben84] based on the “two-point symmetrization” of Baernstein and Taylor [BT76] (analogous to Riesz rearrangement in Euclidean space and to the polarization operation from Exercise 2.45).
To carefully define Gaussian surface area for a broad class of sets requires venturing into the study of geometric measure theory and functions of bounded variation. For a clear and comprehensive development in the Euclidean setting (including the remark in Exercise 15), see the book by Ambrosio, Fusco, and Pallara [AFP00]. There’s not much difference between the Euclidean and finite-dimensional Gaussian settings; research on Gaussian perimeter tends to focus on the trickier infinite-dimensional case. For a thorough development of surface area in this latter setting (which of course includes finite-dimensional Gaussian space as a special case) see the work of Ambrosio, Miranda, Maniglia, and Pallara [AMMP10]; in particular, Theorem 4.1 in that work gives several additional equivalent definitions for $\text{surf}_\gamma$ besides those in Definition 48. Regarding the fact that $\mathbf{RS}_A’(0^+)$ is an equivalent definition, the Euclidean analogue of this statement was proven in Miranda et al. [MPPP07] and the statement itself follows similarly [Mir13] using Ambrosio et al. [AFR13]. (Our heuristic justification of 11.4.(2) is similar to the one given by Kane [Kan11a].) Additional related results can be found in Hino [Hin10] (which includes the remark about convex sets at the end of Definition 48), Ambrosio and Figalli [AF11], Miranda et al. [MNP12], and Ambrosio et al. [AFR13].
The inequality of Theorem 51 is explicit in Ledoux [Led94] (see also the excellent survey [Led96]); he used it to deduce the Gaussian Isoperimetric Inequality. He also noted that it’s essentially deducible from an earlier inequality of Pisier and Maurey [Pis86] (Theorem 2.2). Theorem 43, which expresses the subadditivity of rotation sensitivity, can be viewed as a discretization of the Pisier–Maurey inequality. This theorem appeared in work of Kindler and O’Donnell [KO12], which also made the observations about the volume-$\tfrac{1}{2}$ case of Borell’s Isoperimetric Theorem at the end of Section 3 and in Remark 75.
Bobkov’s Inequality [Bob97] in the special case of Gaussian space had already been implicitly established by Ehrhard [Ehr84]; the striking novelty of Bobkov’s work (partially inspired by Talagrand [Tal93]) was his reduction to the two-point Boolean inequality. The proof of this inequality which we presented is, as mentioned a discretization of the stochastic calculus proof of Barthe and Maurey [BM00a]. (In turn, they were extending the stochastic calculus proof of Bobkov’s Inequality in the Gaussian setting due to Capitaine, Hsu, and Ledoux [CHL97].) The idea that it’s enough to show that Claim 54 is “nearly true” by computing two derivatives — as opposed to showing it’s exactly true by computing four derivatives — was communicated to the author by Yuval Peres. Following Bobkov’s paper, Bakry and Ledoux [BL96b] established Theorem 55 in very general infinite-dimensional settings including Gaussian space; Ledoux [Led98] further pointed out that the Gaussian version of Bobkov’s Inequality has a very short and direct semigroup-based proof. See also Bobkov and Götze [BG99] and Tillich and Zémor [TZ00] for results similar to Bobkov’s Inequality in other discrete settings. Borell’s Isoperimetric Theorem is from Borell [Bor85]. Borell’s proof used “Ehrhard symmetrization” and actually gave much stronger results — e.g., that if $f, g \in L^2({\mathbb R}^n, \gamma)$ are nonnegative and $q \geq 1$, then $\langle (\mathrm{U}_\rho f)^q, g\rangle$ can only increase under simultaneous Ehrhard symmetrization of $f$ and $g$. There are at least four other known proofs of the basic Borell Isoperimetric Theorem. Beckner [Bec92] observed that the analogous isoperimetric theorem on the sphere follows from two-point symmetrization; this yields the Gaussian result via Poincaré’s limit (for details, see Carlen and Loss [CL90]). (This proof is perhaps the conceptually simplest one, though carrying out all the technical details is a chore.) Mossel and Neeman [MN12] gave the proof based on semigroup methods outlined in Exercises 26–29, and later together with De [DMN12] gave a “Bobkov-style” Boolean proof (see Exercise 30). Finally, Eldan [Eld13] gave a proof using stochastic calculus.
As mentioned in Section 5 there are several known ways to prove the Berry–Esseen Theorem. Aside from the original method (characteristic functions), there is also Stein’s Method [Ste72,Ste86a]; see also, e.g., [Bol84,BH84,CGS11]. The Replacement Method approach we presented originates in the work of Lindeberg [Lin22]. The mollification techniques used (e.g., those in Exercise 40) are standard. The Invariance Principle as presented in Section 6 is from Mossel, O’Donnell, and Oleszkiewicz [MOO10]. Further extensions (e.g., Exercise 48) appear in the work of Mossel [Mos10]. In fact the Invariance Principle dates back to the 1971 work of Rotar’ [Rot73,Rot74]; therein he essentially proved the Invariance Principle for degree-$2$ multilinear polynomials (even employing the term “influence” as we do for the quantity in Definition 63). Earlier work on extending the Central Limit Theorem to higher-degree polynomials had focused on obtaining sufficient conditions for polynomials (especially quadratics) to have a Gaussian limit distribution; this is the subject of U-statistics. Rotar’ emphasized the idea of invariance and of allowing any (quadratic) polynomial with low influences. Rotar’ also credited Girko [Gir73] with related results in the case of positive definite quadratic forms. In 1975, Rotar’ [Rot75] generalized his results to handle multilinear polynomials of any constant degree, and also random vectors (as in Exercise 48). (Rotar’ also gave further refinements in 1979 [Rot79].)
The difference between the results of Rotar’ [Rot75] and Mossel et al. [MOO10] comes in the treatment of the error bounds. It’s somewhat difficult to extract simple-to-state error bounds from Rotar’ [Rot75], as the error there is presented as a sum over $i \in [n]$ of expressions $\mathop{\bf E}[F({\boldsymbol{x}}) \boldsymbol{1}_{|F({\boldsymbol{x}})| > u_i}]$, where $u_i$ involves $\mathbf{Inf}_i[F]$. (Partly this is so as to generalize the statement of the Lindeberg CLT.) Nevertheless, the work of Rotar’ implies a Lévy distance bound as in Corollary 69, with some inexplicit function $o_\epsilon(1)$ in place of $(1/\rho)^{O(k)} \epsilon^{1/8}$. By contrast, the work of Mossel et al. [MOO10] shows that a straightforward combination of the Replacement Method and hypercontractivity yields good, explicit error bounds. Regarding the Carbery–Wright Theorem [CW01], an alternative exposition appears in Nazarov, Sodin, and Vol’berg [NSV02].
Regarding the Majority Is Stablest Theorem (conjectured in Khot, Kindler, Mossel, and O’Donnell [KKMO04] and proved originally in Mossel, O’Donnell, and Oleszkiewicz [MOO05a]), it can be added that additional motivation for the conjecture came from Kalai [Kal02]. The fact that (SDP) is an efficiently computable relaxation for the Max-Cut problem dates back to the 1990 work of Delorme and Poljak [DP93]; however, they were unable to give an analysis relating its value to the optimum cut value. In fact, they conjectured that the case of the $5$-cycle from Example 72 had the worst ratio of $\mathrm{Opt}(G)$ to $\mathrm{SDPOpt}(G)$. Goemans and Williamson [GW94] were the first to give a sharp analysis of the SDP (Theorem 71), at least for $\theta \geq \theta^*$. Feige and Schechtman [FS02] showed an optimal integrality gap for the SDP for all values $\theta \geq \theta^*$ (in particular, showing an integrality gap ratio of $c_{\text{GW}}$); interestingly, their construction essentially involved proving Borell’s Isoperimetric Inequality (though they did it on the sphere rather than in Gaussian space). Both before and after the Khot et al. [KKMO04] UG-hardness result for Max-Cut there was a long line of work [Kar99,Zwi99,AS00,ASZ02,CW04,KV05,FL06,KO06] devoted to improving the known approximation algorithms and UG-hardness results, in particular for $\theta < \theta^*$. This culminated in the results from O’Donnell and Wu [OW08] (mentioned in Remark 74), which showed explicit matching $(\alpha,\beta)$-approximation algorithms, integrality gaps, and UG-hardness results for all $\frac12 < \beta < 1$. The fact that the best integrality gaps matched the best UG-hardness results proved not to be a coincidence; in contemporaneous work, Raghavendra [Rag08] showed that for any CSP, any SDP integrality gap could be turned into a matching Dictator-vs.-No-Notables test. This implies the existence of matching efficient $(\alpha,\beta)$-approximation algorithms and UG-hardness results for every CSP and every $\beta$. See Raghavendra’s thesis [Rag09] for full details of his earlier publication [Rag08] (including some Invariance Principle extensions building further on Mossel [Mos10]); see also Austrin’s work [Aus07,Aus07a] for precursors to the Raghavendra theory.
Exercise 31 concerns a problem introduced by Talagrand [Tal89]. Talagrand offers a $1000 prize [Tal06a] for a solution to the following Boolean version of the problem: Show that for any fixed $0 < \rho < 1$ and for $f : \{-1,1\}^n \to {\mathbb R}^{\geq 0}$ with $\mathop{\bf E}[f] = 1$ it holds that $\mathop{\bf Pr}[\mathrm{T}_\rho f > t] = o(1/t)$ as $t \to \infty$. (The rate of decay may depend on $\rho$ but not, of course, on $n$; in fact, a bound of the form $O(\frac{1}{t \sqrt{\log t}})$ is expected.) The result outlined in Exercise 31 (obtained together with James Lee) is for the very special case of $1$-dimensional Gaussian space; Ball, Barthe, Bednorz, Oleszkiewicz, and Wolff [BBB+13] obtained the same result and also showed a bound of $O(\frac{\log \log t}{t \sqrt{\log t}})$ for $d$-dimensional Gaussian space (with the constant in the $O(\cdot)$ depending on $d$).
The Multifunction Invariance Principle (Exercise 48 and its special case Exercise 46) are from Mossel [Mos10]; the version for general product spaces (Exercise 49) is from Mossel, O’Donnell, and Oleszkiewicz [MOO10].
]]>Theorem 77 For each $\rho \in (0,1)$ and $\epsilon > 0$, there exists a constant $C_{\rho, \epsilon}$ such that the following holds: If $f, g : \{-1,1\}^n \to [\epsilon, 1-\epsilon]$, then \[ \mathop{\bf E}_{\substack{({\boldsymbol{x}}, {\boldsymbol{x}}') \\ \text{$\rho$-correlated}}}[\Lambda_\rho(f({\boldsymbol{x}}), g({\boldsymbol{x}}'))] \leq \Lambda_\rho(\mathop{\bf E}[f], \mathop{\bf E}[g]) + C_{\rho, \epsilon}\cdot (\Delta_n[f] + \Delta_n[g]). \] Here we using the following inductive notation: $\Delta_1[f] = \mathop{\bf E}[|f - \mathop{\bf E}[f]|^3]$, and \[ \Delta_n[f] = \mathop{\bf E}_{{\boldsymbol{x}}_n \sim \{-1,1\}}\left[\Delta_{n-1}[f_{ \mid {\boldsymbol{x}}_n}]\right] + \Delta_1[f^{\subseteq \{n\}}]. \]
Proposition 78 Let $\psi : {\mathbb R}^d \to {\mathbb R}$ be $c$-Lipschitz. Then for any $\eta > 0$ there exists $\widetilde{\psi}_\eta : {\mathbb R}^d \to {\mathbb R}$ satisfying $\|\psi – \widetilde{\psi}_\eta\|_\infty \leq c \sqrt{d} \eta$ and $\|\partial^\beta \widetilde{\psi}_\eta\|_\infty \leq C_{|\beta|} c \sqrt{d}/\eta^{|\beta|-1}$ for each multi-index $\beta \in {\mathbb N}^{d}$ with $|\beta| = \sum_i \beta_i \geq 1$, where $C_k$ is a constant depending only on $k$.
Invariance Principle for Sums of Random Vectors Let $\vec{\boldsymbol{X}}_1, \dots, \vec{\boldsymbol{X}}_n$, $\vec{\boldsymbol{Y}}_1, \dots, \vec{\boldsymbol{Y}}_n$ be independent ${\mathbb R}^d$-valued random variables with matching means and covariance matrices; i.e., $\mathop{\bf E}[\vec{\boldsymbol{X}}_t] = \mathop{\bf E}[\vec{\boldsymbol{Y}}_t]$ and $\mathop{\bf Cov}[\vec{\boldsymbol{X}}_t] = \mathop{\bf Cov}[\vec{\boldsymbol{Y}}_t]$ for all $t \in [n]$. (Note that the $d$ individual components of a particular $\vec{\boldsymbol{X}}_t$ or $\vec{\boldsymbol{Y}}_t$ are not required to be independent.) Write $\vec{\boldsymbol{S}}_X = \sum_{t=1}^n \vec{\boldsymbol{X}}_t$ and $\vec{\boldsymbol{S}}_Y = \sum_{t=1}^n \vec{\boldsymbol{Y}}_t$. Then for any $\mathcal{C}^3$ function $\psi : {\mathbb R}^d \to {\mathbb R}$ satisfying $\|\partial^\beta \psi\|_\infty \leq C$ for all $|\beta| = 3$, \[ \left| \mathop{\bf E}[\psi(\vec{\boldsymbol{S}}_X)] – \mathop{\bf E}[\psi(\vec{\boldsymbol{S}}_Y)] \right| \leq C \gamma_{\vec{X}\vec{Y}}, \] where \[ \gamma_{\vec{X}\vec{Y}} = \sum_{\substack{\beta \in {\mathbb N}^d\\|\beta| = 3}} \frac{1}{\beta!} \sum_{t=1}^n \bigl(\mathop{\bf E}[|\vec{\boldsymbol{X}}_t^\beta|] + \mathop{\bf E}[|\vec{\boldsymbol{Y}}_t^\beta|]\bigr). \]
Multifunction Invariance Principle Let $F^{(1)}$, \dots, $F^{(d)}$ be formal $n$-variate multilinear polynomials each of degree at most $k \in {\mathbb N}$. Let $\vec{{\boldsymbol{x}}}_1, \dots, \vec{{\boldsymbol{x}}}_n$ and $\vec{\boldsymbol{y}}_1, \dots, \vec{\boldsymbol{y}}_n$ be independent ${\mathbb R}^d$-valued random variables such that $\mathop{\bf E}[\vec{{\boldsymbol{x}}}_t] = \mathop{\bf E}[\vec{\boldsymbol{y}}_t] = 0$ and $M_t = \mathop{\bf Cov}[\vec{{\boldsymbol{x}}}_t] = \mathop{\bf Cov}[\vec{\boldsymbol{y}}_t]$ for each $t \in [n]$. Assume each $M_t$ has all its diagonal entries equal to $1$ (i.e., each of the $d$ components of $\vec{{\boldsymbol{x}}}_t$ has variance $1$, and similarly for $\vec{\boldsymbol{y}}_t$). Further assume each component random variable $\vec{{\boldsymbol{x}}}_t^{(j)}$ and $\vec{\boldsymbol{y}}_t^{(j)}$ is $(2,3,\rho)$-hypercontractive ($t \in [n]$, $j \in [d]$). Then for any $\mathcal{C}^3$ function $\psi : {\mathbb R}^d \to {\mathbb R}$ satisfying $\|\partial^\beta \psi\|_\infty \leq C$ for all $|\beta| = 3$, \[ \left|\mathop{\bf E}[\psi(\vec{F}(\vec{{\boldsymbol{x}}}))] – \mathop{\bf E}[\psi(\vec{F}(\vec{\boldsymbol{y}}))]\right| \leq \tfrac{Cd^2}{3} \cdot (1/\rho)^{3k} \cdot \sum_{t=1}^n \sum_{j=1}^d \mathbf{Inf}_t[F^{(j)}]^{3/2}. \] Here we are using the following notation: If $\vec{\boldsymbol{z}} = (\vec{\boldsymbol{z}}_1, \dots, \vec{\boldsymbol{z}}_n)$ is a sequence of ${\mathbb R}^d$-valued random variables, $\vec{F}(\vec{\boldsymbol{z}})$ denotes the vector in ${\mathbb R}^d$ whose $j$th component is $F^{(j)}(\vec{\boldsymbol{z}}_1^{(j)}, \dots, \vec{\boldsymbol{z}}_n^{(j)})$.
(Hint: Combine the proofs of the Basic Invariance Principle and the Invariance Principle for Sums of Random Vectors, Exercise 46. The only challenging part should be notation.)
Invariance Principle in general product spaces
Let $(\Omega, \pi)$ be a finite probability space, $|\Omega| = m \geq 2$, in which every outcome has probability at least $\lambda$. Suppose $f \in L^2(\Omega^n, \pi^{\otimes n})$ has degree at most $k$; thus, fixing some Fourier basis $\phi_0, \dots, \phi_{m-1}$ for $L^2(\Omega, \pi)$, we have \[ f = \sum_{\substack{\alpha \in {\mathbb N}^n_{<m} \\ \#\alpha \leq k}} \widehat{f}(\alpha)\phi_\alpha. \] Introduce indeterminates $x = (x_{i,j})_{i \in [n], j \in [m-1]}$ and let $F$ be the formal $(m-1)n$-variate polynomial of degree at most $k$ defined by \[ F(x) = \sum_{\#\alpha \leq k} \widehat{f}(\alpha) \prod_{i \in \mathrm{supp}(\alpha)} x_{i, \alpha_i}. \] Then for any $\psi : {\mathbb R} \to {\mathbb R}$ that is $\mathcal{C}^3$ and satisfies $\|\psi^{\prime \prime \prime}\|_\infty \leq C$ we have \[ \left|\mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^{(m-1)n}}[\psi(F({\boldsymbol{x}}))] – \mathop{\bf E}_{\boldsymbol{\omega} \sim \pi^{\otimes n}}[\psi(f(\boldsymbol{\omega}))]\right| \leq \tfrac{C}{3} \cdot (2\sqrt{2/\lambda})^k \cdot \sum_{i=1}^n \mathbf{Inf}_i[f]^{3/2}. \]
(Hint: For $0 \leq t \leq n$, define the function $h_t \in L^2(\Omega^t \times \{-1,1\}^{(m-1)(n-t)}, \pi^{\otimes t} \otimes \pi_{1/2}^{\otimes (m-1)(n-t)})$ via \[ h_t(\boldsymbol{\omega}_1, \dots, \boldsymbol{\omega}_t, {\boldsymbol{x}}_{t+1,1}, \dots, {\boldsymbol{x}}_{n,m-1}) = \sum_{\#\alpha \leq k} \widehat{f}(\alpha) \prod_{\substack{i \in \mathrm{supp}(\alpha) \\ i \leq t}} \phi_{\alpha_i}(\boldsymbol{\omega}_i) \prod_{\substack{i \in \mathrm{supp}(\alpha) \\ i > t}} {\boldsymbol{x}}_{i,\alpha_i}. \] Express \[ h_{t} = \mathrm{E}_t h_{t} + \mathrm{L}_t h_{t} = \mathrm{E}_t h_t + \sum_{j = 1}^m D_j \cdot \phi_j(\boldsymbol{\omega}_t) \] where \[ D_j = \sum_{\alpha : \alpha_t = j} \widehat{f}(\alpha) \prod_{\substack{i \in \mathrm{supp}(\alpha) \\ i < t}} \phi_{\alpha_i}(\boldsymbol{\omega}_i) \prod_{\substack{i \in \mathrm{supp}(\alpha) \\ i > t}} {\boldsymbol{x}}_{i,\alpha_i}, \] and note that $h_{t-1} = \mathrm{E}_t h_t + \sum_{j = 1}^m D_j \cdot {\boldsymbol{x}}_{t,j}$.)
Theorem 79 Let $(\Omega, \pi)$ be a finite probability space in which each outcome has probability at least $\lambda$. Let $f \in L^2(\Omega^n, \pi^{\otimes n})$ have range $[0,1]$. Suppose that $f$ has no $(\epsilon, \frac{1}{\log(1/\epsilon)})$-notable coordinates. Then for any ${0 \leq \rho < 1}$, \[ \mathbf{Stab}_\rho[f] \leq \Lambda_\rho(\mathop{\bf E}[f]) + O\bigl(\tfrac{\log \log (1/\epsilon)}{\log(1/\epsilon)}\bigr) \cdot \tfrac{\log(1/\lambda)}{1-\rho}. \]
(Hint: Naturally, you’ll need Exercise 49(b).)
Remark: It can be shown that the $\mathcal{C}^3$ hypothesis in Propositions 26 and 28 is not necessary (provided the derivatives are interpreted in the distributional sense); see, e.g., Bogachev [Bog98] (Chapter 1) for more details.
Definition 76 For $\rho \in [-1,1]$ we define the Gaussian quadrant probability function $\Lambda_\rho : [0,1]^2 \to [0,1]$ by \[ \Lambda_\rho(\alpha,\beta) = \mathop{\bf Pr}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \text{ $\rho$-correlated} \\ \text{standard Gaussians}}}[\boldsymbol{z} \leq t, \boldsymbol{z}' \leq t'], \] where $t$ and $t’$ are defined by $\Phi(t) = \alpha$, $\Phi(t’) = \beta$. This is a slight reparametrization of the bivariate Gaussian cdf. We also use the shorthand notation \[ \Lambda_\rho(\alpha) = \Lambda_\rho(\alpha, \alpha), \] which we encountered in Borell’s Isoperimetric Theorem (and also in Exercises 5.33 and 9.24, with a different, but equivalent, definition).
Two-Set Borell Isoperimetric Theorem Fix $\rho \in (0,1)$ and $\alpha, \beta \in [0,1]$. Then for any $A, B \subseteq {\mathbb R}^n$ with $\mathrm{vol}_\gamma(A) = \alpha$, $\mathrm{vol}_\gamma(B) = \beta$, \begin{equation} \label{eqn:two-set-borell1} \mathop{\bf Pr}_{\substack{(\boldsymbol{z}, \boldsymbol{z}’) \text{ $\rho$-correlated} \\ \text{$n$-dimensional Gaussians}}}[\boldsymbol{z} \in A, \boldsymbol{z}' \in B] \leq \Lambda_\rho(\alpha,\beta). \end{equation}
By definition of $\Lambda_\rho(\alpha,\beta)$, equality holds if $A$ and $B$ are parallel halfspaces. Taking $\beta = \alpha$ and $B = A$ in this theorem gives Borell’s Isoperimetric Theorem as stated in Section 3 (in the case of range $\{0,1\}$, at least, which is equivalent by Exercise 25). It’s quite natural to guess that parallel halfspaces should maximize the “joint Gaussian noise stability” quantity on the left of \eqref{eqn:two-set-borell1}, especially in light of Remark 2 from Chapter 10.1 concerning the analogous Generalized Small-Set Expansion Theorem. Just as our proof of the Small-Set Expansion Theorem passed through the Two-Function Hypercontracitivity Theorem to facilitate induction, so too does the Mossel–Neeman proof pass through the following “two-function version” of Borell’s Isoperimetric Theorem:
Two-Function Borell Isoperimetric Theorem Fix $\rho \in (0,1)$ and let $f, g \in L^2({\mathbb R}^n, \gamma)$ have range $[0,1]$. Then \[ \mathop{\bf E}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \text{ $\rho$-correlated} \\ \text{$n$-dimensional Gaussians}}}[\Lambda_\rho(f(\boldsymbol{z}),g(\boldsymbol{z}'))] \leq \Lambda_\rho\left(\mathop{\bf E}[f], \mathop{\bf E}[g]\right). \]
Turning to hardness of approximation, we know from Theorem 7.41 (developed in [KKMO04]) that to prove UG-hardness of $(\alpha+\delta, \beta-\delta)$-approximating Max-Cut, it suffices to construct an $(\alpha,\beta)$-Dictator-vs.-No-Notables test which uses the predicate $\neq$. As we’ll see in this section, the quality of the most natural such test can be easily inferred from the Majority Is Stablest Theorem. Assuming that theorem (as Khot et al. [KKMO04] did), we get a surprising conclusion: It’s UG-hard to approximate the Max-Cut CSP any better than the Goemans–Williamson Algorithm does. In other words, the peculiar approximation guarantee of Goemans and Williamson on the very simple Max-Cut problem is optimal (assuming the Unique Games Conjecture). Let’s demystify this somewhat, starting with a description of the Goemans–Williamson Algorithm. Let $G = (V,E)$ be an $n$-vertex input graph for the algorithm; we’ll write $(\boldsymbol{v}, \boldsymbol{w}) \sim E$ to denote that $(\boldsymbol{v}, \boldsymbol{w})$ is a uniformly random edge (i.e., $\neq$-constraint) in the graph. The first step of the Goemans–Williamson Algorithm is to solve following optimization problem: \begin{equation} \label{eqn:gw-sdp} \tag{SDP} \begin{aligned} \text{maximize}& \quad \mathop{\bf E}_{(\boldsymbol{v},\boldsymbol{w}) \sim E}\left[\tfrac{1}{2} - \tfrac{1}{2} \langle \vec{U}(\boldsymbol{v}), \vec{U}(\boldsymbol{w}) \rangle\right] \\ \text{subject to}& \quad \vec{U} : V \to S^{n-1}. \end{aligned} \end{equation} Here $S^{n-1}$ denotes the set of all unit vectors in ${\mathbb R}^n$. Somewhat surprisingly, since this optimization problem is a “semidefinite program” it can be solved in polynomial time using the Ellipsoid Algorithm. (Technically, it can only be solved up to any desired additive tolerance $\epsilon > 0$, but we’ll ignore this point.) Let’s write $\mathrm{SDPOpt}(G)$ for the optimum value of \eqref{eqn:gw-sdp}, and $\mathrm{Opt}(G)$ for the optimum Max-Cut value for $G$. We claim that \eqref{eqn:gw-sdp} is a relaxation of the Max-Cut CSP on input $G$, and therefore \[ \mathrm{SDPOpt}(G) \geq \mathrm{Opt}(G). \] To see this, simply note that if $F^* : V \to \{-1,1\}$ is an optimal assignment (“cut”) for $G$ then we can define $\vec{U}(v) = (F^*(v), 0, \dots, 0) \in S^{n-1}$ for each $v \in V$ and achieve the optimal cut value $\mathrm{Val}_G(F^*)$ in \eqref{eqn:gw-sdp}.
The second step of the Goemans–Williamson Algorithm might look familiar from Fact 7 and Remark 8. Let $\vec{U}^* : V \to S^{n-1}$ be the optimal solution for \eqref{eqn:gw-sdp}, achieving $\mathrm{SDPOpt}(G)$; abusing notation we’ll write $\vec{U}^*(v) = \vec{v}$. The algorithm now chooses $\vec{\boldsymbol{g}} \sim \mathrm{N}(0,1)^n$ at random and outputs the assignment (cut) $\boldsymbol{F} : V \to \{-1,1\}$ defined by $\boldsymbol{F}(v) = \mathrm{sgn}(\langle \vec{v}, \vec{\boldsymbol{g}} \rangle)$. Let’s analyze the (expected) quality of this assignment. The probability the algorithm’s assignment $\boldsymbol{F}$ cuts a particular edge $(v,w) \in E$ is \[ \mathop{\bf Pr}_{\vec{\boldsymbol{g}} \sim \mathrm{N}(0,1)^n}[\mathrm{sgn}(\langle \vec{v}, \vec{\boldsymbol{g}} \rangle) \neq \mathrm{sgn}(\langle \vec{w}, \vec{\boldsymbol{g}} \rangle)]. \] This is precisely the probability that $\mathrm{sgn}(\boldsymbol{z}) \neq \mathrm{sgn}(\boldsymbol{z}’)$ when $(\boldsymbol{z}, \boldsymbol{z}’)$ is a pair of $\langle \vec{v}, \vec{w}\rangle$-correlated $1$-dimensional Gaussians. Writing $\angle(\vec{v},\vec{w}) \in [0,\pi]$ for the angle between the unit vectors $\vec{v}, \vec{w}$, we conclude from Sheppard’s Formula (see equation 11.1.(2)) that \[ \mathop{\bf Pr}_{\vec{\boldsymbol{g}}}[\boldsymbol{F} \text{ cuts edge $(v,w)$}] = \frac{\angle(\vec{v}, \vec{w})}{\pi}. \] By linearity of expectation we can compute the expected value of the algorithm’s assignment $\boldsymbol{F}$: \begin{equation} \mathop{\bf E}_{\vec{\boldsymbol{g}}}[\mathrm{Val}_G(\boldsymbol{F})] = \mathop{\bf E}_{(\boldsymbol{v},\boldsymbol{w}) \sim E}\left[\angle(\vec{\boldsymbol{v}}, \vec{\boldsymbol{w}})/\pi\right]. \label{eqn:gw-gets}
\end{equation}
On the other hand, by definition we have \begin{equation} \mathrm{SDPOpt}(G) = \mathop{\bf E}_{(\boldsymbol{v},\boldsymbol{w}) \sim E}\left[\tfrac{1}{2}-\tfrac{1}{2}\cos \angle(\vec{\boldsymbol{v}}, \vec{\boldsymbol{w}})\right]. \label{eqn:gw-sdpopt} \end{equation} It remains to compare \eqref{eqn:gw-gets} and \eqref{eqn:gw-sdpopt}. Define \begin{equation} \label{eqn:cgw-def} c_{\text{GW}} = \min_{\theta \in [0,\pi]} \left\{\frac{\theta/\pi}{\tfrac{1}{2}-\tfrac{1}{2} \cos \theta}\right\} \approx .8786. \end{equation} Then from \eqref{eqn:gw-gets} and \eqref{eqn:gw-sdpopt} we immediately get \[ \mathop{\bf E}_{\vec{\boldsymbol{g}}}[\mathrm{Val}_G(\boldsymbol{F})] \geq c_{\text{GW}} \cdot \mathrm{SDPOpt}(G) \geq c_{\text{GW}} \cdot \mathrm{Opt}(G); \] i.e., in expectation the Goemans–Williamson Algorithm delivers a cut of value at least $c_{\text{GW}}$ times the Max-Cut. In other words, it’s a $(c_{\text{GW}} \beta, \beta)$-approximation algorithm, as claimed. By being a little bit more careful about this analysis (exercise) you can show following additional result:
Theorem 71 [GW95]. Let $\theta \in [\theta^*, \pi]$, where $\theta^* \approx .74\pi$ is the minimizing $\theta$ in \eqref{eqn:cgw-def} (also definable as the positive solution of $\tan(\theta/2) = \theta$). Then on any graph $G$ with $\mathrm{SDPOpt}(G) \geq \tfrac{1}{2} – \tfrac{1}{2} \cos \theta$, the Goemans–Williamson Algorithm produces a cut of (expected) value at least $\theta/\pi$. In particular, the algorithm is a $(\theta/\pi, \tfrac{1}{2} – \tfrac{1}{2} \cos \theta)$-approximation algorithm for Max-Cut.
Example 72 Consider the Max-Cut problem on the $5$-vertex cycle graph ${\mathbb Z}_5$. The best bipartition of this graph cuts $4$ out of the $5$ edges; hence $\mathrm{Opt}({\mathbb Z}_5) = \frac45$. In the exercises you are asked to show that taking \[ \vec{U}(v) = (\cos \tfrac{4\pi v}{5}, \sin \tfrac{4\pi v}{5}), \quad v \in {\mathbb Z}_5, \] in the semidefinite program \eqref{eqn:gw-sdp} establishes that $\mathrm{SDPOpt}({\mathbb Z}_5) \geq \tfrac{1}{2} – \tfrac{1}{2} \cos \frac{4\pi}{5}$. (These are actually unit vectors in ${\mathbb R}^2$ rather than in ${\mathbb R}^5$ as \eqref{eqn:gw-sdp} requires, but we can pad out the last three coordinates with zeroes.) This example shows that the Goemans–Williamson analysis in Theorem 71 lower-bounding $\mathrm{Opt}(G)$ in terms of $\mathrm{SDPOpt}(G)$ cannot be improved (at least when $\mathrm{SDPOpt}(G) = \frac45$). This is termed an optimal integrality gap. In fact, Theorem 71 also implies that $\mathrm{SDPOpt}({\mathbb Z}_5)$ must equal $\tfrac{1}{2} – \tfrac{1}{2} \cos \frac{4\pi}{5}$, for if it were greater, the theorem would falsely imply that $\mathrm{Opt}({\mathbb Z}_5) > \frac45$. Note that the Goemans–Williamson Algorithm actually finds the maximum cut when run on the cycle graph ${\mathbb Z}_5$. For a related example, see the exercises.
Now we explain the result of Khot et al. [KKMO04], that the Majority Is Stablest Theorem implies it’s UG-hard to approximate Max-Cut better than the Goemans–Williamson Algorithm does:
Theorem 73 [KKMO04]. Let $\theta \in (\frac{\pi}{2}, \pi)$. Then for any $\delta > 0$ it’s UG-hard to $(\theta/\pi + \delta, \tfrac{1}{2} – \tfrac{1}{2} \cos\theta)$-approximate Max-Cut.
Proof: It follows from Theorem 7.41 that we just need to construct a $(\theta/\pi, \tfrac{1}{2} – \tfrac{1}{2} \cos\theta)$-Dictator-vs.-No-Notables test using the predicate $\neq$. (See the exercises for an extremely minor technical point.) It’s very natural to try the following, with $\beta = \tfrac{1}{2} – \tfrac{1}{2} \cos \theta \in (\frac12, 1)$:
$\boldsymbol{\beta}$-Noise Sensitivity Test Given query access to $f : \{-1,1\}^n \to \{-1,1\}$:
- Choose ${\boldsymbol{x}} \sim \{-1,1\}^n$ and form ${\boldsymbol{x}}’$ by reversing each bit of ${\boldsymbol{x}}$ independently with probability $\beta = \tfrac{1}{2} – \tfrac{1}{2} \cos\theta$. In other words let $({\boldsymbol{x}}, {\boldsymbol{x}}’)$ be a pair of $\cos\theta$-correlated strings. (Note that $\cos\theta < 0$.)
- Query $f$ at ${\boldsymbol{x}}$, ${\boldsymbol{x}}’$.
- Accept if $f({\boldsymbol{x}}) \neq f({\boldsymbol{x}}’)$.
By design, \begin{equation} \label{eqn:kkmo-acceptance} \mathop{\bf Pr}[\text{the test accepts } f] = \mathbf{NS}_\beta[f] = \tfrac{1}{2} – \tfrac{1}{2} \mathbf{Stab}_{\cos \theta}[f]. \end{equation} (We might also express this as “$\mathbf{RS}_f(\theta)$”.) In particular, if $f$ is a dictator, it’s accepted with probability exactly $\beta = \tfrac{1}{2} – \tfrac{1}{2} \cos \theta$. To complete the proof that this is a $(\theta/\pi, \tfrac{1}{2} – \tfrac{1}{2} \cos\theta)$-Dictator-vs.-No-Notables test, let’s suppose $f : \{-1,1\}^n \to [-1,1]$ has no $(\epsilon,\epsilon)$-notable coordinates and show that \eqref{eqn:kkmo-acceptance} is at most $\theta/\pi + o_\epsilon(1)$. (Regarding $f$ having range $[-1,1]$, recall Remark 7.39.)
At first it might look like we can immediately apply the Majority Is Stablest Theorem; however, the theorem’s inequality goes the “wrong way” and the correlation parameter $\rho = \cos \theta$ is negative. These two difficulties actually cancel each other out. Note that \begin{align} \mathop{\bf Pr}[\text{the test accepts } f] &= \tfrac{1}{2} – \tfrac{1}{2} \mathbf{Stab}_{\cos \theta}[f] \nonumber\\ &= \tfrac{1}{2} – \tfrac{1}{2} \sum_{k=0}^n (\cos \theta)^{k} \mathbf{W}^{k}[f] \nonumber\\ &\leq \tfrac{1}{2} + \tfrac{1}{2} \sum_{\text{$k$ odd}} (-\cos \theta)^{k} \mathbf{W}^{k}[f] \tag{since $\cos \theta < 0$} \nonumber\\ &= \tfrac{1}{2} + \tfrac{1}{2} \mathbf{Stab}_{-\cos\theta}[f^\mathrm{odd}], \label{eqn:finish-kkmo} \end{align} where $f^\mathrm{odd} : \{-1,1\}^n \to [-1,1]$ is the odd part of $f$ (see Exercise 1.8) defined by \[ f^\mathrm{odd}(x) = \tfrac{1}{2}(f(x) - f(-x)) = \sum_{|S|\text{ odd}} \widehat{f}(S)\,x^S. \] Now we’re really in a position to apply the Majority Is Stablest Theorem to $f^\mathrm{odd}$, because $-\cos\theta \in (0,1)$, $\mathop{\bf E}[f^\mathrm{odd}] = 0$, and $f^\mathrm{odd}$ has no $(\epsilon,\epsilon)$-notable coordinates (since it’s formed from $f$ by just dropping some terms in the Fourier expansion). Using $-\cos\theta = \cos(\pi – \theta)$, the result is that \[ \mathbf{Stab}_{-\cos\theta}[f^\mathrm{odd}] \leq 1 – \tfrac{2}{\pi} \arccos(\cos(\pi – \theta)) + o_\epsilon(1) = 2\theta/\pi – 1 + o_\epsilon(1). \] Putting this into \eqref{eqn:finish-kkmo} yields \[ \mathop{\bf Pr}[\text{the test accepts } f] \leq \tfrac{1}{2} + \tfrac{1}{2} (2\theta/\pi – 1 + o_\epsilon(1)) = \theta/\pi + o_\epsilon(1), \] as needed. $\Box$
Remark 74 There’s actually still a mismatch between the algorithmic guarantee of Theorem 71 and the UG-hardness result Theorem 73, concerning the case of $\theta \in (\frac{\pi}{2}, \theta^*)$. In fact, for these values of $\theta$ — i.e., $\tfrac{1}{2} \leq \beta \lessapprox .8446$ — neither result is sharp; see O’Donnell and Wu [OW08].
Remark 75 If we want to prove UG-hardness of $(\theta’/\pi + \delta, \tfrac{1}{2} – \tfrac{1}{2} \cos\theta’)$-approximating Max-Cut, we don’t need the full version of Borell’s Isoperimetric Theorem; we only need the volume-$\tfrac{1}{2}$ case with parameter $\theta = \pi – \theta’$. Corollary 44 gave a simple proof of this result for $\theta = \frac{\pi}{4}$, hence $\theta’ = \frac{3}{4} \pi$. This yields UG-hardness of $(\tfrac34 + \delta, \tfrac{1}{2} + \frac{1}{2\sqrt{2}})$-approximating Max-Cut. The ratio between $\alpha$ and $\beta$ here is approximately $.8787$, very close to the Goemans–Williamson constant $c_{\text{GW}} \approx .8786$.
Finally, we will prove the General-Volume Majority Is Stablest Theorem, by using the Invariance Principle to reduce it to Borell’s Isoperimetric Theorem.
General-Volume Majority Is Stablest Theorem Let $f : \{-1,1\}^n \to [0,1]$. Suppose that $\mathbf{MaxInf}[f] \leq \epsilon$, or more generally, that $f$ has no $(\epsilon, \frac{1}{\log(1/\epsilon)})$-notable coordinates. Then for any $0 \leq \rho < 1$, \begin{equation} \label{eqn:mist} \mathbf{Stab}_\rho[f] \leq \Lambda_\rho(\mathop{\bf E}[f]) + O\bigl(\tfrac{\log \log (1/\epsilon)}{\log(1/\epsilon)}\bigr) \cdot \tfrac{1}{1-\rho}. \end{equation} (Here the $O(\cdot)$ bound has no dependence on $\rho$.)
Proof: The proof involves using the Basic Invariance Principle twice (in the form of Corollary 68). To facilitate this we introduce $f’ = \mathrm{T}_{1-\delta} f$, where (with foresight) we choose \[ \delta = 3 \tfrac{\log \log(1/\epsilon)}{\log(1/\epsilon)}. \] (We may assume $\epsilon$ is sufficiently small so that $0 < \delta \leq \frac{1}{20}$.) Note that $\mathop{\bf E}[f'] = \mathop{\bf E}[f]$ and that \[ \mathbf{Stab}_\rho[f'] = \sum_{S \subseteq [n]} \rho^{|S|} (1-\delta)^{2|S|} \widehat{f}(S)^2 = \mathbf{Stab}_{\rho(1-\delta)^2}[f]. \] But \begin{equation} \label{eqn:mist-tradeoff} \bigl|\mathbf{Stab}_{\rho(1-\delta)^2}[f] – \mathbf{Stab}_{\rho}[f]\bigr| \leq (\rho – \rho(1-\delta)^2) \cdot \tfrac{1}{1-\rho} \cdot \mathop{\bf Var}[f] \leq 2\delta \cdot \tfrac{1}{1-\rho} \end{equation} by Exercise 2.40, and with our choice of $\delta$ this can be absorbed into the error of \eqref{eqn:mist}. Thus it suffices to prove \eqref{eqn:mist} with $f’$ in place of $f$.
Let $\text{Sq} : {\mathbb R} \to {\mathbb R}$ be the continuous function which agrees with $t \mapsto t^2$ for $t \in [0,1]$ and is constant outside $[0,1]$. Note that $\text{Sq}$ is $2$-Lipschitz. We will apply the second part of Corollary 68 with “$h$” set to $\mathrm{T}_{\sqrt{\rho}} f$ (and thus $\mathrm{T}_{1-\delta} h = \mathrm{T}_{\sqrt{\rho}} f’$). This is valid since the variance and $(1-\delta)$-stable influences of $h$ are only smaller than those of $f$. Thus \begin{equation} \label{eqn:mist-halfway} \left|\mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[\text{Sq}(\mathrm{T}_{\sqrt{\rho}}f'({\boldsymbol{x}}))] – \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\text{Sq}(\mathrm{T}_{\sqrt{\rho}} f'(\boldsymbol{g}))]\right| \leq O(\epsilon^{\delta/3}) = O\bigl(\tfrac{1}{\log(1/\epsilon)}\bigr), \end{equation} using our choice of $\delta$. (In fact, it’s trading off this error with \eqref{eqn:mist-tradeoff} that led to our choice of $\delta$.) Now $\mathrm{T}_{\sqrt{\rho}} f’({\boldsymbol{x}}) = \mathrm{T}_{(1-\delta)\sqrt{\rho}} f({\boldsymbol{x}})$ is always bounded in $[0,1]$, so \[ \text{Sq}(\mathrm{T}_{\sqrt{\rho}} f'({\boldsymbol{x}})) = (\mathrm{T}_{\sqrt{\rho}} f'({\boldsymbol{x}}))^2 \quad \implies \quad \mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[\text{Sq}(\mathrm{T}_{\sqrt{\rho}}f'({\boldsymbol{x}}))] = \mathbf{Stab}_\rho[f']. \] Furthermore, $\mathrm{T}_{\sqrt{\rho}} f’(\boldsymbol{g})$ is the same as $\mathrm{U}_{\sqrt{\rho}} f’(\boldsymbol{g})$ because $f’$ is a multilinear polynomial. (Both are equal to $f’(\rho \boldsymbol{g})$; see Fact 13.) Thus in light of \eqref{eqn:mist-halfway}, to complete the proof of \eqref{eqn:mist} it suffices to show \begin{equation} \label{eqn:mist-suffices} \left|\mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\text{Sq}(\mathrm{U}_{\sqrt{\rho}} f'(\boldsymbol{g}))] – \Lambda_\rho(\mathop{\bf E}[f'])\right| \leq O\bigl(\tfrac{1}{\log(1/\epsilon)}\bigr). \end{equation}
Define the function $F : {\mathbb R}^n \to [0,1]$ by \[ F(g) = \mathrm{trunc}_{[0,1]}(f’(g)) = \begin{cases} 0 & \text{if $f’(g) < 0$,} \\ f’(g) & \text{if $f’(g) \in [0,1]$,}\\ 1 & \text{if $f’(g) >1$.} \end{cases} \] We will establish the following two inequalities, which together imply \eqref{eqn:mist-suffices}: \begin{gather} \left|\mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\text{Sq}(\mathrm{U}_{\sqrt{\rho}} f'(\boldsymbol{g}))] – \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\text{Sq}(\mathrm{U}_{\sqrt{\rho}} F(\boldsymbol{g}))]\right| \leq O\bigl(\tfrac{1}{\log(1/\epsilon)}\bigr), \label{eqn:mist1} \\ \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\text{Sq}(\mathrm{U}_{\sqrt{\rho}} F(\boldsymbol{g}))] \leq \Lambda_\rho(\mathop{\bf E}[f']) + O\bigl(\tfrac{1}{\log(1/\epsilon)}\bigr). \label{eqn:mist2} \end{gather} Both of these inequalities will in turn follow from \begin{equation} \label{eqn:f’-to-F} \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[|f'(\boldsymbol{g}) - F(\boldsymbol{g})|] = \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\mathrm{dist}_{[0,1]}(f’(\boldsymbol{g}))] \leq O\bigl(\tfrac{1}{\log(1/\epsilon)}\bigr). \end{equation} Let’s show how \eqref{eqn:mist1} and \eqref{eqn:mist2} follow from \eqref{eqn:f’-to-F}, leaving the proof of \eqref{eqn:f’-to-F} to the end. For \eqref{eqn:mist1}, \begin{align*} \left|\mathop{\bf E}[\text{Sq}(\mathrm{U}_{\sqrt{\rho}} f'(\boldsymbol{g}))] – \mathop{\bf E}[\text{Sq}(\mathrm{U}_{\sqrt{\rho}} F(\boldsymbol{g}))] \right| &\leq 2 \mathop{\bf E}[|\mathrm{U}_{\sqrt{\rho}} f'(\boldsymbol{g}) - \mathrm{U}_{\sqrt{\rho}} F(\boldsymbol{g})|]\\ &\leq 2 \mathop{\bf E}[|f'(\boldsymbol{g}) - F(\boldsymbol{g})|] \leq O\bigl(\tfrac{1}{\log(1/\epsilon)}\bigr), \end{align*} where the first inequality used that $\text{Sq}$ is $2$-Lipschitz, the second inequality used the fact that $\mathrm{U}_{\sqrt{\rho}}$ is a contraction on $L^1({\mathbb R}^n,\gamma)$, and the third inequality was \eqref{eqn:f’-to-F}. As for \eqref{eqn:mist2}, $\mathrm{U}_{\sqrt{\rho}} F$ is bounded in $[0,1]$ since $F$ is. Thus \[ \mathop{\bf E}[\text{Sq}(\mathrm{U}_{\sqrt{\rho}} F(\boldsymbol{g}))] = \mathop{\bf E}[(\mathrm{U}_{\sqrt{\rho}} F(\boldsymbol{g}))^2] = \mathbf{Stab}_\rho[F] \leq \Lambda_\rho(\mathop{\bf E}[F(\boldsymbol{g})]), \] where we used Borell’s Isoperimetric Theorem. But $|\mathop{\bf E}[F(\boldsymbol{g})] – \mathop{\bf E}[f'(\boldsymbol{g})]| \leq O\bigl(\tfrac{1}{\log(1/\epsilon)}\bigr)$ by \eqref{eqn:f’-to-F}, and $\Lambda_\rho$ is easily shown to be $2$-Lipschitz (exercise). This establishes \eqref{eqn:mist2}.
It therefore remains to show \eqref{eqn:f’-to-F}, which we do by applying the Invariance Principle one more time. Taking $\psi$ to be the $1$-Lipschitz function $\mathrm{dist}_{[0,1]}$ in Corollary 68 we deduce \begin{align*} \left|\mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[\mathrm{dist}_{[0,1]}(f’(\boldsymbol{g}))] – \mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[\mathrm{dist}_{[0,1]}(f’({\boldsymbol{x}}))]\right| \leq O(\epsilon^{\delta/3}) = O\bigl(\tfrac{1}{\log(1/\epsilon)}\bigr). \end{align*} But $\mathop{\bf E}[\mathrm{dist}_{[0,1]}f’({\boldsymbol{x}})] = 0$ since $f’({\boldsymbol{x}}) = \mathrm{T}_{1-\delta} f({\boldsymbol{x}}) \in [0,1]$ always. This establishes \eqref{eqn:f’-to-F} and completes the proof. $\Box$
We conclude with one more application of the Majority Is Stablest Theorem. Recall Kalai’s version of Arrow’s Theorem from Chapter 2.5, i.e., Theorem 2.55. It states that in a $3$-candidate Condorcet election using the voting rule $f : \{-1,1\}^n \to \{-1,1\}$, the probability of having a Condorcet winner — often called a rational outcome — is precisely $\tfrac34 – \tfrac34\mathbf{Stab}_{-1/3}[f]$. As we saw in the proof of Theorem 73 near \eqref{eqn:finish-kkmo}, this is in turn at most $\tfrac34 + \tfrac34\mathbf{Stab}_{1/3}[f^\mathrm{odd}]$, with equality if $f$ is already odd. It follows from the Majority Is Stablest Theorem that among all voting rules with $\epsilon$-small influences (a condition all reasonable voting rules should satisfy), majority rule is the “most rational”. Thus we see that the principle of representative democracy can be derived using analysis of Boolean functions.
]]>