*Gaussian Isoperimetric Inequality*. This inequality is a special case of the Borell Isoperimetric Inequality (and hence also a special case of the General-Volume Majority Is Stablest Theorem); in particular, it’s the special case arising from the limit $\rho \to 1^{-}$.

Restating Borell’s theorem using rotation sensitivity we have that for any $A \subseteq {\mathbb R}^n$, if $H \subseteq {\mathbb R}^n$ is a halfspace with the same Gaussian volume as $A$ then for all $\epsilon$, \[ \mathbf{RS}_A(\epsilon) \geq \mathbf{RS}_H(\epsilon). \] Since $\mathbf{RS}_A(0) = \mathbf{RS}_H(0) = 0$, it follows that \[ \mathbf{RS}_A'(0^+) \geq \mathbf{RS}_H'(0^+). \] (Here we are considering the one-sided derivatives at $0$, which can be shown to exist, though $\mathbf{RS}_A'(0^+)$ may equal $+\infty$; see the notes at the end of this chapter.) As will be explained shortly, $\mathbf{RS}_A'(0^+)$ is precisely $\sqrt{2/\pi} \cdot \text{surf}_\gamma(A)$, where $\text{surf}_\gamma(A)$ denotes the “Gaussian surface area” of $A$. Therefore the above inequality is equivalent to the following:

**Gaussian Isoperimetric Inequality.** Let $A \subseteq {\mathbb R}^n$ have $\mathrm{vol}_\gamma(A) = \alpha$ and let $H \subseteq {\mathbb R}^n$ be any halfspace with $\mathrm{vol}_\gamma(H) = \alpha$. Then $\text{surf}_\gamma(A) \geq \text{surf}_\gamma(H)$.

**Remark 47.** As shown in Proposition 49 below, the right-hand side in this inequality is equal to $\mathcal{U}(\alpha)$, where $\mathcal{U}$ is the *Gaussian isoperimetric function*, encountered earlier in Definition 5.23 and defined by $\mathcal{U} = \varphi \circ \Phi^{-1}$.

Let’s now discuss the somewhat technical question of how to properly define $\text{surf}_\gamma(A)$, the Gaussian surface area of a set $A$. Perhaps the most natural definition would be to equate it with the *Gaussian Minkowski content* of the boundary $\partial A$ of $A$, \begin{equation} \label{eqn:heuristic-gsa} \gamma^+(\partial A) = \liminf_{\epsilon \to 0^+} \frac{\mathrm{vol}_\gamma(\{z : \mathrm{dist}(z, \partial A) < \epsilon/2\})}{\epsilon}. \end{equation} (Relatedly, one might also consider the surface integral over $\partial A$ of the Gaussian pdf $\varphi$.) Under the “official” definition of $\text{surf}_\gamma(A)$ we give below in Definition 48, we’ll indeed have $\text{surf}_\gamma(A) = \gamma^+(\partial A)$ whenever $A$ is sufficiently nice — say, a disjoint union of closed, full-dimensional, convex sets. However, the Minkowski content definition is not a good one in general because it’s possible to have $\gamma^+(\partial A_1) \neq \gamma^+(\partial A_2)$ for some sets $A_1$ and $A_2$ that are equivalent up to measure $0$. (For more information, see an exercise and the notes at the end of this chapter.) As mentioned above, one “correct” definition is $\text{surf}_\gamma(A) = \sqrt{\pi/2} \cdot \mathbf{RS}_A'(0^+)$. This definition has the advantage of being insensitive to measure-$0$ changes to $A$. To connect this unusual-looking definition with Minkowski content, let’s heuristically interpret $\mathbf{RS}_A'(0^+)$. We start by thinking of it as $\frac{\mathbf{RS}_A(\epsilon)}{\epsilon}$ for “infinitesimal $\epsilon$”. Now $\mathbf{RS}_A(\epsilon)$ can be thought of as the probability that the line segment $\boldsymbol{\ell}$ joining two $\cos \epsilon$-correlated Gaussians crosses $\partial A$.
Since $\sin \epsilon \approx \epsilon$ and $\cos \epsilon \approx 1$ up to $O(\epsilon^2)$, we can think of these correlated Gaussians as $\boldsymbol{g}$ and $\boldsymbol{g} + \epsilon \boldsymbol{g}'$ for independent $\boldsymbol{g}, \boldsymbol{g}' \sim \mathrm{N}(0,1)^n$. When $\boldsymbol{g}$ lands near $\partial A$, the length of $\boldsymbol{\ell}$ in the direction perpendicular to $\partial A$ will, in expectation, be $\epsilon \mathop{\bf E}[|\mathrm{N}(0,1)|] = \sqrt{2/\pi} \epsilon$. Thus $\mathbf{RS}_A(\epsilon)$ should essentially be $\sqrt{2/\pi}\,\epsilon \cdot \gamma^+(\partial A) \approx \sqrt{2/\pi} \cdot \mathrm{vol}_\gamma(\{z : \mathrm{dist}(z, \partial A) < \epsilon/2\})$, and we have heuristically justified \begin{equation} \label{eqn:heur-gsa2} \sqrt{\pi/2} \cdot \mathbf{RS}_A'(0^+) = \sqrt{\pi/2} \cdot \lim_{\epsilon \to 0^+} \frac{\mathbf{RS}_A(\epsilon)}{\epsilon}\ \mathop{=}^?\ \gamma^+(\partial A). \end{equation}
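For a halfspace this heuristic can be checked exactly: if $H = \{z : z_1 \leq 0\}$ then Sheppard’s formula gives $\mathbf{RS}_H(\epsilon) = \arccos(\cos \epsilon)/\pi = \epsilon/\pi$, which matches $\sqrt{2/\pi}\,\epsilon \cdot \text{surf}_\gamma(H) = \sqrt{2/\pi}\,\epsilon \cdot \frac{1}{\sqrt{2\pi}} = \epsilon/\pi$. A minimal Monte Carlo sketch (the parameters are illustrative):

```python
import math
import random

random.seed(0)
eps = 0.5
trials = 200_000

# H = {z : z_1 <= 0}: only the first coordinate matters, so we sample
# a pair of cos(eps)-correlated one-dimensional Gaussians.
crossings = 0
for _ in range(trials):
    g = random.gauss(0, 1)
    gp = random.gauss(0, 1)
    h = math.cos(eps) * g + math.sin(eps) * gp
    if (g <= 0) != (h <= 0):
        crossings += 1

rs_estimate = crossings / trials
print(rs_estimate, eps / math.pi)  # the two agree up to sampling error
```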

One more standard idea for the definition of $\text{surf}_\gamma(A)$ is “$\mathop{\bf E}[\|\nabla 1_A\|]$”. This doesn’t quite make sense since $1_A \in L^1({\mathbb R}^n,\gamma)$ is not actually differentiable. However, we might consider replacing it with the limit of $\mathop{\bf E}[\|\nabla f_m\|]$ for a sequence $(f_m)$ of smooth functions approximating $1_A$. To see why this notion should agree with the Gaussian Minkowski content $\gamma^+(\partial A)$ for nice enough $A$, let’s suppose we have a smooth approximator $f$ to $1_A$ that agrees with $1_A$ on ${\{z : \mathrm{dist}(z, \partial A) \geq \epsilon/2\}}$ and is (essentially) a linear function on $\{z : \mathrm{dist}(z, \partial A) < \epsilon/2\}$. Then $\|\nabla f\|$ will be $0$ on the former set and (essentially) constantly $1/\epsilon$ on the latter (since it must climb from $0$ to $1$ over a distance of $\epsilon$). Thus we indeed have \[ \mathop{\bf E}[\|\nabla f\|] \approx \frac{\mathrm{vol}_\gamma(\{z : \mathrm{dist}(z, \partial A) < \epsilon/2\})}{\epsilon} \approx \gamma^+(\partial A), \] as desired. We summarize the above technical discussion with the following definition/theorem, which is discussed further in the notes at the end of this chapter:

**Definition 48.** For any $A \subseteq {\mathbb R}^n$, we define its *Gaussian surface area* to be \[ \text{surf}_\gamma(A) = \sqrt{\pi/2} \cdot \mathbf{RS}_A'(0^+) \in [0, \infty]. \] An equivalent definition is \[ \text{surf}_\gamma(A) = \inf\left\{\liminf_{m \to \infty} \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[\|\nabla f_m(\boldsymbol{z})\|]\right\}, \] where the infimum is over all sequences $(f_m)_{m \in {\mathbb N}}$ of smooth $f_m : {\mathbb R}^n \to [0,1]$ with first partial derivatives in $L^2({\mathbb R}^n,\gamma)$ such that $\|f_m - 1_A\|_1 \to 0$. Furthermore, this infimum is actually achieved by taking $f_m = \mathrm{U}_{\rho_m} 1_A$ for any sequence $\rho_m \to 1^{-}$. Finally, the equality $\text{surf}_\gamma(A) = \gamma^+(\partial A)$ with Gaussian Minkowski content holds if $A$ is a disjoint union of closed, full-dimensional, convex sets.
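As a one-dimensional illustration of the smooth-approximator characterization, take $A = (-\infty, 0]$ and the piecewise-linear ramp that agrees with $1_A$ off a width-$\epsilon$ band around $0$; its expected gradient norm tends to $\varphi(0) = \text{surf}_\gamma(A)$. A numerical sketch (Python’s `statistics.NormalDist` supplies $\varphi$; the helper name is ours):

```python
from statistics import NormalDist

N = NormalDist()  # standard Gaussian; N.pdf is phi

# A = (-inf, 0]: f_eps agrees with 1_A outside the band {|z| < eps/2} and
# ramps linearly inside it, so |f_eps'| equals 1/eps on the band, 0 elsewhere.
def expected_grad_norm(eps, m=4000):
    # midpoint rule for (1/eps) * integral of phi over the band
    total = 0.0
    for i in range(m):
        z = -eps / 2 + eps * (i + 0.5) / m
        total += N.pdf(z) * (eps / m)
    return total / eps

for eps in [0.5, 0.1, 0.01]:
    print(eps, expected_grad_norm(eps))  # tends to phi(0) = 0.3989...
```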

To get further acquainted with this definition, let’s describe the Gaussian surface area of some basic sets. We start with halfspaces, which as mentioned in Remark 47 have Gaussian surface area given by the Gaussian isoperimetric function.

**Proposition 49.** Let $H \subseteq {\mathbb R}^n$ be any halfspace (open or closed) with $\mathrm{vol}_\gamma(H) = \alpha \in (0,1)$. Then $\text{surf}_\gamma(H) = \mathcal{U}(\alpha) = \varphi(\Phi^{-1}(\alpha))$. In particular, if $\alpha = 1/2$ — i.e., $H$’s boundary contains the origin — then $\text{surf}_\gamma(H) = \frac{1}{\sqrt{2\pi}}$.

*Proof:* Just as in the proof of Corollary 20, by rotational symmetry we may assume $H$ is a $1$-dimensional halfline, $H = (-\infty, t]$. Since $\mathrm{vol}_\gamma(H) = \alpha$, we have $t = \Phi^{-1}(\alpha)$. Then $\text{surf}_\gamma(H)$ is equal to \[ \gamma^+(\partial H) = \lim_{\epsilon \to 0^+} \frac{\mathrm{vol}_\gamma(\{z \in {\mathbb R} : \mathrm{dist}(z,\partial H) < \tfrac{\epsilon}{2}\})}{\epsilon} = \lim_{\epsilon \to 0^+} \frac{\int_{t-\epsilon/2}^{t+\epsilon/2} \varphi(s)\,ds}{\epsilon} = \varphi(t) = \mathcal{U}(\alpha). \quad \Box \]
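The computation in this proof is easy to check numerically: the Minkowski quotient converges to $\varphi(\Phi^{-1}(\alpha))$ as $\epsilon \to 0^+$. A sketch using only the standard library (the choice $\alpha = 0.3$ is arbitrary):

```python
import math
from statistics import NormalDist

N = NormalDist()  # N.cdf is Phi, N.inv_cdf is Phi^{-1}, N.pdf is phi

def minkowski_quotient(alpha, eps):
    # Gaussian volume of the eps-band around t = Phi^{-1}(alpha), divided by eps
    t = N.inv_cdf(alpha)
    return (N.cdf(t + eps / 2) - N.cdf(t - eps / 2)) / eps

alpha = 0.3
U_alpha = N.pdf(N.inv_cdf(alpha))  # the Gaussian isoperimetric function at alpha
q = minkowski_quotient(alpha, 1e-5)
print(U_alpha, q)  # agree to many decimal places
```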

Here are some more Gaussian surface area bounds:

**Examples 50.** In an exercise, you are asked to generalize the above computation and show that if $A \subseteq {\mathbb R}$ is the union of disjoint nondegenerate intervals $[t_1, t_2], [t_3, t_4], \dots, [t_{2m-1}, t_{2m}]$ then $\text{surf}_\gamma(A) = \sum_{i=1}^{2m} \varphi(t_i)$. Perhaps the next easiest example is when $A \subseteq {\mathbb R}^n$ is an origin-centered ball; Ball [Bal93] gave an explicit formula for $\text{surf}_\gamma(A)$ in terms of the dimension and radius, one which is always less than $\sqrt{\frac{2}{\pi}}$ (see the exercises). This upper bound was extended to non-origin-centered balls in Klivans et al. [KOS08]. Ball also showed that every convex set $A \subseteq {\mathbb R}^n$ satisfies $\text{surf}_\gamma(A) \leq O(n^{1/4})$; Nazarov [Naz03] showed that this bound is tight up to the constant, using a construction highly reminiscent of Talagrand’s Exercise 4.15. As noted in Klivans et al. [KOS08], Nazarov’s work also immediately implies that an intersection of $k$ halfspaces has Gaussian surface area at most $O(\sqrt{\log k})$ (tight for appropriately sized cubes in ${\mathbb R}^k$), and that any cone in ${\mathbb R}^n$ with apex at the origin has Gaussian surface area at most $1$. Finally, by proving the “Gaussian special case” of the Gotsman–Linial Conjecture, Kane [Kan11a] established that if $A \subseteq {\mathbb R}^n$ is a degree-$k$ “polynomial threshold function” — i.e., $A = \{z : p(z) > 0\}$ for $p$ an $n$-variate degree-$k$ polynomial — then $\text{surf}_\gamma(A) \leq \frac{k}{\sqrt{2 \pi}}$. This is tight for every $k$ (even when $n = 1$).
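The interval formula at the start of Examples 50 can likewise be tested numerically: for small $\epsilon$ the bands around the $2m$ endpoints are disjoint, and the Minkowski quotient approaches $\sum_i \varphi(t_i)$. A sketch with the arbitrary choice $A = [-1,0] \cup [1,2]$:

```python
from statistics import NormalDist

N = NormalDist()
endpoints = [-1.0, 0.0, 1.0, 2.0]  # boundary of A = [-1,0] u [1,2]
eps = 1e-5

# Gaussian volume of the eps-neighborhood of the boundary, divided by eps;
# for eps this small the four bands are disjoint.
quotient = sum(N.cdf(t + eps / 2) - N.cdf(t - eps / 2) for t in endpoints) / eps
predicted = sum(N.pdf(t) for t in endpoints)
print(quotient, predicted)
```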

Though we’ve shown that the Gaussian Isoperimetric Inequality follows from Borell’s Isoperimetric Theorem, we now discuss some alternative proofs. In the special case of sets of Gaussian volume $\tfrac{1}{2}$, we can again get a very simple proof using the subadditivity property of Gaussian rotation sensitivity, Theorem 43. That result easily yields the following kind of “concavity property” concerning Gaussian surface area:

**Theorem 51.** Let $A \subseteq {\mathbb R}^n$. Then for any $\delta > 0$, \[ \sqrt{\pi/2} \cdot \frac{\mathbf{RS}_A(\delta)}{\delta} \leq \text{surf}_\gamma(A). \]

*Proof:* For $\delta > 0$ and $\epsilon = \delta/\ell$, $\ell \in {\mathbb N}^+$, Theorem 43 is equivalent to \[ \frac{\mathbf{RS}_A(\delta)}{\delta} \leq \frac{\mathbf{RS}_A(\epsilon)}{\epsilon}. \] Taking $\ell \to \infty$, hence $\epsilon \to 0^+$, the right-hand side becomes $\mathbf{RS}_A'(0^+) = \sqrt{2/\pi} \cdot \text{surf}_\gamma(A)$. $\Box$

If we take $\delta = \pi/2$ in this theorem, the left-hand side becomes \[ \sqrt{2/\pi} \mathop{\bf Pr}_{\substack{\boldsymbol{z}, \boldsymbol{z}' \sim \mathrm{N}(0,1)^n \\ \text{independent}}}[1_A(\boldsymbol{z}) \neq 1_A(\boldsymbol{z}')] = 2\sqrt{2/\pi} \cdot \mathrm{vol}_\gamma(A)(1-\mathrm{vol}_\gamma(A)). \] Thus we obtain a simple proof of the following result, which includes the Gaussian Isoperimetric Inequality in the volume-$\frac12$ case:

**Theorem 52.** Let $A \subseteq {\mathbb R}^n$. Then \[ 2\sqrt{2/\pi} \cdot \mathrm{vol}_\gamma(A)(1-\mathrm{vol}_\gamma(A)) \leq \text{surf}_\gamma(A). \] In particular, if $\mathrm{vol}_\gamma(A) = \frac12$, then we get the tight Gaussian Isoperimetric Inequality statement $\text{surf}_\gamma(A) \geq \frac{1}{\sqrt{2\pi}} = \mathcal{U}(\tfrac{1}{2})$.
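Combined with Proposition 49, Theorem 52 asserts the pointwise inequality $2\sqrt{2/\pi}\,\alpha(1-\alpha) \leq \mathcal{U}(\alpha)$, with equality at $\alpha = \frac12$. A quick grid check of this scalar inequality:

```python
import math
from statistics import NormalDist

N = NormalDist()
U = lambda a: N.pdf(N.inv_cdf(a))  # Gaussian isoperimetric function

# gap between U(alpha) and the Theorem 52 lower bound over a volume grid
gaps = [U(a) - 2 * math.sqrt(2 / math.pi) * a * (1 - a)
        for a in (i / 1000 for i in range(1, 1000))]
print(min(gaps))  # nonnegative, and essentially 0 near alpha = 1/2
```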

As for the full Gaussian Isoperimetric Inequality, it’s a pleasing fact that it can be derived by pure analysis of Boolean functions. This was shown by Bobkov **[Bob97]**, who proved the following very interesting isoperimetric inequality about Boolean functions:

**Bobkov’s Inequality.** Let $f : \{-1,1\}^n \to [0,1]$. Then \begin{equation} \label{eqn:bobkov} \mathcal{U}(\mathop{\bf E}[f]) \leq \mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}\left[\|(\mathcal{U}(f({\boldsymbol{x}})), \nabla f({\boldsymbol{x}}))\|\right]. \end{equation} Here $\nabla f$ is the discrete gradient (as in Definition 2.33) and $\| \cdot \|$ is the usual Euclidean norm (in ${\mathbb R}^{n+1}$). Thus to restate the inequality, \[ \mathcal{U}(\mathop{\bf E}[f]) \leq \mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}\left[\sqrt{\mathcal{U}(f({\boldsymbol{x}}))^2 + \mathop{{\textstyle \sum}}_{i=1}^n \mathrm{D}_if({\boldsymbol{x}})^2}\right]. \] In particular, suppose $f = 1_A$ is the $0$-$1$ indicator of a subset $A \subseteq \{-1,1\}^n$. Then since $\mathcal{U}(0) = \mathcal{U}(1) = 0$ we obtain $\mathcal{U}(\mathop{\bf E}[1_A]) \leq \mathop{\bf E}[\|\nabla 1_A\|]$.

As Bobkov noted, by the usual Central Limit Theorem argument one can straightforwardly obtain inequality \eqref{eqn:bobkov} in the setting of functions $f \in L^2({\mathbb R}^n, \gamma)$ with range $[0,1]$, provided $f$ is sufficiently smooth (for example, if $f$ is in the domain of $\mathrm{L}$; see the exercises). Then given $A \subseteq {\mathbb R}^n$, by taking a sequence of smooth approximations to $1_A$ as in Definition 48, the Gaussian Isoperimetric Inequality $\mathcal{U}(\mathop{\bf E}[1_A]) \leq \text{surf}_\gamma(A)$ is recovered. Given $A \subseteq \{-1,1\}^n$ we can write the quantity $\mathop{\bf E}[\|\nabla 1_A\|]$ appearing in Bobkov’s Inequality as \begin{equation} \label{eqn:boolean-surface-area} \mathop{\bf E}[\|\nabla 1_A\|] = \tfrac12 \cdot \mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}\bigl[\sqrt{\mathrm{sens}_A({\boldsymbol{x}})}\bigr], \end{equation} using the fact that for $1_A : \{-1,1\}^n \to \{0,1\}$ we have \[ \mathrm{D}_i1_A({\boldsymbol{x}})^2 = \tfrac14\cdot \boldsymbol{1}[\text{coordinate $i$ is pivotal for $1_A$ on~$x$}]. \] The quantity in \eqref{eqn:boolean-surface-area} — (half of) the expected square-root of the number of pivotal coordinates — is an interesting possible notion of “Boolean surface area” for sets $A \subseteq \{-1,1\}^n$. It was first essentially proposed by Talagrand **[Tal93]**. By Cauchy–Schwarz it’s upper-bounded by (half of) the square-root of our usual notion of boundary size, average sensitivity: \begin{equation} \label{eqn:tal-surf-vs-surf} \mathop{\bf E}[\|\nabla 1_A\|] \leq \sqrt{\mathop{\bf E}[\|\nabla 1_A\|^2]} = \sqrt{\mathbf{I}[1_A]}. \end{equation} (Note that $\mathbf{I}[1_A]$ here is actually one quarter of the average sensitivity of $A$, because we’re using $0$-$1$ indicators as opposed to $\pm 1$). But the inequality in \eqref{eqn:tal-surf-vs-surf} is often far from sharp. 
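Concretely, both sides of \eqref{eqn:tal-surf-vs-surf} can be computed exactly for the majority function $\mathrm{Maj}_n$ ($n$ odd), whose only strings with pivotal coordinates are the nearly balanced ones; a sketch (the helper name is ours):

```python
import math

def maj_boundary_stats(n):
    # Maj_n, n odd: a string has pivotal coordinates iff it has exactly
    # (n+1)/2 or (n-1)/2 ones, and then exactly k = (n+1)/2 coordinates
    # are pivotal; all other strings have sensitivity 0.
    k = (n + 1) // 2
    p_boundary = 2 * math.comb(n, k) / 2 ** n
    e_sqrt_sens = p_boundary * math.sqrt(k)  # E[sqrt(sens)], Theta(1)
    avg_sens = p_boundary * k                # average sensitivity, Theta(sqrt(n))
    return e_sqrt_sens, avg_sens

for n in [11, 101, 1001, 10001]:
    s, i = maj_boundary_stats(n)
    print(n, round(s, 4), round(i, 2))
# E[sqrt(sens)] approaches 2/sqrt(pi) while average sensitivity grows like sqrt(2n/pi)
```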
For example, while the majority function has average sensitivity $\Theta(\sqrt{n})$, the expected square-root of its sensitivity is $\Theta(1)$ because a $\Theta(1/\sqrt{n})$-fraction of strings have sensitivity $\lceil n/2 \rceil$ and the remainder have sensitivity $0$.

Let’s turn to the proof of Bobkov’s Inequality. As you are asked to show in the exercises, the general-$n$ case of Bobkov’s Inequality follows from the $n = 1$ case by a straightforward “induction by restrictions”. Thus just as in the proof of the Hypercontractivity Theorem, it suffices to prove the $n = 1$ “two-point inequality”, an elementary inequality about two real numbers:

**Bobkov’s Two-Point Inequality.** Let $f : \{-1,1\} \to [0,1]$. Then \[ \mathcal{U}(\mathop{\bf E}[f]) \leq \mathop{\bf E}[\|(\mathcal{U}(f), \nabla f)\|]. \] Writing $f(x) = a + bx$, this is equivalent to saying that provided $a \pm b \in [0,1]$, \[ \mathcal{U}(a) \leq \tfrac{1}{2} \|(\mathcal{U}(a+b), b)\| + \tfrac{1}{2} \|(\mathcal{U}(a-b), b)\|. \]
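Before proving it, one can at least confirm the two-point inequality numerically over a grid of admissible $(a,b)$ pairs (a sketch; `math.hypot` computes the Euclidean norm):

```python
import math
from statistics import NormalDist

N = NormalDist()
U = lambda a: N.pdf(N.inv_cdf(a))  # U = phi o Phi^{-1}

def gap(a, b):
    # right-hand side minus left-hand side for f(x) = a + bx
    rhs = 0.5 * math.hypot(U(a + b), b) + 0.5 * math.hypot(U(a - b), b)
    return rhs - U(a)

gaps = [gap(a, b)
        for a in (i / 100 for i in range(1, 100))
        for b in (j / 100 for j in range(0, 100))
        if 0 < a - b and a + b < 1]
print(min(gaps))  # nonnegative on the whole grid
```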

**Remark 53.** The only property of $\mathcal{U}$ used in proving this inequality is that it satisfies (Exercise 5.41) the differential equation $\mathcal{U} \mathcal{U}'' = -1$ on $(0,1)$.
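The differential equation in Remark 53 is itself easy to check by finite differences, since $\mathcal{U} = \varphi \circ \Phi^{-1}$ is available in the standard library (a numerical sketch):

```python
from statistics import NormalDist

N = NormalDist()
U = lambda a: N.pdf(N.inv_cdf(a))

h = 1e-4
products = []
for y in [0.1, 0.25, 0.5, 0.75, 0.9]:
    U_second = (U(y + h) - 2 * U(y) + U(y - h)) / h ** 2  # central difference
    products.append(U(y) * U_second)
print(products)  # each entry is close to -1, i.e. U * U'' = -1
```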

Bobkov’s proof of the two-point inequality was elementary but somewhat long and hard to motivate. In contrast, Barthe and Maurey **[BM00a]** gave a fairly short proof of the inequality, but it used methods from stochastic calculus, namely Itô’s Formula. We present here an elementary discretization of the Barthe–Maurey proof.

*Proof of Bobkov’s Two-Point Inequality:* By symmetry and continuity we may assume $\delta \leq a-b < a+b \leq 1-\delta$ for some $\delta > 0$. Let $\tau = \tau(\delta) > 0$ be a small quantity to be chosen later such that $b/\tau$ is an integer. Let $\boldsymbol{y}_0, \boldsymbol{y}_1, \boldsymbol{y}_2, \dots$ be a random walk within $[a-b, a+b]$ that starts at $\boldsymbol{y}_0 = a$, takes independent equally likely steps of $\pm \tau$, and is absorbed at the endpoints $a \pm b$. Finally, for $t \in {\mathbb N}$, define $\boldsymbol{z}_t = \|(\mathcal{U}(\boldsymbol{y}_t), \tau \sqrt{t})\|$. The key claim for the proof is:

**Claim 54.** Assuming $\tau = \tau(\delta) > 0$ is small enough, $(\boldsymbol{z}_t)_t$ is a submartingale with respect to $(\boldsymbol{y}_t)_t$, i.e., $\mathop{\bf E}[\boldsymbol{z}_{t+1} \mid \boldsymbol{y}_0, \dots, \boldsymbol{y}_t] = \mathop{\bf E}[\boldsymbol{z}_{t+1} \mid \boldsymbol{y}_t] \geq \boldsymbol{z}_t$.

Let’s complete the proof given the claim. Let ${\boldsymbol{T}}$ be the stopping time at which $\boldsymbol{y}_t$ first reaches $a\pm b$. By the Optional Stopping Theorem we have $\mathop{\bf E}[\boldsymbol{z}_0] \leq \mathop{\bf E}[\boldsymbol{z}_{\boldsymbol{T}}]$; i.e., \begin{equation} \label{eqn:submartingale-deduction} \mathcal{U}(a) \leq \mathop{\bf E}[\|(\mathcal{U}(\boldsymbol{y}_{\boldsymbol{T}}), \tau\sqrt{{\boldsymbol{T}}})\|]. \end{equation} In the expectation above we can condition on whether the walk stopped at $a+b$ or $a-b$. By symmetry, both events occur with probability $1/2$ and neither changes the conditional distribution of ${\boldsymbol{T}}$. Thus we get \begin{align*} \mathcal{U}(a) &\leq \tfrac{1}{2}\mathop{\bf E}[\|(\mathcal{U}(a+b), \tau \sqrt{{\boldsymbol{T}}})\|] + \tfrac{1}{2}\mathop{\bf E}[\|(\mathcal{U}(a-b), \tau \sqrt{{\boldsymbol{T}}})\|] \\ &\leq \tfrac{1}{2} \|(\mathcal{U}(a+b), \sqrt{\mathop{\bf E}[\tau^2{\boldsymbol{T}}]})\| + \tfrac{1}{2}\|(\mathcal{U}(a-b), \sqrt{\mathop{\bf E}[\tau^2{\boldsymbol{T}}]})\|, \end{align*} with the second inequality using concavity of $v \mapsto \sqrt{u^2 + v}$. But it’s a well-known fact (exercise) that $\mathop{\bf E}[{\boldsymbol{T}}] = (b/\tau)^2$. Substituting this into the above completes the proof.
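The “well-known fact” $\mathop{\bf E}[{\boldsymbol{T}}] = (b/\tau)^2$ is the expected absorption time of a symmetric random walk started midway between barriers $b/\tau$ steps away; a quick simulation sketch (after rescaling the steps to $\pm 1$; parameters illustrative):

```python
import random

random.seed(1)

def absorption_time(m):
    # symmetric +/-1 walk started at 0, absorbed at +/-m (here m = b/tau)
    pos, t = 0, 0
    while abs(pos) < m:
        pos += 1 if random.random() < 0.5 else -1
        t += 1
    return t

m = 5  # e.g. b = 0.25, tau = 0.05
trials = 100_000
avg_T = sum(absorption_time(m) for _ in range(trials)) / trials
print(avg_T)  # close to m**2 = 25
```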

It remains to verify Claim 54. Actually, although the claim is true as stated (see the exercises) it will be more natural to prove the following slightly weaker claim: \begin{equation} \label{eqn:weaker-submartingale} \mathop{\bf E}[\boldsymbol{z}_{t+1} \mid \boldsymbol{y}_t] \geq \boldsymbol{z}_t - C_\delta \tau^3 \end{equation} for some constant $C_\delta$ depending only on $\delta$. This is still enough to complete the proof: Applying the Optional Stopping Theorem to the submartingale ${(\boldsymbol{z}_t + C_\delta \tau^3 t)_t}$ we get that \eqref{eqn:submartingale-deduction} holds up to an additive $C_\delta \tau^3 \mathop{\bf E}[{\boldsymbol{T}}] = C_\delta b^2 \tau$. Then continuing with the above we deduce Bobkov’s Inequality up to $C_\delta b^2 \tau$, and we can make $\tau$ arbitrarily small.

Even though we only need to prove \eqref{eqn:weaker-submartingale}, let’s begin a proof of the original Claim 54 anyway. Fix $t \in {\mathbb N}^+$ and condition on $\boldsymbol{y}_t = y$. If $y$ is $a \pm b$, then the walk is stopped and the claim is clear. Otherwise, $\boldsymbol{y}_{t+1}$ is $y \pm \tau$ with equal probability, and we want to verify the following inequality (assuming $\tau > 0$ is sufficiently small as a function of $\delta$, independent of $y$): \begin{align} \|(\mathcal{U}(y), \tau \sqrt{t})\|&\leq \tfrac{1}{2} \|(\mathcal{U}(y+\tau), \tau\sqrt{t+1})\| + \tfrac{1}{2} \|(\mathcal{U}(y-\tau), \tau\sqrt{t+1})\| \label{eqn:bobkov-verify}\\ &= \tfrac{1}{2} \bigl\|\bigl(\sqrt{\mathcal{U}(y+\tau)^2 + \tau^2}, \tau\sqrt{t}\bigr)\bigr\| + \tfrac{1}{2} \bigl\|\bigl(\sqrt{\mathcal{U}(y-\tau)^2 + \tau^2}, \tau\sqrt{t}\bigr)\bigr\|. \nonumber \end{align} By the triangle inequality, it’s sufficient to show \[ \mathcal{U}(y) \leq \tfrac{1}{2} \sqrt{\mathcal{U}(y+\tau)^2 + \tau^2} + \tfrac{1}{2} \sqrt{\mathcal{U}(y-\tau)^2 + \tau^2}, \] and this is actually necessary too, being the $t = 0$ case of \eqref{eqn:bobkov-verify}. Finally, since we actually only need the weakened submartingale statement \eqref{eqn:weaker-submartingale}, we’ll instead establish \begin{equation} \label{eqn:bobkov-taylor} \mathcal{U}(y) – C_\delta \tau^3 \leq \tfrac{1}{2} \sqrt{\mathcal{U}(y+\tau)^2 + \tau^2} + \tfrac{1}{2} \sqrt{\mathcal{U}(y-\tau)^2 + \tau^2} \end{equation} for some constant $C_\delta$ depending only on $\delta$ and for every $\tau \leq \frac{\delta}{2}$. We do this using Taylor’s theorem. Write $V_y(\tau)$ for the function of $\tau$ on the right-hand side of \eqref{eqn:bobkov-taylor}. For any $y \in [a-b,a+b]$ the function $V_y$ is smooth on $[0,\frac{\delta}{2}]$ because $\mathcal{U}$ is a smooth, positive function on $[\frac{\delta}{2}, 1 - \frac{\delta}{2}]$. 
Thus \[ V_y(\tau) = V_y(0) + V_y'(0)\tau + \tfrac12 V_y''(0)\tau^2 + \tfrac16 V_y'''(\xi) \tau^3 \] for some $\xi$ between $0$ and $\tau$. The magnitude of $V_y'''(\xi)$ is indeed bounded by some $C_\delta$ depending only on $\delta$, using the fact that $\mathcal{U}$ is smooth and positive on $[\frac{\delta}{2}, 1 - \frac{\delta}{2}]$. But $V_y(0) = \mathcal{U}(y)$, and it’s straightforward to calculate that \[ V_y'(0) = 0, \qquad V_y''(0) = \mathcal{U}''(y) + 1/\mathcal{U}(y) = 0, \] where the last identity uses the key property $\mathcal{U}'' = -1/\mathcal{U}$ mentioned in Remark 53. Thus we conclude $V_y(\tau) \geq \mathcal{U}(y) - C_\delta \tau^3$, verifying \eqref{eqn:bobkov-taylor} and completing the proof. $\Box$

As a matter of fact, by a minor adjustment (see the exercises) to this random walk argument we can establish the following generalization of Bobkov’s Inequality:

**Theorem 55.** Let $f : \{-1,1\}^n \to [0,1]$. Then $\mathop{\bf E}[\|(\mathcal{U}(\mathrm{T}_\rho f), \nabla \mathrm{T}_\rho f)\|]$ is a non-decreasing function of $\rho \in [0,1]$. We recover Bobkov’s Inequality by comparing the values at $\rho = 0$ (where $\mathrm{T}_0 f \equiv \mathop{\bf E}[f]$ and the gradient vanishes) and at $\rho = 1$.

We end this section by remarking that De, Mossel, and Neeman **[DMN13]** have given a “Bobkov-style” Boolean inductive proof that yields both Borell’s Isoperimetric Theorem and also the Majority Is Stablest Theorem (albeit with some aspects of the Invariance Principle-based proof appearing in the latter case); see an exercise and the notes at the end of this chapter.

- (Boaz Barak) Is there an efficient algorithm for finding sparse vectors in subspaces $V\subseteq \mathbb{R}^n$? Here by sparsity Boaz has in mind vectors $v\in V$ whose number of non-zero coordinates is $o(n)$, though he is also interested in various approximations including \[ \mathbb{E}\big[v_i^4\big] \gg \mathbb{E}\big[v_i^2\big]^2. \] Relatedly, what can we say about polynomials $p$ over Gaussian space that satisfy \[ \mathbb{E}\big[p(\boldsymbol{x})^4\big] \gg \mathbb{E}\big[p(\boldsymbol{x})^2\big]^2?\] Must $p$ be correlated with a product of linear functions? (Rafał mentions that this is

- (Ehud Friedgut) A $t$-coset in $S_n$ is a coset of the stabilizer of $t$ points. Is every partition of $S_n$ into $t$-cosets a refinement of a partition into $(t-1)$-cosets?

- (Avi Wigderson) Let $T \in \mathbb{R}^{n\times n\times n\times n}$ be a 4-dimensional tensor $T = \sum \alpha_{ijk\ell} x_i y_j z_k w_\ell$. Define $\mathrm{rank}_{xy,zw}(T)$ to be the rank of the $n^2\times n^2$ matrix whose entry in the $(i,j)$-th row and $(k,\ell)$-th column is $\alpha_{ijk\ell}$, and likewise $\mathrm{rank}_{yz,wx}(T)$ to be the rank of the matrix whose entry in the $(j,k)$-th row and $(\ell,i)$-th column is $\alpha_{ijk\ell}$. The problem is to find an explicit tensor $T$ such that whenever $T = T_1 + T_2$ and $\mathrm{rank}_{xy,zw}(T_1) \le r$ and $\mathrm{rank}_{yz,wx}(T_2) \le r$, then $r \ge n^{1+\epsilon}$. Avi suggests the super-simple candidate $T_0 = \sum_{i,j} x_i y_j z_i w_j$. This has implications for lower bounds in non-commutative computation; in particular, such an explicit tensor would yield an exponential lower bound on the size of non-commutative arithmetic circuits computing the permanent; see Avi’s paper with Pavel Hrubeš and Amir Yehudayoff for more details.

- (Raghu Meka) Let $A_1,\ldots,A_n \in \mathbb{R}^{n\times n}$ be symmetric matrices satisfying $\| A_i \| \le 1$ for all $i\in [n]$. Prove that\[ \Pr\Big[\Big\| \sum \boldsymbol{g}_i A_i \Big\| \le 1000\sqrt{n} \Big] \ge 2^{-\Omega(n)}, \]where the probability is with respect to the $\boldsymbol{g}_i$’s which are independent standard Gaussians. An affirmative answer would have applications to the conjectured Matrix Spencer Theorem.

- (Yuval Filmus) Let $f:\{0,1\}^n \to [-1,1]$ be a bounded function over the hypercube, and suppose the Fourier degree of $f$ is at most $d$. Define the $L_1$-influence of $f$ to be \[ \mathrm{Inf}[f] := \sum_{i=1}^n \mathop{\mathbb{E}}_{\boldsymbol{x}\in\{0,1\}^n} \bigg[\bigg| \frac{f(\boldsymbol{x})-f(\boldsymbol{x}\oplus e_i)}{2}\bigg|\bigg]. \] Is it true that $\mathrm{Inf}[f] \le d$ for all $f$ (which would be tight by considering the parity function on $d$ variables)? Recent work of Artūrs Bačkurs and Mohammad Bavarian proves a bound of $O(d^3\log d)$, answering a question of Aaronson and Ambainis. This has been improved by Yuval (in joint work with Hamed Hatami, Guy Kindler, and Elchanan) to $d^2$. (Note that for Boolean-valued $f: \{0,1\}^n \to \{-1,1\}$ this definition of $L_1$-influence agrees with the slightly more standard notion of influence (i.e. “$L_2$-influence”), and in this case we do indeed have the bound $\mathrm{Inf}[f] \le \mathrm{deg}[f]$.)
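For intuition on the tightness claim, the $L_1$-influence of the parity function on $d$ variables is exactly $d$: every coordinate flip changes the value by $\pm 2$. A brute-force sketch of the definition:

```python
import itertools

def l1_influence(f, n):
    # Inf[f] = sum_i E_x |(f(x) - f(x xor e_i)) / 2|, x uniform over {0,1}^n
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        for i in range(n):
            y = list(x)
            y[i] ^= 1
            total += abs(f(x) - f(tuple(y))) / 2
    return total / 2 ** n

d = 4
parity = lambda x: (-1) ** sum(x)  # +/-1-valued parity on d variables
print(l1_influence(parity, d))  # 4.0, matching the degree
```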

- (Yuval Peres) Let $A \subseteq \mathbb{Z}_n$ and suppose $|A| = \Theta(n^{\alpha})$ for some $\alpha$. Define the quantity \[ K_A := \min\left\{ k \colon \exists\, x_1,\ldots,x_k \text{ such that } \bigcup_{i=1}^k\, x_i + A = \mathbb{Z}_n\right\}.\] It is known that \[ n^{1-\alpha} \lesssim K_A \lesssim n^{1-\alpha}\log n, \] and there are examples of sets $A$ showing that both inequalities are tight. Yuval’s question is: what is the magnitude of $K_A$ for a *random* set $A$? Ben Green has shown that for all $\alpha > 1/2$, there exists a constant $C_\alpha$ such that $K_A \ge C_\alpha n^{1-\alpha}\log n$ for a random set $A$.
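One can get a feel for $K_A$ with a small greedy-covering experiment; greedy set cover uses at most $(1+\ln n)$ times the optimal number of translates, so the count below upper-bounds $K_A$ up to a log factor (a sketch with arbitrary illustrative parameters):

```python
import math
import random

random.seed(4)
n = 500
alpha = 0.6
A = random.sample(range(n), int(n ** alpha))  # random A with |A| ~ n^alpha

# Greedy covering of Z_n by translates x + A: repeatedly take the translate
# covering the most uncovered residues.
uncovered = set(range(n))
k = 0
while uncovered:
    best = max(range(n),
               key=lambda x: sum((x + a) % n in uncovered for a in A))
    uncovered -= {(best + a) % n for a in A}
    k += 1

print(k, n / len(A), (n / len(A)) * math.log(n))  # k sits between the two orders
```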

- (Raghu Meka) Let $G$ be the set of all permutations on $n^2$ elements, and consider the action of this group on balanced strings $x\in \{0,1\}^{n^2}$. Let $S$ be an expanding set of generators for $G$ and consider a random walk of length $t = O(\log(n/\epsilon))$ on $\{0,1\}^{n^2}$ obtained by applying random elements of $S$ iteratively on a fixed balanced string in $\{0,1\}^{n^2}$. If $\boldsymbol{y}\in \{0,1\}^{n^2}$ is the resulting string after such a walk (with worst-case starting point), is it true that $\sum_{i=1}^n \boldsymbol{y}_i$ is $O(\epsilon)$ close in TV-distance to $\mathrm{Bin}(n,1/2)$? You can change $n^{2}$ to $n^{100}$ if you think it helps.

- (Thomas Vidick) Consider $m$ equations $\varphi_k(x) = x_ix_j$ and signs $\epsilon_k \in \{-1,1\}$ for $k \in [m]$. The Max-2LIN problem of computing \[ \mathop{\max_{x_i\in\mathbb{R}}}_{x_i^2 = 1} \left\{\frac1{m} \sum_{k=1}^m \mathbb{1}[\varphi_k(x) = \epsilon_k]\right\} \] is NP-hard. However, when the variables $x_i$ are replaced with $d\times d$ orthogonal matrices in the following way: \[ \mathop{\max_{X_i\in\mathbb{R}^{d\times d}}}_{X_i^2 = \mathrm{Id}} \left\{\frac1{m} \sum_{k=1}^m \mathbb{1}\Big[\frac1{d}\mathrm{Tr}(\varphi_k(X)) = \epsilon_k\Big]\right\},\] this quantity can be computed in polynomial time using SDPs. Can the same be done for other predicates, e.g. $\varphi_k(X) = X_i X_j X_k$?

- (Li-Yang Tan) Find an explicit $f:\{0,1\}^n\to\{0,1\}$ such that any $g:\{0,1\}^n\to\{0,1\}$ satisfying $\Pr[f(\boldsymbol{x})\ne g(\boldsymbol{x})] \le \epsilon$ requires DNFs of size $\Omega_\epsilon(2^n/\mathrm{poly}(n))$ to compute. (This is true for a random function $f$.) Ryan and Karl Wimmer proved that MAJORITY can be $\epsilon$-approximated by a DNF of size $2^{O_\epsilon(\sqrt{n})}$; subsequently Eric Blais and I showed that PARITY can be $\epsilon$-approximated by a DNF of size $2^{(1-2\epsilon)n} = 2^n/2^{\Omega_\epsilon(n)}$. A candidate explicit hard function $f$ for this problem is a random symmetric function.

- (Ryan O’Donnell, sharing an open problem due to Mendel and Naor) Let $f:\{-1,1\}^n \to \mathbb{R}$ where $\hat{f}(S) = 0$ for all $|S| < k$. The inequality \[ \| T_\rho f \|_2 \le \rho^k \|f\|_2, \] is a straightforward consequence of the Fourier expansion $T_\rho f(x) = \sum_{S\subseteq [n]} \rho^{|S|}\hat{f}(S) \chi_S(x)$, but how about analogous inequalities for higher norms? For example, is it true that\[ \| T_\rho f \|_4 \le \rho^{0.001k} \| f\|_4? \]The problem is open even for $k=1$!
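The 2-norm statement is a one-line Parseval computation, which can be confirmed on a random function specified directly by Fourier coefficients supported on levels $\ge k$ (a sketch; the parameters are arbitrary):

```python
import itertools
import math
import random

random.seed(3)
n, k, rho = 6, 2, 0.7

# random f given by its Fourier coefficients, with fhat(S) = 0 for |S| < k
fhat = {S: (random.gauss(0, 1) if len(S) >= k else 0.0)
        for r in range(n + 1)
        for S in itertools.combinations(range(n), r)}

# Parseval: ||f||_2^2 = sum_S fhat(S)^2, and T_rho scales fhat(S) by rho^|S|
norm_f = math.sqrt(sum(c * c for c in fhat.values()))
norm_Tf = math.sqrt(sum((rho ** len(S) * c) ** 2 for S, c in fhat.items()))
print(norm_Tf, rho ** k * norm_f)  # first quantity is at most the second
```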

- (Avi Wigderson) Let $f:\{0,1\}^n \to [-1,1]$ be a bounded function satisfying the three conditions: (1) $\mathbb{E}[f(\boldsymbol{x})] = 0$, (2) $\sum_x |f(x)| = 1$, and (3) $|\mathrm{supp}(f)| \le m$. Prove that there exists $S \subseteq [n]$ of size $O(\log m)$ such that $|\hat{f}(S)| \ge m^{-O(1)}2^{-n}$. Avi and Amir Yehudayoff prove a lower bound of $|\hat{f}(S)| \ge m^{-O(\log m)}2^{-n}$ in their paper.

- (Yuval Peres) Let $A$ be a stochastic, symmetric positive-definite matrix. For a given $k \in \mathbb{Z}^+$, let $(i_k, j_k)$ be the position of the smallest entry of $A^k$. Conjecture: for all $\ell < k$ it holds that $A^\ell[i_k, j_k] \leq A^k[i_k, j_k]$. In some sense, “if it’s very hard to get from vertex $i$ to vertex $j$ in $k$ steps, then it’s even harder to do it in fewer steps”. The conjecture is open even for $3 \times 3$ matrices.

- (Krzysztof Oleszkiewicz) Let $F$ be a normed linear space, and say that $A \subseteq F$ is $\delta$-separate if $\| x - y \|\ge \delta$ for all distinct $x,y\in A$. Let $A,B \subseteq F$ be two non-empty, $1$-separate subsets. Prove that there always exists $C \subseteq A+B$ of size $|C| \ge |A|+|B|-1$ such that $C$ is $\delta$-separate for some universal constant $\delta > 0$.

- (Rafał Latała) [Might not have transcribed this one correctly...] Let $(\boldsymbol{X}_t)_{t \in T}$ be a stochastic process and suppose $|T| \leq \exp(p)$. Then it’s not hard to show that
\[
{\bf E}[\max_{t \in T} \boldsymbol{X}_t] \lesssim \max_{t \in T}\,{\bf E}[|\boldsymbol{X}_t|^p]^{1/p}.
\]

The question is when this can be reversed. In other words, continuing to assume $|T| \leq \exp(p)$, suppose that for all $t, s \in T$ it holds that ${\bf E}[|\boldsymbol{X}_t - \boldsymbol{X}_s|^p]^{1/p} \geq C$. Under what circumstances do we also have ${\bf E}[\max_{t \in T} \boldsymbol{X}_t] \gtrsim C$? It seems that for Gaussian processes it follows by Sudakov Minoration. What about the case where $\boldsymbol{X}_t = \sum_{i=1}^n t_i \boldsymbol{Y}_i$, where $\boldsymbol{Y}$ is uniform on some convex set $K \subseteq \mathbb{R}^n$? In this case, if $K$ is a cube, the implication follows from work of Talagrand.

- (Boaz Barak) Prove that there exists $\epsilon,\delta > 0$ such that for every graph $G$ on $N$ vertices containing at least $\mu N^3$ many triangles, the number of $K_{2,2,2}$’s in $G$ is at least $\delta\mu^{8-\epsilon}N^6$. (Iterated Cauchy-Schwarz gives a lower bound of $\mu^8N^6$.) There are potential applications to lower bounds in the NOF model of communication complexity.

- (Avi Wigderson, sharing an open problem due to him and Oded Goldreich) Suppose we are given an efficiently computable $f:\{0,1\}^n \to\{ \mathrm{Good},\mathrm{Bad}\}$ (e.g. $f$ is computable by a constant-degree polynomial), with the promise that $|f^{-1}(\mathrm{Bad})| \le 2^n/3$. Our goal is to efficiently find an $x\in f^{-1}(\mathrm{Good})$. (Avi refers to this as the problem of “finding hay in the haystack”.) Certainly we can easily accomplish this with a randomized algorithm; the key is to do it deterministically. As it turns out this derandomization problem remains equivalent even if we are given the significantly stronger promise that $|f^{-1}(\mathrm{Bad})| \le 2^{n^{\epsilon}}$. Avi and Oded’s question is: what if $|f^{-1}(\mathrm{Bad})| \le n^{\log n}$? Note that there is a trivial deterministic algorithm that runs in quasi-polynomial time, since any set of $n^{\log n} + 1$ many inputs must contain an $x$ such that $f(x) = \mathrm{Good}$.

- (Raghu Meka) Let $f_1,\ldots,f_n$ be linear threshold functions where $f_i$ does not depend on $x_i$ for all $i\in [n]$. Is it true that \[ \mathop{\mathbb{E}}_{\boldsymbol{x}\sim\{-1,1\}^n}\bigg[\bigg| \sum_{i=1}^n \boldsymbol{x}_i f_i(\boldsymbol{x})\bigg|\bigg]^2 = O(n)? \] An affirmative answer would have applications to the Gotsman–Linial conjecture on the average sensitivity of polynomial threshold functions. For *arbitrary* Boolean functions $f_1,\ldots,f_n$, where again $f_i$ does not depend on $x_i$ for all $i\in [n]$, it is known that \[ \mathop{\mathbb{E}}_{\boldsymbol{x}\sim\{-1,1\}^n}\bigg[\bigg| \sum_{i=1}^n \boldsymbol{x}_i f_i(\boldsymbol{x})\bigg|\bigg]^2 \le 2\bigg(\sum_{i=1}^n \mathrm{Inf}[f_i]\bigg) + n,\] and there are examples showing that this is tight.

- (Yuval Peres) Let $G$ be a transitive graph, and consider edge percolation on $G$ with parameter $p$ (i.e. each edge is present with probability $p$). Define the quantity \[ S(p) := \mathop{\mathbb{E}}_p[|\mathcal{C}(v)|],\] where $v$ is an arbitrary vertex in $G$ and $\mathcal{C}(v)$ is the component containing $v$. Is the logarithmic derivative $S'(p)/S(p)$ unimodal in $p$?
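The quantity $S(p)$ in the last item is easy to compute exactly on a toy transitive graph, say the $m$-cycle, by brute-force enumeration of edge subsets. This is a sketch for building intuition only; it says nothing about the unimodality question itself.

```python
from itertools import product

def expected_component_size(m, p):
    # Exact S(p) = E_p[|C(0)|] for edge percolation on the m-cycle
    # (edge i -- (i+1) mod m present independently with probability p),
    # computed by summing over all 2^m edge subsets.
    total = 0.0
    for edges in product([0, 1], repeat=m):
        prob = 1.0
        for e in edges:
            prob *= p if e else 1 - p
        # build adjacency lists for the present edges
        adj = {i: [] for i in range(m)}
        for i, e in enumerate(edges):
            if e:
                adj[i].append((i + 1) % m)
                adj[(i + 1) % m].append(i)
        # DFS to find the component of vertex 0
        seen, stack = {0}, [0]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        total += prob * len(seen)
    return total
```

At $p = 0$ the component is just $\{v\}$ and at $p = 1$ it is the whole cycle, and $S(p)$ increases in between, as the monotone coupling predicts.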
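Returning to Avi and Oded's problem above: the trivial quasi-polynomial algorithm is nothing more than the pigeonhole principle. A minimal sketch (the `find_good` name and the Good/Bad interface are mine):

```python
def find_good(f, n, bad_bound):
    # Scan the first bad_bound + 1 inputs (as n-bit strings) in a fixed order.
    # Since at most bad_bound inputs map to "Bad", one of them must be "Good".
    for i in range(bad_bound + 1):
        x = format(i, "0{}b".format(n))
        if f(x) == "Good":
            return x
    raise AssertionError("promise violated: more than bad_bound inputs are Bad")
```

With `bad_bound` = $n^{\log n}$ this takes quasi-polynomial time; the open question is whether anything deterministic and genuinely faster exists.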

**[Editor's note -- just a reminder that these daily updates are almost entirely thanks to Li-Yang Tan.]**

The first speaker of the day was Sergey Bobkov, who spoke about concentration on the cube and its relationship with various isoperimetric problems, including highlights of his own work.

Suppose we endow the continuous unit cube $[0,1]^n$ with the scaled Lebesgue measure $\mathbf{P}$; we will be interested in concentration inequalities like the following: for all measurable $A\subseteq [0,1]^n$ of density $\mathbf{P}(A) \ge 1/2$ and $r > 0$,

\begin{equation}\mathbf{P}(A + rB_2) \ge 1-e^{-cr^2}, \label{bobkov}\end{equation}

where $c > 0$ is a universal constant ($c$ can be taken to be $\pi$ in this case), $B_2$ is the open Euclidean ball of unit radius centered at the origin, and $+$ is the Minkowski sum. (The analogous statement is *false* for the discrete cube $\{0,1\}^n$, though Talagrand has shown that it holds when $A$ or $\overline{A}$ is convex.) Here $A+rB_2$ is the enlargement of $A$ that one obtains by taking every point $x\in A$ and also including all $y$ at distance $\| x-y\|_2 < r$ from $x$, and so (\ref{bobkov}) says that $A$ together with its $r$-neighborhood fills up almost all of $[0,1]^n$. Concentration inequalities like these are well known to be intimately related to various isoperimetric problems: measures $\mu$ that satisfy the Poincaré and log-Sobolev inequalities enjoy good concentration properties, and implications in the reverse direction have been established as well. (Incidentally, Sergey mentions that the isoperimetric problem of determining the sets $A\subseteq [0,1]^n$ of measure $1/2$ that minimize $\mathbf{P}(A+rB_2)$ remains open.)

Sergey presented two concentration inequalities of this flavor with respect to other measures and notions of enlargement/neighborhood, both of which appear in a 2010 paper of his. First, if $A\subseteq [0,1]^n$ is *permutation invariant* (i.e. if $x\in A$ then $(x_{\pi(1)},\ldots,x_{\pi(n)})\in A$ for all permutations $\pi \in S_n)$ and $\mathbf{P}(A) \ge 1/2$, then for all $r > 0$,

\[ \mathbf{P}(A + rB_\infty) \ge 1-e^{-cnr^2}, \]

where again $c > 0$ is a universal constant and now $B_\infty$ is the unit $\ell_\infty$-ball. Note that since

\[ A + \frac{r}{\sqrt{n}} B_\infty \subset A + rB_2, \]

this is a sharpening of (\ref{bobkov}) for permutation invariant sets $A$. An equivalent functional formulation is the following: let $f:[0,1]^n\to\mathbb{R}$ be permutation invariant and suppose $|f(x)-f(y)| \le \|x-y\|_\infty$ for all $x,y\in[0,1]^n$. Then for all $r > 0$,

\[ \mathbf{P}\{|f-\mathbb{E}[f]|> r\}\le 2e^{-cnr^2}.\]
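The inclusion $A + \frac{r}{\sqrt{n}} B_\infty \subset A + rB_2$ used above is just the standard comparison between the $\ell_\infty$ and $\ell_2$ norms, spelled out:

```latex
z \in \tfrac{r}{\sqrt{n}}\,B_\infty
\;\Longrightarrow\;
\|z\|_2 \le \sqrt{n}\,\|z\|_\infty < \sqrt{n}\cdot\tfrac{r}{\sqrt{n}} = r
\;\Longrightarrow\;
z \in r B_2 .
```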

The second inequality concerns subsets of the simplex

\[ \Delta_n := \{ y \in \mathbb{R}^n_+\colon y_1 + \cdots + y_n \le 1 \}, \]

endowed with the uniform measure $\mu_n$, and states the following: for all $r > 0$ and $A\subseteq \Delta_n$ with $\mu_n(A) \ge 1/2$, we have that

\begin{equation} \mu_n(A+rB_1) \ge 1-e^{-cnr^2}. \label{bobkov2} \end{equation}

This inequality was proved by Arias-de-Reyna and Villa with a weaker bound of $1-ne^{-cnr^2}$, and the unnecessary dependence on $n$ was subsequently removed by Schechtman. Both these proofs are based on Talagrand’s isoperimetric theorem for the product exponential measure; Sergey gives a direct inductive proof in his paper. (A result of Sodin establishes an incomparable inequality $\mu_n(A+rB_2) \ge 1-e^{-cnr}$. Since $B_2\subset \sqrt{n}B_1$ this implies that $\mu_n(A+rB_1) \ge 1-e^{-c\sqrt{n}r}$, but this is a weaker statement than (\ref{bobkov2}).)

Our next speaker was Almut Burchard, who spoke about polarization, symmetrization, and rearrangement inequalities. (For an introduction to all of these ideas, see Almut’s excellent survey.) The setting for these ideas can be either the sphere, Euclidean space, or hyperbolic space. Almut used Euclidean space as her running example but I’ll use the sphere here. Let $A$ be a subset of (the surface of) the sphere (in any dimension). Its *(symmetric decreasing) rearrangement* is the set $A^*$ which is a spherical cap (centered at the north pole, say) with the same volume as $A$. More generally, for a function $f : S^n \to \mathbb{R}$ you can define its rearrangement $f^* : S^n \to \mathbb{R}$ to be the function which: a) is constant on all circles centered at the north pole; b) decreases as you move away from the north pole; c) is equimeasurable to $f$; i.e., for each $t$, the volume of points where $f > t$ is the same as the volume of points where $f^* > t$.

There are many inequalities which say that certain functionals of $f$ (or of multiple functions) increase or decrease when you change $f$ to its rearrangement $f^*$. For example, it’s possible to show that the perimeter of $A$ is always at least the perimeter of $A^*$ — this gives the Isoperimetric Inequality on the sphere. More generally, you can show that for any $0 < \rho < 1$, the “spherical $\rho$-noise stability” of $A$ is always at most the spherical $\rho$-noise stability of $A^*$. This is like “Borell’s Isoperimetric Theorem”, except for the sphere. In fact, it’s strictly better because you can very straightforwardly deduce Borell’s Theorem (in Gaussian space) by projecting the sphere down to a lower dimension. (See the appendix of this paper.)

How do you show such rearrangement inequalities? All you have to do is show that a *much simpler* symmetrization operation, called *polarization*, appropriately increases or decreases your functional. Then you can get the result for the full rearrangement straightforwardly just by “polarizing many times, in every possible way” (roughly speaking!). And what is polarization? It’s super-simple: Pick any great circle $H$ on the sphere, dividing it into the hemisphere $H^+$ containing the north pole and the hemisphere $H^-$ containing the south pole. Now “polarizing” $A$ across $H$ just means that for each point of $A$ in $H^-$, you move it to its reflection across $H$ in $H^+$ — provided that point is not already occupied by $A$. More generally, if $f : S^n \to \mathbb{R}$, its polarization across $H$ is defined by looking at each pair of mirror-image points $x^\pm \in H^\pm$ and switching $f$’s values on these two points so that $f(x^+) \geq f(x^-)$ holds.

OK, so this reduces your job to showing that any polarization improves your functional (say, increases your noise stability). How do you show that? Essentially, you just have to show that each little two-point switch only helps you. Using this, one can get such basic results as the following: For any functions $u, v$,

\[ \int \int u(x) v(y) K(\mathrm{dist}(x,y))\, dx\, dy \leq \int \int u^*(x) v^*(y) K(\mathrm{dist}(x,y))\, dx\, dy \]

whenever $K$ is a decreasing “kernel” function. (The fact that $K$ is decreasing is all you need to show that the two-point switches always work in your favor.) The above inequality holds, e.g., in Euclidean space (Riesz ’30s) and on the sphere (Baernstein-Taylor, ’70s). “Borell’s Isoperimetric Inequality on the Sphere” is automatic from it, even in the $2$-function or $2$-set version, just because $\rho$-noisy pairs of points $x, y$ are more likely to be close together than far apart (i.e., the appropriate kernel function is indeed decreasing).
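Here is a toy implementation of polarization for functions on the discrete circle $\mathbb{Z}_m$, reflecting across the axis through $0$ and $m/2$. It is a sketch for intuition (all names are mine), not the continuous spherical setting of the talk; since the kernel used below is decreasing, the two-point argument says polarization should not decrease the functional.

```python
def polarize(f):
    # Polarize f : Z_m -> R across the axis through 0 and m/2 (reflection
    # i -> m - i mod m): each pair of reflected values is swapped, if needed,
    # so that the larger value sits in the "upper" half {1, ..., ceil(m/2)-1}.
    g = list(f)
    m = len(g)
    for i in range(1, (m + 1) // 2):
        j = m - i
        if g[i] < g[j]:
            g[i], g[j] = g[j], g[i]
    return g

def kernel_functional(f, K):
    # sum over x, y of f(x) f(y) K(dist(x, y)), with circular distance on Z_m
    m = len(f)
    return sum(f[x] * f[y] * K(min((x - y) % m, (y - x) % m))
               for x in range(m) for y in range(m))
```

Polarization is a rearrangement of values (equimeasurability is immediate), and for a decreasing kernel such as $K(d) = 1/(1+d)$ the functional can only go up.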

Almut spoke about how these kinds of results can be used to show various results about Brownian motion. For example, in a work of hers with Schmuckenschlager, she shows that if $A$ is any subset of the sphere and $t \in \mathbb{R}^+$ is any time, then the probability a Brownian motion on the sphere will exit $A$ by time $t$ is at least the probability that it will exit the cap $A^*$ by time $t$. (This generalizes Borell’s similar result in Gaussian space.)

Almut also described versions of these rearrangement inequalities with more than two functions. One nice one (proved by Draghici using a reduction to functions on $\{-1,1\}^n$, and then strengthened by Almut and Hajaiej) is the following: Let $F : \mathbb{R}^n \to \mathbb{R}$ be supermodular. Then for any functions $u_1, \dots, u_n$ and any decreasing kernels $K_{ij}$ ($1 \leq i < j \leq n$) it holds that the following only increases when the $u_i$’s are replaced by their rearrangements $u_i^*$:

\[ \int \cdots \int F(u_1(x_1), \dots, u_n(x_n)) \prod_{i < j} K_{ij}(\mathrm{dist}(x_i,x_j))\, dx_1 \cdots dx_n. \]

Next up was Alexandra Kolla, who presented joint work with Aram Harrow and Leonard Schulman on a maximal inequality for spherical means on the hypercube. She began with the following motivating example: suppose a criminal on the run in $\{0,1\}^n$ would like to choose a vertex among the $2^n$ many at which to reside, and there are cops located on a certain number of these vertices. Needless to say the criminal wouldn’t want to reside on a vertex on which a cop is located; presumably he would want to be as far away from as many cops as possible. To quantify this, let us say that a vertex $v$ is safe if for every radius $r$, the fraction of the vertices at distance $r$ from $v$ on which a cop is located is small. The question is: how many safe spots are there in $\{0,1\}^n$ as a function of the density of cops?

Let $f: \{0,1\}^n \to \{0,1\}$, which we should think of as an indicator for the vertices on which a cop is located. For each $k \in [n]$, consider the $k$-th spherical means operator

\[ (S_kf)(x) := \mathop{\mathbb{E}}_{\mathrm{dist}(x,\boldsymbol{y}) = k}[f(\boldsymbol{y})],\]

or in words, the fraction of the ${n\choose k}$ many vertices $y$ at distance $k$ from $x$ on which a cop is located. As the title of their paper promises, Aram, Alexandra, and Leonard prove a maximal inequality for these spherical means operators. More precisely, if we define

\[ (M_S f)(x) := \max_{0\le k\le n} (S_kf)(x), \]

then their main theorem states that there exists a universal constant $A_H$ such that $\| M_S \|_{2\to 2} \le A_H$. The key point is that $A_H$ is independent of the dimension $n$. Equivalently, for all $n \in \mathbb{N}$ and $f: \{0,1\}^n \to \{0,1\}$ we have that

\[ \| M_Sf \|_2 \le A_H \| f \|_2. \]

Therefore, assuming the overall density of cops (i.e. $\|f \|_2^2 = \mathbb{E}[f(\boldsymbol{x})]$) is sufficiently small, there is a large fraction of vertices that are relatively safe for the criminal.
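In small dimension the operators $S_k$ and $M_S$ can be implemented by brute force (a sketch; the function names are mine):

```python
from itertools import combinations
from math import comb

def spherical_mean(f, n, k, x):
    # (S_k f)(x): average of f over the C(n, k) points at Hamming distance
    # exactly k from x, obtained by flipping every k-subset of coordinates.
    total = 0
    for flips in combinations(range(n), k):
        y = list(x)
        for i in flips:
            y[i] ^= 1
        total += f(tuple(y))
    return total / comb(n, k)

def maximal_mean(f, n, x):
    # (M_S f)(x) = max over all radii k of (S_k f)(x)
    return max(spherical_mean(f, n, k, x) for k in range(n + 1))
```

For example, with a single cop at $(1,1,1)$ in $\{0,1\}^3$, the vertex $(0,0,0)$ sees cop density $1$ at radius $3$ and density $0$ at every other radius.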

The proof proceeds via spectral approximation. Note that the distribution induced by the $S_k$ operator (i.e. uniform over all vertices at a fixed distance $k$ away from $x$) is not a product distribution, which makes it less amenable to analysis using spectral techniques. We will therefore use as a proxy for $S_k$ the $T_\rho$ noise operator with rate $\rho = 1-(2k)/n$, under which each coordinate is flipped with probability $k/n$. Intuitively these two operators should be somewhat similar, since the $S_k$ operator corresponds to flipping a random subset of *exactly* $k$ coordinates of $x$, whereas the $T_\rho$ operator corresponds to flipping $k$ coordinates *in expectation*. And indeed this intuition can be made precise by comparing the eigenstructures of $S_k$ and $T_\rho$: the spectrum of $T_\rho$ can be analyzed via Fourier analysis, and that of $S_k$ is determined by the Krawtchouk polynomials.

Alexandra concluded by mentioning recent follow-up work with Greenbalt, Krause, and Schulman establishing similar maximal inequalities for $\mathbb{Z}_{m+1}^n$, the Cartesian product of $(m+1)$-cliques, for operators that average over a ball of radius $k$ instead of a sphere of radius $k$. (The analogous inequalities for the sphere appear to be harder.) The eigenstructure of these operators is again given by the Krawtchouk polynomials, though in this case the analysis is significantly more challenging.

For more details, see this blog post or watch Alexandra’s talk at the Simons Institute.

Our fourth speaker was Joe Neeman, who discussed some very nice recent work of his on noise stability. The *Gaussian noise stability* of a set $A\subseteq \mathbb{R}^n$ is the quantity $\Pr[\mathbf{X}\in A, \mathbf{Y}\in A]$, where $\mathbf{X}$ and $\mathbf{Y}$ are a pair of $\rho$-correlated Gaussians. While there are numerous interesting questions concerning Gaussian noise stability, Joe began his talk by addressing the basic one of how large this quantity can be relative to the Gaussian measure $\gamma_n(A)$ of $A$. A fundamental result in this direction is Borell’s Isoperimetric Theorem, which states that halfspaces are the most noise stable. More precisely, if $B = \{x : x\cdot a \le b\}$ is a halfspace such that $\gamma_n(B) = \gamma_n(A)$ (by rotational invariance of the Gaussian all such halfspaces are essentially one and the same), then

\[ \Pr[\mathbf{X}\in A, \mathbf{Y}\in A] \le \Pr[\mathbf{X}\in B, \mathbf{Y}\in B]. \]

The discrete analogue of Borell’s theorem for the hypercube (of which Borell’s theorem may be viewed as the “Gaussian special case”) is the Majority Is Stablest theorem of our three organizers Elchanan, Ryan, and Krzysztof: if the origin-centered Hamming ball $B = \big\{\sum x_i \le b\big\}$ satisfies $|B| = |A|$ then

\[ \Pr[\mathbf{X}\in A, \mathbf{Y}\in A] \le \Pr[\mathbf{X}\in B, \mathbf{Y}\in B] + \tau_A, \]

where now $\mathbf{X}, \mathbf{Y} \in \{-1,1\}^n$ are $\rho$-correlated bitstrings (i.e. $\mathbf{X}$ and $\mathbf{Y}$ are each uniform and $\mathbb{E}[\mathbf{X}_i\mathbf{Y}_j] = \delta_{ij}\cdot \rho$), and the error term $\tau_A$ depends on the maximum influence of $\mathbb{1}_A$.

To state these theorems in a slightly different language, we define

\[ J(a,b;\rho) := \Pr[\mathbf{X}\in B_1, \mathbf{Y}\in B_2], \]

where $B_1$ and $B_2$ are parallel halfspaces with $\gamma_n(B_1) = a$ and $\gamma_n(B_2) = b$, and $\mathbf{X},\mathbf{Y}$ form a pair of $\rho$-correlated Gaussians. (For brevity we write $J(a,b)$ when the parameter $\rho$ is clear from context.) In this notation, (the two-function generalization of) Borell’s theorem may be stated as

\[ \Pr[\mathbf{X}\in A_1, \mathbf{Y}\in A_2] \le J(\gamma_n(A_1), \gamma_n(A_2))\quad \text{for all $A_1,A_2 \subseteq \mathbb{R}^n$}, \]

and likewise, the Mossel-O’Donnell-Oleszkiewicz theorem as

\[ \Pr[\mathbf{X} \in A_1, \mathbf{Y} \in A_2] \le J(|A_1|/2^n, |A_2|/2^n) + \tau. \]
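By rotational invariance, computing $J$ is really a one-dimensional problem: $J(a,b;\rho)=\Pr\big[\boldsymbol{X}\le \Phi^{-1}(a),\ \rho\boldsymbol{X}+\sqrt{1-\rho^2}\,\boldsymbol{Z}\le \Phi^{-1}(b)\big]$ for independent standard Gaussians $\boldsymbol{X},\boldsymbol{Z}$. A quick Monte Carlo sketch (the helper below is mine, not from the talk):

```python
import random
from statistics import NormalDist

def J(a, b, rho, trials=200_000, seed=0):
    # Monte Carlo estimate of Pr[X in B1, Y in B2] for parallel halfspaces of
    # Gaussian volumes a and b, where (X, Y) are rho-correlated Gaussians;
    # by rotational invariance this reduces to one dimension.
    rng = random.Random(seed)
    nd = NormalDist()
    ta, tb = nd.inv_cdf(a), nd.inv_cdf(b)
    c = (1 - rho * rho) ** 0.5
    hits = 0
    for _ in range(trials):
        x = rng.gauss(0, 1)
        y = rho * x + c * rng.gauss(0, 1)  # rho-correlated with x
        hits += int(x <= ta and y <= tb)
    return hits / trials
```

Sanity checks: at $\rho = 0$ the events are independent, so $J(\tfrac12,\tfrac12;0) = \tfrac14$, while at $\rho = 1$ the two coordinates coincide and $J(\tfrac12,\tfrac12;1) = \tfrac12$.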

It may seem slightly strange that the $J(\cdot,\cdot)$ function, defined in terms of the Gaussian distribution, appears here in the case when $\mathbf{X},\mathbf{Y}\in \{-1,1\}^n$; however, note that this is in fact okay because of the central limit theorem. A recent result of Joe and Elchanan’s states the following functional inequality: if $f,g:\mathbb{R}^n\to [0,1]$, and $K : [0,1]^2\to\mathbb{R}$ is a function satisfying the second derivative condition

\[ \left(\begin{array}{cc} K_{xx} & \rho K_{xy} \\ \rho K_{xy} & K_{yy} \end{array}\right) \le 0,\]

then $\mathbb{E}[K(f(\mathbf{X}),g(\mathbf{Y}))] \le K(\mathbb{E}f,\mathbb{E}g)$. To see that this implies Borell’s theorem, we simply check that the derivatives of $J$ do in fact satisfy the condition above, and that $J(0,0)=J(0,1)=J(1,0)=0$ and $J(1,1)=1$, and therefore

\[ \mathbb{E}[J(\mathbb{1}_{A_1}(\mathbf{X}),\mathbb{1}_{A_2}(\mathbf{Y}))] = \Pr[\mathbf{X}\in A_1,\mathbf{Y}\in A_2] \le J(\gamma_n(A_1),\gamma_n(A_2)). \]

A key advantage of Joe and Elchanan’s new proof of Borell’s inequality is that it allows them to establish both *uniqueness* and *stability*: halfspaces are uniquely the most noise stable sets, and furthermore, sets that are almost optimally noise stable are close to some halfspace. In exciting follow-up work with Anindya De, Joe and Elchanan extended this technique to give a new proof of the Majority Is Stablest theorem by induction on dimension. Yet again this approach has a key advantage: in this case this new “discrete proof” can be shown to be captured by low-degree SOS proofs (see Boaz’s talk yesterday), and as a consequence, rules out the Khot-Vishnoi MaxCut instance as a candidate integrality gap instance for the SOS/Lasserre hierarchy.

Joe concluded with a couple of intriguing directions and open problems. First, instead of just two correlated random variables $\mathbf{X}$ and $\mathbf{Y}$, we may consider $k$ of them $\mathbf{X}^1,\ldots,\mathbf{X}^k$. In this case the analogous question would be that of determining the sets $A_1,\ldots,A_k\subseteq \mathbb{R}^n$ that maximize the probability

\[ \Pr[\mathbf{X}^1 \in A_1,\ldots,\mathbf{X}^k \in A_k]. \]

Second, let $\mathbf{X}$ and $\mathbf{Y}$ be $\rho$-correlated Gaussians as in Borell’s isoperimetric theorem. Can we determine the *three* sets $A_1,A_2$, and $A_3$ that partition $\mathbb{R}^n$ and maximize the probability

\[ \Pr[\exists\, i \colon \mathbf{X}\in A_i, \mathbf{Y}\in A_i],\]

subject to volume constraints $\gamma_n(A_i) = a_i$? Joe and Elchanan have made progress on this question in very recent work with Steven Heilman.

For more details check out these videos of Joe and Elchanan speaking at the Simons Institute.

The final scheduled speaker of the workshop was Yuval Peres, who spoke about joint work with Yael Dekel and Ori Gurel-Gurevich on the planted clique problem, an important unresolved problem in average-case complexity. (The worst-case complexity of determining the size of the maximum clique in an *arbitrary* graph is well-known to be NP-complete, and assuming $\mathsf{P}\ne \mathsf{NP}$, this quantity is even hard to approximate.) Suppose we are given a graph $G \leftarrow \mathcal{G}(n,1/2)$ and are asked to find a large clique in $G$; a straightforward probabilistic argument shows that the largest clique has size $(2+o(1))\log n$ with high probability. A simple greedy algorithm which runs in polynomial time finds a clique of size $(1+o(1))\log n$, and an exhaustive search running in time $n^{O(\log n)}$ will successfully find the largest clique. Despite our best efforts we have not been able to get the best of both these worlds, finding a clique of size say $1.01\log n$ in polynomial time.

Given this state of affairs Jerrum and Kučera introduced the planted clique problem, where the algorithm designer’s job is made seemingly easier with the promise that a clique of size $k = n^{\epsilon}$ for some constant $\epsilon > 0$ is planted somewhere in $G \leftarrow \mathcal{G}(n,1/2)$ — now can we find it efficiently? (Note that the size of this planted clique is *huge* compared to the one of size $2\log n$ that “shows up naturally”.) Kučera noted that when $k \ge c_0\sqrt{n\log n}$ the clique vertices are the ones with the highest degrees w.h.p., and so a simple algorithm that makes a pass over the vertex set recovers it. Alon, Krivelevich and Sudakov subsequently improved on this with a spectral algorithm that works when $k\ge c\sqrt{n}$. This remains the best we can do, and designing algorithms that find (or even simply detect the existence of) planted cliques of size $k = o(\sqrt{n})$ is an outstanding open problem.
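Kučera's degree observation is easy to see empirically. The sketch below (the parameter choices are mine, taken well above the threshold) plants a clique and compares average degrees:

```python
import random

def planted_clique_graph(n, k, rng):
    # G(n, 1/2) with a clique planted on vertices {0, ..., k-1}
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if (u < k and v < k) or rng.random() < 0.5:
                adj[u].add(v)
                adj[v].add(u)
    return adj

n, k = 200, 80
adj = planted_clique_graph(n, k, random.Random(42))
deg = [len(nbrs) for nbrs in adj]
avg_clique = sum(deg[:k]) / k      # roughly (k - 1) + (n - k)/2
avg_rest = sum(deg[k:]) / (n - k)  # roughly (n - 1)/2
```

A clique vertex has expected degree about $(k-1) + (n-k)/2$ versus $(n-1)/2$ for the rest, and once $k \gg \sqrt{n\log n}$ the degree fluctuations cannot hide that gap.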

As an alternative to the spectral algorithm of Alon et al., Feige and Ron proposed a simple and elegant algorithm that also works for $k \ge c\sqrt{n}$: in a sentence, it proceeds by iteratively removing the vertex of lowest degree (i.e. weeding out the losers instead of keeping the winners as in Kučera’s algorithm). The one downside is that their analysis only yielded an error probability of $\Theta(1)$, where this constant depends on $c$; ideally we would like the error probability to tend to zero as $n \to \infty$. Yuval and his coauthors remedied this with an algorithm that has error probability $\exp(-n^{\epsilon})$ for some $\epsilon > 0$. Their algorithm is equally simple: first pick a random subset $S$ of vertices by independently including each one with probability $1/4$, and let $V^*$ be the vertices in $V \setminus S$ of degree at least $n/8$ into $S$. It can be shown that the density of the hidden clique within $V^*$ has increased significantly, and so by iterating this pruning procedure sufficiently many times we can finish by running Kučera’s simple “take the vertices of highest degree” algorithm.


Suppose we are given a 3SAT formula $\varphi$, and are promised the existence of an assignment $x^*$ such that $x^*$ satisfies > 99% of the clauses in $\varphi$. Our goal is to efficiently find an assignment $x$ that satisfies as many clauses as possible. Certainly a uniformly random assignment satisfies a 7/8-fraction of the clauses — can we do better? Johan Håstad’s seminal paper proved that this entirely trivial algorithm (one that doesn’t even look at the input formula $\varphi$!) is in fact the best we can do assuming $\mathsf{P} \ne \mathsf{NP}$. Such CSPs are called *approximation resistant*; we write $P:\{-1,1\}^k\to\{-1,1\}$ for the associated predicate (e.g. $P(x_1,x_2,x_3) = x_1 \vee x_2 \vee x_3$ in the case of Max-3SAT).

The work of Austrin and Mossel gave a nice and clean *sufficient* condition for a predicate to be approximation resistant (also under the UGC): $P:\{-1,1\}^k \to\{-1,1\}$ is approximation resistant if there exists a balanced pairwise independent distribution on $\{-1,1\}^k$ that is supported on $P^{-1}(1)$. In this work Subhash and his coauthors give a complete characterization of approximation resistant predicates. Given a distribution $\mu$ on $\{-1,1\}^k$, define $\xi(\mu) \in \mathbb{R}^{k+{k\choose 2}}$ to be the vector of first and second moments:

\[ \xi(\mu)_i = \mathop{\mathbb{E}}_{\boldsymbol{x}\sim \mu}[\boldsymbol{x}_i],\quad \xi(\mu)_{i,j} =\mathop{\mathbb{E}}_{\boldsymbol{x}\sim \mu}[\boldsymbol{x}_i\boldsymbol{x}_{j}],\]

and let $\mathcal{C}(P)$ be the convex polytope

\[ \mathcal{C}(P) := \big\{\xi(\mu)\colon \text{$\mu$ is supported on $P^{-1}(1)$}\big\}. \]

In this notation, Austrin and Mossel’s result can be equivalently stated as saying that $P$ is approximation resistant if $\mathbf{0}\in \mathcal{C}(P)$. The necessary and sufficient condition that Subhash and his coauthors give is stated in terms of the existence of a probability measure $\Lambda$ on $\mathcal{C}(P)$ with certain symmetry properties.
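As a concrete instance of the Austrin-Mossel criterion: for the Max-3LIN predicate $P(x) = [x_1x_2x_3 = 1]$, the uniform distribution $\mu$ on $P^{-1}(1)$ is balanced and pairwise independent, so $\xi(\mu) = \mathbf{0} \in \mathcal{C}(P)$. A quick check:

```python
from itertools import combinations, product

def xi(points):
    # xi(mu) for mu uniform on `points` (a subset of {-1,1}^k):
    # the k first moments E[x_i] followed by the C(k,2) second moments E[x_i x_j]
    k = len(points[0])
    firsts = [sum(x[i] for x in points) / len(points) for i in range(k)]
    seconds = [sum(x[i] * x[j] for x in points) / len(points)
               for i, j in combinations(range(k), 2)]
    return firsts + seconds

# satisfying assignments of the Max-3LIN predicate x1 * x2 * x3 = 1
sat = [x for x in product([-1, 1], repeat=3) if x[0] * x[1] * x[2] == 1]
moments = xi(sat)
```

All six moments vanish, recovering the (well-known) fact that Max-3LIN is approximation resistant under the Austrin-Mossel condition.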

Subhash concluded with a couple of open problems, the first of which is to determine if their characterization is decidable. The second is to find a linear threshold predicate (i.e. a predicate $P$ that can be expressed as $P(x) = \mathrm{sign}(w\cdot x - \theta)$ for some $w\in \mathbb{R}^k$ and $\theta\in \mathbb{R}$) that is approximation resistant, or to prove that all such predicates are approximable (see this paper for more on this problem). Finally, Subhash mentioned that the picture for the stronger notion of *approximation resistance on satisfiable instances* appears to be completely different. These are CSPs for which we cannot beat a random assignment even when promised the existence of an assignment satisfying *all* constraints; Max-3SAT is one such CSP whereas Max-3LIN, while approximation resistant, is not approximation resistant on satisfiable instances (since we can simply run Gaussian elimination).

For more details check out Subhash’s TCS+ talk.

The next speaker was Boaz Barak, who spoke about questions and results surrounding the sums-of-squares (SOS) hierarchy, UGC, hypercontractivity, and sparse vectors. For many combinatorial optimization problems where a simple LP/SDP yields a non-trivial performance guarantee (e.g. the Goemans-Williamson approximation algorithm for MaxCut), we would naturally like to know whether these algorithms can be improved by more sophisticated relaxations, or whether they represent the best we can hope to do. In many of these cases the gap between the current best algorithm’s performance and the known (NP-)hardness result is closed by the UGC (and indeed this is the case for MaxCut). Boaz points out that the SOS hierarchy represents our best candidate for further improving current algorithms, and as a corollary also our best candidate for refuting the UGC; evidently understanding its power and limitations is of significant importance.

Let $P : \mathbb{R}^n\to\mathbb{R}$, and suppose we would like to certify that $P\le \alpha$. An SOS proof of this claim is an SOS polynomial $S$ such that $\alpha - P = S$ (where an SOS polynomial is, well, a sum of squared polynomials), and an SOS proof of degree $d$ is a pair of degree-$d$ SOS polynomials $S$ and $S’$ such that $(1+S’)(\alpha-P) = S$. The key fact about SOS proofs, and the main reason behind its importance in combinatorial optimization, is that degree-$d$ SOS proofs can be found “automatically” in time $n^{O(d)}$ via semidefinite programming; this is a theorem attributed to the independent work of Shor, Parrilo, Nesterov, and Lasserre. The following example illustrates the connection between SOS proofs and optimization (and incidentally, is also among our few examples establishing limitations on the power of SOS proofs). Let us call a polynomial $P:\{-1,1\}^n\to\mathbb{R}$ a *3LIN polynomial* if

\[ P(x) = \frac1{m} \sum_{i,j,k} A_{ijk} x_i x_j x_k, \]

where $A_{ijk} \in \{0,-1,1\}$ and $m = \sum_{i,j,k} |A_{ijk}|$. Note that the range of $P$ is bounded in $[-1,1]$, and since $\mathbb{E}[P] = 0$ there must exist an $x\in \{-1,1\}^n$ such that $P(x) \ge 0$. (It is straightforward to see that every 3LIN polynomial encodes a 3LIN instance, hence its name.) A result of Grigoriev (and independently Schoenebeck) says the following: for every $\epsilon > 0$, there exists a 3LIN polynomial $P$ such that $P \le \epsilon$, and yet for all $d < (\epsilon^2 n)/10^6$ there is no degree-$d$ SOS proof certifying that $P < 1$. Note that $P < 1$ certainly has a degree-$n$ SOS proof, since we are working over $\{-1,1\}^n$ and hence may assume that $P$ is multilinear without loss of generality. In words, the Grigoriev-Schoenebeck theorem states that there is a 3LIN instance where every assignment satisfies < 51% of constraints, and yet no degree $d < n/10^{10}$ proof can certify the absence of an assignment satisfying > 99% of constraints. This is in marked contrast with the situation for MaxCut, where it is known that the Goemans-Williamson approximation algorithm is captured by degree-$2$ SOS proofs.
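The basic facts about 3LIN polynomials used above ($\mathbb{E}[P] = 0$, range in $[-1,1]$, hence some $x$ with $P(x)\ge 0$) can be verified by brute force on small random instances. A sketch (the random sampling is mine and has nothing to do with Grigoriev's hard instances):

```python
from itertools import combinations, product
import random

def random_3lin_poly(n, rng):
    # P(x) = (1/m) * sum over distinct triples {i,j,k} of A_ijk x_i x_j x_k,
    # with each coefficient A_ijk drawn from {-1, 0, 1} and m nonzero terms.
    terms = [(t, rng.choice([-1, 0, 1])) for t in combinations(range(n), 3)]
    terms = [(t, a) for t, a in terms if a != 0]
    m = len(terms)
    def P(x):
        return sum(a * x[i] * x[j] * x[k] for (i, j, k), a in terms) / m
    return P

P = random_3lin_poly(6, random.Random(7))
vals = [P(x) for x in product([-1, 1], repeat=6)]
```

Each monomial $x_ix_jx_k$ with distinct indices averages to zero over the cube, so the values of $P$ sum to zero and the maximum is nonnegative.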

Boaz concluded with the following problem related to the hypercontractive inequality: Given unit vectors $v_1,\ldots,v_m\in\mathbb{R}^n$, we define

\[ P(x) = \frac{n^2}{m} \sum \langle v_i,x\rangle^4,\]

and say that $P$ is bounded if $P(x) \le C \| x \|^4$ and $C$ is a constant. Is there always a low-degree SOS proof that $P$ is bounded? This has implications for algorithms for finding sparse vectors in subspaces, which in turn is related to a variety of problems including small-set expansion, planted clique, sparse coding, and restricted isometry.

For more details see Boaz’s blog post, or watch his TCS+ talk. Also check out the introduction to Ryan’s recent paper with Yuan Zhou for a nice overview of the SOS hierarchy and its history.

Next up was Julia Wolf, who spoke about going beyond the Gowers norm. Like Tom’s talk yesterday, Julia began with the statement of Szemerédi’s theorem: Let $A\subseteq \{1,\ldots,N\}$ be a subset containing no $k$-APs. Then $|A| = o_k(N)$. Julia pointed out that determining the precise asymptotics of the “$o_k(N)$” term is an important open problem, since sufficiently strong decay bounds along with the prime number theorem would directly yield the Green-Tao theorem as a consequence; unfortunately such strong quantitative bounds appear to be out of reach of our current techniques. (Of course, as we saw in Tom’s talk yesterday Szemerédi’s theorem already plays a crucial role in the current proof of the Green-Tao theorem.) This was the theme of Julia’s talk — strong quantitative bounds on the relationship between density and the existence of APs — but in the finite field model $\mathbb{F}_p^n$ instead of $\{1,\ldots,N\}$.

Let $f:\mathbb{F}_p^n\to\mathbb{C}$ where $p > 2$ is a fixed prime, and let $k\ge 2$ be an integer. The *Gowers $k$-norm *is defined to be

\[ \| f \|_{U^k}^{2^k} := \mathop{\mathbb{E}}_{\boldsymbol{x},\boldsymbol{h}_1,\ldots,\boldsymbol{h}_k\in\mathbb{F}_p^n} \big[\Delta_{\boldsymbol{h}_1} \cdots \Delta_{\boldsymbol{h}_k} f(\boldsymbol{x})\big],\]

where $(\Delta_h f)(x) := f(x)\overline{f(x+h)}$ is the discrete derivative of $f$ with respect to $h$. (It is not obvious that these are norms, but they are.) The connection between these norms and APs in $\mathbb{F}_p^n$ goes via the following key fact due to Gowers: Let $f:\mathbb{F}_p^n\to [-1,1]$. Then

\[ \Big| \mathop{\mathbb{E}}_{\boldsymbol{x},\boldsymbol{d}}\big[f(\boldsymbol{x})f(\boldsymbol{x}+\boldsymbol{d}) \cdots f(\boldsymbol{x}+(k-1)\boldsymbol{d})\big]\Big| \le \| f \|_{U^{k-1}}. \]

Note that when $f:\mathbb{F}_p^n\to\{0,1\}$, which we may view as the indicator of a subset $A_f$, this expectation counts the density of $k$-APs in $A_f$. More generally, given a system of linear forms $\mathcal{L} = (L_1,\ldots,L_m)$ we may define

\[ T_{\mathcal{L}}(f_1,\ldots,f_m) = \mathop{\mathbb{E}}_{\boldsymbol{x}_1,\ldots,\boldsymbol{x}_d}\Big[\prod_{i=1}^m f_i(L_i(\boldsymbol{x}_1,\ldots,\boldsymbol{x}_d))\Big],\]

and in this notation, Gowers’s key fact may be stated as $|T_{k-\mathrm{AP}}(f,\ldots,f)| \le \| f \|_{U^{k-1}}$. Gowers’s key fact tells us that for any subset $A\subseteq\mathbb{F}_p^n$, if $\| \mathbb{1}_A-\alpha \|_{U^{k-1}}$ is small (where as always $\alpha = |A|/p^n$), then the density of $k$-APs in $A$ is close to $\alpha^k$, the value one would expect for a random set of density $\alpha$. But what if the Gowers norm is large? In this case we need an inverse theorem, which allows us to conclude that $f$ would then be correlated with a degree-$(k-2)$ phase polynomial. These inverse theorems are proved using ergodic theory and transference principles, and their proofs do not establish explicit bounds. Consequently, the quantitative bounds we currently have for $k=3$ are far from tight, and we have no quantitative bounds at all for $k > 3$.
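In tiny cases the $U^2$ norm can be computed straight from the definition; for instance both the constant function $1$ and any character of $\mathbb{F}_p^n$ have $U^2$ norm exactly $1$. A brute-force sketch:

```python
import itertools, cmath

def gowers_u2(f, p, n):
    # ||f||_{U^2}^4 = E_{x,h1,h2}[ f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2) ]
    pts = list(itertools.product(range(p), repeat=n))
    add = lambda a, b: tuple((ai + bi) % p for ai, bi in zip(a, b))
    total = 0
    for x, h1, h2 in itertools.product(pts, pts, pts):
        total += (f(x) * f(add(x, h1)).conjugate()
                  * f(add(x, h2)).conjugate() * f(add(add(x, h1), h2)))
    return abs(total / len(pts) ** 3) ** 0.25
```

For a character $f(x) = \omega^{x\cdot t}$ every term in the average telescopes to $1$, matching the fact that characters are exactly the "maximally structured" functions for $U^2$.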

Julia concluded with an open problem. A result of hers and Gowers’s shows that Gowers’s key fact is actually not the best possible for most linear systems $\mathcal{L}$; they constructed a linear system $\mathcal{L}$ of size 6 for which

\[ |T_\mathcal{L}(f)| \le C(\|f \|_{U^2}), \quad \text{where $C(t) \to 0$ as $t\to 0$},\]

whereas Gowers’s key fact only implies a bound of $|T_\mathcal{L}(f)| \le \| f\|_{U^3}$. This was proved *using* Gowers’s key fact; the open problem is to find a proof that doesn’t.

Next up was Stanisław Szarek, whose talk was around a series of works, 1, 2, 3; mainly the second one, entitled “Entanglement thresholds for random induced states”, joint with Guillaume Aubrun and Deping Ye. The interest here is in studying how much entanglement there is in a “typical” bipartite quantum state. However, Stanisław considered a more sophisticated (realistic?) notion of a typical state than just choosing a random complex unit vector.

Let’s forget bipartite states for a second to describe this alternate notion of a random quantum state. Actually, we should be careful to distinguish between pure quantum states (which are identified with unit vectors in some $\mathbb{C}^d$) and “mixed” quantum states (which are identified with density matrices in $\mathbb{C}^{d \times d}$ — i.e., trace-1 PSD matrices — and can be thought of as probability distributions over pure quantum states). To define a “typical” pure state, it seems the only natural thing to do is take a uniformly random unit complex vector. But it’s not completely clear what the most “natural” way to define a “typical” mixed state is. A work of Życzkowski and Sommers introduced the following notion: To get a “typical” $d$-dimensional mixed state, first suppose that there is also an $s$-dimensional “environment”. Next, take a random pure state in $\mathbb{C}^d \otimes \mathbb{C}^{s}$, and then get a mixed state on the first $d$-dimensional part by “tracing out” the second $s$-dimensional part. This yields a different distribution on $d$-dimensional mixed states for each choice of $s$. This distribution is evidently quite a bit more complicated than simply taking a uniformly random pure state.
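The Życzkowski-Sommers construction is short to write down: writing the pure state's amplitudes as a $d\times s$ matrix $M$, tracing out the environment gives the density matrix $MM^\dagger$. A plain-Python sketch (function name mine):

```python
import random

def random_induced_state(d, s, rng):
    # Sample a random pure state in C^d (x) C^s (normalized complex Gaussian
    # vector), then trace out the s-dimensional environment: arranging the
    # amplitudes as a d x s matrix M, the induced density matrix is M M^dagger.
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d * s)]
    norm = sum(abs(z) ** 2 for z in v) ** 0.5
    v = [z / norm for z in v]
    M = [v[i * s:(i + 1) * s] for i in range(d)]
    return [[sum(M[i][t] * M[j][t].conjugate() for t in range(s))
             for j in range(d)] for i in range(d)]
```

The output is a trace-one PSD (Hermitian) matrix, i.e. a bona fide mixed state, and its distribution depends on the environment dimension $s$.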

Back to Stanisław’s talk. Suppose we generate a two-“qudit” bipartite mixed state in this way; i.e., we take a random pure state in $\mathbb{C}^d \otimes \mathbb{C}^d \otimes \mathbb{C}^s$ and trace out the third part. Will the resulting mixed bipartite state actually be entangled? This depends on the environment’s dimension, and Stanisław’s work shows a sharp transition around some $s_0 = \widetilde{\Theta}(d^3)$: if $s \leq (1-\epsilon)s_0$ then the resulting mixed state is entangled with very high probability; if $s \geq (1+\epsilon)s_0$ then the resulting mixed state is separable with very high probability.

If you prefer to hear about qubits, a consequence is the following. Suppose we take a random pure state in an $N$-qubit system. Trace out all but $2k$ of the qubits, leaving a mixed state on $2k$ qubits, thought of as a bipartite system with $k$ qubits each. Then if $k \gg N/5$ the resulting bipartite system will be entangled with high probability, and if $k \ll N/5$ it will be separable with high probability.

Unfortunately, Stanisław didn’t have time to get into any proof details; I understand they’re quite sophisticated, introducing new ideas from random matrix theory and asymptotic geometric analysis.

The last speaker of the day was Christophe Garban. Christophe gave a very interactive talk on recent work (both his and that of others) concerning the speed at which Glauber dynamics mixes in the critical planar Ising model. Let’s unpack some of that jargon. (For more details you might look at some of Christophe’s writing, which has beautiful diagrams and beautiful exposition.) The planar Ising model is a physically natural *nonuniform* probability distribution on strings $\sigma \in \{-1,+1\}^{n^2}$. Think of an $n \times n$ square lattice, with each site $x \in [n] \times [n]$ having a “magnetic spin” $\sigma_x$, either $\pm 1$. Magnetic spins like to be the same as their neighbors, so one measure of how happy an overall configuration $\sigma$ is would be $E(\sigma) = \sum_{x \sim y} \sigma_x \sigma_y$, where $x \sim y$ denotes that $x$ and $y$ are nearest neighbors in the $[n] \times [n]$ grid. The *Ising model with parameter $\beta \geq 0$* corresponds to choosing $\boldsymbol{\sigma} \in \{-1,+1\}^{n^2}$ at random by stipulating that ${\bf Pr}[\boldsymbol{\sigma} = \sigma]$ be proportional to $\exp(\beta E(\sigma))$. Note that if $\beta = 0$ we just get the uniform distribution. On the other hand, as $\beta \to \infty$ the configurations $\sigma$ of maximal $E(\sigma)$ — i.e., the two constant strings — get more and more important, with the $\beta = \infty$ case corresponding simply to the uniform distribution over $\{(-1, -1, \dots, -1), (+1, +1, \dots, +1)\} \subset \{-1,+1\}^{n^2}$. Physically, $\beta$ corresponds to the reciprocal of “temperature”, so this last distribution corresponds to an ice-cold (“frozen”) iron square and the uniform distribution ($\beta = 0$) corresponds to a super-hot iron square.

What about $\beta$’s in between? Actually, there’s a super-special value of $\beta$ called the *critical* $\beta_c$ (known to equal $\frac12 \ln(1+\sqrt{2})$) where something particularly interesting happens. If you take a draw $\boldsymbol{\sigma}$ from the Ising model with parameter $\beta$, you can ask about the correlation ${\bf E}[\boldsymbol{\sigma}_x \boldsymbol{\sigma}_y]$ between two sites; more precisely, how it decays as a function of the lattice-distance between $x$ and $y$. If $\beta$ is close to $0$ then the decay is exponential; if $\beta$ is close to $\infty$ then there is hardly any decay from $1$ at all. The critical $\beta_c$ is characterized by being the unique value such that the decay is *polynomial*. In fact it’s known that with parameter $\beta_c$ we have ${\bf E}[\boldsymbol{\sigma}_x \boldsymbol{\sigma}_y] = \Theta(\|x-y\|^{-1/4})$.

So in some sense this is the most “interesting” value of $\beta$, and it’s the one Christophe focused on in his talk. Now, once you have an interesting probability distribution on $\{-1,1\}^{n^2}$ you want an interesting Markov chain having it as the stationary distribution. (In the $\beta = 0$ case — i.e., uniform-distribution — this is just the standard random walk on the hypercube.) The natural one in the Ising model is called Glauber dynamics. As in the “usual case”, this Markov chain works by first choosing a random site $\boldsymbol{x} \in [n] \times [n]$ to potentially update. However, the probability of *actually* updating the spin $\boldsymbol{\sigma}_{\boldsymbol{x}}$ depends on the sum of the spins of its four neighbors (and on the parameter $\beta$), according to a simple formula.
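To make the dynamics concrete, here is a minimal Python sketch of one update. Christophe left the “simple formula” unspecified; the heat-bath rule used below (resample the chosen spin from its conditional distribution given the four neighbors) is the standard choice, and the free boundary conditions and grid size are my own simplifications:

```python
import math, random

BETA_C = 0.5 * math.log(1 + math.sqrt(2))   # critical beta, ~0.4407

def glauber_step(sigma, n, beta):
    """One heat-bath update at a uniformly random site (free boundary).

    With S = sum of the (up to four) neighboring spins, the new spin is +1
    with probability exp(beta*S) / (exp(beta*S) + exp(-beta*S))."""
    x, y = random.randrange(n), random.randrange(n)
    s = sum(sigma[(u, v)]
            for (u, v) in [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
            if 0 <= u < n and 0 <= v < n)
    p_plus = math.exp(beta * s) / (math.exp(beta * s) + math.exp(-beta * s))
    sigma[(x, y)] = 1 if random.random() < p_plus else -1

n = 16
sigma = {(i, j): random.choice([-1, 1]) for i in range(n) for j in range(n)}
for _ in range(10000):
    glauber_step(sigma, n, BETA_C)
magnetization = sum(sigma.values())   # the physicists' f = sum_x sigma_x
```

One can check that this chain is reversible with respect to the Ising distribution above, so it indeed has the right stationary measure.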

The natural question now is: How long does it take for this Markov chain to mix to its stationary distribution, the $\beta_c$-Ising model? Unfortunately, no one knows! At the very least, a notable breakthrough of Lubetzky and Sly from 2012 (see also this popular account) showed that the mixing time is $\mathrm{poly}(n)$. There is a great deal of physical interest in determining the actual exponent in the polynomial, however. More attention is paid to the slightly simpler question of showing that the “second eigenvalue” of the Markov chain’s Laplacian is at least $1/n^z$ for some positive $z$. (Lubetzky and Sly’s work establishes *some* finite upper bound on $z$.)

Christophe mainly discussed his work in progress on the *other* side of the problem, trying to provide lower bounds on $z$. Some (questionable?) physics simulations suggest that $z$ is some modestly small number like $2.17$. To show lower bounds, what you precisely need to do is find “starting configurations” — or more generally, functions $f : \{-1,+1\}^{n^2} \to \mathbb{R}$ — with low “conductance” under the critical Ising model; more precisely, which have $\mathscr{E}[f,f]/{\bf Var}[f]$ small, where $\mathscr{E}[f,f]$ is the Dirichlet form, akin to “total influence” in the uniform-distribution case. Christophe showed us that it’s not too hard to calculate what you get for such classics as the Dictator, Majority $= \mathrm{sgn}(\sum_x \sigma_x)$, and even simply $f = \sum_x \sigma_x$ (called “magnetization” by the physicists). The last of these actually gives the best known lower bound on $z$, namely $z \geq 7/4$. Christophe has been working valiantly to try to come up with other examples which beat $7/4$ (and several audience members proposed some suggestions). His most interesting result is that the indicator function of left-right percolation on the square — which is a horribly noise-sensitive example in the uniform-distribution case — is actually quite a good (“low sensitivity”) example for the critical Ising model. It actually shows $z \geq 5/8$, which is the best lower bound known coming from a *boolean-valued* $f$.

As usual, we close out with some pictures.

Tom started his talk with the celebrated theorem of Green and Tao establishing the existence of arbitrarily long arithmetic progressions (APs) in the primes. At the heart of Green and Tao’s proof is Szemerédi’s theorem, which states the following: Let $A \subseteq \{1,\ldots,N\}$ be a subset of density $\alpha > 0$. Then either $A$ contains a $k$-term AP or $\alpha = o_k(1)$ — in words, every subset of the integers of sufficiently large density contains a long AP. (Pinning down the exact rate at which the term $o_k(1)$ tends to zero is an important open problem.) We may consider analogues of the Green-Tao and Szemerédi theorems in the Boolean hypercube $\{-1,1\}^n$, or rather $\mathbb{F}_2^n$ to be more precise, where APs “become” affine subspaces (throughout we will associate $|\mathbb{F}_2^n| = 2^n$ with $N$ in this correspondence). As it turns out, many arguments in additive combinatorics that apply in $[N]$ are modeled slightly more cleanly in $\mathbb{F}_2^n$; Tom mentioned that this may be more a statement about the methods than the questions, since $\mathbb{F}_2^n$ naturally constitutes a group whereas this is not exactly true for $[N]$. Anyway, with this correspondence in mind we may wonder about an analogue of Szemerédi’s theorem in $\mathbb{F}_2^n$, and indeed, we have the following theorem: Let $A\subseteq \mathbb{F}_2^n$ be a subset of density $\alpha$. Then $A$ contains a subspace $V$ of dimension at least $(\log N)/\log(1/\alpha)$. The proof proceeds by repeated applications of Cauchy-Schwarz.

And now as Tom promised in the title for his talk, we next consider *sumsets* in $\mathbb{F}_2^n$; intuitively, taking the sumset of $A$ with itself should enrich the additive structure of $A$. (For any two subsets $A,B\subseteq \mathbb{F}_2^n$, the sumset of $A$ and $B$, denoted $A+B$, is simply the set $\{a + b \colon a \in A, b\in B\}$.) Consider a subset $A\subseteq \mathbb{F}_2^n$ of density $\alpha$. By “Szemerédi’s theorem for $\mathbb{F}_2^n$” above, we know that $A$ contains a large subspace, one of dimension at least $(\log N)/(\log(1/\alpha))$. The sumset problem of Bourgain and Green asks, “How about the largest subspace in $A+A$?” It is not hard to see (via the inclusion-exclusion principle) that if $\alpha > 1/2$ then $A+A = \mathbb{F}_2^n$, or to phrase it in a somewhat non-standard manner, $A+A$ contains a subspace of codimension $0$. Certainly this is tight, since if $\alpha = 1/2$ then $A$ could itself be a subspace of codimension $1$, in which case $A+A = A$.
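The $\alpha > 1/2$ fact (and its tightness) is easy to check by machine. A small Python sanity check, encoding points of $\mathbb{F}_2^n$ as integers with XOR as addition:

```python
import random

def sumset(A, B):
    """A + B in F_2^n, with points encoded as ints and addition as XOR."""
    return {a ^ b for a in A for b in B}

n = 8
N = 1 << n

# Density just above 1/2: for any z, the sets A and z+A each have more than
# N/2 elements, so they intersect, giving z in A+A.
A = set(random.sample(range(N), N // 2 + 1))
assert sumset(A, A) == set(range(N))

# Tightness at density exactly 1/2: the even-weight vectors form a
# codimension-1 subspace, and a subspace satisfies A + A = A.
even = {x for x in range(N) if bin(x).count("1") % 2 == 0}
assert sumset(even, even) == even
```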

In the remainder of his talk Tom surveyed some of the known results for $\alpha < 1/2$, beginning with the following theorem of his: Suppose $\alpha > 1/2-1/(1000\sqrt{n})$. Then $A+A$ contains a subspace of codimension $1$. And this is tight by a result of Green and Ruzsa: for all $\epsilon > 0$ there exists $A = A(\epsilon) \subseteq \mathbb{F}_2^n$ of density $\alpha > 1/2-\epsilon$ such that if $V \subseteq A + A$ is a subspace, then the codimension of $V$ is $\Omega(\epsilon\sqrt{n})$. (As you may have guessed the set $A$ achieving this bound is one of the usual suspects in the analysis of Boolean functions: the Hamming ball.) And finally, complementing this last result of Green and Ruzsa we have the following theorem due to Tom: Suppose $\alpha > 1/2-\epsilon$. Then $A+A$ contains a subspace of codimension $O(n/\log(1/\epsilon))$. The proof of this proceeds via a density-increment-type argument reminiscent of Meshulam’s Fourier-analytic proof of an upper bound on the size of $3$-AP-free subsets of $\mathbb{F}_3^n$ (i.e. the “cap set problem”).

Tom concluded with two open problems, the first of which appears in his paper: Let $\alpha > 1/2-c/\sqrt{n}$, and show that $A+A$ contains a subspace of codimension $O(c^2)$ (or any bound that depends only on $c$). The second is a coloring-type problem: Suppose we partition $\mathbb{F}_2^n$ into $k$ color classes $A_1,\ldots,A_k$. Show that there exists an index $i$ such that $A_i + A_i$ contains a subspace of codimension $O_k(1)$. The special case when $k=3$ is open, as is the relaxation “there exist indices $i$ and $j$ such that $A_i + A_j$ contains a subspace of codimension $O_k(1)$”.

For more details, see the “Subspaces in sumsets” open problem from the 2012 Symposium’s open problem list and references therein.

Next up was Thomas Vidick, who spoke about joint work with Assaf Naor and Oded Regev on non-commutative extensions of Grothendieck’s inequality. Given a matrix $A \in \mathbb{R}^{n\times n}$, consider the optimization problem of determining the quantity $\mathrm{opt}(A)$ where

\[ \mathrm{opt}(A) := \max_{x_i,y_j \in \{\pm 1\}} \sum_{i,j=1}^n A_{ij} x_i y_j. \]

Unsurprisingly computing $\mathrm{opt}(A)$ is NP-hard in general (it can encode $\mathsf{Max Cut}$, for example), and so we consider the relaxation

\[ \mathrm{opt}(A) \le \mathop{\max_{u_i,v_j \in \mathbb{R}^n}}_{\|u_i\|=\|v_j\|=1} \sum_{i,j=1}^n A_{ij} u_i \cdot v_j, \]

which we may then solve efficiently via semidefinite programming. The natural question is, how good of an approximation is this to $\mathrm{opt}(A)$? Grothendieck’s inequality provides an answer, showing that there exists a universal constant $K_G \le 1.783$ (the *Grothendieck constant*) such that the maximum on the right-hand side is at most $K_G \mathrm{opt}(A)$. (The precise value of $K_G$ is yet to be determined: Krivine in 1979 proved an upper bound of $K_G \le \pi/(2\ln(1+\sqrt{2})) = 1.7822\ldots$ and conjectured that this is in fact the right answer; Krivine’s conjecture was recently disproved by Braverman, Makarychev, Makarychev, and Naor.) Grothendieck’s inequality, fundamental in the study of tensor norms, was imported into TCS by the work of Alon and Naor, who gave an algorithmic proof of Grothendieck’s inequality, at the heart of which is a rounding scheme that turns vector solutions $u_i,v_j\in \mathbb{R}^n$ to the SDP into signs $x_i, y_j \in \{-1,1\}$ that come close to attaining $\mathrm{opt}(A)$.
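To illustrate the objects in play, here is a small Python sketch: a brute-force computation of $\mathrm{opt}(A)$ for a tiny instance, together with the classic random-hyperplane rounding of vector solutions into signs. (This rounding is a simplification for illustration; Alon and Naor's actual scheme is a more careful Krivine-style rounding.)

```python
import itertools, math, random

def opt_bruteforce(A):
    """Exact opt(A) by enumerating all sign assignments (exponential in n)."""
    n = len(A)
    return max(sum(A[i][j] * x[i] * y[j] for i in range(n) for j in range(n))
               for x in itertools.product([-1, 1], repeat=n)
               for y in itertools.product([-1, 1], repeat=n))

def hyperplane_round(A, us, vs):
    """Round unit-vector solutions to signs via a random Gaussian hyperplane."""
    d = len(us[0])
    g = [random.gauss(0, 1) for _ in range(d)]
    sgn = lambda u: 1 if sum(gi * ui for gi, ui in zip(g, u)) >= 0 else -1
    x, y = [sgn(u) for u in us], [sgn(v) for v in vs]
    n = len(A)
    return sum(A[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

A = [[1, -2, 0], [0, 3, 1], [-1, 0, 2]]
opt = opt_bruteforce(A)   # = 10 for this A

# Any rounding produces a feasible sign assignment, hence value <= opt:
e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
assert hyperplane_round(A, e, e) <= opt
```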

Grothendieck’s inequality has the following non-commutative generalization due to Pisier and Haagerup: there exists a universal constant $K_H \le 2\sqrt{2}$ such that for all $M \in \mathbb{R}^{n\times n\times n\times n}$,

\[ \mathrm{opt}(M) := \mathop{\max_{X,Y\in \mathbb{R}^{n\times n}}}_{XX^T = YY^T = \mathrm{Id}} \sum_{i,j,k,\ell} M_{ijk\ell} X_{ij} Y_{k\ell} \le \mathop{\max_{\vec{U}\vec{U}^T = \vec{U}^T \vec{U} = \mathrm{Id}}}_{\vec{V}\vec{V}^T = \vec{V}^T \vec{V} = \mathrm{Id}} \sum_{i,j,k,\ell} M_{ijk\ell} \vec{U}_{ij}\cdot \vec{V}_{k\ell} \le K_H\mathrm{opt}(M) \]

where the second maximum is taken over all $n\times n$ vector-valued matrices $\vec{U}$ and $\vec{V}$ with entries in $\mathbb{R}^n$, with multiplication defined to be $(\vec{U}\vec{V})_{ij} = \sum_k \vec{U}_{ik}\cdot \vec{V}_{kj}$, i.e. just like regular matrix multiplication, except that we take the inner product of the corresponding vector-valued matrix entries instead of multiplying real-valued entries. A few remarks. First, we note that if $M$ is diagonal in the sense that there is some $A\in \mathbb{R}^{n\times n}$ such that $M_{ijk\ell} = A_{ik}$ if $i=j$ and $k=\ell$, and $0$ otherwise, then $\mathrm{opt}(M) = \mathrm{opt}(A)$, so this is indeed a generalization of Grothendieck’s inequality. Second, it seems natural to wonder if the constraint that $\vec{U}^T\vec{U} = \mathrm{Id}$ is actually necessary given that we are already enforcing that $\vec{U}\vec{U}^T=\mathrm{Id}$ (and likewise for $V$). These two additional constraints are in fact necessary; without them this second maximum would not be within a universal constant of $\mathrm{opt}(M)$. Thomas’s main result in his work with Oded and Assaf Naor is an analogue of Alon and Naor’s algorithmic proof of Grothendieck’s inequality in this non-commutative setting, giving an efficient scheme for rounding vector-valued orthogonal matrices into regular orthogonal matrices that approximately preserves the objective value. In slightly more detail, given vector-valued matrices that have objective value within $(1-\epsilon)$ of the SDP optimum, they give an algorithm that runs in time polynomial in $n$ and $1/\epsilon$ and finds two orthogonal matrices $X, Y \in \mathbb{R}^{n\times n}$ with objective value

\[ \sum_{i,j,k,\ell} M_{ijk\ell} X_{ij} Y_{k\ell} \ge \Big(\frac1{2\sqrt{2}}-2\epsilon\Big) \mathrm{opt}(M). \]

For more details, check out Thomas’s blog post on this topic. Also look for Oded’s talk on the same topic in the list of videos from the 2012 edition, and this video of Assaf Naor for the two-hour version. There’s also this 87-page survey devoted to Grothendieck’s inequality.

Next up was Oded Regev, who spoke about ongoing work with Shachar Lovett and Raghu Meka on discrepancy, a beautiful area which has seen much exciting progress in recent years. Oded’s talk was based on a series of three superb blog posts on the topic by Raghu, so I will only mention here a “vector generalization” of Spencer’s Six Standard Deviations Suffice theorem which does not appear in those posts: For all $n\times n$ matrices $A$ of *unit vectors*, there exist $\epsilon_1,\ldots,\epsilon_n\in \{-1,1\}$ such that $\|\sum_{j=1}^n\epsilon_j A_{ij} \|_2 \le O(\sqrt{n})$ for all $i\in [n]$. (Oded attributed this generalization to Krzysztof Oleszkiewicz.)
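For intuition, one can brute-force a tiny instance of this vector-Spencer statement in Python (my own illustration; at $n = 5$ the constants are of course meaningless, but the exhaustive search makes the quantifiers concrete):

```python
import itertools, math, random

def rand_unit(d=3):
    """Uniformly random unit vector in R^d (normalized Gaussian)."""
    v = [random.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

n = 5
A = [[rand_unit() for _ in range(n)] for _ in range(n)]   # n x n unit vectors

def row_disc(eps):
    """max_i || sum_j eps_j A[i][j] ||_2 for a given sign pattern eps."""
    worst = 0.0
    for i in range(n):
        s = [sum(eps[j] * A[i][j][k] for j in range(n)) for k in range(3)]
        worst = max(worst, math.sqrt(sum(x * x for x in s)))
    return worst

# Exhaustive search over all 2^n sign patterns for the best discrepancy;
# the theorem promises best = O(sqrt(n)), and trivially best <= n.
best = min(row_disc(eps) for eps in itertools.product([-1, 1], repeat=n))
```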

Next I (Li-Yang) spoke about recent work with Rocco Servedio in property testing, giving a polynomial lower bound on the query complexity of testers for the monotonicity of Boolean functions. In this problem we are given as input a distance parameter $\epsilon > 0$ and black-box query access to an unknown Boolean function $f$, and using as few queries as possible, we would like to determine with high confidence whether $f$ is monotone or $\epsilon$-far from monotone. This problem was introduced in a 1998 paper of Goldreich et al., who proposed a non-adaptive tester and proved an $O_\epsilon(n^2)$ upper bound on its query complexity, subsequently improved to $O(n/\epsilon)$ in the journal version. Fischer et al. established the first lower bounds shortly after, showing the existence of a constant distance parameter $\epsilon_0 > 0$ such that $\Omega(\log n)$ queries are necessary for any non-adaptive tester. (This directly implies an $\Omega(\log\log n)$ lower bound for adaptive testers.) These upper and lower bounds were the best known for more than a decade, until a recent beautiful paper of Chakrabarty and Seshadhri improved on the linear upper bound of Goldreich et al. with an $\tilde{O}_\epsilon(n^{7/8})$-query non-adaptive tester.
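For concreteness, here is a sketch of the simplest tester of this kind, a non-adaptive "edge tester" in the spirit of Goldreich et al. (my own minimal rendition, not their exact algorithm or analysis): sample random edges of the hypercube and reject if any is violated. It is one-sided, so monotone functions always pass.

```python
import random

def edge_tester(f, n, queries):
    """Accept iff no sampled hypercube edge violates monotonicity of f."""
    for _ in range(queries):
        x = random.randrange(1 << n)
        i = random.randrange(n)
        lo, hi = x & ~(1 << i), x | (1 << i)   # edge endpoints, lo <= hi
        if f(lo) > f(hi):                       # a monotonicity violation
            return False
    return True

n = 6
majority = lambda x: 1 if bin(x).count("1") > n // 2 else 0   # monotone
parity   = lambda x: bin(x).count("1") % 2                     # far from monotone

assert edge_tester(majority, n, 500)       # monotone f always accepted
```

For parity, every edge flips the value, so each sampled edge is violated with probability $1/2$ and 500 queries reject except with astronomically small probability.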

Our main result is an improvement of the lower bounds of Fischer et al.: we prove the existence of a universal constant $\epsilon_0 > 0$ such that any non-adaptive algorithm for testing whether an unknown $f$ is monotone versus $\epsilon_0$-far from monotone requires $\tilde{\Omega}(n^{1/5})$ many queries. We build on and extend techniques introduced by Blais and O’Donnell, applying multidimensional central limit theorems (CLTs) to reason about testers for certain properties; the main technical ingredient in our case is a recent multidimensional CLT of Valiant and Valiant for Wasserstein distance.

The last scheduled talk was by Ehud Friedgut, who spoke about a series of works with David Ellis, Yuval Filmus, and Haran Pilpel, codenamed [EFP], [EFF1], [EFF2], and [EFF3]. The general theme of these works is solving extremal problems in the symmetric group $S_n$ by Fourier-analytic (rather than combinatorial) means, which often has the benefit that one can upgrade them to “stability results”. Ehud gave a brief review of Fourier analysis over $S_n$. For those who haven’t seen it before, the quick-and-dirtiest explanation is that the Fourier coefficients $\widehat{f}(\lambda)$ of a function $f : S_n \to \mathbb{C}$ are *matrices*, indexed by partitions $\lambda$ of $n$; i.e., sequences of positive integers $\lambda_1 \geq \lambda_2 \geq \cdots$ which add up to $n$. As in the more familiar case of functions $f : \mathbb{F}_2^n \to \mathbb{R}$, the partitions are stratified into “levels”; the “$k$th level” corresponds to those partitions with $\lambda_1 \geq n - k$. The span of levels $0$ through $k$ also coincides with the functions which are linear combinations of “$k$-cosets”; i.e., functions of the form $1[\sigma(i_1) = j_1, \dots, \sigma(i_k) = j_k]$ for some $i_1, \dots, i_k, j_1, \dots, j_k \in [n]$.

Friedgut and coauthors used Fourier-analytic methods to resolve several longstanding extremal combinatorics problems; e.g., the 34-year-old Deza-Frankl Conjecture, which states that if $A$ is a “$t$-intersecting” subset of $S_n$ (meaning any two elements in $A$ agree on the image of at least $t$ points in $[n]$) then $|A| \leq (n-t)!$, with equality iff $A$ is a $t$-coset. Furthermore, by virtue of using spectral rather than combinatorial techniques, they were able to provide associated “stability” results. These often came by generalizing classic results in the Boolean hypercube setting (the Nisan-Szegedy Theorem, the FKN Theorem, the Kindler-Safra Theorem, …) to the $S_n$ setting. As just one example, [EFF2] showed the analogue of the FKN Theorem: if $f : S_n \to \{0,1\}$ is a balanced function with all but $\epsilon$ of its Fourier mass on “level $0$ and level $1$” then $f$ must be $\epsilon^{\Omega(1)}$-close to a union of disjoint $1$-cosets.


Avi Wigderson kicked off this year’s symposium with a talk describing recent joint work with Dimitri Gavinsky, Or Meir, and Omri Weinstein attacking one of the central open problems in computational complexity: does ${\mathsf P} = \mathsf{NC}^1$, or in words, can every sequential computation be efficiently parallelized?

For a Boolean function $f:\{0,1\}^n \to\{0,1\}$, let $S(f)$ denote the minimum size of any Boolean *circuit* computing $f$, and $L(f)$ the minimum size of any Boolean *formula* computing $f$. Certainly $S(f) \le L(f)$ for all $f$, and we know of examples showing that $S(f)$ can be polynomially smaller than $L(f)$ — the parity function, for example, has $S(\mathsf{PAR}_n) = O(n)$ whereas $L(\mathsf{PAR}_n) = \Omega(n^2)$. The $\mathsf{P}$ versus $\mathsf{NC}^1$ problem seeks an explicit function $f$ witnessing a super-polynomial separation between these two quantities.
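The $O(n^2)$ upper bound for parity comes from the recursive construction $\mathsf{PAR}_{2k} = \mathsf{PAR}_k \oplus \mathsf{PAR}_k$ together with the De Morgan expansion $u \oplus v = (u \wedge \neg v) \vee (\neg u \wedge v)$, which uses two copies of a (sub)formula for each half; a two-line Python recurrence tracks the leaf count (my own illustration of this standard construction):

```python
def parity_formula_leaves(n):
    """Leaf count of the recursive De Morgan formula for PAR_n.

    Expanding u XOR v = (u AND NOT v) OR (NOT u AND v) uses two copies of a
    formula for each half (a formula and its negation have equal leaf counts),
    so L(1) = 1 and L(n) = 4 * L(n/2), giving L(n) = n^2 for n a power of 2."""
    assert n & (n - 1) == 0, "power of two, for simplicity"
    return 1 if n == 1 else 4 * parity_formula_leaves(n // 2)

assert parity_formula_leaves(8) == 64   # n^2 leaves
```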

The work of Karchmer, Raz, and Wigderson (KRW) proposed the following approach to the problem. Let $D(f)$ denote the minimum *depth* of any formula computing $f$ (and recall that $D(f) = O(\log L(f))$ by Spira’s depth-reduction theorem). Given two Boolean functions $g:\{0,1\}^m \to\{0,1\}$ and $f:\{0,1\}^n\to\{0,1\}$, we write $g\circ f : \{0,1\}^{mn}\to\{0,1\}$ to denote their (disjoint) composition $(g\circ f)(x^1,\ldots,x^m) = g(f(x^1),\ldots,f(x^m))$. Clearly $D(g \circ f) \le D(g) + D(f)$: to compute $g\circ f$ with a formula, simply stick on a formula for $f$ at every leaf in a formula computing $g$. The KRW conjecture states that this naive construction is essentially the best one can do, i.e. that $D(g \circ f) \approx D(g) + D(f)$; equivalently, $C(R_{g\circ f}) \approx C(R_g) + C(R_f)$ where $R_g$ and $R_f$ are the communication problems corresponding to the Karchmer-Wigderson relations of $g$ and $f$ respectively, and $C(\cdot)$ denotes deterministic communication complexity. KRW showed that proving this conjecture would separate $\mathsf{P}$ from $\mathsf{NC}^1$ (notably, this approach appears to circumvent the Razborov-Rudich natural proofs barrier). They furthermore proposed the first step of proving the conjecture in the case when $f = U_n$ and $g=U_m$, where $U_k$ is the $k$-ary *Universal Relation*, which defines the following communication problem: Alice receives $x\in \{0,1\}^k$ and Bob $y\in \{0,1\}^k$ with the sole guarantee that $x\ne y$, and their goal is to find a coordinate $i\in [k]$ such that $x_i \ne y_i$. (It is easy to see that every Karchmer-Wigderson relation reduces to the Universal Relation, hence its name.)

This first step was taken by Edmonds, Impagliazzo, Rudich, and Sgall; Håstad and Wigderson subsequently gave an alternative proof. Using tools from interactive information complexity, Avi and his co-authors take the natural next step: they establish the KRW conjecture in the case when $f = U_n$ and $g$ is an arbitrary function. The case when $g = U_m$ and $f$ is an arbitrary function remains open: while superficially similar, Avi points out that this case appears to be of a significantly different nature, and proposes it as a good next step to pursue.

For more details here’s a video of Or giving a talk on their results.

Next up were back-to-back talks by Rafał Latała and Witold Bednorz on their recent work on the suprema of Bernoulli processes, the main result of which is an affirmative answer to Talagrand’s \$5000 Bernoulli Conjecture (among the list of open problems from the 2012 Symposium). The problem can be stated as follows: let $T\subseteq \ell^2$ be a collection of vectors, and consider the quantity:

\[ b(T) := \mathop{\bf E} \left[\sup_{t\in T} \sum_{i =1}^\infty \boldsymbol{\epsilon}_i t_i \right], \]

where $\boldsymbol{\epsilon}_1,\boldsymbol{\epsilon}_2,\ldots$ are i.i.d. uniform $\pm 1$ random variables. The goal is to obtain tight upper and lower bounds on $b(T)$; more precisely, to understand its relationship with the geometry of $T$. As is often the case with questions concerning Bernoulli random variables, it is natural to first consider the Gaussian setting where we can define the analogous quantity:

\[ g(T) := \mathop{\bf E}\left[ \sup_{t\in T} \sum_{i =1}^\infty \boldsymbol{g}_i t_i \right], \]

and where the $\boldsymbol{g}_i$’s are i.i.d. standard Gaussians. The generic chaining theory of Michel Talagrand (building on results of Fernique and Dudley, the culmination of a body of work dating back to Kolmogorov in the 1930s) shows that $g(T)$ is characterized, up to universal constant factors, by the quantity

\[\gamma_2(T,\|\cdot \|_2) := \inf \sup_{t \in T} \sum_{n=0}^\infty 2^{n/2} \mathrm{diam}(\mathcal{A}_n(t)),\]

where the infimum is taken over all choices of $\{\mathcal{A}_n\}_{n=0}^\infty$, increasing sequences of partitions of $T$ satisfying $\mathcal{A}_0 = \{T\}$ and $|\mathcal{A}_n| = 2^{2^n}$, and $\mathcal{A}_n(t)$ denotes the unique set in the partition $\mathcal{A}_n$ that contains $t$.

With Talagrand’s characterization of $g(T)$ in mind we now revisit the Bernoulli setting. An easy application of Jensen’s inequality yields the relationship $b(T) \le g(T)\sqrt{\pi/2}$, and an even simpler “$\ell_1$-bound” yields $b(T) \le \sup_{t\in T} \| t \|_1$. Together these give us the upper bound:

\[ b(T) \le \inf\{ \sup_{t\in T_1} \|t\|_1 + g(T_2) \colon T \subseteq T_1 + T_2\}.\]
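Both easy upper bounds are simple to sanity-check by simulation. A Monte-Carlo sketch in Python (the small $T \subseteq \mathbb{R}^4$ here is a toy instance of my own choosing, and no decomposition $T \subseteq T_1 + T_2$ is attempted):

```python
import math, random

def sup_process(T, sample):
    """sup_{t in T} <t, sample> for one realization of the random signs."""
    return max(sum(si * ti for si, ti in zip(sample, t)) for t in T)

def estimate(T, gen, trials=5000):
    """Monte-Carlo estimate of E[sup_{t in T} sum_i X_i t_i], X_i ~ gen()."""
    d = len(T[0])
    return sum(sup_process(T, [gen() for _ in range(d)])
               for _ in range(trials)) / trials

random.seed(0)                                     # fixed toy instance
T = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
b = estimate(T, lambda: random.choice([-1, 1]))    # Bernoulli sup b(T)
g = estimate(T, lambda: random.gauss(0, 1))        # Gaussian sup g(T)
l1 = max(sum(abs(x) for x in t) for t in T)        # the trivial l1 bound

assert b <= math.sqrt(math.pi / 2) * g + 0.1       # Jensen comparison (with MC slack)
assert b <= l1 + 0.1                               # l1 bound (with MC slack)
```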

Talagrand’s Bernoulli Conjecture essentially asks: “Can we reverse this inequality?”, and Rafał and Witek’s work answers this in the affirmative. Their main theorem states the following: for all $T$ such that $b(T) < \infty$, there exist $T_1$ and $T_2$ such that $T \subseteq T_1 + T_2$ and $\sup _{t\in T_1}\| t \|_1 \le O(b(T))$ and $g(T_2) = O(b(T))$. Rafał’s talk ended with the following intriguing question from Yuval Peres and Elchanan Mossel: is there an efficient algorithm for computing or approximating $b(T)$? (This is very closely related to Raghu Meka’s talk; see below.)

Witek spoke about several potential extensions of the Bernoulli Theorem which remain open. Here I will highlight two:

The first is a strict generalization, conjectured by Kwapień, to the Banach space setting. Let $(B, \| \cdot \|)$ be a normed space, and let $u_1, u_2, \dots \in B$ be such that $\sum_i u_i \boldsymbol{\epsilon}_i$ converges almost surely. (Here the $\boldsymbol{\epsilon}_i$’s represent independent uniform $\pm 1$ random variables.) Is it possible to write each $u_i$ as $u_i = v_i + w_i$ such that

\[
\mathop{\bf E}\left[\Big\|\sum_i u_i \boldsymbol{\epsilon}_i\Big\|\right]
\gtrsim \sup_{\delta_i \in \{-1,1\}} \left\{\Big\|\sum_i v_i \delta_i\Big\|\right\}, \ \
\mathop{\bf E}\left[\Big\|\sum_i u_i \boldsymbol{\epsilon}_i\Big\|\right]
\gtrsim \mathop{\bf E}\left[\Big\|\sum_i w_i \boldsymbol{g}_i\Big\|\right]?
\]

(Here as usual the $\boldsymbol{g}_i$’s represent independent standard Gaussians, and $\gtrsim$ means inequality up to a universal constant.) In the case $B = \ell^\infty$ this follows from Rafał and Witek’s proof of the Bernoulli Conjecture; however, this potential generalization does not seem so easy because their proof does not give much control over the “$T_1$” and “$T_2$” sets.

Another potential generalization of the Bernoulli Theorem is to $p$-biased $\pm 1$ random variables. More specifically, let $p > 0$ be small and let $\boldsymbol{\delta} = (\boldsymbol{\delta}_1,\dots, \boldsymbol{\delta}_n)$ be a sequence of independent random variables which are $1-p$ with probability $p$ and $-p$ with probability $1-p$. Let $T$ be a finite collection of vectors in ${\mathbb R}^n$. As usual, we are interested in finding deterministic quantities which bound

\[
b_p(T) := \mathop{\bf E}\left[\max_{t \in T} |\langle t, \boldsymbol{\delta}\rangle| \right]
\]

up to a constant factor not depending on $n$ *or on $p$*.

As in the Bernoulli Conjecture, one can give upper bounds based on: (a) trivial $\ell_1$ norm considerations; (b) chaining (roughly speaking, the union bound with clever clustering of vectors). In the Bernoulli Conjecture case (i.e., $p = 1/2$), the chaining bound relies on Chernoff bounds; i.e., the subgaussian property of uniform $\pm 1$ random vectors. In this $p$-biased case one has to be a little more careful using Bernstein bounds. In any case, Talagrand has again conjectured that these are the “only” two ways to get an upper bound. Precisely, the conjecture boils down to showing the following: one can find vector sets $T_1, T_2$ with $T \subseteq T_1 + T_2$, and

\[
b_p(T) \gtrsim \mathop{\bf E}\left[\max_{t \in T} \sum_i |t_i| \boldsymbol{\delta}_i\right],\ \ \sqrt{p} \cdot \gamma_2(T_1, \| \cdot \|_2), \ \ \gamma_1(T_2, d_{\infty}),
\]

where $\gamma_2$ is as before, and the $\gamma_1$ quantity is like $\gamma_2$ except that the diameter is measured with respect to the $\infty$-norm and the coefficient on the contribution from the $j$th part is $2^j$ rather than $2^{j/2}$. (And, it should be stressed again, the constant hidden by the $\gtrsim$ should not depend on $p$ or $n$.)

For more details here’s a video of Rafał giving a longer version of his talk; see also this great series of blog posts by James Lee.

The next speaker was Yuval Filmus, who spoke about the Erdős-Ko-Rado (EKR) theorem, stability results, and association schemes. The classical EKR theorem from extremal combinatorics says the following: for all $n \ge 2k$, if $\mathcal{F} \subseteq {{[n]}\choose k}$ is an intersecting family then $|\mathcal{F}| \le {{n-1}\choose {k-1}}$. (A family $\mathcal{F}$ is *intersecting* if $A \cap B \ne \emptyset$ for all $A,B \in \mathcal{F}$.) The extremal family is perhaps the first that comes to mind: take an arbitrary element in the universe $[n]$, say $1$, and let $\mathcal{F}$ be the family of all ${{n-1}\choose {k-1}}$ subsets that include $1$; in fact, for $n > 2k$ these are the only families attaining the bound of ${{n-1}\choose {k-1}}$.
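For tiny parameters the theorem, and the uniqueness of the extremal families, can be verified by exhaustive search. A quick Python check for $n = 5$, $k = 2$ (my own sanity check; note uniqueness needs $n > 2k$, since at $n = 2k$ a "triangle" like $\{12, 13, 23\}$ also ties some stars):

```python
from itertools import combinations

n, k = 5, 2
sets = [frozenset(c) for c in combinations(range(n), k)]   # C(5,2) = 10 pairs

def is_intersecting(fam):
    return all(a & b for a in fam for b in fam)

# Enumerate all 2^10 subfamilies, tracking the maximum intersecting ones.
best, extremal = 0, []
for mask in range(1 << len(sets)):
    fam = [s for i, s in enumerate(sets) if mask >> i & 1]
    if is_intersecting(fam):
        if len(fam) > best:
            best, extremal = len(fam), [fam]
        elif len(fam) == best:
            extremal.append(fam)

assert best == 4                                             # C(n-1, k-1)
assert all(frozenset.intersection(*fam) for fam in extremal) # all are stars
```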

Let’s call such a family $\mathcal{F}$ a *star*. It is natural to wonder if a “stability version” of the EKR theorem is true: if $|\mathcal{F}| \ge (1-\epsilon){{n-1}\choose {k-1}}$, must there exist some star $\mathcal{S}$ such that $|\mathcal{F} \triangle \mathcal{S}| \le \epsilon |\mathcal{F}|$? There are by now several different proofs of the EKR theorem; however many of these, like the original shifting-based proof of EKR, do not appear to be amenable to obtaining stability results. An analytic approach introduced by Ehud Friedgut has the key advantage that it automatically yields stability, showing that families of cardinality close to the extremal bound of ${{n-1}\choose {k-1}}$ must in fact themselves be close to a star. Roughly speaking, Ehud accomplishes this by considering the Fourier expansion of $\mathbf{1}_{\mathcal{F}}: {{[n]}\choose k} \to\{0,1\}$, and showing that its Fourier mass is concentrated on the first two levels. This brings to mind the Friedgut-Kalai-Naor theorem, which says that Boolean functions $f:\{0,1\}^n\to\{0,1\}$ with most of their Fourier mass concentrated on the first two levels are close to dictators (or more precisely, $1$-juntas) — we note that the indicator of an extremal family is in fact a dictator, since $\mathbf{1}_{\mathcal{F}}(S) = 1$ iff the distinguished element ($1$ in the case of our example above) is in $S$. Of course to carry out this proof strategy one would need an FKN-type theorem for Boolean(-valued) functions ${{[n]}\choose k}\to\{0,1\}$ over “slices of the hypercube”; in other words, functions over the Johnson scheme. This falls into a line of work extending classical Fourier-analytic theorems concerning Boolean-valued functions over the discrete cube $\{-1,1\}^n$ to non-independent domains.
Yuval ended his talk with a wish-list of six such theorems he would like to see extended to various association schemes: the KKL theorem, Friedgut’s junta theorem, the FKN theorem, the Kindler-Safra theorem and the closely related theorem of Bourgain, and finally the invariance principle. Analogues of the first two for the Johnson scheme have been established by O’Donnell and Wimmer, and by Wimmer, respectively.

For more on related spectral techniques in extremal combinatorics, check out Yuval’s thesis.

The final speaker of the day was Raghu Meka, who presented a PTAS for computing the supremum of Gaussian processes (answering an open problem of Ding and Lee). More precisely, he constructs a deterministic algorithm which, given a finite set of vectors $T \subseteq \mathbb{R}^d$ and $\epsilon > 0$, computes a $(1+\epsilon)$-factor approximation to the quantity $\mathbf{E}_{\mathbf{X} \leftarrow \mathcal{N}^d}[\sup_{t \in T} |\langle t, \mathbf{X}\rangle|]$ in time $d^{O(1)}\cdot |T|^{O_\epsilon(1)}$ (here $\mathcal{N}$ is the standard Gaussian). Gaussian processes play a central role in the study of stochastic processes, functional analysis, convex geometry, and machine learning (among others); we elaborate on one specific application which is of particular relevance to Raghu’s result.
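To make the target quantity concrete, here is the naive Monte Carlo baseline that Raghu's deterministic algorithm replaces; a minimal sketch, with function name and parameters of my own choosing:

```python
import math, random

def mc_sup_gaussian(T, trials=20000, seed=0):
    """Monte Carlo estimate of E[ sup_{t in T} |<t, X>| ] for X ~ N(0,1)^d.

    This is the simple randomized baseline; Raghu's contribution is a
    *deterministic* (1 + eps)-factor approximation of the same quantity."""
    rng = random.Random(seed)
    d = len(T[0])
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += max(abs(sum(ti * xi for ti, xi in zip(t, x))) for t in T)
    return total / trials

# Sanity check: for T = {e_1} the supremum is |g| with E|g| = sqrt(2/pi).
est = mc_sup_gaussian([(1.0, 0.0)])
```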

Given a finite connected graph $G = (V,E)$, we may associate with it the following fundamental parameter: the *cover time* of $G$, denoted $\mathrm{cov}(G)$, is the maximum, over all vertices $v \in V$, of the expected time it takes for a simple random walk starting at $v$ to visit every other vertex of $G$. (A simple random walk is one which, at every step, moves to a uniformly random neighbor.) Aldous and Fill asked the natural question: “Can we approximate $\mathrm{cov}(G)$ deterministically?” (Note that there is a simple *randomized* algorithm: simulate a random walk a few times, and output the median time required to visit all vertices.) The first result in this direction was due to Kahn, Kim, Lovász, and Vu, who gave an $O((\log\log n)^2)$-factor approximation algorithm; this was followed by the work of Feige and Zeitouni, who gave an FPTAS for trees. In recent exciting work, Ding, Lee, and Peres significantly improved this state of affairs with a deterministic polynomial-time $O(1)$-factor approximation algorithm. They accomplish this by establishing a connection between the cover time of $G$ and an associated Gaussian process on $G$, known as the Gaussian free field on $G$; more precisely, they show that the supremum of the Gaussian free field on $G$ characterizes, up to universal constant factors, the cover time of $G$. With this connection in hand, they then give a dynamic programming-based algorithm for computing $\gamma_2(\cdot,\|\cdot \|_2)$, which by Talagrand’s characterization yields an $O(1)$-factor approximation to the supremum of Gaussian processes.
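The parenthetical randomized algorithm is easy to sketch. The following toy implementation (names are mine) simulates walks on an adjacency-list graph:

```python
import random
from statistics import median

def estimate_cover_time(adj, trials=2000, seed=0):
    """The simple randomized algorithm mentioned above: simulate random
    walks and take the median time to visit every vertex, maximized over
    starting vertices.  `adj` maps each vertex to a list of its neighbors."""
    rng = random.Random(seed)
    best = 0
    for start in adj:
        times = []
        for _ in range(trials):
            v, seen, steps = start, {start}, 0
            while len(seen) < len(adj):
                v = rng.choice(adj[v])   # simple random walk step
                seen.add(v)
                steps += 1
            times.append(steps)
        best = max(best, median(times))
    return best

# The triangle K_3 has expected cover time exactly 3 from any start vertex.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
est = estimate_cover_time(triangle)
```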

Motivated by this application, Ding and Lee asked if there is a PTAS for computing the supremum of Gaussian processes, and Raghu’s work answers this question in the affirmative (with a “direct” algorithm, one that does not incur the $O(1)$-factor loss that is likely to be inherent in any approach that goes through Talagrand’s characterization). Raghu’s algorithm can be viewed as proceeding in two modular steps, the first of which is dimension reduction. Using the standard Johnson-Lindenstrauss (JL) lemma (or rather, a derandomized version of the lemma), we first project $T$ onto a space of dimension roughly $k=O(\log |T|)$. In the analysis of this first step, the JL lemma ensures that the distances between pairs of points are preserved, while Slepian’s lemma allows us to relate the supremum of the original Gaussian process to the supremum of the projected Gaussian process. As one would expect, the next step involves solving the problem in the projected space (with an algorithm that is allowed to run in time exponential in the dimension $k$); this is accomplished via the construction of explicit $\epsilon$-nets for convex functions in Gaussian space. A naive rounding-based approach yields a net of size $k^{O(k)}$, and Dadush and Vempala give a construction achieving size $(\log k)^{O(k)}$. Using a classical comparison lemma called Kanter’s lemma to lift a simple net for the univariate setting to the multivariate setting, Raghu constructs an optimal $\epsilon$-net of size $(1/\epsilon)^{O(k)}$.
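The dimension-reduction step can be illustrated with the standard, non-derandomized JL map; this is only a sketch of the first step, with hypothetical helper names:

```python
import math, random

def jl_project(T, k, seed=0):
    """Project vectors in R^d down to R^k via a random Gaussian matrix
    scaled by 1/sqrt(k) -- the classical (non-derandomized) JL map."""
    rng = random.Random(seed)
    d = len(T[0])
    A = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [tuple(sum(A[i][j] * t[j] for j in range(d)) for i in range(k)) for t in T]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Pairwise distances should be preserved up to a small multiplicative error.
rng0 = random.Random(1)
T = [tuple(rng0.gauss(0, 1) for _ in range(50)) for _ in range(5)]
P = jl_project(T, k=400, seed=2)
ratios = [dist(P[i], P[j]) / dist(T[i], T[j])
          for i in range(5) for j in range(i + 1, 5)]
```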

Raghu concluded with two open problems. The first is a conjecture of Ding, Lee, and Peres, which states that the supremum of the Gaussian free field on a graph characterizes its cover time *asymptotically*; if true, this, coupled with Raghu’s result, would give us a PTAS for computing the cover time of all graphs. The second is whether an analogue of Slepian’s lemma holds for graphs: given a graph $G = (V,E)$, for any $u,v\in V$ we write $\mathrm{hit}_G(u,v)$ to denote the expected time it takes for a random walk starting at $u$ to hit $v$. Let $G, H$ be two graphs on the same vertex set $V$, and suppose $\mathrm{hit}_G(u,v)\le \mathrm{hit}_H(u,v)$ for all $u,v\in V$. Does this imply that $\mathrm{cov}(G) \le \mathrm{cov}(H)$?

For more details, here’s a video of Raghu’s talk at FOCS 2012; see also this blog post by James Lee for more on the nice connection to cover times and the Gaussian free field.

And finally, here are a few pictures from the first day!

I’m also happy to report that we will have guest blogging by symposium attendee Li-Yang Tan. This year’s talk lineup looks quite diverse, with topics ranging from the Bernoulli ~~Conjecture~~ Theorem to Fourier analysis on the symmetric group, to additive number theory. Stay tuned!

Suppose $f : {\mathbb R}^n \to [-1,1]$ is a “nice” function (smooth, say, with all derivatives bounded) having $\mathop{\bf E}[f] = 0$. You’re encouraged to think of $f$ as (a smooth approximation to) the indicator $\pm 1_A$ of some set $A \subseteq {\mathbb R}^n$ of Gaussian volume $\mathrm{vol}_\gamma(A) = \tfrac{1}{2}$. Now consider the Boolean function $g : \{-1,1\}^{nM} \to [-1,1]$ defined by \[ g = f \circ \text{BitsToGaussians}_{M}^{n}. \] Using the multidimensional Central Limit Theorem, for any $\rho \in (0,1)$ we should have \[ \mathbf{Stab}_\rho[g] \xrightarrow{M \to \infty} \mathbf{Stab}_\rho[f], \] where on the left we have Boolean noise stability and on the right we have Gaussian noise stability. Using $\mathop{\bf E}[g] \to \mathop{\bf E}[f] = 0$, the Majority Is Stablest Theorem would tell us that \[ \mathbf{Stab}_\rho[g] \leq 1 - \tfrac{2}{\pi}\arccos \rho + o_\epsilon(1), \] where $\epsilon = \mathbf{MaxInf}[g]$. But $\epsilon = \epsilon(M) \to 0$ as $M \to \infty$. Thus we should simply have the Gaussian noise stability bound \begin{equation} \label{eqn:vol12-Borell} \mathbf{Stab}_\rho[f] \leq 1 - \tfrac{2}{\pi}\arccos \rho. \end{equation} (By a standard approximation argument this extends from “nice” $f : {\mathbb R}^n \to [-1,1]$ with $\mathop{\bf E}[f] = 0$ to any measurable $f : {\mathbb R}^n \to [-1,1]$ with $\mathop{\bf E}[f] = 0$.) Note that the upper bound \eqref{eqn:vol12-Borell} is achieved when $f$ is the $\pm 1$-indicator of any halfspace through the origin; see Corollary 20. (Note also that if $n = 1$ and $f = \mathrm{sgn}$, then the function $g$ is simply $\mathrm{Maj}_M$.)

The “isoperimetric inequality” \eqref{eqn:vol12-Borell} is indeed true, and is a special case of a theorem first proved by Borell **[Bor85]**.

**Borell’s Isoperimetric Theorem (volume-${\frac{\mathbf 1}{\mathbf 2}}$ case).** Fix $\rho \in (0,1)$. Then for any $f \in L^2({\mathbb R}^n,\gamma)$ with range $[-1,1]$ and $\mathop{\bf E}[f] = 0$, \[ \mathbf{Stab}_{\rho}[f] \leq 1 - \tfrac{2}{\pi}\arccos \rho, \] with equality if $f$ is the $\pm 1$-indicator of any halfspace through the origin.
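The equality case can be sanity-checked numerically: by Sheppard's Formula, $\mathbf{Stab}_\rho[\mathrm{sgn}] = 1 - \tfrac{2}{\pi}\arccos\rho$. A small Monte Carlo sketch (the function name is mine), using the construction $\boldsymbol{z}' = \rho\boldsymbol{z} + \sqrt{1-\rho^2}\,\boldsymbol{g}$:

```python
import math, random

def gaussian_stab_sgn(rho, trials=200000, seed=0):
    """Monte Carlo estimate of Stab_rho[sgn] = E[sgn(z) sgn(z')] for
    rho-correlated Gaussians, generated as z' = rho*z + sqrt(1-rho^2)*g."""
    rng = random.Random(seed)
    c = math.sqrt(1 - rho ** 2)
    total = 0
    for _ in range(trials):
        z = rng.gauss(0, 1)
        zp = rho * z + c * rng.gauss(0, 1)
        total += (1 if z > 0 else -1) * (1 if zp > 0 else -1)
    return total / trials

rho = 0.5
est = gaussian_stab_sgn(rho)
exact = 1 - (2 / math.pi) * math.acos(rho)   # Sheppard: equals 1/3 here
```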

**Remark 41.** In Borell’s Isoperimetric Theorem, nothing is lost by restricting attention to functions with range $\{-1,1\}$, i.e., by considering only $f = \pm 1_A$ for $A \subseteq {\mathbb R}^n$. This is because the case of range $[-1,1]$ follows straightforwardly from the case of range $\{-1,1\}$, essentially because $\sqrt{\mathbf{Stab}_\rho[f]} = \|\mathrm{U}_{\sqrt{\rho}} f\|_2$ is a convex functional of $f$; see the exercises.

More generally, Borell showed that for any fixed volume $\alpha \in [0,1]$, the maximum Gaussian noise stability of a set of volume $\alpha$ is no greater than that of a halfspace of volume $\alpha$. We state here the more general theorem, using range $\{0,1\}$ rather than range $\{-1,1\}$ for future notational convenience (and with Remark 41 applying equally):

**Borell’s Isoperimetric Theorem.** Fix $\rho \in (0,1)$. Then for any $f \in L^2({\mathbb R}^n, \gamma)$ with range $[0,1]$ and $\mathop{\bf E}[f] = \alpha$, \[ \mathbf{Stab}_{\rho}[f] \leq \Lambda_\rho(\alpha). \] Here $\Lambda_\rho(\alpha)$ is the *Gaussian quadrant probability* function, discussed in Exercise 5.33, and equal to $\mathbf{Stab}_\rho[1_H]$ for any (every) halfspace $H \subseteq {\mathbb R}^n$ having Gaussian volume $\mathrm{vol}_\gamma(H) = \alpha$.

We’ve seen that the volume-$\tfrac{1}{2}$ case of Borell’s Isoperimetric Theorem is a special case of the Majority Is Stablest Theorem, and similarly, the general version of Borell’s theorem is a special case of the General-Volume Majority Is Stablest Theorem mentioned at the beginning of the chapter. As a consequence, proving Borell’s Isoperimetric Theorem is a *prerequisite* for proving the General-Volume Majority Is Stablest Theorem. In fact, our proof in Section 7 of the latter will be a reduction to the former.

The proof of Borell’s Isoperimetric Theorem itself is not too hard; one of five known proofs, the one due to Mossel and Neeman **[MN12]**, is outlined in the exercises. If our main goal is just to prove the basic Majority Is Stablest Theorem, then we only need the volume-$\tfrac{1}{2}$ case of Borell’s Isoperimetric Inequality. Luckily, there’s a very simple proof of this volume-$\tfrac{1}{2}$ case for “many” values of $\rho$, as we will now explain.

Let’s first slightly rephrase the statement of Borell’s Isoperimetric Theorem in the volume-$\tfrac{1}{2}$ case. By Remark 41 we can restrict attention to sets; then the theorem asserts that among sets of Gaussian volume $\tfrac{1}{2}$, halfspaces through the origin have maximal noise stability, for each positive value of $\rho$. Equivalently, halfspaces through the origin have minimal noise *sensitivity* under correlation $\cos \theta$, for $\theta \in (0,\frac{\pi}{2})$. The formula for this minimal noise sensitivity was given as inequality (2) in our proof of Sheppard’s Formula. Thus we have:

**Equivalent statement of the volume-${\frac{\mathbf 1}{\mathbf 2}}$ Borell Isoperimetric Theorem.** Fix $\theta \in (0,\frac{\pi}{2})$. Then for any $A \subset {\mathbb R}^n$ with $\mathrm{vol}_\gamma(A) = \tfrac{1}{2}$, \[ \mathop{\bf Pr}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \\\text{$\cos \theta$-correlated}}}[1_A(\boldsymbol{z}) \neq 1_A(\boldsymbol{z}')] \geq \tfrac{\theta}{\pi}, \] with equality if $A$ is any halfspace through the origin.

In the remainder of this section we’ll show how to prove this formulation of the theorem whenever $\theta = \frac{\pi}{2\ell}$, where $\ell$ is a positive integer. This gives the volume-$\tfrac{1}{2}$ case of Borell’s Isoperimetric Inequality for all $\rho$ of the form $\cos\frac{\pi}{2\ell}$, $\ell \in {\mathbb N}^+$; in particular, for an infinite sequence of $\rho$’s tending to $1$. To prove the theorem for these values of $\theta$, it’s convenient to introduce notation for the following noise sensitivity variant:

**Definition 42.** For $A \subseteq {\mathbb R}^n$ and $\delta \in {\mathbb R}$ (usually $\delta \in [0,\pi]$) we write $\mathbf{RS}_A(\delta)$ for the *rotation sensitivity* of $A$ at $\delta$, defined by \[ \mathbf{RS}_A(\delta) = \mathop{\bf Pr}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \\ \cos \delta\text{-correlated}}}[1_A(\boldsymbol{z}) \neq 1_A(\boldsymbol{z}')]. \]

The key property of this definition is the following:

**Theorem 43.** For any $A \subseteq {\mathbb R}^n$ the function $\mathbf{RS}_A(\delta)$ is subadditive; i.e., \[ \mathbf{RS}_A(\delta_1 + \cdots + \delta_\ell) \leq \mathbf{RS}_A(\delta_1) + \cdots + \mathbf{RS}_A(\delta_\ell). \] In particular, for any $\delta \in {\mathbb R}$ and $\ell \in {\mathbb N}^+$, \[ \mathbf{RS}_A(\delta) \leq \ell \cdot \mathbf{RS}_A(\delta/\ell). \]

*Proof:* Let $\boldsymbol{g}, \boldsymbol{g}' \sim \mathrm{N}(0,1)^n$ be drawn independently and define $\boldsymbol{z}(\theta) = (\cos\theta)\boldsymbol{g} + (\sin\theta)\boldsymbol{g}'$. Geometrically, as $\theta$ goes from $0$ to $\frac{\pi}{2}$ the random vectors $\boldsymbol{z}(\theta)$ trace from $\boldsymbol{g}$ to $\boldsymbol{g}'$ along the origin-centered ellipse passing through these two points. The random vectors $\boldsymbol{z}(\theta)$ are jointly normal, with each individually distributed as $\mathrm{N}(0,1)^n$. Further, for each fixed $\theta, \theta' \in {\mathbb R}$ the pair $(\boldsymbol{z}(\theta), \boldsymbol{z}(\theta'))$ constitutes a pair of $\rho$-correlated Gaussians with \[ \rho = \cos\theta \cos\theta' + \sin\theta \sin\theta' = \cos(\theta' - \theta). \] Now consider the sequence $\theta_0, \dots, \theta_\ell$ defined by the partial sums of the $\delta_i$’s, i.e., $\theta_j = \sum_{i = 1}^j \delta_i$. We get that $\boldsymbol{z}(\theta_0)$ and $\boldsymbol{z}(\theta_\ell)$ are $\cos(\delta_1 + \cdots + \delta_\ell)$-correlated, and that $\boldsymbol{z}(\theta_{j-1})$ and $\boldsymbol{z}(\theta_{j})$ are $\cos\delta_j$-correlated for each $j \in [\ell]$. Thus \begin{align} \mathbf{RS}_A(\delta_1 + \cdots + \delta_\ell) &= \mathop{\bf Pr}[1_A(\boldsymbol{z}(\theta_0)) \neq 1_A(\boldsymbol{z}(\theta_\ell))] \nonumber\\ &\leq \sum_{j=1}^\ell \mathop{\bf Pr}[1_A(\boldsymbol{z}(\theta_{j})) \neq 1_A(\boldsymbol{z}(\theta_{j-1}))] = \sum_{j=1}^\ell \mathbf{RS}_A(\delta_j), \label{eqn:ellipse-union-bound} \end{align} where the inequality is the union bound. $\Box$

With this subadditivity result in hand, it’s indeed easy to prove the equivalent statement of the volume-$\tfrac{1}{2}$ Borell Isoperimetric Theorem for any $\theta \in \{\frac{\pi}{4}, \frac{\pi}{6}, \frac{\pi}{8}, \frac{\pi}{10}, \dots\}$. As we’ll see in Section 7, the case of $\theta = \frac{\pi}{4}$ can be used to give an excellent UG-hardness result for the Max-Cut CSP.

**Corollary 44.** The equivalent statement of the volume-$\tfrac{1}{2}$ Borell Isoperimetric Theorem holds whenever $\theta = \frac{\pi}{2\ell}$ for $\ell \in {\mathbb N}^+$.

*Proof:* The exact statement we need to show is $\mathbf{RS}_A(\frac{\pi}{2\ell}) \geq \frac{1}{2\ell}$. This follows by taking $\delta = \frac{\pi}{2}$ in Theorem 43 because \[ \mathbf{RS}_A(\tfrac{\pi}{2}) = \mathop{\bf Pr}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \\ 0\text{-correlated}}}[1_A(\boldsymbol{z}) \neq 1_A(\boldsymbol{z}')] = \tfrac12, \] using that $0$-correlated Gaussians are independent and that $\mathrm{vol}_\gamma(A) = \frac12$. $\Box$
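For the halfspace $A = \{z : z_1 > 0\}$, Sheppard's Formula gives $\mathbf{RS}_A(\delta) = \delta/\pi$ exactly, so both the corollary and its equality case can be checked by simulation; a rough Monte Carlo sketch (names are mine):

```python
import math, random

def rs_halfspace(delta, trials=200000, seed=0):
    """Monte Carlo estimate of RS_A(delta) for the halfspace A = {z_1 > 0}:
    the probability that cos(delta)-correlated Gaussians land on opposite
    sides of the hyperplane.  Sheppard's Formula says this equals delta/pi."""
    rng = random.Random(seed)
    rho = math.cos(delta)
    c = math.sqrt(1 - rho ** 2)
    flips = 0
    for _ in range(trials):
        z = rng.gauss(0, 1)
        zp = rho * z + c * rng.gauss(0, 1)
        flips += (z > 0) != (zp > 0)
    return flips / trials

# RS_A(pi/2) = 1/2 (independent Gaussians) and RS_A(pi/4) = 1/4, so the
# subadditivity bound RS_A(pi/2) <= 2 * RS_A(pi/4) holds with equality.
half = rs_halfspace(math.pi / 2)
quarter = rs_halfspace(math.pi / 4, seed=1)
```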

**Remark 45.** Although Sheppard’s Formula already tells us that equality holds in this corollary when $A$ is a halfspace through the origin, it’s also not hard to derive this directly from the proof. The only inequality in the proof, \eqref{eqn:ellipse-union-bound}, is an equality when $A$ is a halfspace through the origin, because the elliptical arc can only cross the boundary of such a halfspace $0$ or $1$ times.


**Remark 46.** Suppose that $A \subseteq {\mathbb R}^n$ not only has volume $\tfrac{1}{2}$ but also has the property that $x \in A$ if and only if $-x \not \in A$; in other words, the $\pm 1$-indicator of $A$ is an odd function. (In both statements, we allow a set of measure $0$ to be ignored.) An example set with this property is any halfspace through the origin. Then $\mathbf{RS}_A(\pi) = 1$, and hence we can establish Corollary 44 more generally for any $\theta \in \{\frac{\pi}{1}, \frac{\pi}{2}, \frac{\pi}{3}, \frac{\pi}{4}, \frac{\pi}{5}, \dots\}$ by taking $\delta = \pi$ in the proof.

To do this we’ll proceed as in Chapter 8.1, looking for a complete orthonormal “Fourier basis” for $L^2({\mathbb R},\gamma)$, which we can extend to $L^2({\mathbb R}^n,\gamma)$ by taking products. It’s natural to start with polynomials; by Theorem 22 we know that the collection $(\phi_j)_{j \in {\mathbb N}}$, $\phi_j(z) = z^j$ is a complete basis for $L^2({\mathbb R},\gamma)$. To get an orthonormal (“Fourier”) basis we can simply perform the Gram–Schmidt process. Calling the resulting basis $(h_j)_{j \in {\mathbb N}}$ (with “$h$” standing for “Hermite”), we get \begin{equation} \label{eqn:hermite-examples} h_0(z) = 1, \quad h_1(z) = z, \quad h_2(z) = \frac{z^2-1}{\sqrt{2}}, \quad h_3(z) = \frac{z^3 - 3z}{\sqrt{6}}, \quad \dots \end{equation} Here, e.g., we obtained $h_3(z)$ in two steps. First, we made $\phi_3(z) = z^3$ orthogonal to $h_0, \dots, h_2$ as \[ z^3 - \langle\boldsymbol{z}^3, 1 \rangle \cdot 1 - \langle\boldsymbol{z}^3, \boldsymbol{z} \rangle \cdot z - \langle\boldsymbol{z}^3, \tfrac{\boldsymbol{z}^2 - 1}{\sqrt{2}} \rangle \cdot \tfrac{z^2-1}{\sqrt{2}} = z^3 - 3z, \] where $\boldsymbol{z} \sim \mathrm{N}(0,1)$ and we used the fact that $\boldsymbol{z}^3$ and $\boldsymbol{z}^3 \cdot \tfrac{\boldsymbol{z}^2 - 1}{\sqrt{2}}$ are odd functions and hence have Gaussian expectation $0$. Then we defined $h_3(z) = \frac{z^3 - 3z}{\sqrt{6}}$ after determining that $\mathop{\bf E}[(\boldsymbol{z}^3-3\boldsymbol{z})^2] = 6$.

Let’s develop a more explicit definition of these Hermite polynomials. The computations involved in the Gram–Schmidt process require knowledge of the moments of a Gaussian random variable $\boldsymbol{z} \sim \mathrm{N}(0,1)$. It’s most convenient to understand these moments through the moment generating function of $\boldsymbol{z}$, namely \begin{equation} \label{eqn:g-mgf} \mathop{\bf E}[\exp(t\boldsymbol{z})] = \tfrac{1}{\sqrt{2\pi}} \int_{\mathbb R} e^{tz} e^{-\frac12 z^2}\,dz = e^{\frac12 t^2} \tfrac{1}{\sqrt{2\pi}} \int_{\mathbb R} e^{-\frac12 (z-t)^2}\,dz = \exp(\tfrac{1}{2} t^2). \end{equation} In light of our interest in the $\mathrm{U}_\rho$ operators, and the fact that orthonormality involves pairs of basis functions, we’ll in fact study the moment generating function of a pair $(\boldsymbol{z}, \boldsymbol{z}')$ of $\rho$-correlated standard Gaussians. To compute it, assume $(\boldsymbol{z},\boldsymbol{z}')$ are generated as in Fact 7 with $\vec{u}, \vec{v}$ unit vectors in ${\mathbb R}^2$. Then \begin{align*} \mathop{\bf E}_{\substack{(\boldsymbol{z},\boldsymbol{z}') \\ \text{$\rho$-correlated}}}[\exp(s\boldsymbol{z} + t\boldsymbol{z}')] &= \mathop{\bf E}_{\substack{\boldsymbol{g}_1, \boldsymbol{g}_2 \sim \mathrm{N}(0,1) \\ \text{independent}}}[\exp(s(u_1\boldsymbol{g}_1 + u_2 \boldsymbol{g}_2) + t(v_1\boldsymbol{g}_1 + v_2 \boldsymbol{g}_2))]\\ &= \mathop{\bf E}_{\boldsymbol{g}_1 \sim \mathrm{N}(0,1)}[\exp((su_1+tv_1)\boldsymbol{g}_1)]\mathop{\bf E}_{\boldsymbol{g}_2 \sim \mathrm{N}(0,1)}[\exp((su_2+tv_2)\boldsymbol{g}_2)]\\ &= \exp(\tfrac{1}{2} (su_1+tv_1)^2)\exp(\tfrac{1}{2} (su_2+tv_2)^2)\\ &= \exp(\tfrac{1}{2} \|\vec{u}\|_2^2 s^2 + \langle \vec{u},\vec{v} \rangle st + \tfrac{1}{2} \|\vec{v}\|_2^2 t^2) \\ &= \exp(\tfrac{1}{2} (s^2 + 2\rho st + t^2)), \end{align*} where the third equality used \eqref{eqn:g-mgf}.
Dividing by $\exp(\tfrac{1}{2}(s^2+t^2))$, it follows that \begin{equation} \label{eqn:hermite-dev} \mathop{\bf E}_{\substack{(\boldsymbol{z},\boldsymbol{z}') \\ \text{$\rho$-correlated}}}[\exp(s\boldsymbol{z} - \tfrac{1}{2} s^2)\exp(t\boldsymbol{z}' - \tfrac{1}{2} t^2)] = \exp(\rho st) = \sum_{j=0}^\infty \frac{\rho^j}{j!} s^jt^j. \end{equation} Inside the expectation above we essentially have the expression $\exp(tz-\tfrac{1}{2} t^2)$ appearing twice. It’s easy to see that if we take the power series in $t$ for this expression, the coefficient on $t^j$ will be a polynomial in $z$ with leading term $\frac{1}{j!}z^j$. Let’s therefore write \begin{equation} \label{eqn:prob-herm} \exp(tz-\tfrac{1}{2} t^2) = \sum_{j=0}^\infty \frac{1}{j!} H_j(z)t^j, \end{equation} where $H_j(z)$ is a monic polynomial of degree $j$. Now substituting this into \eqref{eqn:hermite-dev} yields \[ \sum_{j,k=0}^\infty \frac{1}{j!k!}\mathop{\bf E}_{\substack{(\boldsymbol{z},\boldsymbol{z}') \\ \text{$\rho$-correlated}}}[H_j(\boldsymbol{z})H_k(\boldsymbol{z}')]s^jt^k = \sum_{j=0}^\infty \frac{\rho^j}{j!} s^jt^j. \] Equating coefficients, it follows that we must have \[ \mathop{\bf E}_{\substack{(\boldsymbol{z},\boldsymbol{z}') \\ \text{$\rho$-correlated}}}[H_j(\boldsymbol{z})H_k(\boldsymbol{z}')] = \begin{cases} j!\rho^j & \text{if $j = k$,}\\ 0 & \text{if $j \neq k$.} \end{cases} \] In particular (taking $\rho = 1$), \begin{equation} \label{eqn:hermite-done2} \langle H_j, H_k \rangle = \begin{cases} j! & \text{if $j = k$,}\\ 0 & \text{if $j \neq k$}; \end{cases} \end{equation} i.e., the polynomials $(H_j)_{j \in {\mathbb N}}$ are orthogonal. Furthermore, since $H_j$ is monic and of degree $j$, it follows that the $H_j$’s are precisely the polynomials that arise in the Gram–Schmidt orthogonalization of $\{1, z, z^2, \dots \}$. We also see from \eqref{eqn:hermite-done2} that the orthonormalized polynomials $(h_j)_{j \in {\mathbb N}}$ are obtained by setting $h_j = \frac{1}{\sqrt{j!}} H_j$.
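The orthogonality relation \eqref{eqn:hermite-done2} can be verified numerically: NumPy's `hermite_e` module implements exactly these probabilists' Hermite polynomials, and Gauss–Hermite quadrature for the weight $e^{-z^2/2}$ is exact at these degrees. A small sketch (the helper name is mine):

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

# Quadrature nodes/weights for the weight exp(-x^2/2); dividing by
# sqrt(2*pi) turns the sum into an integral against the Gaussian pdf.
x, w = He.hermegauss(30)

def inner(j, k):
    """<H_j, H_k> = E[H_j(z) H_k(z)] for z ~ N(0,1); NumPy's "HermiteE"
    polynomials are the monic probabilists' Hermite polynomials H_j."""
    cj = [0] * j + [1]          # coefficient vector selecting He_j
    ck = [0] * k + [1]
    return float(np.sum(w * He.hermeval(x, cj) * He.hermeval(x, ck))) / math.sqrt(2 * math.pi)

vals = {(j, k): inner(j, k) for j in range(4) for k in range(4)}
```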

Let’s summarize and introduce the terminology for what we’ve deduced.

**Definition 29.** The *probabilists’ Hermite polynomials* $(H_j)_{j \in {\mathbb N}}$ are the univariate polynomials defined by the identity \eqref{eqn:prob-herm}. An equivalent definition (exercise) is \begin{equation} \label{eqn:hermite-alternate} H_j(z) = \frac{(-1)^j}{\varphi(z)} \cdot \frac{d^j}{dz^j} \varphi(z). \end{equation} The *normalized Hermite polynomials* $(h_j)_{j \in {\mathbb N}}$ are defined by $h_j = \frac{1}{\sqrt{j!}} H_j$; the first four are given explicitly in \eqref{eqn:hermite-examples}. For brevity we’ll simply refer to the $h_j$’s as the “Hermite polynomials”, though this is not standard terminology.

**Proposition 30.** The Hermite polynomials $(h_j)_{j \in {\mathbb N}}$ form a complete orthonormal basis for $L^2({\mathbb R},\gamma)$. They are also a “Fourier basis”, since $h_0 = 1$.

**Proposition 31.** For any $\rho \in [-1,1]$ we have \[ \mathop{\bf E}_{\substack{(\boldsymbol{z},\boldsymbol{z}') \\ \text{$\rho$-correlated}}}[h_j(\boldsymbol{z})h_k(\boldsymbol{z}')] = \langle h_j, \mathrm{U}_\rho h_k \rangle = \langle \mathrm{U}_\rho h_j, h_k \rangle = \begin{cases} \rho^j & \text{if $j = k$,}\\ 0 & \text{if $j \neq k$}. \end{cases} \]
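Proposition 31 can be spot-checked by Monte Carlo using the explicit formulas \eqref{eqn:hermite-examples} for $h_0, \dots, h_3$; a sketch with my own helper names:

```python
import math, random

# The first four normalized Hermite polynomials, as displayed earlier.
h = [lambda z: 1.0,
     lambda z: z,
     lambda z: (z * z - 1) / math.sqrt(2),
     lambda z: (z ** 3 - 3 * z) / math.sqrt(6)]

def herm_corr(j, k, rho, trials=200000, seed=0):
    """Monte Carlo estimate of E[h_j(z) h_k(z')] over rho-correlated
    Gaussians, generated as z' = rho*z + sqrt(1-rho^2)*g."""
    rng = random.Random(seed)
    c = math.sqrt(1 - rho ** 2)
    total = 0.0
    for _ in range(trials):
        z = rng.gauss(0, 1)
        zp = rho * z + c * rng.gauss(0, 1)
        total += h[j](z) * h[k](zp)
    return total / trials

same = herm_corr(2, 2, 0.5)    # prediction: rho^2 = 0.25
cross = herm_corr(1, 3, 0.5)   # prediction: 0
```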

From this “Fourier basis” for $L^2({\mathbb R},\gamma)$ we can construct a “Fourier basis” for $L^2({\mathbb R}^n,\gamma)$ just by taking products, as in Proposition 8.13.

**Definition 32.** For a multi-index $\alpha \in {\mathbb N}^n$ we define the *(normalized multivariate) Hermite polynomial* $h_\alpha : {\mathbb R}^n \to {\mathbb R}$ by \[ h_\alpha(z) = \prod_{j = 1}^n h_{\alpha_j}(z_j). \] Note that the total degree of $h_\alpha$ is $|\alpha| = \sum_j \alpha_j$. We also identify a subset $S \subseteq [n]$ with its indicator $\alpha$ defined by $\alpha_j = 1_{j \in S}$; thus $h_S(z)$ denotes $z^S = \prod_{j \in S} z_j$.

**Proposition 33.** The Hermite polynomials $(h_\alpha)_{\alpha \in {\mathbb N}^n}$ form a complete orthonormal (Fourier) basis for $L^2({\mathbb R}^n,\gamma)$. Further, for any $\rho \in [-1,1]$ we have \[ \mathop{\bf E}_{\substack{(\boldsymbol{z},\boldsymbol{z}') \\ \text{$\rho$-correlated}}}[h_\alpha(\boldsymbol{z})h_\beta(\boldsymbol{z}')] = \langle h_\alpha, \mathrm{U}_\rho h_\beta \rangle = \langle \mathrm{U}_\rho h_\alpha, h_\beta \rangle = \begin{cases} \rho^{|\alpha|} & \text{if $\alpha = \beta$,}\\ 0 & \text{if $\alpha \neq \beta$}. \end{cases} \]

We can now define the “Hermite expansion” of Gaussian functions.

**Definition 34.** Every $f \in L^2({\mathbb R}^n,\gamma)$ is uniquely expressible as \[ f = \sum_{\alpha \in {\mathbb N}^n} \widehat{f}(\alpha) h_\alpha, \] where the real numbers $\widehat{f}(\alpha)$ are called the *Hermite coefficients of $f$* and the convergence is in $L^2({\mathbb R}^n,\gamma)$; i.e., \[ \left\|f - \sum_{|\alpha| \leq k} \widehat{f}(\alpha) h_\alpha\right\|_2 \to 0 \quad \text{as $k \to \infty$}. \] This is called the *Hermite expansion* of $f$.

**Remark 35.** If $f : {\mathbb R}^n \to {\mathbb R}$ is a multilinear polynomial, then it “is its own Hermite expansion”: \[ f(z) = \sum_{S \subseteq [n]} \widehat{f}(S) z^S = \sum_{S \subseteq [n]} \widehat{f}(S) h_S(z) = \sum_{\alpha_1, \dots, \alpha_n \leq 1} \widehat{f}(\alpha) h_\alpha(z). \]

**Proposition 36.** The Hermite coefficients of $f \in L^2({\mathbb R}^n,\gamma)$ satisfy the formula \[ \widehat{f}(\alpha) = \langle f, h_\alpha \rangle, \] and for $f, g \in L^2({\mathbb R}^n,\gamma)$ we have the Plancherel formula \[ \langle f, g \rangle = \sum_{\alpha \in {\mathbb N}^n} \widehat{f}(\alpha)\widehat{g}(\alpha). \]

From this we may deduce:

**Proposition 37.** For $f \in L^2({\mathbb R}^n,\gamma)$, the function $\mathrm{U}_\rho f$ has Hermite expansion \[ \mathrm{U}_\rho f = \sum_{\alpha \in {\mathbb N}^n} \rho^{|\alpha|} \widehat{f}(\alpha) h_\alpha \] and hence \[ \mathbf{Stab}_\rho[f] = \sum_{\alpha \in {\mathbb N}^n} \rho^{|\alpha|} \widehat{f}(\alpha)^2. \]

*Proof:* Both statements follow from Proposition 36, with the first using \[ \widehat{\mathrm{U}_\rho f}(\alpha) = \langle \mathrm{U}_\rho f, h_\alpha \rangle = \langle \mathop{{\textstyle \sum}}_{\beta} \widehat{f}(\beta)\, \mathrm{U}_\rho h_\beta, h_\alpha \rangle = \mathop{{\textstyle \sum}}_\beta \widehat{f}(\beta) \langle \mathrm{U}_\rho h_\beta, h_\alpha \rangle = \rho^{|\alpha|} \widehat{f}(\alpha); \] we also used Proposition 33 and the fact that $\mathrm{U}_\rho$ is a contraction in $L^2({\mathbb R}^n,\gamma)$. $\Box$
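As a quick check of the noise-stability formula, take $f(z) = z_1 z_2 = h_{(1,1)}$: its single nonzero Hermite coefficient sits at $|\alpha| = 2$, so Proposition 37 predicts $\mathbf{Stab}_\rho[f] = \rho^2$. A Monte Carlo sketch (names are mine):

```python
import math, random

def stab_z1z2(rho, trials=200000, seed=0):
    """Estimate Stab_rho[f] = E[f(z) f(z')] for f(z) = z_1 z_2, with each
    coordinate pair (z_i, z'_i) independently rho-correlated."""
    rng = random.Random(seed)
    c = math.sqrt(1 - rho ** 2)
    total = 0.0
    for _ in range(trials):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1p = rho * z1 + c * rng.gauss(0, 1)
        z2p = rho * z2 + c * rng.gauss(0, 1)
        total += (z1 * z2) * (z1p * z2p)
    return total / trials

est = stab_z1z2(0.6)   # prediction: rho^2 = 0.36
```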

**Remark 38.** When $f : {\mathbb R}^n \to {\mathbb R}$ is a multilinear polynomial, this formula for $\mathrm{U}_\rho f$ agrees with the formula $f(\rho z)$ given in Fact 13.

**Remark 39.** In a sense it’s not very important to know the explicit formulas for the Hermite polynomials, \eqref{eqn:hermite-examples}, \eqref{eqn:prob-herm}; it’s usually enough just to know that the formula for $\mathrm{U}_\rho f$ from Proposition 37 holds.

Finally, by differentiating the formula in Proposition 37 at $\rho = 1$ we deduce the following formula for the Ornstein–Uhlenbeck operator (explaining why it’s sometimes called the number operator):

**Proposition 40.** For $f \in L^2({\mathbb R}^n, \gamma)$ in the domain of $\mathrm{L}$ we have \[ \mathrm{L} f = \sum_{\alpha \in {\mathbb N}^n} |\alpha| \widehat{f}(\alpha) h_\alpha. \]

(Actually, in the exercises you are asked to formally justify this and the fact that $f$ is in the domain of $\mathrm{L}$ if and only if $\sum_{\alpha} |\alpha|^2 \widehat{f}(\alpha)^2 < \infty$.) For additional facts about Hermite polynomials, see the exercises.


**Notation 1.** Throughout this chapter we write $\varphi$ for the pdf of a standard Gaussian random variable, $\varphi(z) = \frac{1}{\sqrt{2\pi}}\exp(-\tfrac{1}{2} z^2)$. We also write $\Phi$ for its cdf, and $\overline{\Phi}$ for the complementary cdf $\overline{\Phi}(t) = 1 - \Phi(t) = \Phi(-t)$. We write $\boldsymbol{z} \sim \mathrm{N}(0,1)^n$ to denote that $\boldsymbol{z} = (\boldsymbol{z}_1, \dots, \boldsymbol{z}_n)$ is a random vector in ${\mathbb R}^n$ whose components $\boldsymbol{z}_i$ are independent standard Gaussians. Perhaps the most important property of this distribution is that it’s rotationally symmetric; this follows because the pdf at $z$ is $\frac{1}{(2\pi)^{n/2}}\exp(-\tfrac{1}{2}(z_1^2 + \cdots + z_n^2))$, which depends only on the length $\|z\|_2$ of $z$.

**Definition 2.** For $n \in {\mathbb N}^+$ and $1 \leq p \leq \infty$ we write $L^p({\mathbb R}^n,\gamma)$ for the space of Borel functions $f : {\mathbb R}^n \to {\mathbb R}$ that have finite $p$th moment $\|f\|_p^p$ under the Gaussian measure (the “$\gamma$” stands for Gaussian). Here for a function $f$ on Gaussian space we use the notation \[ \|f\|_p = \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[|f(\boldsymbol{z})|^p]^{1/p}. \] All functions $f : {\mathbb R}^n \to {\mathbb R}$ and sets $A \subseteq {\mathbb R}^n$ are henceforth assumed to be Borel without further mention.

**Notation 3.** When it’s clear from context that $f$ is a function on Gaussian space we’ll use shorthand notation like $\mathop{\bf E}[f] = \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[f(\boldsymbol{z})]$. If $f = 1_A$ is the $0$-$1$ indicator of a subset $A \subseteq {\mathbb R}^n$ we’ll also write \[ \mathrm{vol}_\gamma(A) = \mathop{\bf E}[1_A] = \mathop{\bf Pr}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[\boldsymbol{z} \in A] \] for the *Gaussian volume* of $A$.

**Notation 4.** For $f, g \in L^2({\mathbb R}^n,\gamma)$ we use the inner product notation $\langle f, g \rangle = \mathop{\bf E}[fg]$, under which $L^2({\mathbb R}^n,\gamma)$ is a separable Hilbert space.

If you’re only interested in Boolean functions $f : \{-1,1\}^n \to \{-1,1\}$ you might wonder why it’s necessary to study Gaussian space. As discussed at the beginning of the chapter, the reason is that functions on Gaussian space are *special cases* of Boolean functions. Conversely, even if you’re only interested in studying functions of Gaussian random variables, sometimes the easiest proof technique involves “simulating” the Gaussians using sums of random bits. Let’s discuss this in a little more detail. Recall that the Central Limit Theorem tells us that for ${\boldsymbol{x}} \sim \{-1,1\}^M$, the distribution of $\frac{1}{\sqrt{M}}({\boldsymbol{x}}_1 + \cdots + {\boldsymbol{x}}_M)$ approaches that of a standard Gaussian as $M \to \infty$. This is the sense in which a standard Gaussian random variable $\boldsymbol{z} \sim \mathrm{N}(0,1)$ can be “simulated” by random bits. If we want $d$ independent Gaussians we can simulate them by summing up $M$ independent $d$-dimensional vectors of random bits.

**Definition 5.** The function $\text{BitsToGaussians}_{M}^{} : \{-1,1\}^{M} \to {\mathbb R}$ is defined by \[ \text{BitsToGaussians}_{M}^{}(x) = \tfrac{1}{\sqrt{M}}(x_1 + \cdots + x_M). \] More generally, the function $\text{BitsToGaussians}_{M}^{d} : \{-1,1\}^{dM} \to {\mathbb R}^d$ is defined on an input $x \in \{-1,1\}^{d \times M}$, thought of as a matrix of column vectors $\vec{x}_1, \dots, \vec{x}_M \in \{-1,1\}^d$, by \[ \text{BitsToGaussians}_{M}^{d}(x) = \tfrac{1}{\sqrt{M}}(\vec{x}_1 + \cdots + \vec{x}_M). \]
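A direct implementation of the $d = 1$ case, together with an empirical check that the output is approximately $\mathrm{N}(0,1)$ (a sketch; the function name is mine):

```python
import math, random

def bits_to_gaussians(x):
    """BitsToGaussians_M: maps x in {-1,1}^M to (x_1 + ... + x_M)/sqrt(M)."""
    return sum(x) / math.sqrt(len(x))

# By the CLT the output is approximately N(0,1); check the mean and
# variance empirically over many uniformly random bit strings.
rng = random.Random(0)
M, n = 64, 20000
ys = [bits_to_gaussians([rng.choice((-1, 1)) for _ in range(M)]) for _ in range(n)]
mean = sum(ys) / n
var = sum(y * y for y in ys) / n - mean ** 2
```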

Although $M$ needs to be large for this simulation to be accurate, many of the results we’ve developed in the analysis of Boolean functions $f : \{-1,1\}^M \to {\mathbb R}$ are independent of $M$. A further key point is that this simulation preserves polynomial degree: if $p(\boldsymbol{z}_1, \dots, \boldsymbol{z}_d)$ is a degree-$k$ polynomial applied to $d$ independent standard Gaussians, the “simulated version” $p \circ\text{BitsToGaussians}_{M}^{d} : \{-1,1\}^{dM} \to {\mathbb R}$ is a degree-$k$ Boolean function. These facts allow us to transfer many results from the analysis of Boolean functions to the analysis of Gaussian functions. On the other hand, it also means that to fully understand Boolean functions, we need to understand the “special case” of functions on Gaussian space: a Boolean function may essentially be a function on Gaussian space “in disguise”. For example, as we saw in Chapter 5.3, there is a sense in which the majority function $\mathrm{Maj}_n$ “converges” as $n \to \infty$; what it’s converging to is the sign function on $1$-dimensional Gaussian space, $\mathrm{sgn} \in L^1({\mathbb R}, \gamma)$.

We’ll begin our study of Gaussian functions by developing the analogue of the most important operator on Boolean functions, namely the noise operator $\mathrm{T}_\rho$. Suppose we take a pair of $\rho$-correlated $M$-bit strings $({\boldsymbol{x}}, {\boldsymbol{x}}')$ and use them to form approximate Gaussians, \[ \boldsymbol{y} = \text{BitsToGaussians}_{M}^{}({\boldsymbol{x}}), \qquad \boldsymbol{y}' = \text{BitsToGaussians}_{M}^{}({\boldsymbol{x}}'). \] For each $M$ it’s easy to compute that $\mathop{\bf E}[\boldsymbol{y}] = \mathop{\bf E}[\boldsymbol{y}'] = 0$, $\mathop{\bf Var}[\boldsymbol{y}] = \mathop{\bf Var}[\boldsymbol{y}'] = 1$, and $\mathop{\bf E}[\boldsymbol{y} \boldsymbol{y}'] = \rho$. As noted in Chapter 5.2, a multidimensional version of the Central Limit Theorem (see, e.g., Exercise 5.34 and another in this chapter) tells us that the joint distribution of $(\boldsymbol{y},\boldsymbol{y}')$ converges to a pair of Gaussian random variables with the same properties. We call these $\rho$-correlated Gaussians.

Definition 6 For $-1 \leq \rho \leq 1$, we say that the random variables $(\boldsymbol{z}, \boldsymbol{z}')$ are *$\rho$-correlated (standard) Gaussians* if they are jointly Gaussian and satisfy $\mathop{\bf E}[\boldsymbol{z}] = \mathop{\bf E}[\boldsymbol{z}'] = 0$, $\mathop{\bf Var}[\boldsymbol{z}] = \mathop{\bf Var}[\boldsymbol{z}'] = 1$, and $\mathop{\bf E}[\boldsymbol{z} \boldsymbol{z}'] = \rho$. In other words, if \[ (\boldsymbol{z}, \boldsymbol{z}') \sim \mathrm{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right). \] Note that the definition is symmetric in $\boldsymbol{z}$, $\boldsymbol{z}'$ and that each is individually distributed as $\mathrm{N}(0,1)$.

Fact 7 An equivalent definition is to say that $\boldsymbol{z} = \langle \vec{u}, \vec{\boldsymbol{g}}\rangle$ and $\boldsymbol{z}' = \langle \vec{v}, \vec{\boldsymbol{g}} \rangle$, where $\vec{\boldsymbol{g}} \sim \mathrm{N}(0,1)^d$ and $\vec{u}, \vec{v} \in {\mathbb R}^d$ are any two unit vectors satisfying $\langle \vec{u}, \vec{v} \rangle = \rho$. In particular we may choose $d = 2$, $\vec{u} = (1,0)$, and $\vec{v} = (\rho, \sqrt{1-\rho^2})$, thereby defining $\boldsymbol{z} = \boldsymbol{g}_1$ and $\boldsymbol{z}' = \rho \boldsymbol{g}_1 + \sqrt{1-\rho^2} \boldsymbol{g}_2$.

Remark 8 In Fact 7 it’s often convenient to write $\rho = \cos \theta$ for some $\theta \in {\mathbb R}$, in which case we may define the $\rho$-correlated Gaussians as $\boldsymbol{z} = \langle \vec{u}, \vec{\boldsymbol{g}} \rangle$ and $\boldsymbol{z}' = \langle \vec{v}, \vec{\boldsymbol{g}}\rangle$ for any unit vectors $\vec{u}, \vec{v}$ making an angle of $\theta$; e.g., $\vec{u} = (1,0)$, $\vec{v} = (\cos\theta, \sin \theta)$.
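The construction in Fact 7 and Remark 8 is easy to check numerically. The following sketch (with the arbitrary choice $\theta = \pi/3$, so $\rho = 1/2$) projects a single two-dimensional standard Gaussian onto two unit vectors at angle $\theta$ and estimates the resulting moments:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.pi / 3                      # so rho = cos(theta) = 1/2
N = 200_000

# project one 2-d standard Gaussian onto unit vectors at angle theta
g = rng.standard_normal((N, 2))
u = np.array([1.0, 0.0])
v = np.array([np.cos(theta), np.sin(theta)])
z, zp = g @ u, g @ v

corr = np.mean(z * zp)
print(z.var(), zp.var(), corr)  # approx. 1, 1, 0.5
```

With this choice of $\vec{u}$ the pair is exactly $\boldsymbol{z} = \boldsymbol{g}_1$, $\boldsymbol{z}' = \rho \boldsymbol{g}_1 + \sqrt{1-\rho^2}\,\boldsymbol{g}_2$, matching the construction in Fact 7 and Definition 9.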

Definition 9 For a fixed $z \in {\mathbb R}$ we say the random variable $\boldsymbol{z}'$ is a *Gaussian $\rho$-correlated to $z$*, written $\boldsymbol{z}' \sim N_\rho(z)$, if $\boldsymbol{z}'$ is distributed as $\rho z + \sqrt{1-\rho^2} \boldsymbol{g}$ where $\boldsymbol{g} \sim \mathrm{N}(0,1)$. By Fact 7, if we draw $\boldsymbol{z} \sim \mathrm{N}(0,1)$ and then form $\boldsymbol{z}' \sim N_\rho(\boldsymbol{z})$, we obtain a $\rho$-correlated pair of Gaussians $(\boldsymbol{z}, \boldsymbol{z}')$.

Definition 10 For $-1 \leq \rho \leq 1$ and $n \in {\mathbb N}^+$ we say that the ${\mathbb R}^n$-valued random variables $(\boldsymbol{z}, \boldsymbol{z}')$ are *$\rho$-correlated $n$-dimensional Gaussian random vectors* if each component pair $({\boldsymbol{z}}_1, {\boldsymbol{z}}'_1)$, \dots, $({\boldsymbol{z}}_n, {\boldsymbol{z}}'_n)$ is a $\rho$-correlated pair of Gaussians, and the $n$ pairs are mutually independent. We also naturally extend the definition of $\boldsymbol{z}' \sim N_\rho(z)$ to the case of $z \in {\mathbb R}^n$; this means $\boldsymbol{z}' = \rho z + \sqrt{1-\rho^2} \boldsymbol{g}$ for $\boldsymbol{g} \sim \mathrm{N}(0,1)^n$.

Remark 11 Thus, if $\boldsymbol{z} \sim \mathrm{N}(0,1)^n$ and then $\boldsymbol{z}' \sim N_\rho(\boldsymbol{z})$, we obtain a $\rho$-correlated $n$-dimensional pair $(\boldsymbol{z}, \boldsymbol{z}')$. It follows from this that the joint distribution of such a pair is rotationally symmetric (since the distribution of a single $n$-dimensional Gaussian is).

Now we can introduce the Gaussian analogue of the noise operator.

Definition 12 For $\rho \in [-1,1]$, the *Gaussian noise operator* $\mathrm{U}_\rho$ is the linear operator defined on the space of functions $f \in L^1({\mathbb R}^n,\gamma)$ by \[ \mathrm{U}_\rho f(z) = \mathop{\bf E}_{\boldsymbol{z}' \sim N_\rho(z)}[f(\boldsymbol{z}')] = \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[f(\rho z + \sqrt{1-\rho^2} \boldsymbol{g})]. \]

Fact 13 (Exercise.) If $f \in L^1({\mathbb R}^n,\gamma)$ is an $n$-variate multilinear polynomial, then $\mathrm{U}_\rho f(z) = f(\rho z)$.
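Fact 13 can be verified numerically for a particular multilinear polynomial (the polynomial, $\rho$, and evaluation point below are arbitrary choices, not from the text). The sketch computes $\mathrm{U}_\rho f$ exactly via probabilists' Gauss–Hermite quadrature and compares it with $f(\rho z)$:

```python
import numpy as np

# an arbitrary multilinear polynomial in two variables
f = lambda z1, z2: 3.0 * z1 * z2 - z1 + 2.0
rho = 0.7
s = np.sqrt(1 - rho**2)

# probabilists' Gauss-Hermite rule: sum(w * p(x)) integrates p against e^{-x^2/2};
# renormalize so that sum(w * p(x)) = E[p(g)] for g ~ N(0,1)
x, w = np.polynomial.hermite_e.hermegauss(8)
w = w / np.sqrt(2 * np.pi)

def U_rho(z1, z2):
    """U_rho f(z) = E_{g ~ N(0,1)^2}[f(rho*z + s*g)]; the quadrature is
    exact since f is a polynomial of low degree."""
    G1, G2 = np.meshgrid(x, x)
    W = np.outer(w, w)
    return float(np.sum(W * f(rho * z1 + s * G1, rho * z2 + s * G2)))

z1, z2 = 0.4, -1.3
lhs, rhs = U_rho(z1, z2), f(rho * z1, rho * z2)
print(lhs, rhs)  # equal, as Fact 13 predicts
```

The cross terms in the expectation vanish because the added Gaussian noise has mean zero and independent coordinates, which is exactly why multilinearity matters here.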

Remark 14 Our terminology is nonstandard. The Gaussian noise operators are usually collectively referred to as the *Ornstein–Uhlenbeck semigroup* (or sometimes as the *Mehler transforms*). They are typically defined for $\rho = e^{-t} \in [0,1]$ (i.e., for $t \in [0,\infty]$) by \[ \mathrm{P}_t f(z) = \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[f(e^{-t} z + \sqrt{1-e^{-2t}} \boldsymbol{g})] = \mathrm{U}_{e^{-t}} f(z). \] The term “semigroup” refers to the fact that the operators satisfy $\mathrm{P}_{t_1} \mathrm{P}_{t_2} = \mathrm{P}_{t_1+t_2}$, i.e., $\mathrm{U}_{\rho_1} \mathrm{U}_{\rho_2} = \mathrm{U}_{\rho_1 \rho_2}$ (which holds for all $\rho_1, \rho_2 \in [-1,1]$; see the exercises).

Before going further let’s check that $\mathrm{U}_\rho$ is a bounded operator on all of $L^p({\mathbb R}^n,\gamma)$ for $p \geq 1$; in fact, it’s a contraction:

Proposition 15 For each $\rho \in [-1,1]$ and $1 \leq p \leq \infty$ the operator $\mathrm{U}_\rho$ is a contraction on $L^p({\mathbb R}^n,\gamma)$; i.e., $\|\mathrm{U}_\rho f\|_p \leq \|f\|_p$.

*Proof:* The proof for $p = \infty$ is easy; otherwise, the result follows from Jensen’s inequality, using that $t \mapsto |t|^p$ is convex: \begin{align*} \|\mathrm{U}_\rho f\|_p^p = \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[|\mathrm{U}_\rho f(\boldsymbol{z})|^p] &= \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}\left[\left|\mathop{\bf E}_{\boldsymbol{z}' \sim N_\rho(\boldsymbol{z})}[f(\boldsymbol{z}')]\right|^p\right] \\ &\leq \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}\left[\mathop{\bf E}_{\boldsymbol{z}' \sim N_\rho(\boldsymbol{z})}[|f(\boldsymbol{z}')|^p]\right] = \|f\|_p^p. \quad \Box \end{align*}

As in the Boolean case, you should think of the Gaussian noise operator as having a “smoothing” effect on functions. As $\rho$ goes from $1$ down to $0$, $\mathrm{U}_\rho f$ involves averaging $f$’s values over larger and larger neighborhoods. In particular $\mathrm{U}_1$ is the identity operator, $\mathrm{U}_1f = f$, and $\mathrm{U}_0 f = \mathop{\bf E}[f]$, the constant function. In the exercises you are asked to verify the following facts, which say that for any $f$, as $\rho \to 1^-$ we get a sequence of smooth (i.e., $\mathcal{C}^\infty$) functions $\mathrm{U}_\rho f$ that tend to $f$.

Proposition 16 Let $f \in L^1({\mathbb R}^n,\gamma)$ and let $-1 < \rho < 1$. Then $\mathrm{U}_\rho f$ is a smooth function.

Proposition 17 Let $f \in L^1({\mathbb R}^n,\gamma)$. As $\rho \to 1^-$ we have $\|\mathrm{U}_\rho f - f\|_1 \to 0$.

Having defined the Gaussian noise operator, we can also make the natural definition of Gaussian noise stability (for which we’ll use the same notation as in the Boolean case):

Definition 18 For $f \in L^2({\mathbb R}^n,\gamma)$ and $\rho \in [-1,1]$, the *Gaussian noise stability of $f$ at $\rho$* is defined to be \[ \mathbf{Stab}_\rho[f] = \mathop{\bf E}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \text{ $n$-dimensional} \\ \text{$\rho$-correlated Gaussians}}}[f(\boldsymbol{z}) f(\boldsymbol{z}')] = \langle f, \mathrm{U}_\rho f \rangle = \langle \mathrm{U}_\rho f, f \rangle. \] (Here we used that $(\boldsymbol{z}', \boldsymbol{z})$ has the same distribution as $(\boldsymbol{z}, \boldsymbol{z}')$ and hence $\mathrm{U}_\rho$ is self-adjoint.)

Example 19 Let $f : {\mathbb R} \to \{0,1\}$ be the $0$-$1$ indicator of the nonpositive halfline: $f = 1_{(-\infty, 0]}$. Then \begin{equation} \label{eqn:01-sheppard} \mathbf{Stab}_\rho[f] = \mathop{\bf E}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \text{ $\rho$-correlated} \\ \text{standard Gaussians}}}[f(\boldsymbol{z}) f(\boldsymbol{z}')] = \mathop{\bf Pr}[\boldsymbol{z} \leq 0, \boldsymbol{z}' \leq 0] = \frac12 - \frac{1}{2}\frac{\arccos \rho}{\pi}, \end{equation} with the last equality being *Sheppard’s Formula*, which we stated in Section 5.2 and now prove.

*Proof:* Since $(-\boldsymbol{z}, -\boldsymbol{z}')$ has the same distribution as $(\boldsymbol{z}, \boldsymbol{z}')$, proving \eqref{eqn:01-sheppard} is equivalent to proving \[ \mathop{\bf Pr}[\boldsymbol{z} \leq 0, \boldsymbol{z}' \leq 0 \text{ or } \boldsymbol{z} > 0, \boldsymbol{z}' > 0] = 1 - \frac{\arccos \rho}{\pi}. \] The complement of the above event is the event that $f(\boldsymbol{z}) \neq f(\boldsymbol{z}')$ (up to measure $0$); thus it’s further equivalent to prove \begin{equation} \label{eqn:elegant-sheppard} \mathop{\bf Pr}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \\ \text{$\cos \theta$-correlated}}}[f(\boldsymbol{z}) \neq f(\boldsymbol{z}')] = \tfrac{\theta}{\pi} \end{equation} for all $\theta \in [0,\pi]$. As in Remark 8, this suggests defining $\boldsymbol{z} = \langle \vec{u}, \vec{\boldsymbol{g}} \rangle$, $\boldsymbol{z}' = \langle \vec{v}, \vec{\boldsymbol{g}} \rangle$, where $\vec{u}, \vec{v} \in {\mathbb R}^2$ are a fixed pair of unit vectors making an angle of $\theta$, and $\vec{\boldsymbol{g}} \sim \mathrm{N}(0,1)^2$. Thus we want to show \[ \mathop{\bf Pr}_{\vec{\boldsymbol{g}} \sim \mathrm{N}(0,1)^2}[\langle \vec{u}, \vec{\boldsymbol{g}} \rangle \leq 0 \text{ and } \langle \vec{v}, \vec{\boldsymbol{g}} \rangle > 0, \text{ or vice versa}] = \tfrac{\theta}{\pi}. \] But this last identity is easy: If we look at the diameter of the unit circle that is perpendicular to $\vec{\boldsymbol{g}}$, then the event above is equivalent (up to measure $0$) to the event that this diameter “splits” $\vec{u}$ and $\vec{v}$. By the rotational symmetry of $\vec{\boldsymbol{g}}$, the probability is evidently $\theta$ (the angle between $\vec{u}, \vec{v}$) divided by $\pi$ (the range of angles for the diameter). $\Box$
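Sheppard’s Formula is also easy to test by Monte Carlo simulation (a sketch, not from the text; sample size and the $\rho$ values are arbitrary). The following compares empirical orthant probabilities with $\frac12 - \frac{\arccos \rho}{2\pi}$:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500_000
errs = []
for rho in (-0.5, 0.0, 0.5, 0.9):
    # rho-correlated standard Gaussians via the Fact 7 construction
    g1, g2 = rng.standard_normal((2, N))
    z, zp = g1, rho * g1 + np.sqrt(1 - rho**2) * g2
    empirical = np.mean((z <= 0) & (zp <= 0))
    sheppard = 0.5 - np.arccos(rho) / (2 * np.pi)
    errs.append(abs(empirical - sheppard))
    print(rho, empirical, sheppard)
```

At $\rho = 0$ the formula gives $1/4$ (independence), and as $\rho \to 1^-$ it tends to $1/2$, as it must.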

Corollary 20 Let $H \subset {\mathbb R}^n$ be any halfspace (open or closed) with boundary hyperplane containing the origin. Let $h = \pm 1_{H}$. Then $\mathbf{Stab}_\rho[h] = 1 - \tfrac{2}{\pi} \arccos \rho$.

*Proof:* We may assume $H$ is open (since its boundary has measure $0$). By the rotational symmetry of correlated Gaussians (Remark 11), we may rotate $H$ to the form $H = \{z \in {\mathbb R}^n : z_1 > 0\}$. Then it’s clear that the noise stability of $h = \pm 1_H$ doesn’t depend on $n$, i.e., we may assume $n = 1$. Thus $h = \mathrm{sgn} = 1-2f$, where $f = 1_{(-\infty, 0]}$ as in Example 19. Now if $(\boldsymbol{z}, \boldsymbol{z}')$ denote $\rho$-correlated standard Gaussians, it follows from \eqref{eqn:01-sheppard} that \begin{align*} \mathbf{Stab}_\rho[h] = \mathop{\bf E}[h(\boldsymbol{z}) h(\boldsymbol{z}')] &= \mathop{\bf E}[(1-2f(\boldsymbol{z}))(1-2f(\boldsymbol{z}'))] \\ &= 1 - 4\mathop{\bf E}[f] + 4\mathbf{Stab}_\rho[f] = 1 - \tfrac{2}{\pi} \arccos \rho. \quad \Box \end{align*}

Remark 21 The quantity $\mathbf{Stab}_\rho[\mathrm{sgn}] = 1 - \tfrac{2}{\pi} \arccos \rho$ is also precisely the limiting noise stability of $\mathrm{Maj}_n$, as stated in Theorem 2.44 and justified in Chapter 5.2.

We’ve defined the key Gaussian noise operator $\mathrm{U}_\rho$ and seen (Proposition 15) that it’s a contraction on all $L^p({\mathbb R}^n,\gamma)$. Is it also hypercontractive? In fact, we’ll now show that the Hypercontractivity Theorem for uniform $\pm 1$ bits holds identically in the Gaussian setting. The proof is simply a reduction to the Boolean case, and it will use the following standard fact (see **[Jan97]** or **[Teu12]** for the proof in case of $L^2$; to extend to other $L^p$, you can use Exercise 1 of this chapter):

Theorem 22 For each $n \in {\mathbb N}^+$, the set of multivariate polynomials is dense in $L^p({\mathbb R}^n,\gamma)$ for all $1 \leq p < \infty$.

Gaussian Hypercontractivity Theorem Let $f, g \in L^1({\mathbb R}^n,\gamma)$, let $r, s \geq 0$, and assume $0 \leq \rho \leq \sqrt{rs} \leq 1$. Then \[ \langle f, \mathrm{U}_\rho g \rangle = \langle \mathrm{U}_\rho f, g \rangle = \mathop{\bf E}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \text{ $\rho$-correlated}\\ \text{$n$-dimensional Gaussians}}}[f(\boldsymbol{z})g(\boldsymbol{z}')] \leq \|f\|_{1+r}\|g\|_{1+s}. \]

*Proof:* (We give a sketch; you are asked to fill in the details in the exercises.) We may assume that $f \in L^{1+r}({\mathbb R}^n,\gamma)$ and $g \in L^{1+s}({\mathbb R}^n,\gamma)$. We may also assume $f, g \in L^2({\mathbb R}^n,\gamma)$ by a truncation and monotone convergence argument; thus the left-hand side is finite by Cauchy–Schwarz. Finally, we may assume that $f$ and $g$ are multivariate polynomials, using Theorem 22. For fixed $M \in {\mathbb N}^+$ we consider “simulating” $(\boldsymbol{z}, \boldsymbol{z}')$ using bits. More specifically, let $({\boldsymbol{x}}, {\boldsymbol{x}}') \in \{-1,1\}^{nM} \times \{-1,1\}^{nM}$ be a pair of $\rho$-correlated random strings and define the joint ${\mathbb R}^n$-valued random variables $\boldsymbol{y}, \boldsymbol{y}'$ by \[ \boldsymbol{y} = \text{BitsToGaussians}_{M}^{n}({\boldsymbol{x}}), \qquad \boldsymbol{y}' = \text{BitsToGaussians}_{M}^{n}({\boldsymbol{x}}'). \] By a multidimensional Central Limit Theorem we have that \[ \mathop{\bf E}[f(\boldsymbol{y})g(\boldsymbol{y}')] \xrightarrow{M \to \infty} \mathop{\bf E}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \\ \text{$\rho$-correlated}}}[f(\boldsymbol{z})g(\boldsymbol{z}')]. \] (Since $f$ and $g$ are polynomials, we can even reduce to a Central Limit Theorem for bivariate monomials.) We further have \[ \mathop{\bf E}[|f(\boldsymbol{y})|^{1+r}]^{1/(1+r)} \xrightarrow{M \to \infty} \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[|f(\boldsymbol{z})|^{1+r}]^{1/(1+r)} \] and similarly for $g$. (This can also be proven by the multidimensional Central Limit Theorem, or by the one-dimensional Central Limit Theorem together with some tricks.) Thus it suffices to show \[ \mathop{\bf E}[f(\boldsymbol{y})g(\boldsymbol{y}')] \leq \mathop{\bf E}[|f(\boldsymbol{y})|^{1+r}]^{1/(1+r)} \mathop{\bf E}[|g(\boldsymbol{y}')|^{1+s}]^{1/(1+s)} \] for any fixed $M$.
But we can express $f(\boldsymbol{y}) = F({\boldsymbol{x}})$ and $g(\boldsymbol{y}') = G({\boldsymbol{x}}')$ for some $F, G : \{-1,1\}^{nM} \to {\mathbb R}$ and so the above inequality holds by the Two-Function Hypercontractivity Theorem (for $\pm 1$ bits). $\Box$

An immediate corollary, using the proof of Proposition 10.4, is the standard one-function form of hypercontractivity:

Theorem 23 Let $1 \leq p \leq q \leq \infty$ and let $f \in L^p({\mathbb R}^n,\gamma)$. Then $\|\mathrm{U}_\rho f\|_q \leq \|f\|_p$ for $0 \leq \rho \leq \sqrt{\tfrac{p-1}{q-1}}$.
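As an illustration (not from the text), this one-function form can be checked numerically at the critical value $\rho = \sqrt{(p-1)/(q-1)}$ for a particular polynomial $f$, computing both norms exactly by Gauss–Hermite quadrature; the choices $p = 2$, $q = 4$, and $f$ below are arbitrary:

```python
import numpy as np

p, q = 2.0, 4.0
rho = np.sqrt((p - 1) / (q - 1))       # critical rho = 1/sqrt(3)
s = np.sqrt(1 - rho**2)
f = lambda x: 1 + x + x**2             # an arbitrary test polynomial

# probabilists' Gauss-Hermite rule, renormalized to give Gaussian expectations
xs, ws = np.polynomial.hermite_e.hermegauss(20)
ws = ws / np.sqrt(2 * np.pi)

# U_rho f at each quadrature node (inner expectation also by quadrature;
# exact, since f is a polynomial)
Uf_vals = np.array([np.sum(ws * f(rho * xi + s * xs)) for xi in xs])

lhs = float(np.sum(ws * np.abs(Uf_vals) ** q)) ** (1 / q)   # ||U_rho f||_q
rhs = float(np.sum(ws * np.abs(f(xs)) ** p)) ** (1 / p)     # ||f||_p
print(lhs, rhs)  # lhs <= rhs
```

Here $\|f\|_2 = \sqrt{7}$ exactly, and the $4$-norm of $\mathrm{U}_\rho f$ comes out strictly smaller; the inequality is tight only for exponential functions, not polynomials.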

We conclude this section by discussing the Gaussian space analogue of the discrete Laplacian operator. Taking our cue from Exercise 2.14½ we make the following definition:

Definition 24 The *Ornstein–Uhlenbeck operator* $\mathrm{L}$ (also called the *infinitesimal generator* of the Ornstein–Uhlenbeck semigroup, or the *number operator*) is the linear operator acting on functions $f \in L^2({\mathbb R}^n,\gamma)$ by \[ \mathrm{L} f = \frac{d}{d\rho} \mathrm{U}_{\rho} f \Bigr|_{\rho = 1} = -\frac{d}{dt} \mathrm{U}_{e^{-t}} f \Bigr|_{t = 0} \] (provided $\mathrm{L} f$ exists in $L^2({\mathbb R}^n, \gamma)$). Notational warning: It is common to see this as the definition of $-\mathrm{L}$.

Remark 25 We will not be completely careful about the domain of the operator $\mathrm{L}$ in this section; for precise details, see the exercises.

Proposition 26 Let $f \in L^2({\mathbb R}^n,\gamma)$ be in the domain of $\mathrm{L}$, and further assume for simplicity that $f$ is $\mathcal{C}^3$. Then we have the formula \[ \mathrm{L} f(x) = x \cdot \nabla f(x) -\Delta f(x), \] where $\Delta$ denotes the usual Laplacian differential operator, $\cdot$ denotes the dot product, and $\nabla$ denotes the gradient.

*Proof:* We give the proof in the case $n = 1$, leaving the general case to the exercises. We have \begin{equation} \label{eqn:lap-formula} \mathrm{L} f(x) = -\lim_{t \to 0^+} \frac{\mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)}[f(e^{-t} x + \sqrt{1-e^{-2t}} \boldsymbol{z})] - f(x)}{t}. \end{equation} Applying Taylor’s theorem to $f$ we have \[ f(e^{-t} x + \sqrt{1-e^{-2t}} \boldsymbol{z}) \approx f(e^{-t} x) + f'(e^{-t} x) \sqrt{1-e^{-2t}} \boldsymbol{z} + \tfrac12 f''(e^{-t} x) (1-e^{-2t})\boldsymbol{z}^2, \] where the $\approx$ denotes that the two quantities differ by at most $C (1-e^{-2t})^{3/2}|\boldsymbol{z}|^3$ in absolute value, for some constant $C$ depending on $f$ and $x$. Substituting this into \eqref{eqn:lap-formula} and using $\mathop{\bf E}[\boldsymbol{z}] = 0$, $\mathop{\bf E}[\boldsymbol{z}^2] = 1$, and that $\mathop{\bf E}[|\boldsymbol{z}|^3]$ is an absolute constant, we get \[ \mathrm{L} f(x) = -\lim_{t \to 0^+} \left(\frac{f(e^{-t} x) - f(x)}{t} + \frac{\tfrac12 f''(e^{-t} x) (1-e^{-2t})}{t}\right), \] using the fact that $\frac{(1-e^{-2t})^{3/2}}{t} \to 0$. Since $\frac{f(e^{-t}x) - f(x)}{t} \to -x f'(x)$ and $\frac{1-e^{-2t}}{t} \to 2$, this limit equals $x f'(x) - f''(x)$, as claimed. $\Box$
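The formula $\mathrm{L} f(x) = x f'(x) - f''(x)$ just derived can be checked numerically for a concrete polynomial (a sketch, not from the text; the test function, evaluation point, and step size are arbitrary): approximate $-\frac{d}{dt} \mathrm{U}_{e^{-t}} f \big|_{t=0}$ by a finite difference, computing $\mathrm{U}_\rho f$ by quadrature.

```python
import numpy as np

f  = lambda x: x**3
Lf = lambda x: x * (3 * x**2) - 6 * x    # x f'(x) - f''(x) = 3x^3 - 6x

# Gaussian expectations via probabilists' Gauss-Hermite quadrature
xs, ws = np.polynomial.hermite_e.hermegauss(10)
ws = ws / np.sqrt(2 * np.pi)

def U(rho, x):
    """U_rho f(x), exact for polynomial f."""
    return float(np.sum(ws * f(rho * x + np.sqrt(1 - rho**2) * xs)))

t, x0 = 1e-6, 0.8
approx = -(U(np.exp(-t), x0) - f(x0)) / t    # -d/dt U_{e^{-t}} f |_{t=0}
print(approx, Lf(x0))
```

For this $f$ one can also compute $\mathrm{U}_\rho f(x) = \rho^3 x^3 + 3\rho(1-\rho^2)x$ in closed form and differentiate at $\rho = 1$ to get the same answer.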

An easy consequence of the semigroup property is the following:

Proposition 27 The following equivalent identities hold: \begin{align*} \frac{d}{d\rho} \mathrm{U}_\rho f = \rho^{-1} \mathrm{L} \mathrm{U}_{\rho}& f = \rho^{-1} \mathrm{U}_{\rho} \mathrm{L} f, \\ \frac{d}{dt} \mathrm{U}_{e^{-t}} f = -\mathrm{L} \mathrm{U}_{e^{-t}}& f = -\mathrm{U}_{e^{-t}} \mathrm{L} f. \end{align*}

*Proof:* This follows from \begin{align*} \frac{d}{dt} \mathrm{U}_{e^{-t}} f(x) &= \lim_{\delta \to 0} \frac{\mathrm{U}_{e^{-t-\delta}} f(x) - \mathrm{U}_{e^{-t}} f(x) }{\delta} \\ &= \lim_{\delta \to 0} \frac{\mathrm{U}_{e^{-\delta}} \mathrm{U}_{e^{-t}} f(x) - \mathrm{U}_{e^{-t}} f(x) }{\delta} = \lim_{\delta \to 0} \frac{\mathrm{U}_{e^{-t}} \mathrm{U}_{e^{-\delta}} f(x) - \mathrm{U}_{e^{-t}} f(x) }{\delta}, \end{align*} where the second and third expressions use the semigroup property; by the definition of $\mathrm{L}$, these two limits are $-\mathrm{L} \mathrm{U}_{e^{-t}} f(x)$ and $-\mathrm{U}_{e^{-t}} \mathrm{L} f(x)$, respectively. $\Box$

We also have the following formula:

Proposition 28 Let $f, g \in L^2({\mathbb R}^n, \gamma)$ be in the domain of $\mathrm{L}$, and further assume for simplicity that they are $\mathcal{C}^3$. Then \begin{equation} \label{eqn:dirichlet2} \langle f, \mathrm{L} g \rangle = \langle \mathrm{L} f, g \rangle = \langle \nabla f, \nabla g \rangle. \end{equation}

*Proof:* It suffices to prove the equality on the right of \eqref{eqn:dirichlet2}. We again treat only the case of $n = 1$, leaving the general case to the exercises. Using Proposition 26, \begin{align*} \langle \mathrm{L} f, g \rangle &= \int_{\mathbb R} (x f'(x) - f''(x))g(x) \varphi(x)\,dx \\ &= \int_{\mathbb R} x f'(x) g(x)\varphi(x) \,dx + \int_{{\mathbb R}} f'(x) (g \varphi)'(x)\,dx \tag{integration by parts} \\ &= \int_{\mathbb R} x f'(x) g(x)\varphi(x) \,dx + \int_{{\mathbb R}} f'(x) (g'(x) \varphi(x) + g(x) \varphi'(x))\,dx \\ &= \int_{{\mathbb R}} f'(x) g'(x) \varphi(x)\,dx, \end{align*} using the fact that $\varphi'(x) = -x \varphi(x)$. $\Box$
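The identity \eqref{eqn:dirichlet2} can also be confirmed on concrete polynomials by quadrature (an illustration, not from the text; the choices $f(x) = x^3$ and $g(x) = x^2 + x$ are arbitrary):

```python
import numpy as np

# f(x) = x^3, g(x) = x^2 + x; then L f(x) = x f'(x) - f''(x) = 3x^3 - 6x
f,  g  = lambda x: x**3, lambda x: x**2 + x
Lf     = lambda x: 3 * x**3 - 6 * x
fp, gp = lambda x: 3 * x**2, lambda x: 2 * x + 1

# Gaussian expectations via probabilists' Gauss-Hermite quadrature,
# exact for polynomials of this degree
xs, ws = np.polynomial.hermite_e.hermegauss(12)
ws = ws / np.sqrt(2 * np.pi)
E = lambda h: float(np.sum(ws * h(xs)))

lhs = E(lambda x: Lf(x) * g(x))   # <Lf, g>
rhs = E(lambda x: fp(x) * gp(x))  # <f', g'>
print(lhs, rhs)  # both equal 3
```

Here both sides reduce to $3\mathop{\bf E}[\boldsymbol{z}^4] - 6\mathop{\bf E}[\boldsymbol{z}^2] = 9 - 6 = 3$, matching the Gaussian moments $\mathop{\bf E}[\boldsymbol{z}^2] = 1$, $\mathop{\bf E}[\boldsymbol{z}^4] = 3$.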

Finally, by differentiating the Gaussian Hypercontractivity Inequality we obtain the Gaussian Log-Sobolev Inequality (see Exercise 10.23; the proof is the same as in the Boolean case):

Gaussian Log-Sobolev Inequality Let $f \in L^2({\mathbb R}^n, \gamma)$ be in the domain of $\mathrm{L}$. Then \[ \tfrac{1}{2} \mathbf{Ent}[f^2] \leq \mathop{\bf E}[\|\nabla f\|^2]. \]

It’s tempting to use the notation $\mathbf{I}[f]$ for $\mathop{\bf E}[\|\nabla f\|^2]$; however, you have to be careful because this quantity is not equal to $\sum_{i=1}^n \mathop{\bf E}[\mathop{\bf Var}_{\boldsymbol{z}_i}[f]]$ unless $f$ is a multilinear polynomial. See the exercises.
