§11.1: Gaussian space and the Gaussian noise operator

We begin with a few definitions concerning Gaussian space.

Notation 1 Throughout this chapter we write $\varphi$ for the pdf of a standard Gaussian random variable, $\varphi(z) = \frac{1}{\sqrt{2\pi}}\exp(-\tfrac{1}{2} z^2)$. We also write $\Phi$ for its cdf, and $\overline{\Phi}$ for the complementary cdf $\overline{\Phi}(t) = 1 - \Phi(t) = \Phi(-t)$. We write $\boldsymbol{z} \sim \mathrm{N}(0,1)^n$ to denote that $\boldsymbol{z} = (\boldsymbol{z}_1, \dots, \boldsymbol{z}_n)$ is a random vector in ${\mathbb R}^n$ whose components $\boldsymbol{z}_i$ are independent standard Gaussians. Perhaps the most important property of this distribution is that it’s rotationally symmetric; this follows because the pdf at $z$ is $\frac{1}{(2\pi)^{n/2}}\exp(-\tfrac{1}{2}(z_1^2 + \cdots + z_n^2))$, which depends only on the squared length $\|z\|_2^2$ of $z$.

Definition 2 For $n \in {\mathbb N}^+$ and $1 \leq p \leq \infty$ we write $L^p({\mathbb R}^n,\gamma)$ for the space of Borel functions $f : {\mathbb R}^n \to {\mathbb R}$ that have finite $p$th moment $\|f\|_p^p$ under the Gaussian measure (the “$\gamma$” stands for Gaussian). Here for a function $f$ on Gaussian space we use the notation $\|f\|_p = \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[|f(\boldsymbol{z})|^p]^{1/p}.$ All functions $f : {\mathbb R}^n \to {\mathbb R}$ and sets $A \subseteq {\mathbb R}^n$ are henceforth assumed to be Borel without further mention.

Notation 3 When it’s clear from context that $f$ is a function on Gaussian space we’ll use shorthand notation like $\mathop{\bf E}[f] = \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[f(\boldsymbol{z})]$. If $f = 1_A$ is the $0$-$1$ indicator of a subset $A \subseteq {\mathbb R}^n$ we’ll also write $\mathrm{vol}_\gamma(A) = \mathop{\bf E}[1_A] = \mathop{\bf Pr}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[\boldsymbol{z} \in A]$ for the Gaussian volume of $A$.

Notation 4 For $f, g \in L^2({\mathbb R}^n,\gamma)$ we use the inner product notation $\langle f, g \rangle = \mathop{\bf E}[fg]$, under which $L^2({\mathbb R}^n,\gamma)$ is a separable Hilbert space.

If you’re only interested in Boolean functions $f : \{-1,1\}^n \to \{-1,1\}$ you might wonder why it’s necessary to study Gaussian space. As discussed at the beginning of the chapter, the reason is that functions on Gaussian space are special cases of Boolean functions. Conversely, even if you’re only interested in studying functions of Gaussian random variables, sometimes the easiest proof technique involves “simulating” the Gaussians using sums of random bits. Let’s discuss this in a little more detail. Recall that the Central Limit Theorem tells us that for ${\boldsymbol{x}} \sim \{-1,1\}^M$, the distribution of $\frac{1}{\sqrt{M}}({\boldsymbol{x}}_1 + \cdots + {\boldsymbol{x}}_M)$ approaches that of a standard Gaussian as $M \to \infty$. This is the sense in which a standard Gaussian random variable $\boldsymbol{z} \sim \mathrm{N}(0,1)$ can be “simulated” by random bits. If we want $d$ independent Gaussians we can simulate them by summing up $M$ independent $d$-dimensional vectors of random bits.

Definition 5 The function $\text{BitsToGaussians}_{M}^{} : \{-1,1\}^{M} \to {\mathbb R}$ is defined by $\text{BitsToGaussians}_{M}^{}(x) = \tfrac{1}{\sqrt{M}}(x_1 + \cdots + x_M).$ More generally, the function $\text{BitsToGaussians}_{M}^{d} : \{-1,1\}^{dM} \to {\mathbb R}^d$ is defined on an input $x \in \{-1,1\}^{d \times M}$, thought of as a matrix of column vectors $\vec{x}_1, \dots, \vec{x}_M \in \{-1,1\}^d$, by $\text{BitsToGaussians}_{M}^{d}(x) = \tfrac{1}{\sqrt{M}}(\vec{x}_1 + \cdots + \vec{x}_M).$
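To get a feel for Definition 5, here is a minimal numerical sketch in Python with NumPy. The function name `bits_to_gaussians`, the seed, and the values of `d`, `M`, and `trials` are our own illustrative choices, not part of the text; the code simply averages columns of random $\pm 1$ bits and checks that each output coordinate has mean near $0$ and variance near $1$.

```python
import numpy as np

def bits_to_gaussians(x):
    """Sum the +/-1 entries along the last axis (length M) and divide by sqrt(M).

    An input of shape (d, M) yields a length-d vector; a leading batch axis is allowed.
    """
    x = np.asarray(x, dtype=float)
    M = x.shape[-1]
    return x.sum(axis=-1) / np.sqrt(M)

rng = np.random.default_rng(0)        # arbitrary seed, for reproducibility only
d, M, trials = 3, 200, 5000           # illustrative sizes (assumptions)

# Each trial is a d x M matrix of uniformly random bits.
x = rng.choice([-1, 1], size=(trials, d, M))
samples = bits_to_gaussians(x)        # shape (trials, d)

print(samples.mean(axis=0))           # each entry should be close to 0
print(samples.var(axis=0))            # each entry should be close to 1
```

The variance is exactly $1$ for every $M$; it is only the shape of the distribution that requires $M$ to be large.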

Although $M$ needs to be large for this simulation to be accurate, many of the results we’ve developed in the analysis of Boolean functions $f : \{-1,1\}^M \to {\mathbb R}$ are independent of $M$. A further key point is that this simulation preserves polynomial degree: if $p(\boldsymbol{z}_1, \dots, \boldsymbol{z}_d)$ is a degree-$k$ polynomial applied to $d$ independent standard Gaussians, the “simulated version” $p \circ\text{BitsToGaussians}_{M}^{d} : \{-1,1\}^{dM} \to {\mathbb R}$ is a degree-$k$ Boolean function. These facts allow us to transfer many results from the analysis of Boolean functions to the analysis of Gaussian functions. On the other hand, it also means that to fully understand Boolean functions, we need to understand the “special case” of functions on Gaussian space: a Boolean function may essentially be a function on Gaussian space “in disguise”. For example, as we saw in Chapter 5.3, there is a sense in which the majority function $\mathrm{Maj}_n$ “converges” as $n \to \infty$; what it’s converging to is the sign function on $1$-dimensional Gaussian space, $\mathrm{sgn} \in L^1({\mathbb R}, \gamma)$.

We’ll begin our study of Gaussian functions by developing the analogue of the most important operator on Boolean functions, namely the noise operator $\mathrm{T}_\rho$. Suppose we take a pair of $\rho$-correlated $M$-bit strings $({\boldsymbol{x}}, {\boldsymbol{x}}')$ and use them to form approximate Gaussians, $\boldsymbol{y} = \text{BitsToGaussians}_{M}^{}({\boldsymbol{x}}), \qquad \boldsymbol{y}' = \text{BitsToGaussians}_{M}^{}({\boldsymbol{x}}').$ For each $M$ it’s easy to compute that $\mathop{\bf E}[\boldsymbol{y}] = \mathop{\bf E}[\boldsymbol{y}'] = 0$, $\mathop{\bf Var}[\boldsymbol{y}] = \mathop{\bf Var}[\boldsymbol{y}'] = 1$, and $\mathop{\bf E}[\boldsymbol{y} \boldsymbol{y}'] = \rho$. As noted in Chapter 5.2, a multidimensional version of the Central Limit Theorem (see, e.g., Exercise 5.34 and another in this chapter) tells us that the joint distribution of $(\boldsymbol{y},\boldsymbol{y}')$ converges to a pair of Gaussian random variables with the same properties. We call these $\rho$-correlated Gaussians.

Definition 6 For $-1 \leq \rho \leq 1$, we say that the random variables $(\boldsymbol{z}, \boldsymbol{z}')$ are $\rho$-correlated (standard) Gaussians if they are jointly Gaussian and satisfy $\mathop{\bf E}[\boldsymbol{z}] = \mathop{\bf E}[\boldsymbol{z}'] = 0$, $\mathop{\bf Var}[\boldsymbol{z}] = \mathop{\bf Var}[\boldsymbol{z}'] = 1$, and $\mathop{\bf E}[\boldsymbol{z} \boldsymbol{z}'] = \rho$. In other words, if $(\boldsymbol{z}, \boldsymbol{z}') \sim \mathrm{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right).$ Note that the definition is symmetric in $\boldsymbol{z}$, $\boldsymbol{z}'$ and that each is individually distributed as $\mathrm{N}(0,1)$.

Fact 7 An equivalent definition is to say that $\boldsymbol{z} = \langle \vec{u}, \vec{\boldsymbol{g}}\rangle$ and $\boldsymbol{z}' = \langle \vec{v}, \vec{\boldsymbol{g}} \rangle$, where $\vec{\boldsymbol{g}} \sim \mathrm{N}(0,1)^d$ and $\vec{u}, \vec{v} \in {\mathbb R}^d$ are any two unit vectors satisfying $\langle \vec{u}, \vec{v} \rangle = \rho$. In particular we may choose $d = 2$, $\vec{u} = (1,0)$, and $\vec{v} = (\rho, \sqrt{1-\rho^2})$, thereby defining $\boldsymbol{z} = \boldsymbol{g}_1$ and $\boldsymbol{z}' = \rho \boldsymbol{g}_1 + \sqrt{1-\rho^2} \boldsymbol{g}_2$.

Remark 8 In Fact 7 it’s often convenient to write $\rho = \cos \theta$ for some $\theta \in {\mathbb R}$, in which case we may define the $\rho$-correlated Gaussians as $\boldsymbol{z} = \langle \vec{u}, \vec{\boldsymbol{g}} \rangle$ and $\boldsymbol{z}' = \langle \vec{v}, \vec{\boldsymbol{g}}\rangle$ for any unit vectors $\vec{u}, \vec{v}$ making an angle of $\theta$; e.g., $\vec{u} = (1,0)$, $\vec{v} = (\cos\theta, \sin \theta)$.

Definition 9 For a fixed $z \in {\mathbb R}$ we say the random variable $\boldsymbol{z}'$ is a Gaussian $\rho$-correlated to $z$, written $\boldsymbol{z}' \sim N_\rho(z)$, if $\boldsymbol{z}'$ is distributed as $\rho z + \sqrt{1-\rho^2} \boldsymbol{g}$ where $\boldsymbol{g} \sim \mathrm{N}(0,1)$. By Fact 7, if we draw $\boldsymbol{z} \sim \mathrm{N}(0,1)$ and then form $\boldsymbol{z}' \sim N_\rho(\boldsymbol{z})$, we obtain a $\rho$-correlated pair of Gaussians $(\boldsymbol{z}, \boldsymbol{z}')$.
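The two-step recipe in Definition 9 is easy to try numerically. The following is a minimal Python/NumPy sketch; the helper name `correlated_gaussians`, the seed, and the sample size are our own assumptions. It draws $\boldsymbol{z}$, forms $\boldsymbol{z}' = \rho \boldsymbol{z} + \sqrt{1-\rho^2}\,\boldsymbol{g}$, and estimates the means, variances, and $\mathop{\bf E}[\boldsymbol{z}\boldsymbol{z}']$ from Definition 6.

```python
import numpy as np

def correlated_gaussians(rho, size, rng):
    """Return (z, z') where z ~ N(0,1) and z' = rho*z + sqrt(1 - rho^2)*g."""
    z = rng.standard_normal(size)
    g = rng.standard_normal(size)           # fresh noise, independent of z
    z_prime = rho * z + np.sqrt(1.0 - rho**2) * g
    return z, z_prime

rng = np.random.default_rng(1)              # arbitrary seed
rho = 0.6                                   # illustrative correlation
z, zp = correlated_gaussians(rho, 200_000, rng)

print(z.mean(), zp.mean())                  # both should be close to 0
print(z.var(), zp.var())                    # both should be close to 1
print(np.mean(z * zp))                      # should be close to rho
```

The same function works unchanged for $n$-dimensional vectors (pass `size=(trials, n)`), since the $n$ coordinate pairs are generated independently.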

Definition 10 For $-1 \leq \rho \leq 1$ and $n \in {\mathbb N}^+$ we say that the ${\mathbb R}^n$-valued random variables $(\boldsymbol{z}, \boldsymbol{z}')$ are $\rho$-correlated $n$-dimensional Gaussian random vectors if each component pair $({\boldsymbol{z}}_1, {\boldsymbol{z}}'_1), \dots, ({\boldsymbol{z}}_n, {\boldsymbol{z}}'_n)$ is a $\rho$-correlated pair of Gaussians, and the $n$ pairs are mutually independent. We also naturally extend the definition of $\boldsymbol{z}' \sim N_\rho(z)$ to the case of $z \in {\mathbb R}^n$; this means $\boldsymbol{z}' = \rho z + \sqrt{1-\rho^2} \boldsymbol{g}$ for $\boldsymbol{g} \sim \mathrm{N}(0,1)^n$.

Remark 11 Thus, if we draw $\boldsymbol{z} \sim \mathrm{N}(0,1)^n$ and then form $\boldsymbol{z}' \sim N_\rho(\boldsymbol{z})$, we obtain a $\rho$-correlated $n$-dimensional pair $(\boldsymbol{z}, \boldsymbol{z}')$. It follows from this that the joint distribution of such a pair is rotationally symmetric (since the distribution of a single $n$-dimensional Gaussian is).

Now we can introduce the Gaussian analogue of the noise operator.

Definition 12 For $\rho \in [-1,1]$, the Gaussian noise operator $\mathrm{U}_\rho$ is the linear operator defined on the space of functions $f \in L^1({\mathbb R}^n,\gamma)$ by $\mathrm{U}_\rho f(z) = \mathop{\bf E}_{\boldsymbol{z}' \sim N_\rho(z)}[f(\boldsymbol{z}')] = \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[f(\rho z + \sqrt{1-\rho^2} \boldsymbol{g})].$

Fact 13 (Exercise.) If $f \in L^1({\mathbb R}^n,\gamma)$ is an $n$-variate multilinear polynomial, then $\mathrm{U}_\rho f(z) = f(\rho z)$.
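Fact 13 can be checked numerically. The sketch below, in Python with NumPy, approximates $\mathrm{U}_\rho f(z)$ for $n = 2$ by Gauss–Hermite quadrature (via `numpy.polynomial.hermite_e.hermegauss`) rather than by sampling; the helper `noise_operator_2d`, the test polynomial, and the values of `rho` and `z` are our own illustrative assumptions. It also shows that the identity fails for the non-multilinear polynomial $z_1^2$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes/weights for the weight exp(-x^2/2); normalizing the weights
# to sum to 1 turns weighted sums into expectations over N(0,1).
NODES, WEIGHTS = hermegauss(20)
WEIGHTS = WEIGHTS / WEIGHTS.sum()

def noise_operator_2d(f, rho, z):
    """Approximate U_rho f(z) = E_g[f(rho*z + sqrt(1-rho^2)*g)] for f on R^2."""
    s = np.sqrt(1.0 - rho**2)
    total = 0.0
    for g1, w1 in zip(NODES, WEIGHTS):
        for g2, w2 in zip(NODES, WEIGHTS):
            total += w1 * w2 * f(rho * z[0] + s * g1, rho * z[1] + s * g2)
    return total

def f(z1, z2):
    # Multilinear: no variable appears to a power higher than 1.
    return 1.0 + 2.0 * z1 - 3.0 * z1 * z2

rho, z = 0.7, (0.5, -1.2)
lhs = noise_operator_2d(f, rho, z)
rhs = f(rho * z[0], rho * z[1])
print(lhs, rhs)                              # should agree up to rounding error

# For the non-multilinear q(z) = z1^2 we get rho^2*z1^2 + (1 - rho^2), not q(rho*z).
q_smoothed = noise_operator_2d(lambda a, b: a * a, rho, z)
print(q_smoothed, (rho * z[0]) ** 2)         # these two values differ
```

Quadrature with 20 nodes integrates polynomials of degree up to 39 exactly (in exact arithmetic), so for low-degree polynomial $f$ the only error here is floating-point rounding.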

Remark 14 Our terminology is nonstandard. The Gaussian noise operators are usually collectively referred to as the Ornstein–Uhlenbeck semigroup (or sometimes as the Mehler transforms). They are typically defined for $\rho = e^{-t} \in [0,1]$ (i.e., for $t \in [0,\infty]$) by $\mathrm{P}_t f(z) = \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)^n}[f(e^{-t} z + \sqrt{1-e^{-2t}} \boldsymbol{g})] = \mathrm{U}_{e^{-t}} f(z).$ The term “semigroup” refers to the fact that the operators satisfy $\mathrm{P}_{t_1} \mathrm{P}_{t_2} = \mathrm{P}_{t_1+t_2}$, i.e., $\mathrm{U}_{\rho_1} \mathrm{U}_{\rho_2} = \mathrm{U}_{\rho_1 \rho_2}$ (which holds for all $\rho_1, \rho_2 \in [-1,1]$; see the exercises).
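Here is a brief sketch of why the semigroup identity is plausible; it is offered as a sanity check under the assumption that $f$ is integrable enough to exchange the expectations, not as the intended solution to the exercise. Let $\boldsymbol{g}, \boldsymbol{g}' \sim \mathrm{N}(0,1)^n$ be independent (the name $\boldsymbol{h}$ below is our own shorthand). Unfolding Definition 12 twice gives \begin{align*} \mathrm{U}_{\rho_1} \mathrm{U}_{\rho_2} f(z) &= \mathop{\bf E}_{\boldsymbol{g}, \boldsymbol{g}'}\left[f\left(\rho_2\bigl(\rho_1 z + \sqrt{1-\rho_1^2}\,\boldsymbol{g}'\bigr) + \sqrt{1-\rho_2^2}\,\boldsymbol{g}\right)\right] \\ &= \mathop{\bf E}_{\boldsymbol{g}, \boldsymbol{g}'}\left[f\left(\rho_1\rho_2 z + \boldsymbol{h}\right)\right], \qquad \boldsymbol{h} = \rho_2\sqrt{1-\rho_1^2}\,\boldsymbol{g}' + \sqrt{1-\rho_2^2}\,\boldsymbol{g}. \end{align*} The coordinates of $\boldsymbol{h}$ are independent mean-zero Gaussians, each with variance $\rho_2^2(1-\rho_1^2) + (1-\rho_2^2) = 1 - \rho_1^2\rho_2^2$. Hence $\boldsymbol{h}$ is distributed as $\sqrt{1-(\rho_1\rho_2)^2}\,\boldsymbol{g}$, which is exactly the noise term appearing in $\mathrm{U}_{\rho_1\rho_2} f(z)$.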

Before going further let’s check that $\mathrm{U}_\rho$ is a bounded operator on all of $L^p({\mathbb R}^n,\gamma)$ for $p \geq 1$; in fact, it’s a contraction:

Proposition 15 For each $\rho \in [-1,1]$ and $1 \leq p \leq \infty$ the operator $\mathrm{U}_\rho$ is a contraction on $L^p({\mathbb R}^n,\gamma)$; i.e., $\|\mathrm{U}_\rho f\|_p \leq \|f\|_p$.

Proof: The proof for $p = \infty$ is easy; otherwise, the result follows from Jensen’s inequality, using that $t \mapsto |t|^p$ is convex: \begin{align*} \|\mathrm{U}_\rho f\|_p^p = \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[|\mathrm{U}_\rho f(\boldsymbol{z})|^p] &= \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}\left[\left|\mathop{\bf E}_{\boldsymbol{z}' \sim N_\rho(\boldsymbol{z})}[f(\boldsymbol{z}')]\right|^p\right] \\ &\leq \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}\left[\mathop{\bf E}_{\boldsymbol{z}' \sim N_\rho(\boldsymbol{z})}[|f(\boldsymbol{z}')|^p]\right] = \|f\|_p^p. \quad \Box \end{align*}

As in the Boolean case, you should think of the Gaussian noise operator as having a “smoothing” effect on functions. As $\rho$ goes from $1$ down to $0$, $\mathrm{U}_\rho f$ involves averaging $f$’s values over larger and larger neighborhoods. In particular $\mathrm{U}_1$ is the identity operator, $\mathrm{U}_1f = f$, and $\mathrm{U}_0 f = \mathop{\bf E}[f]$, the constant function. In the exercises you are asked to verify the following facts, which say that for any $f$, as $\rho \to 1^-$ we get a sequence of smooth (i.e., $\mathcal{C}^\infty$) functions $\mathrm{U}_\rho f$ that tend to $f$.

Proposition 16 Let $f \in L^1({\mathbb R}^n,\gamma)$ and let $-1 < \rho < 1$. Then $\mathrm{U}_\rho f$ is a smooth function.

Proposition 17 Let $f \in L^1({\mathbb R}^n,\gamma)$. As $\rho \to 1^-$ we have $\|\mathrm{U}_\rho f - f\|_1 \to 0$.

Having defined the Gaussian noise operator, we can also make the natural definition of Gaussian noise stability (for which we’ll use the same notation as in the Boolean case):

Definition 18 For $f \in L^2({\mathbb R}^n,\gamma)$ and $\rho \in [-1,1]$, the Gaussian noise stability of $f$ at $\rho$ is defined to be $\mathbf{Stab}_\rho[f] = \mathop{\bf E}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \text{ $n$-dimensional} \\ \text{$\rho$-correlated Gaussians}}}[f(\boldsymbol{z}) f(\boldsymbol{z}')] = \langle f, \mathrm{U}_\rho f \rangle = \langle \mathrm{U}_\rho f, f \rangle.$ (Here we used that $(\boldsymbol{z}', \boldsymbol{z})$ has the same distribution as $(\boldsymbol{z}, \boldsymbol{z}')$ and hence $\mathrm{U}_\rho$ is self-adjoint.)

Example 19 Let $f : {\mathbb R} \to \{0,1\}$ be the $0$-$1$ indicator of the nonpositive halfline: $f = 1_{(-\infty, 0]}$. Then $$\label{eqn:01-sheppard} \mathbf{Stab}_\rho[f] = \mathop{\bf E}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \text{ $\rho$-correlated} \\ \text{standard Gaussians}}}[f(\boldsymbol{z}) f(\boldsymbol{z}')] = \mathop{\bf Pr}[\boldsymbol{z} \leq 0, \boldsymbol{z}' \leq 0] = \frac12 - \frac{1}{2}\frac{\arccos \rho}{\pi},$$ with the last equality being Sheppard’s Formula, which we stated in Section 5.2 and now prove.

Proof: Since $(-\boldsymbol{z}, -\boldsymbol{z}')$ has the same distribution as $(\boldsymbol{z}, \boldsymbol{z}')$, proving \eqref{eqn:01-sheppard} is equivalent to proving $\mathop{\bf Pr}[\boldsymbol{z} \leq 0, \boldsymbol{z}' \leq 0 \text{ or } \boldsymbol{z} > 0, \boldsymbol{z}' > 0] = 1 - \frac{\arccos \rho}{\pi}.$ The complement of the above event is the event that $f(\boldsymbol{z}) \neq f(\boldsymbol{z}')$ (up to measure $0$); thus it’s further equivalent to prove $$\label{eqn:elegant-sheppard} \mathop{\bf Pr}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \\\text{$\cos \theta$-correlated}}}[f(\boldsymbol{z}) \neq f(\boldsymbol{z}')] = \tfrac{\theta}{\pi}$$ for all $\theta \in [0,\pi]$. As in Remark 8, this suggests defining $\boldsymbol{z} = \langle \vec{u}, \vec{\boldsymbol{g}} \rangle$, $\boldsymbol{z}' = \langle \vec{v}, \vec{\boldsymbol{g}} \rangle$, where $\vec{u}, \vec{v} \in {\mathbb R}^2$ is some fixed pair of unit vectors making an angle of $\theta$, and $\vec{\boldsymbol{g}} \sim \mathrm{N}(0,1)^2$. Thus we want to show $\mathop{\bf Pr}_{\vec{\boldsymbol{g}} \sim \mathrm{N}(0,1)^2}[\langle \vec{u}, \vec{\boldsymbol{g}} \rangle \leq 0 \text{ and } \langle \vec{v}, \vec{\boldsymbol{g}} \rangle > 0 \text{, or vice versa}] = \tfrac{\theta}{\pi}.$ But this last identity is easy: If we look at the diameter of the unit circle that is perpendicular to $\vec{\boldsymbol{g}}$, then the event above is equivalent (up to measure $0$) to the event that this diameter “splits” $\vec{u}$ and $\vec{v}$. By the rotational symmetry of $\vec{\boldsymbol{g}}$, the probability is evidently $\theta$ (the angle between $\vec{u}, \vec{v}$) divided by $\pi$ (the range of angles for the diameter). $\Box$

Corollary 20 Let $H \subset {\mathbb R}^n$ be any halfspace (open or closed) with boundary hyperplane containing the origin. Let $h = \pm 1_{H}$. Then $\mathbf{Stab}_\rho[h] = 1 - \tfrac{2}{\pi} \arccos \rho$.

Proof: We may assume $H$ is open (since its boundary has measure $0$). By the rotational symmetry of correlated Gaussians (Remark 11), we may rotate $H$ to the form $H = \{z \in {\mathbb R}^n : z_1 > 0\}$. Then it’s clear that the noise stability of $h = \pm 1_H$ doesn’t depend on $n$, i.e., we may assume $n = 1$. Thus $h = \mathrm{sgn} = 1-2f$, where $f = 1_{(-\infty, 0]}$ as in Example 19. Now if $(\boldsymbol{z}, \boldsymbol{z}')$ denote $\rho$-correlated standard Gaussians, it follows from \eqref{eqn:01-sheppard} and $\mathop{\bf E}[f] = \tfrac12$ that \begin{align*} \mathbf{Stab}_\rho[h] = \mathop{\bf E}[h(\boldsymbol{z}) h(\boldsymbol{z}')] &= \mathop{\bf E}[(1-2f(\boldsymbol{z}))(1-2f(\boldsymbol{z}'))] \\ &= 1 - 4\mathop{\bf E}[f] + 4\mathbf{Stab}_\rho[f] = 1 - \tfrac{2}{\pi} \arccos \rho. \quad \Box \end{align*}
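Both Sheppard’s Formula and Corollary 20 lend themselves to a quick Monte Carlo check. The following Python/NumPy sketch is only an illustration: the seed, the sample size `N`, and the choice $\rho = 0.8$ are assumptions of ours. It compares the empirical values of $\mathop{\bf Pr}[\boldsymbol{z} \leq 0, \boldsymbol{z}' \leq 0]$ and $\mathop{\bf E}[\mathrm{sgn}(\boldsymbol{z})\,\mathrm{sgn}(\boldsymbol{z}')]$ with the closed forms.

```python
import numpy as np

rng = np.random.default_rng(2)               # arbitrary seed
rho, N = 0.8, 400_000                        # illustrative parameters

# A rho-correlated pair, built as in Definition 9.
z = rng.standard_normal(N)
zp = rho * z + np.sqrt(1.0 - rho**2) * rng.standard_normal(N)

# Example 19: stability of the indicator of the nonpositive halfline.
both_nonpositive = np.mean((z <= 0) & (zp <= 0))
sheppard = 0.5 - np.arccos(rho) / (2.0 * np.pi)

# Corollary 20 with n = 1: stability of the sign function.
stab_sgn = np.mean(np.sign(z) * np.sign(zp))
corollary = 1.0 - (2.0 / np.pi) * np.arccos(rho)

print(both_nonpositive, sheppard)            # empirical vs. closed form
print(stab_sgn, corollary)                   # empirical vs. closed form
```

With this many samples the empirical and exact values should typically agree to two or three decimal places; the agreement is statistical, not exact.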

Remark 21 The quantity $\mathbf{Stab}_\rho[\mathrm{sgn}] = 1 - \tfrac{2}{\pi} \arccos \rho$ is also precisely the limiting noise stability of $\mathrm{Maj}_n$, as stated in Theorem 2.44 and justified in Chapter 5.2.

We’ve defined the key Gaussian noise operator $\mathrm{U}_\rho$ and seen (Proposition 15) that it’s a contraction on all $L^p({\mathbb R}^n,\gamma)$. Is it also hypercontractive? In fact, we’ll now show that the Hypercontractivity Theorem for uniform $\pm 1$ bits holds identically in the Gaussian setting. The proof is simply a reduction to the Boolean case, and it will use the following standard fact (see [Jan97] or [Teu12] for the proof in case of $L^2$; to extend to other $L^p$, you can use Exercise 1 of this chapter):

Theorem 22 For each $n \in {\mathbb N}^+$, the set of multivariate polynomials is dense in $L^p({\mathbb R}^n,\gamma)$ for all $1 \leq p < \infty$.

Gaussian Hypercontractivity Theorem Let $f, g \in L^1({\mathbb R}^n,\gamma)$, let $r, s \geq 0$, and assume $0 \leq \rho \leq \sqrt{rs} \leq 1$. Then $\langle f, \mathrm{U}_\rho g \rangle = \langle \mathrm{U}_\rho f, g \rangle = \mathop{\bf E}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \text{ $\rho$-correlated}\\ \text{$n$-dimensional Gaussians}}}[f(\boldsymbol{z})g(\boldsymbol{z}')] \leq \|f\|_{1+r}\|g\|_{1+s}.$

Proof: (We give a sketch; you are asked to fill in the details in the exercises.) We may assume that $f \in L^{1+r}({\mathbb R}^n,\gamma)$ and $g \in L^{1+s}({\mathbb R}^n,\gamma)$. We may also assume $f, g \in L^2({\mathbb R}^n,\gamma)$ by a truncation and monotone convergence argument; thus the left-hand side is finite by Cauchy–Schwarz. Finally, we may assume that $f$ and $g$ are multivariate polynomials, using Theorem 22. For fixed $M \in {\mathbb N}^+$ we consider “simulating” $(\boldsymbol{z}, \boldsymbol{z}')$ using bits. More specifically, let $({\boldsymbol{x}}, {\boldsymbol{x}}') \in \{-1,1\}^{nM} \times \{-1,1\}^{nM}$ be a pair of $\rho$-correlated random strings and define the joint ${\mathbb R}^n$-valued random variables $\boldsymbol{y}, \boldsymbol{y}'$ by $\boldsymbol{y} = \text{BitsToGaussians}_{M}^{n}({\boldsymbol{x}}), \qquad \boldsymbol{y}' = \text{BitsToGaussians}_{M}^{n}({\boldsymbol{x}}').$ By a multidimensional Central Limit Theorem we have that $\mathop{\bf E}[f(\boldsymbol{y})g(\boldsymbol{y}')] \xrightarrow{M \to \infty} \mathop{\bf E}_{\substack{(\boldsymbol{z}, \boldsymbol{z}') \\ \text{$\rho$-correlated}}}[f(\boldsymbol{z})g(\boldsymbol{z}')].$ (Since $f$ and $g$ are polynomials, we can even reduce to a Central Limit Theorem for bivariate monomials.) We further have $\mathop{\bf E}[|f(\boldsymbol{y})|^{1+r}]^{1/(1+r)} \xrightarrow{M \to \infty} \mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)^n}[|f(\boldsymbol{z})|^{1+r}]^{1/(1+r)}$ and similarly for $g$. (This can also be proven by the multidimensional Central Limit Theorem, or by the one-dimensional Central Limit Theorem together with some tricks.) Thus it suffices to show $\mathop{\bf E}[f(\boldsymbol{y})g(\boldsymbol{y}')] \leq \mathop{\bf E}[|f(\boldsymbol{y})|^{1+r}]^{1/(1+r)} \mathop{\bf E}[|g(\boldsymbol{y}')|^{1+s}]^{1/(1+s)}$ for any fixed $M$.
But we can express $f(\boldsymbol{y}) = F({\boldsymbol{x}})$ and $g(\boldsymbol{y}') = G({\boldsymbol{x}}')$ for some $F, G : \{-1,1\}^{nM} \to {\mathbb R}$ and so the above inequality holds by the Two-Function Hypercontractivity Theorem (for $\pm 1$ bits). $\Box$

An immediate corollary, using the proof of Proposition 10.4, is the standard one-function form of hypercontractivity:

Theorem 23 Let $1 \leq p \leq q \leq \infty$ and let $f \in L^p({\mathbb R}^n,\gamma)$. Then $\|\mathrm{U}_\rho f\|_q \leq \|f\|_p$ for $0 \leq \rho \leq \sqrt{\tfrac{p-1}{q-1}}$.

We conclude this section by discussing the Gaussian space analogue of the discrete Laplacian operator. Taking our cue from Exercise 2.14½ we make the following definition:

Definition 24 The Ornstein–Uhlenbeck operator $\mathrm{L}$ (also called the infinitesimal generator of the Ornstein–Uhlenbeck semigroup, or the number operator) is the linear operator acting on functions $f \in L^2({\mathbb R}^n,\gamma)$ by $\mathrm{L} f = \frac{d}{d\rho} \mathrm{U}_{\rho} f \Bigr|_{\rho = 1} = -\frac{d}{dt} \mathrm{U}_{e^{-t}} f \Bigr|_{t = 0}$ (provided $\mathrm{L} f$ exists in $L^2({\mathbb R}^n, \gamma)$). Notational warning: It is common to see this as the definition of $-\mathrm{L}$.

Remark 25 We will not be completely careful about the domain of the operator $\mathrm{L}$ in this section; for precise details, see the exercises.

Proposition 26 Let $f \in L^2({\mathbb R}^n,\gamma)$ be in the domain of $\mathrm{L}$, and further assume for simplicity that $f$ is $\mathcal{C}^3$. Then we have the formula $\mathrm{L} f(x) = x \cdot \nabla f(x) - \Delta f(x),$ where $\Delta$ denotes the usual Laplacian differential operator, $\cdot$ denotes the dot product, and $\nabla$ denotes the gradient.

Proof: We give the proof in the case $n = 1$, leaving the general case to the exercises. We have $$\label{eqn:lap-formula} \mathrm{L} f(x) = -\lim_{t \to 0^+} \frac{\mathop{\bf E}_{\boldsymbol{z} \sim \mathrm{N}(0,1)}[f(e^{-t} x + \sqrt{1-e^{-2t}} \boldsymbol{z})] - f(x)}{t}.$$ Applying Taylor’s theorem to $f$ we have $f(e^{-t} x + \sqrt{1-e^{-2t}} \boldsymbol{z}) \approx f(e^{-t} x) + f'(e^{-t} x) \sqrt{1-e^{-2t}} \boldsymbol{z} + \tfrac12 f''(e^{-t} x) (1-e^{-2t})\boldsymbol{z}^2,$ where the $\approx$ denotes that the two quantities differ by at most $C (1-e^{-2t})^{3/2}|\boldsymbol{z}|^3$ in absolute value, for some constant $C$ depending on $f$ and $x$. Substituting this into \eqref{eqn:lap-formula} and using $\mathop{\bf E}[\boldsymbol{z}] = 0$, $\mathop{\bf E}[\boldsymbol{z}^2] = 1$, and that $\mathop{\bf E}[|\boldsymbol{z}|^3]$ is an absolute constant, we get $\mathrm{L} f(x) = -\lim_{t \to 0^+} \left(\frac{f(e^{-t} x) - f(x)}{t} + \frac{\tfrac12 f''(e^{-t} x) (1-e^{-2t})}{t}\right),$ using the fact that $\frac{(1-e^{-2t})^{3/2}}{t} \to 0$. But this is easily seen to be $xf'(x) - f''(x)$, as claimed. $\Box$
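As a quick sanity check of this formula (the example is our own, not from the text), take $n = 1$ and $f(x) = x^2$. Directly from Definition 12, \begin{align*} \mathrm{U}_\rho f(x) = \mathop{\bf E}_{\boldsymbol{g} \sim \mathrm{N}(0,1)}\left[\bigl(\rho x + \sqrt{1-\rho^2}\,\boldsymbol{g}\bigr)^2\right] = \rho^2 x^2 + (1-\rho^2), \end{align*} using $\mathop{\bf E}[\boldsymbol{g}] = 0$ and $\mathop{\bf E}[\boldsymbol{g}^2] = 1$. Hence $\frac{d}{d\rho} \mathrm{U}_\rho f(x) \bigr|_{\rho = 1} = 2x^2 - 2$. On the other hand, $x f'(x) - f''(x) = x \cdot 2x - 2 = 2x^2 - 2$, in agreement with Definition 24 and Proposition 26.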

An easy consequence of the semigroup property is the following:

Proposition 27 The following equivalent identities hold: \begin{align*} \frac{d}{d\rho} \mathrm{U}_\rho f = \rho^{-1} \mathrm{L} \mathrm{U}_{\rho}& f = \rho^{-1} \mathrm{U}_{\rho} \mathrm{L} f, \\ \frac{d}{dt} \mathrm{U}_{e^{-t}} f = -\mathrm{L} \mathrm{U}_{e^{-t}}& f = -\mathrm{U}_{e^{-t}} \mathrm{L} f. \end{align*}

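Proof: This follows from \begin{align*} \frac{d}{dt} \mathrm{U}_{e^{-t}} f(x) &= \lim_{\delta \to 0} \frac{\mathrm{U}_{e^{-t - \delta}} f(x) - \mathrm{U}_{e^{-t}} f(x) }{\delta} \\ &= \lim_{\delta \to 0} \frac{\mathrm{U}_{e^{-\delta}} \mathrm{U}_{e^{-t}} f(x) - \mathrm{U}_{e^{-t}} f(x) }{\delta} = \lim_{\delta \to 0} \frac{\mathrm{U}_{e^{-t}} \mathrm{U}_{e^{-\delta}} f(x) - \mathrm{U}_{e^{-t}} f(x) }{\delta}. \end{align*} By Definition 24, the second limit is $-\mathrm{L} \mathrm{U}_{e^{-t}} f(x)$ and the third is $-\mathrm{U}_{e^{-t}} \mathrm{L} f(x)$ (passing the limit through $\mathrm{U}_{e^{-t}}$). The first pair of identities then follows from the chain rule with $\rho = e^{-t}$, since $\frac{d}{dt} = -\rho \frac{d}{d\rho}$. $\Box$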

We also have the following formula:

Proposition 28 Let $f, g \in L^2({\mathbb R}^n, \gamma)$ be in the domain of $\mathrm{L}$, and further assume for simplicity that they are $\mathcal{C}^3$. Then $$\label{eqn:dirichlet2} \langle f, \mathrm{L} g \rangle = \langle \mathrm{L} f, g \rangle = \langle \nabla f, \nabla g \rangle.$$

Proof: It suffices to prove the equality on the right of \eqref{eqn:dirichlet2}. We again treat only the case of $n = 1$, leaving the general case to the exercises. Using Proposition 26, \begin{align*} \langle \mathrm{L} f, g \rangle &= \int_{\mathbb R} (x f'(x) - f''(x))g(x) \varphi(x)\,dx \\ &= \int_{\mathbb R} x f'(x) g(x)\varphi(x) \,dx + \int_{{\mathbb R}} f'(x) (g \varphi)'(x)\,dx \tag{integration by parts} \\ &= \int_{\mathbb R} x f'(x) g(x)\varphi(x) \,dx + \int_{{\mathbb R}} f'(x) (g'(x) \varphi(x) + g(x) \varphi'(x))\,dx \\ &= \int_{{\mathbb R}} f'(x) g'(x) \varphi(x)\,dx, \end{align*} using the fact that $\varphi'(x) = -x \varphi(x)$. $\Box$

Finally, by differentiating the Gaussian Hypercontractivity Inequality we obtain the Gaussian Log-Sobolev Inequality (see Exercise 10.23; the proof is the same as in the Boolean case):

Gaussian Log-Sobolev Inequality Let $f \in L^2({\mathbb R}^n, \gamma)$ be in the domain of $\mathrm{L}$. Then $\tfrac{1}{2} \mathbf{Ent}[f^2] \leq \mathop{\bf E}[\|\nabla f\|^2].$

It’s tempting to use the notation $\mathbf{I}[f]$ for $\mathop{\bf E}[\|\nabla f\|^2]$; however, you have to be careful because this quantity is not equal to $\sum_{i=1}^n \mathop{\bf E}[\mathop{\bf Var}_{\boldsymbol{z}_i}[f]]$ unless $f$ is a multilinear polynomial. See the exercises.
