The history of the Hypercontractivity Theorem is complicated.

Its earliest roots are in the work of Paley **[Pal32]** from 1932; he showed that for $1 < p < \infty$ there are constants $0 < c_p < C_p < \infty$ such that $c_p \|\mathrm{S} f\|_p \leq \|f\|_p \leq C_p \|\mathrm{S} f\|_p$ holds for any $f : \{-1,1\}^n \to {\mathbb R}$. Here $\mathrm{S} f = \sum_{t=1}^n \sqrt{\sum_{t = 1}^n (\mathrm{d}_t f)^2}$ is the “square function” of $f$, and $\mathrm{d}_t f = \sum_{S : \max(S) = t} \widehat{f}(S)\,\chi_S$ is the martingale difference sequence for $f$ defined in Exercise 8.17. The main task in Paley’s work is to prove the statement when $p$ is an even integer; other values of $p$ follow by the Riesz(–Thorin) interpolation theorem. Using this result, Paley showed the following hypercontractivity result: if $f : \{-1,1\}^n \to {\mathbb R}$ is homogeneous of degree $2$ then $c’_p \|f\|_2 \leq \|f\|_p \leq C’_p \|f\|_2$ for any $p \in {\mathbb R}^+$.

In 1968, Bonami **[Bon68]** stated the following variant of Theorem 21: If $f : \{-1,1\}^n \to {\mathbb R}$ is homogeneous of degree $k$ then for all $q \geq 2$, $\|f\|_q \leq c_k \sqrt{q} \|f\|_2$, where the constant $c_k$ may be taken to be $1$ if $q$ is an even integer. She remarks that this theorem can be deduced from Paley’s result but with a much worse (exponential) dependence on $q$. The proof she gives is combinatorial and actually only treats the case $k = 2$ and $q$ an even integer; it is similar to Exercise 37.

Independently in 1969, Kiener **[Kie69]** published his Ph.D. thesis which extended Paley’s hypercontractivity result as follows: if $f : \{-1,1\}^n \to {\mathbb R}$ is homogeneous of degree $k$ then $c_{p,k} \|f\|_2 \leq \|f\|_p \leq C_{p,k} \|f\|_2$ for any $p \in {\mathbb R}^+$. The proof is an induction on $k$ and again the bulk of the work is the case of even integer $p$. Kiener also gave a long combinatorial proof showing that if $f : \{-1,1\}^n \to {\mathbb R}$ is homogeneous of degree $2$ then $\mathop{\bf E}[f^4] \leq 51 \mathop{\bf E}[f^2]^2$. (Exercise 38(a) improves this $51$ to $15$.)

Also independently in 1969, Schreiber **[Sch69]** considered multilinear polynomials $f$ over a general orthonormal sequence ${\boldsymbol{x}}_1, \dots, {\boldsymbol{x}}_n$ of centred real (or complex) random variables. He showed that if $f$ has degree at most $k$ then for any even integer $q \geq 4$ it holds that $\|f\|_q \leq C \|f\|_2$, where $C$ depends only on $k$, $q$, and the $q$-norms of the ${\boldsymbol{x}}_i$’s. Again, the proof is very similar to Exercise 37; Schreiber does not estimate his analogue of $|\mathcal{M}|$ but merely notes that it’s finite. Schreiber was interested mainly in the case that the ${\boldsymbol{x}}_i$’s are Gaussian; indeed, **[Sch69]** is a generalization of his earlier work **[Sch67]** specific to the Gaussian case.

In 1970, Bonami published her Ph.D. thesis **[Bon70]** which contains the full Hypercontractivity Theorem as stated at the beginning of the chapter. Her proof follows the standard template seen in essentially all proofs of hypercontractivity: first an elementary proof for the case $n = 1$ and then an induction to extend to general $n$. She also gives the sharper combinatorial result appearing in Exercises 37, 38(c). (The stronger bound from Exercise 38(f) is due to Janson **[Jan97, Remark 5.20]**.) As in Corollary 6, Bonami notes that her combinatorial proof can be extended to a general sequence of symmetric orthonormal random variables, at the expense of including factors of $\|{\boldsymbol{x}}_i\|_q$ into the bound. She points out that this includes the Gaussian case independently studied by Schreiber.

Bonami’s work was published in French, and it remained unknown to most English-language mathematicians for about a decade. In the late ’60s and early ’70s, researchers in quantum field theory developed the theory of hypercontractivity for the Gaussian analogue of $\mathrm{T}_\rho$, namely the Ornstein–Uhlenbeck operator $\mathrm{U}_\rho$. This is now recognized as essentially being a *special case* of hypercontractivity for bits, in light of the fact that $\frac{{\boldsymbol{x}}_1 + \cdots + {\boldsymbol{x}}_n}{\sqrt{n}}$ tends to a Gaussian as $n \to \infty$ by the CLT. We summarize here some of the work in this setting. In 1966, Nelson **[Nel66]** showed that $\|\mathrm{U}_{1/\sqrt{q-1}} f\|_q \leq C_q \|f\|_2$ for all $q \geq 2$. Glimm **[Gli68]** gave the alternative result that for each $q \geq 2$ there is a sufficiently small $\rho_q > 0$ such that $\|\mathrm{U}_{\rho_q} f\|_q \leq \|f\|_2$. Segal **[Seg70]** observed that hypercontractive results can be proved by induction on the dimension $n$. In 1973, Nelson **[Nel73]** gave the full Hypercontractivity Theorem in the Gaussian setting: $\|\mathrm{U}_{\sqrt{(p-1)/(q-1)}} f\|_q \leq \|f\|_p$ for all $1 \leq p < q \leq \infty$. He also proved the combinatorial Exercise 37. The equivalence to the Two-Function Hypercontractivity Theorem is from the work of Neveu **[Nev76]**.

In 1975, Gross **[Gro75]** introduced the notion of Log-Sobolev inequalities and showed how to deduce hypercontractivity inequalities from them. He established the Log-Sobolev inequality for $1$-bit functions, used induction (citing Segal) to obtain it for $n$-bit functions, and then used the CLT to transfer results to the Gaussian setting. (For some earlier results along these lines, see **[Fed69,Gro72]**.) This gave a new proof of Nelson’s result and also independently established Bonami’s full Hypercontractivity Theorem. Also in 1975, Beckner **[Bec75]** published his Ph.D. thesis which proved a sharp form of the hypercontractive inequality for purely complex $\rho$. (It is unfortunate that the influential paper **[KKL88]** miscredited the Hypercontractivity Theorem to Beckner.) The case of general complex $\rho$ was subsequently treated by Weissler **[Wei79]**, with the sharp result being obtained by Epperson **[Epp89]**. Weissler **[Wei80]** also appears to have been the first to make the connection between this line of work and Bonami’s thesis.

Independently of all this work, the $(q,2)$-Hypercontractivity Theorem was reproved (without sharp constant) in the Banach spaces community by Rosenthal **[Ros76]** in 1975, using methods similar to those of Paley and Kiener. For additional early references, see **[Mul05, Chapter 1]**.

The *term* “hypercontractivity” was introduced in **[SH72]**; Definition 13 of a hypercontractive random variable is due to **[KS88]**. The short inductive proof of the Bonami Lemma may have appeared first in **[MOO05]**. Theorems 22 and 24 appear in **[Jan97]**. Theorem 23 dates back to **[PZ78,Bor79]**. The Small-Set Expansion theorem is due to **[KKL88]**; the Level-$k$ Inequalities appear in several places but can probably be fairly credited to **[KKL88]** as well. Ben-Or and Linial’s work **[BL85,BL90]** was motivated both by game theory and by the Byzantine Generals problem **[LSP82]** from distributed computing; the content of Exercise 25 is theirs. In turn it motivated the watershed paper by Kahn, Kalai, and Linial **[KKL88]**. (See also the intermediate work **[CGG87]**.) The “KKL Edge-Isoperimetric Theorem” (which is essentially a strengthening of the basic KKL Theorem) was first explicitly proved by Talagrand **[Tal94]** (possibly independently of **[KKL88]**?); he also treated the $p$-biased case. There is no known combinatorial proof of the KKL Theorem (i.e., one which does not involve real-valued functions); however several slightly different analytic proofs are know **[FS07,Ros06,OW13]**. The explicit lower bound on the “KKL constant” achieved in Exercise 30 is the best known; it was established in, e.g., **[FS07]**. It is still a factor of $2$ away from the best known upper bound, achieved by the tribes function.

Friedgut’s Junta Theorem is from **[Fri98]**. The observation that its junta size can be improved for functions which have $\mathbf{W}^{k}[f] \leq \epsilon$ for $k \ll \mathbf{I}[f]/\epsilon$ was independently made by Li-Yang Tan in 2011; so was the consequence Corollary 31 and its extension to constant-degree PTFs. A stronger result than Corollary 31 is known: Diakonikolas and Servedio **[DS09]** showed that every LTF is $\epsilon$-close to a $\mathbf{I}[f]^2 \mathrm{poly}(1/\epsilon)$-junta. As for Corollary 30, it’s incomparable with a result from **[GMR12]** which shows that every width-$w$ DNF is $\epsilon$-close to a $(w \log(1/\epsilon))^{O(w)}$-junta.

Exercise 3 was suggested to the author by Krzysztof Oleszkiewicz. Exercise 12 is from **[GOWZ10]**. Exercise 21 appears in **[OS07]**; Exercise 22 appears in **[OW09]**. The estimate in Exercise 24 is from **[dKPW04]** (see also **[RR01,KKMO07]**). Exercises 27 and 28 are due to **[KKL88]**. Exercise 34 was suggested to the author by John Wright. Exercise 36 appears in **[KOTZ13]**.

Dear Ryan, Perhaps it makes sense to mention also Khintchine’s Inequality (used for the law of iterated logarithm that he discovered) and a later result by Littlewood.

Hmm. I do mention Khintchine’s Inequality in the text… as for tracing the history of the hypercontractivity theorem, I suppose one can only go back so far I tried to look for the first place that Khintchine-type inequalities for Walsh series (i.e., polynomials of degree higher than one, in our language) were studied. It seems to me this was Paley’s papers.

best,

Ryan