The origins of the orthogonal decomposition described in Section 3 date back to the work of Hoeffding **[Hoe48]** (see also **[vMis47]**).

Hoeffding’s work introduced *U-statistics*; i.e., functions $f$ of independent random variables $\boldsymbol{X}_1, \dots, \boldsymbol{X}_n$ of the form $\mathop{\mathrm{avg}}_{1 \leq i_1 < \cdots < i_k \leq n} g(\boldsymbol{X}_{i_1}, \dots, \boldsymbol{X}_{i_k})$, where $g : {\mathbb R}^k \to {\mathbb R}$ is a symmetric function. Such functions are themselves symmetric. For these functions, Hoeffding introduced $f^{\subseteq S}$ (which, by symmetry, depends only on $|S|$) and proved certain inequalities relating $\mathop{\bf Var}[f]$ to the quantities $\|f^{\subseteq S}\|_2^2$, $\|f^{=S}\|_2^2$; e.g., Exercise 19. Nonsymmetric functions $f$ were considered only rarely in the subsequent three decades of statistics research. One notable exception comes in the work of Hájek **[Haj68]**, who effectively introduced $f^{\leq 1}$, known as the *Hájek projection* of $f$. Also, a work of Bourgain **[Bou79]** essentially describes the decomposition $f = \sum_{k} f^{=k}$. The first work which mentions the general orthogonal decomposition for not-necessarily-symmetric functions appears to be that of Efron and Stein **[ES81]** from the late ’70s. Efron and Stein’s description is brief; the subsequent work of Karlin and Rinott **[KR82]** gives a more thorough development. Efron and Stein’s main result was a proof of the statement $\mathop{\bf Var}[f] \leq \mathbf{I}[f]$ for symmetric $f$; in the statistics literature this is known as the *Efron–Stein inequality*. Steele **[Ste86]** extended this to the case of nonsymmetric $f$ by a simple proof that used the Fourier basis approach to orthogonal decomposition. This approach via Fourier bases originated in the work of Rubin and Vitale **[RV80]**; see also **[Tak83,Vit84]**. The terminology “Fourier basis” we use is not standard.

The $p$-biased hypercube distribution is strongly motivated by the Erdős–Rényi **[ER59]** theory of random graphs (see e.g., **[BR08]** for history) and by percolation theory (introduced in **[BH57]**). Influences under the $p$-biased distribution — and their connection to threshold phenomena — were studied by Russo **[Rus81,Rus82]**. The former work proved the Margulis–Russo formula independently of Margulis, who had proven it earlier **[Mar74]**. Fourier analysis under the $p$-biased distribution seems to have been first introduced to the theoretical computer science literature by Furst, Jackson, and Smith **[FJS91]**, who extended the LMN learning algorithm for $\mathsf{AC}^0$ to this setting. Talagrand **[Tal93,Tal94]** developed $p$-biased Fourier for the study of threshold phenomena, strengthening Margulis and Russo’s work and proving the KKL Theorem in the $p$-biased setting. Similar results were obtained by Friedgut and Kalai **[FK96b]** using an earlier work of Bourgain, Kahn, Kalai, Linial, and Katznelson **[BKK+92]** which proved a version of the KKL Theorem in the setting of general product spaces. The statements about sharp thresholds for cliques and connectivity in Examples 49 are essentially due to Matula and to Erdős–Rényi, respectively; see, e.g., **[Bol01]**. Weak threshold results similar to the ones in Exercise 25 were proved by Bollobás and Thomason **[BT87]**, using the Kruskal–Katona Theorem rather than the Poincaré inequality.

Fourier analysis on finite abelian groups — and more generally, on locally compact abelian groups — is an enormous subject upon which we have touched only briefly. We cannot survey it here but refer instead to the standard textbook of Rudin **[Rud62]** and to the friendly textbook of Terras **[Ter99]** which focuses on finite groups.

One of the earliest works on randomized decision tree complexity is that of Saks and Wigderson **[SW86]**; they proved the contents of Exercise 36. (We note that $\mathrm{RDT}(f)$ is usually denoted $R(f)$ in the literature, and ${\mathrm{DT}}(f)$ is usually denoted $D(f)$.) One basic lower bound in the area is that $\mathrm{RDT}(f) \geq \sqrt{{\mathrm{DT}}(f)}$ for any $f : \{-1,1\}^n \to \{-1,1\}$; in fact, this lower bound holds even for “nondeterministic decision tree complexity”, as proved in **[BI87,Tar89]**. Yao’s Conjecture is also sometimes attributed to Karp. Regarding the recursive majority-of-$3$ function, Boppana was the first to point out that $\mathrm{RDT}(\mathrm{Maj}_3^{\otimes d}) = o(3^d)$ even though ${\mathrm{DT}}(\mathrm{Maj}_3^{\otimes d}) = 3^d$. Saks and Wigderson noted the bound $\mathrm{RDT}(\mathrm{Maj}_3^{\otimes d}) \leq (8/3)^d$ and also that it is not optimal. Following subsequent works **[JKS03,She08a]** the best known upper bound is $O(2.65^d)$ **[MNSX11]** and the best known lower bound is $\Omega(2.55^d)$ **[Leo12]**.

The proof of the OSSS Inequality we presented is essentially due to Lee **[Lee10]**; the alternate proof from Exercise 42 is due to Jain and Zhang **[JZ11]**. The Condorcet Jury Theorem is from **[dC85]**. The Shapley value described in Exercise 27 was introduced by the Nobelist Shapley in **[Sha53]**; for more, see **[Rot88]**. Exercise 30 is from **[BOW10]**. Exercises 33(a) and 41 are from **[BSW05]**; the term “revealment” was introduced by Schramm and Steif **[SS10]**.

You write Hoeffing instead of Hoeffding in the first line. I guess that is a typo.

Thank you!

Thank you, fixed!

Leonardos corrected the proof in his article and the lower bound for randomized decision tree complexity of recursive majority of depth d is Omega(2.55^d) (instead of Omega(2.6^d))

Thanks Avishay, I hadn’t seen this!

At the very last line of 2nd paragraph, you have written “Bollob\’{a}s”.

Fixed, thanks!

Extraneous “the”: The p-biased hypercube distribution is strongly motivated by “the the” Erdős–Rényi

Thanks, you’re the the master of finding duplicated “the”‘s in this book!

Thanks, I will wear this title with pride!

As another (inconsequential) typo – I noticed my name is misspelled in the acknowledgments.

Argh! I specifically remember double-checking your name. Guess I should have triple-checked. Sorry!!