# §8.4: $p$-biased analysis

Perhaps the most common generalized domain in analysis of boolean functions is the case of the hypercube with “biased” bits.

In this setting we think of a random input in $\{-1,1\}^n$ as having each bit independently equal to $-1$ ($\mathsf{True}$) with probability $p \in (0,1)$ and equal to $1$ ($\mathsf{False}$) with probability $q = 1-p$. (We could also consider different parameters $p_i$ for each coordinate; see the exercises.) In the notation of the chapter this means $L^2(\Omega^n, \pi_p^{\otimes n})$, where $\Omega = \{-1,1\}$ and $\pi_p$ is the distribution on $\Omega$ defined by $\pi_p(-1) = p$, $\pi_p(1) = q$. This context is often referred to as $p$-biased Fourier analysis, though it would be more consistent with our terminology if it were called “$\mu$-biased”, where $\mu = \mathop{\bf E}_{{\boldsymbol{x}}_i \sim \pi_p}[{\boldsymbol{x}}_i] = q-p = 1-2p.$ One of the more interesting features of the setting is that we can fix a combinatorial boolean function $f : \{-1,1\}^n \to \{-1,1\}$ and then consider its properties for various $p$ between $0$ and $1$; we will discuss this further later in this section. We will also sometimes use the abbreviated notation $\mathop{\bf Pr}_{\pi_p}[\cdot]$ in place of $\mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[\cdot]$, and similarly $\mathop{\bf E}_{\pi_p}[\cdot]$. The $p$-biased hypercube is one of the generalized domains where it can pay to look at an explicit Fourier basis. In fact, since we have $|\Omega| = 2$ there is a unique Fourier basis $\{\phi_0, \phi_1\}$ (up to negating $\phi_1$). For notational simplicity we’ll write $\phi$ instead of $\phi_1$ and use “set notation” rather than multi-index notation:

Definition 39 In the context of $p$-biased Fourier analysis we define the basis function $\phi : \{-1,1\} \to {\mathbb R}$ by $\phi(x_i) = \frac{x_i - \mu}{\sigma},$ where $\mu = \mathop{\bf E}_{{\boldsymbol{x}}_i \sim \pi_p}[{\boldsymbol{x}}_i] = q-p = 1- 2p, \quad \sigma = \mathop{\bf stddev}_{{\boldsymbol{x}}_i \sim \pi_p}[{\boldsymbol{x}}_i] = \sqrt{4pq} = 2\sqrt{p}\sqrt{1-p}.$ Note that $\sigma^2 = 1 – \mu^2$. We also have the formula $\phi(1) = \sqrt{p/q}$, $\phi(-1) = -\sqrt{q/p}$.

We will use the notation $\mu$ and $\sigma$ throughout this section. It’s clear that $\{1, \phi\}$ is indeed a Fourier basis for $L^2(\{-1,1\}, \pi_p)$ because $\mathop{\bf E}[\phi({\boldsymbol{x}}_i)] = 0$ and $\mathop{\bf E}[\phi({\boldsymbol{x}}_i)^2] = 1$ by design.

Definition 40 In the context of $L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ we define the product Fourier basis functions $(\phi_S)_{S \subseteq [n]}$ by $\phi_S(x) = \prod_{i \in S} \phi(x_i).$ Given $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ we write $\widehat{f}(S)$ for the associated Fourier coefficient; i.e., $\widehat{f}(S) = \mathop{\bf E}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[f({\boldsymbol{x}})\,\phi_S({\boldsymbol{x}})].$ Thus we have the biased Fourier expansion $f(x) = \sum_{S \subseteq [n]} \widehat{f}(S)\,\phi_S(x).$

Although the notation is very similar to that of the classic uniform-distribution Fourier analysis, we caution that in general, $\phi_S \phi_T \neq \phi_{S \triangle T}.$

Example 41 Let $\chi_i \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ be the $i$th dictator function, $\chi_i(x) = x_i$, viewed under the $p$-biased distribution. We have $\phi(x_i) = \frac{x_i - \mu}{\sigma} \quad\Rightarrow\quad x_i = \mu + \sigma\phi(x_i),$ and the latter is evidently $f$’s (biased) Fourier expansion. I.e., $\widehat{\chi_i}(\emptyset) = \mu, \quad \widehat{\chi_i}(\{i\}) = \sigma, \quad \widehat{\chi_i}(S) = 0 \text{ otherwise}.$

This example lets us see a link between a function’s “usual” Fourier expansion and its biased Fourier expansion. (For more on this, see the exercises.) Let’s abuse notation a little by writing simply $\phi_i$ instead of $\phi(x_i)$. We have the formulas $$\label{eqn:p-biased-substitution} \phi_i = \frac{x_i – \mu}{\sigma} \quad\Leftrightarrow\quad x_i = \mu + \sigma \phi_i,$$ and we can go from the usual Fourier expansion to the biased Fourier expansion simply by plugging in the latter.

Example 42 Recall the “selection function” $\mathrm{Sel} : \{-1,1\}^3 \to \{-1,1\}$ from Exercise 1.1(j); $\mathrm{Sel}(x_1, x_2, x_2)$ outputs $x_2$ if $x_1 = -1$ and outputs $x_3$ if $x_1 = 1$. The usual Fourier expansion of $\mathrm{Sel}$ is $\mathrm{Sel}(x_1, x_2, x_3) = \tfrac{1}{2} x_2 + \tfrac{1}{2} x_3 -\tfrac{1}{2} x_1 x_2 + \tfrac{1}{2} x_1 x_3.$ Using the substitution from \eqref{eqn:p-biased-substitution} we get \begin{align} \mathrm{Sel}(x_1, x_2, x_3) &= \tfrac{1}{2} (\mu + \sigma \phi_2) + \tfrac{1}{2} (\mu + \sigma \phi_3) -\tfrac{1}{2} (\mu + \sigma \phi_1) (\mu + \sigma \phi_2) + \tfrac{1}{2} (\mu + \sigma \phi_1) (\mu + \sigma \phi_3) \nonumber\\ &= \mu + (\tfrac{1}{2} – \tfrac{1}{2} \mu)\sigma \, \phi_{2} + (\tfrac{1}{2} + \tfrac{1}{2} \mu) \sigma \, \phi_{3} – \tfrac{1}{2} \sigma^2 \, \phi_1 \phi_2 + \tfrac{1}{2} \sigma^2 \, \phi_1 \phi_3. \label{eqn:sel-biased} \end{align} Thus if we write $\mathrm{Sel}^{(p)}$ for the selection function thought of as an element of $L^2(\{-1,1\}^3, \pi_p^{\otimes 3})$, we have \begin{gather*} \widehat{\mathrm{Sel}^{(p)}}(\emptyset) = \mu, \quad \widehat{\mathrm{Sel}^{(p)}}(2) = (\tfrac{1}{2} – \tfrac{1}{2} \mu)\sigma, \quad \widehat{\mathrm{Sel}^{(p)}}(3) = (\tfrac{1}{2} + \tfrac{1}{2} \mu)\sigma, \\ \widehat{\mathrm{Sel}^{(p)}}(\{1,2\}) = -\tfrac{1}{2}\sigma^2, \quad \widehat{\mathrm{Sel}^{(p)}}(\{1,3\}) = \tfrac{1}{2}\sigma^2, \quad \widehat{\mathrm{Sel}^{(p)}}(S) = 0 \text{ else}. \end{gather*} By the Fourier formulas of Section 2 we can deduce, e.g., that $\mathop{\bf E}[\mathrm{Sel}^{(p)}] = \mu$, $\mathbf{Inf}_1[\mathrm{Sel}^{(p)}] = (-\tfrac{1}{2} \sigma^2)^2 + (\tfrac{1}{2} \sigma^2)^2 = \tfrac{1}{2} \sigma^4$, etc.

Let’s codify a piece of notation from this example:

Notation 43 Let $f : \{-1,1\}^n \to {\mathbb R}$ and let $p \in (0,1)$. We write $f^{(p)}$ for the function when viewed as an element of $L^2(\{-1,1\}^n, \pi_{p}^{\otimes n})$.

We now discuss derivative operators. We would like to define an operator $\mathrm{D}_i$ on $L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ which acts like differentiation on the biased Fourier expansion. E.g., referring to \eqref{eqn:sel-biased} we would like to have $\mathrm{D}_3 \mathrm{Sel}^{(p)} = (\tfrac{1}{2} + \tfrac{1}{2} \mu) \sigma + \tfrac{1}{2} \sigma^2 \, \phi_1.$ In general we are seeking $\frac{\partial}{\partial \phi_i}$ which, by basic calculus and the relationship \eqref{eqn:p-biased-substitution}, satisfies $\frac{\partial}{\partial \phi_i} = \frac{\partial x_i}{\partial \phi_i} \cdot \frac{\partial}{\partial x_i} = \sigma \cdot \frac{\partial}{\partial x_i}.$ Recognizing $\frac{\partial}{\partial x_i}$ as the “usual” $i$th derivative operator, we are led to the following:

Definition 44 For $i \in [n]$, the $i$th (discrete) derivative operator $\mathrm{D}_i$ on $L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ is defined by $\mathrm{D}_i f(x) = \sigma \cdot \frac{f(x^{(i\mapsto 1)}) - f(x^{(i \mapsto -1)})}{2}.$ Note that this defines a different operator for each value of $p$. We sometimes write the above definition as $\mathrm{D}_{\phi_i} = \sigma \cdot \mathrm{D}_{x_i}.$ With respect to the biased Fourier expansion of $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ the operator $\mathrm{D}_i$ satisfies $$\label{eqn:p-biased-deriv} \mathrm{D}_i f = \sum_{S \ni i} \widehat{f}(S)\,\phi_{S \setminus \{i\}}.$$

Given this definition we can derive some additional formulas for influences, including a generalization of Proposition 2.20:

Proposition 45 Suppose $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ is boolean-valued (i.e., has range $\{-1,1\}$). Then $\mathbf{Inf}_i[f] = \sigma^2 \mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[f({\boldsymbol{x}}) \neq f({\boldsymbol{x}}^{\oplus i})]$ for each $i \in [n]$. If furthermore $f$ is monotone then $\mathbf{Inf}_i[f] = \sigma \widehat{f}(i)$.

Proof: Using Definition 44‘s notation and \eqref{eqn:p-biased-deriv} we have $\mathbf{Inf}_i[f] = \mathop{\bf E}_{\pi_p}[(\mathrm{D}_{\phi_i} f)^2] = \sigma^2 \mathop{\bf E}_{\pi_p}[(\mathrm{D}_{x_i} f)^2].$ Since $(\mathrm{D}_{x_i} f)^2$ is the $0$-$1$ indicator that $i$ is pivotal for $f$, the first formula follows. When $f$ is monotone we furthermore have that $(\mathrm{D}_{x_i} f)^2 = \mathrm{D}_{x_i} f$ and hence $\mathbf{Inf}_i[f] = \sigma^2 \mathop{\bf E}_{\pi_p}[\mathrm{D}_{x_i} f] = \sigma \mathop{\bf E}_{\pi_p}[\mathrm{D}_{\phi_i} f] = \sigma \widehat{f}(i),$ as claimed. $\Box$

The remainder of this section is devoted to the topic of threshold phenomena in boolean functions. Much of the motivation for this comes from theory of random graphs, which we now briefly introduce.

Definition 46 Given an undirected graph $G$ on $v \geq 2$ vertices, we identify it with the string in $\{\mathsf{True}, \mathsf{False}\}^{\binom{v}{2}}$ which indicates which edges are present ($\mathsf{True}$) and which are absent ($\mathsf{False}$). We write $\mathcal{G}(v,p)$ for the distribution $\pi_{p}^{\otimes \binom{v}{2}}$; this is called the Erdős–Rényi random graph model. Note that if we permute the $v$ vertices of a graph, this induces a permutation on the $\binom{v}{2}$ edges. A ($v$-vertex) graph property is a boolean function $f : \{\mathsf{True}, \mathsf{False}\}^{\binom{v}{2}} \to \{\mathsf{True}, \mathsf{False}\}$ which is invariant under all $v!$ such permutations of its input; colloquially, this means that $f$ “does not depend on the names of the vertices”.

Graph properties are always transitive-symmetric functions in the sense of Definition 2.10.

Examples 47 The following are all $v$-vertex graph properties: \begin{align*} \text{Conn}(G) &= \mathsf{True} \text{ if $G$ is connected;} \\ \text{3Col}(G) &= \mathsf{True} \text{ if $G$ is $3$-colourable;} \\ \text{Clique}_{k}(G) &= \mathsf{True} \text{ if $G$ is contains a clique on at least $k$ vertices;} \\ \mathrm{Maj}_{n}(G) &= \mathsf{True} \textstyle \text{ (assuming $n = \binom{v}{2}$ is odd) if $G$ has at least $\binom{v}{2}/2$ edges;} \\ \chi_{[n]}(G) &= \mathsf{True} \text{ if $G$ has an odd number of edges.} \\ \end{align*} Note that each of these actually defines a family of boolean functions, one for each value of $v$; this is the typical situation in the study of graph properties. An example of a function $f : \{\mathsf{True},\mathsf{False}\}^{\binom{v}{2}} \to \{\mathsf{True}, \mathsf{False}\}$ which is not a graph property is the one defined by $f(G) = \mathsf{True}$ if vertex #$1$ has at least one neighbour; this $f$ is not invariant under permuting the vertices.

Graph properties which are monotone are particularly nice to study; these are the ones for which adding edges can never make the property go from $\mathsf{True}$ to $\mathsf{False}$. The properties $\text{Conn}$, $\text{Clique}_{k}$, and $\mathrm{Maj}_n$ defined above are all monotone, as is $\neg\text{3Col}$. Take any monotone graph property, say $\text{Conn}$; a typical question in random graph theory would be, “how many edges does a graph need to have before it is likely to be connected?” Or more precisely, how does $\mathop{\bf Pr}_{\boldsymbol{G} \sim \mathcal{G}(v,p)}[\text{Conn}(\boldsymbol{G}) = \mathsf{True}]$ vary as $p$ increases from $0$ to $1$?

There’s no need to ask this question just for graph properties. Given any monotone boolean function $f : \{\mathsf{True},\mathsf{False}\}^n \to \{\mathsf{True}, \mathsf{False}\}$ it is intuitively clear that when $p$ increases from $0$ to $1$ this causes $\mathop{\bf Pr}_{\pi_p} [f({\boldsymbol{x}}) = \mathsf{True}]$ to increase from $0$ to $1$ (unless $f$ is a constant function). As illustration, we show a plot of $\mathop{\bf Pr}_{\pi_p} [f({\boldsymbol{x}}) = \mathsf{True}]$ versus $p$ for the dictator function, $\mathrm{AND}_2$, and $\mathrm{Maj}_{101}$.

Plot of Pr_{π_p}(f(x) = True) versus p for f a dictator (dotted), f = AND_2 (dashed), and f =Maj_{101} (solid)

The Margulis–Russo Formula quantifies the rate at which $\mathop{\bf Pr}_{\pi_p} [f({\boldsymbol{x}}) = \mathsf{True}]$ increases with $p$; specifically, it relates the slope of the curve at $p$ to the total influence of $f$ under $\pi_p^{\otimes n}$. To prove the formula we switch to $\pm 1$ notation.

Margulis–Russo Formula Let $f : \{-1,1\}^n \to {\mathbb R}$. Recalling Notation 43 and the relation $\mu = 1-2p$, we have $$\label{eqn:marg-russo1} \frac{d}{d\mu}\mathop{\bf E}[f^{(p)}] = \frac{1}{\sigma} \cdot \sum_{i=1}^n \widehat{f^{(p)}}(i).$$ In particular, if $f : \{-1,1\}^n \to \{-1,1\}$ is monotone then $$\label{eqn:marg-russo2} \frac{d}{dp}\,\mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[f({\boldsymbol{x}}) = -1] = \frac{d}{d\mu}\mathop{\bf E}[f^{(p)}] = \frac{1}{\sigma^2} \cdot \mathbf{I}[f^{(p)}].$$

Proof: Treating $f$ as a multilinear polynomial over $x_1, \dots, x_n$ we have $\mathop{\bf E}[f^{(p)}] = \mathrm{T}_\mu f(1, \dots, 1) = f(\mu, \dots, \mu)$ (this also follows from Exercise 1.5). By basic calculus, $\frac{d}{d\mu}f(\mu, \dots, \mu) = \sum_{i=1}^n \mathrm{D}_{x_i} f(\mu, \dots, \mu).$ But $\mathrm{D}_{x_i} f(\mu, \dots, \mu) = \mathop{\bf E}[\mathrm{D}_{x_i} f^{(p)}] = \frac{1}{\sigma} \mathop{\bf E}[\mathrm{D}_{\phi_i} f^{(p)}] = \frac{1}{\sigma} \widehat{f^{(p)}}(i),$ completing the proof of \eqref{eqn:marg-russo1}. As for \eqref{eqn:marg-russo2}, the second equality follows immediately from Proposition 45. The first equality holds because $\mu = 1-2p$ and $\mathop{\bf E}[f] = 1-2\mathop{\bf Pr}[f = -1]$; the two factors of $-2$ cancel. $\Box$

Looking again at the figure we see that the plot for $\mathrm{Maj}_{101}$ looks very much like a step function, jumping from nearly $0$ to nearly $1$ around the critical value $p = 1/2$. For $\mathrm{Maj}_n$, this “sharp threshold at $p = 1/2$” becomes more and more pronounced as $n$ increases. This is clearly suggested by the Margulis–Russo Formula: the derivative of the curve at $p = 1/2$ is equal to $\mathbf{I}[\mathrm{Maj}_n]$ (the usual, uniform-distribution total influence), which has the very large value $\Theta(\sqrt{n})$ (Theorem 2.32). Such sharp thresholds exist for many boolean functions; we give some examples:

Examples 48 In the exercises you are asked to show that for every $\epsilon > 0$ there is a $C$ such that $\mathop{\bf Pr}_{\pi_{1/2 - C/\sqrt{n}}}[\mathrm{Maj}_n = \mathsf{True}] \leq \epsilon, \quad \mathop{\bf Pr}_{\pi_{1/2 + C/\sqrt{n}}}[\mathrm{Maj}_n = \mathsf{True}] \geq 1 – \epsilon.$ Regarding the Erdős–Rényi graph model, the following facts are known: \begin{align*} \mathop{\bf Pr}_{\boldsymbol{G} \sim \mathcal{G}(v,p)}[\text{Clique}_{\log v}(\boldsymbol{G}) = \mathsf{True}] &\xrightarrow[v \to \infty]{} \begin{cases} 0 & \text{if } p < 1/4, \\ 1 & \text{if } p > 1/4. \end{cases} \\ \mathop{\bf Pr}_{\boldsymbol{G} \sim \mathcal{G}(v,p)}[\text{Conn}(\boldsymbol{G}) = \mathsf{True}] &\xrightarrow[v \to \infty]{} \begin{cases} 0 & \text{if } p < \frac{\ln v}{v}(1-\tfrac{\log \log v}{\log v}), \\ 1 & \text{if } p > \frac{\ln v}{v}(1+\tfrac{\log \log v}{\log v}). \end{cases} \\ \end{align*}

In the above examples you can see that the “jump” occurs at various values of $p$. To investigate this phenomenon, we first single out the value for which $\mathop{\bf Pr}_{\pi_p}[f({\boldsymbol{x}}) = \mathsf{True}] = 1/2$:

Definition 49 Let $f : \{\mathsf{True},\mathsf{False}\}^n \to \{\mathsf{True}, \mathsf{False}\}$ be monotone and nonconstant. The critical probability for $f$, denoted $p_c$, is the unique value in $(0,1)$ for which $\mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[f({\boldsymbol{x}}) = \mathsf{True}] = 1/2$. We also write $q_c = 1-p_c$, $\mu_c = q_c – p_c = 1-2p_c$, and $\sigma = \sqrt{4p_cq_c}$.

In the exercises you are asked to verify that $p_c$ is well defined.

Now suppose that $\Delta$ is the derivative of $\mathop{\bf Pr}_{\pi_p}[f({\boldsymbol{x}}) = \mathsf{True}]$ at $p = p_c$. Roughly speaking, we would expect $\mathop{\bf Pr}_{\pi_p}[f({\boldsymbol{x}}) = \mathsf{True}]$ to jump from near $0$ to near $1$ in an interval of around $p_c$ of width about $1/\Delta$. Thus a “sharp threshold” should roughly correspond to the case that $1/\Delta$ is small, even compared to $\min(p_c, q_c)$. The Margulis–Russo Formula says that $\Delta = \frac{1}{\sigma_c^2} \mathbf{I}[f^{(p_c)}]$, and since $\min(p_c,q_c)$ is proportional to $4p_cq_c = \sigma_c^2$ it follows that $1/\Delta$ is “small” compared to $\min(p_c,q_c)$ if and only if $\mathbf{I}[f^{(p_c)}]$ is “large”. Thus we have a neat criterion:

Sharp threshold principle: Let $f : \{\mathsf{True}, \mathsf{False}\}^n \to \{\mathsf{True}, \mathsf{False}\}$ be monotone. Then, roughly speaking, $\mathop{\bf Pr}_{\pi_p}[f({\boldsymbol{x}}) = \mathsf{True}]$ has a “sharp threshold” if and only if $f$ has “large” (“superconstant”) total influence under its critical probability distribution.

Of course this should all be made a bit more precise; see the exercises for details. In light of this principle, we may try to prove that a given $f$ has a sharp threshold by proving that $\mathbf{I}[f^{(p_c)}]$ is not “small”. This strongly motivates the problem of “characterizing” functions $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ for which $\mathbf{I}[f]$ is small. Friedgut’s Junta Theorem, mentioned at the end of Section 3.1 and proved in a later chapter, tells us that in the uniform distribution case $p = 1/2$, the only way $\mathbf{I}[f]$ can be small is if $f$ is close to a junta. In particular, any monotone graph property with $p_c = 1/2$ must have a very large derivative $\frac{d}{dp}\Pr_{\pi_p}[f = \mathsf{True}]$ at $p = p_c$: since the function is transitive-symmetric, all $n$ coordinates are equally influential and it can’t be close to a junta. These results also hold so long as $p$ is bounded away from $0$ and $1$, as we will show in Chapter 9. However many interesting monotone graph properties have $p_c$ very close to $0$: e.g. connectivity, as we saw in Examples 48. Characterizing the functions $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ with small $\mathbf{I}[f]$ when $p = o_n(1)$ is a trickier task; see the work of Friedgut, Bourgain, and Hatami described in Chapter 10.

### 8 comments to §8.4: $p$-biased analysis

• Deepak

A couple of small things:

Ex. 42, under equation (2), \pi_p^{\otimes n} can probably be \pi_p^{\otimes 3}.

Ex. 48, it should probably be \pi_{1/2 – C/\sqrt{n}} and \pi_{1/2 + C/\sqrt{n}} (right now you have C\sqrt{n}).
Also in Ex. 48, is there a reason to use both \ln v and \log v ?

And second to last sentence of the chapter “Examples 48″ should be “Example 48″.

Thanks!

• 1. Fixed, thanks
2. Fixed, thanks
3. For the first two occurrences, it’s important; ln is base e, log is base 2. For the final occurrences with log log v / log v, it doesn’t really matter, so I write log like I normally do when it doesn’t matter.
4. I guess the title of that paragraph is Examples 48… I’ve been writing it like that in the past even though I agree it looks kind of weird.

Thanks!

• Deepak

Cool! That all makes sense.

• Amirali

Is the hanging “a” right after the end of the first sentence deliberate?

• Just a stray typo. Thanks Amirali!

• Matt Franklin

In the first line of the proof of Proposition 8.45 (bottom of p. 214 in book),
it says “Using Definition 8.44′s notation and (8.7) we have …”. This is a
bit unclear, because (8.7) doesn’t seem to be needed here (maybe not needed
until the last line of the proof?).

• Absolutely right, thanks.

• Chin Ho Lee

In Example 42, the first Sel(x_1, x_2, x_2) should be Sel(x_1, x_2, x_3).