Perhaps the most common generalized domain in analysis of boolean functions is the case of the hypercube with “biased” bits.

In this setting we think of a random input in $\{-1,1\}^n$ as having each bit independently equal to $-1$ ($\mathsf{True}$) with probability $p \in (0,1)$ and equal to $1$ ($\mathsf{False}$) with probability $q = 1-p$. (We could also consider different parameters $p_i$ for each coordinate; see the exercises.) In the notation of the chapter this means $L^2(\Omega^n, \pi_p^{\otimes n})$, where $\Omega = \{-1,1\}$ and $\pi_p$ is the distribution on $\Omega$ defined by $\pi_p(-1) = p$, $\pi_p(1) = q$. This context is often referred to as *$p$-biased Fourier analysis*, though it would be more consistent with our terminology if it were called “$\mu$-biased”, where \[ \mu = \mathop{\bf E}_{{\boldsymbol{x}}_i \sim \pi_p}[{\boldsymbol{x}}_i] = q-p = 1-2p. \]

One of the more interesting features of the setting is that we can fix a combinatorial boolean function $f : \{-1,1\}^n \to \{-1,1\}$ and then consider its properties for various $p$ between $0$ and $1$; we will discuss this further later in this section. We will also sometimes use the abbreviated notation $\mathop{\bf Pr}_{\pi_p}[\cdot]$ in place of $\mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[\cdot]$, and similarly $\mathop{\bf E}_{\pi_p}[\cdot]$.

The $p$-biased hypercube is one of the generalized domains where it can pay to look at an explicit Fourier basis. In fact, since we have $|\Omega| = 2$ there is a *unique* Fourier basis $\{\phi_0, \phi_1\}$ (up to negating $\phi_1$). For notational simplicity we’ll write $\phi$ instead of $\phi_1$ and use “set notation” rather than multi-index notation:

Definition 39. In the context of $p$-biased Fourier analysis we define the basis function $\phi : \{-1,1\} \to {\mathbb R}$ by \[ \phi(x_i) = \frac{x_i - \mu}{\sigma}, \] where \[ \mu = \mathop{\bf E}_{{\boldsymbol{x}}_i \sim \pi_p}[{\boldsymbol{x}}_i] = q-p = 1- 2p, \quad \sigma = \mathop{\bf stddev}_{{\boldsymbol{x}}_i \sim \pi_p}[{\boldsymbol{x}}_i] = \sqrt{4pq} = 2\sqrt{p}\sqrt{1-p}. \] Note that $\sigma^2 = 1 - \mu^2$. We also have the formulas $\phi(1) = \sqrt{p/q}$, $\phi(-1) = -\sqrt{q/p}$.

We will use the notation $\mu$ and $\sigma$ throughout this section. It’s clear that $\{1, \phi\}$ is indeed a Fourier basis for $L^2(\{-1,1\}, \pi_p)$ because $\mathop{\bf E}[\phi({\boldsymbol{x}}_i)] = 0$ and $\mathop{\bf E}[\phi({\boldsymbol{x}}_i)^2] = 1$ by design.
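Here is a quick numerical sanity check of these facts (my own sketch, not part of the text): for a sample value of $p$ it confirms $\mathop{\bf E}[\phi] = 0$, $\mathop{\bf E}[\phi^2] = 1$, and the closed forms $\phi(1) = \sqrt{p/q}$, $\phi(-1) = -\sqrt{q/p}$.

```python
import math

p = 0.3                       # any p in (0,1) would do
q = 1 - p
mu = q - p                    # E[x] under pi_p (x = -1 w.p. p, x = +1 w.p. q)
sigma = 2 * math.sqrt(p * q)  # stddev of x under pi_p

def phi(x):
    return (x - mu) / sigma

# Expectations over pi_p: weight p on x = -1, weight q on x = +1.
E_phi  = p * phi(-1)    + q * phi(1)
E_phi2 = p * phi(-1)**2 + q * phi(1)**2
assert abs(E_phi) < 1e-12 and abs(E_phi2 - 1) < 1e-12

# The closed forms from Definition 39:
assert abs(phi(1) - math.sqrt(p / q)) < 1e-12
assert abs(phi(-1) + math.sqrt(q / p)) < 1e-12
```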

Definition 40. In the context of $L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ we define the product Fourier basis functions $(\phi_S)_{S \subseteq [n]}$ by \[ \phi_S(x) = \prod_{i \in S} \phi(x_i). \] Given $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ we write $\widehat{f}(S)$ for the associated Fourier coefficient; i.e., \[ \widehat{f}(S) = \mathop{\bf E}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[f({\boldsymbol{x}})\,\phi_S({\boldsymbol{x}})]. \] Thus we have the biased Fourier expansion \[ f(x) = \sum_{S \subseteq [n]} \widehat{f}(S)\,\phi_S(x). \]
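Definition 40 can be checked by brute force on a small example (a sketch of my own, not from the text; the function $f$ and the helper names `prob`, `phi_S`, `fhat` are just illustrative): compute every $\widehat{f}(S)$ via the expectation formula and verify that the biased Fourier expansion reconstructs $f$ pointwise.

```python
import itertools, math

p, n = 0.3, 2
q = 1 - p
mu, sigma = q - p, 2 * math.sqrt(p * q)
phi = lambda xi: (xi - mu) / sigma

def prob(x):                        # pi_p^{tensor n}(x); x_i = -1 has prob p
    return math.prod(p if xi == -1 else q for xi in x)

def phi_S(S, x):                    # product basis function (empty product = 1)
    return math.prod(phi(x[i]) for i in S)

f = lambda x: -1 if x == (-1, -1) else 1   # "AND" of the two True (= -1) bits

cube = list(itertools.product([-1, 1], repeat=n))
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]

# fhat(S) = E[f(x) phi_S(x)] under pi_p^{tensor n}
fhat = {S: sum(prob(x) * f(x) * phi_S(S, x) for x in cube) for S in subsets}

# The biased Fourier expansion is exact at every point of the cube:
for x in cube:
    recon = sum(fhat[S] * phi_S(S, x) for S in subsets)
    assert abs(recon - f(x)) < 1e-12
```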

Although the notation is very similar to that of the classic uniform-distribution Fourier analysis, we caution that in general, \[ \phi_S \phi_T \neq \phi_{S \triangle T}. \]

Example 41. Let $\chi_i \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ be the $i$th dictator function, $\chi_i(x) = x_i$, viewed under the $p$-biased distribution. We have \[ \phi(x_i) = \frac{x_i - \mu}{\sigma} \quad\Rightarrow\quad x_i = \mu + \sigma\phi(x_i), \] and the latter is evidently $\chi_i$’s (biased) Fourier expansion. I.e., \[ \widehat{\chi_i}(\emptyset) = \mu, \quad \widehat{\chi_i}(\{i\}) = \sigma, \quad \widehat{\chi_i}(S) = 0 \text{ otherwise}. \]

This example lets us see a link between a function’s “usual” Fourier expansion and its biased Fourier expansion. (For more on this, see the exercises.) Let’s abuse notation a little by writing simply $\phi_i$ instead of $\phi(x_i)$. We have the formulas \begin{equation} \label{eqn:p-biased-substitution} \phi_i = \frac{x_i - \mu}{\sigma} \quad\Leftrightarrow\quad x_i = \mu + \sigma \phi_i, \end{equation} and we can go from the usual Fourier expansion to the biased Fourier expansion simply by plugging in the latter.

Example 42. Recall the “selection function” $\mathrm{Sel} : \{-1,1\}^3 \to \{-1,1\}$ from Exercise 1.1(j); $\mathrm{Sel}(x_1, x_2, x_3)$ outputs $x_2$ if $x_1 = -1$ and outputs $x_3$ if $x_1 = 1$. The usual Fourier expansion of $\mathrm{Sel}$ is \[ \mathrm{Sel}(x_1, x_2, x_3) = \tfrac{1}{2} x_2 + \tfrac{1}{2} x_3 -\tfrac{1}{2} x_1 x_2 + \tfrac{1}{2} x_1 x_3. \] Using the substitution from \eqref{eqn:p-biased-substitution} we get \begin{align} \mathrm{Sel}(x_1, x_2, x_3) &= \tfrac{1}{2} (\mu + \sigma \phi_2) + \tfrac{1}{2} (\mu + \sigma \phi_3) -\tfrac{1}{2} (\mu + \sigma \phi_1) (\mu + \sigma \phi_2) + \tfrac{1}{2} (\mu + \sigma \phi_1) (\mu + \sigma \phi_3) \nonumber\\ &= \mu + (\tfrac{1}{2} - \tfrac{1}{2} \mu)\sigma \, \phi_{2} + (\tfrac{1}{2} + \tfrac{1}{2} \mu) \sigma \, \phi_{3} - \tfrac{1}{2} \sigma^2 \, \phi_1 \phi_2 + \tfrac{1}{2} \sigma^2 \, \phi_1 \phi_3. \label{eqn:sel-biased} \end{align} Thus if we write $\mathrm{Sel}^{(p)}$ for the selection function thought of as an element of $L^2(\{-1,1\}^3, \pi_p^{\otimes 3})$, we have \begin{gather*} \widehat{\mathrm{Sel}^{(p)}}(\emptyset) = \mu, \quad \widehat{\mathrm{Sel}^{(p)}}(2) = (\tfrac{1}{2} - \tfrac{1}{2} \mu)\sigma, \quad \widehat{\mathrm{Sel}^{(p)}}(3) = (\tfrac{1}{2} + \tfrac{1}{2} \mu)\sigma, \\ \widehat{\mathrm{Sel}^{(p)}}(\{1,2\}) = -\tfrac{1}{2}\sigma^2, \quad \widehat{\mathrm{Sel}^{(p)}}(\{1,3\}) = \tfrac{1}{2}\sigma^2, \quad \widehat{\mathrm{Sel}^{(p)}}(S) = 0 \text{ else}. \end{gather*} By the Fourier formulas of Section 2 we can deduce, e.g., that $\mathop{\bf E}[\mathrm{Sel}^{(p)}] = \mu$, $\mathbf{Inf}_1[\mathrm{Sel}^{(p)}] = (-\tfrac{1}{2} \sigma^2)^2 + (\tfrac{1}{2} \sigma^2)^2 = \tfrac{1}{2} \sigma^4$, etc.
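As a quick double-check of Example 42 (my own sketch, not part of the text), we can compute the $p$-biased coefficients of $\mathrm{Sel}$ numerically via $\widehat{f}(S) = \mathop{\bf E}[f\,\phi_S]$ and compare them with the closed forms obtained by substitution. Indices are $0$-based in the code, so e.g. the tuple `(0, 1)` stands for the set $\{1,2\}$.

```python
import itertools, math

p = 0.3
q = 1 - p
mu, sigma = q - p, 2 * math.sqrt(p * q)
phi = lambda xi: (xi - mu) / sigma

def Sel(x):                  # outputs x2 if x1 = -1, else x3
    return x[1] if x[0] == -1 else x[2]

def prob(x):
    return math.prod(p if xi == -1 else q for xi in x)

def fhat(S):                 # E[Sel(x) * phi_S(x)] under pi_p^{tensor 3}
    return sum(prob(x) * Sel(x) * math.prod(phi(x[i]) for i in S)
               for x in itertools.product([-1, 1], repeat=3))

assert abs(fhat(()) - mu) < 1e-12                         # empty set
assert abs(fhat((1,)) - (0.5 - 0.5 * mu) * sigma) < 1e-12  # coefficient on phi_2
assert abs(fhat((2,)) - (0.5 + 0.5 * mu) * sigma) < 1e-12  # coefficient on phi_3
assert abs(fhat((0,))) < 1e-12                             # phi_1 coefficient is 0
assert abs(fhat((0, 1)) + 0.5 * sigma**2) < 1e-12          # phi_1 phi_2
assert abs(fhat((0, 2)) - 0.5 * sigma**2) < 1e-12          # phi_1 phi_3
```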

Let’s codify a piece of notation from this example:

Notation 43. Let $f : \{-1,1\}^n \to {\mathbb R}$ and let $p \in (0,1)$. We write $f^{(p)}$ for $f$ when viewed as an element of $L^2(\{-1,1\}^n, \pi_{p}^{\otimes n})$.

We now discuss derivative operators. We would like to define an operator $\mathrm{D}_i$ on $L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ which acts like differentiation on the biased Fourier expansion. E.g., referring to \eqref{eqn:sel-biased} we would like to have \[ \mathrm{D}_3 \mathrm{Sel}^{(p)} = (\tfrac{1}{2} + \tfrac{1}{2} \mu) \sigma + \tfrac{1}{2} \sigma^2 \, \phi_1. \] In general we are seeking $\frac{\partial}{\partial \phi_i}$ which, by basic calculus and the relationship \eqref{eqn:p-biased-substitution}, satisfies \[ \frac{\partial}{\partial \phi_i} = \frac{\partial x_i}{\partial \phi_i} \cdot \frac{\partial}{\partial x_i} = \sigma \cdot \frac{\partial}{\partial x_i}. \] Recognizing $\frac{\partial}{\partial x_i}$ as the “usual” $i$th derivative operator, we are led to the following:

Definition 44. For $i \in [n]$, the $i$th *(discrete) derivative* operator $\mathrm{D}_i$ on $L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ is defined by \[ \mathrm{D}_i f(x) = \sigma \cdot \frac{f(x^{(i\mapsto 1)}) - f(x^{(i \mapsto -1)})}{2}. \] Note that this defines a different operator for each value of $p$. We sometimes write the above definition as \[ \mathrm{D}_{\phi_i} = \sigma \cdot \mathrm{D}_{x_i}. \] With respect to the biased Fourier expansion of $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ the operator $\mathrm{D}_i$ satisfies \begin{equation} \label{eqn:p-biased-deriv} \mathrm{D}_i f = \sum_{S \ni i} \widehat{f}(S)\,\phi_{S \setminus \{i\}}. \end{equation}
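We can check this pointwise on the running example (my own sketch, not from the text): $\mathrm{D}_3 \mathrm{Sel}^{(p)}(x)$, computed from the defining difference quotient, should agree with the Fourier-side expression $(\tfrac12 + \tfrac12\mu)\sigma + \tfrac12\sigma^2\,\phi_1$ at every point of the cube.

```python
import itertools, math

p = 0.3
q = 1 - p
mu, sigma = q - p, 2 * math.sqrt(p * q)
phi = lambda xi: (xi - mu) / sigma

def Sel(x):
    return x[1] if x[0] == -1 else x[2]

def D(i, f, x):
    # D_{phi_i} f(x) = sigma * (f(x^{i -> 1}) - f(x^{i -> -1})) / 2
    hi = x[:i] + (1,)  + x[i + 1:]
    lo = x[:i] + (-1,) + x[i + 1:]
    return sigma * (f(hi) - f(lo)) / 2

for x in itertools.product([-1, 1], repeat=3):
    # Fourier side: differentiate the biased expansion of Sel w.r.t. phi_3.
    fourier_side = (0.5 + 0.5 * mu) * sigma + 0.5 * sigma**2 * phi(x[0])
    assert abs(D(2, Sel, x) - fourier_side) < 1e-12
```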

Given this definition we can derive some additional formulas for influences, including a generalization of Proposition 2.20:

Proposition 45Suppose $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ is boolean-valued (i.e., has range $\{-1,1\}$). Then \[ \mathbf{Inf}_i[f] = \sigma^2 \mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[f({\boldsymbol{x}}) \neq f({\boldsymbol{x}}^{\oplus i})] \] for each $i \in [n]$. If furthermore $f$ is monotone then $\mathbf{Inf}_i[f] = \sigma \widehat{f}(i)$.

*Proof:* Using Definition 44’s notation and \eqref{eqn:p-biased-deriv} we have \[ \mathbf{Inf}_i[f] = \mathop{\bf E}_{\pi_p}[(\mathrm{D}_{\phi_i} f)^2] = \sigma^2 \mathop{\bf E}_{\pi_p}[(\mathrm{D}_{x_i} f)^2]. \] Since $(\mathrm{D}_{x_i} f)^2$ is the $0$-$1$ indicator that $i$ is pivotal for $f$, the first formula follows. When $f$ is monotone we furthermore have that $(\mathrm{D}_{x_i} f)^2 = \mathrm{D}_{x_i} f$ and hence \[ \mathbf{Inf}_i[f] = \sigma^2 \mathop{\bf E}_{\pi_p}[\mathrm{D}_{x_i} f] = \sigma \mathop{\bf E}_{\pi_p}[\mathrm{D}_{\phi_i} f] = \sigma \widehat{f}(i), \] as claimed. $\Box$
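Proposition 45 is also easy to verify numerically (a sketch of my own, not part of the text), say for $\mathrm{Maj}_3$, which is monotone: we compute $\mathbf{Inf}_i[f] = \sum_{S \ni i}\widehat{f}(S)^2$ by brute force and compare it against both $\sigma^2 \mathop{\bf Pr}[f({\boldsymbol{x}}) \neq f({\boldsymbol{x}}^{\oplus i})]$ and $\sigma\widehat{f}(i)$.

```python
import itertools, math

p = 0.3
q = 1 - p
mu, sigma = q - p, 2 * math.sqrt(p * q)
phi = lambda xi: (xi - mu) / sigma

maj = lambda x: 1 if sum(x) > 0 else -1    # Maj_3 (monotone)
cube = list(itertools.product([-1, 1], repeat=3))
prob = lambda x: math.prod(p if xi == -1 else q for xi in x)

def fhat(S):
    return sum(prob(x) * maj(x) * math.prod(phi(x[i]) for i in S) for x in cube)

subsets = [S for r in range(4) for S in itertools.combinations(range(3), r)]
for i in range(3):
    inf_i = sum(fhat(S)**2 for S in subsets if i in S)   # sum_{S ni i} fhat(S)^2
    # Pr over pi_p that flipping coordinate i changes the value of maj:
    pivotal = sum(prob(x) for x in cube
                  if maj(x) != maj(x[:i] + (-x[i],) + x[i + 1:]))
    assert abs(inf_i - sigma**2 * pivotal) < 1e-12       # first formula
    assert abs(inf_i - sigma * fhat((i,))) < 1e-12       # monotone case
```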

The remainder of this section is devoted to the topic of *threshold phenomena* in boolean functions. Much of the motivation for this comes from the theory of random graphs, which we now briefly introduce.

Definition 46. Given an undirected graph $G$ on $v \geq 2$ vertices, we identify it with the string in $\{\mathsf{True}, \mathsf{False}\}^{\binom{v}{2}}$ which indicates which edges are present ($\mathsf{True}$) and which are absent ($\mathsf{False}$). We write $\mathcal{G}(v,p)$ for the distribution $\pi_{p}^{\otimes \binom{v}{2}}$; this is called the *Erdős–Rényi random graph model*. Note that if we permute the $v$ vertices of a graph, this induces a permutation on the $\binom{v}{2}$ edges. A ($v$-vertex) *graph property* is a boolean function $f : \{\mathsf{True}, \mathsf{False}\}^{\binom{v}{2}} \to \{\mathsf{True}, \mathsf{False}\}$ which is invariant under all $v!$ such permutations of its input; colloquially, this means that $f$ “does not depend on the names of the vertices”.

Graph properties are always transitive-symmetric functions in the sense of Definition 2.10.

Examples 47. The following are all $v$-vertex graph properties: \begin{align*} \text{Conn}(G) &= \mathsf{True} \text{ if $G$ is connected;} \\ \text{3Col}(G) &= \mathsf{True} \text{ if $G$ is $3$-colourable;} \\ \text{Clique}_{k}(G) &= \mathsf{True} \text{ if $G$ contains a clique on at least $k$ vertices;} \\ \mathrm{Maj}_{n}(G) &= \mathsf{True} \textstyle \text{ (assuming $n = \binom{v}{2}$ is odd) if $G$ has at least $\binom{v}{2}/2$ edges;} \\ \chi_{[n]}(G) &= \mathsf{True} \text{ if $G$ has an odd number of edges.} \\ \end{align*} Note that each of these actually defines a family of boolean functions, one for each value of $v$; this is the typical situation in the study of graph properties. An example of a function $f : \{\mathsf{True},\mathsf{False}\}^{\binom{v}{2}} \to \{\mathsf{True}, \mathsf{False}\}$ which is *not* a graph property is the one defined by $f(G) = \mathsf{True}$ if vertex #$1$ has at least one neighbour; this $f$ is not invariant under permuting the vertices.

Graph properties which are *monotone* are particularly nice to study; these are the ones for which adding edges can never make the property go from $\mathsf{True}$ to $\mathsf{False}$. The properties $\text{Conn}$, $\text{Clique}_{k}$, and $\mathrm{Maj}_n$ defined above are all monotone, as is $\neg\text{3Col}$. Take any monotone graph property, say $\text{Conn}$; a typical question in random graph theory would be, “how many edges does a graph need to have before it is likely to be connected?” Or more precisely, how does $\mathop{\bf Pr}_{\boldsymbol{G} \sim \mathcal{G}(v,p)}[\text{Conn}(\boldsymbol{G}) = \mathsf{True}]$ vary as $p$ increases from $0$ to $1$?

There’s no need to ask this question just for graph properties. Given any monotone boolean function $f : \{\mathsf{True},\mathsf{False}\}^n \to \{\mathsf{True}, \mathsf{False}\}$ it is intuitively clear that when $p$ increases from $0$ to $1$ this causes $\mathop{\bf Pr}_{\pi_p} [f({\boldsymbol{x}}) = \mathsf{True}]$ to increase from $0$ to $1$ (unless $f$ is a constant function). As an illustration, we show a plot of $\mathop{\bf Pr}_{\pi_p} [f({\boldsymbol{x}}) = \mathsf{True}]$ versus $p$ for the dictator function, $\mathrm{AND}_2$, and $\mathrm{Maj}_{101}$.

The *Margulis–Russo Formula* quantifies the rate at which $\mathop{\bf Pr}_{\pi_p} [f({\boldsymbol{x}}) = \mathsf{True}]$ increases with $p$; specifically, it relates the slope of the curve at $p$ to the total influence of $f$ under $\pi_p^{\otimes n}$. To prove the formula we switch to $\pm 1$ notation.

Margulis–Russo Formula. Let $f : \{-1,1\}^n \to {\mathbb R}$. Recalling Notation 43 and the relation $\mu = 1-2p$, we have \begin{equation} \label{eqn:marg-russo1} \frac{d}{d\mu}\mathop{\bf E}[f^{(p)}] = \frac{1}{\sigma} \cdot \sum_{i=1}^n \widehat{f^{(p)}}(i). \end{equation} In particular, if $f : \{-1,1\}^n \to \{-1,1\}$ is monotone then \begin{equation} \label{eqn:marg-russo2} \frac{d}{dp}\,\mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[f({\boldsymbol{x}}) = -1] = \frac{d}{d\mu}\mathop{\bf E}[f^{(p)}] = \frac{1}{\sigma^2} \cdot \mathbf{I}[f^{(p)}]. \end{equation}

*Proof:* Treating $f$ as a multilinear polynomial over $x_1, \dots, x_n$ we have \[ \mathop{\bf E}[f^{(p)}] = \mathrm{T}_\mu f(1, \dots, 1) = f(\mu, \dots, \mu) \] (this also follows from Exercise 1.5). By basic calculus, \[ \frac{d}{d\mu}f(\mu, \dots, \mu) = \sum_{i=1}^n \mathrm{D}_{x_i} f(\mu, \dots, \mu). \] But \[ \mathrm{D}_{x_i} f(\mu, \dots, \mu) = \mathop{\bf E}[\mathrm{D}_{x_i} f^{(p)}] = \frac{1}{\sigma} \mathop{\bf E}[\mathrm{D}_{\phi_i} f^{(p)}] = \frac{1}{\sigma} \widehat{f^{(p)}}(i), \] completing the proof of \eqref{eqn:marg-russo1}. As for \eqref{eqn:marg-russo2}, the second equality follows immediately from Proposition 45. The first equality holds because $\mu = 1-2p$ and $\mathop{\bf E}[f] = 1-2\mathop{\bf Pr}[f = -1]$; the two factors of $-2$ cancel. $\Box$
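A finite-difference check of \eqref{eqn:marg-russo2} for $\mathrm{Maj}_3$ (my own sketch, not part of the text; the helper names are just illustrative): differentiate $\mathop{\bf Pr}_{\pi_p}[\mathrm{Maj}_3 = -1]$ numerically in $p$ and compare with $\mathbf{I}[f^{(p)}]/\sigma^2$, computed from the pivotality formula for influences.

```python
import itertools, math

maj = lambda x: 1 if sum(x) > 0 else -1
cube = list(itertools.product([-1, 1], repeat=3))

def pr_true(p):
    # Pr_{pi_p}[maj = -1]  (recall -1 encodes True and has probability p)
    q = 1 - p
    return sum(math.prod(p if xi == -1 else q for xi in x)
               for x in cube if maj(x) == -1)

def total_influence(p):
    # I[f^(p)] = sum_i sigma^2 * Pr[i pivotal]  (Proposition 45)
    q = 1 - p
    sigma2 = 4 * p * q
    prob = lambda x: math.prod(p if xi == -1 else q for xi in x)
    I = 0.0
    for i in range(3):
        for x in cube:
            y = x[:i] + (-x[i],) + x[i + 1:]
            if maj(x) != maj(y):
                I += prob(x) * sigma2
    return I

p, h = 0.3, 1e-6
lhs = (pr_true(p + h) - pr_true(p - h)) / (2 * h)   # numerical d/dp
rhs = total_influence(p) / (4 * p * (1 - p))        # I[f^(p)] / sigma^2
assert abs(lhs - rhs) < 1e-6
```

(For $\mathrm{Maj}_3$ one can also check by hand: $\mathop{\bf Pr}[\mathrm{Maj}_3 = -1] = 3p^2 - 2p^3$, whose derivative $6p(1-p)$ indeed equals $3\cdot\sigma^2\cdot 2pq / \sigma^2 = 6pq$.)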

Looking again at the figure we see that the plot for $\mathrm{Maj}_{101}$ looks very much like a step function, jumping from nearly $0$ to nearly $1$ around the critical value $p = 1/2$. For $\mathrm{Maj}_n$, this “sharp threshold at $p = 1/2$” becomes more and more pronounced as $n$ increases. This is clearly suggested by the Margulis–Russo Formula: the derivative of the curve at $p = 1/2$ is equal to $\mathbf{I}[\mathrm{Maj}_n]$ (the usual, uniform-distribution total influence), which has the very large value $\Theta(\sqrt{n})$ (Theorem 2.32). Such sharp thresholds exist for many boolean functions; we give some examples:

Examples 48. In the exercises you are asked to show that for every $\epsilon > 0$ there is a $C$ such that \[ \mathop{\bf Pr}_{\pi_{1/2 - C/\sqrt{n}}}[\mathrm{Maj}_n = \mathsf{True}] \leq \epsilon, \quad \mathop{\bf Pr}_{\pi_{1/2 + C/\sqrt{n}}}[\mathrm{Maj}_n = \mathsf{True}] \geq 1 - \epsilon. \] Regarding the Erdős–Rényi graph model, the following facts are known: \begin{align*} \mathop{\bf Pr}_{\boldsymbol{G} \sim \mathcal{G}(v,p)}[\text{Clique}_{\log v}(\boldsymbol{G}) = \mathsf{True}] &\xrightarrow[v \to \infty]{} \begin{cases} 0 & \text{if } p < 1/4, \\ 1 & \text{if } p > 1/4. \end{cases} \\ \mathop{\bf Pr}_{\boldsymbol{G} \sim \mathcal{G}(v,p)}[\text{Conn}(\boldsymbol{G}) = \mathsf{True}] &\xrightarrow[v \to \infty]{} \begin{cases} 0 & \text{if } p < \frac{\ln v}{v}(1-\tfrac{\log \log v}{\log v}), \\ 1 & \text{if } p > \frac{\ln v}{v}(1+\tfrac{\log \log v}{\log v}). \end{cases} \\ \end{align*}
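The $\mathrm{Maj}_n$ claim is easy to see concretely (my own sketch, not from the text): $\mathop{\bf Pr}_{\pi_p}[\mathrm{Maj}_n = \mathsf{True}]$ is a binomial tail probability, and already at $C = 2$, $n = 101$, shifting $p$ by $C/\sqrt{n}$ swings it from near $0$ to near $1$.

```python
import math

def pr_maj_true(n, p):
    # Maj_n is True iff more than half of the n independent prob-p bits are True.
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n + 1) // 2, n + 1))

n, C = 101, 2.0
lo = pr_maj_true(n, 0.5 - C / math.sqrt(n))
hi = pr_maj_true(n, 0.5 + C / math.sqrt(n))
assert lo < 0.05 and hi > 0.95   # the threshold at p = 1/2 is already sharp
```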

In the above examples you can see that the “jump” occurs at various values of $p$. To investigate this phenomenon, we first single out the value for which $\mathop{\bf Pr}_{\pi_p}[f({\boldsymbol{x}}) = \mathsf{True}] = 1/2$:

Definition 49. Let $f : \{\mathsf{True},\mathsf{False}\}^n \to \{\mathsf{True}, \mathsf{False}\}$ be monotone and nonconstant. The *critical probability* for $f$, denoted $p_c$, is the unique value in $(0,1)$ for which $\mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi_p^{\otimes n}}[f({\boldsymbol{x}}) = \mathsf{True}] = 1/2$. We also write $q_c = 1-p_c$, $\mu_c = q_c - p_c = 1-2p_c$, and $\sigma_c = \sqrt{4p_cq_c}$.

In the exercises you are asked to verify that $p_c$ is well defined.
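Since $\mathop{\bf Pr}_{\pi_p}[f = \mathsf{True}]$ is strictly increasing in $p$ for a monotone nonconstant $f$, $p_c$ can be found by bisection. A tiny sketch of my own (not from the text), using $\mathrm{AND}_2$, where $\mathop{\bf Pr}_{\pi_p}[f = \mathsf{True}] = p^2$ and so $p_c = 1/\sqrt{2}$:

```python
import math

def pr_true(p):
    # AND_2 is True iff both independent prob-p bits are True.
    return p * p

lo, hi = 0.0, 1.0
for _ in range(60):              # monotone in p, so bisection converges
    mid = (lo + hi) / 2
    if pr_true(mid) < 0.5:
        lo = mid
    else:
        hi = mid
p_c = (lo + hi) / 2
assert abs(p_c - 1 / math.sqrt(2)) < 1e-9   # p_c = 1/sqrt(2) for AND_2
```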

Now suppose that $\Delta$ is the derivative of $\mathop{\bf Pr}_{\pi_p}[f({\boldsymbol{x}}) = \mathsf{True}]$ at $p = p_c$. Roughly speaking, we would expect $\mathop{\bf Pr}_{\pi_p}[f({\boldsymbol{x}}) = \mathsf{True}]$ to jump from near $0$ to near $1$ in an interval around $p_c$ of width about $1/\Delta$. Thus a “sharp threshold” should roughly correspond to the case that $1/\Delta$ is small, even compared to $\min(p_c, q_c)$. The Margulis–Russo Formula says that $\Delta = \frac{1}{\sigma_c^2} \mathbf{I}[f^{(p_c)}]$, and since $\min(p_c,q_c)$ is proportional to $4p_cq_c = \sigma_c^2$ it follows that $1/\Delta$ is “small” compared to $\min(p_c,q_c)$ if and only if $\mathbf{I}[f^{(p_c)}]$ is “large”. Thus we have a neat criterion:

**Sharp threshold principle:** *Let $f : \{\mathsf{True}, \mathsf{False}\}^n \to \{\mathsf{True}, \mathsf{False}\}$ be monotone. Then, roughly speaking, $\mathop{\bf Pr}_{\pi_p}[f({\boldsymbol{x}}) = \mathsf{True}]$ has a “sharp threshold” if and only if $f$ has “large” (“superconstant”) total influence under its critical probability distribution.*

Of course this should all be made a bit more precise; see the exercises for details. In light of this principle, we may try to prove that a given $f$ has a sharp threshold by proving that $\mathbf{I}[f^{(p_c)}]$ is not “small”. This strongly motivates the problem of “characterizing” functions $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ for which $\mathbf{I}[f]$ is small. Friedgut’s Junta Theorem, mentioned at the end of Section 3.1 and proved in a later chapter, tells us that in the uniform distribution case $p = 1/2$, the only way $\mathbf{I}[f]$ can be small is if $f$ is close to a junta. In particular, any monotone graph property with $p_c = 1/2$ must have a very large derivative $\frac{d}{dp}\Pr_{\pi_p}[f = \mathsf{True}]$ at $p = p_c$: since the function is transitive-symmetric, all $n$ coordinates are equally influential and it can’t be close to a junta. These results also hold so long as $p$ is bounded away from $0$ and $1$, as we will show in Chapter 9. However, many interesting monotone graph properties have $p_c$ very close to $0$: e.g. connectivity, as we saw in Examples 48. Characterizing the functions $f \in L^2(\{-1,1\}^n, \pi_p^{\otimes n})$ with small $\mathbf{I}[f]$ when $p = o_n(1)$ is a trickier task; see the work of Friedgut, Bourgain, and Hatami described in Chapter 10.

A couple of small things:

Ex. 42, under equation (2), \pi_p^{\otimes n} can probably be \pi_p^{\otimes 3}.

Ex. 48, it should probably be \pi_{1/2 – C/\sqrt{n}} and \pi_{1/2 + C/\sqrt{n}} (right now you have C\sqrt{n}).

Also in Ex. 48, is there a reason to use both \ln v and \log v ?

And second to last sentence of the chapter, “Examples 48” should be “Example 48”.

Thanks!

1. Fixed, thanks

2. Fixed, thanks

3. For the first two occurrences, it’s important; ln is base e, log is base 2. For the final occurrences with log log v / log v, it doesn’t really matter, so I write log like I normally do when it doesn’t matter.

4. I guess the title of that paragraph is Examples 48… I’ve been writing it like that in the past even though I agree it looks kind of weird.

Thanks!

Cool! That all makes sense.

Is the hanging “a” right after the end of the first sentence deliberate?

Just a stray typo. Thanks Amirali!