Chapter 8 exercises, continued

  24. Fix a small constant $0 < \epsilon < 1/2$. Let $f : \{\mathsf{True}, \mathsf{False}\}^n \to \{\mathsf{True}, \mathsf{False}\}$ be a nonconstant monotone function. Let $p_0$ (respectively, $p_c$, $p_1$) be the unique value of $p \in (0,1)$ such that $\mathop{\bf Pr}_{\pi_{p}}[f({\boldsymbol{x}}) = \mathsf{True}] = \epsilon$ (respectively, $1/2$, $1-\epsilon$). (This is a valid definition by Exercise 23.) The threshold interval for $f$ is defined to be $[p_0, p_1]$, and $\delta = p_1 - p_0$ is the threshold width. Now let $(f_n)_{n \in {\mathbb N}}$ be a sequence of nonconstant monotone boolean functions (usually “naturally related”, with $f_n$’s input length an increasing function of $n$), and define the sequences $p_0(n)$, $p_c(n)$, $p_1(n)$, $\delta(n)$ accordingly. We say that the family $(f_n)$ has a sharp threshold if $\delta(n)/(4p_c(n)(1-p_c(n))) \to 0$ as $n \to \infty$; otherwise, we say it has a coarse threshold. Show that if $(f_n)$ has a coarse threshold then there exist $C < \infty$, an infinite sequence $n_1 < n_2 < n_3 < \cdots$, and a sequence $(p(n_i))_{i \in {\mathbb N}}$ such that:
    • $\epsilon < \mathop{\bf Pr}_{\pi_{p(n_i)}}[f_{n_i}({\boldsymbol{x}}) = \mathsf{True}] < 1-\epsilon$ for all $i$;
    • $\mathbf{I}[f_{n_i}^{(p(n_i))}] \leq C$ for all $i$.

    (Hint: Margulis–Russo and the Mean Value Theorem.)

  25. Let $f : \{-1,1\}^n \to \{-1,1\}$ be a nonconstant monotone function and let $F : [0,1] \to [0,1]$ be the (strictly increasing) function defined by $F(p) = \mathop{\bf Pr}_{\pi_p}[f({\boldsymbol{x}}) = -1]$. Let $p_c$ be the critical probability such that $F(p_c) = 1/2$. Assume that $p_c \leq 1/2$. (This is without loss of generality since we can replace $f$ by $f^\dagger$. We often think of $p_c \ll 1/2$.) The goal of this exercise is to show a weak kind of threshold result: roughly speaking, $F(p) = o(1)$ when $p = o(p_c)$ and $F(p) = 1-o(1)$ when $p = \omega(p_c)$.
    (a) Using the Margulis–Russo Formula and the Poincaré inequality, show that for all $0 < p < 1$, \[ F'(p) \geq \frac{F(p)(1-F(p))}{p(1-p)}. \]
    (b) Show that for all $p \leq p_c$ we have $F'(p) \geq \frac{F(p)}{2p}$, and hence $\frac{d}{dp} \ln F(p) \geq \frac{1}{2p}$.
    (c) Deduce that for any $0 \leq p_0 \leq p_c$ we have $F(p_0) \leq \tfrac{1}{2} \sqrt{p_0/p_c}$; i.e., $F(p_0) \leq \epsilon$ if $p_0 \leq (2\epsilon)^2 p_c$.
    (d) Show that the factor $(2\epsilon)^2$ can be improved to $\Theta(\tau) \epsilon^{1+\tau}$ for any small constant $\tau > 0$. (Hint: the quadratic dependence on $\epsilon$ arose because we used $1-F(p) \geq 1/2$ for $p \leq p_c$; but from part (c) we have the improved bound $1-F(p) \geq 1-\tau$ once $p \leq (2\tau)^2 p_c$.)
    (e) In the other direction, show that so long as $p_1 = \frac{1}{(2\epsilon)^2} p_c \leq 1/2$ we have $F(p_1) \geq 1-\epsilon$. (Hint: work with $\ln(1-F(p))$.) In case $p_1 \leq 1/2$ does not hold, show that we at least have $F(1/2) \geq 1 - \sqrt{p_c/2}$.
    (f) The bounds in part (e) are not very interesting when $p_c$ is close to $1/2$. Show that we also have $F(1-\delta) \geq 1 - \sqrt{\delta/2}$ (even when $p_c = 1/2$).
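
    A quick numeric sanity check of parts (a) and (c), not part of the exercise: for $f = \mathrm{OR}_n$ everything is available in closed form, namely $F(p) = 1 - (1-p)^n$ and $p_c = 1 - 2^{-1/n}$.

```python
import numpy as np

# Check parts (a) and (c) of the exercise for f = OR_n, using the
# closed forms F(p) = 1 - (1-p)^n and p_c = 1 - 2^(-1/n).
n = 50
F = lambda p: 1 - (1 - p) ** n            # F(p) = Pr_{pi_p}[OR_n = True]
dF = lambda p: n * (1 - p) ** (n - 1)     # F'(p)
p_c = 1 - 2 ** (-1 / n)

ps = np.linspace(1e-4, 0.999, 2000)
# Part (a):  F'(p) >= F(p)(1 - F(p)) / (p(1 - p))  for all p.
assert np.all(dF(ps) >= F(ps) * (1 - F(ps)) / (ps * (1 - ps)) - 1e-9)

# Part (c):  F(p0) <= (1/2) sqrt(p0 / p_c)  whenever p0 <= p_c.
p0s = np.linspace(1e-6, p_c, 2000)
assert np.all(F(p0s) <= 0.5 * np.sqrt(p0s / p_c) + 1e-9)
print("OK: parts (a) and (c) hold numerically for OR_%d" % n)
```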
  26. Consider the sequence of functions $f_n : \{\mathsf{True}, \mathsf{False}\}^n \to \{\mathsf{True}, \mathsf{False}\}$ defined for all odd $n \geq 3$ as follows: $f_n(x_1, \dots, x_n) = \mathrm{Maj}_3(x_1, x_2, \mathrm{Maj}_{n-2}(x_3, \dots, x_n))$.
    (a) Show that $f_n$ is monotone and has critical probability $p_c = 1/2$.
    (b) Sketch a plot of $\mathop{\bf Pr}_{\pi_p}[f_n({\boldsymbol{x}}) = \mathsf{True}]$ versus $p$ (assuming $n$ very large). (A script for generating this plot exactly appears after this exercise.)
    (c) Show that $\mathbf{I}[f_n] = \Theta(\sqrt{n})$.
    (d) Show that the sequence $f_n$ has a coarse threshold as defined in Exercise 24 (assuming $\epsilon < 1/4$).
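
    For part (b), the curve can also be computed exactly rather than sketched freehand: conditioning on whether $x_1 = x_2$ gives $\mathop{\bf Pr}_{\pi_p}[f_n({\boldsymbol{x}}) = \mathsf{True}] = p^2 + 2p(1-p)\mathop{\bf Pr}_{\pi_p}[\mathrm{Maj}_{n-2} = \mathsf{True}]$. A plotting sketch (matplotlib assumed):

```python
import math
import numpy as np
import matplotlib.pyplot as plt

def maj_true(m, p):
    # Pr[Bin(m, p) > m/2] = Pr[Maj_m = True], for odd m
    return sum(math.comb(m, k) * p ** k * (1 - p) ** (m - k)
               for k in range(m // 2 + 1, m + 1))

n = 101
ps = np.linspace(0, 1, 401)
curve = [p ** 2 + 2 * p * (1 - p) * maj_true(n - 2, p) for p in ps]

plt.plot(ps, curve)
plt.xlabel("$p$")
plt.ylabel(r"$\Pr_{\pi_p}[f_n = \mathrm{True}]$")
plt.show()
```

    The curve climbs like $p^2$ up to height $\approx 1/4$, crosses from $\approx 1/4$ to $\approx 3/4$ in a window of width $O(1/\sqrt{n})$ around $p = 1/2$, then climbs like $1-(1-p)^2$; the jump being of constant size is what makes the threshold coarse in part (d) once $\epsilon < 1/4$.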
  27. (a) Consider the following probability distributions on strings ${\boldsymbol{x}} \in {\mathbb F}_2^n$:

      (i) First choose $\boldsymbol{k} \sim \{0, 1, 2, \dots, n\}$ uniformly. Then choose ${\boldsymbol{x}}$ uniformly from the set of all strings of Hamming weight $\boldsymbol{k}$.
      (ii) First choose a uniformly random “path $\boldsymbol{\pi}$ from $(0, 0, \dots, 0)$ up to $(1, 1, \dots, 1)$”; i.e., let $\boldsymbol{\pi}$ be a uniformly random permutation from $S_n$ and let $\boldsymbol{\pi}^{\leq i} \in {\mathbb F}_2^n$ denote the string whose $j$th coordinate is $1$ if and only if $\boldsymbol{\pi}(j) \leq i$. Then choose $\boldsymbol{k} \sim \{0, 1, 2, \dots, n\}$ uniformly and let ${\boldsymbol{x}}$ be the “$\boldsymbol{k}$th string on the path”, namely $\boldsymbol{\pi}^{\leq \boldsymbol{k}}$.
      (iii) First choose $\boldsymbol{p} \sim [0,1]$ uniformly. Then choose ${\boldsymbol{x}} \sim \pi_{\boldsymbol{p}}^{\otimes n}$.

      Show that these are in fact the same distribution. (Hint: imagine choosing $n+1$ indistinguishable points uniformly from $[0,1]$ and then randomly assigning them the labels “$p$”, $1$, $2$, …, $n$.)

    (b) We denote by $\nu^n$ the distribution on ${\mathbb F}_2^{[n]}$ from part (a); more generally, we use the notation $\nu^N$ for the analogous distribution on ${\mathbb F}_2^N$, where $N$ is an abstract set of cardinality $n$. Given a nonempty $J \subseteq [n]$, show that if ${\boldsymbol{x}} \sim \nu^n$ and ${\boldsymbol{x}}_J \in {\mathbb F}_2^J$ denotes the restriction of ${\boldsymbol{x}}$ to coordinates $J$, then ${\boldsymbol{x}}_J$ has the distribution $\nu^J$.
    (c) Let $f : {\mathbb F}_2^n \to {\mathbb R}$ and fix $i \in [n]$. The $i$th Shapley value of $f$ is defined to be \[ \mathbf{Shap}_i[f] = \mathop{\bf E}_{{\boldsymbol{x}} \sim \nu^n}[f({\boldsymbol{x}}^{(i \mapsto 1)}) - f({\boldsymbol{x}}^{(i \mapsto 0)})]. \] Show that $\sum_{i=1}^n \mathbf{Shap}_i[f] = f(1, 1, \dots, 1) - f(0, 0, \dots, 0)$.
    (d) Suppose $f : {\mathbb F}_2^n \to \{0,1\}$ is monotone. Show that \[ \mathbf{Shap}_i[f] = \int_0^1 \frac{\mathbf{Inf}_i[f^{(p)}]}{4p(1-p)}\,dp, \] where $f^{(p)}$ denotes the $\pm 1$-valued version of $f$ thought of as an element of $L^2(\{-1,1\}^n, \pi_p^{\otimes n})$.
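
    The following sketch (ours, not part of the exercise) checks part (a) empirically for $n = 4$ — samplers for distributions (ii) and (iii) are compared against the exact probabilities $\Pr[{\boldsymbol{x}} = x] = 1/\big((n+1)\binom{n}{\mathrm{wt}(x)}\big)$ of distribution (i) — and then computes the Shapley values of part (c) for a small monotone $f$, confirming that they sum to $f(1,\dots,1) - f(0,\dots,0)$.

```python
import math
import random
from collections import Counter
from itertools import product

n = 4
cube = list(product((0, 1), repeat=n))

def nu_prob(x):
    # Exact probability of x under distribution (i)
    return 1 / ((n + 1) * math.comb(n, sum(x)))

def sample_ii():
    perm = random.sample(range(n), n)   # a uniformly random path up the cube
    k = random.randint(0, n)
    return tuple(int(perm[j] < k) for j in range(n))

def sample_iii():
    p = random.random()
    return tuple(int(random.random() < p) for _ in range(n))

trials = 200_000
for sampler in (sample_ii, sample_iii):
    freq = Counter(sampler() for _ in range(trials))
    assert all(abs(freq[x] / trials - nu_prob(x)) < 0.01 for x in cube)

def f(x):
    return int(x[0] and (x[1] or x[2] or x[3]))    # monotone, 0/1-valued

def shap(i):
    # Shapley value straight from the definition in part (c)
    return sum(nu_prob(x) * (f(x[:i] + (1,) + x[i + 1:])
                             - f(x[:i] + (0,) + x[i + 1:])) for x in cube)

shaps = [shap(i) for i in range(n)]
assert abs(sum(shaps) - (f((1,) * n) - f((0,) * n))) < 1e-12
print("checks passed; Shapley values:", [round(s, 4) for s in shaps])
```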
  28. Explain how to generalize the definitions and results in Sections 1 and 2 to the case of the complex inner product space $L^2(\Omega^n, \pi^{\otimes n})$. In particular, verify the following formulas from Proposition 16: \begin{align*} \mathop{\bf E}[f] &= \widehat{f}(0), \\ \mathop{\bf E}[|f|^2] &= \langle f, f \rangle = \sum_{\alpha \in {\mathbb N}_{< m}^n} |\widehat{f}(\alpha)|^2, \\ \mathop{\bf Var}[f] &= \langle f - \mathop{\bf E}[f], f - \mathop{\bf E}[f] \rangle = \sum_{\alpha \neq 0} |\widehat{f}(\alpha)|^2, \\ \langle f, g \rangle &= \sum_{\alpha \in {\mathbb N}_{<m}^n} \widehat{f}(\alpha)\overline{\widehat{g}(\alpha)}, \\ \mathop{\bf Cov}[f,g] &= \langle f - \mathop{\bf E}[f], g - \mathop{\bf E}[g] \rangle = \sum_{\alpha \neq 0} \widehat{f}(\alpha)\overline{\widehat{g}(\alpha)}. \end{align*}
    (a) As in Exercise 2.51, explain how to generalize the definitions and results in Sections 1 and 2 to the case of functions $f : \Omega^n \to V$, where $V$ is a real inner product space with inner product $\langle \cdot, \cdot \rangle_V$. Here the Fourier coefficients $\widehat{f}(\alpha)$ will be elements of $V$, and $\langle f, g \rangle$ is defined to be $\mathop{\bf E}_{{\boldsymbol{x}} \sim \pi^{\otimes n}}[\langle f({\boldsymbol{x}}), g({\boldsymbol{x}}) \rangle_V]$. In particular, verify the formulas from Proposition 16, including the Plancherel Theorem $\langle f, g \rangle = \sum_\alpha \langle \widehat{f}(\alpha), \widehat{g}(\alpha) \rangle_V$.
    (b) For $\Sigma$ a finite set we write $\triangle_\Sigma$ for the set of all probability distributions over $\Sigma$ (cf. Exercise 7.22). Writing $|\Sigma| = m$, we also identify $\triangle_\Sigma$ with the standard convex simplex in ${\mathbb R}^{m}$, namely $\{\mu \in {\mathbb R}^{m} : \mu_1 + \cdots + \mu_m = 1, \mu_i \geq 0\ \forall i\}$ (where we assume some fixed ordering of $\Sigma$). Finally, we identify the $m$ elements of $\Sigma$ with the constant distributions in $\triangle_\Sigma$; equivalently, the vertices of the form $(0, \dots, 0, 1, 0, \dots, 0)$. Given a function $f : \Omega^n \to \Sigma$, often the most useful way to treat it analytically is to interpret it as a function $f : \Omega^n \to \triangle_\Sigma \subset {\mathbb R}^m$ and then use the setting described in part (a), with $V = {\mathbb R}^m$. Using this idea, show that if $f : \Omega^n \to \Sigma$ and $\pi$ is a distribution on $\Omega$, then \[ \mathbf{Stab}_\rho[f] = \mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi^{\otimes n}, \boldsymbol{y} \sim N_\rho({\boldsymbol{x}})}[f({\boldsymbol{x}}) = f(\boldsymbol{y})]. \] (Here in $\mathbf{Stab}_\rho[f]$ we are interpreting $f$’s range as $\triangle_\Sigma \subset {\mathbb R}^m$, whereas in the expression $f({\boldsymbol{x}}) = f(\boldsymbol{y})$ we are treating $f$’s range as the abstract set $\Sigma$.)
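
    The identity in part (b) can be seen concretely in the following sketch (all specific choices — $\Omega = \{0,1,2\}$ with $\pi$ uniform, $f =$ “sum mod 3”, $\rho = 0.6$ — are ours): with the one-hot embedding, $\langle f({\boldsymbol{x}}), f(\boldsymbol{y}) \rangle$ equals the indicator $1[f({\boldsymbol{x}}) = f(\boldsymbol{y})]$ trial by trial, which is exactly why $\mathbf{Stab}_\rho[f]$ becomes a collision probability.

```python
import random

omega, n, rho = [0, 1, 2], 5, 0.6
f = lambda x: sum(x) % 3                            # range Sigma = {0, 1, 2}
onehot = lambda a: [int(a == b) for b in range(3)]  # Sigma -> simplex vertex

def correlated_pair():
    x = [random.choice(omega) for _ in range(n)]
    # y ~ N_rho(x): keep each coordinate w.p. rho, else resample from pi
    y = [xi if random.random() < rho else random.choice(omega) for xi in x]
    return x, y

trials = 100_000
inner = collisions = 0
for _ in range(trials):
    x, y = correlated_pair()
    u, v = onehot(f(x)), onehot(f(y))
    inner += sum(ui * vi for ui, vi in zip(u, v))   # <f(x), f(y)>_{R^3}
    collisions += (f(x) == f(y))                    # 1[f(x) = f(y)]
assert inner == collisions                          # equal trial by trial
print("estimated Stab_rho[f] =", collisions / trials)
```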
  29. We say a function $f \in L^2(\Omega^n, \pi^{\otimes n})$ is a linear threshold function if it is expressible as $f(x) = \mathrm{sgn}(\ell(x))$, where $\ell : \Omega^n \to {\mathbb R}$ has degree at most $1$ (in the sense of Definition 32).
    (a) Given $\omega^{(+1)}, \omega^{(-1)} \in \Omega^n$ and $x \in \{-1,1\}^n$, we introduce the notation $\omega^{(x)}$ for the string $(\omega_1^{(x_1)}, \dots, \omega_n^{(x_n)}) \in \Omega^n$. Show that if $\boldsymbol{\omega}^{(+1)}, \boldsymbol{\omega}^{(-1)} \sim \pi^{\otimes n}$ are drawn independently and $({\boldsymbol{x}}, \boldsymbol{y}) \sim \{-1,1\}^n \times \{-1,1\}^n$ is a $\rho$-correlated pair of binary strings, then $(\boldsymbol{\omega}^{({\boldsymbol{x}})}, \boldsymbol{\omega}^{(\boldsymbol{y})})$ is a $\rho$-correlated pair under $\pi^{\otimes n}$.
    (b) Let $f \in L^2(\Omega^n, \pi^{\otimes n})$ be a linear threshold function. Given a pair $\omega^{(+1)}, \omega^{(-1)} \in \Omega^n$, define $g_{\omega^{(+1)}, \omega^{(-1)}} : \{-1,1\}^n \to \{-1,1\}$ by $g_{\omega^{(+1)}, \omega^{(-1)}}(x) = f(\omega^{(x)})$. Show that $g_{\omega^{(+1)}, \omega^{(-1)}}$ is a linear threshold function in the “usual” sense.
    (c) Prove that Peres’s Theorem (from Chapter 5.5) applies to linear threshold functions in $L^2(\Omega^n, \pi^{\otimes n})$, with the same bounds.
  30. Let $G$ be a finite abelian group. We know by the Fundamental Theorem of Finitely Generated Abelian Groups that $G \cong {\mathbb Z}_{m_1} \times \cdots \times {\mathbb Z}_{m_n}$, where each $m_j$ is a prime power.
    (a) Given $\alpha \in G$, define $\chi_\alpha : G \to {\mathbb C}$ by \[ \chi_\alpha(x) = \prod_{j = 1}^n \exp(2 \pi i \alpha_j x_j / m_j). \] Show that $\chi_\alpha$ is a character of $G$ and that the $\chi_\alpha$’s are distinct functions for distinct $\alpha$’s. Deduce that the set of all $\chi_\alpha$’s forms a Fourier basis for $L^2(G)$.
    (b) Show that this set of characters forms a group under multiplication and that this group is isomorphic to $G$. (I.e., generalize Fact 57.) This is called the dual group of $G$ and it is written $\widehat{G}$. We also identify the characters in $\widehat{G}$ with their indices $\alpha$.
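
    A machine check of parts (a) and (b) on the small example $G = {\mathbb Z}_2 \times {\mathbb Z}_4$ (our own sanity check, not part of the exercise): the $\chi_\alpha$’s are orthonormal in $L^2(G)$, and they multiply according to the group law of $G$.

```python
import numpy as np
from itertools import product

ms = (2, 4)                                    # G = Z_2 x Z_4
G = list(product(range(ms[0]), range(ms[1])))

def chi(alpha, x):
    return np.exp(2j * np.pi * sum(a * xi / m for a, xi, m in zip(alpha, x, ms)))

for a in G:
    for b in G:
        # Orthonormality: E_x[chi_a(x) conj(chi_b(x))] = 1[a = b]
        ip = np.mean([chi(a, x) * np.conj(chi(b, x)) for x in G])
        assert abs(ip - (a == b)) < 1e-9
        # Dual group law: chi_a * chi_b = chi_{a+b} (addition in G)
        s = tuple((ai + bi) % m for ai, bi, m in zip(a, b, ms))
        assert all(abs(chi(a, x) * chi(b, x) - chi(s, x)) < 1e-9 for x in G)
print("OK: orthonormal Fourier basis; the chi's form a group isomorphic to G")
```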
  31. Verify that the convolution operation on $L^2(G)$ is associative, commutative, and satisfies $\widehat{f * g}(\alpha) = \widehat{f}(\alpha)\widehat{g}(\alpha)$ for all $\alpha \in \widehat{G}$. (See Exercise 30 for the definition of $\widehat{G}$.)
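
    A numeric check of the formula $\widehat{f * g} = \widehat{f}\,\widehat{g}$ on $G = {\mathbb Z}_m$, assuming the expectation-based normalizations $\widehat{f}(\alpha) = \mathop{\bf E}_{\boldsymbol{x}}[f({\boldsymbol{x}})\overline{\chi_\alpha({\boldsymbol{x}})}]$ and $(f * g)(x) = \mathop{\bf E}_{\boldsymbol{y}}[f(\boldsymbol{y})g(x - \boldsymbol{y})]$:

```python
import numpy as np

m = 12
rng = np.random.default_rng(0)
f, g = rng.standard_normal(m), rng.standard_normal(m)

def fourier(h):
    # fhat(a) = E_x[h(x) conj(chi_a(x))], with chi_a(x) = exp(2 pi i a x / m)
    x = np.arange(m)
    return np.array([np.mean(h * np.exp(-2j * np.pi * a * x / m)) for a in range(m)])

# (f * g)(x) = E_y[f(y) g(x - y)]
conv = np.array([np.mean(f * g[(x - np.arange(m)) % m]) for x in range(m)])
assert np.allclose(fourier(conv), fourier(f) * fourier(g))
print("OK: conv-hat equals fhat * ghat on Z_%d" % m)
```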
  32. (a) Let $f \in L^2(\Omega^n, \pi^{\otimes n})$ be any transitive-symmetric function and let $\mathcal{T}$ be a randomized decision tree computing $f$. Show that there exists a randomized decision tree $\mathcal{T}'$ computing $f$ with $\Delta^{(\pi)}(\mathcal{T}') = \Delta^{(\pi)}(\mathcal{T})$ and such that $\delta_i^{(\pi)}(\mathcal{T}')$ is the same for all $i \in [n]$. (Hint: randomize over the automorphism group $\mathrm{Aut}(f)$.)
    (b) Given a randomized decision tree $\mathcal{T}$, let $\delta^{(\pi)}(\mathcal{T}) = \max_{i \in [n]}\{\delta_i^{(\pi)}(\mathcal{T})\}$. Given $f \in L^2(\{-1,1\}^n, \pi^{\otimes n})$, define $\delta^{(\pi)}(f)$ to be the minimum value of $\delta^{(\pi)}(\mathcal{T})$ over all $\mathcal{T}$ which compute $f$; this is called the revealment of $f$. Show that if $f$ is transitive-symmetric then $\delta^{(\pi)}(f) = \frac{1}{n}\Delta^{(\pi)}(f)$.
  33. (a) Show that ${\mathrm{DT}}(\mathrm{Maj}_3^{\otimes d}) = 3^d$, $\mathrm{RDT}(\mathrm{Maj}_3^{\otimes d}) \leq (8/3)^d$, and $\Delta(\mathrm{Maj}_3^{\otimes d}) \leq (5/2)^d$. (A simulation sketch for the $\Delta$ bound appears after this exercise.)
    (b) Show that $\mathrm{RDT}(\mathrm{Maj}_3^{\otimes 2}) < (8/3)^2$. How small can you make your upper bound?
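
    A simulation sketch (ours) of the $\Delta(\mathrm{Maj}_3^{\otimes d}) \leq (5/2)^d$ bound from part (a): the natural tree evaluates two subtrees and queries the third only when they disagree, and under uniform inputs the two sub-values are independent uniform bits, so they disagree with probability exactly $1/2$ and $\mathop{\bf E}[\mathrm{cost}_d] = \tfrac{5}{2}\mathop{\bf E}[\mathrm{cost}_{d-1}]$.

```python
import random

def evaluate(d):
    """Value and #leaf-queries of the natural recursive-majority tree on a
    fresh uniform input: do two subtrees, query the third only on disagreement."""
    if d == 0:
        return random.choice([-1, 1]), 1
    (a, ca), (b, cb) = evaluate(d - 1), evaluate(d - 1)
    if a == b:
        return a, ca + cb          # majority already determined
    c, cc = evaluate(d - 1)
    return c, ca + cb + cc         # a != b, so the third subtree decides

d, trials = 5, 20_000
avg = sum(evaluate(d)[1] for _ in range(trials)) / trials
print(avg, "vs (5/2)^d =", 2.5 ** d)   # ~97.7 vs 97.65625
```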
  34. (a) Show that for every deterministic decision tree $T$ computing the logical OR function on $n$ bits, \begin{multline*} \qquad \qquad \Delta^{(p)}(T) = p \cdot 1 + (1-p)p \cdot 2 + (1-p)^2p \cdot 3 + \cdots \\ \cdots + (1-p)^{n-2}p \cdot (n-1) + (1-p)^{n-1} \cdot n = \frac{1 - (1-p)^n}{p}. \end{multline*} Deduce $\Delta^{(p)}(\mathrm{OR}_n) = \frac{1 - (1-p)^n}{p}$.
    (b) Show that $\Delta^{(p_c)}(\mathrm{OR}_n) \sim n/(2 \ln 2)$ as $n \to \infty$, where $p_c$ denotes the critical probability for $\mathrm{OR}_n$.
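
    Both parts are easy to check mechanically (our sketch): simulate the query-until-the-first-$\mathsf{True}$ tree to confirm the closed form, then evaluate it at $p_c = 1 - 2^{-1/n}$, where $(1-p_c)^n = 1/2$, so that $\Delta^{(p_c)}(\mathrm{OR}_n) = \frac{1}{2p_c} \sim \frac{n}{2\ln 2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, trials = 30, 0.1, 200_000

def queries():
    # Query bits in order; stop at the first True (each True w.p. p)
    for i in range(n):
        if rng.random() < p:
            return i + 1
    return n

sim = np.mean([queries() for _ in range(trials)])
assert abs(sim - (1 - (1 - p) ** n) / p) < 0.1     # part (a)'s closed form

p_c = 1 - 2 ** (-1 / n)
print((1 - (1 - p_c) ** n) / p_c, "vs", n / (2 * np.log(2)))  # ~21.9 vs ~21.6
```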
  35. Let $\mathrm{NAND} : \{\mathsf{True}, \mathsf{False}\}^2 \to \{\mathsf{True}, \mathsf{False}\}$ be the function that outputs $\mathsf{True}$ unless both its inputs are $\mathsf{True}$.
    (a) Show that for $d$ even, $\mathrm{NAND}^{\otimes d} = \mathrm{Tribes}_{2,2}^{\otimes d/2}$. (Thus the recursive NAND function is sometimes known as the AND-OR tree.)
    (b) Show that ${\mathrm{DT}}(\mathrm{NAND}^{\otimes d}) = 2^d$.
    (c) Show that $\mathrm{RDT}(\mathrm{NAND}) = 2$.
    (d) For $b \in \{\mathsf{True}, \mathsf{False}\}$ and $\mathcal{T}$ a randomized decision tree computing a function $f$, let $\mathrm{RDT}_b(\mathcal{T})$ denote the maximum cost of $\mathcal{T}$ among inputs $x$ with $f(x) = b$. Show that there is a randomized decision tree $\mathcal{T}$ computing $\mathrm{NAND}$ with $\mathrm{RDT}_\mathsf{True}(\mathcal{T}) = 3/2$. (Note that $\mathrm{RDT}_\mathsf{False}(\mathcal{T}) = 2$ for any such tree, since on the all-$\mathsf{True}$ input both coordinates must be queried.)
    (e) Show that $\mathrm{RDT}(\mathrm{NAND}^{\otimes 2}) \leq 3$.
    (f) Show that there is a family of randomized decision trees $(\mathcal{T}_d)_{d \in {\mathbb N}^+}$, with $\mathcal{T}_d$ computing $\mathrm{NAND}^{\otimes d}$, satisfying the inequalities \begin{align*} \mathrm{RDT}_{\mathsf{False}}(\mathcal{T}_{d}) &\leq 2\mathrm{RDT}_{\mathsf{True}}(\mathcal{T}_{d-1})\\ \mathrm{RDT}_{\mathsf{True}}(\mathcal{T}_{d}) &\leq \mathrm{RDT}_{\mathsf{False}}(\mathcal{T}_{d-1}) + (1/2)\mathrm{RDT}_{\mathsf{True}}(\mathcal{T}_{d-1}). \end{align*}
    (g) Deduce $\mathrm{RDT}(\mathrm{NAND}^{\otimes d}) \leq (\frac{1+\sqrt{33}}{4})^d \approx n^{.754}$, where $n = 2^d$.
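
    The constant in part (g) is just the largest eigenvalue of the $2 \times 2$ matrix encoding the recursion of part (f), which is easy to confirm numerically:

```python
import numpy as np

# Part (f) in matrix form: (RDT_False(d), RDT_True(d)) is bounded entrywise
# by M @ (RDT_False(d-1), RDT_True(d-1)) with M = [[0, 2], [1, 1/2]];
# the top eigenvalue of M governs the growth rate.
M = np.array([[0.0, 2.0], [1.0, 0.5]])
lam = max(np.linalg.eigvals(M).real)
assert abs(lam - (1 + np.sqrt(33)) / 4) < 1e-12
print("growth rate:", lam, "  exponent:", np.log2(lam))   # ~1.686, ~0.7537
```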
  36. Let $\mathcal{C} = \{\text{monotone } f : \{-1,1\}^n \to \{-1,1\} \mid {\mathrm{DT}}(f) \leq k\}$. Show that $\mathcal{C}$ is learnable from random examples with error $\epsilon$ in time $n^{O(\sqrt{k}/\epsilon)}$. (Hint: the OS Inequality and Corollary 3.32.)
  37. Verify that the decision tree process described in Definition 68 indeed generates strings distributed according to $\pi^{\otimes n}$. (Hint: induction on the structure of the tree.)
  38. Let $T$ be a deterministic decision tree of size $s$. Show that $\Delta(T) \leq \log s$. (Hint: let $\boldsymbol{P}$ be a random root-to-leaf path chosen as in the decision tree process. How can you bound the entropy of the random variable $\boldsymbol{P}$?)
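
    A concrete instance (ours), taking the size $s$ to be the number of leaves as in the hint’s entropy argument: the natural tree for $\mathrm{OR}_3$ queries $x_1, x_2, x_3$ in order, stopping at the first $\mathsf{True}$.

```python
from fractions import Fraction

# Under uniform inputs the four leaves T, FT, FFT, FFF of the OR_3 tree have
# probabilities 1/2, 1/4, 1/8, 1/8 and depths 1, 2, 3, 3.  Since each query
# reads a uniform bit, H(P) coincides with the expected depth here.
leaves = [(Fraction(1, 2), 1), (Fraction(1, 4), 2),
          (Fraction(1, 8), 3), (Fraction(1, 8), 3)]
delta = sum(p * d for p, d in leaves)     # Delta(T) = expected #queries
assert delta == Fraction(7, 4)            # 7/4 <= log2(4 leaves) = 2
```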
  39. Let $f \in L^2(\Omega^n, \pi^{\otimes n})$ be a nonconstant function with range $\{-1,1\}$.

    (a) Show that $\mathbf{MaxInf}[f] \geq \mathop{\bf Var}[f]/\Delta^{(\pi)}(f)$ (cf. the KKL Theorem from Chapter 4.2).
    (b) In case $\Omega = \{-1,1\}$, show that $\mathbf{MaxInf}[f] \geq \mathop{\bf Var}[f] / \deg(f)^3$. (You should use the result of Midrijānis mentioned in Chapter 3.6.)
    (c) Show that $\mathbf{I}[f] \geq \mathop{\bf Var}[f] / \delta^{(\pi)}(f)$, where $\delta^{(\pi)}(f)$ is the revealment of $f$, defined in Exercise 32(b).
  40. Let $f \in L^2(\Omega^n, \pi^{\otimes n})$ have range $\{-1,1\}$.
    (a) Let $\mathcal{T}$ be a randomized decision tree computing $f$ and let $i \in [n]$. Show that $\mathbf{Inf}_i[f] \leq \delta_i^{(\pi)}(\mathcal{T})$. (Hint: the decision tree process.)
    (b) Suppose $f$ is transitive-symmetric. Show that $\Delta^{(\pi)}(f) \geq \sqrt{\mathop{\bf Var}[f] \cdot n}$. (Hint: Exercise 32(b).) This result can be sharp up to an $O(\sqrt{\log n})$ factor even for an $f : \{-1,1\}^n \to \{-1,1\}$ with $\mathop{\bf Var}[f] = 1$; see [BSW05].
  41. In this exercise you will give an alternate proof of the OSSS Inequality which is sharp when $\mathop{\bf Var}[f] = 1$ and is weaker by only a factor of $2$ when $\mathop{\bf Var}[f]$ is small. Let $f \in L^2(\Omega^n, \pi^{\otimes n})$ have range $\{-1,1\}$. Given a randomized decision tree $\mathcal{T}$ we write $\mathrm{err}(\mathcal{T}) = \mathop{\bf Pr}_{{\boldsymbol{x}} \sim \pi^{\otimes n}}[\mathcal{T}({\boldsymbol{x}}) \neq f({\boldsymbol{x}})]$.
    (a) Let $T$ be a depth-$k$ deterministic decision tree (not necessarily computing $f$) whose root queries coordinate $i$. Let $\mathcal{T}$ be the distribution over deterministic trees of depth at most $k-1$ given by following a random outgoing edge from $T$'s root (according to $\pi$). Show that $\mathrm{err}(\mathcal{T}) \leq \mathrm{err}(T) + \tfrac{1}{2}\mathbf{Inf}_i[f]$.
    (b) Let $\mathcal{T}$ be a randomized decision tree of depth $0$. Show that $\mathrm{err}(\mathcal{T}) \geq \min\{\mathop{\bf Pr}[f({\boldsymbol{x}}) = 1], \mathop{\bf Pr}[f({\boldsymbol{x}}) = -1]\}$.
    (c) Prove by induction on depth that if $\mathcal{T}$ is any randomized decision tree then $\tfrac{1}{2} \sum_{i=1}^n \delta_i^{(\pi)}(\mathcal{T}) \cdot \mathbf{Inf}_i[f] \geq \min\{\mathop{\bf Pr}[f({\boldsymbol{x}}) = 1], \mathop{\bf Pr}[f({\boldsymbol{x}}) = -1]\} - \mathrm{err}(\mathcal{T})$. Verify that this yields the OSSS Inequality when $\mathop{\bf Var}[f] = 1$ and in general yields the OSSS Inequality up to a factor of $2$.
  42. Show that the OSSS Inequality fails for functions $f : \{-1,1\}^n \to {\mathbb R}$. (Hint: the simplest counterexample uses a decision tree with the shape shown below.)

    [Figure: the shape of a decision tree giving a counterexample to the OSSS Inequality when $f$'s range is ${\mathbb R}$.]


    Can you make the ratio of the left-hand side to the right-hand side equal to $\frac{130 + 20\sqrt{3}}{157}$? Larger?
