§3.2: Subspaces and decision trees

In this section we treat the domain of a boolean function as ${\mathbb F}_2^n$, an $n$-dimensional vector space over the field ${\mathbb F}_2$. As mentioned earlier, it can be natural to index the Fourier characters $\chi_S : {\mathbb F}_2^n \to \{-1,1\}$ not by subsets $S \subseteq [n]$ but by their $0$-$1$ indicator vectors $\gamma \in {\mathbb F}_2^n$; thus \[ \chi_\gamma(x) = (-1)^{\gamma \cdot x}, \] with the dot product $\gamma \cdot x$ being carried out in ${\mathbb F}_2^n$.

For example, in this notation we’d write $\chi_0$ for the constantly $1$ function and $\chi_{e_i}$ for the $i$th dictator. Fact 1.6 now becomes \begin{equation} \label{eqn:dual-char} \chi_\beta \chi_\gamma = \chi_{\beta + \gamma} \quad \forall \beta, \gamma. \end{equation} Thus the characters form a group under multiplication which is isomorphic to the group ${\mathbb F}_2^n$ under addition. To distinguish this group from the input domain we write it as ${\widehat{{\mathbb F}_2^n}}$; thus the Fourier expansion of $f : {\mathbb F}_2^n \to {\mathbb R}$ can be written as \[ f(x) = \sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} \widehat{f}(\gamma) \chi_\gamma(x). \]

The Fourier transform of $f$ can be thought of as a function $\widehat{f} : {\widehat{{\mathbb F}_2^n}} \to {\mathbb R}$. We can measure its complexity with various norms.

Definition 8 The Fourier (or spectral) $p$-norm of $f : \{-1,1\}^n \to {\mathbb R}$ is \[ \hat{\lVert} f \hat{\rVert}_p = \left(\sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} |\widehat{f}(\gamma)|^p\right)^{1/p}. \]

Note that we use the “counting measure” on ${\widehat{{\mathbb F}_2^n}}$ and hence we have a nice rephrasing of Parseval’s Theorem: $\|f\|_2 = \hat{\lVert} f \hat{\rVert}_2$. We make two more definitions relating to the simplicity of $\widehat{f}$:

Definition 9 The Fourier (or spectral) sparsity of $f : \{-1,1\}^n \to {\mathbb R}$ is \[ \mathrm{sparsity}(\widehat{f}) = |\mathrm{supp}(\widehat{f})| = \#\bigl\{\gamma \in {\widehat{{\mathbb F}_2^n}} : \widehat{f}(\gamma) \neq 0\bigr\}. \]

Definition 10 We say that $\widehat{f}$ is $\epsilon$-granular if $\widehat{f}(\gamma)$ is an integer multiple of $\epsilon$ for all $\gamma \in {\widehat{{\mathbb F}_2^n}}$.

To gain some practice with this notation, let’s look at the Fourier transforms of some indicator functions $1_A : {\mathbb F}_2^n \to \{0,1\}$ and probability density functions $\varphi_A$, where $A \subseteq {\mathbb F}_2^n$. First, suppose $A \leq {\mathbb F}_2^n$ is a subspace. Then one way to characterize $A$ is by its orthogonal complement subspace $A^\perp$: \[ A = \{x \in {\mathbb F}_2^n : \gamma \cdot x = 0 \text{ for all } \gamma \in A^\perp\}. \]

Proposition 11 If $A \leq {\mathbb F}_2^n$ has $\operatorname{codim} A = \dim A^\perp = k$, then \[ 1_A = \sum_{\gamma \in A^\perp} 2^{-k} \chi_\gamma, \qquad \varphi_A = \sum_{\gamma \in A^\perp} \chi_\gamma. \]

Proof: Let $\gamma_1, \dots, \gamma_k$ form a basis of $A^\perp$. Since $x \in A$ if and only if $\chi_{\gamma_i}(x) = 1$ for all $i \in [k]$, we have \[ 1_A(x) = \prod_{i=1}^k \Bigl(\tfrac{1}{2} + \tfrac{1}{2} \chi_{\gamma_i}(x)\Bigr) = 2^{-k}\sum_{\gamma \in \mathrm{span}\{\gamma_1, \dots, \gamma_k\}} \chi_\gamma(x) \] as claimed, where the last equality used \eqref{eqn:dual-char}. The Fourier expansion of $\varphi_A$ follows because $|A| = 2^{n-k}$. $\Box$

More generally, suppose $A$ is affine subspace (or coset) of ${\mathbb F}_2^n$; i.e., $A = H + a$ for some $H \leq {\mathbb F}_2^n$ and $a \in {\mathbb F}_2^n$, or equivalently \[ A = \{x \in {\mathbb F}_2^n : \gamma \cdot x = \gamma \cdot a \text{ for all } \gamma \in H^\perp\}. \] Then it is easy (see the exercises) to extend Proposition 11 to:

Proposition 12 If $A = H+a$ is an affine subspace of codimension $k$ then \[ \widehat{1_A}(\gamma) = \begin{cases} \chi_{\gamma}(a) 2^{-k} & \text{if $\gamma \in H^\perp$} \\ 0 & \text{else;} \end{cases} \] hence $\varphi_A = \sum_{\gamma \in H^\perp} \chi_{\gamma}(a) \chi_\gamma$. We have $\mathrm{sparsity}(\widehat{1_A}) = 2^k$, $\widehat{1_A}$ is $2^{-k}$-granular, $\hat{\lVert} 1_A \hat{\rVert}_\infty = 2^{-k}$, and $\hat{\lVert} 1_A \hat{\rVert} = 1$.

In computer science terminology, any $f : {\mathbb F}_2^n \to \{0,1\}$ which is a conjunction of parity conditions is the indicator of an affine subspace (or the zero function). In the simple case that the parity conditions are all of the form “$x_i = a_i$”, the function is a logical AND of literals and we call the affine subspace a subcube.

Another class of boolean functions with simple Fourier spectra are the ones computable by simple decision trees:

Definition 13 A decision tree $T$ is a representation of a boolean function $f : {\mathbb F}_2^n \to {\mathbb R}$. It consists of a rooted binary tree in which the internal nodes are labelled by coordinates $i \in [n]$, the outgoing edges of each internal node are labelled $0$ and $1$, and the leaves are labelled by real numbers. We insist that no coordinate $i \in [n]$ appears more than once on any root-to-leaf path.

On input $x \in {\mathbb F}_2^n$, the tree $T$ constructs a computation path from the root node to a leaf. Specifically, when the computation path reaches an internal node labelled by coordinate $i \in [n]$ we say that $T$ queries $x_i$; the computation path then follows the outgoing edge labelled by $x_i$. The output of $T$ (and hence $f$) on input $x$ is the label of the leaf reached by the computation path. We often identify a tree with the function it computes.

For decision trees, a picture is worth a thousand words:

Decision tree computing Sort_3

(It’s traditional to write $x_i$ rather than $i$ for the internal node labels.) For example, the computation path of the above tree on input $x = (0,1,0) \in {\mathbb F}_2^3$ starts at the root, queries $x_1$, proceeds left, queries $x_3$, proceeds left, queries $x_2$, proceeds right, and reaches a leaf labelled $0$. In fact, this tree computes the function $\mathrm{Sort}_3$ defined by $\mathrm{Sort}_3(x) = 1$ if and only if $x_1 \leq x_2 \leq x_3$ or $x_1 \geq x_2 \geq x_3$.

Definition 14 The size $s$ of a decision tree $T$ is the total number of leaves. The depth $k$ of $T$ is the maximum length of any root-to-leaf path. For decision trees over ${\mathbb F}_2^n$ we have $k \leq n$ and $s \leq 2^k$. Given $f : {\mathbb F}_2^n \to {\mathbb R}$ we write ${\mathrm{DT}}(f)$ (respectively, ${\mathrm{DT}_{\mathrm{size}}}(f)$) for the least depth (respectively, size) of a decision tree computing $f$.

The example decision tree above has size $6$ and depth $3$.

Let $T$ be a decision tree computing $f : {\mathbb F}_2^n \to {\mathbb R}$ and let $P$ be one of its root-to-leaf paths. The set of inputs $x$ which follow computation path $P$ in $T$ is precisely a subcube of ${\mathbb F}_2^n$, call it $C_P$. The function $f$ is constant on $C_P$; we will call its value there $f(P)$. Further, since every input $x$ follows a unique path in $T$, the subcubes $\{C_P : P $ a path in $T\}$ form a partition of ${\mathbb F}_2^n$. These observations yield the following “spectral simplicity” results for decision trees:

Fact 15 Let $f : {\mathbb F}_2^n \to {\mathbb R}$ be computed by a decision tree $T$. Then \[ f = \sum_{\text{paths $P$ of $T$}} f(P)\cdot 1_{C_P}. \]

Proposition 16 Let $f : {\mathbb F}_2^n \to {\mathbb R}$ be computed by a decision tree $T$ of size $s$ and depth $k$. Then:

  • $\deg(f) \leq k$;
  • $\mathrm{sparsity}(\widehat{f}) \leq s 2^k \leq 4^k$;
  • $\hat{\lVert} f \hat{\rVert}_1 \leq \|f\|_\infty \cdot s \leq \|f\|_\infty \cdot 2^k$;
  • $\widehat{f}$ is $2^{-k}$-granular assuming $f : {\mathbb F}_2^n \to {\mathbb Z}$.

Proposition 17 Let $f : {\mathbb F}_2^n \to \{-1,1\}$ be computable by a decision tree of size $s$ and let $\epsilon \in (0,1]$. Then the spectrum of $f$ is $\epsilon$-concentrated on degree up to $\log(s/\epsilon)$.

You are asked to prove these propositions in the exercises, as well as similar spectral simplicity results for some generalizations of the decision tree representation (“subcube partitions”, “parity decision trees”).

8 comments to §3.2: Subspaces and decision trees

Leave a Reply




You can use most standard LaTeX, as well as these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>