# §3.2: Subspaces and decision trees

In this section we treat the domain of a boolean function as ${\mathbb F}_2^n$, an $n$-dimensional vector space over the field ${\mathbb F}_2$. As mentioned earlier, it can be natural to index the Fourier characters $\chi_S : {\mathbb F}_2^n \to \{-1,1\}$ not by subsets $S \subseteq [n]$ but by their $0$-$1$ indicator vectors $\gamma \in {\mathbb F}_2^n$; thus $\chi_\gamma(x) = (-1)^{\gamma \cdot x},$ with the dot product $\gamma \cdot x$ being carried out in ${\mathbb F}_2^n$.

For example, in this notation we’d write $\chi_0$ for the constantly $1$ function and $\chi_{e_i}$ for the $i$th dictator. Fact 1.6 now becomes $$\label{eqn:dual-char} \chi_\beta \chi_\gamma = \chi_{\beta + \gamma} \quad \forall \beta, \gamma.$$ Thus the characters form a group under multiplication which is isomorphic to the group ${\mathbb F}_2^n$ under addition. To distinguish this group from the input domain we write it as ${\widehat{{\mathbb F}_2^n}}$; thus the Fourier expansion of $f : {\mathbb F}_2^n \to {\mathbb R}$ can be written as $f(x) = \sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} \widehat{f}(\gamma) \chi_\gamma(x).$

The Fourier transform of $f$ can be thought of as a function $\widehat{f} : {\widehat{{\mathbb F}_2^n}} \to {\mathbb R}$. We can measure its complexity with various norms.

Definition 8 The Fourier (or spectral) $p$-norm of $f : \{-1,1\}^n \to {\mathbb R}$ is $\hat{\lVert} f \hat{\rVert}_p = \left(\sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} |\widehat{f}(\gamma)|^p\right)^{1/p}.$

Note that we use the “counting measure” on ${\widehat{{\mathbb F}_2^n}}$ and hence we have a nice rephrasing of Parseval’s Theorem: $\|f\|_2 = \hat{\lVert} f \hat{\rVert}_2$. We make two more definitions relating to the simplicity of $\widehat{f}$:

Definition 9 The Fourier (or spectral) sparsity of $f : \{-1,1\}^n \to {\mathbb R}$ is $\mathrm{sparsity}(\widehat{f}) = |\mathrm{supp}(\widehat{f})| = \#\bigl\{\gamma \in {\widehat{{\mathbb F}_2^n}} : \widehat{f}(\gamma) \neq 0\bigr\}.$

Definition 10 We say that $\widehat{f}$ is $\epsilon$-granular if $\widehat{f}(\gamma)$ is an integer multiple of $\epsilon$ for all $\gamma \in {\widehat{{\mathbb F}_2^n}}$.

To gain some practice with this notation, let’s look at the Fourier transforms of some indicator functions $1_A : {\mathbb F}_2^n \to \{0,1\}$ and probability density functions $\varphi_A$, where $A \subseteq {\mathbb F}_2^n$. First, suppose $A \leq {\mathbb F}_2^n$ is a subspace. Then one way to characterize $A$ is by its orthogonal complement subspace $A^\perp$: $A = \{x \in {\mathbb F}_2^n : \gamma \cdot x = 0 \text{ for all } \gamma \in A^\perp\}.$

Proposition 11 If $A \leq {\mathbb F}_2^n$ has $\operatorname{codim} A = \dim A^\perp = k$, then $1_A = \sum_{\gamma \in A^\perp} 2^{-k} \chi_\gamma, \qquad \varphi_A = \sum_{\gamma \in A^\perp} \chi_\gamma.$

Proof: Let $\gamma_1, \dots, \gamma_k$ form a basis of $A^\perp$. Since $x \in A$ if and only if $\chi_{\gamma_i}(x) = 1$ for all $i \in [k]$, we have $1_A(x) = \prod_{i=1}^k \Bigl(\tfrac{1}{2} + \tfrac{1}{2} \chi_{\gamma_i}(x)\Bigr) = 2^{-k}\sum_{\gamma \in \mathrm{span}\{\gamma_1, \dots, \gamma_k\}} \chi_\gamma(x)$ as claimed, where the last equality used \eqref{eqn:dual-char}. The Fourier expansion of $\varphi_A$ follows because $|A| = 2^{n-k}$. $\Box$

More generally, suppose $A$ is affine subspace (or coset) of ${\mathbb F}_2^n$; i.e., $A = H + a$ for some $H \leq {\mathbb F}_2^n$ and $a \in {\mathbb F}_2^n$, or equivalently $A = \{x \in {\mathbb F}_2^n : \gamma \cdot x = \gamma \cdot a \text{ for all } \gamma \in H^\perp\}.$ Then it is easy (see the exercises) to extend Proposition 11 to:

Proposition 12 If $A = H+a$ is an affine subspace of codimension $k$ then $\widehat{1_A}(\gamma) = \begin{cases} \chi_{\gamma}(a) 2^{-k} & \text{if \gamma \in H^\perp} \\ 0 & \text{else;} \end{cases}$ hence $\varphi_A = \sum_{\gamma \in H^\perp} \chi_{\gamma}(a) \chi_\gamma$. We have $\mathrm{sparsity}(\widehat{1_A}) = 2^k$, $\widehat{1_A}$ is $2^{-k}$-granular, $\hat{\lVert} 1_A \hat{\rVert}_\infty = 2^{-k}$, and $\hat{\lVert} 1_A \hat{\rVert} = 1$.

In computer science terminology, any $f : {\mathbb F}_2^n \to \{0,1\}$ which is a conjunction of parity conditions is the indicator of an affine subspace (or the zero function). In the simple case that the parity conditions are all of the form “$x_i = a_i$”, the function is a logical AND of literals and we call the affine subspace a subcube.

Another class of boolean functions with simple Fourier spectra are the ones computable by simple decision trees:

Definition 13 A decision tree $T$ is a representation of a boolean function $f : {\mathbb F}_2^n \to {\mathbb R}$. It consists of a rooted binary tree in which the internal nodes are labelled by coordinates $i \in [n]$, the outgoing edges of each internal node are labelled $0$ and $1$, and the leaves are labelled by real numbers. We insist that no coordinate $i \in [n]$ appears more than once on any root-to-leaf path.

On input $x \in {\mathbb F}_2^n$, the tree $T$ constructs a computation path from the root node to a leaf. Specifically, when the computation path reaches an internal node labelled by coordinate $i \in [n]$ we say that $T$ queries $x_i$; the computation path then follows the outgoing edge labelled by $x_i$. The output of $T$ (and hence $f$) on input $x$ is the label of the leaf reached by the computation path. We often identify a tree with the function it computes.

For decision trees, a picture is worth a thousand words:

Decision tree computing Sort_3

(It’s traditional to write $x_i$ rather than $i$ for the internal node labels.) For example, the computation path of the above tree on input $x = (0,1,0) \in {\mathbb F}_2^3$ starts at the root, queries $x_1$, proceeds left, queries $x_3$, proceeds left, queries $x_2$, proceeds right, and reaches a leaf labelled $0$. In fact, this tree computes the function $\mathrm{Sort}_3$ defined by $\mathrm{Sort}_3(x) = 1$ if and only if $x_1 \leq x_2 \leq x_3$ or $x_1 \geq x_2 \geq x_3$.

Definition 14 The size $s$ of a decision tree $T$ is the total number of leaves. The depth $k$ of $T$ is the maximum length of any root-to-leaf path. For decision trees over ${\mathbb F}_2^n$ we have $k \leq n$ and $s \leq 2^k$. Given $f : {\mathbb F}_2^n \to {\mathbb R}$ we write ${\mathrm{DT}}(f)$ (respectively, ${\mathrm{DT}_{\mathrm{size}}}(f)$) for the least depth (respectively, size) of a decision tree computing $f$.

The example decision tree above has size $6$ and depth $3$.

Let $T$ be a decision tree computing $f : {\mathbb F}_2^n \to {\mathbb R}$ and let $P$ be one of its root-to-leaf paths. The set of inputs $x$ which follow computation path $P$ in $T$ is precisely a subcube of ${\mathbb F}_2^n$, call it $C_P$. The function $f$ is constant on $C_P$; we will call its value there $f(P)$. Further, since every input $x$ follows a unique path in $T$, the subcubes $\{C_P : P$ a path in $T\}$ form a partition of ${\mathbb F}_2^n$. These observations yield the following “spectral simplicity” results for decision trees:

Fact 15 Let $f : {\mathbb F}_2^n \to {\mathbb R}$ be computed by a decision tree $T$. Then $f = \sum_{\text{paths P of T}} f(P)\cdot 1_{C_P}.$

Proposition 16 Let $f : {\mathbb F}_2^n \to {\mathbb R}$ be computed by a decision tree $T$ of size $s$ and depth $k$. Then:

• $\deg(f) \leq k$;
• $\mathrm{sparsity}(\widehat{f}) \leq s 2^k \leq 4^k$;
• $\hat{\lVert} f \hat{\rVert}_1 \leq \|f\|_\infty \cdot s \leq \|f\|_\infty \cdot 2^k$;
• $\widehat{f}$ is $2^{-k}$-granular assuming $f : {\mathbb F}_2^n \to {\mathbb Z}$.

Proposition 17 Let $f : {\mathbb F}_2^n \to \{-1,1\}$ be computable by a decision tree of size $s$ and let $\epsilon \in (0,1]$. Then the spectrum of $f$ is $\epsilon$-concentrated on degree up to $\log(s/\epsilon)$.

You are asked to prove these propositions in the exercises, as well as similar spectral simplicity results for some generalizations of the decision tree representation (“subcube partitions”, “parity decision trees”).

### 8 comments to §3.2: Subspaces and decision trees

• It seems to me that for a subspace $A$ of codimension $k$ one has $\varphi_A (x) = 2^{k-n} 1_A(x)$, so the second conclusion of Prop 11 should be $$\varphi_A = 2^{-n} \sum_{\gamma\in A^\prep} \chi_\gamma$$

• Hi Lior — recall that $\varphi_A$ is a density, not a probability mass function (see §1.5). Hence it should have $\mathbf{E}[\varphi_A] = 1$.

• Yes. I misread the definition. That said, I think there’s a problem on the third line of the proof where you say $|A|=2^{-k}$. Shouldn’t this read $|A|=2^{n-k}$?

• Yep, you’re right. Thanks!

• dinesh kumar

Subspaces is defined clearly.

• Sorry, I don’t understand this comment?