In this section we treat the domain of a boolean function as ${\mathbb F}_2^n$, an $n$-dimensional vector space over the field ${\mathbb F}_2$. As mentioned earlier, it can be natural to index the Fourier characters $\chi_S : {\mathbb F}_2^n \to \{-1,1\}$ not by subsets $S \subseteq [n]$ but by their $0$-$1$ indicator vectors $\gamma \in {\mathbb F}_2^n$; thus \[ \chi_\gamma(x) = (-1)^{\gamma \cdot x}, \] with the dot product $\gamma \cdot x$ being carried out in ${\mathbb F}_2^n$.

For example, in this notation we’d write $\chi_0$ for the constantly $1$ function and $\chi_{e_i}$ for the $i$th dictator. Fact 1.6 now becomes \begin{equation} \label{eqn:dual-char} \chi_\beta \chi_\gamma = \chi_{\beta + \gamma} \quad \forall \beta, \gamma. \end{equation} Thus the characters form a group under multiplication which is isomorphic to the group ${\mathbb F}_2^n$ under addition. To distinguish this group from the input domain we write it as ${\widehat{{\mathbb F}_2^n}}$; thus the Fourier expansion of $f : {\mathbb F}_2^n \to {\mathbb R}$ can be written as \[ f(x) = \sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} \widehat{f}(\gamma) \chi_\gamma(x). \]

The Fourier transform of $f$ can be thought of as a function $\widehat{f} : {\widehat{{\mathbb F}_2^n}} \to {\mathbb R}$. We can measure its complexity with various norms.

Definition 8TheFourier (or spectral) $p$-normof $f : \{-1,1\}^n \to {\mathbb R}$ is \[ \hat{\lVert} f \hat{\rVert}_p = \left(\sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} |\widehat{f}(\gamma)|^p\right)^{1/p}. \]

Note that we use the “counting measure” on ${\widehat{{\mathbb F}_2^n}}$ and hence we have a nice rephrasing of Parseval’s Theorem: $\|f\|_2 = \hat{\lVert} f \hat{\rVert}_2$. We make two more definitions relating to the simplicity of $\widehat{f}$:

Definition 9TheFourier (or spectral) sparsityof $f : \{-1,1\}^n \to {\mathbb R}$ is \[ \mathrm{sparsity}(\widehat{f}) = |\mathrm{supp}(\widehat{f})| = \#\bigl\{\gamma \in {\widehat{{\mathbb F}_2^n}} : \widehat{f}(\gamma) \neq 0\bigr\}. \]

Definition 10We say that $\widehat{f}$ is$\epsilon$-granularif $\widehat{f}(\gamma)$ is an integer multiple of $\epsilon$ for all $\gamma \in {\widehat{{\mathbb F}_2^n}}$.

To gain some practice with this notation, let’s look at the Fourier transforms of some indicator functions $1_A : {\mathbb F}_2^n \to \{0,1\}$ and probability density functions $\varphi_A$, where $A \subseteq {\mathbb F}_2^n$. First, suppose $A \leq {\mathbb F}_2^n$ is a *subspace*. Then one way to characterize $A$ is by its *orthogonal complement* subspace $A^\perp$: \[ A = \{x \in {\mathbb F}_2^n : \gamma \cdot x = 0 \text{ for all } \gamma \in A^\perp\}. \]

Proposition 11If $A \leq {\mathbb F}_2^n$ has $\operatorname{codim} A = \dim A^\perp = k$, then \[ 1_A = \sum_{\gamma \in A^\perp} 2^{-k} \chi_\gamma, \qquad \varphi_A = \sum_{\gamma \in A^\perp} \chi_\gamma. \]

*Proof:* Let $\gamma_1, \dots, \gamma_k$ form a basis of $A^\perp$. Since $x \in A$ if and only if $\chi_{\gamma_i}(x) = 1$ for all $i \in [k]$, we have \[ 1_A(x) = \prod_{i=1}^k \Bigl(\tfrac{1}{2} + \tfrac{1}{2} \chi_{\gamma_i}(x)\Bigr) = 2^{-k}\sum_{\gamma \in \mathrm{span}\{\gamma_1, \dots, \gamma_k\}} \chi_\gamma(x) \] as claimed, where the last equality used \eqref{eqn:dual-char}. The Fourier expansion of $\varphi_A$ follows because $|A| = 2^{n-k}$. $\Box$

More generally, suppose $A$ is *affine subspace* (or *coset*) of ${\mathbb F}_2^n$; i.e., $A = H + a$ for some $H \leq {\mathbb F}_2^n$ and $a \in {\mathbb F}_2^n$, or equivalently \[ A = \{x \in {\mathbb F}_2^n : \gamma \cdot x = \gamma \cdot a \text{ for all } \gamma \in H^\perp\}. \] Then it is easy (see the exercises) to extend Proposition 11 to:

Proposition 12If $A = H+a$ is an affine subspace of codimension $k$ then \[ \widehat{1_A}(\gamma) = \begin{cases} \chi_{\gamma}(a) 2^{-k} & \text{if $\gamma \in H^\perp$} \\ 0 & \text{else;} \end{cases} \] hence $\varphi_A = \sum_{\gamma \in H^\perp} \chi_{\gamma}(a) \chi_\gamma$. We have $\mathrm{sparsity}(\widehat{1_A}) = 2^k$, $\widehat{1_A}$ is $2^{-k}$-granular, $\hat{\lVert} 1_A \hat{\rVert}_\infty = 2^{-k}$, and $\hat{\lVert} 1_A \hat{\rVert} = 1$.

In computer science terminology, any $f : {\mathbb F}_2^n \to \{0,1\}$ which is a conjunction of parity conditions is the indicator of an affine subspace (or the zero function). In the simple case that the parity conditions are all of the form “$x_i = a_i$”, the function is a logical AND of *literals* and we call the affine subspace a *subcube*.

Another class of boolean functions with simple Fourier spectra are the ones computable by simple *decision trees*:

Definition 13Adecision tree$T$ is a representation of a boolean function $f : {\mathbb F}_2^n \to {\mathbb R}$. It consists of a rooted binary tree in which the internal nodes are labelled by coordinates $i \in [n]$, the outgoing edges of each internal node are labelled $0$ and $1$, and the leaves are labelled by real numbers. We insist that no coordinate $i \in [n]$ appears more than once on any root-to-leaf path.On input $x \in {\mathbb F}_2^n$, the tree $T$ constructs a

computation pathfrom the root node to a leaf. Specifically, when the computation path reaches an internal node labelled by coordinate $i \in [n]$ we say that $T$queries$x_i$; the computation path then follows the outgoing edge labelled by $x_i$. The output of $T$ (and hence $f$) on input $x$ is the label of the leaf reached by the computation path. We often identify a tree with the function it computes.

For decision trees, a picture is worth a thousand words:

(It’s traditional to write $x_i$ rather than $i$ for the internal node labels.) For example, the computation path of the above tree on input $x = (0,1,0) \in {\mathbb F}_2^3$ starts at the root, queries $x_1$, proceeds left, queries $x_3$, proceeds left, queries $x_2$, proceeds right, and reaches a leaf labelled $0$. In fact, this tree computes the function $\mathrm{Sort}_3$ defined by $\mathrm{Sort}_3(x) = 1$ if and only if $x_1 \leq x_2 \leq x_3$ or $x_1 \geq x_2 \geq x_3$.

Definition 14Thesize$s$ of a decision tree $T$ is the total number of leaves. Thedepth$k$ of $T$ is the maximum length of any root-to-leaf path. For decision trees over ${\mathbb F}_2^n$ we have $k \leq n$ and $s \leq 2^k$. Given $f : {\mathbb F}_2^n \to {\mathbb R}$ we write ${\mathrm{DT}}(f)$ (respectively, ${\mathrm{DT}_{\mathrm{size}}}(f)$) for the least depth (respectively, size) of a decision tree computing $f$.

The example decision tree above has size $6$ and depth $3$.

Let $T$ be a decision tree computing $f : {\mathbb F}_2^n \to {\mathbb R}$ and let $P$ be one of its root-to-leaf paths. The set of inputs $x$ which follow computation path $P$ in $T$ is precisely a subcube of ${\mathbb F}_2^n$, call it $C_P$. The function $f$ is constant on $C_P$; we will call its value there $f(P)$. Further, since every input $x$ follows a unique path in $T$, the subcubes $\{C_P : P $ a path in $T\}$ form a *partition* of ${\mathbb F}_2^n$. These observations yield the following “spectral simplicity” results for decision trees:

Fact 15Let $f : {\mathbb F}_2^n \to {\mathbb R}$ be computed by a decision tree $T$. Then \[ f = \sum_{\text{paths $P$ of $T$}} f(P)\cdot 1_{C_P}. \]

Proposition 16Let $f : {\mathbb F}_2^n \to {\mathbb R}$ be computed by a decision tree $T$ of size $s$ and depth $k$. Then:

- $\deg(f) \leq k$;
- $\mathrm{sparsity}(\widehat{f}) \leq s 2^k \leq 4^k$;
- $\hat{\lVert} f \hat{\rVert}_1 \leq \|f\|_\infty \cdot s \leq \|f\|_\infty \cdot 2^k$;
- $\widehat{f}$ is $2^{-k}$-granular assuming $f : {\mathbb F}_2^n \to {\mathbb Z}$.

Proposition 17Let $f : {\mathbb F}_2^n \to \{-1,1\}$ be computable by a decision tree of size $s$ and let $\epsilon \in (0,1]$. Then the spectrum of $f$ is $\epsilon$-concentrated on degree up to $\log(s/\epsilon)$.

You are asked to prove these propositions in the exercises, as well as similar spectral simplicity results for some generalizations of the decision tree representation (“subcube partitions”, “parity decision trees”).

It seems to me that for a subspace $A$ of codimension $k$ one has $\varphi_A (x) = 2^{k-n} 1_A(x)$, so the second conclusion of Prop 11 should be $$\varphi_A = 2^{-n} \sum_{\gamma\in A^\prep} \chi_\gamma$$

Hi Lior — recall that $\varphi_A$ is a density, not a probability mass function (see §1.5). Hence it should have $\mathbf{E}[\varphi_A] = 1$.

Yes. I misread the definition. That said, I think there’s a problem on the third line of the proof where you say $|A|=2^{-k}$. Shouldn’t this read $|A|=2^{n-k}$?

Yep, you’re right. Thanks!

Subspaces is defined clearly.

Sorry, I don’t understand this comment?