In this section we treat the domain of a boolean function as ${\mathbb F}_2^n$, an $n$-dimensional vector space over the field ${\mathbb F}_2$. As mentioned earlier, it can be natural to index the Fourier characters $\chi_S : {\mathbb F}_2^n \to \{-1,1\}$ not by subsets $S \subseteq [n]$ but by their $0$-$1$ indicator vectors $\gamma \in {\mathbb F}_2^n$; thus \[ \chi_\gamma(x) = (-1)^{\gamma \cdot x}, \] with the dot product $\gamma \cdot x$ being carried out in ${\mathbb F}_2^n$.
For example, in this notation we’d write $\chi_0$ for the constantly $1$ function and $\chi_{e_i}$ for the $i$th dictator. Fact 1.6 now becomes \begin{equation} \label{eqn:dual-char} \chi_\beta \chi_\gamma = \chi_{\beta + \gamma} \quad \forall \beta, \gamma. \end{equation} Thus the characters form a group under multiplication which is isomorphic to the group ${\mathbb F}_2^n$ under addition. To distinguish this group from the input domain we write it as ${\widehat{{\mathbb F}_2^n}}$; thus the Fourier expansion of $f : {\mathbb F}_2^n \to {\mathbb R}$ can be written as \[ f(x) = \sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} \widehat{f}(\gamma) \chi_\gamma(x). \]
The Fourier transform of $f$ can be thought of as a function $\widehat{f} : {\widehat{{\mathbb F}_2^n}} \to {\mathbb R}$. We can measure its complexity with various norms.
Definition 8 The Fourier (or spectral) $p$-norm of $f : \{-1,1\}^n \to {\mathbb R}$ is \[ \hat{\lVert} f \hat{\rVert}_p = \left(\sum_{\gamma \in {\widehat{{\mathbb F}_2^n}}} |\widehat{f}(\gamma)|^p\right)^{1/p}. \]
Note that we use the “counting measure” on ${\widehat{{\mathbb F}_2^n}}$ and hence we have a nice rephrasing of Parseval’s Theorem: $\|f\|_2 = \hat{\lVert} f \hat{\rVert}_2$. We make two more definitions relating to the simplicity of $\widehat{f}$:
Definition 9 The Fourier (or spectral) sparsity of $f : \{-1,1\}^n \to {\mathbb R}$ is \[ \mathrm{sparsity}(\widehat{f}) = |\mathrm{supp}(\widehat{f})| = \#\bigl\{\gamma \in {\widehat{{\mathbb F}_2^n}} : \widehat{f}(\gamma) \neq 0\bigr\}. \]
Definition 10 We say that $\widehat{f}$ is $\epsilon$-granular if $\widehat{f}(\gamma)$ is an integer multiple of $\epsilon$ for all $\gamma \in {\widehat{{\mathbb F}_2^n}}$.
To gain some practice with this notation, let’s look at the Fourier transforms of some indicator functions $1_A : {\mathbb F}_2^n \to \{0,1\}$ and probability density functions $\varphi_A$, where $A \subseteq {\mathbb F}_2^n$. First, suppose $A \leq {\mathbb F}_2^n$ is a subspace. Then one way to characterize $A$ is by its orthogonal complement subspace $A^\perp$: \[ A = \{x \in {\mathbb F}_2^n : \gamma \cdot x = 0 \text{ for all } \gamma \in A^\perp\}. \]
Proposition 11 If $A \leq {\mathbb F}_2^n$ has $\operatorname{codim} A = \dim A^\perp = k$, then \[ 1_A = \sum_{\gamma \in A^\perp} 2^{-k} \chi_\gamma, \qquad \varphi_A = \sum_{\gamma \in A^\perp} \chi_\gamma. \]
Proof: Let $\gamma_1, \dots, \gamma_k$ form a basis of $A^\perp$. Since $x \in A$ if and only if $\chi_{\gamma_i}(x) = 1$ for all $i \in [k]$, we have \[ 1_A(x) = \prod_{i=1}^k \Bigl(\tfrac{1}{2} + \tfrac{1}{2} \chi_{\gamma_i}(x)\Bigr) = 2^{-k}\sum_{\gamma \in \mathrm{span}\{\gamma_1, \dots, \gamma_k\}} \chi_\gamma(x) \] as claimed, where the last equality used \eqref{eqn:dual-char}. The Fourier expansion of $\varphi_A$ follows because $|A| = 2^{n-k}$. $\Box$
More generally, suppose $A$ is affine subspace (or coset) of ${\mathbb F}_2^n$; i.e., $A = H + a$ for some $H \leq {\mathbb F}_2^n$ and $a \in {\mathbb F}_2^n$, or equivalently \[ A = \{x \in {\mathbb F}_2^n : \gamma \cdot x = \gamma \cdot a \text{ for all } \gamma \in H^\perp\}. \] Then it is easy (see the exercises) to extend Proposition 11 to:
Proposition 12 If $A = H+a$ is an affine subspace of codimension $k$ then \[ \widehat{1_A}(\gamma) = \begin{cases} \chi_{\gamma}(a) 2^{-k} & \text{if $\gamma \in H^\perp$} \\ 0 & \text{else;} \end{cases} \] hence $\varphi_A = \sum_{\gamma \in H^\perp} \chi_{\gamma}(a) \chi_\gamma$. We have $\mathrm{sparsity}(\widehat{1_A}) = 2^k$, $\widehat{1_A}$ is $2^{-k}$-granular, $\hat{\lVert} 1_A \hat{\rVert}_\infty = 2^{-k}$, and $\hat{\lVert} 1_A \hat{\rVert} = 1$.
In computer science terminology, any $f : {\mathbb F}_2^n \to \{0,1\}$ which is a conjunction of parity conditions is the indicator of an affine subspace (or the zero function). In the simple case that the parity conditions are all of the form “$x_i = a_i$”, the function is a logical AND of literals and we call the affine subspace a subcube.
Another class of boolean functions with simple Fourier spectra are the ones computable by simple decision trees:
Definition 13 A decision tree $T$ is a representation of a boolean function $f : {\mathbb F}_2^n \to {\mathbb R}$. It consists of a rooted binary tree in which the internal nodes are labelled by coordinates $i \in [n]$, the outgoing edges of each internal node are labelled $0$ and $1$, and the leaves are labelled by real numbers. We insist that no coordinate $i \in [n]$ appears more than once on any root-to-leaf path.
On input $x \in {\mathbb F}_2^n$, the tree $T$ constructs a computation path from the root node to a leaf. Specifically, when the computation path reaches an internal node labelled by coordinate $i \in [n]$ we say that $T$ queries $x_i$; the computation path then follows the outgoing edge labelled by $x_i$. The output of $T$ (and hence $f$) on input $x$ is the label of the leaf reached by the computation path. We often identify a tree with the function it computes.
For decision trees, a picture is worth a thousand words:
(It’s traditional to write $x_i$ rather than $i$ for the internal node labels.) For example, the computation path of the above tree on input $x = (0,1,0) \in {\mathbb F}_2^3$ starts at the root, queries $x_1$, proceeds left, queries $x_3$, proceeds left, queries $x_2$, proceeds right, and reaches a leaf labelled $0$. In fact, this tree computes the function $\mathrm{Sort}_3$ defined by $\mathrm{Sort}_3(x) = 1$ if and only if $x_1 \leq x_2 \leq x_3$ or $x_1 \geq x_2 \geq x_3$.
Definition 14 The size $s$ of a decision tree $T$ is the total number of leaves. The depth $k$ of $T$ is the maximum length of any root-to-leaf path. For decision trees over ${\mathbb F}_2^n$ we have $k \leq n$ and $s \leq 2^k$. Given $f : {\mathbb F}_2^n \to {\mathbb R}$ we write ${\mathrm{DT}}(f)$ (respectively, ${\mathrm{DT}_{\mathrm{size}}}(f)$) for the least depth (respectively, size) of a decision tree computing $f$.
The example decision tree above has size $6$ and depth $3$.
Let $T$ be a decision tree computing $f : {\mathbb F}_2^n \to {\mathbb R}$ and let $P$ be one of its root-to-leaf paths. The set of inputs $x$ which follow computation path $P$ in $T$ is precisely a subcube of ${\mathbb F}_2^n$, call it $C_P$. The function $f$ is constant on $C_P$; we will call its value there $f(P)$. Further, since every input $x$ follows a unique path in $T$, the subcubes $\{C_P : P $ a path in $T\}$ form a partition of ${\mathbb F}_2^n$. These observations yield the following “spectral simplicity” results for decision trees:
Fact 15 Let $f : {\mathbb F}_2^n \to {\mathbb R}$ be computed by a decision tree $T$. Then \[ f = \sum_{\text{paths $P$ of $T$}} f(P)\cdot 1_{C_P}. \]
Proposition 16 Let $f : {\mathbb F}_2^n \to {\mathbb R}$ be computed by a decision tree $T$ of size $s$ and depth $k$. Then:
- $\deg(f) \leq k$;
- $\mathrm{sparsity}(\widehat{f}) \leq s 2^k \leq 4^k$;
- $\hat{\lVert} f \hat{\rVert}_1 \leq \|f\|_\infty \cdot s \leq \|f\|_\infty \cdot 2^k$;
- $\widehat{f}$ is $2^{-k}$-granular assuming $f : {\mathbb F}_2^n \to {\mathbb Z}$.
Proposition 17 Let $f : {\mathbb F}_2^n \to \{-1,1\}$ be computable by a decision tree of size $s$ and let $\epsilon \in (0,1]$. Then the spectrum of $f$ is $\epsilon$-concentrated on degree up to $\log(s/\epsilon)$.
You are asked to prove these propositions in the exercises, as well as similar spectral simplicity results for some generalizations of the decision tree representation (“subcube partitions”, “parity decision trees”).


It seems to me that for a subspace $A$ of codimension $k$ one has $\varphi_A (x) = 2^{k-n} 1_A(x)$, so the second conclusion of Prop 11 should be $$\varphi_A = 2^{-n} \sum_{\gamma\in A^\prep} \chi_\gamma$$
Hi Lior — recall that $\varphi_A$ is a density, not a probability mass function (see §1.5). Hence it should have $\mathbf{E}[\varphi_A] = 1$.
Yes. I misread the definition. That said, I think there’s a problem on the third line of the proof where you say $|A|=2^{-k}$. Shouldn’t this read $|A|=2^{n-k}$?
Yep, you’re right. Thanks!
Subspaces is defined clearly.
Sorry, I don’t understand this comment?