As we have seen, the Fourier expansion of $f : \{-1,1\}^n \to {\mathbb R}$ can be thought of as the representation of $f$ over the orthonormal basis of parity functions $(\chi_S)_{S \subseteq [n]}$. In this basis, $f$ has $2^n$ “coordinates”, and these are precisely the Fourier coefficients of $f$. The “coordinate” of $f$ in the $\chi_S$ “direction” is $\langle f, \chi_S\rangle$; i.e., we have the following formula for Fourier coefficients:

Proposition 8For $f : \{-1,1\}^n \to {\mathbb R}$ and $S \subseteq [n]$, the Fourier coefficient of $f$ on $S$ is given by \begin{equation*} \widehat{f}(S) = \langle f, \chi_S \rangle = \mathop{\bf E}_{\boldsymbol{x} \sim \{-1,1\}^n}[f(\boldsymbol{x}) \chi_S(\boldsymbol{x})]. \end{equation*}

We can verify this formula explicitly: \begin{equation} \label{eqn:fourier-coeff-verification} \langle f, \chi_S \rangle = \left\langle \sum_{T \subseteq [n]} \widehat{f}(T)\,\chi_T, \chi_S \right\rangle = \sum_{T \subseteq [n]} \widehat{f}(T) \langle \chi_T, \chi_S \rangle = \widehat{f}(S), \end{equation} where we used the Fourier expansion of $f$, the linearity of $\langle \cdot, \cdot \rangle$ and finally Theorem 5. This formula is the simplest way to calculate the Fourier coefficients of a given function; it can also be viewed as a streamlined version of the interpolation method illustrated in equation (3) here. Alternatively, this formula can be taken as the *definition* of Fourier coefficients.

The orthonormal basis of parities also lets us measure the squared “length” ($2$-norm) of $f : \{-1,1\}^n \to {\mathbb R}$ efficiently: it’s just the sum of the squares of $f$’s “coordinates” — i.e., Fourier coefficients. This simple but crucial fact is called *Parseval’s Theorem*.

Parseval’s TheoremFor any $f : \{-1,1\}^n \to {\mathbb R}$, \[ \langle f, f \rangle = \mathop{\bf E}_{\boldsymbol{x} \sim \{-1,1\}^n}[f(\boldsymbol{x})^2] = \sum_{S \subseteq [n]} \widehat{f}(S)^2. \] In particular, if $f : \{-1,1\}^n \to \{-1,1\}$ is boolean-valued then \[ \sum_{S \subseteq [n]} \widehat{f}(S)^2 = 1. \]

As examples we can recall the Fourier expansions of ${\textstyle \min_2}$ and $\mathrm{Maj}_3$: \[ {\textstyle \min_2}(x) = -\tfrac12 + \tfrac12 x_1 + \tfrac12 x_2 + \tfrac12 x_1 x_2, \qquad \mathrm{Maj}_3(x) = \tfrac{1}{2} x_1 + \tfrac{1}{2} x_2 + \tfrac{1}{2} x_3 - \tfrac{1}{2} x_1x_2x_3. \] In both cases the sum of squares of Fourier coefficients is $4 \times (1/4) = 1$.

More generally, given two functions $f, g : \{-1,1\}^n \to {\mathbb R}$, we can compute $\langle f, g \rangle$ by taking the “dot product” of their coordinates in the orthonormal basis of parities. The resulting formula is called *Plancherel’s Theorem*.

Plancherel’s TheoremFor any $f, g : \{-1,1\}^n \to {\mathbb R}$, \[ \langle f, g \rangle = \mathop{\bf E}_{\boldsymbol{x} \sim \{-1,1\}^n}[f(\boldsymbol{x})g(\boldsymbol{x})] = \sum_{S \subseteq [n]} \widehat{f}(S) \widehat{g}(S). \]

We can verify this formula explicitly as we did in \eqref{eqn:fourier-coeff-verification}: \[ \langle f, g \rangle = \Bigl\langle \sum_{S \subseteq [n]} \widehat{f}(S)\,\chi_S, \sum_{T \subseteq [n]} \widehat{g}(T)\,\chi_T \Bigr\rangle = \sum_{S, T \subseteq [n]} \widehat{f}(S)\widehat{g}(T) \langle \chi_S, \chi_T \rangle = \sum_{S\subseteq [n]} \widehat{f}(S)\widehat{g}(S). \]

Now is a good time to remark that for boolean-valued functions $f, g : \{-1,1\}^n \to \{-1,1\}$, the inner product $\langle f, g \rangle$ can be interpreted as a kind of “correlation” between $f$ and $g$, measuring how similar they are. Since $f(x)g(x) = 1$ if $f(x) = g(x)$ and $f(x)g(x) = -1$ if $f(x) \neq g(x)$, we have:

Proposition 9If $f, g : \{-1,1\}^n \to \{-1,1\}$, \[ \langle f, g \rangle = \mathop{\bf Pr}[f(\boldsymbol{x}) = g(\boldsymbol{x})] – \mathop{\bf Pr}[f(\boldsymbol{x}) \neq g(\boldsymbol{x})] = 1 – 2\mathrm{dist}(f,g). \]

Here we are using the following definition:

Definition 10Given $f, g : \{-1,1\}^n \to \{-1,1\}$, we define their(relative Hamming) distanceto be \[ \mathrm{dist}(f,g) = \mathop{\bf Pr}_{\boldsymbol{x}}[f(\boldsymbol{x}) \neq g(\boldsymbol{x})], \] the fraction of inputs on which they disagree.

With a number of Fourier formulas now in hand we can begin to illustrate a basic theme in the analysis of boolean functions: interesting combinatorial properties of a boolean function $f$ can be “read off” from its Fourier coefficients. Let’s start by looking at one way to measure the “bias” of $f$:

Definition 11Themeanof $f : \{-1,1\}^n \to {\mathbb R}$ is $\mathop{\bf E}[f]$. When $f$ has mean $0$ we say that it isunbiased, orbalanced. In the particular case that $f : \{-1,1\}^n \to \{-1,1\}$ is boolean-valued, its mean is \[ \mathop{\bf E}[f] = \mathop{\bf Pr}[f = 1] – \mathop{\bf Pr}[f = -1]; \] thus $f$ is unbiased if and only if it takes value $1$ on exactly half of the points of the Hamming cube.

Fact 12If $f : \{-1,1\}^n \to {\mathbb R}$ then $\mathop{\bf E}[f] = \widehat{f}(\emptyset)$.

This formula holds simply because $\mathop{\bf E}[f] = \langle f, 1 \rangle = \widehat{f}(\emptyset)$ (taking $S = \emptyset$ in Proposition 8). In particular, a boolean function is unbiased if and only if its empty-set Fourier coefficient is $0$.

Next we obtain a formula for the *variance* of a real-valued boolean function (thinking of $f(\boldsymbol{x})$ as a real-valued random variable):

Proposition 13Thevarianceof $f : \{-1,1\}^n \to {\mathbb R}$ is \[ \mathop{\bf Var}[f] = \langle f – \mathop{\bf E}[f], f – \mathop{\bf E}[f] \rangle = \mathop{\bf E}[f^2] – \mathop{\bf E}[f]^2 = \sum_{S \neq \emptyset} \widehat{f}(S)^2. \]

This Fourier formula follows immediately from Parseval’s Theorem and Fact 12.

Fact 14For $f : \{-1,1\}^n \to \{-1,1\}$, $\mathop{\bf Var}[f] = 1 – \mathop{\bf E}[f]^2 = 4 \mathop{\bf Pr}[f(\boldsymbol{x}) = 1] \mathop{\bf Pr}[f(\boldsymbol{x}) = -1] \in [0,1]$.

In particular, a boolean-valued function $f$ has variance $1$ if it’s unbiased and variance $0$ if it’s constant. More generally, the variance of a boolean-valued function is proportional to its “distance from being constant”.

Proposition 15Let $f : \{-1,1\}^n \to \{-1,1\}$. Then $2 \epsilon \leq \mathop{\bf Var}[f] \leq 4\epsilon$, where \[ \epsilon = \min\{\mathrm{dist}(f, 1), \mathrm{dist}(f, -1)\}. \]

The proof of Proposition 15 is an exercise.

By using Plancherel in place of Parseval, we get a generalization of Proposition 13 for *covariance*:

Proposition 16Thecovarianceof $f, g : \{-1,1\}^n \to {\mathbb R}$ is \[ \mathop{\bf Cov}[f,g] = \langle f – \mathop{\bf E}[f], g – \mathop{\bf E}[g] \rangle = \mathop{\bf E}[fg] – \mathop{\bf E}[f]\mathop{\bf E}[g] = \sum_{S \neq \emptyset} \widehat{f}(S)\widehat{g}(S). \]

We end this section by discussing the *Fourier weight distribution* of boolean functions.

Definition 17The(Fourier) weightof $f : \{-1,1\}^n \to {\mathbb R}$ on set $S$ is defined to be the squared Fourier coefficient, $\widehat{f}(S)^2$.

Although we lose some information about the Fourier coefficients when we square them, many Fourier formulas only depend on the weights of $f$. For example, Proposition 13 says that the variance of $f$ equals its Fourier weight on nonempty sets. Studying Fourier weights is particularly pleasant for boolean-valued functions $f : \{-1,1\}^n \to \{-1,1\}$ since Parseval’s Theorem says that they always have total weight $1$. In particular, they define a *probability distribution* on subsets of $[n]$.

Definition 18Given $f : \{-1,1\}^n \to \{-1,1\}$, thespectral samplefor $f$, denoted $\mathscr{S}_{f}$, is the probability distribution on subsets of $[n]$ in which the set $S$ has probability $\widehat{f}(S)^2$. We write $\boldsymbol{S} \sim \mathscr{S}_{f}$ for a draw from this distribution.

For example, the spectral sample for the ${\textstyle \min_2}$ function is the uniform distribution on all four subsets of $[2]$; the spectral sample for $\mathrm{Maj}_3$ is the uniform distribution on the four subsets of $[3]$ with odd cardinality.

Given a boolean function it can be helpful to try to keep a mental picture of its weight distribution on the subsets of $[n]$, partially ordered by inclusion. Here is an example for the $\mathrm{Maj}_3$ function, with the white circles indicating weight $0$ and the shaded circles indicating weight $1/4$:

Finally, as suggested by the diagram we often stratify the subsets $S \subseteq [n]$ according to their cardinality (also called “height” or “level”). Equivalently, this is the *degree* of the associated monomial $x^S$.

Definition 19For $f : \{-1,1\}^n \to {\mathbb R}$ and $0 \leq k \leq n$, the(Fourier) weight of $f$ at degree $k$is \[ \mathbf{W}^{k}[f] = \sum_{\substack{S \subseteq [n] \\ |S| = k}} \widehat{f}(S)^2. \] If $f : \{-1,1\}^n \to \{-1,1\}$ is boolean-valued, an equivalent definition is \[ \mathbf{W}^{k}[f] = \mathop{\bf Pr}_{\boldsymbol{S} \sim \mathscr{S}_{f}}[|S| = k]. \] By Parseval’s Theorem, $\mathbf{W}^{k}[f] = \|f^{=k}\|_2^2$ where \[ f^{=k} = \sum_{|S| = k} \widehat{f}(S)\,\chi_S \] is called thedegree $k$ part of $f$. We will also sometimes use notation like $\mathbf{W}^{> k}[f] = \sum_{|S| > k} \widehat{f}(S)^2$ and $f^{\leq k} = \sum_{|S| \leq k} \widehat{f}(S)\,\chi_S$.

In Prop 13, is there a reason to use the definition Var(f)=E[f*(f-E[f])] instead of the “standard” E[(f-E[f])^2]? I see that it works out the same, but initially thought it was a typo.

Technically this section is in the Category “Uncategorized”, unsure if that matters.

Fixed both, thanks!

Might it also be better to use the standard covariance definition: Covar(f, g) = E[(f-E(f))*(g-E(g))] in Prop. 16?

Also, a couple of typos?

“For example, Proposition 13 says that the variance of f equals its Fourier weight on non-empty sets ..” is only loosely correct? It might be more correct to say: “… equals the sum of its Fourier weights on non-empty sets ..”

and, just above the weight distribution figure, there is a missing “be”: “Given a boolean function it can *be* helpful to try to keep ..”

Thanks plv! I thought it might be cluttered with the extra (equivalent) definition of covariance in there, but actually it looks fine and your suggestion should be helpful. Fixed the last typo too. I will be using phrases like “f’s Fourier weight on [some collection of sets]” in the future, so I hope it’s not too confusing; I left the comment about variance as is.

In Definition 19, in the last row, the last definition should be for W^{\le k} and not f^{\le k}.

Never mind, it’s correct. I was confused by the different inequality signs.

Def 17 is missing a definite article (“on [the] set”)

Hi Ryan, and thanks a lot!

In the proof of Plancherel’s Theorem, after “$\langle f, g \rangle =$” there are two $f$s rather than $f$ and $g$.

Bye,

Daniele

Thanks, fixed!

Should the first sentence have the word “as” in it? …can be thought of AS the representation…

Yep, thanks!

Nitpick: I think fact 14 is a consequence of Parseval’s Theorem and Definition 11 (not fact 12).

Um, I think it’s okay as stated, no?

Never mind, maybe I am missing something.

Thanks for your notes.

I think its interesting to take the Proposition 9 as cosine theorem: $4*dist(f,g) = |f-g|^2 = |f|^2+|g|^2-2$.