# §1.4: Basic Fourier formulas

As we have seen, the Fourier expansion of $f : \{-1,1\}^n \to {\mathbb R}$ can be thought of as the representation of $f$ over the orthonormal basis of parity functions $(\chi_S)_{S \subseteq [n]}$. In this basis, $f$ has $2^n$ “coordinates”, and these are precisely the Fourier coefficients of $f$. The “coordinate” of $f$ in the $\chi_S$ “direction” is $\langle f, \chi_S\rangle$; i.e., we have the following formula for Fourier coefficients:

Proposition 8 For $f : \{-1,1\}^n \to {\mathbb R}$ and $S \subseteq [n]$, the Fourier coefficient of $f$ on $S$ is given by \begin{equation*} \widehat{f}(S) = \langle f, \chi_S \rangle = \mathop{\bf E}_{\boldsymbol{x} \sim \{-1,1\}^n}[f(\boldsymbol{x}) \chi_S(\boldsymbol{x})]. \end{equation*}

We can verify this formula explicitly: \begin{equation} \label{eqn:fourier-coeff-verification} \langle f, \chi_S \rangle = \left\langle \sum_{T \subseteq [n]} \widehat{f}(T)\,\chi_T, \chi_S \right\rangle = \sum_{T \subseteq [n]} \widehat{f}(T) \langle \chi_T, \chi_S \rangle = \widehat{f}(S), \end{equation} where we used the Fourier expansion of $f$, the linearity of $\langle \cdot, \cdot \rangle$ and finally Theorem 5. This formula is the simplest way to calculate the Fourier coefficients of a given function; it can also be viewed as a streamlined version of the interpolation method illustrated in equation (3) here. Alternatively, this formula can be taken as the definition of Fourier coefficients.

The orthonormal basis of parities also lets us measure the squared “length” ($2$-norm) of $f : \{-1,1\}^n \to {\mathbb R}$ efficiently: it’s just the sum of the squares of $f$’s “coordinates” — i.e., Fourier coefficients. This simple but crucial fact is called Parseval’s Theorem.

Parseval’s Theorem For any $f : \{-1,1\}^n \to {\mathbb R}$, $\langle f, f \rangle = \mathop{\bf E}_{\boldsymbol{x} \sim \{-1,1\}^n}[f(\boldsymbol{x})^2] = \sum_{S \subseteq [n]} \widehat{f}(S)^2.$ In particular, if $f : \{-1,1\}^n \to \{-1,1\}$ is boolean-valued then $\sum_{S \subseteq [n]} \widehat{f}(S)^2 = 1.$

As examples we can recall the Fourier expansions of ${\textstyle \min_2}$ and $\mathrm{Maj}_3$: ${\textstyle \min_2}(x) = -\tfrac12 + \tfrac12 x_1 + \tfrac12 x_2 + \tfrac12 x_1 x_2, \qquad \mathrm{Maj}_3(x) = \tfrac{1}{2} x_1 + \tfrac{1}{2} x_2 + \tfrac{1}{2} x_3 - \tfrac{1}{2} x_1x_2x_3.$ In both cases the sum of squares of Fourier coefficients is $4 \times (1/4) = 1$.

More generally, given two functions $f, g : \{-1,1\}^n \to {\mathbb R}$, we can compute $\langle f, g \rangle$ by taking the “dot product” of their coordinates in the orthonormal basis of parities. The resulting formula is called Plancherel’s Theorem.

Plancherel’s Theorem For any $f, g : \{-1,1\}^n \to {\mathbb R}$, $\langle f, g \rangle = \mathop{\bf E}_{\boldsymbol{x} \sim \{-1,1\}^n}[f(\boldsymbol{x})g(\boldsymbol{x})] = \sum_{S \subseteq [n]} \widehat{f}(S) \widehat{g}(S).$

We can verify this formula explicitly as we did in \eqref{eqn:fourier-coeff-verification}: $\langle f, g \rangle = \Bigl\langle \sum_{S \subseteq [n]} \widehat{f}(S)\,\chi_S, \sum_{T \subseteq [n]} \widehat{g}(T)\,\chi_T \Bigr\rangle = \sum_{S, T \subseteq [n]} \widehat{f}(S)\widehat{g}(T) \langle \chi_S, \chi_T \rangle = \sum_{S\subseteq [n]} \widehat{f}(S)\widehat{g}(S).$

Now is a good time to remark that for boolean-valued functions $f, g : \{-1,1\}^n \to \{-1,1\}$, the inner product $\langle f, g \rangle$ can be interpreted as a kind of “correlation” between $f$ and $g$, measuring how similar they are. Since $f(x)g(x) = 1$ if $f(x) = g(x)$ and $f(x)g(x) = -1$ if $f(x) \neq g(x)$, we have:

Proposition 9 If $f, g : \{-1,1\}^n \to \{-1,1\}$, $\langle f, g \rangle = \mathop{\bf Pr}[f(\boldsymbol{x}) = g(\boldsymbol{x})] – \mathop{\bf Pr}[f(\boldsymbol{x}) \neq g(\boldsymbol{x})] = 1 – 2\mathrm{dist}(f,g).$

Here we are using the following definition:

Definition 10 Given $f, g : \{-1,1\}^n \to \{-1,1\}$, we define their (relative Hamming) distance to be $\mathrm{dist}(f,g) = \mathop{\bf Pr}_{\boldsymbol{x}}[f(\boldsymbol{x}) \neq g(\boldsymbol{x})],$ the fraction of inputs on which they disagree.

With a number of Fourier formulas now in hand we can begin to illustrate a basic theme in the analysis of boolean functions: interesting combinatorial properties of a boolean function $f$ can be “read off” from its Fourier coefficients. Let’s start by looking at one way to measure the “bias” of $f$:

Definition 11 The mean of $f : \{-1,1\}^n \to {\mathbb R}$ is $\mathop{\bf E}[f]$. When $f$ has mean $0$ we say that it is unbiased, or balanced. In the particular case that $f : \{-1,1\}^n \to \{-1,1\}$ is boolean-valued, its mean is $\mathop{\bf E}[f] = \mathop{\bf Pr}[f = 1] – \mathop{\bf Pr}[f = -1];$ thus $f$ is unbiased if and only if it takes value $1$ on exactly half of the points of the Hamming cube.

Fact 12 If $f : \{-1,1\}^n \to {\mathbb R}$ then $\mathop{\bf E}[f] = \widehat{f}(\emptyset)$.

This formula holds simply because $\mathop{\bf E}[f] = \langle f, 1 \rangle = \widehat{f}(\emptyset)$ (taking $S = \emptyset$ in Proposition 8). In particular, a boolean function is unbiased if and only if its empty-set Fourier coefficient is $0$.

Next we obtain a formula for the variance of a real-valued boolean function (thinking of $f(\boldsymbol{x})$ as a real-valued random variable):

Proposition 13 The variance of $f : \{-1,1\}^n \to {\mathbb R}$ is $\mathop{\bf Var}[f] = \langle f – \mathop{\bf E}[f], f – \mathop{\bf E}[f] \rangle = \mathop{\bf E}[f^2] – \mathop{\bf E}[f]^2 = \sum_{S \neq \emptyset} \widehat{f}(S)^2.$

This Fourier formula follows immediately from Parseval’s Theorem and Fact 12.

Fact 14 For $f : \{-1,1\}^n \to \{-1,1\}$, $\mathop{\bf Var}[f] = 1 – \mathop{\bf E}[f]^2 = 4 \mathop{\bf Pr}[f(\boldsymbol{x}) = 1] \mathop{\bf Pr}[f(\boldsymbol{x}) = -1] \in [0,1]$.

In particular, a boolean-valued function $f$ has variance $1$ if it’s unbiased and variance $0$ if it’s constant. More generally, the variance of a boolean-valued function is proportional to its “distance from being constant”.

Proposition 15 Let $f : \{-1,1\}^n \to \{-1,1\}$. Then $2 \epsilon \leq \mathop{\bf Var}[f] \leq 4\epsilon$, where $\epsilon = \min\{\mathrm{dist}(f, 1), \mathrm{dist}(f, -1)\}.$

The proof of Proposition 15 is an exercise.

By using Plancherel in place of Parseval, we get a generalization of Proposition 13 for covariance:

Proposition 16 The covariance of $f, g : \{-1,1\}^n \to {\mathbb R}$ is $\mathop{\bf Cov}[f,g] = \langle f – \mathop{\bf E}[f], g – \mathop{\bf E}[g] \rangle = \mathop{\bf E}[fg] – \mathop{\bf E}[f]\mathop{\bf E}[g] = \sum_{S \neq \emptyset} \widehat{f}(S)\widehat{g}(S).$

We end this section by discussing the Fourier weight distribution of boolean functions.

Definition 17 The (Fourier) weight of $f : \{-1,1\}^n \to {\mathbb R}$ on set $S$ is defined to be the squared Fourier coefficient, $\widehat{f}(S)^2$.

Although we lose some information about the Fourier coefficients when we square them, many Fourier formulas only depend on the weights of $f$. For example, Proposition 13 says that the variance of $f$ equals its Fourier weight on nonempty sets. Studying Fourier weights is particularly pleasant for boolean-valued functions $f : \{-1,1\}^n \to \{-1,1\}$ since Parseval’s Theorem says that they always have total weight $1$. In particular, they define a probability distribution on subsets of $[n]$.

Definition 18 Given $f : \{-1,1\}^n \to \{-1,1\}$, the spectral sample for $f$, denoted $\mathscr{S}_{f}$, is the probability distribution on subsets of $[n]$ in which the set $S$ has probability $\widehat{f}(S)^2$. We write $\boldsymbol{S} \sim \mathscr{S}_{f}$ for a draw from this distribution.

For example, the spectral sample for the ${\textstyle \min_2}$ function is the uniform distribution on all four subsets of $$; the spectral sample for $\mathrm{Maj}_3$ is the uniform distribution on the four subsets of $$ with odd cardinality.

Given a boolean function it can be helpful to try to keep a mental picture of its weight distribution on the subsets of $[n]$, partially ordered by inclusion. Here is an example for the $\mathrm{Maj}_3$ function, with the white circles indicating weight $0$ and the shaded circles indicating weight $1/4$:

Finally, as suggested by the diagram we often stratify the subsets $S \subseteq [n]$ according to their cardinality (also called “height” or “level”). Equivalently, this is the degree of the associated monomial $x^S$.

Definition 19 For $f : \{-1,1\}^n \to {\mathbb R}$ and $0 \leq k \leq n$, the (Fourier) weight of $f$ at degree $k$ is $\mathbf{W}^{k}[f] = \sum_{\substack{S \subseteq [n] \\ |S| = k}} \widehat{f}(S)^2.$ If $f : \{-1,1\}^n \to \{-1,1\}$ is boolean-valued, an equivalent definition is $\mathbf{W}^{k}[f] = \mathop{\bf Pr}_{\boldsymbol{S} \sim \mathscr{S}_{f}}[|S| = k].$ By Parseval’s Theorem, $\mathbf{W}^{k}[f] = \|f^{=k}\|_2^2$ where $f^{=k} = \sum_{|S| = k} \widehat{f}(S)\,\chi_S$ is called the degree $k$ part of $f$. We will also sometimes use notation like $\mathbf{W}^{> k}[f] = \sum_{|S| > k} \widehat{f}(S)^2$ and $f^{\leq k} = \sum_{|S| \leq k} \widehat{f}(S)\,\chi_S$.

### 15 comments to §1.4: Basic Fourier formulas

• miforbes

In Prop 13, is there a reason to use the definition Var(f)=E[f*(f-E[f])] instead of the “standard” E[(f-E[f])^2]? I see that it works out the same, but initially thought it was a typo.

Technically this section is in the Category “Uncategorized”, unsure if that matters.

• Ryan O'Donnell

Fixed both, thanks!

• plv

Might it also be better to use the standard covariance definition: Covar(f, g) = E[(f-E(f))*(g-E(g))] in Prop. 16?

Also, a couple of typos?
“For example, Proposition 13 says that the variance of f equals its Fourier weight on non-empty sets ..” is only loosely correct? It might be more correct to say: “… equals the sum of its Fourier weights on non-empty sets ..”

and, just above the weight distribution figure, there is a missing “be”: “Given a boolean function it can *be* helpful to try to keep ..”

• Ryan O'Donnell

Thanks plv! I thought it might be cluttered with the extra (equivalent) definition of covariance in there, but actually it looks fine and your suggestion should be helpful. Fixed the last typo too. I will be using phrases like “f’s Fourier weight on [some collection of sets]” in the future, so I hope it’s not too confusing; I left the comment about variance as is.

• Sergiu

In Definition 19, in the last row, the last definition should be for W^{\le k} and not f^{\le k}.

• Sergiu

Never mind, it’s correct. I was confused by the different inequality signs.

• Lior Silberman

Def 17 is missing a definite article (“on [the] set”)

• Daniele A. Gewurz

Hi Ryan, and thanks a lot!

In the proof of Plancherel’s Theorem, after “$\langle f, g \rangle =$” there are two $f$s rather than $f$ and $g$.

Bye,
Daniele

• Ryan O'Donnell

Thanks, fixed!

• Marla

Should the first sentence have the word “as” in it? …can be thought of AS the representation…

• Ryan O'Donnell

Yep, thanks!

• Pavithra

Nitpick: I think fact 14 is a consequence of Parseval’s Theorem and Definition 11 (not fact 12).

• Ryan O'Donnell

Um, I think it’s okay as stated, no?

• Pavithra

Never mind, maybe I am missing something.

• Fang-Yi Yu

I think its interesting to take the Proposition 9 as cosine theorem: $4*dist(f,g) = |f-g|^2 = |f|^2+|g|^2-2$.