Perhaps the commonest way of representing a boolean function $f : \{0,1\}^n \to \{0,1\}$ is by a DNF formula:

Definition 1ADNF (disjunctive normal form) formulaover boolean variables $x_1, \dots, x_n$ is defined to be a logical OR ofterms, each of which is a logical AND ofliterals. Aliteralis either a variable $x_i$ or its logical negation $\overline{x}_i$. We insist that no term contains both a variable and its negation. The number of literals in a term is called itswidth. We often identify a DNF formula with the boolean function $f : \{0,1\}^n \to \{0,1\}$ it computes.

Example 2Recall the function $\mathrm{Sort}_3$, defined by $\mathrm{Sort}_3(x_1, x_2, x_3) = 1$ if and only if $x_1 \leq x_2 \leq x_3$ or $x_1 \geq x_2 \geq x_3$. We can represent it by a DNF formula as follows: \[ \mathrm{Sort}_3(x_1, x_2, x_3)\ =\ (x_1 \wedge x_2)\ \vee\ (\overline{x}_2 \wedge \overline{x}_3)\ \vee\ (\overline{x}_1 \wedge x_3). \] The DNF representation says that the bits are sorted if either the first two bits are $1$, or the last two bits are $0$, or the first bit is $0$ and the last bit is $1$.

The complexity of a DNF formula is measured by its size and width:

Definition 3Thesizeof a DNF formula is its number of terms. Thewidthis the maximum width of its terms. Given $f : \{-1,1\}^n \to \{-1,1\}$ we write ${\mathrm{DNF}_{\mathrm{size}}}(f)$ (respectively, ${\mathrm{DNF}_{\mathrm{width}}}(f)$) for the least size (respectively, width) of a DNF formula computing $f$.

The DNF formula for $\mathrm{Sort}_3$ from Example 2 has size $3$ and width $2$. Every function $f : \{0,1\}^n \to \{0,1\}$ can be computed by a DNF of size at most $2^n$ and width at most $n$ (see the exercises).

There is also a “dual” notion to DNF formulas:

Definition 4ACNF (conjunctive normal form) formulasis a logical AND ofclauses, each of which is a logical OR of literals. Size and width are defined as for DNFs.

Some functions can be represented much more compactly by CNFs than DNFs (see the exercises). On the other hand, if we take a CNF computing $f$ and switch its ANDs and ORs, the result is a DNF computing the dual function $f^\dagger$ (recall Exercise 1.9). Since $f$ and $f^\dagger$ have essentially the same Fourier expansion, there isn’t much difference between CNFs and DNFs when it comes to Fourier analysis. We will therefore focus mainly on DNFs.

DNFs and CNFs are more powerful than decision trees for representing boolean-valued functions, as the following proposition shows:

Proposition 5Let $f : \{0,1\}^n \to \{0,1\}$ be computable by a decision tree $T$ of size $s$ and depth $k$. Then $f$ is computable by a DNF (and also a CNF) of size at most $s$ and width at most $k$.

*Proof:* Take each path in $T$ from the root to a leaf labelled $1$ and form the logical AND of the literals describing the path. These are the terms of the required DNF. (For the CNF clauses, take paths to label $0$ and negate all literals describing the path.) $\Box$

Example 6If we perform this conversion on the decision tree computing $\mathrm{Sort}_3$ in the figure from Chapter 3.2 we get the DNF \[ (\overline{x}_1 \wedge \overline{x}_3 \wedge \overline{x}_2)\ \vee\ (\overline{x}_1 \wedge x_3)\ \vee\ (x_1 \wedge \overline{x}_2 \wedge \overline{x}_3)\ \vee\ (x_1 \wedge x_2). \] This has size $4$ (indeed at most the decision tree size $6$) and width $3$ (indeed at most the decision tree depth $3$). It is not as simple as the equivalent DNF from Example 2, though; DNF representation is not unique.

The class of functions computable by small DNFs is intensively studied in learning theory. This is one reason why the problem of analyzing spectral concentration for DNFs is important. Let’s begin with the simplest method for this: understanding low-degree concentration via total influence. We will switch to $\pm 1$ notation.

Proposition 7Suppose that $f : \{-1,1\}^n \to \{-1,1\}$ has ${\mathrm{DNF}_{\mathrm{width}}}(f) \leq w$. Then $\mathbf{I}[f] \leq 2w$.

*Proof:* We use Exercise 2.9 which states that \[ \mathbf{I}[f] = 2\mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[\# \text{ $(-1)$-pivotal coordinates for $f$ on ${\boldsymbol{x}}$}], \] where coordinate $i$ is “$(-1)$-pivotal” on input $x$ if $f(x) = -1$ (logical $\mathsf{True}$) but $f(x^{\oplus i}) = 1$ (logical $\mathsf{False}$). It thus suffices to show that on *every* input $x$ there are at most $w$ coordinates which are $(-1)$-pivotal. To have any $(-1)$-pivotal coordinates at all on $x$ we must have $f(x) = -1$ ($\mathsf{True}$); this means that at least one term $T$ in $f$’s width-$w$ DNF representation must be made $\mathsf{True}$ by $x$. But now if $i$ is a $(-1)$-pivotal coordinate then either $x_i$ or $\overline{x}_i$ must appear in $T$; otherwise, $T$ would still be made true by $x^{\oplus i}$. Thus the number of $(-1)$-pivotal coordinates on $x$ is at most the number of literals in $T$, which is at most $w$. $\Box$

Since $\mathbf{I}[f^\dagger] = \mathbf{I}[f]$ the proposition is also true for CNFs of width at most $w$. The proposition is very close to being tight: The parity function $\chi_{[w]} : \{-1,1\}^n \to \{-1,1\}$ has $\mathbf{I}[\chi_{[w]}] = w$ and ${\mathrm{DNF}_{\mathrm{width}}}(\chi_{[w]}) \leq w$ (the latter being true for all $w$-juntas). In fact, the proposition can be improved to give the tight upper bound $w$ (see the exercises).

Using Proposition 3.2 we deduce:

Corollary 8Let $f : \{-1,1\}^n \to \{-1,1\}$ have ${\mathrm{DNF}_{\mathrm{width}}}(f) \leq w$. Then for $\epsilon > 0$, the Fourier spectrum of $f$ is $\epsilon$-concentrated on degree up to $2w/\epsilon$.

The dependence here on $w$ is of the correct order (by the example of the parity $\chi_{[w]}$ again), but the dependence on $\epsilon$ can be significantly improved as we will see in a later section.

There’s usually more interest in DNF *size* than in DNF width; for example, learning theorists are often interested in the class of $n$-variable DNFs of size $\mathrm{poly}(n)$. The following fact (similar to Exercise 3.22) helps relate the two, suggesting $O(\log n)$ as an analogous width bound:

Proposition 9Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a DNF (or CNF) of size $s$ and let $\epsilon \in (0, 1]$. Then $f$ is $\epsilon$-close to a function $g$ computable by a DNF of width $\log(s/\epsilon)$.

*Proof:* Take the DNF computing $f$ and delete all terms with more than $\log(s/\epsilon)$ literals; let $g$ be the function computed by the resulting DNF. For any deleted term $T$, the probability a random input ${\boldsymbol{x}} \sim \{-1,1\}^n$ makes $T$ true is at most $2^{-\log(s/\epsilon)} = \epsilon/s$. Taking a union bound over the (at most $s$) such terms shows that $\mathop{\bf Pr}[g({\boldsymbol{x}}) \neq f({\boldsymbol{x}})] \leq \epsilon$. (A similar proof works for CNFs.) $\Box$

By combining Proposition 9 and Corollary 8 we can deduce (using Exercise 3.16) that DNFs of size $s$ have Fourier spectra $\epsilon$-concentrated up to degree $O(\log(s/\epsilon)/\epsilon)$. Again, the dependence on $\epsilon$ will be improved in a later section. We will also show in a later section that size-$s$ DNFs have total influence at most $O(\log s)$, something we cannot deduce immediately from Proposition 7.

In light of the Kushilevitz–Mansour learning algorithm it would also be nice to show that $\mathrm{poly}(n)$-size DNFs have their Fourier spectra concentrated on small collections (not necessarily low-degree). In a later section we will show they are $\epsilon$-concentrated on collections of size $n^{O(\log \log n)}$ for any constant $\epsilon>0$. It has been conjectured that this can be improved to $\mathrm{poly}(n)$:

Mansour’s ConjectureLet $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a DNF of size $s > 1$ and let $\epsilon \in (0,1/2]$. Strong conjecture: $f$’s Fourier spectrum is $\epsilon$-concentrated on a collection $\mathcal{F}$ with $|\mathcal{F}| \leq s^{O(\log(1/\epsilon))}$. Weaker conjecture: if $s \leq \mathrm{poly}(n)$ and $\epsilon > 0$ is any fixed constant then we have the bound $|\mathcal{F}| \leq \mathrm{poly}(n)$.

It seems to me that the last term of the DNF in Example 6 should be $(x_1 \wedge x_2)$ instead of $(x_2 \wedge x_3)$ (otherwise the sorted string 110 does not satisfy the current DNF). Also, there is some LaTeX code (“renewcommand…”) appearing before the statement of Mansour’s conjecture in the final paragraph.

Fixed both, thanks!

In Definition 3, “(respectively, depth)” should be “(respectively, width)”

Fixed, thanks.

a small typo in Cor. 8. It should be ${\mathrm{DNF}_{\mathrm{width}}}(f) \leq w$ with $f$ in parenthesis instead of $w$.

Fixed, thanks Igor.

a tiny typo: “We will also later show in a later section…”

The first “later” is not needed

Fixed, thanks!

a tiny typo: “In fact, the proposition be improved” -> “In fact, the proposition can be improved”

Thanks!