§4.1: DNF formulas

Perhaps the commonest way of representing a boolean function $f : \{0,1\}^n \to \{0,1\}$ is by a DNF formula:

Definition 1 A DNF (disjunctive normal form) formula over boolean variables $x_1, \dots, x_n$ is defined to be a logical OR of terms, each of which is a logical AND of literals. A literal is either a variable $x_i$ or its logical negation $\overline{x}_i$. We insist that no term contains both a variable and its negation. The number of literals in a term is called its width. We often identify a DNF formula with the boolean function $f : \{0,1\}^n \to \{0,1\}$ it computes.

Example 2 Recall the function $\mathrm{Sort}_3$, defined by $\mathrm{Sort}_3(x_1, x_2, x_3) = 1$ if and only if $x_1 \leq x_2 \leq x_3$ or $x_1 \geq x_2 \geq x_3$. We can represent it by a DNF formula as follows: \[ \mathrm{Sort}_3(x_1, x_2, x_3)\ =\ (x_1 \wedge x_2)\ \vee\ (\overline{x}_2 \wedge \overline{x}_3)\ \vee\ (\overline{x}_1 \wedge x_3). \] The DNF representation says that the bits are sorted if either the first two bits are $1$, or the last two bits are $0$, or the first bit is $0$ and the last bit is $1$.

The complexity of a DNF formula is measured by its size and width:

Definition 3 The size of a DNF formula is its number of terms. The width is the maximum width of its terms. Given $f : \{-1,1\}^n \to \{-1,1\}$ we write ${\mathrm{DNF}_{\mathrm{size}}}(f)$ (respectively, ${\mathrm{DNF}_{\mathrm{width}}}(f)$) for the least size (respectively, width) of a DNF formula computing $f$.

The DNF formula for $\mathrm{Sort}_3$ from Example 2 has size $3$ and width $2$. Every function $f : \{0,1\}^n \to \{0,1\}$ can be computed by a DNF of size at most $2^n$ and width at most $n$ (see the exercises).

There is also a “dual” notion to DNF formulas:

Definition 4 A CNF (conjunctive normal form) formulas is a logical AND of clauses, each of which is a logical OR of literals. Size and width are defined as for DNFs.

Some functions can be represented much more compactly by CNFs than DNFs (see the exercises). On the other hand, if we take a CNF computing $f$ and switch its ANDs and ORs, the result is a DNF computing the dual function $f^\dagger$ (recall Exercise 1.9). Since $f$ and $f^\dagger$ have essentially the same Fourier expansion, there isn’t much difference between CNFs and DNFs when it comes to Fourier analysis. We will therefore focus mainly on DNFs.

DNFs and CNFs are more powerful than decision trees for representing boolean-valued functions, as the following proposition shows:

Proposition 5 Let $f : \{0,1\}^n \to \{0,1\}$ be computable by a decision tree $T$ of size $s$ and depth $k$. Then $f$ is computable by a DNF (and also a CNF) of size at most $s$ and width at most $k$.

Proof: Take each path in $T$ from the root to a leaf labelled $1$ and form the logical AND of the literals describing the path. These are the terms of the required DNF. (For the CNF clauses, take paths to label $0$ and negate all literals describing the path.) $\Box$

Example 6 If we perform this conversion on the decision tree computing $\mathrm{Sort}_3$ in the figure from Chapter 3.2 we get the DNF \[ (\overline{x}_1 \wedge \overline{x}_3 \wedge \overline{x}_2)\ \vee\ (\overline{x}_1 \wedge x_3)\ \vee\ (x_1 \wedge \overline{x}_2 \wedge \overline{x}_3)\ \vee\ (x_1 \wedge x_2). \] This has size $4$ (indeed at most the decision tree size $6$) and width $3$ (indeed at most the decision tree depth $3$). It is not as simple as the equivalent DNF from Example 2, though; DNF representation is not unique.

The class of functions computable by small DNFs is intensively studied in learning theory. This is one reason why the problem of analyzing spectral concentration for DNFs is important. Let’s begin with the simplest method for this: understanding low-degree concentration via total influence. We will switch to $\pm 1$ notation.

Proposition 7 Suppose that $f : \{-1,1\}^n \to \{-1,1\}$ has ${\mathrm{DNF}_{\mathrm{width}}}(f) \leq w$. Then $\mathbf{I}[f] \leq 2w$.

Proof: We use Exercise 2.9 which states that \[ \mathbf{I}[f] = 2\mathop{\bf E}_{{\boldsymbol{x}} \sim \{-1,1\}^n}[\# \text{ $(-1)$-pivotal coordinates for $f$ on ${\boldsymbol{x}}$}], \] where coordinate $i$ is “$(-1)$-pivotal” on input $x$ if $f(x) = -1$ (logical $\mathsf{True}$) but $f(x^{\oplus i}) = 1$ (logical $\mathsf{False}$). It thus suffices to show that on every input $x$ there are at most $w$ coordinates which are $(-1)$-pivotal. To have any $(-1)$-pivotal coordinates at all on $x$ we must have $f(x) = -1$ ($\mathsf{True}$); this means that at least one term $T$ in $f$’s width-$w$ DNF representation must be made $\mathsf{True}$ by $x$. But now if $i$ is a $(-1)$-pivotal coordinate then either $x_i$ or $\overline{x}_i$ must appear in $T$; otherwise, $T$ would still be made true by $x^{\oplus i}$. Thus the number of $(-1)$-pivotal coordinates on $x$ is at most the number of literals in $T$, which is at most $w$. $\Box$

Since $\mathbf{I}[f^\dagger] = \mathbf{I}[f]$ the proposition is also true for CNFs of width at most $w$. The proposition is very close to being tight: The parity function $\chi_{[w]} : \{-1,1\}^n \to \{-1,1\}$ has $\mathbf{I}[\chi_{[w]}] = w$ and ${\mathrm{DNF}_{\mathrm{width}}}(\chi_{[w]}) \leq w$ (the latter being true for all $w$-juntas). In fact, the proposition can be improved to give the tight upper bound $w$ (see the exercises).

Using Proposition 3.2 we deduce:

Corollary 8 Let $f : \{-1,1\}^n \to \{-1,1\}$ have ${\mathrm{DNF}_{\mathrm{width}}}(f) \leq w$. Then for $\epsilon > 0$, the Fourier spectrum of $f$ is $\epsilon$-concentrated on degree up to $2w/\epsilon$.

The dependence here on $w$ is of the correct order (by the example of the parity $\chi_{[w]}$ again), but the dependence on $\epsilon$ can be significantly improved as we will see in a later section.

There’s usually more interest in DNF size than in DNF width; for example, learning theorists are often interested in the class of $n$-variable DNFs of size $\mathrm{poly}(n)$. The following fact (similar to Exercise 3.22) helps relate the two, suggesting $O(\log n)$ as an analogous width bound:

Proposition 9 Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a DNF (or CNF) of size $s$ and let $\epsilon \in (0, 1]$. Then $f$ is $\epsilon$-close to a function $g$ computable by a DNF of width $\log(s/\epsilon)$.

Proof: Take the DNF computing $f$ and delete all terms with more than $\log(s/\epsilon)$ literals; let $g$ be the function computed by the resulting DNF. For any deleted term $T$, the probability a random input ${\boldsymbol{x}} \sim \{-1,1\}^n$ makes $T$ true is at most $2^{-\log(s/\epsilon)} = \epsilon/s$. Taking a union bound over the (at most $s$) such terms shows that $\mathop{\bf Pr}[g({\boldsymbol{x}}) \neq f({\boldsymbol{x}})] \leq \epsilon$. (A similar proof works for CNFs.) $\Box$

By combining Proposition 9 and Corollary 8 we can deduce (using Exercise 3.16) that DNFs of size $s$ have Fourier spectra $\epsilon$-concentrated up to degree $O(\log(s/\epsilon)/\epsilon)$. Again, the dependence on $\epsilon$ will be improved in a later section. We will also show in a later section that size-$s$ DNFs have total influence at most $O(\log s)$, something we cannot deduce immediately from Proposition 7.

In light of the Kushilevitz–Mansour learning algorithm it would also be nice to show that $\mathrm{poly}(n)$-size DNFs have their Fourier spectra concentrated on small collections (not necessarily low-degree). In a later section we will show they are $\epsilon$-concentrated on collections of size $n^{O(\log \log n)}$ for any constant $\epsilon>0$. It has been conjectured that this can be improved to $\mathrm{poly}(n)$:

Mansour’s Conjecture Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a DNF of size $s > 1$ and let $\epsilon \in (0,1/2]$. Strong conjecture: $f$’s Fourier spectrum is $\epsilon$-concentrated on a collection $\mathcal{F}$ with $|\mathcal{F}| \leq s^{O(\log(1/\epsilon))}$. Weaker conjecture: if $s \leq \mathrm{poly}(n)$ and $\epsilon > 0$ is any fixed constant then we have the bound $|\mathcal{F}| \leq \mathrm{poly}(n)$.

10 comments to §4.1: DNF formulas

Leave a Reply




You can use most standard LaTeX, as well as these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>