This section is about the computational complexity of constraint satisfaction problems (CSPs), a fertile area of application for analysis of boolean functions. To study it we need to introduce a fair bit of background material; in fact, this section will mainly consist of definitions.

In brief, a CSP is an algorithmic task in which a large number of “variables” must be assigned “labels” so as to satisfy given “local constraints”. We start by informally describing some examples:

- In the “Max-$3$-Sat” problem, given is a CNF formula of width at most $3$ over boolean variables $x_1, \dots, x_n$. The task is to find a setting of the inputs which satisfies (i.e., makes $\mathsf{True}$) as many clauses as possible.
- In the “Max-Cut” problem, given is an undirected graph $G = (V,E)$. The task is to fund a “cut” — i.e., a partition of $V$ into two parts — so that as many edges as possible “cross the cut”.
- In the “Max-E$3$-Lin” problem, given is a system of linear equations over ${\mathbb F}_2$, each equation involving exactly $3$ variables. The system may in general be overdetermined; the task is to find a solution which satisfies as many equations as possible.
- In the “Max-$3$-Colouring” problem, given is an undirected graph $G = (V,E)$. The task is to colour each vertex either red, green, or blue so as to make as many edges as possible bichromatic.

Let’s rephrase the last two of these examples so that the descriptions have more in common. In Max-E$3$-Lin we have a set of variables $V$, to be assigned labels from the domain $\Omega = {\mathbb F}_2$. Each constraint is of the form $v_1 + v_2 + v_3 = 0$ or $v_1 + v_2 + v_3 = 1$, where $v_1, v_2, v_3 \in V$. In Max-$3$-Colouring we have a set of variables (vertices) $V$ to be assigned labels from the domain $\Omega = \{\text{red}, \text{green}, \text{blue}\}$. Each constraint (edge) is a pair of variables, constrained to be labelled by unequal colours.

We now make formal definitions which encompass all of the above examples:

Definition 23A constraint satisfaction problem (CSP) overdomain$\Omega$ is defined by a finite set ofpredicates(“types of constraints”) $\Psi$, with each $\psi \in \Psi$ being of the form $\psi : \Omega^r \to \{0,1\}$ for somearity$r$ (possibly different for different predicates). We say that thearityof the CSP is the maximum arity of its predicates.

Such a CSP is associated with an algorithmic task called “Max-$\mathrm{CSP}(\Psi)$”, which we will define below. First, though, let us see how the CSPs from Examples 22 fit into the above definition.

- Max-$3$-Sat: Domain $\Omega = \{\mathsf{True}, \mathsf{False}\}$; $\Psi$ contains $14$ predicates: the $8$ logical OR functions on $3$ literals (variables/negated-variables), the $4$ logical OR functions on $2$ literals, and the $2$ logical OR functions on $1$ literal.
- Max-Cut: Domain $\Omega = \{-1,1\}$; $\Psi = \{\neq\}$, the “not-equal” predicate $\neq : \{-1,1\}^2 \to \{0,1\}$.
- Max-E$3$-Lin: Domain $\Omega = {\mathbb F}_2$; $\Psi$ contains two $3$-ary predicates, $(x_1,x_2,x_3) \mapsto x_1+x_2+x_3$ and $(x_1,x_2,x_3) \mapsto x_1+x_2+x_3 + 1$.
- Max-$3$-Colouring: Domain $\Omega = \{\text{red}, \text{green}, \text{blue}\}$; $\Psi$ contains just the single not-equal predicate $\neq : \Omega^2 \to \{0,1\}$.

Remark 24Let us add a few words about traditional CSP terminology.BooleanCSPs refer to the case $|\Omega| = 2$. If $\psi : \{-1,1\}^r \to \{0,1\}$ is a boolean predicate we sometimes write “Max-$\psi$” to refer to the CSP where all constraints are of the form $\psi$ applied toliterals; i.e., $\Psi = \{\psi(\pm v_1, \dots, \pm v_r)\}$. As an example, Max-E$3$-Lin could also be called Max-$\chi_{[3]}$. The “E$3$” in the name Max-E$3$-Lin refers to the fact that all constraints involve “E”xactly $3$ variables. Thus e.g. Max-$3$-Lin is the generalization in which $1$- and $2$-variable equations are allowed. Conversely, Max-E$3$-Sat is the special case of Max-$3$-Sat where each clause must be of width exactly $3$ (a CSP which could also be called Max-$\mathrm{OR}_3$).

To formally define the algorithmic task Max-$\mathrm{CSP}(\Psi)$, we begin by defining its input:

Definition 25Aninstance(orinput) ${\mathcal P}$ of Max-$\mathrm{CSP}(\Psi)$ over variable set $V$ is a list (multiset) ofconstraints. Each constraint $C \in {\mathcal P}$ is a pair $C = (S,\psi)$, where $\psi \in \Psi$ and where thescope$S = (v^1, \dots, v^r)$ is a tuple ofdistinctvariables from $V$, with $r$ being the arity of $\psi$. We always assume that each $v \in V$ participates in at least one constraint scope. Thesizeof an instance is the number of bits required to represent it; writing $n = |V|$ and treating $|\Omega|$, $|\Psi|$ and the arity of $\Psi$ as constants, the size is between $n$ and $O(|{\mathcal P}| \log n)$.

Remark 26Let’s look at how the small details of Definition 25 affect input graphs for Max-Cut. Since an instance is a multiset of constraints, this means we allow graphs with parallel edges. Since each scope must consist of distinct variables, this means we disallow graphs with self-loops. Finally, since each variable must participate in at least one constraint, this means input graphs must have no isolated vertices (though they may be disconnected).

Given an assignment of labels for the variables, we are interested in the number of constraints that are “satisfied”. The reason we explicitly allow duplicate constraints in an instance is that we may want some constraints to be more important than others. In fact it’s more convenient to normalize by looking at the *fraction* of satisfied constraints, rather than the number. Equivalently, we can choose a constraint $\boldsymbol{C} \sim {\mathcal P}$ uniformly at random and look at the *probability* that it is satisfied. It will actually be quite useful to think of a CSP instance ${\mathcal P}$ as a probability distribution on constraints. (Indeed, we could have more generally defined *weighted CSPs* in which the constraints are given arbitrary nonnegative weights summing to $1$; however we don’t want to worry about the issue of representing, say, irrational weights with finitely many bits.)

Definition 27Anassignment(orlabelling) for instance ${\mathcal P}$ of Max-$\mathrm{CSP}(\Psi)$ is just a mapping $F : V \to \Omega$. For constraint $C = (S,\psi) \in {\mathcal P}$ we say that $F$satisfies$C$ if $\psi(F(S)) = 1$. Here we use shorthand notation: if $S = (v^1, \dots, v^r)$ then $F(S)$ denotes $(F(v^1), \dots, F(v^r))$. Thevalueof $F$, denoted $\mathrm{Val}_{\mathcal P}(F)$, is the fraction of constraints in ${\mathcal P}$ which $F$ satisfies: \begin{equation} \label{eqn:csp-value} \mathrm{Val}_{{\mathcal P}}(F) = \mathop{\bf E}_{(\boldsymbol{S}, \boldsymbol{\psi}) \sim {\mathcal P}}[\boldsymbol{\psi}(F(\boldsymbol{S}))] \in [0,1]. \end{equation} Theoptimumvalue of ${\mathcal P}$ is \[ \mathrm{Opt}({\mathcal P}) = \max_{F : V \to \Omega} \{\mathrm{Val}_{{\mathcal P}}(F)\}. \] If $\mathrm{Opt}({\mathcal P}) = 1$ we say that ${\mathcal P}$ issatisfiable.

Remark 28In the literature on CSPs there is sometimes an unfortunate blurring between a variable and its assignment. For example, a Max-E$3$-Lin instance may be written as \begin{align*} x_1 + x_2 + x_3 &= 0 \\ x_1 + x_5 + x_6 &= 0 \\ x_3 + x_4 + x_6 &= 1; \end{align*} then a particular assignment $x_1 = 0, x_2 = 1, x_3 = 0, x_4 = 1, x_5 = 1, x_6 = 1$ may be given. Now there is confusion: does $x_2$ represent the name of a variable or does it represent $1$? Because of this we prefer to display CSP instances with the name of the assignment $F$ present in the constraints. I.e., the above instance would be described as finding $F : \{x_1, \dots, x_6\} \to {\mathbb F}_2$ so as to satisfy as many as possible of the following: \begin{align*} F(x_1) + F(x_2) + F(x_3) &= 0 \\ F(x_1) + F(x_5) + F(x_6) &= 0 \\ F(x_3) + F(x_4) + F(x_6) &= 1, \end{align*}

Finally, we define the algorithmic task associated with a CSP:

Definition 29The algorithmic task Max-$\mathrm{CSP}(\Psi)$ is defined as follows: The input is an instance ${\mathcal P}$. The goal is to output an assignment $F$ with as large a value as possible.

Having defined CSPs, let us make a connection to the notion of a string testing algorithm from the previous section. The connection is this: *CSPs and string testing algorithms are the same object*. Indeed, consider a CSP instance ${\mathcal P}$ over domain $\Omega$ with $n$ variables $V$. Fix an assignment $F : V \to \Omega$; we can also think of $F$ as a string in $\Omega^n$ (under some ordering of $V$). Now think of a testing algorithm which chooses a constraint $(\boldsymbol{S}, \boldsymbol{\psi}) \sim {\mathcal P}$ at random, “queries” the string entry $F(v)$ for each $v \in \boldsymbol{S}$, and accepts if and only if the predicate $\boldsymbol{\psi}(F(\boldsymbol{S}))$ is satisfied. This is indeed an $r$-query string testing algorithm, where $r$ is the arity of the CSP; the probability the tester accepts is precisely $\mathrm{Val}_{{\mathcal P}}(F)$.

Conversely, let $T$ be some randomized testing algorithm for strings in $\Omega^n$. Assume for simplicity that $T$’s randomness comes from the uniform distribution over some sample space $U$. Now suppose we enumerate all outcomes in $U$ and for each we write the tuple of indices $S$ which $T$ queries and the predicate $\psi : \Omega^{|S|} \to \{0,1\}$ which $T$ uses to make its subsequent accept/reject decision. Then this list of scope/predicates pairs is precisely an instance of an $n$-variable CSP over $\Omega$. The arity of the CSP is equal to the (maximum) number of queries that $T$ makes and the predicates for the CSP are precisely those used by the tester in making its accept/reject decisions. Again, the probability that $T$ accepts a string $F \in \Omega^n$ is equal to the value of $F$ as an assignment for the CSP. (Our actual definition of string testers allowed any form of randomness, including, say, irrational probabilities; thus technically not every string tester can be viewed as a CSP. However, it does little harm to ignore this technicality.)

In particular, this equivalence between string testers and CSPs lets us properly define “outputting the description of a PCPP system” as in Definition 18 of PCPP reductions.

Examples 30The PCPP system for ${\mathcal O} = \{w \in {\mathbb F}_2 : w_1 + \cdots + w_n = 1\}$ given in Example 15 can be thought of as an instance of the Max-$3$-Lin CSP over the $2n-1$ variables $\{w_1, \dots, w_n, \Pi_1, \dots, \Pi_{n-1}\}$. The BLR linearity test for functions ${\mathbb F}_2^n \to {\mathbb F}_2$ can also be thought of as instance of Max-$3$-Lin over $2^n$ variables (recall that function testers are string testers). In this case we identify the variable set with ${\mathbb F}_2^n$; if $n = 2$ then the variables are named $(0,0)$, $(0,1)$, $(1,0)$, and $(1,1)$; and, if we write $F : {\mathbb F}_2^2 \to {\mathbb F}_2$ for the assignment, the instance is\begin{align*} {\tiny F(0,0) + F(0,0) + F(0,0) }&{\tiny= 0 }&{\tiny F(0,1) + F(0,0) + F(0,1) }&{\tiny= 0 }&{\tiny F(1,0) + F(0,0) + F(1,0) }&{\tiny= 0 }&{\tiny F(1,1) + F(0,0) + F(1,1) }&{\tiny= 0 }\\ {\tiny F(0,0) + F(0,1) + F(0,1) }&{\tiny= 0 }&{\tiny F(0,1) + F(0,1) + F(0,0) }&{\tiny= 0 }&{\tiny F(1,0) + F(0,1) + F(1,1) }&{\tiny= 0 }&{\tiny F(1,1) + F(0,1) + F(1,0) }&{\tiny= 0 }\\ {\tiny F(0,0) + F(1,0) + F(1,0) }&{\tiny= 0 }&{\tiny F(0,1) + F(1,0) + F(1,1) }&{\tiny= 0 }&{\tiny F(1,0) + F(1,0) + F(0,0) }&{\tiny= 0 }&{\tiny F(1,1) + F(1,0) + F(0,1) }&{\tiny= 0 }\\ {\tiny F(0,0) + F(1,1) + F(1,1) }&{\tiny= 0 }&{\tiny F(0,1) + F(1,1) + F(1,0) }&{\tiny= 0 }&{\tiny F(1,0) + F(1,1) + F(0,1) }&{\tiny= 0 }&{\tiny F(1,1) + F(1,1) + F(0,0) }&{\tiny= 0.} \end{align*}

Cf. Remark 28; also, note the duplicate constraints.

We end this section by discussing the computational complexity of finding high-value assignments for a given CSP — equivalently, finding strings which make a given string tester accept with high probability. Consider, for example, the task of Max-Cut on $n$-vertex graphs. Of course, given a Max-Cut instance one can always find the optimal solution in time roughly $2^n$, just by trying all possible cuts. Unfortunately, this is not very efficient, even for slightly large values of $n$. In computational complexity theory, an algorithm is generally deemed “efficient” if it runs in time $\mathrm{poly}(n)$. For some subfamilies of graphs there are $\mathrm{poly}(n)$-time algorithms for finding the maximum cut; e.g., bipartite graphs (exercise) or planar graphs. However it seems very unlikely that there is a $\mathrm{poly}(n)$-time algorithm which is guaranteed to find an optimal Max-Cut assignment given any input graph. This statement is formalized by a basic theorem from the field of computational complexity:

Theorem 31The task of finding the maximum cut in a given input graph is “$\mathsf{NP}$-hard”.

We will not formally define $\mathsf{NP}$-hardness in this book (though see the exercises for more explanation). Roughly speaking it means “at least as hard as the Circuit-Sat problem”, where “Circuit-Sat” is the following task: Given an $n$-variable boolean circuit $C$, decide whether or not $C$ is satisfiable (i.e., there exists $w \in \{0,1\}^n$ such that $C(w) = 1$). It is widely believed that Circuit-Sat does not have a polynomial-time algorithm (this is the “$\mathsf{P} \neq \mathsf{NP}$” conjecture). In fact it is also believed that Circuit-Sat does not have a $2^{o(n)}$-time algorithm.

For essentially all CSPs, including Max-E$3$-Sat, Max-E$3$-Lin, and Max-$3$-Colouring, finding an optimal solution is $\mathsf{NP}$-hard. This motivates considering a relaxed goal:

Definition 32Let $0 \leq \alpha \leq \beta \leq 1$. We say that algorithm $A$ is an$(\alpha,\beta)$-approximation algorithmfor Max-$\mathrm{CSP}(\Psi)$ (pronounced “$\alpha$ out of $\beta$ approximation”) if it has the following guarantee: on any instance with optimum value at least $\beta$, algorithm $A$ outputs an assignment of value at least $\alpha$. In case $A$ is a randomized algorithm, we only require that its output has value at least $\alpha$ in expectation.

A mnemonic here is that when the $\beta$est assignment has value $\beta$, the $\alpha$lgorithm gets value $\alpha$.

Examples 33Consider the following algorithm for Max-E$3$-Lin: Given an instance, output either the assignment $F \equiv 0$ or the assignment $F \equiv 1$, whichever has higher value. Since either $0$ or $1$ occurs on at least half of the instance’s “right-hand sides”, the output assignment will always have value at least $\tfrac{1}{2}$. Thus this is an efficient $(\tfrac{1}{2}, \beta)$-approximation algorithm for any $\beta$. In the case $\beta = 1$ one can do better: performing Gaussian elimination is an efficient $(1,1)$-approximation algorithm for Max-E$3$-Lin (or indeed Max-$r$-Lin for any $r$). As a far more sophisticated example, Goemans and Williamson[GW95]showed that there is an efficient (randomized) algorithm which $(.878 \beta, \beta)$-approximates Max-Cut for every $\beta$.

Not only is finding the optimal solution of a Max-E$3$-Sat instance $\mathsf{NP}$-hard, it’s even $\mathsf{NP}$-hard on *satisfiable* instances. In other words:

Theorem 34$(1,1)$-approximating Max-E$3$Sat is $\mathsf{NP}$-hard. The same is true of Max-$3$-Colouring.

On the other hand, it’s easy to $(1,1)$-approximate Max-$3$-Lin (Examples 33) or Max-Cut (exercise). Nevertheless, the “textbook” $\mathsf{NP}$-hardness results for these problems imply the following:

Theorem 35$(\beta,\beta)$-approximating Max-E$3$-Lin is $\mathsf{NP}$-hard for any fixed $\beta \in (\frac12, 1)$. The same is true of Max-Cut.

In some ways, saying that $(1,1)$-distinguishing Max-E$3$-Sat is $\mathsf{NP}$-hard is not necessarily that disheartening. For example, if $(1-\delta,1)$-approximating Max-E$3$-Sat were possible in polynomial time for every $\delta > 0$, you might consider that “good enough”. Unfortunately, such a state of affairs is very likely ruled out:

Theorem 36There exists a positive universal constant $\delta_0 > 0$ such that $(1-\delta_0, 1)$-approximating Max-E$3$-Sat is $\mathsf{NP}$-hard.

In fact, Theorem 36 is *equivalent* to the “PCP Theorem” mentioned in Section 2. It follows straightforwardly from the PCPP Theorem, as we now sketch:

*Proof:* Let $\delta_0$ be the rejection rate in the PCPP Theorem. We want to show that $(1-\delta_0, 1)$-approximating Max-E$3$-Sat is at least as hard as the Circuit-Sat problem. Equivalently, we want to show that if there is an efficient algorithm $A$ for $(1-\delta_0, 1)$-approximating Max-E$3$-Sat then there is an efficient algorithm $B$ for Circuit-Sat. So suppose $A$ exists and let $C$ be a boolean circuit given as input to $B$. Algorithm $B$ first applies to $C$ the PCPP reduction given by the PCPP Theorem. The output is some arity-$3$ CSP instance ${\mathcal P}$ over variables $w_1, \dots, w_n, \Pi_1, \dots, \Pi_\ell$, where $\ell \leq \mathrm{poly}(\mathrm{size}(C))$. By Exercise 7.12 we may assume that ${\mathcal P}$ is an instance of Max-E$3$-Sat. From the definition of a PCPP system, it is easy to check the following: If $C$ is satisfiable then $\mathrm{Opt}({\mathcal P}) = 1$; and, if $C$ is not satisfiable then $\mathrm{Opt}({\mathcal P}) < 1 – \delta_0$. Algorithm $B$ now runs the supposed $(1-\delta_0,1)$-approximation algorithm $A$ on ${\mathcal P}$ and outputs “$C$ is satisfiable” if and only if $A$ finds an assignment of value at least $1-\delta_0$. $\Box$

Enjoying the book…

-You refer the reader to the chapter exercises for more details on the concept of NP-hardness. Now that the exercises are here, which ones are supposed to be relevant?

-Also, I’m not sure if the above is your first discussion of P vs NP in this book, but perhaps it would be wise to emphasize the widespread belief that Circuit Sat doesn’t have a poly(n) time algorithm, then to mention that many even conjecture that 2^{o(n)} is impossible.

-Sec. 7.2, missing space? “The PCPP TheoremThere exists a 3-query PCPP reduction with proof length poly(size(C)) (and positive rejection rate).”

-You use “positive” as above where some authors would use Omega(1). Is the meaning discussed? Maybe it’s clear, just something to think about.

-You tend to use punctuation outside quotation marks. Just in case you’re unaware, there is fun and lively discussion around this tiny issue, e.g.:

http://blog.apastyle.org/apastyle/2011/08/punctuating-around-quotation-marks.html

-Sec. 7.2, Def. 18: “Finally, the PCPP reduction should run in time polynomial in its input and output size.” of course you mean poly(input + output), not poly(input) $\cap$ poly(output). This distinction is relevant to the long code PCPPs. Also, ‘input size’ could be read as n rather than size(C) To avoid confusion you might write the running time requirement concisely as $\poly(size(C) + \ell)$.

Hi Andy, thanks for the comments!

1. I was thinking of Exercise 13.

2. You’re right (though I’m a big fan of the ETH Changed.

3. Fixed.

4. Yeah… I agree ‘positive’ is arguably not the optimal word here, but I think I’ll just leave it for now. I don’t want to get too caught up in this rejection rate business; I think the reader will understand what’s meant.

5. I also use Canadian spelling.

6. Right… good suggestion.

In the proof of Thm 36, when you say “By an exercise”, maybe you want to point out which exercise?

Okay, updated!