# Simons Symposium 2014 — Day 3

The first speaker of the day was Subhash Khot, who discussed his recent work with Madhur Tulsiani and Pratik Worah giving a complete characterization of the approximation resistance of constraint satisfaction problems (CSPs) under Subhash’s Unique Games Conjecture (UGC).

Suppose we are given a 3SAT formula $\varphi$, and are promised the existence of an assignment $x^*$ such that $x^*$ satisfies > 99% of the clauses in $\varphi$. Our goal is to efficiently find an assignment $x$ that satisfies as many clauses as possible. Certainly a uniformly random assignment satisfies a 7/8-fraction of the clauses — can we do better?  Johan Håstad’s seminal paper proved that this entirely trivial algorithm (one that doesn’t even look at the input formula $\varphi$!) is in fact the best we can do assuming $\mathsf{P} \ne \mathsf{NP}$. For such CSPs we call the associated predicate $P:\{-1,1\}^k\to\{-1,1\}$ (e.g. $P(x_1,x_2,x_3) = x_1 \vee x_2 \vee x_3$ in the case of Max-3SAT) approximation resistant; since Håstad’s result there has been significant interest in understanding which predicates are approximation resistant, and which ones aren’t.
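The 7/8 figure can be checked by direct enumeration. The sketch below is only an illustration (the helper name `satisfied_fraction` is made up here): it fixes one clause on three distinct variables, tries every pattern of negated literals, and counts how many of the 8 assignments satisfy it. By linearity of expectation the same fraction then holds on average over all clauses of $\varphi$.

```python
from itertools import product

def satisfied_fraction(negated):
    """Fraction of the 8 assignments satisfying one clause on three distinct variables.

    negated[i] is True when the i-th literal is a negated variable.
    """
    assignments = list(product([False, True], repeat=3))
    satisfied = sum(
        # A literal is true exactly when the variable's value differs from its negation flag.
        any(value != neg for value, neg in zip(assignment, negated))
        for assignment in assignments
    )
    return satisfied / len(assignments)

# One entry per sign pattern of the three literals.
fractions = {
    negated: satisfied_fraction(negated)
    for negated in product([False, True], repeat=3)
}
print(fractions)
```

Every sign pattern gives the same value, since exactly one of the 8 assignments falsifies all three literals.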

The work of Austrin and Mossel gave a nice and clean sufficient condition for a predicate to be approximation resistant (also under the UGC): $P:\{-1,1\}^k \to\{-1,1\}$ is approximation resistant if there exists a balanced pairwise independent distribution on $\{-1,1\}^k$ that is supported on $P^{-1}(1)$. In this work Subhash and his coauthors give a complete characterization of approximation resistant predicates. Given a distribution $\mu$ on $\{-1,1\}^k$, define $\xi(\mu) \in \mathbb{R}^{k+{k\choose 2}}$ to be the vector of first and second moments:

$\xi(\mu)_i = \mathop{\mathbb{E}}_{\boldsymbol{x}\sim \mu}[\boldsymbol{x}_i],\quad \xi(\mu)_{i,j} =\mathop{\mathbb{E}}_{\boldsymbol{x}\sim \mu}[\boldsymbol{x}_i\boldsymbol{x}_{j}],$

and let $\mathcal{C}(P)$ be the convex polytope

$\mathcal{C}(P) := \big\{\xi(\mu)\colon \mu \text{ is supported on } P^{-1}(1)\big\}.$

In this notation, Austrin and Mossel’s result can be equivalently stated as saying that $P$ is approximation resistant if $\mathbf{0}\in \mathcal{C}(P)$. The necessary and sufficient condition that Subhash and his coauthors give is stated in terms of the existence of a probability measure $\Lambda$ on $\mathcal{C}(P)$ with certain symmetry properties.
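As a small sanity check of the condition $\mathbf{0}\in \mathcal{C}(P)$, here is a sketch for one concrete predicate chosen for illustration: the 3XOR predicate that accepts $x$ exactly when $x_1x_2x_3 = 1$. The helper `moment_vector` is a hypothetical name, and it only handles the uniform distribution on a given support, which is enough for this example.

```python
from fractions import Fraction
from itertools import combinations, product

def moment_vector(support):
    """First and second moments of the uniform distribution on `support`."""
    k = len(support[0])
    size = len(support)
    first = [Fraction(sum(x[i] for x in support), size) for i in range(k)]
    second = [
        Fraction(sum(x[i] * x[j] for x in support), size)
        for i, j in combinations(range(k), 2)
    ]
    return first + second

# Accepting assignments of the 3XOR predicate x1 * x2 * x3 = 1.
accepting = [x for x in product([-1, 1], repeat=3) if x[0] * x[1] * x[2] == 1]
xi = moment_vector(accepting)
print(xi)
```

All $3 + \binom{3}{2} = 6$ coordinates vanish, so the uniform distribution on the accepting assignments is balanced and pairwise independent, and the Austrin-Mossel condition applies to this predicate.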

Subhash concluded with a couple of open problems, the first of which is to determine if their characterization is decidable. The second is to find a linear threshold predicate (i.e. a predicate $P$ that can be expressed as $P(x) = \mathrm{sign}(w\cdot x - \theta)$ for some $w\in \mathbb{R}^k$ and $\theta\in \mathbb{R}$) that is approximation resistant, or to prove that all such predicates are approximable (see this paper for more on this problem). Finally, Subhash mentioned that the picture for the stronger notion of approximation resistance on satisfiable instances appears to be completely different. These are CSPs for which we cannot beat a random assignment even when promised the existence of an assignment satisfying all constraints; Max-3SAT is one such CSP whereas Max-3LIN, while approximation resistant, is not approximation resistant on satisfiable instances (since we can simply run Gaussian elimination).

For more details check out Subhash’s TCS+ talk.

The next speaker was Boaz Barak, who spoke about questions and results surrounding the sum-of-squares (SOS) hierarchy, UGC, hypercontractivity, and sparse vectors. For many combinatorial optimization problems where a simple LP/SDP yields a non-trivial performance guarantee (e.g. the Goemans-Williamson approximation algorithm for MaxCut), we would naturally like to know if these algorithms can be improved by more sophisticated relaxations, or if they represent the best we can hope to do. In many of these cases the gap between the current best algorithm’s performance and the known (NP-)hardness result is closed by the UGC (and indeed this is the case for MaxCut). Boaz pointed out that the SOS hierarchy represents our best candidate for further improving current algorithms, and as a corollary also our best candidate for refuting the UGC; evidently understanding its power and limitations is of significant importance.

Let $P : \mathbb{R}^n\to\mathbb{R}$, and suppose we would like to certify that $P\le \alpha$. An SOS proof of this claim is an SOS polynomial $S$ such that $\alpha - P = S$ (where an SOS polynomial is, well, a sum of squared polynomials), and an SOS proof of degree $d$ is a pair of degree-$d$ SOS polynomials $S$ and $S'$ such that $(1+S')(\alpha - P) = S$. The key fact about SOS proofs, and the main reason behind their importance in combinatorial optimization, is that degree-$d$ SOS proofs can be found “automatically” in time $n^{O(d)}$ via semidefinite programming; this is a theorem attributed to the independent work of Shor, Parrilo, Nesterov, and Lasserre. The following example illustrates the connection between SOS proofs and optimization (and incidentally, is also among our few examples establishing limitations on the power of SOS proofs). Let us call a polynomial $P:\{-1,1\}^n\to\mathbb{R}$ a 3LIN polynomial if

$P(x) = \frac1{m} \sum_{i,j,k} A_{ijk} x_i x_j x_k,$

where $A_{ijk} \in \{0,-1,1\}$ and $m = \sum_{i,j,k} |A_{ijk}|$. Note that the range of $P$ is bounded in $[-1,1]$, and since $\mathbb{E}[P] = 0$ there must exist an $x\in \{-1,1\}^n$ such that $P(x) \ge 0$. (It is straightforward to see that every 3LIN polynomial encodes a 3LIN instance, hence its name.) A result of Grigoriev (and independently Schoenebeck) says the following: for every $\epsilon > 0$, there exists a 3LIN polynomial $P$ such that $P(x) \le \epsilon$ for every $x\in\{-1,1\}^n$, and yet for all $d < (\epsilon^2 n)/10^6$ there is no degree-$d$ SOS proof certifying that $P < 1$. Note that $P < 1$ certainly has a degree-$n$ SOS proof, since we are working over $\{-1,1\}^n$ and hence may assume that $P$ is multilinear without loss of generality. In words, the Grigoriev-Schoenebeck theorem states that there is a 3LIN instance where every assignment satisfies < 51% of constraints, and yet no degree $d < n/10^{10}$ proof can certify the absence of an assignment satisfying > 99% of constraints. This is in marked contrast with the situation for MaxCut, where it is known that the Goemans-Williamson approximation algorithm is captured by degree-$2$ SOS proofs.
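To make the definition of a 3LIN polynomial concrete, here is a brute-force sketch on a tiny random instance. The names `random_3lin` and `evaluate` are illustrative, and the instance is assumed to use triples of distinct indices with $i<j<k$, so that every monomial is non-constant. Each term $A_{ijk}x_ix_jx_k$ equals $1$ exactly when the constraint $x_ix_jx_k = A_{ijk}$ holds, so an assignment $x$ satisfies a $(1+P(x))/2$ fraction of the constraints.

```python
import random
from fractions import Fraction
from itertools import combinations, product

def random_3lin(n, m, seed=0):
    """Pick m distinct triples i < j < k and give each a coefficient in {-1, +1}."""
    rng = random.Random(seed)
    triples = rng.sample(list(combinations(range(n), 3)), m)
    return {triple: rng.choice([-1, 1]) for triple in triples}

def evaluate(A, x):
    """Evaluate P(x) = (1/m) * sum of A_ijk * x_i * x_j * x_k, exactly."""
    m = sum(abs(a) for a in A.values())
    total = sum(a * x[i] * x[j] * x[k] for (i, j, k), a in A.items())
    return Fraction(total, m)

n = 6
A = random_3lin(n=n, m=10)
values = [evaluate(A, x) for x in product([-1, 1], repeat=n)]
mean = sum(values) / len(values)   # E[P] over uniformly random x
best = max(values)                 # value of the best assignment
print(mean, best)
```

Exhaustive search takes $2^n$ evaluations, which is exactly what a short certificate of $P \le \alpha$ is meant to avoid.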

Boaz concluded with the following problem related to the hypercontractive inequality: Given unit vectors $v_1,\ldots,v_m\in\mathbb{R}^n$, we define

$P(x) = \frac{n^2}{m} \sum_{i=1}^m \langle v_i,x\rangle^4,$

and say that $P$ is bounded if $P(x) \le C \| x \|^4$ for all $x\in\mathbb{R}^n$, where $C$ is a constant. Is there always a low-degree SOS proof that $P$ is bounded? This has implications for algorithms for finding sparse vectors in subspaces, which in turn is related to a variety of problems including small-set expansion, planted clique, sparse coding, and restricted isometry.

For more details see Boaz’s blog post, or watch his TCS+ talk. Also check out the introduction to Ryan’s recent paper with Yuan Zhou for a nice overview of the SOS hierarchy and its history.

Next up was Julia Wolf, who spoke about going beyond the Gowers norm. Like Tom’s talk yesterday, Julia began with the statement of Szemerédi’s theorem: Let $A\subseteq \{1,\ldots,N\}$ be a subset containing no $k$-APs. Then $|A| = o_k(N)$. Julia pointed out that determining the precise asymptotics of the “$o_k(N)$” term is an important open problem, since sufficiently strong decay bounds along with the prime number theorem would directly yield the Green-Tao theorem as a consequence; unfortunately such strong quantitative bounds appear to be out of reach of our current techniques. (Of course, as we saw in Tom’s talk yesterday Szemerédi’s theorem already plays a crucial role in the current proof of the Green-Tao theorem.) This was the theme of Julia’s talk — strong quantitative bounds on the relationship between density and the existence of APs — but in the finite field model $\mathbb{F}_p^n$ instead of $\{1,\ldots,N\}$.

Let $f:\mathbb{F}_p^n\to\mathbb{C}$ where $p > 2$ is a fixed prime, and let $k\ge 2$ be an integer. The Gowers $k$-norm is defined to be

$\| f \|_{U^k}^{2^k} := \mathop{\mathbb{E}}_{\boldsymbol{x},\boldsymbol{h}_1,\ldots,\boldsymbol{h}_k\in\mathbb{F}_p^n} \big[\Delta_{\boldsymbol{h}_1\cdots \boldsymbol{h}_k} f(\boldsymbol{x})\big],$

where $(\Delta_h f)(x) := f(x)\overline{f(x+h)}$ is the discrete derivative of $f$ with respect to $h$. (It is not obvious that these are norms, but they are.) The connection between these norms and APs in $\mathbb{F}_p^n$ goes via the following key fact due to Gowers: Let $f:\mathbb{F}_p^n\to [-1,1]$. Then

$\Big| \mathop{\mathbb{E}}_{\boldsymbol{x},\boldsymbol{d}}\big[f(\boldsymbol{x})f(\boldsymbol{x}+\boldsymbol{d}) \cdots f(\boldsymbol{x}+(k-1)\boldsymbol{d})\big]\Big| \le \| f \|_{U^{k-1}}.$
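The definition of the Gowers norm can be evaluated by brute force in very small cases. The sketch below assumes $n = 1$ (functions on $\mathbb{F}_p$ stored as a list of length $p$) and uses the expanded form of the iterated derivative, a product over $\omega\in\{0,1\}^k$ of $f(x+\omega\cdot h)$ with a conjugate whenever $|\omega|$ is odd. The function name `gowers_norm_power` is made up for this illustration.

```python
import cmath
from itertools import product

def gowers_norm_power(f, p, k):
    """Return ||f||_{U^k}^(2^k) for f : F_p -> C given as a list of length p."""
    total = 0
    for x in range(p):
        for hs in product(range(p), repeat=k):
            term = 1
            for omega in product([0, 1], repeat=k):
                shift = sum(w * h for w, h in zip(omega, hs))
                value = f[(x + shift) % p]
                # Conjugate when an odd number of shifts has been applied.
                term *= value.conjugate() if sum(omega) % 2 else value
            total += term
    return (total / p ** (k + 1)).real

p = 5
# Quadratic phase f(x) = exp(2*pi*i*x^2/p).
quadratic_phase = [cmath.exp(2j * cmath.pi * (x * x % p) / p) for x in range(p)]
u2 = gowers_norm_power(quadratic_phase, p, 2)
u3 = gowers_norm_power(quadratic_phase, p, 3)
print(u2, u3)
```

For this quadratic phase the $U^2$ quantity comes out to $1/p$ while the $U^3$ quantity is $1$, since the third derivative of $x^2$ vanishes: the higher norm detects quadratic structure that the lower one misses.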

Note that when $f:\mathbb{F}_p^n\to\{0,1\}$, which we may view as the indicator of a subset $A_f$, this expectation counts the density of $k$-APs in $A_f$. More generally, given a system of linear forms $\mathcal{L} = (L_1,\ldots,L_m)$ we may define

$T_{\mathcal{L}}(f_1,\ldots,f_m) = \mathop{\mathbb{E}}_{\boldsymbol{x}_1,\ldots,\boldsymbol{x}_d}\Big[\prod_{i=1}^m f_i(L_i(\boldsymbol{x}_1,\ldots,\boldsymbol{x}_d))\Big],$

and in this notation, Gowers’s key fact may be stated as $|T_{k-\mathrm{AP}}(f,\ldots,f)| \le \| f \|_{U^{k-1}}$. Gowers’s key fact tells us that for any subset $A\subseteq\mathbb{F}_p^n$, if $\| \mathbb{1}_A-\alpha \|_{U^{k-1}}$ is small (where as always $\alpha = |A|/p^n$), then the density of $k$-APs in $A$ is close to $\alpha^k$, the density we would expect from a random set of density $\alpha$. But what if the Gowers norm is large? In this case we need an inverse theorem, which allows us to conclude that $f$ would then be correlated with a degree-$(k-1)$ phase polynomial. These inverse theorems are proved using ergodic theory and transference principles, and their proofs do not establish explicit bounds. Consequently, the quantitative bounds we currently have for $k=3$ are far from tight, and we have no quantitative bounds at all for $k > 3$.

Julia concluded with an open problem. A result of hers and Gowers’s shows that Gowers’s key fact is actually not the best possible for most linear systems $\mathcal{L}$; they constructed a linear system $\mathcal{L}$ of size 6 for which

$|T_\mathcal{L}(f)| \le C(\|f \|_{U^2}), \quad \text{where } C(t) \to 0 \text{ as } t\to 0,$

whereas Gowers’s key fact only implies a bound of $|T_\mathcal{L}(f)| \le \| f\|_{U^3}$. This was proved using Gowers’s key fact; the open problem is to find a proof that doesn’t.

Next up was Stanisław Szarek, whose talk centered on a series of works (1, 2, 3), mainly the second one, entitled “Entanglement thresholds for random induced states”, joint with Guillaume Aubrun and Deping Ye. The interest here is in studying how much entanglement there is in a “typical” bipartite quantum state. However, Stanisław considered a more sophisticated (realistic?) notion of a typical state than just choosing a random complex unit vector.

Let’s forget bipartite states for a second to describe this alternate notion of a random quantum state. Actually, we should be careful to distinguish between pure quantum states (which are identified with unit vectors in some $\mathbb{C}^d$) and “mixed” quantum states (which are identified with density matrices in $\mathbb{C}^{d \times d}$ — i.e., trace-1 PSD matrices — and can be thought of as probability distributions over pure quantum states). To define a “typical” pure state, it seems the only natural thing to do is take a uniformly random unit complex vector. But it’s not completely clear what the most “natural” way to define a “typical” mixed state is. A work of Życzkowski and Sommers introduced the following notion: To get a “typical” $d$-dimensional mixed state, first suppose that there is also an $s$-dimensional “environment”. Next, take a random pure state in $\mathbb{C}^d \otimes \mathbb{C}^{s}$, and then get a mixed state on the first $d$-dimensional part by “tracing out” the second $s$-dimensional part. This yields a different distribution on $d$-dimensional mixed states for each choice of $s$. This distribution is evidently quite a bit more complicated than simply taking a uniformly random pure state.
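The “tracing out” construction is short enough to sketch directly. The code below is an illustration under stated assumptions: the function name `random_induced_state` is made up, the uniformly random unit vector is generated by normalizing a vector of independent complex Gaussians, and the pure state is stored as a $d \times s$ array $\psi$ so that the reduced state is $\rho = \psi\psi^\dagger$.

```python
import math
import random

def random_induced_state(d, s, seed=0):
    """Density matrix on C^d obtained from a random pure state on C^d (x) C^s."""
    rng = random.Random(seed)
    # psi[i][a] is the amplitude on basis vector |i>|a>.
    psi = [
        [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(s)]
        for _ in range(d)
    ]
    norm = math.sqrt(sum(abs(z) ** 2 for row in psi for z in row))
    psi = [[z / norm for z in row] for row in psi]
    # Partial trace over the environment: rho[i][j] = sum_a psi[i][a] * conj(psi[j][a]).
    return [
        [sum(psi[i][a] * psi[j][a].conjugate() for a in range(s)) for j in range(d)]
        for i in range(d)
    ]

d = 3
rho = random_induced_state(d=d, s=4)
trace = sum(rho[i][i] for i in range(d)).real
# Tr(rho^2); it equals 1 for a pure state and 1/d for the maximally mixed state.
purity = sum(abs(rho[i][j]) ** 2 for i in range(d) for j in range(d))
print(trace, purity)
```

Taking $s = 1$ returns a random pure state, while a larger environment tends to produce a more mixed $\rho$, which is one way to see why each $s$ gives a different distribution.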

Back to Stanisław’s talk. Suppose we generate a two-“qudit” bipartite mixed state in this way; i.e., we take a random pure state in $\mathbb{C}^d \otimes \mathbb{C}^d \otimes \mathbb{C}^s$ and trace out the third part. Will the resulting mixed bipartite state actually be entangled? This depends on the environment’s dimension, and Stanisław’s work shows a sharp transition around some $s_0 = \widetilde{\Theta}(d^3)$: if $s \leq (1-\epsilon)s_0$ then the resulting mixed state is entangled with very high probability; if $s \geq (1+\epsilon)s_0$ then the resulting mixed state is separable with very high probability.

If you prefer to hear about qubits, a consequence is the following. Suppose we take a random pure state in an $N$-qubit system. Trace out all but $2k$ of the qubits, leaving a mixed state on $2k$ qubits, thought of as a bipartite system with $k$ qubits each. Then if $k \gg N/5$ the resulting bipartite system will be entangled with high probability and if $k \ll N/5$ the resulting bipartite system will be separable with high probability.

Unfortunately, Stanisław didn’t have time to get into any proof details; I understand they’re quite sophisticated, introducing new ideas from random matrix theory and asymptotic geometric analysis.

The last speaker of the day was Christophe Garban. Christophe gave a very interactive talk on recent work (both his and that of others) concerning the speed at which Glauber dynamics mixes in the critical planar Ising model. Let’s unpack some of that jargon. (For more details you might look at some of Christophe’s writing, which has beautiful diagrams and beautiful exposition.) The planar Ising model is a physically natural nonuniform probability distribution on strings $\sigma \in \{-1,+1\}^{n^2}$. Think of an $n \times n$ square lattice, with each site $x \in [n] \times [n]$ having a “magnetic spin” $\sigma_x$, either $+1$ or $-1$. Magnetic spins like to be the same as their neighbors, so one measure of how happy an overall configuration $\sigma$ is would be $E(\sigma) = \sum_{x \sim y} \sigma_x \sigma_y$, where $x \sim y$ denotes that $x$ and $y$ are nearest neighbors in the $[n] \times [n]$ grid. The Ising model with parameter $\beta \geq 0$ corresponds to choosing $\boldsymbol{\sigma} \in \{-1,+1\}^{n^2}$ at random by stipulating that ${\bf Pr}[\boldsymbol{\sigma} = \sigma]$ be proportional to $\exp(\beta E(\sigma))$. Note that if $\beta = 0$ we just get the uniform distribution. On the other hand, as $\beta \to \infty$ the configurations $\sigma$ of maximal $E(\sigma)$ (i.e., the two constant strings) get more and more important, with the $\beta = \infty$ case corresponding simply to the uniform distribution over $\{(-1, -1, \dots, -1), (+1, +1, \dots, +1)\} \subset \{-1,+1\}^{n^2}$. Physically, $\beta$ corresponds to the reciprocal of “temperature”, so this last distribution corresponds to an ice-cold (“frozen”) iron square and the uniform distribution ($\beta = 0$) corresponds to a super-hot iron square.

What about $\beta$’s in between? Actually, there’s a super-special value of $\beta$ called the critical $\beta_c$ (known to equal $\frac12 \ln(1+\sqrt{2})$) where something particularly interesting happens. If you take a draw $\boldsymbol{\sigma}$ from the Ising model with parameter $\beta$, you can ask about the correlation ${\bf E}[\boldsymbol{\sigma}_x \boldsymbol{\sigma}_y]$ between two sites; more precisely, how it decays as a function of the lattice-distance between $x$ and $y$. If $\beta$ is close to $0$ then the decay is exponential; if $\beta$ is close to $\infty$ then there is hardly any decay from $1$ at all. The critical $\beta_c$ is characterized by being the unique value such that the decay is polynomial. In fact it’s known that with parameter $\beta_c$ we have ${\bf E}[\boldsymbol{\sigma}_x \boldsymbol{\sigma}_y] = \Theta(\|x-y\|^{-1/4})$.

So in some sense this is the most “interesting” value of $\beta$, and it’s the one Christophe focused on in his talk. Now, once you have an interesting probability distribution on $\{-1,1\}^{n^2}$ you want an interesting Markov chain having it as the stationary distribution. (In the $\beta = 0$ case — i.e., uniform-distribution — this is just the standard random walk on the hypercube.) The natural one in the Ising model is called Glauber dynamics. As in the “usual case”, this Markov chain works by first choosing a random site $\boldsymbol{x} \in [n] \times [n]$ to potentially update. However, the probability of actually updating the spin $\boldsymbol{\sigma}_{\boldsymbol{x}}$ depends on the sum of the spins of its four neighbors (and on the parameter $\beta$), according to a simple formula.
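For readers who want to see the “simple formula”, here is a minimal simulation sketch. It assumes the heat-bath variant of Glauber dynamics, in which the chosen spin is resampled from its conditional distribution given its neighbors (the Metropolis rule is another common choice), and it uses a free boundary, so edge sites simply have fewer neighbors. The function names are illustrative only.

```python
import math
import random

BETA_C = 0.5 * math.log(1 + math.sqrt(2))  # critical inverse temperature

def plus_probability(neighbor_sum, beta):
    """Heat-bath rule: Pr[spin = +1 | neighbors] = e^{bS} / (e^{bS} + e^{-bS})."""
    return 1 / (1 + math.exp(-2 * beta * neighbor_sum))

def glauber_step(sigma, n, beta, rng):
    """Pick a uniformly random site and resample its spin given its neighbors."""
    x, y = rng.randrange(n), rng.randrange(n)
    candidates = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
    neighbor_sum = sum(sigma[i][j] for i, j in candidates if 0 <= i < n and 0 <= j < n)
    sigma[x][y] = 1 if rng.random() < plus_probability(neighbor_sum, beta) else -1

def run(n, beta, steps, seed=0):
    """Run the chain from a uniformly random starting configuration."""
    rng = random.Random(seed)
    sigma = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
    for _ in range(steps):
        glauber_step(sigma, n, beta, rng)
    return sigma

sigma = run(n=8, beta=BETA_C, steps=1000)
print(sum(map(sum, sigma)))  # magnetization of the final configuration
```

At $\beta = 0$ the rule resamples each chosen spin uniformly, which recovers the (lazy) random walk on the hypercube.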

The natural question now is: How long does it take for this Markov chain to mix to its stationary distribution, the $\beta_c$-Ising model? Unfortunately, no one knows! At the very least, a notable breakthrough of Lubetzky and Sly from 2012 (see also this popular account) showed that the mixing time is $\mathrm{poly}(n)$. There is a great deal of physical interest in determining the actual exponent in the polynomial, however. More attention is paid to the slightly simpler question of showing that the “second eigenvalue” of the Markov chain’s Laplacian is at least $1/n^z$ for some positive $z$. (Lubetzky and Sly’s work establishes some finite upper bound on $z$.)

Christophe mainly discussed his work in progress on the other side of the problem, trying to provide lower bounds on $z$. Some (questionable?) physics simulations suggest that $z$ is some modestly small number like $2.17$. To show lower bounds, what you precisely need to do is find “starting configurations” (or more generally, functions $f : \{-1,+1\}^{n^2} \to \mathbb{R}$) with low “conductance” under the critical Ising model; more precisely, which have $\mathscr{E}[f,f]/{\bf Var}[f]$ small, where $\mathscr{E}[f,f]$ is the Dirichlet form, akin to “total influence” in the uniform-distribution case. Christophe showed us that it’s not too hard to calculate what you get for such classics as the Dictator, Majority $= \mathrm{sgn}(\sum_x \sigma_x)$, and even simply $f = \sum_x \sigma_x$ (called “magnetization” by the physicists). The last of these actually gives the best known lower bound on $z$, namely $z \geq 7/4$. Christophe has been working valiantly to try to come up with other examples which beat $7/4$ (and several audience members proposed some suggestions). His most interesting result is that the indicator function of left-right percolation on the square, which is a horribly noise-sensitive example in the uniform-distribution case, is actually quite a good (“low sensitivity”) example for the critical Ising model. It actually shows $z \geq 5/8$, which is the best lower bound known coming from a boolean-valued $f$.
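On a tiny grid the ratio $\mathscr{E}[f,f]/{\bf Var}[f]$ can be computed exactly by enumerating all $2^{n^2}$ configurations. The sketch below is only an illustration under assumptions not taken from the talk: it uses heat-bath Glauber dynamics with a free boundary, one site update per step, and the convention $\mathscr{E}[f,f] = \frac12\sum_{\sigma,\sigma'}\pi(\sigma)P(\sigma,\sigma')(f(\sigma)-f(\sigma'))^2$. The name `rayleigh_quotient` is made up, and a $3\times 3$ grid says nothing about the exponent $z$.

```python
import math
from itertools import product

BETA_C = 0.5 * math.log(1 + math.sqrt(2))  # critical inverse temperature

def neighbors(x, y, n):
    """Nearest neighbours of (x, y) inside the n-by-n grid (free boundary)."""
    candidates = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
    return [(i, j) for i, j in candidates if 0 <= i < n and 0 <= j < n]

def rayleigh_quotient(f, n, beta):
    """Dirichlet form over variance of f, by brute force over all states."""
    sites = [(x, y) for x in range(n) for y in range(n)]
    configs = [dict(zip(sites, spins)) for spins in product([-1, 1], repeat=n * n)]

    def happiness(sigma):
        # E(sigma): each nearest-neighbour pair is counted once.
        return sum(sigma[s] * sigma[t]
                   for s in sites for t in neighbors(s[0], s[1], n) if s < t)

    weights = [math.exp(beta * happiness(sigma)) for sigma in configs]
    total = sum(weights)
    pi = [w / total for w in weights]

    mean = sum(p * f(sigma) for p, sigma in zip(pi, configs))
    variance = sum(p * (f(sigma) - mean) ** 2 for p, sigma in zip(pi, configs))

    dirichlet = 0.0
    for p, sigma in zip(pi, configs):
        for s in sites:
            local = sum(sigma[t] for t in neighbors(s[0], s[1], n))
            # Heat-bath rule: chance that the resampled spin differs from sigma[s].
            flip_prob = 1 / (1 + math.exp(2 * beta * sigma[s] * local))
            flipped = dict(sigma)
            flipped[s] = -sigma[s]
            dirichlet += 0.5 * p * (flip_prob / len(sites)) * (f(sigma) - f(flipped)) ** 2
    return dirichlet / variance

def magnetization(sigma):
    return sum(sigma.values())

def dictator(sigma):
    return sigma[(0, 0)]

ratio_magnetization = rayleigh_quotient(magnetization, 3, BETA_C)
ratio_dictator = rayleigh_quotient(dictator, 3, BETA_C)
print(ratio_magnetization, ratio_dictator)
```

At $\beta = 0$ both functions give exactly $1/n^2$ under this normalization, matching the uniform-distribution picture where the Dirichlet form plays the role of total influence.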

As usual, we close out with some pictures.