# Simons Symposium 2014 — Day 1

Li-Yang here.

Avi Wigderson kicked off this year’s symposium with a talk describing recent joint work with Dimitri Gavinsky, Or Meir, and Omri Weinstein attacking one of the central open problems in computational complexity: does ${\mathsf P} = \mathsf{NC}^1$, or in words, can every sequential computation be efficiently parallelized?

For a Boolean function $f:\{0,1\}^n \to\{0,1\}$, let $S(f)$ denote the minimum size of any Boolean circuit computing $f$, and $L(f)$ the minimum size of any Boolean formula computing $f$. Certainly $S(f) \le L(f)$ for all $f$, and we know of examples showing that $S(f)$ can be polynomially smaller than $L(f)$ — the parity function, for example, has $S(\mathsf{PAR}_n) = O(n)$ whereas $L(\mathsf{PAR}_n) = \Omega(n^2)$. The $\mathsf{P}$ versus $\mathsf{NC}^1$ problem seeks an explicit function $f$ witnessing a super-polynomial separation between these two quantities.

The work of Karchmer, Raz, and Wigderson (KRW) proposed the following approach to the problem. Let $D(f)$ denote the minimum depth of any formula computing $f$ (and recall that $D(f) = O(\log(L(f))$ by Spira’s depth-reduction theorem).  Given two Boolean functions $g:\{0,1\}^m \to\{0,1\}$ and $f:\{0,1\}^n\to\{0,1\}$, we write $g\circ f : \{0,1\}^{mn}\to\{0,1\}$ to denote their (disjoint) composition $(g\circ f)(x^1,\ldots,x^m) = g(f(x^1),\ldots,f(x^m))$. Clearly $D(g \circ f) \le D(g) + D(f)$: to compute $g\circ f$ with a formula, simply stick on a formula for $f$ at every leaf in a formula computing $g$. The KRW conjecture states that this naive construction is essentially the best one can do, i.e. that $D(g \circ f) \approx D(g) + D(f)$; equivalently, $C(R_{g\circ f}) \approx C(R_g) + C(R_f)$ where $R_g$ and $R_f$ are the communication problems corresponding to the Karchmer-Wigderson relations of $f$ and $g$ respectively, and $C(\cdot)$ denotes deterministic communication complexity. KRW showed that proving this conjecture would separate $\mathsf{P}$ from $\mathsf{NC}^1$ (notably, this approach appears to circumvent the Razborov-Rudich natural proofs barrier). They furthermore proposed the first step of proving the conjecture in the case when $f = U_n$ and $g=U_m$, where $U_k$ is the $k$-ary Universal Relation that defines the following communication problem: Alice receives $x\in \{0,1\}^k$ and Bob $y\in \{0,1\}^k$ with the sole guarantee that $x\ne y$, and their goal is to find a coordinate $i\in [k]$ such that $x_i \ne y_i$. (It is easy to see that every Karchmer-Wigderson relation reduces to the Universal Relation, hence its name.)

This first step was taken by Edmonds, Impagliazzo, Rudich, and Sgall; Håstad and Wigderson subsequently gave an alternative proof. Using tools from interactive information complexity Avi and his co-authors take the natural next step: they establish the KRW conjecture in the case when $f = U_n$ and $g$ is an arbitrary function.  The case when $g = U_m$ and $f$ is an arbitrary function remains open: while superficially similar, Avi points out that this case appears to be of significantly different nature, and proposes it as a good next step to pursue.

For more details here’s a video of Or giving a talk on their results.

Next up were back-to-back talks by Rafał Latała and Witold Bednorz on their recent work on the suprema of Bernoulli processes, the main result of which is an affirmative answer to Talagrand’s \$5000 Bernoulli Conjecture (among the list of open problems from the 2012 Symposium). The problem can be stated as follows: let$T\subseteq \ell^2$be a collection of vectors, and consider the quantity: $b(T) := \mathop{\bf E} \left[\sup_{t\in T} \sum_{i =1}^\infty \boldsymbol{\epsilon}_i t_i \right],$ where$\boldsymbol{\epsilon}_1,\boldsymbol{\epsilon}_2,\ldots$are i.i.d. uniform$\pm$1 random variables. The goal is to obtain tight upper and lower bounds on$b(T)$; more precisely, to understand its relationship with the geometry of$T$. As is often the case with questions concerning Bernoulli random variables, it is natural to first consider the Gaussian setting where we can define the analogous quantity: $g(T) := \mathop{\bf E}\left[ \sup_{t\in T} \sum_{i =1}^\infty \boldsymbol{g}_i t_i \right],$ and where the$\boldsymbol{g}_i$’s are i.i.d. standard Gaussians. The generic chaining theory of Michel Talagrand (building on results of Fernique and Dudley, the culmination of a body of work dating back to Kolmogorov in the 30′s) has shown that$g(T)$is characterized, up to universal constant factors, by the quantity $\gamma_2(T,\|\cdot \|_2) := \inf \sup_{t \in T} \sum_{n=0}^\infty 2^{n/2} \mathrm{diam}(\mathcal{A}_n(t)),$ where the infimum is taken over all choices of$\{\mathcal{A}_n\}_{n=0}^\infty$, increasing sequences of partitions of$T$satisfying$\mathcal{A}_0 = \{T\}$and$|\mathcal{A}_n| = 2^{2^n}$, and$\mathcal{A}_n(t)$denotes the unique set in the partition$\mathcal{A}_n$that contains$t$. With Talagrand’s characterization of$g(T)$in mind we now revisit the Bernoulli setting. An easy application of Jensen’s inequality yields the relationship$b(T) \le g(T)\sqrt{\pi/2}$, and an even simpler “$\ell_1$-bound” yields$b(T) \le \sup_{t\in T} \| t \|_1$. Together these give us the upper bound: $b(T) \le \inf\{ \sup_{t\in T_1} \|t\|_1 + g(T_2) \colon T \subseteq T_1 + T_2\}.$ Talagrand’s Bernoulli Conjecture essentially asks: “Can we reverse this inequality?”, and Rafał and Witek’s work answers this in the affirmative. Their main theorem states the following: for all$T$such that$b(T) < \infty$, there exists$T_1$and$T_2$such that$T \subseteq T_1 + T_2$and$\sup _{t\in T_1}\| t \|_1 \le O(b(T))$and$g(T_2) = O(b(T))$. Rafał’s talk ended with the following intriguing question from Yuval Peres and Elchanan Mossel: is there an efficient algorithm for computing or approximating$b(T)$? (This is very closely related to Raghu Meka’s talk; see below.) Witek spoke about several potential extensions of the Bernoulli Theorem which remain open. Here I will highlight two: The first is a strict generalization, conjectured by Kwapień, to the Banach space setting. Let$(B, \| \cdot \|)$be a normed space, and let$u_1, u_2, \dots \in B$be such that$\sum_i u_i \boldsymbol{\epsilon}_i$converges almost surely. (Here the$\boldsymbol{\epsilon}_i$’s represent independent uniformly$\pm 1$random variables.) Is it possible to write each$u_i$as$u_i = v_i + w_i$such that $\mathop{\bf E}\left[\sum_i u_i \boldsymbol{\epsilon}_i\right] \gtrsim \sup_{\delta_i \in \{-1,1\}} \left\{\left\|\sum_i v_i \delta_i\right\| \right\}, \ \ \mathop{\bf E}\left[\sum_i u_i \boldsymbol{\epsilon}_i\right] \gtrsim {\bf E}\left[\sum_i w_i \boldsymbol{g}_i\right]?$ (Here as usual the$\boldsymbol{g}_i$’s represent independent standard Gaussians, and$\gtrsim$means inequality up to a universal constant.) In case$B = \ell^\infty$this follows from Rafal and Witek’s proof of the Bernoulli Conjecture; however, this potential generalization does not seem so easy because their proof does not give much control over the “$T_1$” and “$T_2$” sets. Another potential generalization of the Bernoulli Theorem is to$p$-biased$\pm 1$random variables. More specifically, let$p > 0$be small and let$\boldsymbol{\delta} = (\boldsymbol{\delta}_1,\dots, \boldsymbol{\delta} _n)$be a sequence of independent random variables which are$1-p$with probability$p$and$-p$with probability$1-p$. Let$T$be a finite collection of vectors in${\mathbb R}^n$. As usual, we are interesting in finding deterministic quantities which bound $b_p(T) := \mathop{\bf E}\left[\max_{t \in T} |\langle t, \boldsymbol{\delta}\rangle| \right]$ up to a constant factor not depending on$n$or on$p$. As in the Bernoulli Conjecture, one can give upper bounds based on: (a) trivial$\ell_1$norm considerations; (b) chaining (roughly speaking, the union bound with clever clustering of vectors). In the Bernoulli Conjecture case (i.e.,$p = 1/2$), the chaining bound relies on Chernoff bounds; i.e., the subgaussian property of uniform$\pm 1$random vectors. In this$p$-biased case one has to be a little more careful using Bernstein bounds. In any case, Talagrand has again conjectured that these are the “only” two ways to get an upper bound. Precisely, the conjecture boils down to showing the following: one can find vector sets$T_1, T_2$with$T \subseteq T_1
+ T_2$, and $b_p(T) \gtrsim \mathop{\bf E}\left[\max_{t \in T} \sum_i |t_i| \boldsymbol{\delta}_i\right],\ \ \sqrt{p} \cdot \gamma_2(T_1, \| \cdot \|_2), \ \ \gamma_1(T_2, d_{\infty}),$ where$\gamma_2$is as before, and the$\gamma_1$quantity is like$\gamma_2$except that the diameter is measured with respect to the 1-norm and the coefficient on contribution from the$j$th part is$2^j$rather than$2^{j/2}$. (And, it should be stressed again, the constant hidden by the$\gtrsim$should not depend on$p$or$n$.) For more details here’s a video of Rafal giving a longer version of his talk; see also this great series of blog posts by James Lee. The next speaker was Yuval Filums, who spoke about the Erdős-Ko-Rado (EKR) theorem, stability results, and association schemes. The classical EKR theorem from extremal combinatorics says the following: for all$n \ge 2k$, if$\mathcal{F} \subseteq {{[n]}\choose k}$is an intersecting family then$|\mathcal{F}| \le {{n-1}\choose {k-1}}$. (A family$\mathcal{F}$is intersecting if$A \cap B \ne \emptyset$for all$A,B \in \mathcal{F}$.) The extremal family is perhaps the first that comes to mind: take an arbitrary element in the universe$[n]$, say$1$, and let$\mathcal{F}$be the family of all${{n-1}\choose {k-1}}$subsets that include$1$; in fact, these are the only families attaining the bound of${{n-1}\choose {k-1}}$. Let’s call such a family$\mathcal{F}$a star. It is natural to wonder if a “stability version” of the EKR theorem is true: if$|\mathcal{F}| \ge (1-\epsilon){{n-1}\choose {k-1}}$, must there exist some star$\mathcal{S}$such that$|\mathcal{F} \triangle \mathcal{S}| \le \epsilon |\mathcal{F}|$? There are by now several different proofs of the EKR theorem; however many of these, like the original shifting-based proof EKR, do not appear to be amendable to obtaining stability results. An analytic approach introduced by Ehud Friedgut has the key advantage that it automatically yields stability, showing that families of cardinality close to the extremal bound of${{n-1}\choose {k-1}}$must in fact themselves be close to a star. Roughly speaking, Ehud accomplishes this by considering the Fourier expansion of$\mathbf{1}_{\mathcal{F}}: {{[n]}\choose k} \to\{0,1\}$, and showing that its Fourier mass is concentrated on the first two levels. This brings to mind the Friedgut-Kalai-Naor theorem, which says the Boolean functions$f:\{0,1\}^n\to\{0,1\}$with most of their Fourier mass concentrated on the first two levels are close to dictators (or more precisely,$1$-juntas) — we note that the indicator of an extremal family is in fact a dictator, since$\mathbf{1}_{\mathcal{F}}(S) = 1$iff the distinguished element ($1$in the case of our example above) is in$S$. Of course to carry out this proof strategy one would need an FKN-type theorem for Boolean(-valued) functions${{[n]}\choose k}\to\{0,1\}$over “slices of the hypercube”; in other words, functions over the Johnson scheme. This falls into a line of work extending classical Fourier-analytic theorems concerning Boolean-valued functions over the discrete cube$\{-1,1\}^n$to non-independent domains. Yuval ended his talk by with a wish-list of six such theorems he would like to see extended to various association schemes: the KKL theorem, Friedgut’s junta theorem, the FKN theorem, the Kindler-Safra theorem and the closely-related Bourgain’s theorem, and finally the invariance principle. Analogues of the first two for the Johnson scheme have been established by O’Donnell and Wimmer and Wimmer respectively. For more on related spectral techniques in extremal combinatorics, check out Yuval’s thesis. The final speaker of the day was Raghu Meka, who presented a PTAS for computing the supremum of Gaussian processes (answering an open problem of Lee and Ding). More precisely, he constructs a deterministic algorithm which, given a finite set of vectors$T  \subseteq \mathbb{R}^d$and$\epsilon > 0$, computes a$(1+\epsilon)$-factor approximation to the quantity$\mathbf{E}_{\mathbf{X} \leftarrow \mathcal{N}^d}[\sup_{t \in T} |\langle t, \mathbf{X}\rangle|]$in time$d^{O(1)}\cdot |T|^{O_\epsilon(1)}$(here$\mathcal{N}$is the standard Gaussian). Gaussian processes play a central role in the study of stochastic processes, functional analysis, convex geometry, and machine learning (among others); we elaborate on one specific application which is of particular relevance to Raghu’s result. Given a finite connected graph$G = (V,E)$, we may associate with it the following fundamental parameter: the cover time of$G=(V,E)$, denoted$\mathrm{cov}(G)$, is the maximum over all vertices$v \in V$, of the expected time it takes for a simple random walk starting at$v$to visit all every other vertex of$G$. (A simple random walk is one which, at every step, moves to a uniformly random neighbor.) Aldous and Fill asked the natural question: “Can we approximate$\mathrm{cov}(G)$deterministically?” (Note that there is a simple randomized algorithm: simulate a random walk a few of times, and output the median time required to visit all vertices.) The first result in this direction was due to Kahn, Kim, Lovász, and Vu, who gave a$O((\log\log n)^2)$-factor approximation algorithm; this was followed by the work of Feige and Zeitouni who gave an FPTAS for trees. In recent exciting work Ding, Lee, and Peres significantly improved this state of affairs with a deterministic polynomial-time$O(1)$-factor approximation algorithm. They accomplish this by bridging a connection between the cover time of$G$and an associated Gaussian process on$G$, known as the Gaussian free field on$G$; more precisely, they show that the supremum of the Gaussian free field on$G$characterizes, up to universal constant factors, the cover time of$G$. With this connection in hand they then give a dynamic programming-based algorithm for computing$\gamma_2(\cdot,\|\cdot \|_2)$, which by Talagrand’s characterization yields a$O(1)$-factor approximation to the supremum of Gaussian processes. Motivated by this application, Ding and Lee asked if there is a PTAS for computing the supremum of Gaussian processes, and Raghu’s work answers this question in the affirmative (with a “direct” algorithm, one that does not incur the$O(1)$-factor loss that is likely to be inherent in any approach that goes through Talagrand’s characterization). Raghu’s algorithm can be viewed as proceeding in two modular steps, the first of which is dimension reduction. Using the standard Johnson-Lindenstrauss (JL) lemma (or rather, a derandomized version of the lemma), we first project$T$onto a space of dimension roughly$k=O(\log |T|)$. In the analysis of this first step, the JL lemma ensures that the distances between pairs of points are preserved, while Slepian’s lemma allows us to relate the supremum of the original Gaussian process to the supremum of the projected Gaussian process. As one would expect, the next step involves solving the problem in the projected space (with an algorithm that is allowed run in time exponential in the dimension$k$); we accomplish this via the construction of explicit$\epsilon$-nets for convex functions in Gaussian space. A naive rounding-based approach yields a net of size$k^{O(k)}$, and Dadush and Vempala give a construction achieving size$(\log k)^{O(k)}$. Using a classical comparison lemma called Kanter’s lemma to lift a simple net for the univariate setting to the multivariate setting, Raghu constructs an optimal$\epsilon$-net of size$(1/\epsilon)^{O(k)}$. Raghu concluded with two open problems. The first is a conjecture of Ding, Lee, and Peres, which states that the supremum of the Gaussian free field on a graph and characterizes its cover time asymptotically; if true, this coupled with Raghu’s result would give us a PTAS for computing the cover time of all graphs. The second is whether an analogue of Slepian’s lemma holds for graphs: Given a graph$G = (V,E)$, for any$u,v\in V$we write$\mathrm{hit}_G(u,v)$to denote the expected time it takes for a random walk starting at$u$to hit$v$. Let$G, H$be two graphs on the set vertex set$V$, and suppose$\mathrm{hit}_G(u,v)\le \mathrm{hit}_H(u,v)$for all$u,v\in V$. Does this imply that$\mathrm{cov}(G) \le \mathrm{cov}(H)\$?

For more details, here’s a video of Raghu’s talk at FOCS 2012; see also this blog post by James Lee for more on the nice connection to cover times and the Gaussian free field.

And finally, here are a few pictures from the first day!