§2.3: Total influence

A very important quantity in the analysis of a boolean function is the sum of its influences.

Definition 26 The total influence of $f : \{-1,1\}^n \to {\mathbb R}$ is defined to be $\mathbf{I}[f] = \sum_{i=1}^n \mathbf{Inf}_i[f].$

For boolean-valued functions $f : \{-1,1\}^n \to \{-1,1\}$ the total influence has several additional interpretations. First, it is often referred to as the average sensitivity of $f$ because of the following proposition:

Proposition 27 For $f : \{-1,1\}^n \to \{-1,1\}$ $\mathbf{I}[f] = \mathop{\bf E}_{{\boldsymbol{x}}}[\mathrm{sens}_f({\boldsymbol{x}})],$ where $\mathrm{sens}_f(x)$ is the sensitivity of $f$ at $x$, defined to be the number of pivotal coordinates for $f$ on input $x$.

Proof: \begin{multline*} \mathbf{I}[f] = \sum_{i=1}^n \mathbf{Inf}_i[f] =
\sum_{i=1}^n \mathop{\bf Pr}_{{\boldsymbol{x}}}[f({\boldsymbol{x}}) \neq f({\boldsymbol{x}}^{\oplus i})]
\\ = \sum_{i=1}^n \mathop{\bf E}_{{\boldsymbol{x}}}[\boldsymbol{1}_{f({\boldsymbol{x}}) \neq f({\boldsymbol{x}}^{\oplus i})}] = \mathop{\bf E}_{{\boldsymbol{x}}}\left[\sum_{i=1}^n \boldsymbol{1}_{f({\boldsymbol{x}}) \neq f({\boldsymbol{x}}^{\oplus i})}\right] = \mathop{\bf E}_{{\boldsymbol{x}}}[\mathrm{sens}_f({\boldsymbol{x}})]. \quad \Box \end{multline*}

The total influence of $f : \{-1,1\}^n \to \{-1,1\}$ is also closely related to the size of its edge boundary; from Fact 14 we deduce:

Fact 28 The fraction of edges in the Hamming cube $\{-1,1\}^n$ which are boundary edges for $f : \{-1,1\}^n \to \{-1,1\}$ is equal to $\frac{1}{n} \mathbf{I}[f]$.

Examples 29 (Recall Examples 15.) For boolean-valued functions $f : \{-1,1\}^n \to \{-1,1\}$ the total influence ranges between $0$ and $n$. It is minimized by the constant functions $\pm 1$ which have total influence $0$. It is maximized by the parity function $\chi_{[n]}$ and its negation which have total influence $n$; every coordinate is pivotal on every input for these functions. The dictator functions (and their negations) have total influence $1$. The total influence of $\mathrm{OR}_n$ and $\mathrm{AND}_n$ is very small: $n2^{1-n}$. On the other hand, the total influence of $\mathrm{Maj}_n$ is fairly large: roughly $\sqrt{2/\pi}\sqrt{n}$ for large $n$.

By virtue of Proposition 20 we have another interpretation for the total influence of monotone functions:

Proposition 30 If $f : \{-1,1\}^n \to \{-1,1\}$ is monotone then $\mathbf{I}[f] = \sum_{i=1}^n \widehat{f}(i).$

This sum of the degree-$1$ Fourier coefficients has a natural interpretation in social choice:

Proposition 31 Let $f : \{-1,1\}^n \to \{-1,1\}$ be a voting rule for a $2$-candidate election. Given votes ${\boldsymbol{x}} = ({\boldsymbol{x}}_1, \dots, {\boldsymbol{x}}_n)$, let $\boldsymbol{w}$ be the number of votes which agree with the outcome of the election, $f({\boldsymbol{x}})$. Then $\mathop{\bf E}[\boldsymbol{w}] = \frac{n}{2} + \frac12 \sum_{i=1}^n \widehat{f}(i).$

Proof: By the formula for Fourier coefficients, $$\label{eqn:deg-1-sum} \sum_{i=1}^n \widehat{f}(i) = \sum_{i=1}^n \mathop{\bf E}_{{\boldsymbol{x}}}[f({\boldsymbol{x}}) {\boldsymbol{x}}_i] = \mathop{\bf E}_{{\boldsymbol{x}}}[f({\boldsymbol{x}})({\boldsymbol{x}}_1 + {\boldsymbol{x}}_2 + \cdots + {\boldsymbol{x}}_n)].$$ Now ${\boldsymbol{x}}_1 + \cdots + {\boldsymbol{x}}_n$ equals the difference between the number of votes for candidate $1$ and the number of votes for candidate $-1$. Hence $f({\boldsymbol{x}})({\boldsymbol{x}}_1 + \cdots + {\boldsymbol{x}}_n)$ equals the difference between the number of votes for the winner and the number of votes for the loser; i.e., $\boldsymbol{w} – (n-\boldsymbol{w}) = 2\boldsymbol{w} – n$. The result follows. $\Box$

Rousseau [Rou62] suggested that the ideal voting rule is one which maximizes the number of votes which agree with the outcome. Here we show that the majority rule has this property (at least when $n$ is odd):

Theorem 32 The unique maximizers of $\sum_{i=1}^n \widehat{f}(i)$ among all $f : \{-1,1\}^n \to \{-1,1\}$ are the majority functions. In particular, $\mathbf{I}[f] \leq \mathbf{I}[\mathrm{Maj}_n] = \sqrt{2/\pi}\sqrt{n} + O(n^{-1/2})$ for all monotone $f$.

Proof: From \eqref{eqn:deg-1-sum}, $\sum_{i=1}^n \widehat{f}(i) = \mathop{\bf E}_{{\boldsymbol{x}}}[f({\boldsymbol{x}})({\boldsymbol{x}}_1 + {\boldsymbol{x}}_2 + \cdots + {\boldsymbol{x}}_n)] \leq \mathop{\bf E}_{{\boldsymbol{x}}}[|{\boldsymbol{x}}_1 + {\boldsymbol{x}}_2 + \cdots + {\boldsymbol{x}}_n|],$ since $f({\boldsymbol{x}}) \in \{-1,1\}$ always. Equality holds if and only if $f(x) = \mathrm{sgn}(x_1 + \cdots + x_n)$ whenever $x_1 + \cdots + x_n \neq 0$. The second statement of the theorem follows from Proposition 30 and Exercise 18 in this chapter. $\Box$

Let’s now take a look at more analytic expressions for the total influence. By definition, if $f : \{-1,1\}^n \to {\mathbb R}$ then $$\label{eqn:tinf-gradient} \mathbf{I}[f] = \sum_{i=1}^n \mathbf{Inf}_i[f] = \sum_{i=1}^n \mathop{\bf E}_{{\boldsymbol{x}}}[\mathrm{D}_i f({\boldsymbol{x}})^2] = \mathop{\bf E}_{{\boldsymbol{x}}}\left[\sum_{i=1}^n \mathrm{D}_i f({\boldsymbol{x}})^2\right].$$ This motivates the following definition:

Definition 33 The (discrete) gradient operator $\nabla$ maps the function $f : \{-1,1\}^n \to {\mathbb R}$ to the function $\nabla f : \{-1,1\}^n \to {\mathbb R}^n$ defined by $\nabla f(x) = (\mathrm{D}_1 f(x), \mathrm{D}_2 f(x), \dots, \mathrm{D}_n f(x)).$

Note that for $f : \{-1,1\}^n \to \{-1,1\}$ we have $\|\nabla f(x)\|_2^2 = \mathrm{sens}_f(x)$, where $\| \cdot \|_2$ is the usual Euclidean norm in ${\mathbb R}^n$. In general, from \eqref{eqn:tinf-gradient} we deduce:

Proposition 34 For $f : \{-1,1\}^n \to {\mathbb R}$, $\mathbf{I}[f] = \mathop{\bf E}_{{\boldsymbol{x}}}[\|\nabla f({\boldsymbol{x}})\|_2^2].$

An alternative analytic definition involves introducing the Laplacian:

Definition 35 The Laplacian operator $\mathrm{L}$ is the linear operator on functions $f : \{-1,1\}^n \to {\mathbb R}$ defined by $\mathrm{L} = \sum_{i=1}^n \mathrm{L}_i$.

In the exercises you are asked to verify the following:

Proposition 36 For $f : \{-1,1\}^n \to {\mathbb R}$,

• $\displaystyle \mathrm{L} f (x) = (n/2)\bigl(f(x) – \mathop{\mathrm{avg}}_{i \in [n]} \{f(x^{\oplus i})\}\bigr)$,
• $\displaystyle \mathrm{L} f (x) = f(x) \cdot \mathrm{sens}_f(x) \quad$ if $f : \{-1,1\}^n \to \{-1,1\}$,
• $\displaystyle \mathrm{L} f = \sum_{S \subseteq [n]} |S|\,\widehat{f}(S)\,\chi_S$,
• $\displaystyle \langle f, \mathrm{L} f \rangle = \mathbf{I}[f]$.

We can obtain a Fourier formula for the total influence of a function using Theorem 19; when we sum that theorem over all $i \in [n]$ the Fourier weight $\widehat{f}(S)^2$ is counted exactly $|S|$ times. Hence:

Theorem 37 For $f : \{-1,1\}^n \to {\mathbb R}$, $$\label{eqn:total-influence-formula} \mathbf{I}[f] = \sum_{S \subseteq [n]} |S| \widehat{f}(S)^2 = \sum_{k=0}^n k \cdot \mathbf{W}^{k}[f].$$ For $f : \{-1,1\}^n \to \{-1,1\}$ we can express this using the spectral sample: $\mathbf{I}[f] = \mathop{\bf E}_{\boldsymbol{S} \sim \mathscr{S}_{f}}[|\boldsymbol{S}|].$

Thus the total influence of $f : \{-1,1\}^n \to \{-1,1\}$ also measures the average “height” or degree of its Fourier weights.

Finally, from Proposition 1.13 we have $\mathop{\bf Var}[f] = \sum_{k > 0} \mathbf{W}^{k}[f]$; comparing this with \eqref{eqn:total-influence-formula} we immediately deduce a simple but important fact called the Poincaré inequality.

Poincaré Inequality For any $f : \{-1,1\}^n \to {\mathbb R}$, $\mathop{\bf Var}[f] \leq \mathbf{I}[f]$.

Equality holds in the Poincaré inequality if and only if all of $f$’s Fourier weight is at degrees $0$ and $1$; i.e., $\mathbf{W}^{\leq 1}[f] = \mathop{\bf E}[f^2]$. For boolean-valued $f : \{-1,1\}^n \to \{-1,1\}$, Exercise 1.19 tells us this can only occur if $f = \pm 1$ or $f = \pm \chi_i$ for some $i$.

For boolean-valued $f : \{-1,1\}^n \to {\mathbb R}$, the Poincaré inequality can be viewed as an (edge-)isoperimetric inequality, or (edge-)expansion bound, for the Hamming cube. If we think of $f$ as the indicator function for a set $A \subseteq \{-1,1\}^n$ of “measure” $\alpha = |A|/2^n$, then $\mathop{\bf Var}[f] = 4\alpha(1-\alpha)$ (Fact 1.14) whereas $\mathbf{I}[f]$ is $n$ times the (fractional) size of $A$’s edge boundary. In particular, the Poincaré inequality says that subsets $A \subseteq \{-1,1\}^n$ of measure $\alpha = 1/2$ must have edge boundary at least as large as those of the dictator sets.

For $\alpha \notin \{0, 1/2, 1\}$ the Poincaré inequality is not sharp as an edge-isoperimetric inequality for the Hamming cube; for small $\alpha$ even the asymptotic dependence is not optimal. Precisely optimal edge-isoperimetric results (and also vertex-isoperimetric results) are known for the Hamming cube. The following simplified theorem is optimal for $\alpha$ of the form $2^{-i}$:

Theorem 38 For $f : \{-1,1\}^n \to \{-1,1\}$ with $\alpha = \min\{\mathop{\bf Pr}[f = 1], \mathop{\bf Pr}[f=-1]\}$, $2\alpha \log(1/\alpha) \leq \mathbf{I}[f].$

This result illustrates an important recurring concept in the analysis of boolean functions: the Hamming cube is a “small-set expander”. Roughly speaking, this is the idea that “small” subsets $A \subseteq \{-1,1\}^n$ have unusually large “boundary size”.

8 comments to §2.3: Total influence

• bireswar

In the definition of discrete gradient operator will it be \nabla f: \{-1,1}^n -> etc ?

• Sorry, I don’t understand the question…

• bireswar

It is about a tiny typo. I think the ^n is missing there.

• Ah, got it. Fixed, thanks!

• John Engbers

Stylistic comment (very minor): In this section, all expected values except the one in (2), the one in Prop 31, and in the sentence after the Poincare Inequality have the “_{x}” (or equivalent) attached.

• Thanks — sometimes I’m willing to drop it, but I shouldn’t do it when I refer to x inside the expectation, as in (2). Fixed.

• Stefan Schneider

In the first proof, the layout is messed up. Wrong placement of ampersands?

• Hi Stefan — it’s a multline environment. I agree it doesn’t come out ideally…