Course: Math 833 - Modern Discrete Probability (MDP)
Author: Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison
Updated: Nov 23, 2020
Copyright: © 2020 Sebastien Roch
Throughout this section, $\|\mathbf{v}\|$ is the Euclidean norm of $\mathbf{v} \in\mathbb{R}^d$. The Euclidean norm naturally generalizes to matrices. Indeed, recall that the Frobenius norm of an $n \times m$ matrix $A \in \mathbb{R}^{n \times m}$ is defined as
$$ \|A\|_F = \sqrt{\sum_{i=1}^n \sum_{j=1}^m a_{ij}^2}. $$Notice that the Frobenius norm does not directly relate to $A$ as a representation of a linear map.
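The Frobenius norm is straightforward to compute. As a quick illustration, here is a minimal sketch in Python (assuming NumPy; the matrix is an arbitrary example) checking the entrywise formula against NumPy's built-in Frobenius norm:

```python
import numpy as np

# An arbitrary example matrix (n = 2, m = 3).
A = np.array([[1.0, -2.0, 0.5],
              [3.0,  0.0, -1.0]])

# Frobenius norm from the definition: square root of the sum of squared entries.
frob_manual = np.sqrt(np.sum(A ** 2))

# NumPy's built-in Frobenius norm.
frob_numpy = np.linalg.norm(A, 'fro')

print(frob_manual, frob_numpy)  # the two values agree
```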
It is desirable in many contexts to quantify how two matrices differ in terms of how they act on vectors. For instance, one is often interested in bounding quantities of the following form. Let $B, B' \in \mathbb{R}^{n \times m}$ and let $\mathbf{x} \in \mathbb{R}^m$ be of unit norm. What can be said about $\|B \mathbf{x} - B' \mathbf{x}\|$? Intuitively, what we would like is this: if the norm of $B - B'$ is small, then $B$ is close to $B'$ as a linear map, that is, the vector norm $\|B \mathbf{x} - B' \mathbf{x}\|$ is small for any unit vector $\mathbf{x}$. The following definition provides us with such a notion. Define $\mathbb{S}^{m-1} = \{\mathbf{x} \in \mathbb{R}^m\,:\,\|\mathbf{x}\| = 1\}$.
Definition (Induced Norm): The $2$-norm of a matrix $A \in \mathbb{R}^{n \times m}$ is
$$ \|A\|_2 = \max_{\mathbf{0} \neq \mathbf{x} \in \mathbb{R}^m} \frac{\|A \mathbf{x}\|}{\|\mathbf{x}\|} = \max_{\mathbf{x} \in \mathbb{S}^{m-1}} \|A \mathbf{x}\|. $$$\lhd$
The equality in the definition uses the homogeneity of the vector norm. The definition also implicitly relies on the Extreme Value Theorem: the function $f(\mathbf{x}) = \|A \mathbf{x}\|$ is continuous and the set $\mathbb{S}^{m-1}$ is closed and bounded, so there exists $\mathbf{x}^* \in \mathbb{S}^{m-1}$ such that $f(\mathbf{x}^*) \geq f(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{S}^{m-1}$.
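To make the maximization in the definition concrete, the following sketch (assuming NumPy; the matrix and the sampled unit vectors are arbitrary) compares the maximum of $\|A \mathbf{x}\|$ over random unit vectors, which can only underestimate the true maximum, with NumPy's induced $2$-norm, computed as the largest singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Sample many unit vectors x in S^{m-1}; each ||Ax|| is a lower bound on ||A||_2.
X = rng.standard_normal((3, 100_000))
X /= np.linalg.norm(X, axis=0)
lower_bound = np.linalg.norm(A @ X, axis=0).max()

# NumPy computes the induced 2-norm as the largest singular value of A.
print(lower_bound, np.linalg.norm(A, 2))  # lower_bound <= ||A||_2, and close
```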
Exercise: Let $A \in \mathbb{R}^{n \times m}$. Use Cauchy-Schwarz to show that
$$ \|A\|_2 = \max \left\{ \mathbf{x}^T A \mathbf{y} \,:\, \|\mathbf{x}\| = \|\mathbf{y}\| = 1 \right\}. $$$\lhd$
Exercise: Let $A \in \mathbb{R}^{n \times m}$.
(a) Show that $\|A\|_F^2 = \sum_{j=1}^m \|A \mathbf{e}_j\|^2$.
(b) Use (a) and Cauchy-Schwarz to show that $\|A\|_2 \leq \|A\|_F$.
(c) Give an example such that $\|A\|_F = \sqrt{n} \|A\|_2$. $\lhd$
The $2$-norm of a matrix has many other useful properties.
Lemma (Properties of the Induced Norm): Let $A, B \in \mathbb{R}^{n \times m}$ and $\alpha \in \mathbb{R}$. The following hold:
(a) $\|A \mathbf{x}\| \leq \|A\|_2 \|\mathbf{x}\|$, $\forall \mathbf{0} \neq \mathbf{x} \in \mathbb{R}^m$
(b) $\|A\|_2 \geq 0$
(c) $\|A\|_2 = 0$ if and only if $A = 0$
(d) $\|\alpha A\|_2 = |\alpha| \|A\|_2$
(e) $\|A + B \|_2 \leq \|A\|_2 + \|B\|_2$
(f) $\|A B \|_2 \leq \|A\|_2 \|B\|_2$ whenever the product is defined, that is, when $B \in \mathbb{R}^{m \times p}$.
Proof: These properties all follow from the definition of the induced norm and the corresponding properties for the vector norm:
Claims (a) and (b) are immediate from the definition.
For (c) note that $\|A\|_2 = 0$ implies $\|A \mathbf{x}\| = 0, \forall \mathbf{x} \in \mathbb{S}^{m-1}$, so that $A \mathbf{x} = \mathbf{0}, \forall \mathbf{x} \in \mathbb{S}^{m-1}$. In particular, $a_{ij} = \mathbf{e}_i^T A \mathbf{e}_j = 0, \forall i,j$.
For (d), (e), (f), observe that for all $\mathbf{x} \in \mathbb{S}^{m-1}$
$$ \|\alpha A \mathbf{x}\| = |\alpha| \|A \mathbf{x}\| \quad\text{and}\quad \|(A + B)\mathbf{x}\| \leq \|A \mathbf{x}\| + \|B \mathbf{x}\| \leq \|A\|_2 + \|B\|_2, $$while, for unit vectors $\mathbf{x}$ such that $B \mathbf{x}$ is defined, (a) gives
$$ \|A B \mathbf{x}\| \leq \|A\|_2 \|B \mathbf{x}\| \leq \|A\|_2 \|B\|_2. $$Taking a maximum over $\mathbf{x}$ in each case gives the claims.
$\square$
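As a quick numerical sanity check of (e) and (f), here is a minimal sketch assuming NumPy, with arbitrary random matrices (for (f), a third matrix of compatible shape is used):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 5))   # shape compatible with the product A @ C

def norm2(M):
    # Induced 2-norm, i.e., the largest singular value of M.
    return np.linalg.norm(M, 2)

# (e) triangle inequality and (f) submultiplicativity.
print(norm2(A + B) <= norm2(A) + norm2(B))   # True
print(norm2(A @ C) <= norm2(A) * norm2(C))   # True
```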
Exercise: Use Cauchy-Schwarz to show that for any $A, B$ of compatible dimensions it holds that
$$ \|A B \|_F \leq \|A\|_F \|B\|_F. $$$\lhd$
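Again as a numerical illustration only (a sketch assuming NumPy with arbitrary random matrices; it does not replace the proof asked for in the exercise):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

fro = lambda M: np.linalg.norm(M, 'fro')
print(fro(A @ B) <= fro(A) * fro(B))   # True
```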
For a symmetric matrix $C \in \mathbb{R}^{d \times d}$, we let $\lambda_j(C)$, $j=1, \ldots, d$, be the eigenvalues of $C$ in non-increasing order with corresponding orthonormal eigenvectors $\mathbf{v}_j(C)$, $j=1, \ldots, d$. Define the subspaces
$$ \mathcal{V}_k(C) = \mathrm{span}(\mathbf{v}_1(C), \ldots, \mathbf{v}_k(C)) \quad\text{and}\quad \mathcal{W}_{d-k+1}(C) = \mathrm{span}(\mathbf{v}_k(C), \ldots, \mathbf{v}_d(C)). $$
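When experimenting with these objects numerically, note that NumPy's `np.linalg.eigh` returns the eigenvalues of a symmetric matrix in non-decreasing order, the reverse of the convention above. A minimal sketch (the symmetric matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
C = (M + M.T) / 2   # an arbitrary symmetric matrix

# eigh returns eigenvalues in *non-decreasing* order; reverse both outputs
# so that evals[j] is lambda_{j+1}(C) and column j of evecs is v_{j+1}(C).
evals, evecs = np.linalg.eigh(C)
evals, evecs = evals[::-1], evecs[:, ::-1]

# V_k(C) is spanned by the first k columns of evecs;
# W_{d-k+1}(C) is spanned by columns k-1, ..., d-1 (0-based indexing).
```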
The following lemma is one version of what is known as Weyl's Inequality.
Lemma (Weyl): Let $A \in \mathbb{R}^{d \times d}$ and $B \in \mathbb{R}^{d \times d}$ be symmetric matrices. Then
$$ \max_{j \in [d]} \left|\lambda_j(B) - \lambda_j(A)\right| \leq \|B- A\|_2 $$where $\|C\|_2$ is the induced $2$-norm of $C$.
Proof idea: We use the extremal characterization of the eigenvalues together with a dimension argument.
Exercise: Prove the following claim, which is known as the Subspace Intersection Lemma. Let $\mathcal{S}_1$ and $\mathcal{S}_2$ be linear subspaces of $\mathbb{R}^d$ and let
$$ \mathcal{S}_1 + \mathcal{S}_2 = \{\mathbf{x}_1 + \mathbf{x}_2 \,:\, \mathbf{x}_1 \in \mathcal{S}_1,\, \mathbf{x}_2 \in \mathcal{S}_2\}. $$Then it holds that
$$ \mathrm{dim}(\mathcal{S}_1 + \mathcal{S}_2) = \mathrm{dim}(\mathcal{S}_1) + \mathrm{dim}(\mathcal{S}_2) - \mathrm{dim}(\mathcal{S}_1 \cap \mathcal{S}_2). $$[Hint: Consider a basis of $\mathcal{S}_1 \cap \mathcal{S}_2$ and complete it into bases of $\mathcal{S}_1$ and $\mathcal{S}_2$. Show that the resulting list of vectors is linearly independent.] $\lhd$
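As a concrete illustration of the lemma (a sketch assuming NumPy, with an explicit choice of subspaces; it is not a proof):

```python
import numpy as np

# Explicit example in R^5: S1 = span(e1, e2, e3) and S2 = span(e3, e4),
# so S1 ∩ S2 = span(e3) and the lemma predicts dim(S1 + S2) = 3 + 2 - 1 = 4.
I5 = np.eye(5)
B1 = I5[:, 0:3]     # basis of S1
B2 = I5[:, 2:4]     # basis of S2

# dim(S1 + S2) is the rank of the concatenated basis matrix.
dim_sum = np.linalg.matrix_rank(np.hstack([B1, B2]))
print(dim_sum)      # 4, matching the lemma
```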
Exercise: Show that, for any linear subspaces $\mathcal{S}_1, \ldots, \mathcal{S}_m$ of $\mathcal{V} = \mathbb{R}^d$, it holds that
$$ \mathrm{dim}\left(\bigcap_{k=1}^m \mathcal{S}_k\right) \geq \sum_{k=1}^m \mathrm{dim}\left(\mathcal{S}_k\right) - (m-1) \,\mathrm{dim}(\mathcal{V}). $$[Hint: Use the Subspace Intersection Lemma and induction.] $\lhd$
Proof (Weyl): Let $H = B - A$. We prove only the bound $\lambda_j(B) - \lambda_j(A) \leq \|H\|_2$; the other direction follows from interchanging the roles of $A$ and $B$. Because
$$ \mathrm{dim}(\mathcal{V}_j(B)) + \mathrm{dim}(\mathcal{W}_{d-j+1}(A)) = j + (d-j+1) = d+1 $$it follows from the exercise above that
$$ \mathrm{dim}\left(\mathcal{V}_j(B) \cap \mathcal{W}_{d-j+1}(A)\right) \geq d+1 - d = 1. $$Hence $\mathcal{V}_j(B) \cap \mathcal{W}_{d-j+1}(A)$ contains a nonzero vector. Let $\mathbf{v}$ be a unit vector in that intersection.
By Courant-Fischer, $\langle \mathbf{v}, B \mathbf{v}\rangle \geq \lambda_j(B)$ since $\mathbf{v} \in \mathcal{V}_j(B)$, while $\langle \mathbf{v}, A \mathbf{v}\rangle \leq \lambda_j(A)$ since $\mathbf{v} \in \mathcal{W}_{d-j+1}(A)$. Hence
$$ \lambda_j(B) \leq \langle \mathbf{v}, (A+H) \mathbf{v}\rangle = \langle \mathbf{v}, A \mathbf{v}\rangle + \langle \mathbf{v}, H \mathbf{v}\rangle \leq \lambda_j(A) + \langle \mathbf{v}, H \mathbf{v}\rangle. $$Moreover, by Cauchy-Schwarz, since $\|\mathbf{v}\|=1$,
$$ \langle \mathbf{v}, H \mathbf{v}\rangle \leq \|\mathbf{v}\| \|H\mathbf{v}\| \leq \|H\|_2 $$which proves the claim. $\square$
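A numerical illustration of Weyl's Inequality (a sketch assuming NumPy; the symmetric matrix and the small symmetric perturbation are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
M = rng.standard_normal((d, d))
A = (M + M.T) / 2                      # random symmetric matrix
P = rng.standard_normal((d, d))
H = 0.01 * (P + P.T) / 2               # small symmetric perturbation
B = A + H

# Eigenvalues in non-increasing order, as in the statement of the lemma.
lam_A = np.linalg.eigvalsh(A)[::-1]
lam_B = np.linalg.eigvalsh(B)[::-1]

# Weyl: every eigenvalue moves by at most ||B - A||_2.
print(np.max(np.abs(lam_B - lam_A)), np.linalg.norm(H, 2))  # lhs <= rhs
```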
While Weyl's Inequality indicates that the eigenvalues of $A$ and $B$ are close when $\|A - B\|_2$ is small, it says nothing about the eigenvectors. The following theorem gives a related bound for eigenvectors. It is usually stated in terms of the angle between the eigenvectors; here we give a version that is better suited to our applications, and we do not optimize the constants. We use the notation introduced above.
Theorem (Davis-Kahan): Let $A \in \mathbb{R}^{d \times d}$ and $B \in \mathbb{R}^{d \times d}$ be symmetric matrices. Fix $i \in \{1,\ldots,d\}$ and assume that
$$ \delta := \min_{j \neq i} |\lambda_i(A) - \lambda_j(A)| > 0. $$Then
$$ \min_{s \in \{+1,-1\}} \|\mathbf{v}_i(A) - s \mathbf{v}_i(B)\|^2 \leq \frac{8 \|A - B\|_2^2}{\delta^2}. $$Proof: Expand $\mathbf{v}_i(B)$ in the basis formed by the eigenvectors of $A$, that is, $ \mathbf{v}_i(B) = \sum_{j=1}^d \langle \mathbf{v}_i(B), \mathbf{v}_j(A) \rangle \,\mathbf{v}_j(A), $ where we used the orthonormality of the $\mathbf{v}_j(A)$'s. On the one hand,
$$ \begin{align*} \|(A - \lambda_i(A) I) \,\mathbf{v}_i(B)\|^2 &= \left\|\sum_{j=1}^d \langle \mathbf{v}_i(B), \mathbf{v}_j(A) \rangle (A - \lambda_i(A) I)\,\mathbf{v}_j(A)\right\|^2\\ &= \left\|\sum_{j=1, j \neq i}^d \langle \mathbf{v}_i(B), \mathbf{v}_j(A) \rangle (\lambda_j(A) - \lambda_i(A))\,\mathbf{v}_j(A)\right\|^2\\ &= \sum_{j=1, j \neq i}^d \langle \mathbf{v}_i(B), \mathbf{v}_j(A) \rangle^2 (\lambda_j(A) - \lambda_i(A))^2\\ &\geq \delta^2 (1 - \langle \mathbf{v}_i(B), \mathbf{v}_i(A) \rangle^2), \end{align*} $$where, on the last two lines, we used the orthonormality of the $\mathbf{v}_j(A)$'s and $\mathbf{v}_j(B)$'s.
On the other hand, letting $E = A - B$, by the triangle inequality
$$ \begin{align*} \|(A - \lambda_i(A) I) \,\mathbf{v}_i(B)\| &= \|(B + E - \lambda_i(A) I) \,\mathbf{v}_i(B)\|\\ &\leq \|(B - \lambda_i(A) I) \,\mathbf{v}_i(B)\| + \|E \,\mathbf{v}_i(B)\|\\ &\leq |\lambda_i(B) - \lambda_i(A)| \|\mathbf{v}_i(B)\| + \|E\|_2 \|\mathbf{v}_i(B)\|\\ &\leq |\lambda_i(B) - \lambda_i(A)| + \|E\|_2, \end{align*} $$where, on the last line, we used the orthonormality of $\mathbf{v}_i(B)$ and the definition of $\|E\|_2$. By Weyl, we also have $|\lambda_i(B) - \lambda_i(A)| \leq \|E\|_2$.
Combining the last two inequalities gives
$$ (1 - \langle \mathbf{v}_i(B), \mathbf{v}_i(A) \rangle^2) \leq \frac{4 \|E\|_2^2}{\delta^2}. $$The result follows by noting that, since $|\langle \mathbf{v}_i(B), \mathbf{v}_i(A) \rangle| \leq 1$ by Cauchy-Schwarz,
$$ \begin{align*} \min_{s \in \{+1,-1\}} \|\mathbf{v}_i(A) - s \mathbf{v}_i(B)\|^2 &= 2 - 2 |\langle \mathbf{v}_i(B), \mathbf{v}_i(A) \rangle|\\ &\leq 2 (1 - \langle \mathbf{v}_i(B), \mathbf{v}_i(A) \rangle^2)\\ &\leq \frac{8 \|E\|_2^2}{\delta^2}. \end{align*} $$$\square$
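A numerical illustration of the bound (a sketch assuming NumPy; the matrices, the perturbation size, and the index $i$ are arbitrary choices, and the gap $\delta$ is assumed positive, which holds for this random draw):

```python
import numpy as np

rng = np.random.default_rng(5)
d, i = 6, 2                            # i is 0-based here (third largest eigenpair)
M = rng.standard_normal((d, d))
A = (M + M.T) / 2
P = rng.standard_normal((d, d))
E = 0.01 * (P + P.T) / 2
B = A + E

def eig_desc(C):
    # Eigenpairs in non-increasing order of eigenvalue.
    evals, evecs = np.linalg.eigh(C)
    return evals[::-1], evecs[:, ::-1]

lam_A, V_A = eig_desc(A)
lam_B, V_B = eig_desc(B)

# Eigenvalue gap delta at index i (assumed > 0 for this draw).
delta = np.abs(lam_A[i] - np.delete(lam_A, i)).min()

# Distance between the i-th eigenvectors, minimized over the sign.
lhs = min(np.linalg.norm(V_A[:, i] - s * V_B[:, i]) ** 2 for s in (1, -1))
rhs = 8 * np.linalg.norm(E, 2) ** 2 / delta ** 2
print(lhs, rhs)   # lhs <= rhs
```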