The Eckart-Young-Mirsky Theorem

The theorem

Eckart-Young-Mirsky Theorem

A \in \mathbb{R}^{m\times n}

A = U\Sigma V^{\top}= \sum_{i=1}^n \sigma_i \vec{u}_i \vec{v}_i^{\top}

And we define

A_k = \sum_{i=1}^k \sigma_i \vec{u}_i \vec{v}_i^{\top}

We will see that

(1)

\argmin_{B \in \mathbb{R}^{m\times n}, rk(B) = k} ||A-B||_F= A_k

(2)

\argmin_{B \in \mathbb{R}^{m \times n}, rk(B) = k} ||A-B||_2 = A_k
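Before going through the proof, the two claims can be sanity-checked numerically. The following is a minimal NumPy sketch, not part of the original argument: the matrix sizes, the random seed, the helper name `truncated_svd`, and the way rank-$k$ competitors are sampled are all assumptions made for illustration. Random sampling cannot prove optimality; it only fails to find a counterexample.

```python
import numpy as np

def truncated_svd(A, k):
    """Return A_k, the sum of the top-k terms sigma_i * u_i * v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)      # fixed seed, arbitrary choice
m, n, k = 6, 5, 2                   # illustrative sizes (assumption)
A = rng.standard_normal((m, n))
sigma = np.linalg.svd(A, compute_uv=False)   # sorted in descending order

A_k = truncated_svd(A, k)
err_2 = np.linalg.norm(A - A_k, 2)       # spectral norm
err_F = np.linalg.norm(A - A_k, "fro")   # Frobenius norm

# Random rank-k competitors B = X Y; none should beat A_k in either norm.
competitors = [rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
               for _ in range(200)]
best_2 = min(np.linalg.norm(A - B, 2) for B in competitors)
best_F = min(np.linalg.norm(A - B, "fro") for B in competitors)

print(err_2, sigma[k])                          # sigma[k] is sigma_{k+1}
print(err_F, np.sqrt(np.sum(sigma[k:] ** 2)))   # tail of the spectrum
print(best_2 >= err_2, best_F >= err_F)
```

Note the index shift: NumPy arrays are 0-indexed, so `sigma[k]` holds $\sigma_{k+1}$.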

L2 Norm Case

We will first prove the L2-norm case

||A-A_k||_2 = ||\sum_{i=k+1}^n \sigma_i \vec{u}_i \vec{v}_i^{\top}||_2 = \max_{||\vec{w}||_2=1} ||(A-A_k)\vec{w}||_2 = \sigma_{k+1}

Now we show that for every $B$ with $rk(B) = k$, $||A-B||_2 \ge \sigma_{k+1}$

\begin{split} ||A-B||_2 &\ge ||(A-B)\vec{w}||_2 \quad (\text{for any } \vec{w} \text{ with } ||\vec{w}||_2 = 1) \\ &\text{Choose $\vec{w} \in N(B)$, so that $B\vec{w} = \vec{0}$} \\ &= ||A\vec{w}||_2 \end{split}

Consider $V_{k+1}$

V_{k+1} = \begin{bmatrix} \vec{v}_1 &\vec{v}_2 &\cdots &\vec{v}_{k+1} \end{bmatrix}

Property:

$rk(V_{k+1}) = k+1$

$Dim(N(B)) = n-k$

So $(n-k)+(k+1) = n+1 > n$

Both $R(V_{k+1})$ and $N(B)$ are subspaces of $\mathbb{R}^n$, so they overlap in at least one dimension.

So instead of choosing any $\vec{w} \in N(B)$, choose $\vec{w} \in N(B) \cap R(V_{k+1})$ such that

\begin{split} \vec{w} &= V\vec{\alpha} = \begin{bmatrix} V_{k+1} &V_{rest} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_{k+1} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \\ &=\alpha_1 \vec{v}_1 + \alpha_2 \vec{v}_2 + \cdots + \alpha_{k+1} \vec{v}_{k+1} \end{split}

And choose $\alpha_i$ such that $||\vec{w}||_2 = 1$ ⇒ $\sum_{i=1}^{k+1} \alpha_i^2 = 1$

Coming back to

\begin{split} ||A-B||_2 &\ge ||(A-B)\vec{w}||_2 \quad (\text{for any unit } \vec{w}, \text{ in particular our choice}) \\ &\quad = ||A\vec{w}||_2 \\ &\quad = ||U\Sigma V^{\top} V\vec{\alpha}||_2 \\ &\quad = ||U\Sigma\vec{\alpha}||_2 \\ &\quad = ||\Sigma \vec{\alpha}||_2 \\ &\quad = \sqrt{\alpha_1^2\sigma_1^2 + \cdots + \alpha_{k+1} ^ 2\sigma_{k+1} ^ 2} \\ &\quad \ge \sigma_{k+1}\sqrt{\alpha_1^2 + \cdots + \alpha_{k+1}^2} = \sigma_{k+1} \end{split}
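The choice of $\vec{w}$ in this argument can be made concrete. The sketch below builds a unit vector in $N(B) \cap R(V_{k+1})$ for a random rank-$k$ matrix $B$ and checks each inequality in the chain. The sizes, the seed, and the approach of finding the intersection through the null space of the stacked matrix $[V_{k+1}, -N_B]$ are illustrative assumptions, not something the proof prescribes.

```python
import numpy as np

rng = np.random.default_rng(1)      # fixed seed, arbitrary choice
m, n, k = 6, 5, 2                   # illustrative sizes (assumption)
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank k

_, sigma, Vt_A = np.linalg.svd(A)   # full SVD: Vt_A is n x n
V_k1 = Vt_A[: k + 1].T              # columns v_1, ..., v_{k+1}

_, _, Vt_B = np.linalg.svd(B)       # full SVD: Vt_B is n x n
N_B = Vt_B[k:].T                    # n - k orthonormal columns spanning N(B)

# A vector in both subspaces satisfies V_k1 @ alpha = N_B @ beta,
# i.e. [V_k1, -N_B] z = 0 with z = (alpha, beta).
M = np.hstack([V_k1, -N_B])         # n x (n + 1), so a nonzero z exists
z = np.linalg.svd(M)[2][-1]         # last right singular vector: M @ z ~ 0
alpha = z[: k + 1]

w = V_k1 @ alpha
w /= np.linalg.norm(w)              # normalise so that ||w||_2 = 1

print(np.linalg.norm(B @ w))                        # ~ 0, since w is in N(B)
print(np.linalg.norm(A @ w), sigma[k])              # ||A w|| vs sigma_{k+1}
print(np.linalg.norm(A - B, 2))                     # upper end of the chain
```

The stacked matrix has $n$ rows and $n+1$ columns, which is exactly the dimension count $(n-k)+(k+1) = n+1 > n$ from the proof.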

Frobenius Norm Case

||A||_F = \sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{trace(A^{\top}A)}

Nice property about trace:

trace(AB)=trace(BA)

Also, for any orthogonal matrix $U$ of compatible size:

||AU||_F = ||UA||_F = ||A||_F

Frobenius Norm Invariant to Orthonormal Transform Proof

So:

\begin{split} ||A||_F &=||U\Sigma V^{\top}||_F = ||\Sigma||_F \\ &=\sqrt{\sum_{i=1}^n \sigma_i^2} \end{split}
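These identities are easy to check numerically. The following sketch compares the entrywise, trace, and singular-value forms of the Frobenius norm, and checks invariance under orthogonal matrices obtained from a QR factorisation. The sizes, the seed, and the names `Q_left` and `Q_right` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)      # fixed seed, arbitrary choice
m, n = 6, 5                         # illustrative sizes (assumption)
A = rng.standard_normal((m, n))

# Orthogonal matrices from the QR factorisation of random square matrices.
Q_left, _ = np.linalg.qr(rng.standard_normal((m, m)))    # m x m
Q_right, _ = np.linalg.qr(rng.standard_normal((n, n)))   # n x n

sigma = np.linalg.svd(A, compute_uv=False)

fro_entries = np.sqrt(np.sum(A ** 2))          # sqrt of sum of A_ij^2
fro_trace = np.sqrt(np.trace(A.T @ A))         # sqrt(trace(A^T A))
fro_sigma = np.sqrt(np.sum(sigma ** 2))        # sqrt of sum of sigma_i^2
fro_left = np.linalg.norm(Q_left @ A, "fro")   # ||Q A||_F
fro_right = np.linalg.norm(A @ Q_right, "fro") # ||A Q||_F

print(fro_entries, fro_trace, fro_sigma, fro_left, fro_right)
```

All five printed values should agree up to floating-point rounding.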

We still want to show:

A_k = \argmin_{B \in \mathbb{R}^{m \times n}, rk(B) = k} ||A-B||_F

In other words, we want

||A-B||_F \ge ||A-A_k||_F

where

A_k = \sum_{i=1}^k \sigma_i \vec{u}_i \vec{v}_i^{\top}

Let’s start!

\begin{split} ||A-A_k||_F &= ||\sum_{i=k+1}^n \sigma_i \vec{u}_i \vec{v}_i^{\top}||_F \\ &= \sqrt{\sum_{i=k+1}^n \sigma_i^2} \end{split}

So with this, can we show that:

\sum_{i=1}^n \sigma_i^2(A-B) \ge \sum_{i = k+1}^n \sigma_i^2(A)

Note that in our proof of the L2-norm case, we have

||A||_2 = \sigma_{\max}(A)

Therefore,

\begin{split} \sigma_{k+i}(A) &= \text{$(k+i)$-th largest $\sigma$ of A} \\ &= \text{largest $\sigma$ after top $(k+i-1)$ $\sigma$ are removed} \\ &= ||A-A_{k+i-1}||_2 \end{split}

Maybe we can do some similar transformation for $\sigma_i^2(A-B)$ ?

Denote $A-B=C$

\sigma_i(A-B)=\sigma_i(C)=||C-C_{i-1}||_2

We already have the fact that:

B \rightarrow \text{rank } k \\ \sigma_{k+1}(B) = 0 \\ ||B-B_k||_2 = 0

Hmm…So we can just add a zero term to the $\sigma_i(A-B)$

\begin{split} \sigma_i(A-B) &=||C-C_{i-1}||_2 + ||B-B_k||_2 \\ &\underbrace{\ge}_{\mathclap{\text{triangle inequality}}}||C+B-C_{i-1}-B_k||_2 \\ &= ||A-C_{i-1}-B_k||_2 \quad (\text{since } C+B=A) \end{split}

We know:

rk(B_k)=k, rk(C_{i-1}) \le i-1

We also know that for any two matrices $A, B$ of the same size:

rk(A+B) \le rk(A)+rk(B)

So we know for $D = C_{i-1}+B_k$

rk(D) \le k + i-1

So.

\begin{split} \sigma_i(A-B) &\ge ||A-C_{i-1}-B_k||_2 \\ &= ||A-D||_2 \\ \end{split}

We know from the previous section (the same argument works when $rk(D) \le i+k-1$, since $N(D)$ can then only be larger) that

\argmin_{D \in \mathbb{R}^{m \times n}, rk(D) \le i+k-1} ||A-D||_2 = A_{k+i-1} \\ \min_{D \in \mathbb{R}^{m \times n}, rk(D) \le i+k-1} ||A-D||_2 = \sigma_{k+i}(A)

Therefore

\begin{split} \sigma_i(A-B) &\ge ||A-D||_2 \\ &\ge \sigma_{k+i}(A) \end{split}

Yeah!!!

We showed that

\forall B\in \mathbb{R}^{m \times n}, rk(B) = k, \\ \sigma_i(A-B) \ge \sigma_{k+i}(A)
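This singular-value inequality can be spot-checked numerically. The sketch below is only an illustration under assumed sizes and a fixed seed; the helper `dominates` is a hypothetical name, and it tests the inequality for $i = 1, \dots, n-k$, the range in which $\sigma_{k+i}(A)$ is defined.

```python
import numpy as np

def dominates(A, B, k, tol=1e-10):
    """Check sigma_i(A - B) >= sigma_{k+i}(A) for i = 1, ..., min(m, n) - k."""
    sigma_A = np.linalg.svd(A, compute_uv=False)
    sigma_C = np.linalg.svd(A - B, compute_uv=False)
    r = len(sigma_A) - k
    # 0-indexed: sigma_C[i - 1] is sigma_i(A - B), sigma_A[k + i - 1] is sigma_{k+i}(A)
    return bool(np.all(sigma_C[:r] >= sigma_A[k:] - tol))

rng = np.random.default_rng(3)      # fixed seed, arbitrary choice
m, n, k = 6, 5, 2                   # illustrative sizes (assumption)
A = rng.standard_normal((m, n))

# Random rank-k matrices B = X Y.
random_ok = all(
    dominates(A, rng.standard_normal((m, k)) @ rng.standard_normal((k, n)), k)
    for _ in range(200)
)

# The truncated SVD A_k itself, where the inequality holds with equality.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
tight_ok = dominates(A, A_k, k)

print(random_ok, tight_ok)
```

For $B = A_k$ the singular values of $A - B$ are exactly $\sigma_{k+1}(A), \dots, \sigma_n(A)$, which is why the bound is tight there.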

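therefore (sum the squared inequalities over $i = 1, \dots, n-k$; the remaining terms on the left are nonnegative)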

\sum_{i=1}^n \sigma_i^2(A-B) \ge \sum_{i = k+1}^n \sigma_i^2(A)

And therefore

\sqrt{\sum_{i=1}^n \sigma_i^2(A-B)} \ge \sqrt{\sum_{i = k+1}^n \sigma_i^2(A)}

And therefore

||A-B||_F \ge ||A-A_k||_F

A Wrong Proof

\begin{split} \min_{rk(B) = k} ||A-B||_F &=\min_{rk(B)=k} ||U\Sigma V^{\top} -B||_F \\ &=\min_{rk(B)=k} ||\Sigma - U^{\top}BV||_F \\ &=\underbrace{\min_{rk(Z)=k, Z \in diag} ||\Sigma - Z||_F}_{\mathclap{\text{Since elements that do not lie on the diagonal only increase the F norm}}} \\ &\text{So just pick $Z = \Sigma_k$ (the top $k$ diagonal entries of $\Sigma$) and it completes the proof} \end{split}

What’s wrong???

The last few steps are a proof by hand-waving.

When we transform $U^{\top}BV \rightarrow Z$ by eliminating the off-diagonal terms, we might increase the rank, so the resulting diagonal $Z$ need not satisfy $rk(Z) = k$.
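A small example makes the rank issue concrete. In the sketch below, the $2 \times 2$ all-ones matrix stands in for $U^{\top}BV$ and $\Sigma = diag(3, 1)$; both are made-up values chosen only for illustration. Zeroing the off-diagonal entries does reduce the Frobenius distance to $\Sigma$, but it also raises the rank from 1 to 2, so the diagonal matrix is no longer a feasible candidate.

```python
import numpy as np

Sigma = np.diag([3.0, 1.0])      # illustrative singular values (assumption)
M = np.ones((2, 2))              # stands in for U^T B V; has rank 1
Z = np.diag(np.diag(M))          # keep only the diagonal of M: the identity

rank_M = np.linalg.matrix_rank(M)
rank_Z = np.linalg.matrix_rank(Z)
err_M = np.linalg.norm(Sigma - M, "fro")
err_Z = np.linalg.norm(Sigma - Z, "fro")

print(rank_M, rank_Z)            # rank goes up after dropping off-diagonals
print(err_M, err_Z)              # while the Frobenius error goes down
```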