The theorem Eckart-Young-Misky Theorem A ∈ R m × n A \in \mathbb{R}^{m\times n} A ∈ R m × n A = U Σ V ⊤ = ∑ i = 1 n σ i u ⃗ i v ⃗ i ⊤ A = U\Sigma V^{\top}= \sum_{i=1}^n \sigma_i \vec{u}_i \vec{v}_i^{\top} A = U Σ V ⊤ = i = 1 ∑ n σ i u i v i ⊤ And we define
A k = ∑ i = 1 k σ i u ⃗ i v ⃗ i A_k = \sum_{i=1}^k \sigma_i \vec{u}_i \vec{v}_i A k = i = 1 ∑ k σ i u i v i We will see that
(1)
arg min B ∈ R m × n , r k ( B ) = k ∣ ∣ A − B ∣ ∣ F = A k \argmin_{B \in \mathbb{R}^{m\times n}, rk(B) = k} ||A-B||_F= A_k B ∈ R m × n , r k ( B ) = k arg min ∣∣ A − B ∣ ∣ F = A k (2)
arg min B ∈ R m × n , r k ( B ) = k ∣ ∣ A − B ∣ ∣ 2 = A k \argmin_{B \in \mathbb{R}^{m \times n}, rk(B) = k} ||A-B||_2 = A_k B ∈ R m × n , r k ( B ) = k arg min ∣∣ A − B ∣ ∣ 2 = A k L2 Norm Case We will first prove the L2-norm case
∣ ∣ A − A k ∣ ∣ 2 = ∑ i = k + 1 n ∣ ∣ σ i u ⃗ i v ⃗ i ⊤ ∣ ∣ 2 = max ∣ ∣ w ⃗ ∣ ∣ 2 = 1 ∣ ∣ w ⃗ ⊤ ( A − A k ) w ⃗ ∣ ∣ 2 = σ k + 1 ||A-A_k||_2 = \sum_{i=k+1}^n ||\sigma_i \vec{u}_i \vec{v}_i^{\top}||_2 = \max_{||\vec{w}||_2=1} ||\vec{w}^{\top}(A-A_k)\vec{w}||_2 = \sigma_{k+1} ∣∣ A − A k ∣ ∣ 2 = i = k + 1 ∑ n ∣∣ σ i u i v i ⊤ ∣ ∣ 2 = ∣∣ w ∣ ∣ 2 = 1 max ∣∣ w ⊤ ( A − A k ) w ∣ ∣ 2 = σ k + 1 Now show that for all r k ( B ) = k , ∣ ∣ A − B ∣ ∣ 2 ≥ σ k + 1 rk(B) = k, ||A-B||_2 \ge \sigma_{k+1} r k ( B ) = k , ∣∣ A − B ∣ ∣ 2 ≥ σ k + 1
∣ ∣ A − B ∣ ∣ 2 ≥ ∣ ∣ ( A − B ) w ⃗ ∣ ∣ 2 ( for any w ⃗ ) Choose w ⃗ ∈ N ( B ) ≥ ∣ ∣ A w ⃗ ∣ ∣ 2 \begin{split}
||A-B||_2 &\ge ||(A-B)\vec{w}||_2 \quad (\text{for any } \vec{w} ) \\
&\text{Choose $\vec{w} \in N(B)$} \\
&\ge ||A\vec{w}||_2
\end{split} ∣∣ A − B ∣ ∣ 2 ≥ ∣∣ ( A − B ) w ∣ ∣ 2 ( for any w ) Choose w ∈ N ( B ) ≥ ∣∣ A w ∣ ∣ 2 Consider V k + 1 V_{k+1} V k + 1
V k + 1 = [ v ⃗ 1 v ⃗ 2 ⋯ v ⃗ k + 1 ] V_{k+1} = \begin{bmatrix}
\vec{v}_1 &\vec{v}_2 &\cdots &\vec{v}_{k+1}
\end{bmatrix} V k + 1 = [ v 1 v 2 ⋯ v k + 1 ] Property:
r k ( V k + 1 ) = k + 1 rk(V_{k+1}) = k+1 r k ( V k + 1 ) = k + 1
D i m ( N ( B ) ) = n − k Dim(N(B)) = n-k D im ( N ( B )) = n − k
So…. ( n − k ) + ( k + 1 ) = n + 1 > n (n-k)+(k+1) = n+1 > n ( n − k ) + ( k + 1 ) = n + 1 > n
There exist one dimension overlap in R ( V k + 1 ) R(V_{k+1}) R ( V k + 1 ) and N ( B ) N(B) N ( B )
So instead of choosing w ⃗ ∈ N ( B ) \vec{w} \in N(B) w ∈ N ( B ) , chose w ⃗ ∈ N ( B ) ∩ R ( V k + 1 ) \vec{w} \in N(B) \cap R(V_{k+1}) w ∈ N ( B ) ∩ R ( V k + 1 ) such that
w ⃗ = V α ⃗ = [ V k + 1 V r e s t ] [ α 1 ⋮ α k + 1 0 ⋮ 0 ] = α 1 v ⃗ 1 + α 2 v ⃗ 2 + ⋯ + α k + 1 v ⃗ k + 1 \begin{split}
\vec{w} &= V\vec{\alpha} = \begin{bmatrix}
V_{k+1} &V_{rest}
\end{bmatrix}
\begin{bmatrix}
\alpha_1 \\
\vdots \\
\alpha_{k+1} \\
0 \\
\vdots \\
0
\end{bmatrix} \\
&=\alpha_1 \vec{v}_1 + \alpha_2 \vec{v}_2 + \cdots + \alpha_{k+1} \vec{v}_{k+1}
\end{split} w = V α = [ V k + 1 V res t ] ⎣ ⎡ α 1 ⋮ α k + 1 0 ⋮ 0 ⎦ ⎤ = α 1 v 1 + α 2 v 2 + ⋯ + α k + 1 v k + 1 And choose α i \alpha_i α i such that ∣ ∣ w ⃗ ∣ ∣ 2 = 1 ||\vec{w}||_2 = 1 ∣∣ w ∣ ∣ 2 = 1 ⇒ ∑ i = 1 k + 1 α i 2 = 1 \sum_{i=1}^{k+1} \alpha_i^2 = 1 ∑ i = 1 k + 1 α i 2 = 1
Coming back to
∣ ∣ A − B ∣ ∣ 2 ≥ ∣ ∣ ( A − B ) w ⃗ ∣ ∣ 2 ( for any w ⃗ ) = ∣ ∣ A w ⃗ ∣ ∣ 2 = ∣ ∣ U Σ V ⊤ V α ⃗ ∣ ∣ 2 = ∣ ∣ U Σ α ⃗ ∣ ∣ 2 = ∣ ∣ Σ α ⃗ ∣ ∣ 2 = α 1 2 σ 1 2 + ⋯ + α k + 1 2 σ k + 1 2 ≥ σ k + 1 2 \begin{split}
||A-B||_2 &\ge ||(A-B)\vec{w}||_2 \quad (\text{for any } \vec{w} ) \\
&\quad = ||A\vec{w}||_2 \\
&\quad = ||U\Sigma V^{\top} V\vec{\alpha}||_2 \\
&\quad = ||U\Sigma\vec{\alpha}||_2 \\
&\quad = ||\Sigma \vec{\alpha}||_2 \\
&\quad = \alpha_1^2\sigma_1^2 + \cdots + \alpha_{k+1} ^ 2\sigma_{k+1} ^ 2 \\
&\quad \ge \sigma_{k+1}^2
\end{split} ∣∣ A − B ∣ ∣ 2 ≥ ∣∣ ( A − B ) w ∣ ∣ 2 ( for any w ) = ∣∣ A w ∣ ∣ 2 = ∣∣ U Σ V ⊤ V α ∣ ∣ 2 = ∣∣ U Σ α ∣ ∣ 2 = ∣∣Σ α ∣ ∣ 2 = α 1 2 σ 1 2 + ⋯ + α k + 1 2 σ k + 1 2 ≥ σ k + 1 2
Frob. Norm Case ∣ ∣ A ∣ ∣ F = ∑ i , j A i j 2 = t r a c e ( A ⊤ A ) ||A||_F = \sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{trace(A^{\top}A)} ∣∣ A ∣ ∣ F = i , j ∑ A ij 2 = t r a ce ( A ⊤ A ) Nice property about trace:
t r a c e ( A B ) = t r a c e ( B A ) trace(AB)=trace(BA) t r a ce ( A B ) = t r a ce ( B A ) Also:
∣ ∣ A U ∣ ∣ F = ∣ ∣ U A ∣ ∣ F = ∣ ∣ A ∣ ∣ F ||AU||_F = ||UA||_F = ||A||_F ∣∣ A U ∣ ∣ F = ∣∣ U A ∣ ∣ F = ∣∣ A ∣ ∣ F Frobenius Norm Invariant to Orthonormal Transform Proof
So:
∣ ∣ A ∣ ∣ F = ∣ ∣ U Σ V ⊤ ∣ ∣ F = ∣ ∣ Σ ∣ ∣ F = ∑ i = 1 n σ i 2 \begin{split}
||A||_F
&=||U\Sigma V^{\top}||_F = ||\Sigma||_F \\
&=\sqrt{\sum_{i=1}^n \sigma_i^2}
\end{split} ∣∣ A ∣ ∣ F = ∣∣ U Σ V ⊤ ∣ ∣ F = ∣∣Σ∣ ∣ F = i = 1 ∑ n σ i 2
We still want to show:
A k = arg min B ∈ R m × n , r k ( B ) = k ∣ ∣ A − B ∣ ∣ F A_k = \argmin_{B \in \mathbb{R}^{m \times n}, rk(B) = k} ||A-B||_F A k = B ∈ R m × n , r k ( B ) = k arg min ∣∣ A − B ∣ ∣ F In other words, we want
∣ ∣ A − B ∣ ∣ F ≥ ∣ ∣ A − A k ∣ ∣ F ||A-B||_F \ge ||A-A_k||_F ∣∣ A − B ∣ ∣ F ≥ ∣∣ A − A k ∣ ∣ F where
A k = ∑ i = 1 k σ i u ⃗ i v ⃗ i ⊤ A_k = \sum_{i=1}^k \sigma_i \vec{u}_i \vec{v}_i^{\top} A k = i = 1 ∑ k σ i u i v i ⊤ Let’s start! ∣ ∣ A − A k ∣ ∣ F = ∣ ∣ ∑ i = k + 1 n σ i u ⃗ i v ⃗ i ⊤ ∣ ∣ F = ∑ i = k + 1 n σ 2 \begin{split}
||A-A_k||_F &= ||\sum_{i=k+1}^n \sigma_i \vec{u}_i \vec{v}_i^{\top}||_F \\
&= \sqrt{\sum_{i=k+1}^n \sigma^2}
\end{split} ∣∣ A − A k ∣ ∣ F = ∣∣ i = k + 1 ∑ n σ i u i v i ⊤ ∣ ∣ F = i = k + 1 ∑ n σ 2 So with this, can we try show that:
∑ i = 1 n σ i 2 ( A − B ) ≥ ∑ i = k + 1 n σ i 2 ( A ) \sum_{i=1}^n \sigma_i^2(A-B) \ge \sum_{i = k+1}^n \sigma_i^2(A) i = 1 ∑ n σ i 2 ( A − B ) ≥ i = k + 1 ∑ n σ i 2 ( A ) Note in our last proof for the L2-norm case, we have
∣ ∣ A ∣ ∣ 2 = σ max ( A ) ||A||_2 = \sigma_{\max}(A) ∣∣ A ∣ ∣ 2 = σ m a x ( A ) Therefore,
σ k + i ( A ) = ( k + i ) -th largest σ of A = largest σ after top ( k + i − 1 ) σ are removed = ∣ ∣ A − A k + i − 1 ∣ ∣ 2 \begin{split}
\sigma_{k+i}(A)
&= \text{$(k+i)$-th largest $\sigma$ of A} \\
&= \text{largest $\sigma$ after top $(k+i-1)$ $\sigma$ are removed} \\
&= ||A-A_{k+i-1}||_2
\end{split} σ k + i ( A ) = ( k + i ) -th largest σ of A = largest σ after top ( k + i − 1 ) σ are removed = ∣∣ A − A k + i − 1 ∣ ∣ 2 Maybe we can do some similar transformation for σ i 2 ( A − B ) \sigma_i^2(A-B) σ i 2 ( A − B ) ?
Denote A − B = C A-B=C A − B = C
σ i ( A − B ) = σ i ( C ) = ∣ ∣ C − C i − 1 ∣ ∣ 2 \sigma_i(A-B)=\sigma_i(C)=||C-C_{i-1}||_2 σ i ( A − B ) = σ i ( C ) = ∣∣ C − C i − 1 ∣ ∣ 2 We already have the fact that:
B → rank k σ k + 1 ( B ) = 0 ∣ ∣ B − B k ∣ ∣ 2 = 0 B \rightarrow \text{rank } k \\
\sigma_{k+1}(B) = 0 \\
||B-B_k||_2 = 0 B → rank k σ k + 1 ( B ) = 0 ∣∣ B − B k ∣ ∣ 2 = 0 Hmm…So we can just add a zero term to the σ i ( A − B ) \sigma_i(A-B) σ i ( A − B )
σ i ( A − B ) = ∣ ∣ C − C i − 1 ∣ ∣ 2 + ∣ ∣ B − B k ∣ ∣ 2 ≥ undefined triangular inequality ∣ ∣ C + B − C i − 1 − B k ∣ ∣ 2 ≥ ∣ ∣ A − C i − 1 − B k ∣ ∣ 2 \begin{split}
\sigma_i(A-B)
&=||C-C_{i-1}||_2 + ||B-B_k||_2 \\
&\underbrace{\ge}_{\mathclap{\text{triangular inequality}}}||C+B-C_{i-1}-B_k||_2 \\
&\ge ||A-C_{i-1}-B_k||_2
\end{split} σ i ( A − B ) = ∣∣ C − C i − 1 ∣ ∣ 2 + ∣∣ B − B k ∣ ∣ 2 triangular inequality ≥ ∣∣ C + B − C i − 1 − B k ∣ ∣ 2 ≥ ∣∣ A − C i − 1 − B k ∣ ∣ 2 We know:
r k ( B k ) = k , r k ( C i − 1 ) ≤ i − 1 rk(B_k)=k, rk(C_{i-1}) \le i-1 r k ( B k ) = k , r k ( C i − 1 ) ≤ i − 1 We also know this fact that for any two matrices A , B A, B A , B :
r k ( A + B ) ≤ r k ( A ) + r k ( B ) rk(A+B) \le rk(A)+rk(B) r k ( A + B ) ≤ r k ( A ) + r k ( B ) So we know for D = C i − 1 + B k D = C_{i-1}+B_k D = C i − 1 + B k
r k ( D ) ≤ k + i − 1 rk(D) \le k + i-1 r k ( D ) ≤ k + i − 1 So.
σ i ( A − B ) ≥ ∣ ∣ A − C i − 1 − B k ∣ ∣ 2 ≥ ∣ ∣ A − D ∣ ∣ 2 \begin{split}
\sigma_i(A-B)
&\ge ||A-C_{i-1}-B_k||_2 \\
&\ge ||A-D||_2 \\
\end{split} σ i ( A − B ) ≥ ∣∣ A − C i − 1 − B k ∣ ∣ 2 ≥ ∣∣ A − D ∣ ∣ 2 We know that (from previous section)
arg min D ∈ R m × n , r k ( D ) = i + k − 1 ∣ ∣ A − D ∣ ∣ 2 = A k + i − 1 min D ∈ R m × n , r k ( D ) = i + k − 1 ∣ ∣ A − D ∣ ∣ 2 = σ k + i ( A ) \argmin_{D \in \mathbb{R}^{m \times n}, rk(D)=i+k-1} ||A-D||_2 = A_{k+i-1} \\
\min_{D \in \mathbb{R}^{m \times n}, rk(D)=i+k-1} ||A-D||_2 = \sigma_{k+i}(A) D ∈ R m × n , r k ( D ) = i + k − 1 arg min ∣∣ A − D ∣ ∣ 2 = A k + i − 1 D ∈ R m × n , r k ( D ) = i + k − 1 min ∣∣ A − D ∣ ∣ 2 = σ k + i ( A ) Therefore
σ i ( A − B ) ≥ ∣ ∣ A − D ∣ ∣ 2 ≥ σ k + i ( A ) \begin{split}
\sigma_i(A-B)
&\ge ||A-D||_2 \\
&\ge \sigma_{k+i}(A)
\end{split} σ i ( A − B ) ≥ ∣∣ A − D ∣ ∣ 2 ≥ σ k + i ( A ) Yeah!!!
We showed that
∀ B ∈ R m × n , r k ( B ) = k , σ i ( A − B ) ≥ σ k + i ( A ) \forall B\in \mathbb{R}^{m \times n}, rk(B) = k, \\
\sigma_i(A-B) \ge \sigma_{k+i}(A) ∀ B ∈ R m × n , r k ( B ) = k , σ i ( A − B ) ≥ σ k + i ( A ) therefore
∑ i = 1 n σ i 2 ( A − B ) ≥ ∑ i = k + 1 n σ i 2 ( A ) \sum_{i=1}^n \sigma_i^2(A-B) \ge \sum_{i = k+1}^n \sigma_i^2(A) i = 1 ∑ n σ i 2 ( A − B ) ≥ i = k + 1 ∑ n σ i 2 ( A ) And therefore
∑ i = 1 n σ i 2 ( A − B ) ≥ ∑ i = k + 1 n σ i 2 ( A ) \sqrt{\sum_{i=1}^n \sigma_i^2(A-B)} \ge \sqrt{\sum_{i = k+1}^n \sigma_i^2(A)} i = 1 ∑ n σ i 2 ( A − B ) ≥ i = k + 1 ∑ n σ i 2 ( A ) And therefore
∣ ∣ A − B ∣ ∣ F ≥ ∣ ∣ A − A k ∣ ∣ F ||A-B||_F \ge ||A-A_k||_F ∣∣ A − B ∣ ∣ F ≥ ∣∣ A − A k ∣ ∣ F Some wrong type of proof min r k ( B ) = k ∣ ∣ A − B ∣ ∣ F = min r k ( B ) = k ∣ ∣ U Σ V ⊤ − B ∣ ∣ F = min r k ( B ) = k ∣ ∣ Σ − U ⊤ B V ∣ ∣ F = min r k ( Z ) = k , Z ∈ d i a g ∣ ∣ Σ − Z ∣ ∣ F undefined Since having elements that does not lay along the diagonal only increases F norm So just try to pick Z = Σ and it completes the proof \begin{split}
\min_{rk(B) = k} ||A-B||_F
&=\min_{rk(B)=k} ||U\Sigma V^{\top} -B||_F \\
&=\min_{rk(B)=k} ||\Sigma - U^{\top}BV||_F \\
&=\underbrace{\min_{rk(Z)=k, Z \in diag} ||\Sigma - Z||_F}_{\mathclap{\text{Since having elements that does not lay along the diagonal only increases F norm}}} \\
&\text{So just try to pick $Z = \Sigma$ and it completes the proof}
\end{split} r k ( B ) = k min ∣∣ A − B ∣ ∣ F = r k ( B ) = k min ∣∣ U Σ V ⊤ − B ∣ ∣ F = r k ( B ) = k min ∣∣Σ − U ⊤ B V ∣ ∣ F = Since having elements that does not lay along the diagonal only increases F norm r k ( Z ) = k , Z ∈ d ia g min ∣∣Σ − Z ∣ ∣ F So just try to pick Z = Σ and it completes the proof What’s wrong???
Last few steps ⇒ proof by talking
When we transform U ⊤ B V → Z U^{\top}BV \rightarrow Z U ⊤ B V → Z , if we eliminate the non-diagonal terms, we might be increasing ranks.