Claim
Assume that all losses are in $[0,1]$. The regret of the Multiplicative Weights Algorithm run for $T$ steps with a parameter $0 < \epsilon \le 1/2$ is:

$$R_T \le \epsilon T + \frac{\ln n}{\epsilon}$$

In particular, if $T > 4\ln n$ and we choose $\epsilon = \sqrt{\frac{\ln n}{T}}$, we have

$$R_T \le 2\sqrt{T \ln n}$$
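The bound above is driven by the multiplicative update $w_i^{(t+1)} = w_i^{(t)} (1-\epsilon)^{l_i^{(t)}}$ used in the proofs below. A minimal Python sketch of one round (the function and variable names are illustrative, not from the text):

```python
import numpy as np

def mw_round(weights, losses, eps):
    """One round of multiplicative weights, with losses in [0, 1].

    Returns the distribution x^(t) played this round, its loss L_t,
    and the updated weights w^(t+1).
    """
    x = weights / weights.sum()                      # x_i^(t) = w_i^(t) / W_t
    loss_t = float(x @ losses)                       # L_t = sum_i x_i^(t) * l_i^(t)
    new_weights = weights * (1.0 - eps) ** losses    # w_i^(t+1) = w_i^(t) * (1 - eps)^{l_i^(t)}
    return x, loss_t, new_weights
```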
Outline
Define
$$W_t = \sum_{i=1}^{n} w_i^{(t)}$$

to be the sum of the weights at time $t$. At the end of the execution of the algorithm:
- If $W_{T+1}$ is small, then the offline optimum is large
- If the loss suffered by the algorithm is large, then $W_{T+1}$ is small
Therefore
- If the algorithm suffers a large loss, then the offline optimum is also large, so regret is small
Call the offline optimum $L^*$:
$$L^* = \min_{x \in \Delta} \sum_{t=1}^{T} \sum_{i=1}^{n} x_i \cdot l_i^{(t)} = \min_{i=1,\dots,n} \sum_{t=1}^{T} l_i^{(t)}$$

and we use $L_t$ to denote the loss incurred by the algorithm at time $t$:

$$L_t = \sum_{i=1}^{n} x_i^{(t)} l_i^{(t)}$$
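Given the full $T \times n$ matrix of losses and the distributions $x^{(t)}$ played by the algorithm, both quantities can be computed directly. A small sketch (the array and function names are assumptions for illustration):

```python
import numpy as np

def offline_optimum(losses):
    """L* = min over fixed strategies i of sum_t l_i^(t), for a (T, n) loss matrix."""
    return losses.sum(axis=0).min()

def per_round_losses(plays, losses):
    """L_t = sum_i x_i^(t) * l_i^(t) for each round, given (T, n) arrays of plays and losses."""
    return (plays * losses).sum(axis=1)
```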
Lemmas
Lemma 2
If $W_{T+1}$ is small, then the offline optimum is large
$$W_{T+1} \ge (1-\epsilon)^{L^*}$$

Proof
Let $j$ be any strategy. Then

$$W_{T+1} = \sum_{i=1}^{n} w_i^{(T+1)} \ge w_j^{(T+1)} = \prod_{t=1}^{T} (1-\epsilon)^{l_j^{(t)}} = (1-\epsilon)^{\sum_{t=1}^{T} l_j^{(t)}}$$

Now let $i^*$ be an offline optimal strategy,

$$L^* = \sum_{t=1}^{T} l_{i^*}^{(t)}$$

and observe that for $i^*$,

$$W_{T+1} \ge (1-\epsilon)^{\sum_{t=1}^{T} l_{i^*}^{(t)}} = (1-\epsilon)^{L^*}$$
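As a numerical sanity check of Lemma 2, one can run the update on random losses and compare $W_{T+1}$ against $(1-\epsilon)^{L^*}$. A sketch, assuming all initial weights are 1 (so that $W_1 = n$, as used in Lemma 3); the parameter choices are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, eps = 10, 1000, 0.1
losses = rng.uniform(0.0, 1.0, size=(T, n))     # l_i^(t) in [0, 1]

weights = np.ones(n)                             # w_i^(1) = 1, so W_1 = n
for t in range(T):
    weights = weights * (1.0 - eps) ** losses[t]

L_star = losses.sum(axis=0).min()                # offline optimum L*
assert weights.sum() >= (1.0 - eps) ** L_star    # Lemma 2: W_{T+1} >= (1 - eps)^{L*}
```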
Lemma 3

If the loss suffered by the algorithm is large, then $W_{T+1}$ is small

$$W_{T+1} \le n \cdot \prod_{t=1}^{T} (1 - \epsilon L_t)$$

Proof
We know $W_1 = n$, so it suffices to prove that for every $t$,

$$W_{t+1} \le W_t \cdot (1 - \epsilon L_t)$$

Observe that
$$\begin{aligned}
W_{t+1} &= \sum_{i=1}^{n} w_i^{(t+1)} = \sum_{i=1}^{n} w_i^{(t)} \cdot (1-\epsilon)^{l_i^{(t)}} \\
&\le \sum_{i=1}^{n} w_i^{(t)} \cdot \left(1 - \epsilon \cdot l_i^{(t)}\right) && \because \forall z, \epsilon \in [0,1],\ (1-\epsilon)^{z} \le 1 - \epsilon z \\
&= W_t \sum_{i=1}^{n} x_i^{(t)} \cdot \left(1 - \epsilon \cdot l_i^{(t)}\right) && \because x_i^{(t)} = w_i^{(t)} / W_t \\
&= W_t \cdot \left(\sum_{i=1}^{n} x_i^{(t)} - \sum_{i=1}^{n} \epsilon \cdot x_i^{(t)} \cdot l_i^{(t)}\right) \\
&= W_t \cdot (1 - \epsilon L_t)
\end{aligned}$$
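The key step above is the pointwise inequality $(1-\epsilon)^z \le 1 - \epsilon z$ for $z, \epsilon \in [0,1]$, which follows from convexity of $(1-\epsilon)^z$ in $z$. A quick numerical spot-check on a grid (the grid sizes are arbitrary):

```python
import numpy as np

# Spot-check (1 - eps)^z <= 1 - eps*z for z, eps in [0, 1].
z = np.linspace(0.0, 1.0, 101)
eps = np.linspace(0.0, 1.0, 101)
Z, E = np.meshgrid(z, eps)
assert np.all((1.0 - E) ** Z <= 1.0 - E * Z + 1e-12)
```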
Proof
LEMMA 2: If $W_{T+1}$ is small, then the offline optimum is large

$$W_{T+1} \ge (1-\epsilon)^{L^*}$$

LEMMA 3: If the loss suffered by the algorithm is large, then $W_{T+1}$ is small

$$W_{T+1} \le n \cdot \prod_{t=1}^{T} (1 - \epsilon L_t)$$

Putting them together,
$$\begin{aligned}
(1-\epsilon)^{L^*} &\le n \cdot \prod_{t=1}^{T} (1 - \epsilon L_t) \\
L^* \ln(1-\epsilon) &\le \ln n + \sum_{t=1}^{T} \ln(1 - \epsilon L_t)
\end{aligned}$$

where the second line follows by taking logarithms. Note that because of the Taylor series $\ln(1-z) = \sum_{n=1}^{\infty} -\frac{z^n}{n}$, we have the inequality

$$\forall\, 0 \le z \le 1/2, \quad -z - z^2 \le \ln(1-z) \le -z$$
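A quick numerical spot-check of this inequality on $0 \le z \le 1/2$ (the grid is arbitrary):

```python
import numpy as np

# Spot-check -z - z^2 <= ln(1 - z) <= -z for 0 <= z <= 1/2.
z = np.linspace(0.0, 0.5, 501)
log_term = np.log1p(-z)                  # ln(1 - z)
assert np.all(-z - z**2 <= log_term + 1e-12)
assert np.all(log_term <= -z + 1e-12)
```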
Then apply this inequality (the lower bound on the left-hand side with $z = \epsilon$, the upper bound to each term on the right-hand side with $z = \epsilon L_t$):

$$L^* \cdot (-\epsilon - \epsilon^2) \le \ln n - \epsilon \cdot \sum_{t=1}^{T} L_t$$

Divide by $\epsilon$ and rearrange a bit,
$$\sum_{t=1}^{T} L_t - L^* \le \epsilon L^* + \frac{\ln n}{\epsilon} \le \epsilon T + \frac{\ln n}{\epsilon}$$

where the last step uses $L^* \le T$, since each loss is at most 1.
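As an end-to-end sanity check, one can simulate the algorithm on random losses and compare the realized regret with both bounds. A sketch using the update rule from the proofs (the parameter choices and random losses are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 20, 5000
eps = np.sqrt(np.log(n) / T)          # T > 4 ln n, so eps <= 1/2
losses = rng.uniform(0.0, 1.0, size=(T, n))

weights = np.ones(n)
alg_loss = 0.0
for t in range(T):
    x = weights / weights.sum()
    alg_loss += x @ losses[t]                     # accumulate L_t
    weights = weights * (1.0 - eps) ** losses[t]  # multiplicative update

L_star = losses.sum(axis=0).min()
regret = alg_loss - L_star
print(regret, eps * T + np.log(n) / eps, 2 * np.sqrt(T * np.log(n)))
# regret should not exceed eps*T + ln(n)/eps, which equals 2*sqrt(T ln n) for this eps
```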