Robust Principal Component Pursuit - Background Matrix Recovery

I recently spent some time working on a simple linear algebra problem: decompose a matrix $ M $ into a low-rank component $ L $ and a sparse component $ S $. The algorithm I used was trivial to implement (and to parallelize using map-reduce).

In this post, I will implement this very simple algorithm, explain the objective function and demonstrate its (amazing) effectiveness on a surveillance-camera dataset.

Given a matrix $ M $, we want to decompose it into $ L + S $, where $ L $ is a dense, low-rank matrix and $ S $ is a sparse matrix. The typical approach is to solve a convex optimization problem involving the nuclear norm.

Surprisingly, the Principal Component Pursuit (PCP) estimate recovers both of these matrices exactly (under mild conditions). Its objective is to minimize $ ||L||_{*} + \lambda||S||_{1} $ subject to $ L + S = M $. The nuclear norm keeps $ L $ low-rank, the $ \ell_1 $ norm enforces sparsity in $ S $, and the constraint ties the decomposition back to the data. This works because:

If the singular vectors of the low-rank component are reasonably spread out (i.e., not concentrated on a few coordinates), that component can be recovered with very high accuracy, and the recovery is not perturbed too much by corruptions as long as those corruptions are sparse and spread out as well. In other words, recovering the singular values and vectors of $ L $ while enforcing sparsity in the corruption matrix $ S $ is enough to pull the original matrix apart into its two pieces.

Also, $ \lambda $ itself needs no tuning (!!!!), so there is no cross-validation step. It is simply set to $ \frac{1}{\sqrt{n_{(1)}}} $, where $ n_{(1)} $ is the larger dimension of the matrix.
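
As a concrete (and entirely hypothetical) illustration, here is the value of $ \lambda $ for a frame matrix like the one built later in this post. Note that the port below follows the authors' Matlab code, which uses $ \frac{1}{\sqrt{m}} $ with $ m $ the number of rows.

import numpy as np

# Hypothetical dimensions: 200 frames, each 160 x 130 pixels
m, n = 200, 160 * 130

# The paper's recommendation: 1 / sqrt(larger dimension)
lam = 1.0 / np.sqrt(max(m, n))   # roughly 0.0069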

The augmented Lagrangian for this problem is:

$ l(L, S, Y) = ||L||_{*} + \lambda||S||_{1} + \langle Y, M - L - S \rangle + \frac{\mu}{2} ||M - L - S||_{F}^{2} $

where:

  • $ ||L||_{*} $ is the nuclear norm (the sum of the singular values); it encourages $ L $ to be low-rank.
  • $ ||S||_{1} $ is the sparsity-inducing $ \ell_1 $ norm.
  • The last two terms drive $ L + S $ towards $ M $.

Minimizing the Lagrangian over $ L $ and $ S $ in turn gives closed-form proximal updates (singular value thresholding for $ L $, elementwise soft-thresholding for $ S $), followed by a dual ascent step on $ Y $. The iteration steps are:

init: S0 = Y0 = 0, mu > 0
while not converged {
 Li+1 = SVD_thresh(M - Si + (1/mu)Yi, 1/mu);
 Si+1 = thresh(M - Li+1 + (1/mu)Yi, lambda/mu);
 Yi+1 = Yi + mu(M - Li+1 - Si+1);
}
return L, S

where thresh is elementwise soft-thresholding - it shrinks every entry towards zero by the given threshold, zeroing out entries whose magnitude falls below it - and SVD_thresh applies the same shrinkage to the singular values of its argument and reconstructs the matrix from the surviving ones.
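
For concreteness, here is a minimal numpy sketch of the two operators (these are not the authors' implementations, and the names shrink / svd_shrink are mine):

import numpy as np

def shrink(X, tau):
    # Elementwise soft-thresholding: move every entry towards zero by tau.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0)

def svd_shrink(X, tau):
    # Singular value thresholding: soft-threshold the singular values of X.
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    return np.dot(u, np.dot(np.diag(shrink(s, tau)), vt))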

This simple algorithm accurately recovers the low-rank and sparse matrix components!
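
Tying the pieces together, here is a minimal full-SVD sketch of the loop using the shrink / svd_shrink helpers above (a sketch only; the real script below uses PROPACK partial SVDs, and the choice of mu here is a common heuristic rather than anything mandated by the algorithm):

def pcp_sketch(M, max_iters=500, tol=1e-7):
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))
    S = np.zeros(M.shape)
    Y = np.zeros(M.shape)
    mu = (m * n) / (4.0 * np.abs(M).sum())   # heuristic step size
    d_norm = np.linalg.norm(M, 'fro')
    for _ in range(max_iters):
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)   # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)       # sparse update
        Z = M - L - S
        Y = Y + mu * Z                             # dual ascent
        if np.linalg.norm(Z, 'fro') / d_norm < tol:
            break
    return L, S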

Background Detection

A surveillance video camera produces a sequence of frames. Now, consider a matrix in which each row (or, equivalently, each column) holds all the pixels of one frame (so each 160 x 130 frame becomes a 20800-dimensional vector). Since the background is common across frames, the low-rank component is essentially a rank-1 matrix (the only basis vector is the background itself). The sparse component consists of the moving parts (people, escalators, walkways and so on).
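
In numpy terms, the construction looks roughly like this (a sketch with made-up frames; the actual conversion script appears further down):

import numpy as np

h, w = 130, 160                                      # hypothetical frame size
frames = [np.random.rand(h, w) for _ in range(200)]  # stand-ins for real frames

M = np.vstack([f.ravel() for f in frames])           # one row per frame: 200 x 20800

# After PCP, row i of the low-rank component reshapes back into an image;
# for a static camera it is essentially the background:
# background = L[i].reshape(h, w)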

Pasted below is the Python version of the algorithm above (this is a port of the Matlab version shared by the authors here and uses the excellent PROPACK library to compute partial SVDs):

#!/usr/bin/env python 
import sys, os
import argparse

import numpy as np
from pypropack import svdp
from scipy.io import loadmat

MAX_ITERS = 1000
TOL = 1.0e-7

def process_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--mat-file',
        metavar = 'mat_file',
        help = 'Location to .mat file'
    )
    parser.add_argument(
        '--test',
        dest = 'test',
        default = False,
        action = 'store_true'
    )
    parser.add_argument(
        '--input-file',
        dest = 'input_file',
        help = 'Location of .npy matrix file'
    )

    return parser.parse_args()

def choose_svd(n, sv):
    return True

def converged(Z, d_norm):
    err = np.linalg.norm(Z, 'fro') / d_norm
    print 'ERR', err
    return err < TOL

def pcp(X):
    m, n = X.shape
    # Set params (lambda follows the authors' Matlab code: 1/sqrt(#rows))
    lamda = 1. / np.sqrt(m)
    # Initialize the dual variable Y as in the Matlab code
    Y = X
    u, s, v = svdp(Y, k=1, which='L')
    norm_two = s[0]
    norm_inf = np.abs(Y).max() / lamda
    dual_norm = max(norm_two, norm_inf)
    Y = Y / dual_norm

    A_hat = np.zeros((m, n))   # low-rank component
    E_hat = np.zeros((m, n))   # sparse component
    mu = 1.25 / norm_two
    mu_bar = mu * 1e7
    rho = 1.5
    d_norm = np.linalg.norm(X, 'fro')

    num_iters = 0
    total_svd = 0
    sv = 10                    # number of singular values to compute
    while True:
        num_iters += 1

        # Elementwise soft-thresholding: update the sparse component
        temp_T = X - A_hat + (1/mu)*Y
        E_hat = np.maximum(temp_T - lamda/mu, 0)
        E_hat = E_hat + np.minimum(temp_T + lamda/mu, 0)

        # Partial SVD + singular value thresholding: update the low-rank component
        u, s, v = svdp(X - E_hat + (1/mu)*Y, sv, which='L')

        svp = int((s > 1/mu).sum())   # singular values above the threshold
        if svp < sv:
            sv = min(svp + 1, n)
        else:
            sv = min(svp + int(round(0.05*n)), n)

        A_hat = np.dot(
            np.dot(
                u[:, 0:svp],
                np.diag(s[0:svp] - 1/mu)
            ),
            v[0:svp, :]
        )

        total_svd = total_svd + 1

        Z = X - A_hat - E_hat

        Y = Y + mu*Z
        mu = min(mu*rho, mu_bar)

        if converged(Z, d_norm) or num_iters >= MAX_ITERS:
            return A_hat, E_hat

if __name__ == '__main__':
    args = process_args()
    # Load Data
    if args.test:
        data = (10*np.ones((10, 10))) + (-5 * np.eye(10))
        print data
        A_hat, E_hat = pcp(data)
    elif args.input_file:
        # .npy matrix produced by the bitmap-to-matrix script below
        A_hat, E_hat = pcp(np.load(args.input_file))
    else:
        data = loadmat(args.mat_file)
        A_hat, E_hat = pcp(data['X'])
        print A_hat, E_hat
    np.save('low_rank', A_hat)
    np.save('sparse', E_hat)

Next, let us deploy this on a real dataset. I obtained a video from a surveillance camera (a few datasets are available at this webpage; I will use the Escalator dataset from that page).

The following script builds a numpy matrix from the bitmap image sequence:

'''
Convert a bitmap sequence to matrix.
'''
import os
import numpy as np
import argparse
from PIL import Image

def bitmap_to_mat(bitmap_seq):
	matrix = []

	for bitmap_file in bitmap_seq:
		im = Image.open(bitmap_file).convert('L') # convert to grayscale
		pixels = list(im.getdata())
		matrix.append(pixels)

	return np.array(matrix)

if __name__ == '__main__':
	parser = argparse.ArgumentParser()

	parser.add_argument('--bmp-dir', dest = 'bmp_dir')
	parser.add_argument('--bmp-out', dest = 'bmp_out')

	parsed = parser.parse_args()

	bmp_seq = map(
		lambda s: os.path.join(parsed.bmp_dir, s),
		sorted(os.listdir(parsed.bmp_dir))  # sort so the frames stay in order
	)
	res = bitmap_to_mat(bmp_seq)
	np.save(parsed.bmp_out, res)
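
A hypothetical invocation, assuming the script above is saved as bitmap_to_mat.py and the extracted Escalator frames live in a directory called escalator/ (np.save appends the extension, so this produces bmp_matrix.npy):

python bitmap_to_mat.py --bmp-dir escalator/ --bmp-out bmp_matrix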

Next, let us run the decomposition and see what the convergence looks like:

➜  robust_pcp git:(master) python robust_pcp.py --input-file bmp_matrix.npy
ERR 0.799466063211
 WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence. Try increasing KMAX.
 neig =            1
ERR 0.533151636032
 WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence. Try increasing KMAX.
 neig =            1
ERR 0.580203519826
ERR 0.399795997647
 WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence. Try increasing KMAX.
 neig =            1
ERR 0.526129524054
 WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence. Try increasing KMAX.
 neig =            1
ERR 0.576362875779
ERR 0.438452861276
ERR 0.137666099277
 WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence. Try increasing KMAX.
 neig =            1
ERR 0.526604907584
 WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence. Try increasing KMAX.
 neig =            1
ERR 0.57572793864
ERR 0.45163686726
ERR 0.111636379698
ERR 0.0718803322749
ERR 0.048067444228
 WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence. Try increasing KMAX.
 neig =            1
ERR 0.536409101582
ERR 0.373332905257
ERR 0.0414766101901
ERR 0.0264232708867
 WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence. Try increasing KMAX.
 neig =            1
ERR 0.509857058194
ERR 0.341770886808
ERR 0.00828382155722
ERR 0.00369347648645
ERR 0.00224842770609
ERR 0.00133359512154
ERR 0.00076422430761
ERR 0.00043106684577
ERR 0.000242042207018
ERR 0.000134750434297
ERR 7.47073010806e-05
ERR 4.12146122072e-05
ERR 2.26314422366e-05
ERR 1.23645958329e-05
ERR 6.76884094823e-06
ERR 3.71124761483e-06
ERR 2.01918022651e-06
ERR 1.10232817344e-06
ERR 6.04773368406e-07
ERR 3.27599538767e-07
ERR 1.79967651854e-07
ERR 9.8752813134e-08

So in about 40 iterations, the relative error drops below the 1e-7 tolerance and the low-rank and sparse components are recovered.
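
A quick sanity check one can run on the saved outputs (a sketch; the file names are the ones used by the np.save calls in the script above):

import numpy as np

L = np.load('low_rank.npy')
S = np.load('sparse.npy')

# The background component should have very low rank (close to 1 for a
# static camera), and only a fraction of the entries of S should be non-zero.
print np.linalg.matrix_rank(L)
print np.count_nonzero(S) / float(S.size)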

Now, let us take a look at what the decomposition looks like. The following piece of code takes the matrices we built (the original bitmap matrix and the recovered components) and animates them side by side (i.e. iterates through the frame sequence):

'''
Animate supplied npy files
'''
import argparse
import numpy as np
import time
import Tkinter
from PIL import Image, ImageTk

def animate(matrices, w, h):
	mats = [np.load(x) for x in matrices]

	tk_win = Tkinter.Toplevel()
	tk_win.title('Escalator')
	canvas = Tkinter.Canvas(tk_win, width=7*w, height=7*h)
	canvas.pack()
	tk_ims = [None for _ in mats]
	for i, row in enumerate(mats[0]):
		ims = [Image.new('L', (w, h)) for _ in mats]
		for j, im in enumerate(ims):
			im.putdata(map(float, list(mats[j][i])))
			tk_ims[j] = ImageTk.PhotoImage(im)
			canvas.create_image((j * w) + 200, h, image = tk_ims[j])
			canvas.update()


if __name__ == '__main__':
	parser = argparse.ArgumentParser()

	parser.add_argument(
		'width',
		type = int,
		help = 'frame width'
	)
	parser.add_argument(
		'height',
		type = int,
		help = 'frame height'
	)
	parser.add_argument(
		'--npy-files',
		dest = 'npy_files',
		nargs = '+',
		help = 'numpy matrices'
	)

	parsed = parser.parse_args()

	animate(parsed.npy_files, parsed.width, parsed.height)
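
A hypothetical invocation, assuming the animation script is saved as animate.py and the frames are 160 pixels wide and 130 pixels tall:

python animate.py 160 130 --npy-files bmp_matrix.npy low_rank.npy sparse.npy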

And here's a frame from the resulting animation. The first panel is the original image sequence (in grayscale), the second is the low-rank component (the background), and the sparse component consists of the people:

The algorithm is thus super-powerful and easy to implement.

The full source code is available in this github repository.

References:

  • Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust Principal Component Analysis? Journal of the ACM, 58(3), 2011.