
Low-rank Matrix Completion Demo¶

2017-11-07 Greg Ongie, University of Michigan

Updated 2018-08-18: added comparison with POGM

2018-08-18 Julia 0.7.0 by Jeff Fessler

TOP SECRET

On 2017-11-06 10:34:12 GMT Agent 556 obtained a photo of the illegal arms dealer, code name **Professor X-Ray**. However, Agent 556 spilled soup on their lapel camera, shorting out several CCD sensors. The image matrix has several missing entries; we were only able to recover 25% of the data! Agent 551: Your mission, should you choose to accept it, is to recover the missing entries and uncover the true identity of **Professor X-Ray**.

In [1]:
using LinearAlgebra
using DelimitedFiles: readdlm
using LaTeXStrings
using Plots; pyplot()
function jim(Y; title="") # jiffy image display 
    heatmap(Y', color=:grays, aspectratio=1,
        xtick = [1, size(Y,1)],
        ytick = [1, size(Y,2)],
        title = title,
        yflip = true,
    )
end
Out[1]:
jim (generic function with 1 method)
In [2]:
# Read in data with missing pixels set to zero
Y = readdlm("professorxray.txt")'
@show size(Y)
jim(Y, title="Y: Corrupted image matrix of Professor X-Ray\n (missing pixels set to 0)")
size(Y) = (205, 300)
Out[2]:
In [3]:
# Create binary mask Omega (true=observed, false=unobserved)
Omega = Y .> 0

# Show mask; compute the percentage of observed entries
percent_nonzero = 100 * sum(Omega) / prod(size(Omega))
@show percent_nonzero
jim(Omega, title = L"$\Omega$: Locations of observed entries")
percent_nonzero = 25.0
Out[3]:

Can we use simple low-rank approximation?

In [4]:
r = 20
U,s,V = svd(Y)
Xr = U[:,1:r] * Diagonal(s[1:r]) * V[:,1:r]'
jim(Xr, title="low-rank approximation")
Out[4]:

No. Instead, we will try to uncover the identity of **Professor X-Ray** using low-rank matrix completion.

The optimization problem we will solve is:

$$\min_X \; \frac{1}{2} \| P_\Omega(X) - P_\Omega(Y) \|_2^2 + \beta \| X \|_* \qquad \text{(NN-min)}$$

where $Y$ is the zero-filled input data matrix, $\|X\|_*$ denotes the nuclear norm (the sum of the singular values of $X$), and $P_\Omega$ is the operator that extracts the vector of entries belonging to the index set $\Omega$.
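In Julia, $P_\Omega(X)$ is simply logical indexing with the mask `Omega` defined above, i.e. `X[Omega]`, so the data-fit term is `0.5 * norm(X[Omega] - Y[Omega])^2`; the cost function defined in the next cell uses exactly this.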

In [5]:
# Define cost function
nucnorm(X) = sum(svdvals(X))
costfun(Y, Omega, X, beta) = 0.5*norm(X[Omega]-Y[Omega])^2 + beta*nucnorm(X)
beta = 0.01
costfun(X, beta) = costfun(Y, Omega, X, beta)
#costfun(X) = costfun(Y, Omega, X, beta)
Out[5]:
costfun (generic function with 2 methods)
In [6]:
# Define singular value soft thresholding (SVST)
function SVST(X,beta)
    U,s,V = svd(X)
    sthresh = max.(s .- beta, 0)
    return U * Diagonal(sthresh) * V'
end;
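Aside (a standard fact, stated here for clarity): SVST is the proximal operator of the scaled nuclear norm,

$$\mathrm{SVST}(X, \beta) = \arg\min_Z \; \tfrac{1}{2} \|Z - X\|_F^2 + \beta \|Z\|_* ,$$

and it is computed by soft-thresholding the singular values of $X$ by $\beta$, exactly as in the function above.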

Iterative Soft-Thresholding Algorithm (ISTA)¶

Extension of gradient descent to convex cost functions of the form $\min_x f(x) + g(x)$ where $f(x)$ is smooth and $g(x)$ is non-smooth. Also known as proximal gradient descent.
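As a quick sketch of the generic iteration: with step size $\alpha$, each proximal gradient step is

$$x_{k+1} = \mathrm{prox}_{\alpha g}\big(x_k - \alpha \nabla f(x_k)\big), \qquad \mathrm{prox}_{\alpha g}(v) = \arg\min_z \tfrac{1}{2} \|z - v\|^2 + \alpha g(z).$$

For (NN-min), $f(X) = \tfrac{1}{2} \|P_\Omega(X) - P_\Omega(Y)\|_2^2$, $g(X) = \beta \|X\|_*$, and $\alpha = 1$ is a valid step size (the Lipschitz constant of $\nabla f$ is 1). The gradient step then just replaces the entries of $X_k$ on $\Omega$ with the observed entries of $Y$, and the proximal step is the SVST defined above; these are exactly the two steps of the ISTA iteration below.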

ISTA algorithm for solving (NN-min):

initialize $X_0 = Y$ (zero-fill missing entries)

for $k = 0, 1, 2, \ldots$

$$[\hat{X}_k]_{i,j} = \begin{cases} [X_k]_{i,j} & \text{if } (i,j) \notin \Omega \\ [Y]_{i,j} & \text{if } (i,j) \in \Omega \end{cases} \qquad \text{(put back in known entries)}$$

$$X_{k+1} = \mathrm{SVST}(\hat{X}_k, \beta) \qquad \text{(singular value soft-thresholding)}$$

end

In [7]:
# ISTA (Iterative Soft-Thresholding Algorithm)
function ista(Y, Omega; beta=beta, niter=250)
    X = copy(Y)
    cost_ista = zeros(niter+1)
    cost_ista[1] = costfun(X, beta)
    for k=1:niter
        X[Omega] = Y[Omega] # put back the known entries
        X = SVST(X, beta) # singular value soft-thresholding
        cost_ista[k+1] = costfun(X, beta) # comment out to speed-up
    end
    return (X, cost_ista)
end

niter = 250
(Xista, cost_ista) = ista(Y, Omega, niter=niter)

jim(Xista, title="ISTA at $niter")
Out[7]:

What went wrong? Let's investigate. First, let's see if the above solution is actually low-rank.

In [8]:
sX = svdvals(Xista)
sY = svdvals(Y)
plot(sY, title="singular values", label="svd(Y) : initialization",
    line=:red, marker=(:o, :red))
plot!(sX, label="svd(X) : ISTA output",
    line=:blue, marker=(:o, :blue))
Out[8]:

Now let's check the cost function:

In [9]:
plot(cost_ista,
    title="cost vs. iteration",
    xlabel="iteration",
    ylabel="cost function value",
    marker=:o,
    label="ISTA")
Out[9]:

Fast Iterative Soft-Thresholding Algorithm (FISTA)¶

Modification of ISTA that adds Nesterov momentum for faster convergence: the cost decreases at rate $O(1/k^2)$ instead of ISTA's $O(1/k)$.

Reference: A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imaging Sciences, 2009.

FISTA algorithm for solving (NN-min):

initialize matrices $Z_0 = X_0 = Y$

for $k = 0, 1, 2, \ldots$

$$[\hat{Z}_k]_{i,j} = \begin{cases} [Z_k]_{i,j} & \text{if } (i,j) \notin \Omega \\ [Y]_{i,j} & \text{if } (i,j) \in \Omega \end{cases} \qquad \text{(put back in known entries)}$$

$$X_{k+1} = \mathrm{SVST}(\hat{Z}_k, \beta)$$

$$t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2} \qquad \text{(Nesterov step-size)}$$

$$Z_{k+1} = X_{k+1} + \frac{t_k - 1}{t_{k+1}} \, (X_{k+1} - X_k) \qquad \text{(momentum update)}$$

end

In [10]:
# FISTA algorithm
function fista(Y, Omega; beta=0.01, niter=250)
    X = copy(Y)
    Z = copy(X)
    Xold = copy(X)
    told = 1
    cost_fista = zeros(niter+1)
    cost_fista[1] = costfun(X, beta)
    for k=1:niter
        Z[Omega] = Y[Omega] # put back the known entries
        X = SVST(Z, beta) # singular value soft-thresholding
        t = (1 + sqrt(1 + 4*told^2)) / 2 # Nesterov step-size
        Z = X + ((told-1)/t) * (X - Xold) # momentum update
        Xold = X
        told = t
        cost_fista[k+1] = costfun(X, beta) # comment out to speed-up
    end
    return (X, cost_fista)
end

(Xfista, cost_fista) = fista(Y, Omega)
jim(Xfista, title="FISTA")
Out[10]:
In [11]:
sX = svdvals(Xfista)
sY = svdvals(Y)

effective_rank = sum(sX .> (0.01*sX[1]))
@show effective_rank

plot(sY, title="singular values", label="svd(Y) : initialization",
    line=:red, marker=(:o, :red))
plot!(sX, label="svd(X) : FISTA output",
    line=:green, marker=(:o, :green))
effective_rank = 19
Out[11]:
In [12]:
plot(cost_ista, label="ISTA",
    line=:red, marker=(:o, :red))
plot!(cost_fista, label="FISTA",
    line=:blue, marker=(:o, :blue),
    title="cost vs. iteration",
    xlabel="iteration",
    ylabel="cost function value")
Out[12]:

Proximal Optimized Gradient Method (POGM)¶

Accelerated variant of ISTA based on a proximal version of the optimized gradient method (OGM). It is similar to FISTA, but has an additional term in the momentum update (the last term below).

References: A. B. Taylor, J. M. Hendrickx, and F. Glineur, "Exact worst-case performance of first-order methods for composite convex optimization," SIAM J. Optimization, 2017. D. Kim and J. A. Fessler, "Adaptive restart of the optimized gradient method for convex optimization," J. Optimization Theory and Applications, 2018.

POGM algorithm for solving (NN-min):

initialize matrices $Z_0 = X_0 = Y$

for $k = 0, 1, 2, \ldots$

$$[\hat{Z}_k]_{i,j} = \begin{cases} [Z_k]_{i,j} & \text{if } (i,j) \notin \Omega \\ [Y]_{i,j} & \text{if } (i,j) \in \Omega \end{cases} \qquad \text{(put back in known entries)}$$

$$X_{k+1} = \mathrm{SVST}(\hat{Z}_k, \beta)$$

$$t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2} \qquad \text{(Nesterov step-size)}$$

$$Z_{k+1} = X_{k+1} + \frac{t_k - 1}{t_{k+1}} \, (X_{k+1} - X_k) + \frac{t_k}{t_{k+1}} \, (X_{k+1} - Z_k) \qquad \text{(momentum update, with the additional POGM term)}$$

end

In [13]:
# POGM algorithm
function pogm(Y, Omega; beta=0.01, niter=250)
    X = copy(Y)
    Z = copy(Y)
    Xold = copy(X)
    told = 1
    cost_pogm = zeros(niter+1)
    cost_pogm[1] = costfun(X, beta)
    for k=1:niter
        Z[Omega] = Y[Omega] # put back the known entries
        X = SVST(Z, beta) # singular value soft-thresholding
        t = (1 + sqrt(1 + 4*told^2)) / 2 # Nesterov step-size
        Z = X + ((told-1)/t) * (X - Xold) + (told/t) * (X - Z) # momentum update with extra POGM term
        Xold = X
        told = t
        cost_pogm[k+1] = costfun(X, beta) # comment out to speed-up
    end
    return (X, cost_pogm)
end

(Xpogm, cost_pogm) = pogm(Y, Omega)
jim(Xpogm, title="POGM")
Out[13]:
In [14]:
plot(cost_ista, label="ISTA",
    line=:red, marker=(:o, :red))
plot!(cost_fista, label="FISTA",
    line=:blue, marker=(:o, :blue))
plot!(cost_pogm, label="POGM",
    line=:green, marker=(:o, :green),
    title="cost vs. iteration",
    xlabel="iteration",
    ylabel="cost function value")
Out[14]:

Alternating directions method of multipliers (ADMM) algorithm¶

Another approach that uses SVST as a sub-routine; it is closely related to proximal gradient descent.
It can converge faster than FISTA, but the algorithm requires a tuning parameter μ (here we use μ=β).
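For orientation, here is a sketch of the variable splitting that is consistent with the updates in the code below: introduce an auxiliary matrix $Z$ and solve

$$\min_{X,Z} \; \tfrac{1}{2} \|P_\Omega(X) - P_\Omega(Y)\|_2^2 + \beta \|Z\|_* \quad \text{subject to} \quad X = Z.$$

With penalty parameter $\mu$ and scaled dual variable $L$, each ADMM iteration performs a nuclear-norm proximal step $Z \leftarrow \mathrm{SVST}(X + L, \beta/\mu)$, an entrywise least-squares update $X \leftarrow (Y + \mu (Z - L)) / (\mu + 1_\Omega)$ (valid because $Y$ is zero at the unobserved entries, and $1_\Omega$ is the 0/1 mask), and a dual update $L \leftarrow L + X - Z$.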

References:

In [15]:
# alternating directions method of multipliers (ADMM) algorithm
function admm(Y, Omega; beta=0.01, niter=250, mu=beta)
    X = copy(Y)
    Z = zero(X)
    L = zero(X)
    cost_admm = zeros(niter+1)
    cost_admm[1] = costfun(X, beta)
    for k=1:niter
        Z = SVST(X + L, beta/mu) # Z-update: nuclear-norm proximal step
        X = (Y + mu*(Z - L)) ./ (mu .+ Omega) # X-update: entrywise least-squares (Y is zero off Omega)
        L = L + X - Z # dual (multiplier) update
        cost_admm[k+1] = costfun(X, beta) # comment out to speed-up
    end
    return (X, cost_admm)
end
(Xadmm, cost_admm) = admm(Y, Omega)
jim(Xadmm, title="ADMM")
Out[15]:
In [16]:
plot(cost_ista, label="ISTA",
    line=:red, marker=(:o, :red))
plot!(cost_fista, label="FISTA",
    line=:blue, marker=(:o, :blue))
plot!(cost_pogm, label="POGM",
    line=:green, marker=(:o, :green),
    title="cost vs. iteration",
    xlabel="iteration",
    ylabel="cost function value")
plot!(cost_admm, label="ADMM",
    line=:magenta, marker=(:o, :magenta))
Out[16]: