A list of "easy" linear systems

Consider solving $\mathbf{A} \mathbf{x} = \mathbf{b}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$, or computing a matrix inverse if you must. $\mathbf{A}$ can be huge. Keep massive data in mind: the 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, ... Many linear systems have structure that makes them "easy" to solve; we should be alert to it.

Don't blindly use A \ b and inv in Julia, or the solve function in R. Don't waste computing resources with bad algorithm choices!

In [1]:
versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Diagonal matrix

Diagonal $\mathbf{A}$: $n$ flops (one division per entry). Use the Diagonal type in Julia.

In [2]:
using BenchmarkTools, LinearAlgebra, Random

# generate random data
Random.seed!(123)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);
In [3]:
# should give link: https://github.com/JuliaLang/julia/blob/5b637df34396034b0dd353e603ab3d61322369fb/stdlib/LinearAlgebra/src/generic.jl#L956
@which A \ b
Out[3]:
\(A::AbstractArray{T,2} where T, B::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T) in LinearAlgebra at /Applications/Julia-1.4.app/Contents/Resources/julia/share/julia/stdlib/v1.4/LinearAlgebra/src/generic.jl:1103
In [4]:
# check `istril(A)` and `istriu(A)` (O(n^2)), then call `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b
Out[4]:
BenchmarkTools.Trial: 
  memory estimate:  15.89 KiB
  allocs estimate:  3
  --------------
  minimum time:     842.124 μs (0.00% GC)
  median time:      930.741 μs (0.00% GC)
  mean time:        945.839 μs (0.00% GC)
  maximum time:     1.864 ms (0.00% GC)
  --------------
  samples:          5267
  evals/sample:     1
In [5]:
# O(n) computation, no O(n^2) array allocation
@benchmark Diagonal($A) \ $b
Out[5]:
BenchmarkTools.Trial: 
  memory estimate:  15.89 KiB
  allocs estimate:  3
  --------------
  minimum time:     3.185 μs (0.00% GC)
  median time:      4.186 μs (0.00% GC)
  mean time:        6.320 μs (30.16% GC)
  maximum time:     829.419 μs (98.71% GC)
  --------------
  samples:          10000
  evals/sample:     8

Bidiagonal, tridiagonal, and banded matrices

Bidiagonal, tridiagonal, or banded $\mathbf{A}$: band LU, band Cholesky, ... roughly $O(n)$ flops for a fixed bandwidth.

In [6]:
Random.seed!(123) 

n  = 1000
dv = randn(n)
ev = randn(n - 1)
b  = randn(n) # rhs
# symmetric tridiagonal matrix
A  = SymTridiagonal(dv, ev)
Out[6]:
1000×1000 SymTridiagonal{Float64,Array{Float64,1}}:
 1.19027  1.14112     ⋅          ⋅        …    ⋅            ⋅ 
 1.14112  2.04818    0.597106    ⋅             ⋅            ⋅ 
  ⋅       0.597106   1.14265   -1.30405        ⋅            ⋅ 
  ⋅        ⋅        -1.30405    0.459416       ⋅            ⋅ 
  ⋅        ⋅          ⋅        -1.2566         ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅        …    ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅        …    ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
 ⋮                                        ⋱                
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅        …    ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅        …    ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅             ⋅            ⋅ 
  ⋅        ⋅          ⋅          ⋅            0.000636658   ⋅ 
  ⋅        ⋅          ⋅          ⋅           -1.76795      0.1559
  ⋅        ⋅          ⋅          ⋅            0.1559       0.318717
In [7]:
# convert to a full matrix
Afull = Matrix(A)

# LU decomposition (2/3) n^3 flops!
@benchmark $Afull \ $b
Out[7]:
BenchmarkTools.Trial: 
  memory estimate:  7.65 MiB
  allocs estimate:  5
  --------------
  minimum time:     11.807 ms (0.00% GC)
  median time:      14.656 ms (0.00% GC)
  mean time:        15.113 ms (3.54% GC)
  maximum time:     20.284 ms (0.00% GC)
  --------------
  samples:          331
  evals/sample:     1
In [8]:
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b
Out[8]:
BenchmarkTools.Trial: 
  memory estimate:  23.86 KiB
  allocs estimate:  5
  --------------
  minimum time:     13.741 μs (0.00% GC)
  median time:      15.523 μs (0.00% GC)
  mean time:        16.682 μs (3.75% GC)
  maximum time:     6.288 ms (99.49% GC)
  --------------
  samples:          10000
  evals/sample:     1

Triangular matrix

Triangular $\mathbf{A}$: $n^2$ flops to solve the linear system by forward or backward substitution.

In [9]:
Random.seed!(123)

n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)

# check istril() then triangular solve
@benchmark $A \ $b
Out[9]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     821.782 μs (0.00% GC)
  median time:      942.708 μs (0.00% GC)
  mean time:        963.922 μs (0.00% GC)
  maximum time:     1.721 ms (0.00% GC)
  --------------
  samples:          5165
  evals/sample:     1
In [10]:
# triangular solve directly; save the cost of istril()
@benchmark LowerTriangular($A) \ $b
Out[10]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     564.218 μs (0.00% GC)
  median time:      635.351 μs (0.00% GC)
  mean time:        679.002 μs (0.00% GC)
  maximum time:     1.862 ms (0.00% GC)
  --------------
  samples:          7261
  evals/sample:     1

Block diagonal matrix

Block diagonal: Suppose $n = \sum_b n_b$. Solving the linear system costs $(\sum_b n_b)^3$ flops if the block diagonal structure is ignored, versus only $\sum_b n_b^3$ flops if it is exploited.

Julia's blockdiag function (in the SparseArrays package) generates a sparse matrix. Anyone interested in writing a BlockDiagonal.jl package? A block-wise solve is sketched at the end of this section.

In [11]:
using SparseArrays

Random.seed!(123)

B  = 10 # number of blocks
ni = 100
A  = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)
Out[11]:
1000×1000 SparseMatrixCSC{Float64,Int64} with 956 stored entries:
  [13 ,    1]  =  -0.733961
  [8  ,    3]  =  0.459398
  [59 ,    3]  =  1.70619
  [35 ,    4]  =  0.678443
  [10 ,    5]  =  0.28718
  [3  ,    6]  =  1.06816
  [44 ,    6]  =  -0.306877
  [65 ,    6]  =  -1.92021
  [4  ,    7]  =  1.6696
  [16 ,    7]  =  -0.213558
  [46 ,    7]  =  -0.163711
  [95 ,    7]  =  -0.902986
  ⋮
  [969,  988]  =  -0.276241
  [972,  988]  =  -0.18848
  [976,  990]  =  -1.18261
  [924,  991]  =  1.49252
  [916,  992]  =  0.90305
  [921,  993]  =  0.288026
  [996,  993]  =  -0.09996
  [993,  994]  =  -0.3265
  [934,  996]  =  0.469308
  [962,  997]  =  1.05611
  [987,  998]  =  0.814419
  [997,  998]  =  0.0154335
  [972, 1000]  =  0.530357
In [12]:
using UnicodePlots
spy(A)
Out[12]:
[spy plot: "Sparsity Pattern" of the 1000×1000 block diagonal matrix, axes 1–1000, nz = 956]
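Since blockdiag returns a plain SparseMatrixCSC, a generic solver will not automatically exploit the block structure. Below is a minimal sketch of a block-wise solve, assuming the blocks are kept in a vector and each block is square and nonsingular; blockdiag_solve is a hypothetical helper, not a library function. (The usage example draws dense random blocks, since sprandn blocks at 1% density are typically singular.)

using LinearAlgebra

# solve a block diagonal system block by block: sum_b O(n_b^3) flops
# instead of O((sum_b n_b)^3)
function blockdiag_solve(blocks::Vector{<:AbstractMatrix}, b::AbstractVector)
    x = similar(b)
    offset = 0
    for Bk in blocks
        nk = size(Bk, 1)
        idx = (offset + 1):(offset + nk)
        x[idx] = Bk \ b[idx]    # each diagonal block solved independently
        offset += nk
    end
    return x
end

# usage with dense, almost surely nonsingular blocks
blocks = [randn(100, 100) for _ in 1:10]
bvec   = randn(1000)
x      = blockdiag_solve(blocks, bvec)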

Kronecker product

Use $$ \begin{eqnarray*} (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\ (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\ \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p} \end{eqnarray*} $$ to avoid forming, and computing with, the potentially huge Kronecker product $\mathbf{A} \otimes \mathbf{B}$.

Anyone interested in writing a package?

In [13]:
using MatrixDepot, LinearAlgebra

A = matrixdepot("lehmer", 50) # a pd matrix
include group.jl for user defined matrix generators
verify download of index files...
used remote site is https://sparse.tamu.edu/?per_page=All
populating internal database...
Out[13]:
50×50 Array{Float64,2}:
 1.0        0.5        0.333333   0.25       …  0.0208333  0.0204082  0.02
 0.5        1.0        0.666667   0.5           0.0416667  0.0408163  0.04
 0.333333   0.666667   1.0        0.75          0.0625     0.0612245  0.06
 0.25       0.5        0.75       1.0           0.0833333  0.0816327  0.08
 0.2        0.4        0.6        0.8           0.104167   0.102041   0.1
 0.166667   0.333333   0.5        0.666667   …  0.125      0.122449   0.12
 0.142857   0.285714   0.428571   0.571429      0.145833   0.142857   0.14
 0.125      0.25       0.375      0.5           0.166667   0.163265   0.16
 0.111111   0.222222   0.333333   0.444444      0.1875     0.183673   0.18
 0.1        0.2        0.3        0.4           0.208333   0.204082   0.2
 0.0909091  0.181818   0.272727   0.363636   …  0.229167   0.22449    0.22
 0.0833333  0.166667   0.25       0.333333      0.25       0.244898   0.24
 0.0769231  0.153846   0.230769   0.307692      0.270833   0.265306   0.26
 ⋮                                           ⋱                        
 0.025641   0.0512821  0.0769231  0.102564      0.8125     0.795918   0.78
 0.025      0.05       0.075      0.1           0.833333   0.816327   0.8
 0.0243902  0.0487805  0.0731707  0.097561   …  0.854167   0.836735   0.82
 0.0238095  0.047619   0.0714286  0.0952381     0.875      0.857143   0.84
 0.0232558  0.0465116  0.0697674  0.0930233     0.895833   0.877551   0.86
 0.0227273  0.0454545  0.0681818  0.0909091     0.916667   0.897959   0.88
 0.0222222  0.0444444  0.0666667  0.0888889     0.9375     0.918367   0.9
 0.0217391  0.0434783  0.0652174  0.0869565  …  0.958333   0.938776   0.92
 0.0212766  0.0425532  0.0638298  0.0851064     0.979167   0.959184   0.94
 0.0208333  0.0416667  0.0625     0.0833333     1.0        0.979592   0.96
 0.0204082  0.0408163  0.0612245  0.0816327     0.979592   1.0        0.98
 0.02       0.04       0.06       0.08          0.96       0.98       1.0
In [14]:
B = matrixdepot("oscillate", 100) # pd matrix
Out[14]:
100×100 Array{Float64,2}:
  0.136976      0.125884     -0.0085173    …  -9.24644e-14   5.69038e-13
  0.125884      0.549944      0.0138321        2.83263e-14  -1.74324e-13
 -0.0085173     0.0138321     0.371446        -6.08978e-14   3.74773e-13
  0.00156297   -0.00104372    0.0210066        6.39233e-13  -3.93392e-12
 -0.000208007   9.74801e-5   -0.00124548      -1.97571e-13   1.21588e-12
  0.000387773  -8.59438e-5    1.35244e-5   …   6.67036e-13  -4.10503e-12
 -0.000670469   0.000224268  -0.000903446     -5.17566e-12   3.18517e-11
  0.000266764  -8.97996e-5    0.000368829      2.08513e-12  -1.28322e-11
 -5.68863e-5    1.93692e-5   -8.39725e-5      -5.04465e-13   3.10454e-12
  1.58216e-5   -5.29662e-6    2.19197e-5       1.06314e-12  -6.54271e-12
 -3.35909e-6    1.1126e-6    -4.35495e-6   …  -4.06619e-13   2.50239e-12
  3.85339e-5   -1.19509e-5    2.91111e-5       1.73688e-11  -1.0689e-10
 -1.29657e-5    4.02036e-6   -9.77429e-6      -5.89803e-12   3.62972e-11
  ⋮                                        ⋱                
 -3.02294e-13   9.26071e-14  -1.99093e-13     -4.554e-5      0.000224659
  7.38493e-14  -2.26236e-14   4.86377e-14      2.13479e-5   -8.23636e-5
 -5.24885e-14   1.60798e-14  -3.45693e-14  …  -0.000265884   0.000640801
  2.29939e-14  -7.04415e-15   1.5144e-14       0.000137203  -0.000484487
 -3.08039e-14   9.43672e-15  -2.02877e-14     -0.000206761   0.000754033
  8.51286e-15  -2.6079e-15    5.60664e-15      0.0001034    -0.000408093
 -1.49502e-14   4.57998e-15  -9.84635e-15     -0.000495446   0.00212537
  9.87574e-14  -3.02542e-14   6.50424e-14  …   0.00421434   -0.0160006
 -1.34834e-13   4.13062e-14  -8.88028e-14     -0.0162629     0.0294735
  1.35662e-13  -4.15598e-14   8.9348e-14       0.0588982    -0.0323317
 -9.24644e-14   2.83263e-14  -6.08978e-14      0.689558      0.0885095
  5.69038e-13  -1.74324e-13   3.74773e-13      0.0885095     0.0618875
In [15]:
M = kron(A, B)
c = ones(size(M, 2)) # rhs
# Method 1: form Kronecker product and Cholesky solve
x1 = cholesky(Symmetric(M)) \ c;
In [16]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
m, p = size(A, 1), size(B, 1)
x2 = vec(transpose(cholesky(Symmetric(A)) \ 
    transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))));
In [17]:
# relative error
norm(x1 - x2) / norm(x1)
Out[17]:
1.7488828293455635e-7
In [18]:
using BenchmarkTools

# Method 1: form Kronecker and Cholesky solve
@benchmark cholesky(Symmetric(kron($A, $B))) \ c
Out[18]:
BenchmarkTools.Trial: 
  memory estimate:  381.51 MiB
  allocs estimate:  10
  --------------
  minimum time:     633.858 ms (4.51% GC)
  median time:      708.409 ms (4.61% GC)
  mean time:        715.004 ms (6.53% GC)
  maximum time:     841.395 ms (4.19% GC)
  --------------
  samples:          8
  evals/sample:     1
In [19]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
@benchmark vec(transpose(cholesky(Symmetric($A)) \ 
    transpose(cholesky(Symmetric($B)) \ reshape($c, p, m))))
Out[19]:
BenchmarkTools.Trial: 
  memory estimate:  176.52 KiB
  allocs estimate:  21
  --------------
  minimum time:     182.463 μs (0.00% GC)
  median time:      328.807 μs (0.00% GC)
  mean time:        381.279 μs (8.04% GC)
  maximum time:     57.233 ms (77.50% GC)
  --------------
  samples:          10000
  evals/sample:     1

Sparse matrix

Sparse $\mathbf{A}$: use a sparse matrix decomposition or an iterative method.

  • Sparsity is the easiest structure to recognize. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (the Matrix package), MKL (sparse BLAS), ... as much as possible.
In [20]:
using MatrixDepot

Random.seed!(123)

# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# random generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # sparsity
Out[20]:
0.001994776158751544
In [21]:
using UnicodePlots
spy(A)
Out[21]:
[spy plot: "Sparsity Pattern" of the 7701×7701 Wathen matrix, axes 1–7701, nz = 118301]

Matrix-vector multiplication

In [22]:
# dense matrix-vector multiplication
@benchmark $Afull * $b
Out[22]:
BenchmarkTools.Trial: 
  memory estimate:  60.27 KiB
  allocs estimate:  2
  --------------
  minimum time:     24.849 ms (0.00% GC)
  median time:      32.017 ms (0.00% GC)
  mean time:        30.544 ms (0.00% GC)
  maximum time:     35.428 ms (0.00% GC)
  --------------
  samples:          164
  evals/sample:     1
In [23]:
# sparse matrix-vector multiplication
@benchmark $A * $b
Out[23]:
BenchmarkTools.Trial: 
  memory estimate:  60.27 KiB
  allocs estimate:  2
  --------------
  minimum time:     126.764 μs (0.00% GC)
  median time:      132.654 μs (0.00% GC)
  mean time:        145.924 μs (3.68% GC)
  maximum time:     53.924 ms (99.56% GC)
  --------------
  samples:          10000
  evals/sample:     1

Solve linear equation

In [24]:
# solve via dense Cholesky
xchol = cholesky(Symmetric(Afull)) \ b
@benchmark cholesky($(Symmetric(Afull))) \ $b
Out[24]:
BenchmarkTools.Trial: 
  memory estimate:  452.52 MiB
  allocs estimate:  8
  --------------
  minimum time:     2.064 s (0.04% GC)
  median time:      2.093 s (0.03% GC)
  mean time:        2.108 s (0.50% GC)
  maximum time:     2.168 s (0.00% GC)
  --------------
  samples:          3
  evals/sample:     1
In [25]:
# solve via sparse Cholesky
xcholsp = cholesky(Symmetric(A)) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($(Symmetric(A))) \ $b
norm(xchol - xcholsp) = 3.0502824637644926e-15
Out[25]:
BenchmarkTools.Trial: 
  memory estimate:  12.43 MiB
  allocs estimate:  50
  --------------
  minimum time:     15.376 ms (0.00% GC)
  median time:      17.642 ms (0.00% GC)
  mean time:        17.967 ms (0.19% GC)
  maximum time:     32.246 ms (3.94% GC)
  --------------
  samples:          279
  evals/sample:     1
In [26]:
# sparse solve via conjugate gradient
using IterativeSolvers

xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)
norm(xcg - xchol) = 1.717983907483646e-7
Out[26]:
BenchmarkTools.Trial: 
  memory estimate:  302.14 KiB
  allocs estimate:  19
  --------------
  minimum time:     20.973 ms (0.00% GC)
  median time:      25.313 ms (0.00% GC)
  mean time:        25.297 ms (0.00% GC)
  maximum time:     30.163 ms (0.00% GC)
  --------------
  samples:          198
  evals/sample:     1

Easy plus low rank

Easy plus low rank: $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}$, $r \ll n$. Woodbury formula \begin{eqnarray*} (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\ \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \, \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}). \end{eqnarray*}

  • Keep HW2 Q2 (multivariate density) and PageRank problem in mind.
  • The WoodburyMatrices.jl package can be useful.
In [27]:
using BenchmarkTools, Random, WoodburyMatrices

Random.seed!(123)
n = 1000
r = 5

A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Matrix(W) # stored as a Matrix{Float64}
Out[27]:
1000×1000 Array{Float64,2}:
  1.6454      0.0835576  -0.0402228  …  -0.134643   -0.206236   0.49181
  0.0835576   1.46795    -1.07751       -0.144827    0.699883   0.95513
 -0.0402228  -1.07751     3.30195        0.404634   -1.98854   -2.27688
  0.242022    0.639791   -1.53635       -0.273162    1.0991     1.44739
  0.261808   -0.25187     0.704092       0.144494   -0.473455  -0.342117
 -0.114875    0.0392933   0.200295   …   0.0994648  -0.229968  -0.273848
 -0.0707279  -0.0883619   0.325561       0.246623    0.24291   -0.111743
 -0.558469   -0.239546    0.507136       0.041764   -0.438303  -0.992219
 -0.0490726  -0.55282     1.41105        0.099443   -1.51945   -1.40719
  0.760058   -0.0817622   0.245345      -0.0118199  -0.261812   0.38403
 -0.0930756  -0.169087    0.0151538  …  -0.126792   -0.170628  -0.0789466
  0.454723   -0.244737    0.158403      -0.166351   -0.122847   0.0890611
  0.758661    0.180103   -0.549708      -0.115611    0.661376   1.06463
  ⋮                                  ⋱                         
 -0.0459214  -0.728622    1.53889        0.0755688  -1.35929   -1.54189
  0.0709898  -0.496748    1.07023        0.100244   -0.690642  -0.990374
 -0.556211   -0.638137    1.78858    …   0.281981   -1.58692   -2.10545
  0.113368   -0.708252    2.14832        0.239755   -2.07364   -2.09501
 -0.454283   -0.825847    2.01846        0.344052   -1.56463   -2.04137
 -0.40376    -0.073023    0.0912159      0.152356    0.27367   -0.16808
 -0.741087    0.821352   -1.94822       -0.0980451   1.81808    1.31641
 -0.0353531   0.486034   -1.40201    …  -0.187383    1.23985    1.32581
  0.147273    0.0655192  -0.386165      -0.153257    0.120052   0.470425
 -0.134643   -0.144827    0.404634       0.721565   -0.133929  -0.281184
 -0.206236    0.699883   -1.98854       -0.133929    2.81572    1.85812
  0.49181     0.95513    -2.27688       -0.281184    1.85812    2.74306
In [28]:
# compare storage
Base.summarysize(W), Base.summarysize(Wfull)
Out[28]:
(64720, 8000040)

Solve linear equation

In [29]:
# solve via Cholesky
@benchmark cholesky($(Symmetric(Wfull))) \ $b
Out[29]:
BenchmarkTools.Trial: 
  memory estimate:  7.64 MiB
  allocs estimate:  7
  --------------
  minimum time:     6.697 ms (0.00% GC)
  median time:      10.184 ms (0.00% GC)
  mean time:        10.321 ms (7.94% GC)
  maximum time:     51.829 ms (79.15% GC)
  --------------
  samples:          484
  evals/sample:     1
In [30]:
# solve using Woodbury formula
@benchmark $W \ reshape($b, n, 1) # hack; need to file an issue
Out[30]:
BenchmarkTools.Trial: 
  memory estimate:  32.09 KiB
  allocs estimate:  8
  --------------
  minimum time:     12.079 μs (0.00% GC)
  median time:      22.482 μs (0.00% GC)
  mean time:        21.143 μs (0.00% GC)
  maximum time:     119.704 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Matrix-vector multiplication

In [31]:
# multiplication without using Woodbury structure
@benchmark $Wfull * $b
Out[31]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     88.489 μs (0.00% GC)
  median time:      114.167 μs (0.00% GC)
  mean time:        117.734 μs (0.00% GC)
  maximum time:     393.383 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
In [32]:
# multiplication using Woodbury structure
@benchmark $W * $b
Out[32]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     2.798 μs (0.00% GC)
  median time:      5.304 μs (0.00% GC)
  mean time:        5.641 μs (10.99% GC)
  maximum time:     6.209 ms (99.82% GC)
  --------------
  samples:          10000
  evals/sample:     9

Determinant

In [33]:
# determinant without using Woodbury structure
@benchmark det($Wfull)
Out[33]:
BenchmarkTools.Trial: 
  memory estimate:  7.64 MiB
  allocs estimate:  4
  --------------
  minimum time:     9.539 ms (0.00% GC)
  median time:      15.878 ms (0.00% GC)
  mean time:        15.966 ms (5.72% GC)
  maximum time:     73.556 ms (75.18% GC)
  --------------
  samples:          313
  evals/sample:     1
In [34]:
# determinant using Woodbury structure
@benchmark det($W)
Out[34]:
BenchmarkTools.Trial: 
  memory estimate:  448 bytes
  allocs estimate:  3
  --------------
  minimum time:     462.359 ns (0.00% GC)
  median time:      505.503 ns (0.00% GC)
  mean time:        603.885 ns (8.45% GC)
  maximum time:     258.241 μs (99.74% GC)
  --------------
  samples:          10000
  evals/sample:     195

Easy plus border

Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank, $$ \begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\ (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \end{pmatrix}. $$ Anyone interested in writing a package? A sketch of the idea follows.
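A minimal sketch that solves the bordered system through the Schur complement $\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T$, without ever forming the $(n + r) \times (n + r)$ matrix; bordered_solve is a hypothetical helper and assumes $\mathbf{A}$ pd and $\mathbf{V}$ full row rank.

using LinearAlgebra

# solve [A V'; V 0] * [x; y] = [b; c] via the Schur complement S = V A^{-1} V'
function bordered_solve(A, V, b, c)
    Achol  = cholesky(Symmetric(A))
    AinvVt = Achol \ Matrix(transpose(V))   # A^{-1} V', n-by-r
    Ainvb  = Achol \ b                      # A^{-1} b
    y = (V * AinvVt) \ (V * Ainvb - c)      # small r-by-r solve
    x = Ainvb - AinvVt * y
    return x, y
end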

Orthogonal matrix

Orthogonal $\mathbf{A}$: at most $n^2$ flops. Why? Because $\mathbf{A}^{-1} = \mathbf{A}^T$, solving reduces to the matrix-vector multiplication $\mathbf{x} = \mathbf{A}^T \mathbf{b}$. Permutation matrices, Householder matrices, Jacobi rotations, ... take even less.
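A quick illustrative sketch: the full orthogonal solve is a single multiplication, and applying one Householder reflection (symmetric and orthogonal, so it is its own inverse) costs only $O(n)$ flops.

using LinearAlgebra, Random

Random.seed!(123)
n = 1000
b = randn(n)

# orthogonal Q: Q \ b is just Q'b, at most n^2 flops, no factorization needed
Q = qr(randn(n, n)).Q
x = Q' * b

# Householder reflection H = I - 2vv'/(v'v): H^{-1} b = H b in O(n) flops
v  = randn(n)
xh = b - (2 * dot(v, b) / dot(v, v)) * v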

Toeplitz matrix

Toeplitz systems (constant diagonals): $$ \mathbf{T} = \begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{pmatrix}. $$ $\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops: the Durbin algorithm (Yule-Walker equations), the Levinson algorithm (general $\mathbf{b}$), and the Trench algorithm (matrix inverse). These matrices occur in autoregressive models and econometrics.
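As an illustration, here is a sketch of the Durbin algorithm for the Yule-Walker system $\mathbf{T} \mathbf{y} = -(r_1, \ldots, r_n)^T$, where $t_{ij} = r_{|i-j|}$ and $r_0 = 1$, following the recurrence in Golub and Van Loan; durbin is a hypothetical helper, not a library routine.

using LinearAlgebra

# Durbin algorithm: O(n^2) flops for the Yule-Walker equations,
# T symmetric pd Toeplitz with first column (1, r[1], ..., r[n-1])
function durbin(r::AbstractVector)
    n = length(r)
    y = zeros(eltype(r), n)
    y[1] = -r[1]
    α, β = -r[1], one(eltype(r))
    for k in 1:n-1
        β *= 1 - α^2
        α  = -(r[k+1] + dot(r[k:-1:1], y[1:k])) / β
        z  = y[1:k] + α * y[k:-1:1]   # slices copy, so no aliasing
        y[1:k] = z
        y[k+1] = α
    end
    return y
end

# check on the AR(1) autocorrelation matrix T[i,j] = 0.5^|i - j|
n = 5
r = [0.5^k for k in 1:n]
Tmat = [0.5^abs(i - j) for i in 1:n, j in 1:n]
norm(Tmat * durbin(r) + r)   # ≈ 0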

Circulant matrix

Circulant systems: a Toeplitz matrix with wraparound, $$ C(\mathbf{z}) = \begin{pmatrix} z_0 & z_4 & z_3 & z_2 & z_1 \\ z_1 & z_0 & z_4 & z_3 & z_2 \\ z_2 & z_1 & z_0 & z_4 & z_3 \\ z_3 & z_2 & z_1 & z_0 & z_4 \\ z_4 & z_3 & z_2 & z_1 & z_0 \end{pmatrix}. $$ Because the DFT diagonalizes a circulant matrix, circulant systems admit FFT-type algorithms; the DCT (discrete cosine transform) and DST (discrete sine transform) play the same role for related structures.
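Concretely, $C(\mathbf{z}) \mathbf{x} = \mathbf{b}$ can be solved in $O(n \log n)$ flops with two FFTs and one inverse FFT. A sketch using the FFTW.jl package, assuming no DFT coefficient of $\mathbf{z}$ is zero (i.e., $C(\mathbf{z})$ is nonsingular):

using FFTW, LinearAlgebra

# x = ifft(fft(b) ./ fft(z)) solves C(z) x = b in O(n log n) flops
circulant_solve(z, b) = real(ifft(fft(b) ./ fft(z)))

# quick check against the explicit circulant matrix
n = 5
z = randn(n)
b = randn(n)
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n]   # first column is z
norm(C * circulant_solve(z, b) - C \ b)   # ≈ 0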

Vandermonde matrix

Vandermonde matrix: arises in interpolation and approximation problems, $$ \mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix}. $$ Both $\mathbf{V} \mathbf{x} = \mathbf{b}$ (the primal problem) and $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ (the dual problem) can be solved in $O(n^2)$ flops.
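A sketch of the Björck-Pereyra idea for the dual system $\mathbf{V}^T \mathbf{a} = \mathbf{f}$ (polynomial interpolation), which runs in $O(n^2)$ flops via Newton's divided differences; vandermonde_dual_solve is a hypothetical helper and assumes distinct points.

# solve V' a = f: find coefficients of p(t) = a[1] + a[2] t + ... + a[n] t^(n-1)
# with p(x[i]) = f[i], in O(n^2) flops
function vandermonde_dual_solve(x::AbstractVector, f::AbstractVector)
    n = length(x)
    a = float.(f)                # Newton divided differences, computed in place
    for k in 1:n-1
        for i in n:-1:k+1
            a[i] = (a[i] - a[i-1]) / (x[i] - x[i-k])
        end
    end
    for k in n-1:-1:1            # convert Newton form to monomial coefficients
        for i in k:n-1
            a[i] -= a[i+1] * x[k]
        end
    end
    return a
end

x = [0.0, 1.0, 2.0, 3.0]
f = [1.0, 2.0, 5.0, 10.0]       # values of p(t) = 1 + t^2
vandermonde_dual_solve(x, f)    # ≈ [1, 0, 1, 0]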

Cauchy-like matrix

Cauchy-like matrices: those satisfying the displacement equation $$ \Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T, $$ where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$ and $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$. $O(n)$ flops for LU and QR.

Structured-rank matrix

Structured-rank problems: semiseparable matrices (LU and QR take $O(n)$ flops), quasiseparable matrices, ...