Summary of linear regression

In [1]:
versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Methods for solving linear regression $\widehat \beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$:

Method Flops Remarks Software Stability
Sweep $np^2 + p^3$ $(X^TX)^{-1}$ available SAS less stable
Cholesky $np^2 + p^3/3$ less stable
QR by Householder $2np^2 - (2/3)p^3$ R stable
QR by MGS $2np^2$ $Q_1$ available stable
QR by SVD $4n^2p + 8np^2 + 9p^3$ $X = UDV^T$ most stable

Remarks:

  1. When $n \gg p$, sweep and Cholesky are twice faster than QR and need less space.
  2. Sweep and Cholesky are based on the Gram matrix $\mathbf{X}^T \mathbf{X}$, which can be dynamically updated with incoming data. They can handle huge $n$, moderate $p$ data sets that cannot fit into memory.
  3. QR methods are more stable and produce numerically more accurate solution.
  4. Although sweep is slower than Cholesky, it yields standard errors and so on.
  5. MGS appears slower than Householder, but it yields $\mathbf{Q}_1$.

There is simply no such thing as a universal 'gold standard' when it comes to algorithms.

In [2]:
using SweepOperator, BenchmarkTools, LinearAlgebra

linreg_cholesky(y::Vector, X::Matrix) = cholesky!(X'X) \ (X'y)

linreg_qr(y::Vector, X::Matrix) = X \ y

function linreg_sweep(y::Vector, X::Matrix)
    p = size(X, 2)
    xy = [X y]
    tableau = xy'xy
    sweep!(tableau, 1:p)
    return tableau[1:p, end]
end

function linreg_svd(y::Vector, X::Matrix)
    xsvd = svd(X)
    return xsvd.V * ((xsvd.U'y) ./ xsvd.S)
end
Out[2]:
linreg_svd (generic function with 1 method)
In [3]:
using Random

Random.seed!(123) # seed

n, p = 10, 3
X = randn(n, p)
y = randn(n)

# check these methods give same answer
@show linreg_cholesky(y, X)
@show linreg_qr(y, X)
@show linreg_sweep(y, X)
@show linreg_svd(y, X);
linreg_cholesky(y, X) = [0.18663443776194222, 0.3425458137678085, 0.29576444004323654]
linreg_qr(y, X) = [0.18663443776194227, 0.3425458137678087, 0.2957644400432363]
linreg_sweep(y, X) = [0.18663443776194222, 0.3425458137678085, 0.2957644400432365]
linreg_svd(y, X) = [0.18663443776194236, 0.3425458137678091, 0.2957644400432366]
In [4]:
n, p = 1000, 300
X = randn(n, p)
y = randn(n)

@benchmark linreg_cholesky(y, X)
Out[4]:
BenchmarkTools.Trial: 
  memory estimate:  708.31 KiB
  allocs estimate:  8
  --------------
  minimum time:     1.776 ms (0.00% GC)
  median time:      2.160 ms (0.00% GC)
  mean time:        2.219 ms (2.04% GC)
  maximum time:     6.197 ms (54.60% GC)
  --------------
  samples:          2243
  evals/sample:     1
In [5]:
@benchmark linreg_sweep(y, X)
Out[5]:
BenchmarkTools.Trial: 
  memory estimate:  2.99 MiB
  allocs estimate:  7
  --------------
  minimum time:     9.142 ms (0.00% GC)
  median time:      11.585 ms (0.00% GC)
  mean time:        11.756 ms (1.64% GC)
  maximum time:     15.800 ms (17.09% GC)
  --------------
  samples:          425
  evals/sample:     1
In [6]:
@benchmark linreg_qr(y, X)
Out[6]:
BenchmarkTools.Trial: 
  memory estimate:  4.04 MiB
  allocs estimate:  2444
  --------------
  minimum time:     11.211 ms (0.00% GC)
  median time:      11.652 ms (0.00% GC)
  mean time:        12.107 ms (3.30% GC)
  maximum time:     17.719 ms (24.92% GC)
  --------------
  samples:          413
  evals/sample:     1
In [7]:
@benchmark linreg_svd(y, X)
Out[7]:
BenchmarkTools.Trial: 
  memory estimate:  8.06 MiB
  allocs estimate:  16
  --------------
  minimum time:     30.408 ms (0.00% GC)
  median time:      32.282 ms (0.00% GC)
  mean time:        33.501 ms (1.60% GC)
  maximum time:     48.085 ms (6.07% GC)
  --------------
  samples:          150
  evals/sample:     1