BIOSTAT 257: Statistical Computing

What is statistics?

  • Statistics, the science of data analysis, is the applied mathematics of the 21st century.

  • People (scientists, governments, health professionals, companies) collect data in order to answer certain questions. A statistician's job is to help them extract knowledge and insights from data.

  • Must-read for (bio)statistics students:

  • If existing software tools readily solve the problem, all the better.

  • Often statisticians need to implement their own methods, test new algorithms, or tailor classical methods to new types of data (big, streaming).

  • This entails at least two essential skills: programming and fundamental knowledge of algorithms.

What is this course about?

  • Not a course on statistical packages. It does not answer questions such as "How do I fit a linear mixed model in R, Julia, SAS, SPSS, or Stata?"

  • Not a pure programming course, although programming is important and we do homework in Julia.
    BIOSTAT 203A (Data Management) in fall quarter focuses on programming in R and SAS.

  • Not a course on data science. The new course BIOSTAT 203B (Introduction to Data Science) in winter quarter focuses on some software tools for data scientists.

  • This course focuses on algorithms, mostly those in numerical linear algebra and numerical optimization.

Learning objectives

  1. Be highly appreciative of this quote by James Gentle:

    The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.

    Examples: $\boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X}$, $\operatorname{tr} (\boldsymbol{A} \boldsymbol{B})$, $\operatorname{diag}(\boldsymbol{A} \boldsymbol{B})$, multivariate normal density,...
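
A minimal sketch of the point behind these examples, in plain Python so that the operation counts stay visible. The function names are hypothetical, and `xtwx` assumes that $\boldsymbol{W}$ is diagonal and stored as a vector of weights `w`, which the notes do not state. The idea is that $\operatorname{tr}(\boldsymbol{A}\boldsymbol{B})$ and $\operatorname{diag}(\boldsymbol{A}\boldsymbol{B})$ need only the diagonal entries of the product, so the full product never has to be formed.

```python
def matmul(A, B):
    """Naive dense product of an n-by-m and an m-by-n matrix (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace_naive(A, B):
    """Literal reading of tr(AB): form AB (O(n^2 m) work), then sum the diagonal."""
    C = matmul(A, B)
    return sum(C[i][i] for i in range(len(C)))

def diag_product(A, B):
    """diag(AB)_i = sum_k A[i][k] * B[k][i]; O(n m) work, no n-by-n temporary."""
    return [sum(A[i][k] * B[k][i] for k in range(len(B))) for i in range(len(A))]

def trace_product(A, B):
    """tr(AB) computed from the diagonal entries only."""
    return sum(diag_product(A, B))

def xtwx(X, w):
    """X^T W X for diagonal W = diag(w) (an assumption), without building W."""
    p = len(X[0])
    return [[sum(wk * row[i] * row[j] for row, wk in zip(X, w))
             for j in range(p)] for i in range(p)]

A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(trace_naive(A, B), trace_product(A, B))
print(diag_product(A, B))
print(xtwx([[1, 2], [3, 4]], [2, 1]))
```

Both trace functions return the same number; the second simply skips the off-diagonal entries that the first computes and throws away.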

  2. Become memory-conscious. You care about looping order. You benchmark hot functions fanatically to make sure they are not allocating.

    Image source: https://www.independent.co.uk/news/health/memory-loss-alzheimers-disease-age-of-8-university-college-london-a9178631.html

  3. No inversion mentality. Whenever you see a matrix inverse in a mathematical expression, your brain reacts with matrix decompositions, iterative solvers, etc. For R users, that means you almost never use the solve() function to compute an explicit inverse.

    Examples: $(\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T \mathbf{y}$, $\mathbf{y}^T \boldsymbol{\Sigma}^{-1} \mathbf{y}$, Newton-Raphson algorithm, ...

    Image source: https://www.yogajournal.com/practice/inversion-inquiry
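
As a sketch of what "reacting with a decomposition" can look like, the plain-Python code below evaluates $\mathbf{y}^T \boldsymbol{\Sigma}^{-1} \mathbf{y}$ and the least squares solution through a Cholesky factor and triangular solves, never forming an inverse. All function names are hypothetical, and the code assumes $\boldsymbol{\Sigma}$ and $\boldsymbol{X}^T \boldsymbol{X}$ are symmetric positive definite. In practice one would call a tuned linear algebra library rather than hand-written loops.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward_solve(L, z):
    """Solve L^T x = z using the lower-triangular factor L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def quadratic_form_inverse(Sigma, y):
    """y^T Sigma^{-1} y = ||L^{-1} y||^2, via one triangular solve."""
    z = forward_solve(cholesky(Sigma), y)
    return sum(v * v for v in z)

def least_squares(X, y):
    """Solve the normal equations X^T X beta = X^T y by Cholesky."""
    p = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    L = cholesky(gram)
    return backward_solve(L, forward_solve(L, rhs))

print(quadratic_form_inverse([[4.0, 2.0], [2.0, 3.0]], [2.0, 1.0]))
print(least_squares([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0]))
```

Cholesky on the normal equations is only one choice. It squares the condition number of $\boldsymbol{X}$, so a QR decomposition of $\boldsymbol{X}$ is often preferred when $\boldsymbol{X}$ is ill-conditioned.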

  4. Know some basic strategies to solve big data problems.

    Examples: how Google solves the PageRank problem with $10^{9}$ webpages, linear regression with $10^7$ observations, etc.
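
One textbook strategy for the PageRank example is power iteration on a sparse link structure. The sketch below is an illustration under stated assumptions, not a description of Google's production system: the function name, the adjacency-list input format, the damping factor 0.85, and the stopping rule are all choices made here. Each sweep costs time proportional to the number of links, and the dense $n \times n$ transition matrix is never stored.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration; out_links[i] lists the pages that page i links to."""
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if not out_links[i])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = [base] * n
        # Each page passes an equal share of its damped rank to its targets.
        for i, targets in enumerate(out_links):
            if targets:
                share = damping * rank[i] / len(targets)
                for j in targets:
                    new_rank[j] += share
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:  # stop once the L1 change between sweeps is tiny
            break
    return rank

# Page 0 links to pages 1 and 2, page 1 links to 2, page 2 links back to 0.
print(pagerank([[1, 2], [2], [0]]))
```

Because only the adjacency lists and two rank vectors are held in memory, the same loop structure carries over to graphs whose dense matrix would not fit in memory.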

  5. Do not be afraid of optimization; treat it as a technology. Be able to recognize the major classes of optimization problems and choose the best solver(s) accordingly.

  6. Be immune to the language fight.

Course logistics