There are some problems with linear regression. It can only capture first-order relationships, but when the signal to noise ratio is .05:1, then there’s not much point in worrying about that. Another problem is that it’s slow if you do it the typical way. Most people will just use Matlab’s backslash operator \, or R’s lm() function. This itself isn’t slow, in fact it’s one of the most highly optimized algorithms out there. The problem is when you’re backtesting and you have to recompute the entire linear regression at every single time step. It’s especially bad when you’re testing multiple values for parameters and have to run it a bunch of times, once for each parameter.

Because this is slow, people resort to hacks like letting the algo look into the future and calculate the regression coeffients on the whole dataset at once, or they avoid crossvalidating parameters (since it’s so slow) and instead overfit them by hand.

A better solution is to use an online algorithm for calculating the regression coefficients. Online algorithms take advantage of the fact that all but one data point (the new observation) is the same as the last time you calculated the regression. The updated regression can be expressed mathematically as combination of the old coefficients and just the single new point. If you have N data points total, this means the number of operations has gone from O(n) to O(1). Now there is no penalty in doing a fully-realistic backtest.

Another good thing about the online setting is that it’s more amenable to adaptive algorithms for machine learning. **It’s one of the most under-emphasized facts about linear regression that weighting the points correctly is crucial. **First of all you should typically weight recent points more than old points, and secondly you must weight points inversely to their volatility/variance. The first is more of a stylized fact about financial modeling. The second is a basic requirement to unbias linear regression when variables are heteroscedastic.

Here’s the difference between an exponentially weighted linear regression and one that’s not:

(This is a plot of the adaptiveness of the regression coefficients to changes in the true input-output relationship. The exponentially-weighted regression are the empty circles and the equal-weighted are the plus-filled-circles. Clearly the equal-weighted model has a lag time in reaching the true coefficients, although it is more stable.)

Online linear regression is very simple. I think it’s simpler than the typical matrix pseudo-inverse batch formulation.

## Pseudo-inverse batch formulation

1. Start with prior estimates of the covariance of the signal and returns and the variance of the signal. These don’t need to be accurate, hopefully within a factor of 10, although 0 is also an ok general prior estimate. Also choose an amount to weight new points. weight=.5 will make new points a lot more influential than weight=.05

2. When you get a new data point, update the covariance with the formula:

covariance += weight*(signal*return – covariance)

Note that we are assuming signal and return have a mean over time of 0, which we can do since returns cannot be far from mean 0.

Update the variance of the signal with the formula:

variance_signal += weight*(signal*signal – variance_signal)

3. Calculate the new regression coefficient:

coef = covariance/variance_signal

Here’s the R code that generated the plot above. It also demonstrates the equality of this custom online algo formulation and R’s canonical weighted lm() implementation. To show the effect of adaptiveness, the signal (x) is a sine function and the returns (y) are a noisy sine function that is only > 0, so for half the phase the returns are white noise and uncorrelated to the signal. (One issue with this example is that the mean of the returns are nonzero- however it still works well enough):

points = 100 window = 20 in_noise = 0.0 out_noise = 0.1 trend = as.matrix(sin((1:points)/10)) # when trend <= trend_visible_above, the y-series will be pure white noise trend_visible_above = 0 x = trend+rnorm(points, m=0, sd=in_noise) y = trend*(trend>trend_visible_above)+rnorm(points, m=0, sd=out_noise) weights = rep(1/window,window) coef_flat = sapply(1:(points-window), function(t)lm.wfit(as.matrix(x[t:(t+window-1)]), as.matrix(y[t:(t+window-1)]), weights)$coefficients[[1]]) decay = 1/2 weights = rev(cumprod(rep(decay,window))) coef_batch = sapply(1:(points-window), function(t)lm.wfit(as.matrix(x[t:(t+window-1)]), as.matrix(y[t:(t+window-1)]), weights)$coefficients[[1]]) # initialize exponential covariance and variance ecov_xy = .5 evar_x = .5 online_linreg = function(t){evar_x<<-evar_x+decay*(x[t]*x[t]-evar_x);ecov_xy<<-ecov_xy+decay*(x[t]*y[t]-ecov_xy); ecov_xy/evar_x} coef_online = sapply(1:points, online_linreg)[window:points] # true coefficients plot(coef_online, main="Exponentially-Weighted vs Flat Regression Adaptiveness", xlab="Time Index", ylab="Regression Coefficient") points(coef_flat, pch=10) # same as coef_online #points(coef_batch, pch=14) points(trend[window:points]>0, pch=20)