I had gotten there by a long search that had gone from machine learning, to fast Kalman filters, to Bayesian conjugate linear regression, to representing uncertainty in the covariance using an inverse Wishart prior, to making it time-varying, and allowing heteroscedasticity. I was thinking this through with my koukio friends, and this paper had all the pieces of an algorithm that I think is suited for forecasting intraday returns from signals.

## The features I like about the algorithm:

- Analytic: does not require sampling, which is very slow; each update is just a couple of matrix multiplications.
- Online: updates parameter values at every single timestep. Offline/batch learning algorithms are typically too slow to retrain at every timestep in a backtest, which forces you to make sacrifices.
- Multivariate: can’t do without this.
- Adaptive regression coefficients: signals are weighted higher or lower depending on recent performance.
- Completely forgetful: every part is adaptive, unlike some learners that are partially adaptive and partially sticky.
- Adaptive variance: standard linear regression is inefficient, and its error estimates are biased, if the inputs are heteroscedastic, unless you reweight input points by the inverse of their variance.
- Adaptive input correlations: in case signals become collinear.
- Estimates prediction error: outputs the estimated variance of the prediction so you can choose how much you want to trust it. This estimate is interpretable unlike some of the heuristic approaches to extracting confidence intervals from other learners.
- Interpretable internals: every internal variable is understandable. Looking at the internals clearly explains what the model has learned.
- Uni- or multi-variate: specializes or generalizes naturally. The only modification required is increasing or decreasing the number of parameters; the values don’t need to change. Ten input variables work as well as two, which work as well as one.
- Interpretable parameters: parameters can easily be set based on a priori knowledge, not exclusively by cross-validation. The author re-parameterized the model to make it as intuitive as possible: the parameters are basically exponential discount factors.
- Minimal parameters: it has just the right number of parameters to constitute a useful family but no extra parameters that are rarely useful or unintuitive.
- Objective priors: incorrect starting values for internal variables do not bias predictions for a long “burn-in” period.
- Emphasis on first-order effects: alpha is a first-order effect. Higher order effects are nice but alpha is so hard to detect that they are not worth the extra parameters and wasted datapoints.
- Bayesian: with a Bayesian model you understand all the assumptions going into your model.
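To make several of these points concrete (analytic updates, online operation, discount-factor forgetting, and a predictive variance), here is a minimal sketch of an online Bayesian linear regression with exponential forgetting. The class name, parameter names, and update structure are my own illustration of the general technique, not the paper's exact algorithm.

```python
import numpy as np

class OnlineBayesianRegression:
    """Sketch: online Bayesian linear regression with exponential
    forgetting on both the coefficients and the noise variance.
    (Illustrative only; not the paper's exact parameterization.)"""

    def __init__(self, n_features, coef_discount=0.99, var_discount=0.97):
        self.m = np.zeros(n_features)        # posterior mean of coefficients
        self.P = np.eye(n_features) * 1e6    # posterior covariance (diffuse prior)
        self.delta = coef_discount           # discount factor for coefficients
        self.lam = var_discount              # discount factor for noise variance
        self.s2 = 1.0                        # running noise-variance estimate

    def predict(self, x):
        """Predictive mean and variance: parameter + observation uncertainty."""
        mean = x @ self.m
        var = self.s2 + x @ self.P @ x
        return mean, var

    def update(self, x, y):
        """One analytic update: a few matrix-vector products, no sampling."""
        P = self.P / self.delta              # inflate covariance = forget old evidence
        err = y - x @ self.m                 # one-step prediction error
        S = self.s2 + x @ P @ x              # predictive variance of y
        K = P @ x / S                        # Kalman-style gain
        self.m = self.m + K * err
        self.P = P - np.outer(K, x @ P)
        # Exponentially weighted noise-variance estimate
        self.s2 = self.lam * self.s2 + (1 - self.lam) * err**2
```

With a stationary data-generating process the coefficients converge quickly; because every part forgets exponentially, the same code tracks drifting coefficients and changing noise levels without retraining, and `predict` tells you how much to trust each forecast.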