Machine Learning Algorithms for Trading

Lesson 1: How Machine Learning is used at a hedge fund

Introduce problem early
Overview of use and backtesting
- Out of sample
- Roll forward cross validation
Methods
- Linear regression
- KNN regression
- Decision trees / Random Forest regression (considering dropping)
Quiz: which algorithm makes most sense here?
Supervised ML (intent is that the treatment here is light)
- Use: Regression
- Use: Classification
- Model type: Parametric
- Model type: Instance-based
Quiz: What's the next point?
Problems with regression for finance
- Hint at reinforcement learning
Introduce the problem we will focus on in the rest of the class, namely:
- Example data: will train on one particular year (2012)
- Will test on the next two years (2013 and 2014)
- It will be "easy" data that has obvious patterns
- You will create trades.txt and run them through your backtester

[note: need to create fake stock data that has embedded patterns]

Overview of how it fits into overall trading process
Definition of the problem 1
- Black box diagram
- training: Xtrain, Ytrain
- using: Query with X
Definition of the problem 2: APIs
- constructor
- addEvidence(X,Y)
- query(X)
How to implement linear regression
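
A minimal sketch of a linear regression learner behind the constructor / addEvidence(X,Y) / query(X) API listed above. The class name LinRegLearner, the use of NumPy's least-squares solver, and the toy data are assumptions for illustration, not the course's reference implementation.

```python
import numpy as np


class LinRegLearner:
    """Parametric learner: fits Y ~ X @ coefs (plus intercept) by least squares."""

    def __init__(self):
        self.coefs = None  # set by addEvidence

    def addEvidence(self, X, Y):
        """Train on Xtrain with shape (n_samples, n_features) and Ytrain (n_samples,)."""
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        # Append a column of ones so the last coefficient acts as the intercept.
        A = np.hstack([X, np.ones((X.shape[0], 1))])
        self.coefs, *_ = np.linalg.lstsq(A, Y, rcond=None)

    def query(self, X):
        """Return one prediction per row of X."""
        X = np.asarray(X, dtype=float)
        A = np.hstack([X, np.ones((X.shape[0], 1))])
        return A @ self.coefs


# Toy data (hypothetical): Y = 2 * X + 1 exactly.
Xtrain = np.array([[0.0], [1.0], [2.0], [3.0]])
Ytrain = np.array([1.0, 3.0, 5.0, 7.0])

learner = LinRegLearner()
learner.addEvidence(Xtrain, Ytrain)
pred = learner.query(np.array([[4.0]]))
```

After training, the learner keeps only the fitted coefficients; the training data itself can be discarded, which is what makes the model parametric.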

Now that we have two (LinReg and KNN), let's compare them
- Pros and cons of LinReg versus KNN
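
For comparison, a sketch of a KNN regression learner exposing the same constructor / addEvidence / query API. The class name KNNLearner, the default k=3, Euclidean distance, and the unweighted mean of the neighbors are all assumptions chosen for illustration.

```python
import numpy as np


class KNNLearner:
    """Instance-based learner: stores the training data and averages the k nearest Y values."""

    def __init__(self, k=3):
        self.k = k
        self.X = None
        self.Y = None

    def addEvidence(self, X, Y):
        """'Training' only stores the data, so it is nearly free."""
        self.X = np.asarray(X, dtype=float)
        self.Y = np.asarray(Y, dtype=float)

    def query(self, X):
        """Return the mean Y of the k nearest training points for each query row."""
        X = np.asarray(X, dtype=float)
        # Pairwise Euclidean distances, shape (n_queries, n_train).
        dists = np.sqrt(((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2))
        nearest = np.argsort(dists, axis=1)[:, : self.k]
        return self.Y[nearest].mean(axis=1)


# Toy data (hypothetical).
knn_X = np.array([[0.0], [1.0], [2.0], [10.0]])
knn_Y = np.array([0.0, 1.0, 2.0, 10.0])

knn = KNNLearner(k=2)
knn.addEvidence(knn_X, knn_Y)
knn_pred = knn.query(np.array([[0.4], [9.0]]))
```

One contrast this makes visible: LinReg does its work at training time and queries cheaply, while KNN trains instantly but must keep and search all the data on every query.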
Cross validation, roll forward cross validation
- Use all data versus most recent data
- Online learning
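
Roll forward cross validation can be sketched as an index generator in which every training window ends before its test window begins. The function name roll_forward_splits and its parameters are hypothetical; the expanding flag is one way to express "use all data" versus "use most recent data".

```python
def roll_forward_splits(n_samples, train_size, test_size, expanding=False):
    """Yield (train_indices, test_indices) pairs in time order.

    expanding=False: train on the most recent train_size samples only.
    expanding=True:  train on all data seen so far.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train_begin = 0 if expanding else start
        train_idx = list(range(train_begin, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        start += test_size  # roll the window forward by one test period


# Hypothetical example: 10 time-ordered samples.
splits = list(roll_forward_splits(10, train_size=4, test_size=2))
expanding_splits = list(roll_forward_splits(10, train_size=4, test_size=2, expanding=True))
```

Unlike ordinary shuffled cross validation, no split ever trains on data from after its test period, which is the property that matters for backtesting.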
How long it takes to learn versus query
Batch versus online
RMS error
Scatterplot predict vs actual
Correlation coefficient (corrcoef)
Overfitting
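
The two evaluation measures above, RMS error and the correlation between predicted and actual values, can be sketched as follows. The function names and the toy arrays are assumptions for illustration; np.corrcoef is NumPy's correlation-matrix function.

```python
import numpy as np


def rms_error(predicted, actual):
    """Root mean squared difference between predictions and actual values."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))


def correlation(predicted, actual):
    """Pearson correlation between predictions and actual values."""
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation.
    return np.corrcoef(predicted, actual)[0, 1]


# Toy data (hypothetical): every prediction is off by the same constant 0.5.
actual = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.5, 2.5, 3.5, 4.5])

err = rms_error(predicted, actual)
corr = correlation(predicted, actual)
```

In this toy case the correlation is perfect while the RMS error is not zero, so the two measures answer different questions. Computing both on the training data and on out-of-sample data is one way to spot overfitting: in-sample error keeps falling while out-of-sample error rises.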