MC3-Project-2
Updates / FAQs
Overview
In this project you will transform your regression learner into a stock trading strategy. Overall, you should follow these steps:
- Train a regression learner (KNN, LinReg, or another of your choice, with or without bagging) on data from 2008 to 2009. This is your in-sample training data.
- For your X values: Identify and implement at least 3 technical features that you believe may be predictive of future return. You should implement them so they output values typically ranging from -1.0 to 1.0. This will help avoid the situation where one feature overwhelms the results. See a few formulae below.
- For your Y values: Don't use price; use the future 5-day return. Remember you're trying to PREDICT the future.
- Create a plot that illustrates your training Y values in one color, price in another color and your model's PREDICTED Y in a third color. With this chart we should be able to see how well your learner performs and that your Y values are shifted back 5 days. You may find it convenient to zoom in on a particular time period so this is evident.
- Create a trading policy based on what your learner predicts for future return. As an example you might choose to buy when the forecaster predicts the price will go up more than 1%, then hold for 5 days.
- Create a plot that illustrates entry and exits as vertical lines on a price chart for the in sample period 2008-2009. Show long entries as green lines, short entries as red lines and exits as black lines. You may find it convenient to zoom in on a particular time period so this is evident.
- Now use your code to generate orders and run those orders through your market simulator. Create a chart of this backtest. It should do VERY well for the in sample period 2008-2009.
- Freeze your model based on the 2008-2009 data. Now test it on the year 2010: plot the entries and exits, generate trades, run them through the simulator, and chart the backtest.
Perform the above steps first using the data in ML4T-399.csv. Once you've validated success (it should work well), repeat using IBM data over the same dates. Remember 2008-2009 is training, 2010 is testing. You should have one set of charts for each symbol.
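The trading-policy step above can be sketched as follows. This is only one possible policy, not a required one: it enters long when the predicted 5-day return exceeds 1% and holds for 5 trading days, as in the example given in the steps. The short entry on predictions below -1%, the fixed size of 100 shares, the function name `generate_orders`, and the `Date`/`Symbol`/`Order`/`Shares` column names are all assumptions made for illustration; adapt them to whatever your market simulator expects.

```python
import pandas as pd

def generate_orders(pred, symbol, threshold=0.01, hold_days=5, shares=100):
    """Turn predicted 5-day returns into entry/exit orders (illustrative policy)."""
    orders = []
    i, n = 0, len(pred)
    while i < n:
        p = pred.iloc[i]
        if p > threshold:
            entry, exit_ = "BUY", "SELL"      # long entry, later sold
        elif p < -threshold:
            entry, exit_ = "SELL", "BUY"      # short entry, later covered
        else:
            i += 1                            # no signal (or NaN): stay flat
            continue
        exit_i = i + hold_days
        if exit_i >= n:
            break                             # not enough days left to close the trade
        orders.append((pred.index[i], symbol, entry, shares))
        orders.append((pred.index[exit_i], symbol, exit_, shares))
        i = exit_i + 1                        # look for the next entry after the exit
    return pd.DataFrame(orders, columns=["Date", "Symbol", "Order", "Shares"])

# Made-up predictions, used only to show the shape of the output.
dates = pd.date_range("2008-01-01", periods=14, freq="B")
pred = pd.Series([0.02, 0, 0, 0, 0, 0, -0.03, 0, 0, 0, 0, 0, 0.0, 0.05], index=dates)
orders = generate_orders(pred, "IBM")
```

The entry and exit dates in the resulting table are also the dates you would draw as vertical lines on the entries/exits charts.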
Summary of Plots To Create
- Training Y/Price/Predicted Y: Create a plot that illustrates your training Y values in one color, price in another color and PREDICTED Y in a third color. With this chart we should be able to see how well your learner performs and that your Y values are shifted back 5 days. You may find it convenient to zoom in on a particular time period so this is evident.
- Sine Data In Sample Entries/Exits: Create a plot that illustrates entry and exits as vertical lines on a price chart for the in sample period 2008-2009. Show long entries as green lines, short entries as red lines and exits as black lines. You may find it convenient to zoom in on a particular time period so this is evident.
- Sine Data In Sample Backtest
- Sine Data Out of Sample Entries/Exits: Freeze your model based on the 2008-2009 data, then test it on the year 2010 and plot the entries and exits.
- Sine Data Out of Sample Backtest
- IBM Data In Sample Entries/Exits: Create a plot that illustrates entry and exits as vertical lines on a price chart for the in sample period 2008-2009. Show long entries as green lines, short entries as red lines and exits as black lines. You may find it convenient to zoom in on a particular time period so this is evident.
- IBM Data In Sample Backtest
- IBM Data Out of Sample Entries/Exits: Freeze your model based on the 2008-2009 data, then test it on the year 2010 and plot the entries and exits.
- IBM Data Out of Sample Backtest
Template and Data
You will use data in the ML4T/Data directory, in particular the files named ML4T-XXX.csv, where XXX are digits.
Choosing Technical Features -- Your X Values
You should have already successfully coded the Bollinger Band feature. Here's a suggestion of how to normalize that feature so that it will typically provide values between -1.0 and 1.0:
bb_value[t] = (price[t] - SMA[t])/(2 * stdev[t])
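As a minimal sketch, the formula above can be computed with pandas rolling windows. The 20-day window and the function name `bollinger_value` are assumptions chosen for illustration; the synthetic sine-wave prices are made up and only stand in for real data.

```python
import numpy as np
import pandas as pd

def bollinger_value(price: pd.Series, window: int = 20) -> pd.Series:
    """Normalized Bollinger Band feature: (price - SMA) / (2 * stdev)."""
    sma = price.rolling(window).mean()     # simple moving average
    stdev = price.rolling(window).std()    # rolling sample standard deviation
    return (price - sma) / (2.0 * stdev)

# Synthetic sine-wave prices, for illustration only.
dates = pd.date_range("2008-01-01", periods=120, freq="B")
price = pd.Series(100.0 + 10.0 * np.sin(np.arange(120) / 5.0), index=dates)
bb = bollinger_value(price)
```

The first `window - 1` values are NaN because the rolling window is not yet full, so those rows need to be dropped before training.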
Two other good features worth considering are momentum and volatility.
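One common way to define these two features is sketched below; the page does not prescribe exact formulas, so treat the definitions, the window lengths, and the function names as assumptions. Here momentum is the N-day return and volatility is the rolling standard deviation of daily returns.

```python
import numpy as np
import pandas as pd

def momentum(price: pd.Series, window: int = 5) -> pd.Series:
    """N-day momentum: price[t] / price[t - window] - 1."""
    return price / price.shift(window) - 1.0

def volatility(price: pd.Series, window: int = 20) -> pd.Series:
    """Rolling standard deviation of daily returns."""
    daily_ret = price.pct_change()
    return daily_ret.rolling(window).std()

# Synthetic sine-wave prices, for illustration only.
dates = pd.date_range("2008-01-01", periods=60, freq="B")
price = pd.Series(100.0 + 10.0 * np.sin(np.arange(60) / 5.0), index=dates)
mom = momentum(price)
vol = volatility(price)
```

Neither feature is scaled to roughly -1.0 to 1.0 as written. One option, again an assumption, is to divide each by a multiple of its standard deviation measured on the in-sample period only.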
Choosing Y
Your code should predict the 5-day change in price. You need to build a new Y that reflects the 5-day change and aligns with the current date. Here's pseudocode for the calculation of Y:
Y[t] = (price[t+5]/price[t]) - 1.0
If you select Y in this manner and use it for training, your learner will predict 5 day returns.
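In pandas, the pseudocode for Y can be sketched with a negative shift, which pulls the price from 5 rows ahead back onto the current date. The function name `future_return` and the toy price series are assumptions for illustration.

```python
import pandas as pd

def future_return(price: pd.Series, horizon: int = 5) -> pd.Series:
    """Y[t] = price[t + horizon] / price[t] - 1, aligned with date t."""
    return price.shift(-horizon) / price - 1.0

# Toy prices, for illustration only.
dates = pd.date_range("2008-01-01", periods=8, freq="B")
price = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0], index=dates)
y = future_return(price)
```

The last `horizon` rows of Y are NaN because their future prices are not known yet; drop those rows (and the matching X rows) before training.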
Contents of Report
- Your report should be no more than 6 pages long. Use 1" margins and no smaller than 10 point font. Your report should contain no more than 8 charts. Penalties will apply if you violate these constraints.
- Include the charts listed in the overview section above.
- Describe each of the indicators you have selected in enough detail that someone else could reproduce them in code.
- Describe your trading policy clearly.
Hints & resources
What to turn in
Turn your project in via t-square.
- Your report as report.pdf
- Your code as code.py
Extra credit up to 3%
Extend your code to simultaneously forecast all the members of the S&P 500. Generate trades accordingly, and backtest the result.