Difference between revisions of "Classification Trader Hints"

From Quantitative Analysis Software Courses
 
 
  
 

Revision as of 23:54, 21 November 2017

Overview

You will use your Random Tree learner to train and test a learning-based trading algorithm. Here are some ideas (gathered from a previous project) that you might find helpful if you are going to use a classification or regression learner for your trader.

ML Trader

Convert your decision tree regression learner into a classification learner. The classifications should be:

  • +1: LONG
  • 0: CASH
  • -1: SHORT
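One possible way to make this conversion, assuming your regression tree currently stores the mean of the Y values that reach each leaf (an assumption about your implementation), is to store the most common label instead. The helper name `leaf_value` below is hypothetical; this is only a minimal sketch.

```python
import numpy as np

def leaf_value(y):
    """Return the most common label in y.

    Ties go to the smallest label, because np.unique returns sorted
    labels and np.argmax picks the first maximum.
    """
    labels, counts = np.unique(np.asarray(y), return_counts=True)
    return labels[np.argmax(counts)]

# Labels of the training samples that ended up in one leaf.
print(leaf_value([1, 1, 0, -1, 1]))
```

With a mean, a leaf holding the labels +1, +1 and -1 would predict about 0.33, which is not a valid class; the mode keeps every prediction in {-1, 0, +1}.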

The X data for each sample (day) are simply the values of your indicators for the stock -- you should have 3 to 5 of them. The Y data (or classifications) will be based on N day return (your choice for N). You should classify the example as a +1 or "LONG" if the N day return exceeds a certain value, let's call it YBUY for the moment. You should classify the example as a -1 or "SHORT" if the N day return is below a certain value we'll call YSELL. In all other cases the sample should be classified as a 0 or "CASH." Note that it is very important that you train your learner with these classification values (not the N day returns).

Note that your X values are calculated each day from the current day's (and earlier) data, but the Y value (classification) is calculated using data from the future. You may tweak various parameters of your learner to maximize return (more on that below). Train and test your learning strategy over the in sample period.

Important note: You must set the leaf_size parameter of your decision tree learner to 5 or larger. This requirement is intended to avoid a degenerate overfit solution to this problem.

You should tweak the parameters of your learner to maximize performance during the in sample period. Here is a partial list of things you can tweak:

  • Adjust YSELL and YBUY.
  • Adjust leaf_size.
  • Utilize bagging and adjust the number of bags.
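For the bagging option in the list above, a classification ensemble usually combines its bags by voting rather than averaging. The sketch below assumes each bag has already produced an array of labels; the function names and the tie-breaking order are illustrative choices, not part of the assignment.

```python
import numpy as np

def bootstrap_indices(n_samples, rng):
    """Sample n_samples row indices with replacement (one bag)."""
    return rng.integers(0, n_samples, size=n_samples)

def majority_vote(bag_predictions):
    """Combine label predictions of shape (n_bags, n_samples) by voting."""
    preds = np.asarray(bag_predictions)
    # Ties resolve to whichever class appears first in `classes`.
    classes = np.array([0, -1, 1])
    votes = np.stack([(preds == c).sum(axis=0) for c in classes])
    return classes[np.argmax(votes, axis=0)]

rng = np.random.default_rng(0)
idx = bootstrap_indices(5, rng)  # row indices for training one bag

# Three bags, three samples each.
combined = majority_vote([[1, 0, -1],
                          [1, 0, 0],
                          [0, 1, -1]])
```

Each bag would be trained on `X[idx]` and `Y[idx]` for its own `idx`, and the number of bags is then one more parameter to adjust.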

Hints

Overall, I recommend the following steps in the creation of your strategies:

  • Indicator design hints:
    • For your X values: Identify and implement at least 3 technical features that you believe may be predictive of future return.
  • Rule based design:
    • Use a cascade of if statements conditioned on the indicators to identify whether a LONG condition is met.
    • Use a cascade of if statements conditioned on the indicators to identify whether a SHORT condition is met.
    • The conditions for LONG and SHORT should be mutually exclusive.
    • If neither LONG nor SHORT is triggered, the result should be CASH (do nothing).
    • For debugging purposes, you may find it helpful to plot the value of the rule-based output (-1, 0, 1) versus the stock price.
  • Train a classification learner on in sample training data:
    • For your Y values: Use future N day return (not future price); earlier versions of this project fixed N at 21. Then classify that return as LONG, SHORT or CASH. You're trying to predict a relative change that you can use to invest with.
    • For debugging purposes, you may find it helpful to plot the value of the training classification data (-1, 0, 1) versus the stock price in one color.
    • For debugging purposes, you may find it helpful to plot the value of the training classification output (-1, 0, 1) versus the stock price in another color. Ideally, these two lines should be very similar.
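The rule-based design in the list above can be sketched as a short cascade of if statements. The two indicators used here and every threshold are hypothetical and chosen only for illustration; you would substitute your own indicators and tune the cutoffs.

```python
def rule_based_signal(bb_value, momentum):
    """Return +1 (LONG), -1 (SHORT) or 0 (CASH) from two indicator values."""
    # Hypothetical LONG rule: price well below its band while momentum is positive.
    if bb_value < -1.0 and momentum > 0.0:
        return 1
    # Hypothetical SHORT rule: price well above its band while momentum is negative.
    if bb_value > 1.0 and momentum < 0.0:
        return -1
    # Neither condition met.
    return 0

signals = [rule_based_signal(b, m)
           for b, m in [(-1.5, 0.03), (1.4, -0.02), (0.2, 0.01)]]
```

Because the LONG rule requires `bb_value < -1.0` and the SHORT rule requires `bb_value > 1.0`, the two conditions cannot both hold, which keeps them mutually exclusive.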

Choosing Technical Features -- Your X Values

You should have already successfully coded the Bollinger Band feature:

bb_value[t] = (price[t] - SMA[t])/(stdev[t])
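As a reminder of how this feature might look with pandas, here is a minimal sketch. It assumes `price` is a pandas Series of daily prices and that SMA and stdev are taken over the same rolling window; the 20 day default is an assumption, not a requirement.

```python
import pandas as pd

def bollinger_value(price, window=20):
    """bb_value[t] = (price[t] - SMA[t]) / stdev[t] over a rolling window."""
    sma = price.rolling(window).mean()
    stdev = price.rolling(window).std()  # pandas uses the sample stdev (ddof=1)
    return (price - sma) / stdev

price = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
bb = bollinger_value(price, window=3)
```

The first `window - 1` values are NaN because the rolling window is not yet full, so those rows need to be dropped or handled before training.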

Two other good features worth considering are momentum and volatility.

momentum[t] = (price[t]/price[t-N]) - 1

Volatility is just the stdev of daily returns.

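You still need to standardize the resulting values (for example, subtract the mean and divide by the standard deviation) so that the features are on comparable scales.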
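The momentum and volatility features, plus one common form of standardization, might be sketched as follows. The window lengths and function names are assumptions for illustration, and `price` is assumed to be a pandas Series of daily prices.

```python
import pandas as pd

def momentum(price, n=10):
    """momentum[t] = price[t] / price[t-n] - 1."""
    return price / price.shift(n) - 1.0

def volatility(price, window=20):
    """Rolling standard deviation of daily returns."""
    daily_returns = price / price.shift(1) - 1.0
    return daily_returns.rolling(window).std()

def standardize(values):
    """Rescale to zero mean and unit standard deviation (NaNs are skipped)."""
    return (values - values.mean()) / values.std()

price = pd.Series([100.0, 110.0, 121.0, 133.1])
mom = momentum(price, n=1)          # about 0.10 each day after the first
vol = volatility(price, window=2)   # about 0 because returns are constant
z = standardize(pd.Series([1.0, 2.0, 3.0]))
```

One design choice to consider: if you compute the mean and standard deviation on the in sample period and reuse them later, the standardization itself does not draw on later data.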

Choosing Y

Your code should classify based on the N day change in price; the example here uses N = 21. You need to build a new Y that reflects that future change and aligns with the current date. Here is pseudocode for the calculation of Y:

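# price: prices indexed by day; t: index of the current day
# YBUY, YSELL: your chosen thresholds; Y: array of labels
ret = (price[t + 21] / price[t]) - 1.0  # future 21 day return
if ret > YBUY:
    Y[t] = +1  # LONG
elif ret < YSELL:
    Y[t] = -1  # SHORT
else:
    Y[t] = 0   # CASH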

If you select Y in this manner and use it for training, your learner will classify 21 day returns.
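The same calculation can be done for every day at once. The sketch below assumes `price` is a pandas Series of daily prices; the function name and the threshold values are hypothetical and only illustrate the idea.

```python
import pandas as pd

def make_labels(price, n_days=21, ybuy=0.02, ysell=-0.02):
    """Label each day +1 (LONG), -1 (SHORT) or 0 (CASH) from its future return."""
    # shift(-n_days) aligns the price n_days ahead with the current date.
    future_return = price.shift(-n_days) / price - 1.0
    labels = pd.Series(0, index=price.index, dtype=int)
    labels[future_return > ybuy] = 1
    labels[future_return < ysell] = -1
    # The last n_days rows have no future price, so drop them.
    return labels[future_return.notna()]

price = pd.Series([100.0, 100.0, 100.0, 110.0, 90.0, 100.0])
labels = make_labels(price, n_days=1)
```

Note that the last `n_days` days cannot be labeled, so the matching rows of X should be dropped before training as well.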

Legacy