Classification Trader Hints
Overview
You will use your Random Tree learner to train and test a learning-based trading algorithm. Here are some ideas (gathered from a previous project) that you might find helpful if you are going to use a classification or regression learner for your trader.
Q & A
- Q: In a previous project there was a constraint of holding a single position until exit. Does that apply to this project? A: Yes, hold one position until exit.
- Q: Is that 21 calendar days, or 21 trading days (i.e., days when SPY was traded)? A: Always use trading days.
- Q: Are there constraints for Python modules allowed for this project? Can we experiment with modules for optimization or technical analysis and cite or are we expected to write everything from scratch for this project as well? A: The constraints are the same as for the first learning project. You've already written the learners you need.
- Q: Are we required to trade in only 200 share blocks (and have no more than 200 shares long or short at a time, as in some of the previous assignments)? A: (Updated) You can trade up to 400 shares at a time, as long as your resulting position is always 200, 0 or -200 shares. This makes it easier to compare results.
- Q: Are we limited to leverage of 2.0 on the portfolio? A: There is no limit on leverage.
- Q: Are we only allowed one position at a time? A: You can be in one of three states: -200 shares, +200 shares, 0 shares.
Technical Indicators
Develop and describe at least 3 and at most 5 technical indicators. You may find our lecture on time series processing to be helpful. To check your work, for each indicator you should create a single chart that shows the price history of the stock during the in-sample period, "helper data" and the value of the indicator itself. As an example, if you were using price/SMA as an indicator you would want to create a chart with 3 lines: Price, SMA, Price/SMA. In order to facilitate visualization of the indicator you can normalize the data to 1.0 at the start of the date range (i.e. divide price[t] by price[0]).
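As a rough illustration of the price/SMA example above, the three chart lines could be computed like this. This is only a sketch: the function name `price_sma_chart_data`, the window length and the sample prices are assumptions made for illustration, not part of the assignment.

```python
import pandas as pd

def price_sma_chart_data(prices: pd.Series, window: int = 20) -> pd.DataFrame:
    """Return the three series for a Price / SMA / Price-over-SMA chart."""
    normed = prices / prices.iloc[0]       # normalize price to 1.0 at the start
    sma = normed.rolling(window).mean()    # helper data: simple moving average
    return pd.DataFrame({"Price": normed, "SMA": sma, "Price/SMA": normed / sma})

# Hypothetical prices, only to show the shape of the result.
prices = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0])
chart = price_sma_chart_data(prices, window=3)
```

The first `window - 1` rows of the SMA and Price/SMA columns are NaN because the moving average is not yet defined there.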
You should "standardize" or "normalize" your indicators so that they have zero mean and standard deviation 1.0. One way to do this is the standard score transformation as described here: https://en.wikipedia.org/wiki/Standard_score . This transformation will help ensure that all of your indicators are considered with equal importance by your learner.
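A minimal sketch of the standard score transformation, assuming the indicator values are held in a NumPy-compatible array. The function name `standardize` is hypothetical; NaN entries (for example, the warm-up period of a rolling window) are ignored when computing the mean and standard deviation.

```python
import numpy as np

def standardize(values):
    """Standard score: subtract the mean, then divide by the standard deviation."""
    values = np.asarray(values, dtype=float)
    mean = np.nanmean(values)   # NaN-aware mean
    std = np.nanstd(values)     # NaN-aware (population) standard deviation
    return (values - mean) / std

z = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
```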
Best Possible Strategy
In order to make sure your results are within the realm of possibility, try this.
Assume that you can see the future, but that you are constrained by the portfolio size and order limits as specified above. Create a set of trades that represents the best a strategy could possibly do during the in sample period. The holding time requirements described in the next sections do not apply to this exercise. The reason we're having you do this is so that you will have an idea of an upper bound on performance.
The intent is for you to use adjusted close prices with the market simulator that you wrote earlier in the course.
Create a chart that reports:
- Benchmark (see definition above) normalized to 1.0 at the start: Black line
- Value of the best possible portfolio (normalized to 1.0 at the start): Blue line
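One way to think about the best possible strategy: if you can see tomorrow's price and may only hold +200, 0 or -200 shares, you would be long whenever the price is about to rise and short whenever it is about to fall. The sketch below assumes no commissions or market impact, and the function names are hypothetical.

```python
import numpy as np

def best_possible_positions(prices, max_shares=200):
    """Hold +max_shares before an up day, -max_shares before a down day."""
    prices = np.asarray(prices, dtype=float)
    next_day_change = np.diff(prices)                  # price[t+1] - price[t]
    positions = max_shares * np.sign(next_day_change)  # +200, 0 or -200
    return np.append(positions, 0.0)                   # flat on the last day

def trades_from_positions(positions):
    """trade[t] = position[t] - position[t-1], starting from zero shares."""
    return np.diff(np.concatenate(([0.0], positions)))

# Hypothetical prices, only to show the mechanics.
prices = [10.0, 11.0, 10.5, 10.5, 12.0]
positions = best_possible_positions(prices)
trades = trades_from_positions(positions)
profit = float(np.sum(positions[:-1] * np.diff(prices)))
```

Note that flipping from +200 to -200 shares produces a single 400 share trade, which is the largest order this approach ever generates.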
Manual Rule-Based Trader
As you ramp up for your learning trader, try this exercise.
Devise a set of rules using the indicators you created above. Your rules should be designed to trigger a "long" or "short" entry for a 21 trading day hold. In other words, once an entry is initiated, you must remain in the position for 21 trading days.
You should tweak your rules as best you can to get the best performance possible during the in sample period (do not peek at out of sample performance). Use your rule-based strategy to generate an orders file over the in sample period, then run that file through your market simulator to create a chart that includes the following components over the in sample period:
- Benchmark (see definition above) normalized to 1.0 at the start: Black line
- Value of the rule-based portfolio (normalized to 1.0 at the start): Blue line
- Vertical green lines indicating LONG entry points.
- Vertical red lines indicating SHORT entry points.
Note that each red or green vertical line should be at least 21 trading days from the preceding line. We expect that your rule-based strategy should outperform the benchmark over the in sample period.
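The 21 trading day hold can be enforced separately from the rules themselves. The sketch below assumes you already have one signal per trading day (-1, 0 or +1) and simply ignores any signal that arrives while a position is still being held; the function name `entries_with_hold` is hypothetical.

```python
def entries_with_hold(signals, hold_days=21):
    """Turn daily signals (-1, 0, +1) into (day_index, direction) entries.

    A new entry is accepted only once the previous position has been held
    for hold_days trading days.
    """
    entries = []
    next_free_day = 0  # first day on which a new entry is allowed
    for day, signal in enumerate(signals):
        if signal != 0 and day >= next_free_day:
            entries.append((day, signal))
            next_free_day = day + hold_days
    return entries

# Hypothetical signals: entries requested on days 3, 10, 24 and 30.
signals = [0] * 60
signals[3], signals[10], signals[24], signals[30] = 1, -1, -1, 1
entries = entries_with_hold(signals)
```

Here the signals on days 10 and 30 are dropped because they fall inside an active 21 day hold, so consecutive entries end up at least 21 trading days apart.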
ML Trader
Convert your decision tree regression learner into a classification learner. The classifications should be:
- +1: LONG
- 0: DO NOTHING
- -1: SHORT
The X data for each sample (day) are simply the values of your indicators for the stock -- you should have 3 to 5 of them. The Y data (or classifications) will be based on 21 day return. You should classify the example as a +1 or "LONG" if the 21 day return exceeds a certain value, let's call it YBUY for the moment. You should classify the example as a -1 or "SHORT" if the 21 day return is below a certain value we'll call YSELL. In all other cases the sample should be classified as a 0 or "DO NOTHING." Note that it is very important that you train your learner with these classification values (not the 21 day returns).
Note that your X values are calculated each day from the current day's (and earlier) data, but the Y value (classification) is calculated using data from the future. You may tweak various parameters of your learner to maximize return (more on that below). Train and test your learning strategy over the in sample period.
Important note: You must set the leaf_size parameter of your decision tree learner to 5 or larger. This requirement is intended to avoid a degenerate overfit solution to this problem.
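One common way to turn a regression tree into a classifier is to have each leaf return the most frequent training label instead of the mean of the Y values. This is an assumption about how you might do the conversion, not a prescribed method, and the helper name `leaf_label` is hypothetical.

```python
import numpy as np

def leaf_label(y_leaf):
    """Most common class label among the training samples that reach a leaf."""
    labels, counts = np.unique(np.asarray(y_leaf), return_counts=True)
    # np.unique returns labels in sorted order, so ties go to the smallest label.
    return labels[np.argmax(counts)]

label = leaf_label([1, 1, 0, -1, 1])
```

With leaf_size set to 5 or larger, each leaf can hold several samples, so some such aggregation is needed; you may prefer a different tie-breaking rule than the one used here.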
Use your ML-based strategy to generate an orders file over the in sample period, then run that file through your market simulator to create a chart that includes the following components over the in sample period:
- Benchmark (see definition above) normalized to 1.0 at the start: Black line
- Value of the rule-based portfolio (normalized to 1.0 at the start): Blue line.
- Value of the ML-based portfolio (normalized to 1.0 at the start): Green line.
- Vertical green lines indicating LONG entry points.
- Vertical red lines indicating SHORT entry points.
We expect that the ML-based strategy will outperform the manual strategy, but it is possible that it does not. If your manual strategy does better, you should try to explain why in your report.
You should tweak the parameters of your learner to maximize performance during the in sample period. Here is a partial list of things you can tweak:
- Adjust YSELL and YBUY.
- Adjust leaf_size.
- Utilize bagging and adjust the number of bags.
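If you use bagging with a classification learner, the bags' predictions need to be combined by voting rather than averaging. A minimal sketch, assuming each bag returns labels in {-1, 0, +1} for every sample; the function name `majority_vote` is hypothetical.

```python
import numpy as np

def majority_vote(bag_predictions):
    """Combine predictions of shape (n_bags, n_samples) by majority vote."""
    preds = np.asarray(bag_predictions, dtype=int)
    classes = np.array([-1, 0, 1])
    # votes[c, j] = number of bags that predicted classes[c] for sample j
    votes = np.stack([(preds == c).sum(axis=0) for c in classes])
    # On a tie, argmax picks the first class in the order -1, 0, +1.
    return classes[np.argmax(votes, axis=0)]

combined = majority_vote([[1, 0, -1],
                          [1, 1, -1],
                          [0, 0, 1]])
```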
Hints
Overall, I recommend the following steps in the creation of your strategies:
- Indicator design hints:
  - For your X values: Identify and implement at least 3 technical features that you believe may be predictive of future return.
- Rule based design:
  - Use a cascade of if statements conditioned on the indicators to identify whether a LONG condition is met.
  - Use a cascade of if statements conditioned on the indicators to identify whether a SHORT condition is met.
  - The conditions for LONG and SHORT should be mutually exclusive.
  - If neither LONG nor SHORT is triggered, the result should be DO NOTHING.
  - For debugging purposes, you may find it helpful to plot the value of the rule-based output (-1, 0, 1) versus the stock price.
- Train a classification learner on in sample training data:
  - For your Y values: Use future 21 day return (not future price). Then classify that return as LONG, SHORT or DO NOTHING. You're trying to predict a relative change that you can use to invest with.
  - For debugging purposes, you may find it helpful to plot the value of the training classification data (-1, 0, 1) versus the stock price in one color.
  - For debugging purposes, you may find it helpful to plot the value of the training classification output (-1, 0, 1) versus the stock price in another color. Ideally, these two lines should be very similar.
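The rule cascade described above might look like the following. The specific indicators, thresholds and the function name `rule_based_signal` are assumptions chosen only to show the structure; your own rules will differ.

```python
def rule_based_signal(bb_value, momentum, bb_threshold=1.0):
    """Return +1 (LONG), -1 (SHORT) or 0 (DO NOTHING) for one day."""
    # LONG: price is well below its moving average and momentum is positive.
    if bb_value < -bb_threshold and momentum > 0.0:
        return 1
    # SHORT: price is well above its moving average and momentum is negative.
    elif bb_value > bb_threshold and momentum < 0.0:
        return -1
    # Neither condition triggered.
    return 0

signal = rule_based_signal(bb_value=-1.5, momentum=0.02)
```

Using if/elif keeps the LONG and SHORT branches mutually exclusive, and the final return covers the DO NOTHING case.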
Choosing Technical Features -- Your X Values
You should have already successfully coded the Bollinger Band feature:
bb_value[t] = (price[t] - SMA[t])/(stdev[t])
Two other good features worth considering are momentum and volatility.
momentum[t] = (price[t]/price[t-N]) - 1
Volatility is just the stdev of daily returns.
You still need to standardize the resulting values.
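The three features above, followed by standardization, can be sketched with pandas rolling windows. This assumes the same window length N for all three features and a hypothetical function name `compute_indicators`; the sample price series is synthetic.

```python
import numpy as np
import pandas as pd

def compute_indicators(prices: pd.Series, window: int = 20) -> pd.DataFrame:
    """Bollinger value, momentum and volatility, each standardized."""
    sma = prices.rolling(window).mean()
    stdev = prices.rolling(window).std()
    daily_returns = prices / prices.shift(1) - 1.0
    indicators = pd.DataFrame({
        "bb_value": (prices - sma) / stdev,
        "momentum": prices / prices.shift(window) - 1.0,
        "volatility": daily_returns.rolling(window).std(),
    })
    # Standard score per column (pandas skips NaN when computing mean and std).
    return (indicators - indicators.mean()) / indicators.std()

# Synthetic price series, only to exercise the function.
days = np.arange(80)
prices = pd.Series(100.0 + 5.0 * np.sin(days / 3.0) + 0.1 * days)
features = compute_indicators(prices, window=5)
```

The first rows contain NaN until each rolling window has enough data, so drop or skip them before training.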
Choosing Y
Your code should classify based on 21 day change in price. You need to build a new Y that reflects the 21 day change and aligns with the current date. Here's pseudo code for the calculation of Y
ret = (price[t+21]/price[t]) - 1.0
if ret > YBUY:
    Y[t] = +1 # LONG
else if ret < YSELL:
    Y[t] = -1 # SHORT
else:
    Y[t] = 0
If you select Y in this manner and use it for training, your learner will classify 21 day returns.
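The same calculation can be done for every day at once with NumPy. This is a sketch with a hypothetical function name `classify_returns`; note that the last 21 days have no future price, so they get no label and the result is shorter than the price series.

```python
import numpy as np

def classify_returns(prices, ybuy, ysell, horizon=21):
    """Label each day +1, 0 or -1 from its future return over `horizon` days."""
    prices = np.asarray(prices, dtype=float)
    # future_ret[t] = price[t + horizon] / price[t] - 1
    future_ret = prices[horizon:] / prices[:-horizon] - 1.0
    y = np.zeros(len(future_ret), dtype=int)   # default: DO NOTHING
    y[future_ret > ybuy] = 1                   # LONG
    y[future_ret < ysell] = -1                 # SHORT
    return y

# Hypothetical prices with a short horizon so the result is easy to check by hand.
y = classify_returns([100, 100, 110, 101, 100, 112], ybuy=0.05, ysell=-0.05, horizon=2)
```

When you build the training set, keep only the X rows that have a matching Y label so the two stay aligned by date.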