Difference between revisions of "MC3-Project-3"

Revision as of 13:15, 31 October 2016

1 Updates / FAQs
2 Overview
3 Data Details, Dates and Rules
4 Part 1: Technical Indicators (20%)
5 Part 2: Manual Rule-Based Trader (30%)
6 Part 3: ML Trader (30%)
7 Part 4: Comparative Analysis (20%)
8 Hints
9 Template and Data
10 Choosing Technical Features -- Your X Values
11 Choosing Y
12 Contents of Report
13 Expectations
14 What to turn in
15 Rubric
16 Required, Allowed & Prohibited
17 Legacy

Updates / FAQs

2016-10-31 Project finalized.

Q: In a previous project there was a constraint of holding a single position until exit. Does that apply to this project? A: Yes, hold one position until exit.

Q: Is that 10 calendar days, or 10 trading days (i.e., days when SPY was traded)? A: Always use trading days.

Q: Are there constraints for Python modules allowed for this project? Can we experiment with modules for optimization or technical analysis and cite them, or are we expected to write everything from scratch for this project as well? A: The constraints are the same as for the first learning project. You've already written the learners you need.

Q: I want to read some other values from the data besides just adjusted close. How can I do that? A: Please modify an old version of util.py to do that, and include that new util.py with your submission.

Q: Are we required to trade in only 500 share blocks? (and have no more than 500 shares long or short at a time as in some of the previous assignments) A: Yes. This will enable comparison between results more easily.

Q: Are we limited to leverage of 2.0 on the portfolio? A: There is no limit on leverage.

Q: Are we only allowed one position at a time? A: You can be in one of three states: -500 shares, +500 shares, 0 shares.

Overview

In this project you will develop trading strategies using Technical Analysis, and test them using your market simulator. You will then utilize your Random Tree learner to train and test a learning trading algorithm.

Part 1: Develop and describe a set of at least 3 technical indicators. At least one of these indicators must be substantially different from the indicators whose code was presented in class.
Part 2: Devise and test a rule-based trading strategy using your indicators from Part 1. Test its performance in sample using your market simulator.
Part 3: Use your decision tree learner to create a classifier that decides when to trade. Test its performance in sample using your market simulator.
Part 4: Comparative analysis.

In this project we shift from an auto graded format to a report format. For this project your grade will be based on the PDF report you submit, not your code. However, you will also submit your code that will be checked visually to ensure it appropriately matches the report you submit.

Data Details, Dates and Rules

Use the following parameters for Parts 2, 3 and 4:

Use only the data provided for this course. You are not allowed to import external data.
Trade only the symbol IBM (however, you may, if you like, use data from other symbols to inform your strategy).
The in sample/training period is January 1, 2006 to December 31, 2009.
The out of sample/testing period is January 1, 2010 to December 31, 2010.
Starting cash is $100,000.
Allowable positions are: 500 shares long, 500 shares short, 0 shares.
There is no limit on leverage.

Part 1: Technical Indicators (20%)

Develop and describe at least 3 and at most 5 technical indicators. You may find our lecture on time series processing to be helpful. For each indicator you should create a single chart that shows the price history of the stock during the in-sample period, "helper data" and the value of the indicator itself. As an example, if you were using price/SMA as an indicator you would want to create a chart with 3 lines: Price, SMA, Price/SMA.

Your report description of each indicator should enable someone to reproduce it just by reading the description. We want a written description here, not code; however, it is OK to augment your written description with a pseudocode figure.

At least one of the indicators you use should be completely different from the ones presented in our lectures.

Deliverables:

Descriptive text (2 to 3 pages with figures).
3 to 5 charts (one for each indicator)
Code: indicators.py

Part 2: Manual Rule-Based Trader (30%)

Devise a set of rules using the indicators you created in Part 1 above. Your rules should be designed to trigger a "long" or "short" entry for a 10 trading day hold. In other words, once an entry is initiated, you must remain in the position for 10 trading days. In your report you must describe your trading rules so that another person could implement them based only on your description. We want a written description here, not code; however, it is OK to augment your written description with a pseudocode figure.

You should tweak your rules as best you can to get the best performance possible during the in sample period (do not peek at out of sample performance). Use your rule-based strategy to generate an orders file over the in sample period, then run that file through your market simulator to create a chart that includes the following components over the in sample period:

Price of IBM (normalized to 1.0 at the start): Black line
Value of the rule-based portfolio (normalized to 1.0 at the start): Blue line
Vertical green lines indicating LONG entry points.
Vertical red lines indicating SHORT entry points.
Vertical black lines indicating exits (long or short).

Note that each red or green vertical line should be followed by a black line before another entry occurs. We will check for that. We expect that your rule-based strategy should outperform the stock IBM over the in sample period.

Deliverables:

Descriptive text (1 or 2 pages with chart) that provides a compelling justification for the rule-based system you developed.
Text must describe the rule-based system in sufficient detail that another person could implement it.
1 chart.
Code: rule_based.py (generates an orders file)

Part 3: ML Trader (30%)

Convert your decision tree regression learner into a classification learner. The classifications should be:

+1: BUY
0: DO NOTHING
-1: SELL

The X data for each sample (day) are simply the values of your indicators for the stock -- you should have 3 to 5 of them. The Y data (or classifications) will be based on 10 day return. You should classify the example as a +1 or "BUY" if the 10 day return exceeds a certain value, let's call it YBUY for the moment. You should classify the example as a -1 or "SELL" if the 10 day return is below a certain value we'll call YSELL. In all other cases the sample should be classified as a 0 or "DO NOTHING."
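
One common way to turn a regression tree into a classifier is to have each leaf return the most frequent class among its training samples instead of their mean. The sketch below illustrates only that voting step. The function name leaf_class and the tie-breaking order (0, then +1, then -1) are assumptions made for this example, not part of the assignment.

```python
from collections import Counter


def leaf_class(labels):
    """Return the most frequent label in a leaf.

    Assumption for this sketch: ties are broken in the order
    0 (DO NOTHING), then +1 (BUY), then -1 (SELL).
    """
    counts = Counter(labels)
    best = max(counts.values())
    for candidate in (0, 1, -1):
        if counts.get(candidate, 0) == best:
            return candidate
    raise ValueError("labels must contain only -1, 0 or +1")


# A leaf holding two BUY samples and one DO NOTHING sample votes BUY.
print(leaf_class([1, 1, 0]))
```

The same vote could also be used to combine the predictions of several bags when bagging is enabled.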

Note that your X values are calculated each day from the current day's (and earlier) data, but the Y value is calculated using data from the future. You may tweak various parameters of your learner to maximize return (more on that below). Train and test your learning strategy over the in sample period. Whenever a BUY or SELL is encountered, you must enter the corresponding position and hold it for 10 trading days. That means, for instance, that if you encounter a BUY on day 1, then a SELL on day 2, you must keep holding the stock until the 10 days expire, even though you received this conflicting information. The reason for this is that we're trying to provide a way to directly compare the manual strategy versus the ML strategy.
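
The holding rule can be sketched as a small state machine that ignores every signal while a position is open. This is a minimal illustration under stated assumptions: the function name signals_to_orders, the (day index, action, shares) tuple format, the choice not to re-enter on the exit day, and leaving a position open if the data ends first are all choices made for this example. Mapping the tuples to your simulator's orders file is left to you.

```python
def signals_to_orders(signals, hold=10, shares=500):
    """Convert daily signals (+1, 0, -1) into entry and exit orders.

    Returns a list of (day_index, action, shares) tuples, where
    day_index counts trading days from the start of the period.
    """
    orders = []
    position = 0      # +shares, -shares or 0
    exit_day = None   # trading day on which the open position is closed
    for day, sig in enumerate(signals):
        if position != 0:
            if day == exit_day:
                # Close the position with the opposite order.
                action = "SELL" if position > 0 else "BUY"
                orders.append((day, action, shares))
                position = 0
            # Signals are ignored while in a position (and on the exit day).
            continue
        if sig > 0:
            orders.append((day, "BUY", shares))
            position = shares
            exit_day = day + hold
        elif sig < 0:
            orders.append((day, "SELL", shares))
            position = -shares
            exit_day = day + hold
    return orders


# With a 3 day hold, the SELL signal on day 1 is ignored.
print(signals_to_orders([1, -1, 0, 0, -1, 0, 0, 0], hold=3))
```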

Use your ML-based strategy to generate an orders file over the in sample period, then run that file through your market simulator to create a chart that includes the following components over the in sample period:

Price of IBM (normalized to 1.0 at the start): Black line.
Value of the rule-based portfolio (normalized to 1.0 at the start): Blue line.
Value of the ML-based portfolio (normalized to 1.0 at the start): Green line.
Vertical green lines indicating LONG entry points.
Vertical red lines indicating SHORT entry points.
Vertical black lines indicating exits (long or short).

Note that each red or green vertical line should be followed by a black line before another entry occurs. We will check for that. We expect that the ML-based strategy will outperform the manual strategy; however, it is possible that it does not. If your manual strategy does better, you should try to explain why in your report.

You should tweak the parameters of your learner to maximize performance during the in sample period. Here is a partial list of things you can tweak:

Adjust YSELL and YBUY.
Adjust leaf_size.
Utilize bagging and adjust the number of bags.

Deliverables:

Descriptive text (1 or 2 pages with chart) that describes your ML approach.
Text must describe the ML-based system in sufficient detail that another person could implement it.
1 chart
Code: ML_based.py (generates an orders file)
Additional code files as necessary to support ML_based.py (e.g. RTLearner.py and so on).

Part 4: Comparative Analysis (20%)

Evaluate the performance of both of your strategies in the out of sample period. Note that you should not train your learner on this data. You should use the classification learned using the training data only. Create a chart that shows, out of sample:

Performance of the stock: Black line
Performance of manual strategy: Blue line
Performance of the ML strategy: Green line
All three should be normalized to 1.0 at the start.

Create a table that summarizes the performance of the stock, the manual strategy and the ML strategy for both in sample and out of sample periods. Utilize your experience in this class to determine which factors are best to use for comparing these strategies. If performance out of sample is worse than in sample, do your best to explain why. Also if the manual and ML strategies perform substantially differently, explain why. Is one method or the other more or less susceptible to the same underlying flaw? Why or why not?
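
As a starting point for the table, the sketch below normalizes a value series to 1.0 and computes a few candidate statistics. The choice of statistics, the function names, the 252 trading days per year, and the zero risk-free rate are assumptions for this example. The assignment leaves the choice of comparison factors to you.

```python
import math


def normalize(values):
    """Scale a series so that its first value is 1.0."""
    first = float(values[0])
    return [v / first for v in values]


def daily_returns(values):
    """Day-over-day returns; one element shorter than the input."""
    return [values[t] / float(values[t - 1]) - 1.0
            for t in range(1, len(values))]


def summarize(values, trading_days=252):
    """Candidate summary statistics (needs at least 3 values)."""
    rets = daily_returns(values)
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    std = math.sqrt(var)
    # Assumption: risk-free rate of zero.
    sharpe = math.sqrt(trading_days) * mean / std if std > 0 else float("nan")
    return {
        "cumulative_return": values[-1] / float(values[0]) - 1.0,
        "mean_daily_return": mean,
        "std_daily_return": std,
        "sharpe_ratio": sharpe,
    }


portfolio = [100000.0, 101000.0, 100500.0, 102000.0]
print(normalize(portfolio))
print(summarize(portfolio))
```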

Deliverables:

Descriptive text (1 or 2 pages including figures)
1 chart

Hints

Overall, I recommend the following steps in the creation of your strategies:

Indicator design hints:
- For your X values: Identify and implement at least 3 technical features that you believe may be predictive of future return.
Rule based design:
- Use a cascade of if statements conditioned on the indicators to identify whether a BUY condition is met.
- Use a cascade of if statements conditioned on the indicators to identify whether a SELL condition is met.
- The conditions for BUY and SELL should be mutually exclusive.
- If neither BUY nor SELL is triggered, the result should be DO NOTHING.
- For debugging purposes, you may find it helpful to plot the value of the rule-based output (-1, 0, 1) versus the stock price.
Train a classification learner on in sample training data:
- For your Y values: Use future 10 day return (not future price). Then classify that return as BUY, SELL or DO NOTHING. You're trying to predict a relative change that you can use to invest with.
- For debugging purposes, you may find it helpful to plot the value of the training classification data (-1, 0, 1) versus the stock price in one color.
- For debugging purposes, you may find it helpful to plot the value of the training classification output (-1, 0, 1) versus the stock price in another color. Ideally, these two lines should be very similar.
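
The rule-based hints above can be sketched as a single function that returns +1, -1 or 0. The indicator names and every threshold below are hypothetical placeholders chosen for illustration. They are not recommended values.

```python
def rule_signal(bb, mom, price_over_sma):
    """Return +1 (BUY), -1 (SELL) or 0 (DO NOTHING) for one day.

    All thresholds are hypothetical. Days with a missing indicator
    value (None) produce DO NOTHING.
    """
    if bb is None or mom is None or price_over_sma is None:
        return 0
    # BUY: price stretched below its average and not falling sharply.
    if bb < -0.8 and price_over_sma < 0.95 and mom > -0.2:
        return 1
    # SELL: price stretched above its average and not rising sharply.
    # The bb conditions cannot both hold, so BUY and SELL are exclusive.
    elif bb > 0.8 and price_over_sma > 1.05 and mom < 0.2:
        return -1
    return 0


print(rule_signal(-1.0, -0.05, 0.93))
```

Because the two branches test opposite sides of the same indicator, BUY and SELL can never fire on the same day.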

Template and Data

There is no github template for this project. You should create a directory for your code in ml4t/mc3-p2 and make a copy of util.py there. You should also copy into that directory your learner code and your market simulator code. You will have access to the data in the ML4T/Data directory but you should use ONLY the code in util.py to read it.

Choosing Technical Features -- Your X Values

You should have already successfully coded the Bollinger Band feature. Here's a suggestion of how to normalize that feature so that it will typically provide values between -1.0 and 1.0:

bb_value[t] = (price[t] - SMA[t])/(2 * stdev[t])

Two other good features worth considering are momentum and volatility.

momentum[t] = (price[t]/price[t-N]) - 1

Volatility is just the stdev of daily returns.
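
The three features above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the function names, the use of the sample standard deviation, the trailing window for volatility, and None as the marker for days without enough history are choices made for this example. A vectorized version with the allowed libraries would be more typical in practice. Windows must be at least 2.

```python
import math


def sma(prices, window):
    """Simple moving average; None until a full window is available."""
    out = [None] * len(prices)
    for t in range(window - 1, len(prices)):
        out[t] = sum(prices[t - window + 1:t + 1]) / float(window)
    return out


def rolling_std(values, window):
    """Sample standard deviation over a trailing window."""
    out = [None] * len(values)
    for t in range(window - 1, len(values)):
        chunk = values[t - window + 1:t + 1]
        mean = sum(chunk) / float(window)
        var = sum((v - mean) ** 2 for v in chunk) / float(window - 1)
        out[t] = math.sqrt(var)
    return out


def bb_value(prices, window):
    """bb_value[t] = (price[t] - SMA[t]) / (2 * stdev[t])."""
    means = sma(prices, window)
    stds = rolling_std(prices, window)
    out = [None] * len(prices)
    for t in range(len(prices)):
        if stds[t]:  # skips None and a zero stdev
            out[t] = (prices[t] - means[t]) / (2.0 * stds[t])
    return out


def momentum(prices, n):
    """momentum[t] = price[t] / price[t-n] - 1."""
    out = [None] * len(prices)
    for t in range(n, len(prices)):
        out[t] = prices[t] / float(prices[t - n]) - 1.0
    return out


def volatility(prices, window):
    """Trailing stdev of daily returns, aligned with the price dates."""
    rets = [prices[t] / float(prices[t - 1]) - 1.0
            for t in range(1, len(prices))]
    return [None] + rolling_std(rets, window)


prices = [10.0, 11.0, 12.0, 13.0, 14.0]
print(bb_value(prices, 3))
print(momentum(prices, 2))
print(volatility(prices, 3))
```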

Choosing Y

Your code should classify based on 10 day change in price. You need to build a new Y that reflects the 10 day change and aligns with the current date. Here's pseudo code for the calculation of Y:

ret = (price[t+10]/price[t]) - 1.0
if ret > YBUY:
    Y[t] = +1 # BUY
elif ret < YSELL:
    Y[t] = -1 # SELL
else:
    Y[t] = 0 # DO NOTHING

If you select Y in this manner and use it for training, your learner will classify 10 day returns.
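
The pseudo code above can be turned into a runnable function along these lines. The function name make_labels and the use of None for the final days, where no 10 day future price exists, are assumptions for this example. Those days would need to be dropped before training.

```python
def make_labels(prices, ybuy, ysell, horizon=10):
    """Label each day by its future `horizon`-day return.

    Returns +1 (BUY), -1 (SELL) or 0 (DO NOTHING) per day, and None
    for the last `horizon` days, which have no future price.
    """
    labels = [None] * len(prices)
    for t in range(len(prices) - horizon):
        ret = prices[t + horizon] / float(prices[t]) - 1.0
        if ret > ybuy:
            labels[t] = 1    # BUY
        elif ret < ysell:
            labels[t] = -1   # SELL
        else:
            labels[t] = 0    # DO NOTHING
    return labels


# Short horizon so the example fits on one line of prices.
print(make_labels([100.0, 100.0, 110.0, 90.0, 111.0], 0.05, -0.05, horizon=2))
```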

Contents of Report

Your report should be no more than 3000 words. Your report should contain no more than 12 charts. Penalties will apply if you violate these constraints.
Include charts and text as identified in the sections above.

Expectations

In-sample IBM backtests should perform very well -- The ML version should do better than the manual version.
Out-of-sample IBM backtests should... (you should be able to complete this sentence).

What to turn in

Turn your project in via t-square.

Your report as report.pdf
All of your code, as necessary to run as .py files.
Document how to run your code in readme.txt.
No zip files please.

Rubric

To be determined.

Required, Allowed & Prohibited

Required:

Your project must be coded in Python 2.7.x.
Your code must run on one of the university-provided computers (e.g. buffet02.cc.gatech.edu), or on one of the provided virtual images.
Use only util.py to read data. If you want to read items other than adjusted close, modify util.py to do it, and submit your new version with your code.

Allowed:

You can develop your code on your personal machine, but it must also run successfully on one of the university provided machines or virtual images.
Your code may use standard Python libraries.
You may use the NumPy, SciPy, matplotlib and Pandas libraries. Be sure you are using the correct versions.
You may reuse sections of code (up to 5 lines) that you collected from other students or the internet.
Code provided by the instructor, or allowed by the instructor to be shared.
A herring.

Prohibited:

Any other method of reading data besides util.py
Any libraries not listed in the "allowed" section above.
Any code you did not write yourself (except for the 5 line rule in the "allowed" section).


@@ Line 1: / Line 1: @@
-==Final==
-Part 1, 2, 3 and 4 are finalized.
 ==Updates / FAQs==
-==Overview==
+*'''2016-10-31''' Project finalized.
-In this project you will implement the Q-Learning and Dyna-Q solutions to the reinforcement learning problem.  You will apply them to two problems: 1) Navigation, and 2) Trading.  The reason for working with the navigation problem first is that, as you will see, navigation is an easy problem to work with and understand.  In the last part of the assignment you will apply Q-Learning to stock trading.
+* Q: In a previous project there was a constraint of holding a single position until exit. Does that apply to this project?  Yes, hold one position til exit.
-Note that your Q-Learning code really shouldn't care which problem it is solving.  The difference is that you need to wrap the learner in different code that frames the problem for the learner as necessary.
+* Q: Is that 10 calendar days, or 10 trading days (i.e., days when SPY was traded)? A: Always use trading days.
-For the navigation problem we have created testqlearner.py that automates testing of your Q-Learner in the navigation problem.  We also provide teststrategylearner.py to test your strategy learner.  In order to apply Q-learning to trading you will have to implement an API that calls Q-learning internally.
+* Q: Are there constraints for Python modules allowed for this project? Can we experiment with modules for optimization or technical analysis and cite or are we expected to write everything from scratch for this project as well?  A: The constraints are the same as for the first learning project. You've already written the learners you need.
-Overall, your tasks for this project include:
+* Q: I want to read some other values from the data besides just adjusted close, how can I do that? A: Please modify an old version of util.py to do that, include that new util.py with your submission.
-* Code a Q-Learner
+* Q: Are we required to trade in only 500 share blocks? (and have no more than 500 shares long or short at a time as in some of the previous assignments)  A: Yes.  This will enable comparison between results more easily.
-* Code the Dyna-Q feature of Q-Learning
-* Test/debug the Q-Learner in navigation problems
+* Q: Are we limited to leverage of 2.0 on the portfolio?  A: There is no limit on leverage.
-* Build a strategy learner based on your Q-Learner
-* Test/debug the strategy learner on specific symbol/time period problems
+* Q: Are we only allowed one position at a time?  A: You can be in one of three states: -500 shares, +500 shares, 0 shares.
-Scoring for the project will be allocated as follows:
+==Overview==
-* Navigation test cases: 80% (note that we will check those with dyna = 0)
+In this project you will develop trading strategies using Technical Analysis, and test them using your market simulator. You will then utilize your Random Tree learner to train and test a learning trading algorithm.
-* Dyna implemented: 5% (we will check this with one navigation test case by comparing performance with and without dyna turned on)
-* Trading strategy test cases: 20%
-For this assignment we will test only your code (there is no report component).  Note that the scoring is structured so that you can earn a B (80%) if you implement only Q-Learning, but if you implement everything, the total possible score is 105%.  That means you can earn up to 5% extra credit.
+* Part 1: Develop and describe a set of at least 3 technical indicators.  At least one of these indicators must be substantially different from the indicators whose code was presented in class.
+* Part 2: Devise and test a rule-based trading strategy using your indicators from Part 1.  Test its performance in sample using your market simulator.
+* Part 3: Use you decision tree learner to create a classifier that decides when to trade.  Test its performance in sample using your market simulator.
+* Part 4: Comparative analysis.
-==Template and Data==
+In this project we shift from an auto graded format to a report format. For this project your grade will be based on the PDF report you submit, not your code. However, you will also submit your code that will be checked visually to ensure it appropriately matches the report you submit.
-* Download <tt>'''[[Media:mc3_p3.zip|mc3_p3.zip]]'''</tt>, unzip inside <tt>ml4t/</tt>
+==Data Details, Dates and Rules==
-* Implement the <tt>QLearner</tt> class in <tt>mc3_p3/QLearner.py</tt>.
-* Implement the <tt>StrategyLearner</tt> class in <tt>mc3_p3/StrategyLearner.py</tt>
-* To test your Q-learner, run <tt>'''python testqlearner.py'''</tt> from the <tt>mc3_p3/</tt> directory.
-* To test your strategy learner, run <tt>'''python teststrategylearner.py'''</tt> from the <tt>mc3_p3/</tt> directory.
-* Note that example problems are provided in the <tt>mc3_p3/testworlds</tt> directory
-==Part 1: Implement QLearner==
+Use the following parameters for Part 2, 3 and 4:
-Your QLearner class should be implemented in the file <tt>QLearner.py</tt>.  It should implement EXACTLY the API defined below.  DO NOT import any modules besides those allowed below.  Your class should implement the following methods:
+* Use only the data provided for this course.  You are not allowed to import external data.
+* Trade only the symbol IBM (however, you may, if you like, use data from other symbols to inform your strategy).
+* The in sample/training period is January 1, 2006 to December 31 2009.
+* The out of sample/testing period is January 1, 2010 to December 31 2010.
+* Starting cash is $100,000.
+* Allowable positions are: 500 shares long, 500 shares short, 0 shares.
+* There is no limit on leverage.
-* QLearner(...): Constructor, see argument details below.
+==Part 1: Technical Indicators (20%)==
-* query(s_prime, r): Update Q-table with <s, a, s_prime, r> and return new action for state s_prime, update rar.
-* querysetstate(s): Set state to s, return action for state s, but don't update Q-table or rar.
-Here's an example of the API in use:
+Develop and describe at least 3 and at most 5 technical indicators.  You may find our lecture on time series processing to be helpful.  For each indicator you should create a single chart that shows the price history of the stock during the in-sample period, "helper data" and the value of the indicator itself.  As an example, if you were using price/SMA as an indicator you would want to create a chart with 3 lines: Price, SMA, Price/SMA
-<PRE>
+Your report description of each indicator should enable someone to reproduce it just by reading the description. We want a written description here, not code, however, it is OK to augment your written description with a pseudocode figure.
-import QLearner as ql
-learner = ql.QLearner(num_states = 100, \
+At least one of the indicators you use should be completely different from the ones presented in our lectures.
-    num_actions = 4, \
-    alpha = 0.2, \
-    gamma = 0.9, \
-    rar = 0.98, \
-    radr = 0.999, \
-    dyna = 0, \
-    verbose = False)
-s = 99 # our initial state
+Deliverables:
+* Descriptive text (2 to 3 pages with figures).
+* 3 to 5 charts (one for each indicator)
+* Code: indicators.py
-a = learner.querysetstate(s) # action for state s
+==Part 2: Manual Rule-Based Trader (30%)==
-s_prime = 5 # the new state we end up in after taking action a in state s
+Devise a set of rules using the indicators you created in Part 1 above.  Your rules should be designed to trigger a "long" or "short" entry for a 10 trading day hold.  In other words, once an entry is initiated, you must remain in the position for 10 trading days.  In your report you must describe your trading rules so that another person could implement them based only on your description. We want a written description here, not code, however, it is OK to augment your written description with a pseudocode figure.
-r = 0 # reward for taking action a in state s
+You should tweak your rules as best you can to get the best performance possible from during the in sample period (do not peak at out of sample performance).  Use your rule-based strategy to generate an orders file over the in sample period, then run that file through your market simulator to create a chart that includes the following components over the in sample period:
-next_action = learner.query(s_prime, r)
+* Price of IBM (normalized to 1.0 at the start): Black line
-</PRE>
+* Value of the rule-based portfolio (normalized to 1.0 at the start): Blue line
+* Vertical green lines indicating LONG entry points.
+* Vertical red lines indicating SHORT entry points.
+* Vertical black lines indicating exits (long or short).
-<b>The constructor QLearner()</b> should reserve space for keeping track of Q[s, a] for the number of states and actions.  It should initialize Q[] with uniform random values between -1.0 and 1.0.  Details on the input arguments to the constructor:
+Note that each red or green vertical line should be followed by a black line before another entry occurs.  We will check for that.  We expect that your rule-based strategy should outperform the stock IBM over the in sample period.
-* <tt>num_states</tt> integer, the number of states to consider
+Deliverables:
-* <tt>num_actions</tt>  integer, the number of actions available.
+* Descriptive text (1 or 2 pages with chart) that provides a compelling justification for rule-based system developed.
-* <tt>alpha</tt> float, the learning rate used in the update rule. Should range between 0.0 and 1.0 with 0.2 as a typical value.
+* Text must describe rule based system in sufficient detail that another person could implement it.
-* <tt>gamma</tt> float, the discount rate used in the update rule.  Should range between 0.0 and 1.0 with 0.9 as a typical value.
+* 1 chart.
-* <tt>rar</tt> float, random action rate: the probability of selecting a random action at each step. Should range between 0.0 (no random actions) to 1.0 (always random action) with 0.5 as a typical value.
+* Code: rule_based.py (generates an orders file)
-* <tt>radr</tt> float, random action decay rate, after each update, rar = rar * radr. Ranges between 0.0 (immediate decay to 0) and 1.0 (no decay).  Typically 0.99.
-* <tt>dyna</tt> integer, conduct this number of dyna updates for each regular update.  When Dyna is used, 200 is a typical value.
-* <tt>verbose</tt> boolean, if True, your class is allowed to print debugging statements, if False, all printing is prohibited.
-<b>query(s_prime, r)</b> is the core method of the Q-Learner.  It should keep track of the last state s and the last action a, then use the new information s_prime and r to update the Q table.  The learning instance, or experience tuple is <s, a, s_prime, r>.  query() should return an integer, which is the next action to take.  Note that it should choose a random action with probability rar, and that it should update rar according to the decay rate radr at each step.  Details on the arguments:
+==Part 3: ML Trader (30%)==
-* <tt>s_prime</tt> integer, the the new state.
+Convert your decision tree '''regression''' learner into a '''classification''' learner.  The classifications should be:
-* <tt>r</tt> float, a real valued immediate reward.
-<b>querysetstate(s)</b> A special version of the query method that sets the state to s, and returns an integer action according to the same rules as query() (including choosing a random action sometimes), but it does not execute an update to the Q-table.  It also does not update rar. There are two main uses for this method: 1) To set the initial state, and 2) when using a learned policy, but not updating it.
+* +1: BUY
+* 0: DO NOTHING
+* -1: SELL
-==Part 2: Navigation Problem Test Cases==
+The X data for each sample (day) are simply the values of your indicators for the stock -- you should have 3 to 5 of them.  The Y data (or classifications) will be based on 10 day return.  You should classify the example as a +1 or "BUY" if the 10 day return exceeds a certain value, let's call it YBUY for the moment.  You should classify the example as a -1 or "SELL" if the 10 day return is below a certain value we'll call YSELL.   In all other cases the sample should be classified as a 0 or "DO NOTHING."
-We will test your Q-Learner with a navigation problem as follows.  Note that your Q-Learner does not need to be coded specially for this task.  In fact the code doesn't need to know anything about it.  The code necessary to test your learner with this navigation task is implemented in testqlearner.py for you.  The navigation task takes place in a 10 x 10 grid world.  The particular environment is expressed in a CSV file of integers, where the value in each position is interpreted as follows:
+Note that your X values are calculated each day from the current day's (and earlier) data, but the Y value is calculated using data from the future.  You may tweak various parameters of your learner to maximize return (more on that below).  Train and test your learning strategy over the in sample period.  Whenever a BUY or SELL is encountered, you must enter the corresponding position and hold it for 10 days.  That means, for instance, that if you encounter a BUY on day 1, then a SELL on day 2, you must keep the stock still until the 10 days expire, even though you received this conflicting information.  The reason for this is that we're trying to provide a way to directly compare the manual strategy versus the ML strategy.
-* 0: blank space.
+Use your ML-based strategy to generate an orders file over the in sample period, then run that file through your market simulator to create a chart that includes the following components over the in sample period:
-* 1: an obstacle.
-* 2: the starting location for the robot.
-* 3: the goal location.
-An example navigation problem (CSV file) is shown below.  Following python conventions, [0,0] is upper left, or northwest corner, [9,9] lower right or southeast corner.  Rows are north/south, columns are east/west.
+* Price of IBM (normalized to 1.0 at the start): Black line.
+* Value of the rule-based portfolio (normalized to 1.0 at the start): Blue line.
+* Value of the ML-based portfolio (normalized to 1.0 at the start): Green line.
+* Vertical green lines indicating LONG entry points.
+* Vertical red lines indicating SHORT entry points.
+* Vertical black lines indicating exits (long or short).
-<PRE>
+Note that each red or green vertical line should be followed by a black line before another entry occurs.  We will check for that.  We expect that the ML-based strategy will outperform the manual strategy, however it is possible that it does not.  If it is the case that your manual strategy does better, you should try to explain why in your report.
-,0,0,0,3,0,0,0,0,0
-,0,0,0,0,0,0,0,0,0
-,0,0,0,0,0,0,0,0,0
-,0,1,1,1,1,1,0,0,0
-,0,1,0,0,0,1,0,0,0
-,0,1,0,0,0,1,0,0,0
-,0,1,0,0,0,1,0,0,0
-,0,0,0,0,0,0,0,0,0
-,0,0,0,0,0,0,0,0,0
-,0,0,0,2,0,0,0,0,0
-</PRE>
-In this example the robot starts at the bottom center, and must navigate to the top center.  Note that a wall of obstacles blocks its path.  We map this problem to a reinforcement learning problem as follows:
+You should tweak the parameters of your learner to maximize performance during the in sample period.  Here is a partial list of things you can tweak:
+* Adjust YSELL and YBUY.
+* Adjust leaf_size.
+* Utilize bagging and adjust the number of bags.
-* State: The state is the location of the robot, it is computed (discretized) as: column location * 10 + row location.
+Deliverables:
-* Actions: There are 4 possible actions, 0: move north, 1: move east, 2: move south, 3: move west.
+* Descriptive text (1 or 2 pages with chart) that describes your ML approach.
-* R: The reward is -1.0 unless the action leads to the goal, in which case the reward is +1.0.
+* Text must describe ML based system in sufficient detail that another person could implement it.
-* T: The transition matrix can be inferred from the CSV map and the actions.
+* 1 chart
+* Code: ML_based.py (generates an orders file)
+* Additional code files as necessary to support ML_based.py (e.g. RTLearner.py and so on).
-Note that R and T are not known by or available to the learner.  The testing code <tt>testqlearner.py</tt> will test your code as follows (pseudo code):
+==Part 4: Comparative Analysis (20%)==
-<pre>
+Evaluate the performance of both of your strategies in the out of sample period.   Note that you '''should not''' train your learner on this data.  You should use the classification learned using the training data only.  Create a chart that shows, out of sample:
-Instantiate the learner with the constructor QLearner()
-s = initial_location
-a = querysetstate(s)
-s_prime = new location according to action a
-r = -1.0
-while not converged:
-    a = query(s_prime, r)
-    s_prime = new location according to action a
-    if s_prime == goal:
-        r = +1
-        s_prime = start location
-    else
-        r = -1
-</pre>
-A few things to note about this code: The learner always receives a reward of -1.0 until it reaches the goal, when it receives a reward of +1.0. As soon as the robot reaches the goal, it is immediately returned to the starting location.
+* Performance of the stock: Black line
+* Performance of manual strategy: Blue line
+* Performance of the ML strategy: Green line
+* All three should be normalized to 1.0 at the start.
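Normalizing the three series to 1.0 at the start can be sketched as follows. This is a minimal illustration using plain Python lists; the function name <tt>normalize</tt> and the sample values are hypothetical and are not part of the project code.

```python
def normalize(values):
    """Scale a series so that its first element equals 1.0."""
    start = float(values[0])
    return [v / start for v in values]


# Hypothetical sample data: a stock price series and a portfolio value series.
stock_prices = [100.0, 105.0, 98.0]
manual_portfolio = [100000.0, 101500.0, 103000.0]

stock_norm = normalize(stock_prices)
manual_norm = normalize(manual_portfolio)
```

After normalization, series with very different scales (a share price and a portfolio value) can be drawn on the same axis and compared directly.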
-Here are example solutions:
+Create a table that summarizes the performance of the stock, the manual strategy and the ML strategy for both in sample and out of sample periods.  Utilize your experience in this class to determine which factors are best to use for comparing these strategies.  If performance out of sample is worse than in sample, do your best to explain why.  Also if the manual and ML strategies perform substantially differently, explain why.  Is one method or the other more or less susceptible to the same underlying flaw?  Why or why not?
-[[mc3_p3_examples]]
+Deliverables:
+* Descriptive text (1 or 2 pages including figures)
+* 1 chart
-[[mc3_p3_dyna_examples]]
+==Hints==
-==Part 3: Implement Dyna==
+Overall, I recommend the following steps in the creation of your strategies:
-Add additional components to your QLearner class so that multiple "hallucinated" experience tuples are used to update the Q-table for each "real" experience.  The addition of this component should speed convergence in terms of the number of calls to query().
+* Indicator design hints:
+** For your X values: Identify and implement at least 3 technical features that you believe may be predictive of future return.
+* Rule based design:
+** Use a cascade of if statements conditioned on the indicators to identify whether a BUY condition is met.
+** Use a cascade of if statements conditioned on the indicators to identify whether a SELL condition is met.
+** The conditions for BUY and SELL should be mutually exclusive.
+** If neither BUY nor SELL is triggered, the result should be DO NOTHING.
+** For debugging purposes, you may find it helpful to plot the value of the rule-based output (-1, 0, 1) versus the stock price.
+* Train a classification learner on in sample training data:
+** For your Y values: Use future 10 day return (not future price).  Then classify that return as BUY, SELL or DO NOTHING.  You're trying to predict a relative change that you can use to invest with.
+** For debugging purposes, you may find it helpful to plot the value of the training classification data (-1, 0, 1) versus the stock price in one color.
+** For debugging purposes, you may find it helpful to plot the value of the training classification output (-1, 0, 1) versus the stock price in another color.  Ideally, these two lines should be very similar.
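The rule-based hints above can be sketched as a small function. This is only an illustration: the function name <tt>rule_signal</tt>, the choice of two indicators and every threshold value are assumptions, not recommended settings. Your own rules should use the indicators and thresholds you designed.

```python
def rule_signal(bb_value, momentum):
    """Return +1 (BUY), -1 (SELL) or 0 (DO NOTHING) for one day.

    The thresholds below are hypothetical placeholders.
    """
    # BUY: price is well below its moving average and momentum is positive.
    if bb_value < -1.0 and momentum > 0.0:
        return 1
    # SELL: price is well above its moving average and momentum is negative.
    elif bb_value > 1.0 and momentum < 0.0:
        return -1
    # Neither condition is met.
    else:
        return 0


# Hypothetical indicator values for three days.
signals = [rule_signal(-1.2, 0.05), rule_signal(1.3, -0.02), rule_signal(0.0, 0.10)]
```

Because the BUY branch requires <tt>bb_value < -1.0</tt> and the SELL branch requires <tt>bb_value > 1.0</tt>, the two conditions cannot both be true on the same day, which keeps them mutually exclusive.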
-We will test your code on <tt>world03.csv</tt> with 50 iterations and with dyna = 200.  Our expectation is that with Dyna, the solution should be much better than without.
+==Template and Data==
-==Part 4: Implement Strategy Learner==
+There is no github template for this project.  You should create a directory for your code in ml4t/mc3-p2 and make a copy of util.py there.  You should also copy into that directory your learner code and your market simulator code. You will have access to the data in the ML4T/Data directory but you should use ONLY the code in util.py to read it.
-For this part of the project you should develop a learner that can learn a trading policy using your Q-Learner.  Utilize the template provided in <tt>StrategyLearner.py</tt> Overall the structure of your strategy learner should be arranged like this:
+==Choosing Technical Features -- Your X Values==
-For the policy learning part:
+You should have already successfully coded the Bollinger Band feature.  Here's a suggestion of how to normalize that feature so that it will typically provide values between -1.0 and 1.0:
-* Select several technical features, and compute their values for the training data
-* Discretize the values of the features
-* Instantiate a Q-learner
-* For each day in the training data:
-** Compute the current state (including holding)
-** Compute the reward for the last action
-** Query the learner with the current state and reward to get an action
-** Implement the action the learner returned (BUY, SELL, NOTHING), and update portfolio value
-* Repeat the above loop multiple times until cumulative return stops improving.
-A rule to keep in mind: As in past projects, you can only be long or short 100 shares, so if your learner returns two BUYs in a row, don't double down, same thing with SELLs.
+<PRE>
+bb_value[t] = (price[t] - SMA[t])/(2 * stdev[t])
+</PRE>
-For the policy testing part:
+Two other good features worth considering are momentum and volatility.
-* For each day in the testing data:
-** Compute the current state
-** Query the learner with the current state to get an action
-** Implement the action the learner returned (BUY, SELL, NOTHING), and update portfolio value
-* Return the resulting trades in a data frame (details below).
-Your StrategyLearner should implement the following API:
 <PRE>
-import StrategyLearner as sl
+momentum[t] = (price[t]/price[t-N]) - 1
-learner = sl.StrategyLearner(verbose = False) # constructor
-learner.addEvidence(symbol = "IBM", sd=dt.datetime(2008,1,1), ed=dt.datetime(2009,1,1), sv = 10000) # training step
-df_trades = learner.testPolicy(symbol = "IBM", sd=dt.datetime(2009,1,1), ed=dt.datetime(2010,1,1), sv = 10000) # testing step
 </PRE>
-The input parameters are:
+Volatility is just the stdev of daily returns.
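The three features described in this section can be sketched in plain Python as follows. This is a dependency-free illustration rather than a reference implementation: the function names, the lookback parameter <tt>n</tt>, and the use of the sample standard deviation are all assumptions. In practice you would probably compute the same quantities with rolling operations on a data frame.

```python
import math


def mean(values):
    return sum(values) / float(len(values))


def stdev(values):
    """Sample standard deviation (divides by len - 1); an assumed convention."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / float(len(values) - 1))


def bb_value(price, t, n):
    """(price[t] - SMA[t]) / (2 * stdev[t]) over the n days ending at t.

    Requires t >= n - 1 and a window that is not constant (stdev > 0).
    """
    window = price[t - n + 1:t + 1]
    return (price[t] - mean(window)) / (2.0 * stdev(window))


def momentum(price, t, n):
    """(price[t] / price[t-N]) - 1. Requires t >= n."""
    return price[t] / price[t - n] - 1.0


def volatility(price, t, n):
    """Standard deviation of the last n daily returns. Requires t >= n >= 2."""
    returns = [price[i] / price[i - 1] - 1.0 for i in range(t - n + 1, t + 1)]
    return stdev(returns)


# Hypothetical price series.
price = [10.0, 11.0, 12.0, 11.0, 13.0]
bb = bb_value(price, 4, 3)
mom = momentum(price, 4, 2)
vol = volatility(price, 4, 2)
```

None of the functions is defined for the first few days of the series, where a full lookback window is not yet available. Those rows need to be dropped or handled explicitly before training.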
-* verbose: if False do not generate any output
+==Choosing Y==
-* symbol: the stock symbol to train on
-* sd: A datetime object that represents the start date
-* ed: A datetime object that represents the end date
-* sv: Start value of the portfolio
-The output result is:
+Your code should classify based on the 10 day change in price.  You need to build a new Y that reflects the 10 day change and aligns with the current date.  Here's pseudo code for the calculation of Y:
-* df_trades: A data frame whose values represent trades for each day.  Legal values are +100.0 indicating a BUY of 100 shares, -100.0 indicating a SELL of 100 shares, and 0.0 indicating NOTHING [update, values of +200 and -200 for trades are also legal so long as net holdings are constrained to -100, 0, and 100].
+ ret = (price[t+10]/price[t]) - 1.0
+ if ret > YBUY:
+     Y[t] = +1 # BUY
+ elif ret < YSELL:
+     Y[t] = -1 # SELL
+ else:
+     Y[t] = 0 # DO NOTHING
+If you select Y in this manner and use it for training, your learner will classify 10 day returns.
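A runnable sketch of this labeling step is shown below. The function name <tt>build_y</tt>, the <tt>horizon</tt> parameter and the sample thresholds are assumptions made for illustration. The sketch also assumes that the last 10 days, which have no future price, are simply left out of the training labels.

```python
def build_y(price, ybuy, ysell, horizon=10):
    """Classify the future `horizon`-day return of each day as +1, -1 or 0."""
    y = []
    # Stop `horizon` days before the end: later days have no future price.
    for t in range(len(price) - horizon):
        ret = price[t + horizon] / price[t] - 1.0
        if ret > ybuy:
            y.append(1)    # BUY
        elif ret < ysell:
            y.append(-1)   # SELL
        else:
            y.append(0)    # DO NOTHING
    return y


# Hypothetical prices: flat for 10 days, then up 10%, down 10%, and flat.
price = [100.0] * 10 + [110.0, 90.0, 100.0]
labels = build_y(price, ybuy=0.05, ysell=-0.05)
```

The returned list is shorter than the price series by <tt>horizon</tt> entries, so the X values for the same trailing days must be dropped before training to keep X and Y aligned by date.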
 ==Contents of Report==
-There is no report component of this assignment.  However, if you would like to impress us with your Machine Learning prowess, you are invited to submit a succinct report.
+* Your report should be no more than 3000 words.  Your report should contain no more than 12 charts.  Penalties will apply if you violate these constraints.
+* Include charts and text as identified in the sections above.
-==Hints & resources==
-This paper by Kaelbling, Littman and Moore, is a good resource for RL in general: http://www.jair.org/media/301/live-301-1562-jair.pdf  See Section 4.2 for details on Q-Learning.
-There is also a chapter in the Mitchell book on Q-Learning.
-For implementing Dyna, you may find the following resources useful:
+==Expectations==
-* https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node96.html
+* In-sample IBM backtests should perform very well -- The ML version should do better than the manual version.
-* http://www-anw.cs.umass.edu/~barto/courses/cs687/Chapter%209.pdf
+* Out-of-sample IBM backtests should... (you should be able to complete this sentence).
 ==What to turn in==
-Turn your project in via t-square.   All of your code must be contained within QLearner.py and StrategyLearner.py.
+Turn your project in via t-square.
-* Your QLearner as <tt>QLearner.py</tt>
+* Your report as <tt>report.pdf</tt>
-* Your StrategyLearner as <tt>StrategyLearner.py</tt>
+* All of your code, as necessary to run as <tt>.py</tt> files.
-* Your report (if any) as <tt>report.pdf</tt>
+* Document how to run your code in <tt>readme.txt</tt>.
-* Do not submit any other files.
+* No zip files please.
 ==Rubric==
-Only your QLearner class will be tested.
+To be determined.
-* For basic Q-Learning (dyna = 0) we will test your learner against 10 test worlds with 500 iterations.  Each test should complete in less than 2 seconds.  For the test to be successful, your learner should find a path to the goal <= 1.5 x the number of steps our reference solution finds.  We will check this by taking the min of all the 500 runs. Each test case is worth 8 points. We will initialize your learner with the following parameter values:
-<Pre>
-    learner = ql.QLearner(num_states=100,\
-        num_actions = 4, \
-        alpha = 0.2, \
-        gamma = 0.9, \
-        rar = 0.98, \
-        radr = 0.999, \
-        dyna = 0, \
-        verbose=False) #initialize the learner
-</PRE>
-* For Dyna-Q, we will set dyna = 200.  We will test your learner against <tt>world03.csv</tt> with 50 iterations.  The test should complete in less than 10 seconds. For the test to be successful, your learner should find a path to the goal <= 1.5 x the number of steps our reference solution finds.  We will check this by taking the min of all 50 runs. The test case is worth 5 points.  We will initialize your learner with the following parameter values:
-<Pre>
-    learner = ql.QLearner(num_states=100,\
-        num_actions = 4, \
-        alpha = 0.2, \
-        gamma = 0.9, \
-        rar = 0.5, \
-        radr = 0.99, \
-        dyna = 200, \
-        verbose=False) #initialize the learner
-</PRE>
-* We will test StrategyLearner in the following situations:
-** Training: Dec 31 2007 to Dec 31 2009
-** Testing: Dec 31 2009 to Dec 31 2011
-** Symbols: ML4T-220, IBM
-** Starting value: $10,000
-** Benchmark: Buy 100 shares on the first trading day, Sell 100 shares on the last day.
-* We expect the following outcomes in testing:
-** For ML4T-220, the trained policy should significantly outperform the benchmark in sample (7 points)
-** For ML4T-220, the trained policy should significantly outperform the benchmark out of sample (7 points)
-** For IBM, the trained policy should significantly outperform the benchmark in sample (7 points)
-Training and testing for each situation should run in less than 30 seconds.  We reserve the right to use different time periods if necessary to reduce auto grading time.
 ==Required, Allowed & Prohibited==
@@ Line 269: / Line 202: @@
 * Your project must be coded in Python 2.7.x.
 * Your code must run on one of the university-provided computers (e.g. buffet02.cc.gatech.edu), or on one of the provided virtual images.
+* Use only util.py to read data.  If you want to read items other than adjusted close, modify util.py to do it, and submit your new version with your code.
 Allowed:
@@ Line 276: / Line 210: @@
 * You may reuse sections of code (up to 5 lines) that you collected from other students or the internet.
 * Code provided by the instructor, or allowed by the instructor to be shared.
-* Use util.py (only) for reading data.
+* A herring.
 Prohibited:
+* Any other method of reading data besides util.py
 * Any libraries not listed in the "allowed" section above.
 * Any code you did not write yourself (except for the 5 line rule in the "allowed" section).
-* Any Classes (other than Random) that create their own instance variables for later use (e.g., learners like kdtree).
-* Print statements outside "verbose" checks (they significantly slow down auto grading).
-* Any method for reading data besides util.py
 ==Legacy==
+*[[MC3-Project-2-Legacy-trader]]
+*[[MC3-Project-2-Legacy]]
 *[[MC3-Project-3-Legacy-Q]]
 *[[MC3-Project-3-Legacy]]