Difference between revisions of "Manual strategy"

From Quantitative Analysis Software Courses
 
(60 intermediate revisions by one other user not shown)
Line 1: Line 1:
==DRAFT==
+
==Finalized==
 
 
This assignment is under revision.  This notice will be removed once it is final.
 
  
 
==Updates / FAQs==
 
==Updates / FAQs==
Line 16: Line 14:
  
 
In this project you will develop a trading strategy using your intuition and Technical Analysis, and test it against a stock using your market simulator. In a later project, you will use your same indicators but with Machine Learning (instead of your intuition) to create a trading strategy. We hope Machine Learning will do better than your intuition, but who knows?
 
In this project you will develop a trading strategy using your intuition and Technical Analysis, and test it against a stock using your market simulator. In a later project, you will use your same indicators but with Machine Learning (instead of your intuition) to create a trading strategy. We hope Machine Learning will do better than your intuition, but who knows?
 +
 +
==Template==
 +
 +
There is no distributed template for this project.  You should create a directory for your code in ml4t/manual_strategy and make a copy of util.py there.  You will have access to the data in the ML4T/Data directory but you should use ONLY the code in util.py to read it.
 +
 +
You should create the following code files for submission.  They should comprise ALL code from you that is necessary to run your evaluations.
 +
 +
* <tt>indicators.py</tt> Your code that implements your indicators as functions that operate on dataframes.  The "main" code in indicators.py should generate the charts that illustrate your indicators in the report.
 +
* <tt>marketsimcode.py</tt> An improved version of your marketsim code that accepts a "trades" data frame (instead of a file).  More info on the trades data frame below.  It is OK not to submit this file if you have subsumed its functionality into one of your other code files.
 +
* <tt>ManualStrategy.py</tt> Code implementing a ManualStrategy object (your manual strategy).  It should implement testPolicy() which returns a trades data frame (see below). The main part of this code should call marketsimcode as necessary to generate the plots used in the report.
 +
* <tt>TheoreticallyOptimalStrategy.py</tt> Code implementing a TheoreticallyOptimalStrategy object (details below).  It should implement testPolicy() which returns a trades data frame (see below). The main part of this code should call marketsimcode as necessary to generate the plots used in the report.
 +
 +
Note that we may not test your code, so we may not notice if it is not organized as recommended.  However, this arrangement will be required for later projects, so it is worth setting it up this way now.  The key requirement is that, if necessary, a TA should be able to run your code on a buffet machine and get the same results (e.g., statistics and charts) that we see in your report.
  
 
==Data Details, Dates and Rules==
 
==Data Details, Dates and Rules==
Line 22: Line 33:
 
* For your report, trade only the symbol JPM. This will enable us to more easily compare results.  
 
* For your report, trade only the symbol JPM. This will enable us to more easily compare results.  
 
* You may use data from other symbols (such as SPY) to inform your strategy.
 
* You may use data from other symbols (such as SPY) to inform your strategy.
* The testing period is January 1, 2008 to December 31 2009.
+
* The in sample/development period is January 1, 2008 to December 31, 2009.
 +
* The out of sample/testing period is January 1, 2010 to December 31, 2011.
 
* Starting cash is $100,000.
 
* Starting cash is $100,000.
 
* Allowable positions are: 1000 shares long, 1000 shares short, 0 shares.
 
* Allowable positions are: 1000 shares long, 1000 shares short, 0 shares.
 
* Benchmark: The performance of a portfolio starting with $100,000 cash, investing in 1000 shares of JPM and holding that position.
 
* Benchmark: The performance of a portfolio starting with $100,000 cash, investing in 1000 shares of JPM and holding that position.
 
* There is no limit on leverage.
 
* There is no limit on leverage.
* Transaction costs: Commission: $9.95, Impact: 0.005.
+
* Transaction costs for ManualStrategy: Commission: $9.95, Impact: 0.005.
 +
* Transaction costs for TheoreticallyOptimalStrategy: Commission: $0.00, Impact: 0.00.
  
 
==Part 1: Technical Indicators (20 points)==
 
==Part 1: Technical Indicators (20 points)==
Line 39: Line 52:
 
At least one of the indicators you use should be completely different from the ones presented in our lectures. (i.e. something other than SMA, Bollinger Bands, RSI).
 
At least one of the indicators you use should be completely different from the ones presented in our lectures. (i.e. something other than SMA, Bollinger Bands, RSI).
  
==Part 2: Best Possible Strategy (5 points)==
+
==Part 2: Theoretically Optimal Strategy (20 points)==
  
 
Assume that you can see the future, but that you are constrained by the portfolio size and order limits as specified above.  Create a set of trades that represents the best a strategy could possibly do during the in sample period. The reason we're having you do this is so that you will have an idea of an upper bound on performance.   
 
Assume that you can see the future, but that you are constrained by the portfolio size and order limits as specified above.  Create a set of trades that represents the best a strategy could possibly do during the in sample period. The reason we're having you do this is so that you will have an idea of an upper bound on performance.   
Line 48: Line 61:
  
 
* Benchmark (see definition above) normalized to 1.0 at the start: Blue line
 
* Benchmark (see definition above) normalized to 1.0 at the start: Blue line
* Value of the best possible portfolio (normalized to 1.0 at the start): Black line
+
* Value of the theoretically optimal portfolio (normalized to 1.0 at the start): Black line
  
 
You should also report in text:
 
You should also report in text:
Line 56: Line 69:
 
* Mean of daily returns of benchmark and portfolio
 
* Mean of daily returns of benchmark and portfolio
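The text statistics can be computed directly from a daily portfolio-value series. The following is a minimal sketch, assuming your market simulator returns a pandas Series of daily portfolio values; the function name <tt>portfolio_stats</tt> is hypothetical and not part of any provided code.

```python
import pandas as pd


def portfolio_stats(portvals):
    """Return (cumulative return, mean daily return, stdev of daily returns).

    portvals is assumed to be a pandas Series of daily portfolio values
    indexed by date (a hypothetical output of your market simulator).
    """
    # Daily return on day t is portvals[t] / portvals[t-1] - 1; the first day has none.
    daily_returns = portvals.pct_change().dropna()
    cum_return = portvals.iloc[-1] / portvals.iloc[0] - 1.0
    return cum_return, daily_returns.mean(), daily_returns.std()
```

Calling the same function on the benchmark values and on the strategy values gives directly comparable numbers.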
  
==Part 3: Manual Rule-Based Trader (20%)==
+
Your code should implement testPolicy() as follows:
  
Devise a set of rules using the indicators you created in Part 1 above. Your rules should be designed to trigger a "long" or "short" entry for a 21 trading day hold. In other words, once an entry is initiated, you must remain in the position for 21 trading days.  In your report you must describe your trading rules so that another person could implement them based only on your description. We want a written description here, not code, however, it is OK to augment your written description with a pseudocode figure.
+
df_trades = tos.testPolicy(symbol = "AAPL", sd=dt.datetime(2010,1,1), ed=dt.datetime(2011,12,31), sv = 100000)
  
You should tweak your rules as best you can to get the best performance possible during the in sample period (do not peek at out of sample performance).  Use your rule-based strategy to generate an orders file over the in sample period, then run that file through your market simulator to create a chart that includes the following components over the in sample period:
+
The input parameters are:
  
* Benchmark (see definition above) normalized to 1.0 at the start: Black line
+
* symbol: the stock symbol to act on
* Value of the rule-based portfolio (normalized to 1.0 at the start): Blue line
+
* sd: A datetime object that represents the start date
* Vertical green lines indicating LONG entry points.
+
* ed: A datetime object that represents the end date
* Vertical red lines indicating SHORT entry points.
+
* sv: Start value of the portfolio
  
Note that each red or green vertical line should be at least 21 days from the preceding line.  We will check for that.  We expect that your rule-based strategy should outperform the benchmark over the in sample period. 
+
The output result is:
  
Deliverables:
+
* df_trades: A data frame whose values represent trades for each day.  Legal values are +1000.0 indicating a BUY of 1000 shares, -1000.0 indicating a SELL of 1000 shares, and 0.0 indicating NOTHING. Values of +2000 and -2000 for trades are also legal so long as net holdings are constrained to -1000, 0, and 1000.
* Descriptive text (1 or 2 pages with chart) that provides a compelling justification for the rule-based system developed.
 
* Text must describe rule based system in sufficient detail that another person could implement it.
 
* 1 chart.
 
* Code: rule_based.py (generates an orders file)
 
  
==Part 4: ML Trader (30%)==
+
==Part 3: Manual Rule-Based Trader (50 points)==
  
Convert your decision tree '''regression''' learner into a '''classification''' learner.  The classifications should be:
+
In <tt>ManualStrategy.py</tt> implement a set of rules using the indicators you created in Part 1 above.  Devise some simple logic using your indicators to enter and exit positions in the stock.   
  
* +1: LONG
+
A recommended approach is to create a single logical expression that yields a -1, 0, or 1, corresponding to a "short," "out" or "long" position.  Example usage of this signal: If you are out of the stock, then a 1 would signal a BUY 1000 order. If you are long, a -1 would signal a SELL 2000 order.  You don't have to follow this advice though, so long as you follow the trading rules outlined above.
* 0: DO NOTHING
 
* -1: SHORT
 
  
The X data for each sample (day) are simply the values of your indicators for the stock -- you should have 3 to 5 of them.  The Y data (or classifications) will be based on 21 day return.  You should classify the example as a +1 or "LONG" if the 21 day return exceeds a certain value, let's call it YBUY for the moment.  You should classify the example as a -1 or "SHORT" if the 21 day return is below a certain value we'll call YSELL.  In all other cases the sample should be classified as a 0 or "DO NOTHING."  Note that it is very important that you train your learner with these classification values (not the 21 day returns).  We will check for this.
+
For the report we want a written description, not code; however, it is OK to augment your written description with a pseudocode figure.
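The recommended -1/0/1 position signal can be turned into a trades data frame by differencing the target holdings. This is only a sketch under assumptions: the signal is a pandas Series indexed by trading day, and the function names and the column name <tt>Shares</tt> are hypothetical, not a required layout.

```python
import pandas as pd


def signal_to_trades(signal, shares=1000):
    """Convert a daily -1/0/1 position signal into daily trade sizes."""
    holdings = signal * shares          # target position on each day
    trades = holdings.diff()            # trade needed to move from yesterday's position
    trades.iloc[0] = holdings.iloc[0]   # on the first day we start from 0 shares
    return trades.to_frame(name="Shares")


def holdings_are_legal(df_trades, shares=1000):
    """Check that net holdings always stay at -shares, 0, or +shares."""
    holdings = df_trades.iloc[:, 0].cumsum()
    return bool(holdings.isin([-shares, 0, shares]).all())
```

With this approach a move from long to short shows up as a single -2000 trade, which matches the rule that trades of +2000 and -2000 are legal as long as net holdings stay at -1000, 0, or 1000.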
  
Note that your X values are calculated each day from the current day's (and earlier) data, but the Y value (classification) is calculated using data from the future.  You may tweak various parameters of your learner to maximize return (more on that below).  Train and test your learning strategy over the in sample period.  Whenever a LONG or SHORT is encountered, you must enter the corresponding position and hold it for 21 days.  That means, for instance, that if you encounter a LONG on day 1, then a SHORT on day 2, you must keep the stock still until the 21 days expire, even though you received this conflicting information.  The reason for this is that we're trying to provide a way to directly compare the manual strategy versus the ML strategy.
+
You should tweak your rules as best you can to get the best performance possible during the in sample period (do not peek at out of sample performance).  Use your rule-based strategy to generate an orders dataframe over the in sample period, then run that dataframe through your market simulator to create a chart that includes the following components over the in sample period:
  
'''Important note:''' You must set the leaf_size parameter of your decision tree learner to 5 or larger.  This requirement is intended to avoid a degenerate overfit solution to this problem.
+
* Benchmark (see definition above) normalized to 1.0 at the start: Blue line
 
+
* Value of the rule-based portfolio (normalized to 1.0 at the start): Black line
Use your ML-based strategy to generate an orders file over the in sample period, then run that file through your market simulator to create a chart that includes the following components over the in sample period:
 
 
 
* Benchmark (see definition above) normalized to 1.0 at the start: Black line
 
* Value of the rule-based portfolio (normalized to 1.0 at the start): Blue line.
 
* Value of the ML-based portfolio (normalized to 1.0 at the start): Green line.
 
 
* Vertical green lines indicating LONG entry points.
 
* Vertical green lines indicating LONG entry points.
 
* Vertical red lines indicating SHORT entry points.
 
* Vertical red lines indicating SHORT entry points.
  
We expect that the ML-based strategy will outperform the manual strategy, however it is possible that it does not.  If it is the case that your manual strategy does better, you should try to explain why in your report.
+
We expect that your rule-based strategy should outperform the benchmark over the in sample period.
  
You should tweak the parameters of your learner to maximize performance during the in sample period.  Here is a partial list of things you can tweak:
+
Your code should implement the same API as above for theoretically optimal:
* Adjust YSELL and YBUY.
 
* Adjust leaf_size.
 
* Utilize bagging and adjust the number of bags.
 
  
Deliverables:
+
df_trades = ms.testPolicy(symbol = "AAPL", sd=dt.datetime(2010,1,1), ed=dt.datetime(2011,12,31), sv = 100000)
* Descriptive text (1 or 2 pages with chart) that describes your ML approach.
 
* Text must describe ML based system in sufficient detail that another person could implement it.
 
* 1 chart
 
* Code: ML_based.py (generates an orders file)
 
* Additional code files as necessary to support ML_based.py (e.g. RTLearner.py and so on).
 
  
==Part 5: Visualization of data (15%)==
+
==Part 4: Comparative Analysis (10 points)==
  
Choose two of your indicators, call them X1 and X2.  Create 3 scatter plots where each point in each plot is located according to the indicator values on that day at X1, X2.  Color each dot according to the following scheme:
+
Evaluate the performance of your strategy in the out of sample period.  Note that you '''should not''' train or tweak your approach on this data.  You should use the classification learned using the in sample data only.  Create a chart that shows, out of sample:
  
* Green if the factors on that day satisfy "LONG" conditions.
+
* Benchmark (see definition above) normalized to 1.0 at the start: Blue line
* Red if the factors satisfy "SHORT" conditions.
+
* Performance of manual strategy: Black line
* Black if neither "LONG" or "SHORT" are satisfied.
+
* Both should be normalized to 1.0 at the start.
  
The scale for the scatter plot should be set to +-1.5 in both dimensions.  This will help us check that you have standardized your indicators.
+
Create a table that summarizes the performance of the stock, and the manual strategy for both in sample and out of sample periods.  Explain WHY these differences occur.
 
 
The 3 plots should be based on the in sample period (about 500 points):
 
 
 
# Your rule-based strategy.
 
# The training data for your ML strategy.
 
# Response of your learner when queried with the same data (after training).
 
 
 
==Part 6: Comparative Analysis (10%)==
 
 
 
Evaluate the performance of both of your strategies in the out of sample period.  Note that you '''should not''' train or tweak your learner on this data.  You should use the classification learned using the training data only.  Create a chart that shows, out of sample:
 
 
 
* Benchmark (see definition above) normalized to 1.0 at the start: Black line
 
* Performance of manual strategy: Blue line
 
* Performance of the ML strategy: Green line
 
* All three should be normalized to 1.0 at the start.
 
 
 
Create a table that summarizes the performance of the stock, the manual strategy and the ML strategy for both in sample and out of sample periods.  Utilize your experience in this class to determine which factors are best to use for comparing these strategies.  If performance out of sample is worse than in sample, do your best to explain why.  Also if the manual and ML strategies perform substantially differently, explain why. Is one method or the other more or less susceptible to the same underlying flaw?  Why or why not?
 
 
 
Deliverables:
 
* Descriptive text (1 or 2 pages including figures)
 
* 1 chart
 
  
 
==Hints==
 
==Hints==
Line 154: Line 127:
 
** If neither LONG or SHORT is triggered, the result should be DO NOTHING.
 
** If neither LONG or SHORT is triggered, the result should be DO NOTHING.
 
** For debugging purposes, you may find it helpful to plot the value of the rule-based output (-1, 0, 1) versus the stock price.
 
** For debugging purposes, you may find it helpful to plot the value of the rule-based output (-1, 0, 1) versus the stock price.
* Train a classification learner on in sample training data:
 
** For your Y values: Use future 21 day return (not future price).  Then classify that return as LONG, SHORT or DO NOTHING.  You're trying to predict a relative change that you can use to invest with.
 
** For debugging purposes, you may find it helpful to plot the value of the training classification data (-1, 0, 1) versus the stock price in one color.
 
** For debugging purposes, you may find it helpful to plot the value of the training classification output (-1, 0, 1) versus the stock price in another color.  Ideally, these two lines should be very similar.
 
  
 
'''Choosing Technical Features -- Your X Values'''
 
'''Choosing Technical Features -- Your X Values'''
Line 164: Line 133:
  
 
<PRE>
 
<PRE>
bb_value[t] = (price[t] - SMA[t])/(stdev[t])
+
bb_value[t] = (price[t] - SMA[t])/(2 * stdev[t])
 
</PRE>
 
</PRE>
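The <tt>bb_value</tt> formula can be sketched with pandas rolling windows. This is an illustration only: the 20-day window is an assumed default, and the function name is hypothetical.

```python
import pandas as pd


def bollinger_value(prices, window=20):
    """Compute bb_value[t] = (price[t] - SMA[t]) / (2 * stdev[t]).

    prices is a pandas Series of prices; window=20 is an assumed lookback.
    The first window-1 values are NaN because the rolling window is not full.
    """
    sma = prices.rolling(window=window, min_periods=window).mean()
    stdev = prices.rolling(window=window, min_periods=window).std()
    return (prices - sma) / (2 * stdev)
```

With the factor of 2 in the denominator, a value of +1.0 corresponds to a price sitting two rolling standard deviations above the SMA, and -1.0 to two below it.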
  
Line 175: Line 144:
 
Volatility is just the stdev of daily returns.
 
Volatility is just the stdev of daily returns.
  
You still need to standardize the resulting values.
+
It is usually worthwhile to standardize the resulting values (see https://en.wikipedia.org/wiki/Standard_score).
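A minimal sketch of both ideas, volatility as the stdev of daily returns and the standard score, might look like this. The function names and the 20-day window are assumptions for illustration.

```python
import pandas as pd


def rolling_volatility(prices, window=20):
    """Rolling standard deviation of daily returns (window is an assumed lookback)."""
    daily_returns = prices.pct_change()
    return daily_returns.rolling(window=window, min_periods=window).std()


def standardize(values):
    """Standard score: subtract the mean, then divide by the standard deviation."""
    return (values - values.mean()) / values.std()
```

After standardizing, an indicator has mean 0 and standard deviation 1 over the period used to compute the mean and stdev, which makes indicators with different units easier to compare.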
  
'''Choosing Y'''
+
==Contents of Report==
  
Your code should classify based on 21 day change in price.  You need to build a new Y that reflects the 21 day change and aligns with the current date.  Here's pseudo code for the calculation of Y:
+
Describe each indicator you use in sufficient detail that someone else could reproduce it.  You should also provide a compelling description regarding why that indicator might work and how it could be used.  You should also provide one or more charts that convey how each indicator works in a compelling way (up to 8 charts).
  
  ret = (price[t+21]/price[t]) - 1.0
+
For the best possible strategy, describe how you created it and any assumptions you had to make to make it work. Provide a chart that illustrates its performance versus the benchmark.
if ret > YBUY:
 
    Y[t] = +1 # LONG
 
else if ret < YSELL:
 
    Y[t] = -1 # SHORT
 
else:
 
    Y[t] = 0
 
  
If you select Y in this manner and use it for training, your learner will classify 21 day returns.
+
For your manual strategy, describe how you combined your indicators to create an overall signal.  How do you decide to enter and exit your positions and why?  Why do you believe (or not) that this is an effective strategy? Provide a chart.
  
==Template and Data==
+
Compare the performance of your manual strategy versus the benchmark for the in sample and out of sample time periods. Provide a chart.
 
 
There is no github template for this project.  You should create a directory for your code in ml4t/mc3-p3 and make a copy of util.py there.  You should also copy into that directory your learner code and your market simulator code. You will have access to the data in the ML4T/Data directory but you should use ONLY the code in util.py to read it.
 
 
 
==Contents of Report==
 
  
* Your report should be no more than 3000 words.  Your report should contain no more than 14 charts.  Penalties will apply if you violate these constraints.
+
Your report should be no more than 3000 words.  Your report should contain no more than 14 charts.  Penalties will apply if you violate these constraints.
* Include charts and text as identified in the sections above.
 
  
 
==Expectations==
 
==Expectations==
  
* In-sample AAPL backtests should perform very well -- The ML version should do better than the manual version.
+
* In-sample backtests should perform very well.
* Out-of-sample AAPL backtests should... (you should be able to complete this sentence).
+
* Out-of-sample backtests should... (you should be able to complete this sentence).
  
 
==What to turn in==
 
==What to turn in==
  
Turn your project in via t-square.   
+
Turn your project in via Canvas.   
  
 
* Your report as <tt>report.pdf</tt>
 
* Your report as <tt>report.pdf</tt>
Line 216: Line 174:
 
==Rubric==
 
==Rubric==
  
Start with 100%, deductions as follows:
+
Start with 100 points, deductions as follows:
  
Indicators (up to 20% potential deductions):
+
Neatness (up to 5 points deduction if not).
* Is each indicator described in sufficient detail that someone else could reproduce it? (-5% for each if not)
 
* Is there a chart for each indicator that properly illustrates its operation? (-5% for each if not)
 
* Is at least one indicator different from those provided by the instructor's code (i.e., another indicator that is not SMA, Bollinger Bands or RSI) (-10% if not)
 
* Does the submitted code <tt>indicators.py</tt> properly reflect the indicators provided in the report (-20% if not)
 
  
Best possible (up to 5% potential deductions):
+
Bonus for exceptionally well written reports (up to 2 points)
* Is the chart correct (dates and equity curve) (-5% for if not)
 
* Is the reported performance correct within 5% (-1% for each item if not)
 
  
Manual rule-based trader (up to 20% deductions):
+
Indicators (up to 20 points potential deductions):
* Is the trading strategy described with clarity and in sufficient detail that someone else could reproduce it? (-10%)
+
* Is there a compelling description why each indicator might work (-2 for each, up to a total of 6 off)
* Does the provided chart include:
+
* Is each indicator described in sufficient detail that someone else could reproduce it? (-5 points for each if not)
** Historic value of benchmark normalized to 1.0 with black line (-5% if not)
+
* Is there a chart for each indicator that properly illustrates its operation? (-5 points for each if not)
** Historic value of portfolio normalized to 1.0 with blue line (-10% if not)
+
* Is at least one indicator different from those provided by the instructor's code (i.e., another indicator that is not SMA, Bollinger Bands or RSI) (-10 points if not)
** Are the appropriate date ranges covered? (-5% if not)
+
* Does the submitted code <tt>indicators.py</tt> properly reflect the indicators provided in the report (-20 points if not)
** Are vertical lines included to indicate entries (-10% if not)
 
* Does the submitted code <tt>rule_based.py</tt> properly reflect the strategy provided in the report? (-20% if not)
 
* Does the manual trading system provide higher cumulative return than the benchmark over the in-sample time period? (-5% if not)
 
  
ML-based trader (up to 30% deductions):
+
Theoretically optimal (up to 20 points potential deductions):
* Is the ML strategy described with clarity and in sufficient detail that someone else could reproduce it? (-10%)
+
* Is the methodology described correct and convincing? (-10 points if not)
* Are modifications/tweaks to the basic decision tree learner fully described (-10%)
+
* Is the chart correct (dates and equity curve) (-10 points if not)
* Does the methodology utilize a classification-based learner? (-30%)
+
* Is the chart correct (dates and equity curve) (-10 points if not)
* Does the provided chart include:
+
* Historic value of benchmark normalized to 1.0 with blue line (-5 if not)
** Historic value of benchmark normalized to 1.0 with black line (-5% if not)
+
* Historic value of portfolio normalized to 1.0 with black line (-5 if not)
** Historic value of rule-based portfolio normalized to 1.0 with blue line (-5% if not)
+
* Are the reported performance criteria correct? (-2 points for each item if not)
** Historic value of ML-based portfolio normalized to 1.0 with green line (-10% if not)
 
** Are the appropriate date ranges covered? (-5% if not)
 
** Are vertical lines included to indicate entry (-10% if not)
 
* Does the submitted code <tt>ML_based.py</tt> properly reflect the strategy provided in the report? (-30% if not)
 
* Does the ML trading system provide 1.5x higher cumulative return or than the benchmark over the in-sample time period? (-5% if not)
 
  
Data visualization (up to 15% deductions):
+
Manual rule-based trader (up to 50 points deductions):
* Is the X data reported in all three charts the same? (-5% if not)
+
* Is the trading strategy described with clarity and in sufficient detail that someone else could reproduce it? (-20)
* Is the X data standardized? (-5% if not)
+
* Does the provided chart(s) include:
* Is the Y data in the train and query plots similar (-5% if not)
+
** Historic value of benchmark normalized to 1.0 with blue line (-10 if not)
 +
** Historic value of portfolio normalized to 1.0 with black line (-10 if not)
 +
** Are the appropriate date ranges covered? (-10 if not)
 +
** Are vertical lines included to indicate entries (-10 if not)
 +
* Does the submitted code <tt>ManualStrategy.py</tt> properly reflect the strategy provided in the report? (-20 if not)
 +
* Does the submitted code and report reflect an understanding of the subject matter (up to -30 if not)
 +
* Does the manual trading system provide higher cumulative return than the benchmark over the in-sample time period? (-10 if not)
 +
* Did the student use the correct symbol? (-10 if not)
 +
* Did the student use the specified date periods? (-10 if not)
 +
* Does the strategy obey holding constraints (-5 if not)
  
Comparative analysis (up to 10% deductions):
+
Comparative analysis (up to 10 points deductions):
* Is the appropriate chart provided (-5% for each missing element, up to a maximum of -10%)
+
* Is the appropriate chart provided (-5 for each missing element, up to a maximum of -10)
* Is there a table that reports in-sample and out-of-sample data for the baseline (just the stock), rule-based, and ML-based strategies? (-5% for each missing element)
+
* Are differences between the in-sample and out-of-sample performances appropriately explained (-5)
* Are differences between the in-sample and out-of-sample performances appropriately explained (-5%)
+
* Does the submitted code and report reflect an understanding of the subject matter (up to -5 if not)
 +
* Is the required table present and correct (up to -5 if not)
  
 
==Required, Allowed & Prohibited==
 
==Required, Allowed & Prohibited==
Line 266: Line 220:
 
* Your project must be coded in Python 2.7.x.
 
* Your project must be coded in Python 2.7.x.
 
* Your code must run on one of the university-provided computers (e.g. buffet02.cc.gatech.edu).
 
* Your code must run on one of the university-provided computers (e.g. buffet02.cc.gatech.edu).
* Use only util.py to read data. If you want to read items other than adjusted close, modify util.py to do it, and submit your new version with your code.
+
* Use only util.py to read data.  
 +
* All charts must be generated in Python, and you must provide the code you used.
  
 
Allowed:
 
Allowed:
Line 272: Line 227:
 
* Your code may use standard Python libraries.
 
* Your code may use standard Python libraries.
 
* You may use the NumPy, SciPy, matplotlib and Pandas libraries.  Be sure you are using the correct versions.
 
* You may use the NumPy, SciPy, matplotlib and Pandas libraries.  Be sure you are using the correct versions.
* You may reuse sections of code (up to 5 lines) that you collected from other students or the internet.
 
 
* Code provided by the instructor, or allowed by the instructor to be shared.
 
* Code provided by the instructor, or allowed by the instructor to be shared.
 
* A herring.
 
* A herring.
  
 
Prohibited:
 
Prohibited:
 +
* Generating charts using a method other than Python.
 
* Any other method of reading data besides util.py
 
* Any other method of reading data besides util.py
 
* Any libraries not listed in the "allowed" section above.
 
* Any libraries not listed in the "allowed" section above.
* Any code you did not write yourself (except for the 5 line rule in the "allowed" section).
+
* Any code you did not write yourself.
  
 
==Legacy==
 
==Legacy==

Latest revision as of 23:21, 30 October 2018

Finalized

Updates / FAQs

  • Q: I want to read some other values from the data besides just adjusted close. How can I do that? A: Look carefully at util.py and you will see that you can query for other values.
  • Q: Are we only allowed one position at a time? A: You can be in one of three states: -1000 shares, +1000 shares, 0 shares.
  • Q: Are we required to trade in only 1000 share blocks (and have no more than 1000 shares long or short at a time)? A: You can trade up to 2000 shares at a time as long as you maintain the requirement of holding 1000, 0 or -1000 shares.
  • Q: Are we limited to leverage of 2.0 on the portfolio? A: There is no limit on leverage.

Overview

In this project you will develop a trading strategy using your intuition and Technical Analysis, and test it against a stock using your market simulator. In a later project, you will use your same indicators but with Machine Learning (instead of your intuition) to create a trading strategy. We hope Machine Learning will do better than your intuition, but who knows?

Template

There is no distributed template for this project. You should create a directory for your code in ml4t/manual_strategy and make a copy of util.py there. You will have access to the data in the ML4T/Data directory but you should use ONLY the code in util.py to read it.

You should create the following code files for submission. They should comprise ALL code from you that is necessary to run your evaluations.

  • indicators.py Your code that implements your indicators as functions that operate on dataframes. The "main" code in indicators.py should generate the charts that illustrate your indicators in the report.
  • marketsimcode.py An improved version of your marketsim code that accepts a "trades" data frame (instead of a file). More info on the trades data frame below. It is OK not to submit this file if you have subsumed its functionality into one of your other code files.
  • ManualStrategy.py Code implementing a ManualStrategy object (your manual strategy). It should implement testPolicy() which returns a trades data frame (see below). The main part of this code should call marketsimcode as necessary to generate the plots used in the report.
  • TheoreticallyOptimalStrategy.py Code implementing a TheoreticallyOptimalStrategy object (details below). It should implement testPolicy() which returns a trades data frame (see below). The main part of this code should call marketsimcode as necessary to generate the plots used in the report.

Note that we may not test your code, so we may not notice if it is not organized as recommended. However, this arrangement will be required for later projects, so it is worth setting it up this way now. The key requirement is that, if necessary, a TA should be able to run your code on a buffet machine and get the same results (e.g., statistics and charts) that we see in your report.

Data Details, Dates and Rules

  • Use only the data provided for this course. You are not allowed to import external data.
  • For your report, trade only the symbol JPM. This will enable us to more easily compare results.
  • You may use data from other symbols (such as SPY) to inform your strategy.
  • The in sample/development period is January 1, 2008 to December 31, 2009.
  • The out of sample/testing period is January 1, 2010 to December 31, 2011.
  • Starting cash is $100,000.
  • Allowable positions are: 1000 shares long, 1000 shares short, 0 shares.
  • Benchmark: The performance of a portfolio starting with $100,000 cash, investing in 1000 shares of JPM and holding that position.
  • There is no limit on leverage.
  • Transaction costs for ManualStrategy: Commission: $9.95, Impact: 0.005.
  • Transaction costs for TheoreticallyOptimalStrategy: Commission: $0.00, Impact: 0.00.
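
The benchmark definition above can be sketched as follows. This is a minimal illustration, assuming the 1000 shares are bought on the first trading day and that the benchmark pays no commission or impact (the rules above do not say either way). The function name benchmark_values and the sample prices are hypothetical.

```python
import pandas as pd

def benchmark_values(prices, start_cash=100000.0, shares=1000):
    """Daily value of a portfolio that buys `shares` on day one and holds.

    Assumption: no commission or impact is charged on the single purchase.
    """
    cash = start_cash - shares * prices.iloc[0]  # cash left after the purchase
    return cash + shares * prices                # cash plus stock value each day

# Made-up prices for illustration only (not course data)
prices = pd.Series([40.0, 42.0, 38.0])
bench = benchmark_values(prices)
bench_normed = bench / bench.iloc[0]  # normalized to 1.0 at the start
```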

Part 1: Technical Indicators (20 points)

Develop and describe at least 3 and at most 5 technical indicators. You may find our lecture on time series processing to be helpful. For each indicator you should create a single, compelling chart that illustrates the indicator.

As an example, you might create a chart that shows the price history of the stock, along with "helper data" (such as upper and lower bollinger bands) and the value of the indicator itself. Another example: If you were using price/SMA as an indicator you would want to create a chart with 3 lines: Price, SMA, Price/SMA. In order to facilitate visualization of the indicator you might normalize the data to 1.0 at the start of the date range (i.e. divide price[t] by price[0]).
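
The price/SMA example above might be computed along these lines. This is a sketch only: the function name price_sma_ratio, the 20-day window, and the synthetic prices are assumptions made for illustration, and real prices should be read with util.py.

```python
import numpy as np
import pandas as pd

def price_sma_ratio(prices, window=20):
    """Return normalized price, its SMA, and price/SMA for a price Series."""
    normed = prices / prices.iloc[0]            # divide price[t] by price[0]
    sma = normed.rolling(window=window).mean()  # NaN for the first window-1 days
    ratio = normed / sma                        # the indicator itself
    return normed, sma, ratio

# Synthetic, steadily rising prices for illustration only (not course data)
dates = pd.date_range("2008-01-01", periods=60, freq="B")
prices = pd.Series(np.linspace(40.0, 50.0, 60), index=dates)
normed, sma, ratio = price_sma_ratio(prices, window=20)
```

Plotting normed, sma, and ratio gives the three lines described above. The first window-1 values of the SMA and the ratio are NaN because the rolling window is not yet full.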

Your report description of each indicator should enable someone to reproduce it just by reading the description. We want a written description here, not code; however, it is OK to augment your written description with a pseudocode figure.

At least one of the indicators you use should be completely different from the ones presented in our lectures. (i.e. something other than SMA, Bollinger Bands, RSI).

Part 2: Theoretically Optimal Strategy (20 points)

Assume that you can see the future, but that you are constrained by the portfolio size and order limits as specified above. Create a set of trades that represents the best a strategy could possibly do during the in sample period. The reason we're having you do this is so that you will have an idea of an upper bound on performance.

The intent is for you to use adjusted close prices with the market simulator that you wrote earlier in the course. For this activity, use $0.00 and 0.0 for commission and impact, respectively.

Provide a chart that reports:

  • Benchmark (see definition above) normalized to 1.0 at the start: Blue line
  • Value of the theoretically optimal portfolio (normalized to 1.0 at the start): Black line

You should also report in text:

  • Cumulative return of the benchmark and portfolio
  • Stdev of daily returns of benchmark and portfolio
  • Mean of daily returns of benchmark and portfolio
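
The three statistics above can be computed from a series of daily portfolio values roughly as follows. This is a sketch with made-up values. Note that pandas' std() uses the sample standard deviation (ddof=1) by default; whether that or the population version is expected is not stated here.

```python
import pandas as pd

def portfolio_stats(portvals):
    """Return (cumulative return, stdev of daily returns, mean of daily returns)."""
    daily_returns = (portvals / portvals.shift(1) - 1).iloc[1:]  # drop first-day NaN
    cum_return = portvals.iloc[-1] / portvals.iloc[0] - 1
    return cum_return, daily_returns.std(), daily_returns.mean()

# Made-up portfolio values: daily returns of +1%, -1%, +3%
portvals = pd.Series([100000.0, 101000.0, 99990.0, 102989.7])
cr, sddr, adr = portfolio_stats(portvals)
```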

Your code should implement testPolicy() as follows:

df_trades = tos.testPolicy(symbol = "AAPL", sd=dt.datetime(2010,1,1), ed=dt.datetime(2011,12,31), sv = 100000) 

The input parameters are:

  • symbol: the stock symbol to act on
  • sd: A datetime object that represents the start date
  • ed: A datetime object that represents the end date
  • sv: Start value of the portfolio

The output result is:

  • df_trades: A data frame whose values represent trades for each day. Legal values are +1000.0 indicating a BUY of 1000 shares, -1000.0 indicating a SELL of 1000 shares, and 0.0 indicating NOTHING. Values of +2000 and -2000 for trades are also legal so long as net holdings are constrained to -1000, 0, and 1000.
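
A trades data frame and a check of the holdings constraint might look like the sketch below. The layout (one column named after the symbol, indexed by trading date) and the helper name holdings_are_legal are assumptions for illustration. The check also assumes the portfolio starts with zero shares.

```python
import datetime as dt
import pandas as pd

def holdings_are_legal(df_trades, limit=1000):
    """True if cumulative holdings are always -limit, 0, or +limit."""
    holdings = df_trades.cumsum()  # running share count, starting from zero
    return bool(holdings.isin([-limit, 0, limit]).all().all())

dates = pd.date_range(dt.datetime(2008, 1, 2), periods=5, freq="B")

# BUY 1000, hold, SELL 2000 (long to short), hold, BUY 1000 (back to flat)
df_trades = pd.DataFrame({"JPM": [1000.0, 0.0, -2000.0, 0.0, 1000.0]}, index=dates)

# Two consecutive BUY 1000 orders would leave 2000 shares long, which is illegal
bad_trades = pd.DataFrame({"JPM": [1000.0, 1000.0, 0.0, 0.0, 0.0]}, index=dates)
```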

Part 3: Manual Rule-Based Trader (50 points)

In ManualStrategy.py implement a set of rules using the indicators you created in Part 1 above. Devise some simple logic using your indicators to enter and exit positions in the stock.

A recommended approach is to create a single logical expression that yields a -1, 0, or 1, corresponding to a "short," "out" or "long" position. Example usage of this signal: If you are out of the stock, then a 1 would signal a BUY 1000 order. If you are long, a -1 would signal a SELL 2000 order. You don't have to follow this advice though, so long as you follow the trading rules outlined above.
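
One way to turn such a position signal into orders is to treat it as the target holding for each day and trade the difference from the previous day. The sketch below assumes the portfolio starts with zero shares. The function name signal_to_trades is hypothetical.

```python
import pandas as pd

def signal_to_trades(signal, shares=1000):
    """Convert a -1/0/1 target-position signal into daily share trades."""
    target = signal * shares         # desired holdings each day
    trades = target.diff()           # change in holdings from the prior day
    trades.iloc[0] = target.iloc[0]  # first day: trade from zero holdings
    return trades

# out, long, long, short, out
signal = pd.Series([0, 1, 1, -1, 0])
trades = signal_to_trades(signal)
```

Going from long to short produces a single -2000 trade, which matches the SELL 2000 example above and keeps net holdings within -1000, 0, and 1000.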

For the report we want a written description, not code; however, it is OK to augment your written description with a pseudocode figure.

You should tweak your rules as best you can to get the best performance possible during the in sample period (do not peek at out of sample performance). Use your rule-based strategy to generate an orders dataframe over the in sample period, then run that dataframe through your market simulator to create a chart that includes the following components over the in sample period:

  • Benchmark (see definition above) normalized to 1.0 at the start: Blue line
  • Value of the rule-based portfolio (normalized to 1.0 at the start): Black line
  • Vertical green lines indicating LONG entry points.
  • Vertical red lines indicating SHORT entry points.

We expect that your rule-based strategy should outperform the benchmark over the in sample period.

Your code should implement the same API as above for theoretically optimal:

df_trades = ms.testPolicy(symbol = "AAPL", sd=dt.datetime(2010,1,1), ed=dt.datetime(2011,12,31), sv = 100000)

Part 4: Comparative Analysis (10 points)

Evaluate the performance of your strategy in the out of sample period. Note that you should not train or tweak your approach on this data. You should use only the rules you developed using the in sample data. Create a chart that shows, out of sample:

  • Benchmark (see definition above) normalized to 1.0 at the start: Blue line
  • Performance of manual strategy: Black line
  • Both should be normalized to 1.0 at the start.

Create a table that summarizes the performance of the stock, and the manual strategy for both in sample and out of sample periods. Explain WHY these differences occur.

Hints

Overall, I recommend the following steps in the creation of your strategies:

  • Indicator design hints:
    • For your X values: Identify and implement at least 3 technical features that you believe may be predictive of future return.
  • Rule based design:
    • Use a cascade of if statements conditioned on the indicators to identify whether a LONG condition is met.
    • Use a cascade of if statements conditioned on the indicators to identify whether a SHORT condition is met.
    • The conditions for LONG and SHORT should be mutually exclusive.
    • If neither LONG nor SHORT is triggered, the result should be DO NOTHING.
    • For debugging purposes, you may find it helpful to plot the value of the rule-based output (-1, 0, 1) versus the stock price.
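
The rule structure above could be sketched for a single day as shown below. The two indicators and the thresholds (-1.0, 1.0, 0.0) are placeholders chosen only to show the shape of the logic, not recommended values, and the function name daily_signal is hypothetical.

```python
def daily_signal(bb_value, momentum):
    """Return 1 (LONG), -1 (SHORT), or 0 (DO NOTHING) for one day.

    Thresholds are illustrative placeholders, not tuned values.
    """
    # LONG condition: price well below its band and momentum positive
    if bb_value < -1.0 and momentum > 0.0:
        return 1
    # SHORT condition: price well above its band and momentum negative
    if bb_value > 1.0 and momentum < 0.0:
        return -1
    # Neither condition met
    return 0
```

Because bb_value cannot be both below -1.0 and above 1.0, the LONG and SHORT conditions in this sketch are mutually exclusive.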

Choosing Technical Features -- Your X Values

You should have already successfully coded the Bollinger Band feature:

bb_value[t] = (price[t] - SMA[t])/(2 * stdev[t])

Two other good features worth considering are momentum and volatility.

momentum[t] = (price[t]/price[t-N]) - 1

Volatility is just the stdev of daily returns.

It is usually worthwhile to standardize the resulting values (see https://en.wikipedia.org/wiki/Standard_score).
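
The three features above and their standard scores might be computed as in this sketch. The lookback n=10, the function name compute_features, and the synthetic price series are assumptions for illustration. The standardization here uses the mean and stdev of the whole supplied date range, which is also an assumption.

```python
import numpy as np
import pandas as pd

def compute_features(prices, n=10):
    """Return standardized bb_value, momentum, and volatility for a price Series."""
    sma = prices.rolling(window=n).mean()
    stdev = prices.rolling(window=n).std()
    bb_value = (prices - sma) / (2 * stdev)
    momentum = prices / prices.shift(n) - 1
    daily_returns = prices / prices.shift(1) - 1
    volatility = daily_returns.rolling(window=n).std()  # stdev of daily returns
    features = pd.DataFrame(
        {"bb_value": bb_value, "momentum": momentum, "volatility": volatility}
    )
    # Standard score: subtract each column's mean, divide by its stdev
    return (features - features.mean()) / features.std()

# Synthetic prices for illustration only (not course data)
t = np.arange(100)
prices = pd.Series(50.0 + 5.0 * np.sin(t / 5.0) + 0.1 * t)
z = compute_features(prices, n=10)
```

The leading rows of each column are NaN until enough history is available for the lookback window.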

Contents of Report

Describe each indicator you use in sufficient detail that someone else could reproduce it. You should also provide a compelling description of why that indicator might work and how it could be used. You should also provide one or more charts that convey how each indicator works in a compelling way (up to 8 charts).

For the best possible strategy, describe how you created it and any assumptions you had to make to make it work. Provide a chart that illustrates its performance versus the benchmark.

For your manual strategy, describe how you combined your indicators to create an overall signal. How do you decide to enter and exit your positions and why? Why do you believe (or not) that this is an effective strategy? Provide a chart.

Compare the performance of your manual strategy versus the benchmark for the in sample and out of sample time periods. Provide a chart.

Your report should be no more than 3000 words. Your report should contain no more than 14 charts. Penalties will apply if you violate these constraints.

Expectations

  • In-sample backtests should perform very well.
  • Out-of-sample backtests should... (you should be able to complete this sentence).

What to turn in

Turn your project in via Canvas.

  • Your report as report.pdf
  • All of your code, as necessary to run as .py files.
  • Document how to run your code in readme.txt.
  • No zip files please.

Rubric

Start with 100 points, deductions as follows:

Neatness (up to 5 points deduction if not).

Bonus for exceptionally well written reports (up to 2 points)

Indicators (up to 20 points potential deductions):

  • Is there a compelling description of why each indicator might work? (-2 for each, up to a total of 6 off)
  • Is each indicator described in sufficient detail that someone else could reproduce it? (-5 points for each if not)
  • Is there a chart for each indicator that properly illustrates its operation? (-5 points for each if not)
  • Is at least one indicator different from those provided by the instructor's code (i.e., another indicator that is not SMA, Bollinger Bands or RSI) (-10 points if not)
  • Does the submitted code indicators.py properly reflect the indicators provided in the report (-20 points if not)

Theoretically optimal (up to 20 points potential deductions):

  • Is the methodology described correct and convincing? (-10 points if not)
  • Is the chart correct (dates and equity curve) (-10 points if not)
  • Historic value of benchmark normalized to 1.0 with blue line (-5 if not)
  • Historic value of portfolio normalized to 1.0 with black line (-5 if not)
  • Are the reported performance criteria correct? (-2 points for each item if not)

Manual rule-based trader (up to 50 points deductions):

  • Is the trading strategy described with clarity and in sufficient detail that someone else could reproduce it? (-20)
  • Does the provided chart(s) include:
    • Historic value of benchmark normalized to 1.0 with blue line (-10 if not)
    • Historic value of portfolio normalized to 1.0 with black line (-10 if not)
    • Are the appropriate date ranges covered? (-10 if not)
    • Are vertical lines included to indicate entries (-10 if not)
  • Does the submitted code ManualStrategy.py properly reflect the strategy provided in the report? (-20 if not)
  • Does the submitted code and report reflect an understanding of the subject matter (up to -30 if not)
  • Does the manual trading system provide higher cumulative return than the benchmark over the in-sample time period? (-10 if not)
  • Did the student use the correct symbol? (-10 if not)
  • Did the student use the correct date periods? (-10 if not)
  • Does the strategy obey holding constraints (-5 if not)

Comparative analysis (up to 10 points deductions):

  • Is the appropriate chart provided (-5 for each missing element, up to a maximum of -10)
  • Are differences between the in-sample and out-of-sample performances appropriately explained (-5)
  • Does the submitted code and report reflect an understanding of the subject matter (up to -5 if not)
  • Is the required table present and correct (up to -5 if not)

Required, Allowed & Prohibited

Required:

  • Your project must be coded in Python 2.7.x.
  • Your code must run on one of the university-provided computers (e.g. buffet02.cc.gatech.edu).
  • Use only util.py to read data.
  • All charts must be generated in Python, and you must provide the code you used.

Allowed:

  • You can develop your code on your personal machine, but it must also run successfully on one of the university-provided machines or virtual images.
  • Your code may use standard Python libraries.
  • You may use the NumPy, SciPy, matplotlib and Pandas libraries. Be sure you are using the correct versions.
  • Code provided by the instructor, or allowed by the instructor to be shared.
  • A herring.

Prohibited:

  • Generating charts using a method other than Python.
  • Any other method of reading data besides util.py
  • Any libraries not listed in the "allowed" section above.
  • Any code you did not write yourself.

Legacy