Q Trader Hints - Revision history

Tucker at 14:58, 22 November 2017

2017-11-22T14:58:22Z

Tucker: /* Overview */

2017-11-21T21:16:28Z

Overview

Dave: /* Implement Strategy Learner */

2017-07-18T17:02:58Z

Implement Strategy Learner

Tucker: Created page with "==Overview== In this project you will apply the Q-Learner you developed earlier to the trading problem. It is not required, but we recommend that you reuse the indicators th..."

2017-07-18T01:38:45Z

New page

==Overview==

In this project you will apply the Q-Learner you developed earlier to the trading problem. It is not required, but we recommend that you reuse the indicators that you developed in the previous project. Note that there is no regression or classification learning in this project (so no use of RTLearner or LinRegLearner). The indicators define most of the "state" for your learner; the additional component of state is whether you are currently holding a long or short position. The actions are BUY, NOTHING, SELL.
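
As an illustration of how indicator values and the holding can be combined into a single state number, here is a minimal sketch. It assumes each indicator is discretized into 10 bins using cut points fitted on the training data, and that the holding is coded as 0 (short), 1 (no position), or 2 (long). The names <tt>fit_thresholds</tt>, <tt>encode_state</tt>, and <tt>num_states</tt>, the bin count, and the holding codes are all hypothetical choices, not part of the project template.

```python
import numpy as np

N_BINS = 10   # bins per indicator (hypothetical choice)
N_HOLD = 3    # holding codes: 0 = short, 1 = no position, 2 = long (assumed)


def fit_thresholds(train_values, n_bins=N_BINS):
    """Interior cut points that put about equal numbers of training values in each bin."""
    pct = np.linspace(0, 100, n_bins + 1)[1:-1]
    return np.percentile(np.asarray(train_values, dtype=float), pct)


def encode_state(indicator_values, thresholds, holding_code):
    """Pack one day's indicator values and the holding code into a single integer."""
    state = 0
    for value, cuts in zip(indicator_values, thresholds):
        # np.digitize returns a bin index in [0, len(cuts)], i.e. 0..N_BINS-1 here.
        state = state * N_BINS + int(np.digitize(value, cuts))
    return state * N_HOLD + holding_code


def num_states(n_indicators):
    """Size of the state space to pass to the Q-learner."""
    return N_BINS ** n_indicators * N_HOLD
```

Values outside the training range fall into the first or last bin, so the same cut points can be reused on test data. Missing indicator values (for example at the start of a rolling window) are not handled here and would need to be filled or dropped first.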

Overall, your tasks for this project include:

* Build a strategy learner based on your Q-Learner and previously developed indicators.
* Test/debug the strategy learner on specific symbol/time period problems.

Scoring for the project will be based on trading strategy test cases.

==Implement Strategy Learner==

For this part of the project you should develop a learner that can learn a trading policy using your Q-Learner. You should be able to use your Q-Learner from the earlier project directly, with no changes. You will need to write code in <tt>StrategyLearner.py</tt> to "wrap" your Q-Learner so that it frames the trading problem appropriately. Use the template provided in <tt>StrategyLearner.py</tt>. Overall, the structure of your strategy learner should be arranged as below. Note that this is a suggestion, not a requirement:

For the policy learning part:
* Select several technical features, and compute their values for the training data
* Discretize the values of the features
* Instantiate a Q-learner
* For each day in the training data:
** Compute the current state (including holding)
** Compute the reward for the last action
** Query the learner with the current state and reward to get an action
** Implement the action the learner returned (BUY, SELL, NOTHING), and update portfolio value
* Repeat the above loop multiple times until cumulative return stops improving.
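
The policy learning steps above can be sketched roughly as follows. This is only an illustration under several assumptions: <tt>TinyQLearner</tt> is a stand-in so the example runs on its own (you would use your own Q-Learner instead), the method names <tt>querysetstate</tt> and <tt>query</tt> are assumed, the single indicator is a 5-day momentum on a synthetic sine-wave price, the reward is the daily change in portfolio value, and commissions and market impact are ignored. The stopping rule (no improvement for <tt>patience</tt> passes) is one possible reading of "stops improving".

```python
import numpy as np

BUY, NOTHING, SELL = 0, 1, 2   # action codes (hypothetical numbering)
LOT = 200                      # maximum long or short position, in shares
N_HOLD = 3                     # holding part of the state: short, flat, long


class TinyQLearner:
    """Stand-in tabular Q-learner so the sketch runs; swap in your own."""

    def __init__(self, num_states, num_actions, alpha=0.2, gamma=0.9,
                 rar=0.5, radr=0.99, seed=0):
        self.q = np.zeros((num_states, num_actions))
        self.alpha, self.gamma, self.rar, self.radr = alpha, gamma, rar, radr
        self.rng = np.random.default_rng(seed)
        self.s, self.a = 0, 0

    def querysetstate(self, s):
        """Remember state s and choose an action without updating Q."""
        if self.rng.random() < self.rar:
            a = int(self.rng.integers(self.q.shape[1]))   # explore
        else:
            a = int(np.argmax(self.q[s]))                 # exploit
        self.s, self.a = s, a
        return a

    def query(self, s_prime, r):
        """Update Q for the previous state/action, then choose the next action."""
        target = r + self.gamma * self.q[s_prime].max()
        self.q[self.s, self.a] += self.alpha * (target - self.q[self.s, self.a])
        self.rar *= self.radr
        return self.querysetstate(s_prime)


def encode(bin_id, holding):
    """Combine an indicator bin with the holding (-LOT, 0, +LOT) into one state."""
    return bin_id * N_HOLD + (holding // LOT + 1)


def run_epoch(prices, bins, learner, start_val=100000.0):
    """One pass over the training days; returns the cumulative return."""
    delta = {BUY: LOT, NOTHING: 0, SELL: -LOT}
    holding, value = 0, start_val
    action = learner.querysetstate(encode(bins[0], holding))
    for t in range(1, len(prices)):
        # Implement the previous day's action, capped at +/- LOT shares.
        holding = max(-LOT, min(LOT, holding + delta[action]))
        # Reward for that action: the change in portfolio value it produced.
        reward = holding * (prices[t] - prices[t - 1])
        value += reward
        # Current state (including holding), then query with state and reward.
        action = learner.query(encode(bins[t], holding), reward)
    return value / start_val - 1.0


def train(prices, bins, learner, max_epochs=100, patience=5):
    """Repeat the daily loop until cumulative return stops improving."""
    best, stale, epochs = -np.inf, 0, 0
    for epochs in range(1, max_epochs + 1):
        cum_ret = run_epoch(prices, bins, learner)
        if cum_ret > best + 1e-9:
            best, stale = cum_ret, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, epochs


# Synthetic example: sine-wave price, 5-day momentum discretized into 10 bins.
prices_all = 100.0 + 10.0 * np.sin(np.arange(205) / 5.0)
momentum = prices_all[5:] / prices_all[:-5] - 1.0
prices = prices_all[5:]
cuts = np.percentile(momentum, np.arange(10, 100, 10))
bins = np.digitize(momentum, cuts)
learner = TinyQLearner(num_states=10 * N_HOLD, num_actions=3)
best_return, epochs_run = train(prices, bins, learner)
```

Capping the number of passes with <tt>max_epochs</tt> also helps keep training inside a fixed time budget.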

A rule to keep in mind: as in past projects, you can only be long or short 200 shares. If your learner returns two BUYs in a row, don't double down; the same applies to consecutive SELLs.
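
One way to enforce this rule is to convert each action into a trade size that keeps the position within the limit. The sketch below assumes that BUY and SELL each move the position by 200 shares and that the result is clamped to the range -200 to +200; the function name <tt>shares_to_trade</tt> and the action codes are hypothetical.

```python
BUY, NOTHING, SELL = 0, 1, 2   # action codes (hypothetical numbering)
LIMIT = 200                    # maximum long or short position, in shares


def shares_to_trade(holding, action, lot=200):
    """Signed number of shares to trade so the position stays within +/- LIMIT."""
    wanted = {BUY: lot, NOTHING: 0, SELL: -lot}[action]
    target = max(-LIMIT, min(LIMIT, holding + wanted))
    return target - holding
```

Under this assumption a second BUY while already long 200 shares produces a trade of 0 shares, and a SELL from a long position returns to no position rather than going short directly.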

For the policy testing part:
* For each day in the testing data:
** Compute the current state
** Query the learner with the current state to get an action
** Implement the action the learner returned (BUY, SELL, NOTHING), and update portfolio value
* Return the resulting trades in a data frame (details below).
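
The testing steps above might look like the following sketch. It assumes the learned policy is available as a Q table (a NumPy array indexed by state and action), that states are built from one discretized indicator plus a three-valued holding code, and that the trades are returned as a single-column data frame of signed share counts indexed by date. This page does not specify the data frame layout, so that layout, the function name <tt>run_policy</tt>, and the action codes are assumptions.

```python
import numpy as np
import pandas as pd

BUY, NOTHING, SELL = 0, 1, 2   # action codes (hypothetical numbering)
LOT = 200                      # maximum long or short position, in shares


def run_policy(q_table, bins, dates, symbol):
    """Apply a learned policy greedily and return the trades; Q is never updated."""
    delta = {BUY: LOT, NOTHING: 0, SELL: -LOT}
    holding = 0
    trades = []
    for b in bins:
        # Current state: indicator bin plus holding code (0 short, 1 flat, 2 long).
        state = b * 3 + (holding // LOT + 1)
        # Query only: take the best known action, no exploration, no learning.
        action = int(np.argmax(q_table[state]))
        target = max(-LOT, min(LOT, holding + delta[action]))
        trades.append(target - holding)
        holding = target
    return pd.DataFrame({symbol: trades}, index=pd.DatetimeIndex(dates))


# Toy policy with 2 indicator bins: BUY in bin 0, SELL in bin 1.
q_table = np.zeros((6, 3))
q_table[:3, BUY] = 1.0
q_table[3:, SELL] = 1.0
dates = pd.date_range("2010-01-04", periods=5, freq="B")
trades = run_policy(q_table, [0, 0, 1, 1, 0], dates, "ML4T-220")
```

Because the function only reads the Q table, the policy stays fixed during testing.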

We expect the following outcomes in evaluating your system:
* For ML4T-220, the trained policy should provide a cumulative return greater than 100% in sample (20 points)
* For ML4T-220, the trained policy should provide a cumulative return greater than 100% out of sample (20 points)
* For AAPL, the trained policy should significantly outperform the benchmark in sample (20 points)
* For SINE_FAST_NOISE, the trained policy should provide a cumulative return greater than 200% in sample (20 points)
* For UNH, the trained policy should significantly outperform the benchmark in sample (20 points)
* An additional test trains your learner on one data set, then tests it out of sample on another data set. The out-of-sample performance should be worse than the in-sample performance. This test is not counted for now.
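
To check your own results against targets like these, you can compute cumulative return from daily portfolio values and compare it with a benchmark. The sketch below is an illustration only: the starting value of 100,000, the absence of commissions and market impact, and the benchmark (buy 200 shares on the first day and hold) are assumptions, since this page does not define them. The function names are hypothetical.

```python
import numpy as np


def portfolio_values(prices, holdings, start_val=100000.0):
    """Daily value of cash plus shares, given the shares held at each day's close."""
    prices = np.asarray(prices, dtype=float)
    holdings = np.asarray(holdings, dtype=float)
    trades = np.diff(holdings, prepend=0.0)          # shares bought (+) or sold (-)
    cash = start_val - np.cumsum(trades * prices)    # cash left after each trade
    return cash + holdings * prices


def cumulative_return(port_vals):
    """Total return over the period: last value over first value, minus one."""
    port_vals = np.asarray(port_vals, dtype=float)
    return port_vals[-1] / port_vals[0] - 1.0


# Toy comparison on three days of prices.
prices = [100.0, 110.0, 99.0]
strategy = portfolio_values(prices, [200, 0, 0])        # buy, then exit at 110
benchmark = portfolio_values(prices, [200, 200, 200])   # assumed buy-and-hold
```

A cumulative return greater than 100% corresponds to <tt>cumulative_return(...)</tt> returning a value above 1.0.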

Training and testing for each situation should run in less than 30 seconds. We reserve the right to use different time periods if necessary to reduce auto grading time.

==Legacy==

*[[MC3-Project-4-Legacy-Q-trader]]
*[[MC3-Project-2-Legacy-trader]]
*[[MC3-Project-2-Legacy]]

@@ Line 1: / Line 1: @@
 ==Overview==
-In this project you will apply the Q-Learner you developed earlier to the trading problem.  It is not required, but we recommend that you reuse the indicators that you developed in the previous project for this one.  Note that there is no regression or classification learning in this project (so no use of RTLearner or LinRegLearner). The indicators define most of the "state" for your learner, the additional component of state is whether or not you are currently holding a position long or short. The recommended actions are LONG, CASH, SHORT.
+In this project you will apply the Q-Learner you developed earlier to the trading problem.   Note that there is no regression or classification learning in this project (so no use of RTLearner or LinRegLearner). The indicators define most of the "state" for your learner, an additional component of state you may use is whether or not you are currently holding a position long or short. The recommended actions are LONG, CASH, SHORT.
 Overall, your tasks for this project include:
@@ Line 7: / Line 7: @@
 * Build a strategy learner based on your Q-Learner and previously developed indicators.
 * Test/debug the strategy learner on specific symbol/time period problems
 ==Implement Strategy Learner==
@@ Line 22: / Line 20: @@
 ** Compute the reward for the last action
 ** Query the learner with the current state and reward to get an action
-** Implement the action the learner returned (BUY, SELL, NOTHING), and update portfolio value
+** Implement the action the learner returned (LONG, CASH, SHORT), and update portfolio value
 * Repeat the above loop multiple times until cumulative return stops improving.
 For the policy testing part:
@@ Line 31: / Line 27: @@
 ** Compute the current state
 ** Query the learner with the current state to get an action
-** Implement the action the learner returned (BUY, SELL, NOTHING), and update portfolio value
+** Implement the action the learner returned (LONG, CASH, SHORT), and update portfolio value
-* Return the resulting trades in a data frame (details below).
+** DO NOT UPDATE Q -- learning must be turned off in this phase
 Training and testing for each situation should run in less than 30 seconds.  We reserve the right to use different time periods if necessary to reduce auto grading time.

@@ Line 35: / Line 35: @@
 We expect the following outcomes in evaluating your system:
-* For ML4T-220, the trained policy should provide a cumulative return greater than 100% in sample (20 points)
+* For ML4T-220, the trained policy should provide a cumulative return greater than 100% in sample
-* For ML4T-220, the trained policy should provide a cumulative return greater than 100% out of sample (20 points)
+* For ML4T-220, the trained policy should provide a cumulative return greater than 100% out of sample
-* For AAPL, the trained policy should significantly outperform the benchmark in sample (20 points)
+* For AAPL, the trained policy should significantly outperform the benchmark in sample
-* For SINE_FAST_NOISE, the trained policy should provide a cumulative return greater than 200% in sample (20 points)
+* For SINE_FAST_NOISE, the trained policy should provide a cumulative return greater than 200% in sample
-* For UNH, the trained policy should significantly outperform the benchmark in sample (20 points)
+* For UNH, the trained policy should significantly outperform the benchmark in sample
-* Additional test in which we train your learner with one data set, then test it out of sample with another data set.  The out of sample performance should be worse than in sample. Not counted for now.
+* Additional test in which we train your learner with one data set, then test it out of sample with another data set.  The out of sample performance should be worse than in sample.
 Training and testing for each situation should run in less than 30 seconds.  We reserve the right to use different time periods if necessary to reduce auto grading time.