Difference between revisions of "Summer 2016 Project 5"

From Quantitative Analysis Software Courses

Revision as of 20:16, 19 July 2016

Overview

In this project you will implement the Q-Learning and Dyna-Q solutions to the reinforcement learning problem. You will apply them to two problems: 1) Navigation, and 2) Trading. You will start with navigation because, as you will see, it is an easy problem to work with and understand. In the last part of the assignment you will apply Q-Learning to stock trading.

Note that your Q-Learning code really shouldn't care which problem it is solving. The difference is that you need to wrap the learner in different code that frames the problem for the learner as necessary.

For the navigation problem we have created testqlearner.py that automates testing of your Q-Learner in the navigation problem. We also provide teststrategylearner.py to test your strategy learner. In order to apply Q-learning to trading you will have to implement an API that calls Q-learning internally.
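The idea of wrapping a problem-agnostic learner can be sketched as follows. This is a minimal illustration, not the provided testqlearner.py: the state encoding, the move order, the reward values, and the names RandomLearner, encode_state, and run_episode are all assumptions made for this example. The wrapper talks to the learner only through querysetstate and query, so any object with those two methods could be dropped in.

```python
import random


class RandomLearner:
    """Hypothetical stand-in with the two learner methods; it learns nothing."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def querysetstate(self, s):
        return random.randrange(self.num_actions)

    def query(self, s_prime, r):
        return random.randrange(self.num_actions)


def encode_state(row, col, num_cols):
    """Map a grid cell to one integer state (assumed encoding)."""
    return row * num_cols + col


def run_episode(learner, num_rows, num_cols, start, goal, max_steps=1000):
    """Drive a learner through one navigation episode and return the total reward."""
    moves = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # assumed order: north, east, south, west
    row, col = start
    action = learner.querysetstate(encode_state(row, col, num_cols))
    total_reward = 0.0
    for _ in range(max_steps):
        d_row, d_col = moves[action]
        # Clamp to the grid, so a move off the edge leaves the position unchanged.
        row = min(max(row + d_row, 0), num_rows - 1)
        col = min(max(col + d_col, 0), num_cols - 1)
        reward = 1.0 if (row, col) == goal else -1.0  # assumed reward scheme
        total_reward += reward
        # The learner sees only an integer state and a reward, never the grid itself.
        action = learner.query(encode_state(row, col, num_cols), reward)
        if (row, col) == goal:
            break
    return total_reward
```

A trading wrapper would follow the same pattern, replacing the grid position with some discretized description of the market and the step reward with a return-based value.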

Overall, your tasks for this project include:

  • Code a Q-Learner
  • Code the Dyna-Q feature of Q-Learning
  • Test/debug the Q-Learner in navigation problems
  • Build a strategy learner based on your Q-Learner
  • Test/debug the strategy learner on specific symbol/time period problems

Scoring for the project will be allocated as follows:

  • Navigation test cases: 80% (note that we will check those with dyna = 0)
  • Dyna implemented: 5% (we will check this with one navigation test case by comparing performance with and without dyna turned on)
  • Trading strategy test cases: 20%

For this assignment we will test only your code (there is no report component). Note that the scoring is structured so that you can earn a B (80%) if you implement only Q-Learning, but if you implement everything, the total possible score is 105%. That means you can earn up to 5% extra credit on this project (equivalent to 1% extra credit on the final course grade).


Template and Data

  • Download mc3_p3.zip, unzip inside ml4t/
  • Implement the QLearner class in mc3_p3/QLearner.py.
  • Implement the StrategyLearner class in mc3_p3/StrategyLearner.py
  • To test your Q-learner, run python testqlearner.py from the mc3_p3/ directory.
  • To test your strategy learner, run python teststrategylearner.py from the mc3_p3/ directory.
  • Note that example problems are provided in the mc3_p3/testworlds directory


Part 1: Implement QLearner

Your QLearner class should be implemented in the file QLearner.py. It should implement EXACTLY the API defined below. DO NOT import any modules besides those allowed below. Your class should implement the following methods:

  • QLearner(...): Constructor, see argument details below.
  • query(s_prime, r): Update the Q-table with the experience tuple <s, a, s_prime, r>, update rar, and return the new action for state s_prime.
  • querysetstate(s): Set state to s, return action for state s, but don't update Q-table or rar.
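To make the division of labor between the two methods concrete, here is a minimal sketch of a tabular Q-learner without Dyna. It is written under assumptions, since the constructor arguments and the list of allowed modules are not given in this section: the parameter names and defaults (num_states, num_actions, alpha, gamma, radr) are hypothetical, rar is interpreted as the probability of taking a random action, and only the standard library random module is used. Check it against the argument details and import rules before relying on any of it.

```python
import random


class QLearner:
    """Tabular Q-learner sketch; constructor arguments are assumptions."""

    def __init__(self, num_states=100, num_actions=4, alpha=0.2, gamma=0.9,
                 rar=0.5, radr=0.99):
        self.num_actions = num_actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.rar = rar      # assumed meaning: probability of a random action
        self.radr = radr    # assumed multiplicative decay applied to rar
        self.q = [[0.0] * num_actions for _ in range(num_states)]
        self.s = 0          # last state
        self.a = 0          # last action

    def _choose_action(self, s):
        # Explore with probability rar, otherwise take the greedy action.
        if random.random() < self.rar:
            return random.randrange(self.num_actions)
        row = self.q[s]
        return row.index(max(row))

    def querysetstate(self, s):
        """Set the state and return an action; Q-table and rar are untouched."""
        self.s = s
        self.a = self._choose_action(s)
        return self.a

    def query(self, s_prime, r):
        """Learn from <s, a, s_prime, r>, decay rar, and return the next action."""
        target = r + self.gamma * max(self.q[s_prime])
        old = self.q[self.s][self.a]
        self.q[self.s][self.a] = (1 - self.alpha) * old + self.alpha * target
        a_prime = self._choose_action(s_prime)
        self.rar *= self.radr
        self.s, self.a = s_prime, a_prime
        return a_prime
```

The sketch stores the last state and action on the instance because query receives only s_prime and r; the s and a in the experience tuple have to come from the previous call.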

Here's an example of the API in use: