Summer 2016 Project 5

From Quantitative Analysis Software Courses
Revision as of 20:23, 19 July 2016 by Dave

Overview

In this project you will implement the Q-Learning and Dyna-Q solutions to the reinforcement learning problem. You will apply them to two problems: 1) Navigation, and 2) Trading. We start with navigation because, as you will see, it is an easy problem to work with and understand. In the last part of the assignment you will apply Q-Learning to stock trading.

Note that your Q-Learning code really shouldn't care which problem it is solving. The difference is that you need to wrap the learner in different code that frames the problem for the learner as necessary.
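As a rough illustration of what "framing the problem" can mean, the wrapper code typically turns a problem-specific situation into a single integer state that the learner can use as a table index. The sketch below is only one way to do that. The helper names discretize_position and discretize_bins are hypothetical, and the idea of describing a trading situation with binned values is an assumption, not something specified by this assignment.

```python
def discretize_position(row, col, num_cols):
    """Map a grid cell to one integer state, in row-major order (hypothetical helper)."""
    return row * num_cols + col


def discretize_bins(bins_per_feature, bin_indices):
    """Combine several binned feature values into one integer state (hypothetical helper).

    bins_per_feature: number of bins for each feature, e.g. [10, 10, 3]
    bin_indices: the bin each feature currently falls into, e.g. [4, 7, 2]
    """
    state = 0
    for n_bins, idx in zip(bins_per_feature, bin_indices):
        if not 0 <= idx < n_bins:
            raise ValueError("bin index out of range")
        # Treat the bin indices as digits of a mixed-radix number.
        state = state * n_bins + idx
    return state


# Navigation: cell (2, 3) in a world that is 10 columns wide.
nav_state = discretize_position(2, 3, 10)

# Trading (assumed framing): three features with 10, 10 and 3 bins.
trade_state = discretize_bins([10, 10, 3], [4, 7, 2])
```

Either way, the learner only ever sees an integer state and a reward, which is why the same Q-Learning code can serve both problems.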

For the navigation problem we have created testqlearner.py, which automates testing of your Q-Learner. We also provide teststrategylearner.py to test your strategy learner. To apply Q-Learning to trading you will have to implement an API that calls Q-Learning internally.

Overall, your tasks for this project include:

  • Code a Q-Learner
  • Code the Dyna-Q feature of Q-Learning
  • Test/debug the Q-Learner in navigation problems
  • Build a strategy learner based on your Q-Learner
  • Test/debug the strategy learner on specific symbol/time period problems

Scoring for the project will be allocated as follows:

  • Navigation test cases: 80% (note that we will check those with dyna = 0)
  • Dyna implemented: 5% (we will check this with one navigation test case by comparing performance with and without dyna turned on)
  • Trading strategy test cases: 20%

For this assignment we will test only your code (there is no report component). Note that the scoring is structured so that you can earn a B (80%) if you implement only Q-Learning, but if you implement everything, the total possible score is 105%. That means you can earn up to 5% extra credit on this project (equivalent to 1% extra credit on the final course grade).

Template and Data

  • Download mc3_p3.zip and unzip it inside ml4t/.
  • Implement the QLearner class in mc3_p3/QLearner.py.
  • Implement the StrategyLearner class in mc3_p3/StrategyLearner.py.
  • To test your Q-learner, run python testqlearner.py from the mc3_p3/ directory.
  • To test your strategy learner, run python teststrategylearner.py from the mc3_p3/ directory.
  • Note that example problems are provided in the mc3_p3/testworlds directory.

Part 1: Implement QLearner

Your QLearner class should be implemented in the file QLearner.py. It should implement EXACTLY the API defined below. DO NOT import any modules besides those allowed below. Your class should implement the following methods:

  • QLearner(...): Constructor, see argument details below.
  • query(s_prime, r): Update the Q-table with the experience tuple <s, a, s_prime, r>, update rar, and return the new action for state s_prime.
  • querysetstate(s): Set the state to s and return an action for state s, but don't update the Q-table or rar.
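For reference, the conventional tabular Q-learning update that a query(s_prime, r) call would apply to the remembered state s and action a can be written as below. The symbols \alpha (learning rate) and \gamma (discount rate) are the usual textbook names and are assumptions here, since this section does not define them.

```latex
Q[s, a] \leftarrow (1 - \alpha)\, Q[s, a] + \alpha \left( r + \gamma \max_{a'} Q[s', a'] \right)
```

Here s' stands for s_prime, and the maximum is taken over all actions a' available in s'.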

Here's an example of the API in use:
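A minimal sketch, assuming constructor arguments named num_states, num_actions, alpha, gamma, rar, radr, dyna and verbose. These names and defaults are hypothetical, so defer to the argument details given for the assignment. The five-cell corridor and its step function are invented for illustration. The Dyna step here replays remembered experience tuples, which is a simplification and not a learned model of transitions and rewards. Only the standard-library random module is used.

```python
import random


class QLearner(object):
    """Tabular Q-learner sketch; argument names and defaults are assumptions."""

    def __init__(self, num_states=100, num_actions=4, alpha=0.2, gamma=0.9,
                 rar=0.5, radr=0.99, dyna=0, verbose=False):
        self.num_actions = num_actions
        self.alpha = alpha      # learning rate (assumed name)
        self.gamma = gamma      # discount rate (assumed name)
        self.rar = rar          # probability of taking a random action
        self.radr = radr        # decay applied to rar after each query (assumed)
        self.dyna = dyna        # replayed experiences per query (simplified Dyna)
        self.verbose = verbose
        self.q = [[0.0] * num_actions for _ in range(num_states)]
        self.experiences = []   # remembered <s, a, s_prime, r> tuples
        self.s = 0
        self.a = 0

    def _best_action(self, s):
        row = self.q[s]
        return row.index(max(row))

    def _choose_action(self, s):
        # With probability rar explore, otherwise act greedily.
        if random.random() < self.rar:
            return random.randrange(self.num_actions)
        return self._best_action(s)

    def _update(self, s, a, s_prime, r):
        target = r + self.gamma * max(self.q[s_prime])
        self.q[s][a] = (1 - self.alpha) * self.q[s][a] + self.alpha * target

    def querysetstate(self, s):
        """Set the state and return an action; the Q-table and rar are untouched."""
        self.s = s
        self.a = self._choose_action(s)
        return self.a

    def query(self, s_prime, r):
        """Learn from <s, a, s_prime, r>, decay rar, and return the next action."""
        self._update(self.s, self.a, s_prime, r)
        self.experiences.append((self.s, self.a, s_prime, r))
        for _ in range(self.dyna):
            self._update(*random.choice(self.experiences))
        action = self._choose_action(s_prime)
        self.rar *= self.radr
        if self.verbose:
            print("s = %d, a = %d, r = %s" % (s_prime, action, r))
        self.s, self.a = s_prime, action
        return action


def step(s, a):
    """Hypothetical 5-cell corridor: action 0 moves left, action 1 moves right."""
    s_prime = max(0, s - 1) if a == 0 else min(4, s + 1)
    r = 1.0 if s_prime == 4 else -0.1   # reaching cell 4 is the goal
    return s_prime, r


random.seed(0)
learner = QLearner(num_states=5, num_actions=2)
for episode in range(500):
    s = 0
    a = learner.querysetstate(s)        # start of episode: no learning yet
    for _ in range(100):                # cap the episode length
        s, r = step(s, a)
        a = learner.query(s, r)         # learn, then get the next action
        if s == 4:
            break
```

The calling pattern is the part that matters: querysetstate(s) once at the start of each episode, then query(s_prime, r) after every move.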

Prohibited:

  • Any libraries not listed in the "allowed" section above.
  • Any code you did not write yourself.
  • Any classes (other than Random) that create their own instance variables for later use (e.g., learners like kdtree).
  • Print statements outside "verbose" checks (they significantly slow down auto grading).
  • Any method for reading data other than those in util.py.