Difference between revisions of "Summer 2016 Project 5"

From Quantitative Analysis Software Courses

Revision as of 19:34, 19 July 2016

Overview

In this project you will implement the Q-Learning and Dyna-Q solutions to the reinforcement learning problem. You will apply them to two problems: 1) Navigation, and 2) Trading. We start with navigation because, as you will see, it is an easy problem to work with and understand. In the last part of the assignment you will apply Q-Learning to stock trading.

Note that your Q-Learning code really shouldn't care which problem it is solving. The difference is that you need to wrap the learner in different code that frames the problem for the learner as necessary.
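To illustrate that separation, here is a minimal sketch of a tabular Q-learner that knows nothing about the problem, plus a toy wrapper that frames a problem for it. All names here (TabularQLearner, corridor_step, train) and the 1-D corridor world are hypothetical stand-ins chosen for illustration; they are not the QLearner API or the test worlds from the project template.

```python
import random


class TabularQLearner:
    """Hypothetical minimal learner; NOT the course's QLearner API."""

    def __init__(self, num_states, num_actions, alpha=0.2, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.num_actions = num_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # probability of taking a random action
        self.rng = random.Random(seed)
        self.q = [[0.0] * num_actions for _ in range(num_states)]

    def choose(self, s):
        # Epsilon-greedy action selection.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_actions)
        row = self.q[s]
        return row.index(max(row))

    def update(self, s, a, r, s_prime):
        # Q[s,a] <- (1 - alpha) * Q[s,a] + alpha * (r + gamma * max Q[s',:])
        target = r + self.gamma * max(self.q[s_prime])
        self.q[s][a] = (1 - self.alpha) * self.q[s][a] + self.alpha * target


def corridor_step(s, a, length=5):
    """Toy problem wrapper (assumed): action 0 moves left, 1 moves right."""
    s_prime = max(0, min(length - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s_prime == length - 1 else -0.1
    return s_prime, reward


def train(episodes=200, length=5):
    learner = TabularQLearner(num_states=length, num_actions=2)
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap the number of steps per episode
            a = learner.choose(s)
            s_prime, r = corridor_step(s, a, length)
            learner.update(s, a, r, s_prime)
            s = s_prime
            if s == length - 1:  # reached the goal cell
                break
    return learner
```

Only corridor_step and train know what a state or reward means here. Swapping in a different wrapper (for example, one that maps market indicators to states) would leave the learner class unchanged.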

For the navigation problem we have created testqlearner.py, which automates testing of your Q-Learner on the navigation problem. We also provide teststrategylearner.py to test your strategy learner. In order to apply Q-learning to trading you will have to implement an API that calls Q-learning internally.

Overall, your tasks for this project include:

  • Code a Q-Learner
  • Code the Dyna-Q feature of Q-Learning
  • Test/debug the Q-Learner in navigation problems
  • Build a strategy learner based on your Q-Learner
  • Test/debug the strategy learner on specific symbol/time period problems
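As a rough picture of what the Dyna-Q task adds on top of plain Q-learning, the sketch below learns from each real transition, records it in a model, and then replays a number of remembered transitions. It assumes a deterministic world (the model keeps only the last observed outcome per state/action pair). The function names and the list-of-lists table are hypothetical, not taken from the project template.

```python
import random


def q_update(q, s, a, r, s_prime, alpha=0.2, gamma=0.9):
    """One tabular Q-learning update, applied in place to table q."""
    target = r + gamma * max(q[s_prime])
    q[s][a] = (1 - alpha) * q[s][a] + alpha * target


def dyna_q_step(q, model, s, a, r, s_prime, planning_steps, rng):
    """Learn from one real transition, then replay remembered ones."""
    q_update(q, s, a, r, s_prime)      # direct learning from real experience
    model[(s, a)] = (s_prime, r)       # deterministic model: last outcome seen
    seen = list(model)                 # (state, action) pairs tried so far
    for _ in range(planning_steps):    # planning from simulated experience
        ps, pa = rng.choice(seen)
        ps_prime, pr = model[(ps, pa)]
        q_update(q, ps, pa, pr, ps_prime)


# Usage: three states, two actions, two real transitions.
rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(3)]
model = {}
dyna_q_step(q, model, 1, 1, 1.0, 2, planning_steps=0, rng=rng)
dyna_q_step(q, model, 0, 1, 0.0, 1, planning_steps=10, rng=rng)
```

With planning_steps set to 0 this reduces to ordinary Q-learning, which is one way to keep a single code path for both modes.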

Scoring for the project will be allocated as follows:

  • Navigation test cases: 80% (note that we will check those with dyna = 0)
  • Dyna implemented: 5% (we will check this with one navigation test case by comparing performance with and without dyna turned on)
  • Trading strategy test cases: 20%

For this assignment we will test only your code (there is no report component). Note that the scoring is structured so that you can earn a B (80%) if you implement only Q-Learning, but if you implement everything, the total possible score is 105%. That means you can earn up to 5% extra credit on this project, which is equivalent to 1% extra credit on the final course grade.

Template and Data

  • Download mc3_p3.zip and unzip it inside ml4t/.
  • Implement the QLearner class in mc3_p3/QLearner.py.
  • Implement the StrategyLearner class in mc3_p3/StrategyLearner.py.
  • To test your Q-learner, run python testqlearner.py from the mc3_p3/ directory.
  • To test your strategy learner, run python teststrategylearner.py from the mc3_p3/ directory.
  • Note that example problems are provided in the mc3_p3/testworlds directory.

Part 1: Implement QLearner

Part 2: Navigation Problem Test Cases

Part 3: Implement Dyna

Part 4: Implement Strategy Learner

Contents of Report

Hints & Resources

What to turn in

Rubric

Required, Allowed, & Prohibited