Difference between revisions of "MC3-Project-3"
Line 116: | Line 116: | ||
==Hints & resources== | ==Hints & resources== | ||
+ | |||
+ | This paper by Kaelbling, Littman and Moore, is a good resource for RL in general: http://www.jair.org/media/301/live-301-1562-jair.pdf See Section 4.2 for details on Q-Learning. | ||
+ | |||
+ | There is also a chapter in the Mitchell book on Q-Learning. | ||
+ | |||
+ | For implementing Dyna, you may find the following resources useful: | ||
+ | |||
+ | * https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node96.html | ||
+ | * http://www-anw.cs.umass.edu/~barto/courses/cs687/Chapter%209.pdf | ||
==What to turn in== | ==What to turn in== |
Revision as of 01:19, 28 November 2015
Contents
- 1 Draft
- 2 Updates / FAQs
- 3 Overview
- 4 Template and Data
- 5 Part 1: Implement QLearner (90%)
- 6 The Navigation Problem
- 7 Contents of Report
- 8 Hints & resources
- 9 What to turn in
- 10 Extra credit up to 3%
- 11 Rubric
- 12 Required, Allowed & Prohibited
- 13 Updates / FAQs
- 14 Overview
- 15 Template and Data
- 16 Part 1: Implement KNNLearner (30%)
- 17 Part 2: Implement BagLearner (20%)
- 18 Part 3: Experiments and report (50%)
- 19 Hints & resources
- 20 What to turn in
- 21 Extra credit up to 3%
- 22 Rubric
- 23 Required, Allowed & Prohibited
Draft
This is a draft version of the assignment. This "draft" statement will be removed once the assignment definition is final.
Updates / FAQs
Overview
In this project you will implement and assess Q-Learning. Because of the limited time available for this project, we're going to have you first test your Q-Learning implementation to solve a navigation problem. Applying Q-Learning to stock trading is offered as an extra credit assignment. Note that your Q-Learning code really shouldn't care which problem it is solving, but in order to apply it to trading, you will have to re-work the testqlearner.py code.
Template and Data
- Download mc3_p3.zip, unzip inside ml4t/
- Implement the QLearner class in mc3_p3/QLearner.py.
- To test your learner, run python testqlearner.py from the mc3_p3/ directory.
- Note that example problems are provided in the mc3_p3/testworlds directory
Part 1: Implement QLearner (90%)
Your QLearner class should be implemented in the file QLearner.py. It should implement EXACTLY the API defined below. DO NOT import any modules besides those allowed below. Your class should implement the following methods:
- QLearner(...): Constructor, see argument details below.
- query(s_prime, r): Update Q-table with <s, a, s_prime, r> and return new action for state s.
- querysetstate(s): Set state to s, return action for state s, but don't update Q-table.
Here's an example of the API in use:
import QLearner as ql learner = ql.QLearner(num_states = 100, \ num_actions = 4, \ alpha = 0.2, \ gamma = 0.9, \ rar = 0.5, \ radr = 0.99, \ dyna = 0) s = 99 # our initial state a = learner.querysetstate(s) # action for state s s_prime = 5 # the new state we end up in after taking action a in state s r = 0 # reward for taking action a in state s next_action = learner.query(s_prime, r)
The constructor QLearner() should reserve space for keeping track of Q[s, a] for the number of states and actions. It should initialize Q[] with uniform random values between -1.0 and 1.0. Details on the input arguments to the constructor:
- num_states integer, the number of states to consider
- num_actions integer, the number of actions available.
- alpha float, the learning rate used in the update rule. Should range between 0.0 and 1.0 with 0.2 as a typical value.
- gamma float, the discount rate used in the update rule. Should range between 0.0 and 1.0 with 0.9 as a typical value.
- rar float, random action rate: the probability of selecting a random action at each step. Should range between 0.0 (no random actions) to 1.0 (always random action) with 0.5 as a typical value.
- radr float, random action decay rate, after each update, rar = rar * radr. Ranges between 0.0 (immediate decay to 0) and 1.0 (no decay). Typically 0.99.
- dyna integer, conduct this number of dyna updates for each regular update. When Dyna is used, 200 is a typical value.
query(s_prime, r) is the core method of the Q-Learner. It should keep track of the last state s and the last action a, then use the new information s_prime and r to update the Q table. The learning instance, or experience tuple is <s, a, s_prime, r>. query() should return an integer, which is the next action to take. Note that it should choose a random action with probability rar, and that it should update rar according to the decay rate radr at each step.
querysetstate(s) A special version of the query method that finds an action according to the same rules as query(), but it does not execute an update to the Q-table. This method is typically only used once, to set the initial state.
We will test your Q-Learner with a navigation problem as follows. Note that your Q-Learner does not need to be coded specially for this task. In fact the code doesn't need to know anything about it. The code necessary to test your learner with this navigation task is implemented in testqlearner.py for you. The navigation task takes place in a 10 x 10 grid world. The particular environment is expressed in a CSV file of integers, where the value in each position is interpreted as follows:
- 0: blank space.
- 1: an obstacle.
- 2: the starting location for the robot.
- 3: the goal location.
An example navigation problem (CSV file) is shown below:
0,0,0,0,3,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0 0,0,1,1,1,1,1,0,0,0 0,0,1,0,0,0,1,0,0,0 0,0,1,0,0,0,1,0,0,0 0,0,1,0,0,0,1,0,0,0 0,0,0,0,0,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0 0,0,0,0,2,0,0,0,0,0
The robot starts at the bottom center, and must navigate to the top center. Note that a wall of obstacles blocks its path. We map this problem to a reinforcement learning problem as follows:
- State: The state is the location of the robot, it is computed (discretized) as: horizontal location * 10 + vertical location.
- Actions: There are 4 possible actions, 0: move north, 1: move east, 2: move south, 3: move west.
- R: The reward is -1.0 unless the action leads to the goal, in which case the reward is +1.0.
- T: The transition matrix can be inferred from the CSV map and the actions.
Note that R and T are not known by or available to the learner. The testing code testqlearner.py will test your code as follows (pseudo code):
Instantiate the learner with the constructor QLearner() s = initial_location a = querysetstate(s) s_prime = new location according to action a r = -1.0 while not converged: a = query(s_prime, r) s_prime = new location according to action a if s_prime == goal: r = +1 s_prime = start location else r = -1
A few things to note about this code: The learner always receives a reward of -1.0 until it reaches the goal, when it receives a reward of +1.0. As soon as the robot reaches the goal, it is immediately returned to the starting location.
Contents of Report
Hints & resources
This paper by Kaelbling, Littman and Moore, is a good resource for RL in general: http://www.jair.org/media/301/live-301-1562-jair.pdf See Section 4.2 for details on Q-Learning.
There is also a chapter in the Mitchell book on Q-Learning.
For implementing Dyna, you may find the following resources useful:
- https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node96.html
- http://www-anw.cs.umass.edu/~barto/courses/cs687/Chapter%209.pdf
What to turn in
Turn your project in via t-square.
- Your report as report.pdf
- Your code as code.py
Extra credit up to 3%
Rubric
Required, Allowed & Prohibited
Required:
- Your project must be coded in Python 2.7.x.
- Your code must run on one of the university-provided computers (e.g. buffet02.cc.gatech.edu), or on one of the provided virtual images.
- Your code must run in less than 30 seconds on one of the university-provided computers.
Allowed:
- You can develop your code on your personal machine, but it must also run successfully on one of the university provided machines or virtual images.
- Your code may use standard Python libraries.
- You may use the NumPy, SciPy and Pandas libraries. Be sure you are using the correct versions.
- You may reuse sections of code (up to 5 lines) that you collected from other students or the internet.
- Code provided by the instructor, or allowed by the instructor to be shared.
Prohibited:
- Any libraries not listed in the "allowed" section above.
- Any code you did not write yourself (except for the 5 line rule in the "allowed" section).
- Any Classes (other than Random) that create their own instance variables for later use (e.g., learners like kdtree).
- Print statements (they significantly slow down auto grading).
Updates / FAQs
Q: Can I use an ML library or do I have to write the code myself? A: You must write the KNN and bagging code yourself. For the LinRegLearner you are allowed to make use of NumPy or SciPy libraries but you must "wrap" the library code to implement the APIs defined below. Do not uses other libraries or your code will fail the auto grading test cases.
2015-11-07
Draft version posted.
2015-11-10
- Q: Which libraries am I allowed to use? Which library calls are prohibited?
- A: The general idea is that the use of classes that create and maintain their own data structures are prohibited. So for instance, use of scipy.spatial.KDTree is not allowed because it builds a tree and keeps that data structure around for reference later. The intent for this project is that YOU should be building and maintaining the data structures necessary. You can, however, use most methods that return immediate results and do not retain data structures
- Examples of things that are allowed: sqrt(), sort(), argsort() -- note that these methods return an immediate value and do not retain data structures for later use.
- Examples of things that are prohibited: any scikit add on library, scipy.spatial.KDTree, importing things from libraries other than pandas, numpy or scipy.
2015-11-12
Clarification regarding dataset generation: Your strategy for defeating KNNLearner and LinRegLearner should not depend on they way you select training data versus testing data. The relationship of one learner performing better than another should persist regardless of which 60% of the data is selected for training and which 40% is selected for testing.
Overview
You are to implement and evaluate three learning algorithms as Python classes: A KNN learner, a Linear Regression learner (provided) and a Bootstrap Aggregating learner. The classes should be named KNNLearner, LinRegLearner, and BagLearner respectively. We are considering this a regression problem (not classification). So the goal is to return a continuous numerical result (not a discrete result).
In this project we are training & testing with static spatial data. In the next project we will make the transition to time series data.
You must write your own code for KNN and bagging. You are NOT allowed to use other peoples' code to implement KNN or bagging.
The project has two main components: The code for your learners, which will be auto graded, and your report, report.pdf that should include the components listed below.
Template and Data
Instructions:
- Download mc3_p1.zip, unzip inside ml4t/
You will find these files in the mc3_p1 directory
- Data/: Contains data for you to test your learning code on.
- LinRegLearner.py: An implementation of the LinRegLearner class. You can use it as a template for implementing your learner classes.
- __init__.py: Tells Python that you can import classes while in this directory.
- testlearner.py: Helper code to test a learner class.
In the Data/ directory there are three files:
- 3_groups.csv
- ripple_.csv
- simple.csv
We will mainly be working with ripple and 3_groups. Each data file contains 3 columns: X1, X2, and Y. In most cases you should use the first 60% of the data for training, and the remaining 40% for testing.
Part 1: Implement KNNLearner (30%)
Your KNNLearner class should be implemented in the file KNNLearner.py. It should implement EXACTLY the API defined below. DO NOT import any modules besides those from numpy, scipy, or the basic Python libraries. You should implement the following functions/methods:
import KNNLearner as knn learner = knn.KNNLearner(k = 3) # constructor learner.addEvidence(Xtrain, Ytrain) # training step Y = learner.query(Xtest) # query
Where "k" is the number of nearest neighbors to find. Xtrain and Xtest should be ndarrays (numpy objects) where each row represents an X1, X2, X3... XN set of feature values. The columns are the features and the rows are the individual example instances. Y and Ytrain are single dimension ndarrays that indicate the value we are attempting to predict with X.
Use Euclidean distance. Take the mean of the closest k points' Y values to make your prediction. If there are multiple equidistant points on the boundary of being selected or not selected, you may use whatever method you like to choose among them.
Part 2: Implement BagLearner (20%)
Implement Bootstrap Aggregating as a Python class named BagLearner. Your BagLearner class should be implemented in the file BagLearner.py. It should implement EXACTLY the API defined below. DO NOT import any modules besides those from numpy, scipy, or the basic Python libraries. You should implement the following functions/methods:
import BagLearner as bl learner = bl.BagLearner(learner = knn.KNNLearner, kwargs = {"k":3}, bags = 20, boost = False) learner.addEvidence(Xtrain, Ytrain) Y = learner.query(Xtest)
Where learner is the learning class to use with bagging. kwargs are keyword arguments to be passed on to the learner's constructor and they vary according to the learner (see hints below). "bags" is the number of learners you should train using Bootstrap Aggregation. If boost is true, then you should implement boosting.
Notes: See hints section below for example code you might use to instantiate your learners. Boosting is an extra credit topic and not required. There's a citation below in the Resources section that outlines a method of implementing bagging. If the training set contains n data items, each bag should contain n items as well. Note that because you should sample with replacement, some of the data items will be repeated.
Part 3: Experiments and report (50%)
Create a report that addresses the following issues/questions. The report should be submitted as report.pdf in PDF format. Do not submit word docs or latex files. Include data as tables or charts to support each your answers. I expect that this report will be 4 to 10 pages.
- Create your own dataset generating code (call it best4linreg.py) that creates data that performs significantly better with LinRegLearner than KNNLearner. Explain your data generating algorithm, and explain why LinRegLearner performs better. Your data should include at least 2 dimensions in X, and at least 1000 points. (Don't use bagging for this section).
- Create your own dataset generating code (call it best4KNN.py) that creates data that performs significantly better with KNNLearner than LinRegLearner. Explain your data generating algorithm, and explain why KNNLearner performs better. Your data should include at least 2 dimensions in X, and at least 1000 points. (Don't use bagging for this section).
- Consider the dataset ripple with KNN. For which values of K does overfitting occur? (Don't use bagging).
- Now use bagging in conjunction with KNN with the ripple dataset. How does performance vary as you increase the number of bags? Does overfitting occur with respect to the number of bags?
- Can bagging reduce or eliminate overfitting with respect to K for the ripple dataset?
Hints & resources
Some external resources that might be useful for this project:
- You may be interested to take a look at Andew Moore's slides on instance based learning.
- A definition of correlation which we'll use to assess the quality of the learning.
- Bootstrap Aggregating
- AdaBoost
- numpy corrcoef
- numpy argsort
- RMS error
You can use code like the below to instantiate several learners with the parameters listed in kwargs:
learners = [] kwargs = {"k":10} for i in range(0,bags): learners.append(learner(**kwargs))
What to turn in
Be sure to follow these instructions diligently!
Via T-Square, submit as attachment (no zip files; refer to schedule for deadline):
- Your code as KNNLearner.py, BagLearner.py, best4linreg.py, best4KNN.py
- Your report as report.pdf
DO NOT submit extra credit work as part of this submission. Submit it separately to the "Extra credit" assignment on t-square.
Unlimited resubmissions are allowed up to the deadline for the project.
Extra credit up to 3%
Implement boosting as part of BagLearner. How does boosting affect performance for ripple and 3_groups data?
Does overfitting occur for either of these datasets as the number of bags with boosting increases?
Create your own dataset for which overfitting occurs as the number of bags with boosting increases.
Submit your report report.pdf that focuses just on your extra credit work to the "extra credit" assignment on t-square.
Rubric
- KNNLearner, auto grade 10 test cases (including ripple.csv and 3_groups.csv), 3 points each: 30 points
- BagLearner, auto grade 10 test cases (including ripple.csv and 3_groups.csv), 2 points each: 20 points
- best4linreg.py (15 points)
- Code submitted (OK if not Python): -5 if absent
- Description complete -- Sufficient that someone else could implement it: -5 if not
- Description compelling -- The reasoning that linreg should do better is understandable and makes sense. Graph of the data helps but is not required if the description is otherwise compelling: -5 if not
- Train and test data drawn from same distribution: -5 if not
- Performance demonstrates that linreg does better: -10 if not
- best4KNN.py (15 points)
- Code submitted (OK if not Python): -5 if absent
- Description complete -- Sufficient that someone else could implement it: -5 if not
- Description compelling -- The reasoning that linreg should do better is understandable and makes sense. Graph of the data helps but is not required if the description is otherwise compelling: -5 if not
- Train and test data drawn from same distribution: -5 if not
- Performance demonstrates that linreg does better: -10 if not
- Overfitting (10 points)
- Is the region of overfitting correctly identified?: 5 points
- Is conclusion supported with data (table or chart): 5 points
- Bagging (10 points)
- Correct conclusion regarding overfitting as bags increase, supported with tables or charts: 5 points
- Correct conclusion regarding overfitting as K increases, supported with tables or charts: 5 points
Required, Allowed & Prohibited
Required:
- Your project must be coded in Python 2.7.x.
- Your code must run on one of the university-provided computers (e.g. buffet02.cc.gatech.edu), or on one of the provided virtual images.
- Your code must run in less than 5 seconds on one of the university-provided computers.
Allowed:
- You can develop your code on your personal machine, but it must also run successfully on one of the university provided machines or virtual images.
- Your code may use standard Python libraries.
- You may use the NumPy, SciPy and Pandas libraries. Be sure you are using the correct versions.
- You may reuse sections of code (up to 5 lines) that you collected from other students or the internet.
- Code provided by the instructor, or allowed by the instructor to be shared.
- Cheese.
Prohibited:
- Any libraries not listed in the "allowed" section above.
- Any code you did not write yourself (except for the 5 line rule in the "allowed" section).
- Any Classes (other than Random) that create their own instance variables for later use (e.g., learners like kdtree).