Difference between revisions of "Spring 2020 Project 3: Assess Learners"

From Quantitative Analysis Software Courses
  
 
==Tasks==
 
===Hints & Resources===
  
 
===Implement DTLearner (15 points)===
 
Implement a Decision Tree learner class named DTLearner in the file DTLearner.py.  You should follow the algorithm outlined in the presentation here [http://quantsoftware.gatech.edu/images/4/4e/How-to-learn-a-decision-tree.pdf decision tree slides].

* We define "best feature to split on" as the feature (Xi) that has the highest absolute value correlation with Y.

The algorithm outlined in those slides is based on the paper by [https://link.springer.com/content/pdf/10.1007/BF00116251.pdf JR Quinlan] which you may also want to review as a reference.  Note that Quinlan's paper is focused on creating classification trees, while we're creating regression trees here, so you'll need to consider the differences.
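For intuition only, here is a minimal sketch of that split-selection step (not the required implementation; the helper name choose_split is made up for illustration, and the median split value is an assumption based on the linked slides). It assumes numpy is available:

import numpy as np

def choose_split(data_x, data_y):
    # Sketch only: pick the feature whose values have the highest absolute
    # correlation with Y.  A real implementation must also handle constant
    # columns (corrcoef returns NaN) and decide when to stop splitting.
    corrs = [abs(np.corrcoef(data_x[:, i], data_y)[0, 1])
             for i in range(data_x.shape[1])]
    best_feature = int(np.argmax(corrs))
    # The slides split on the median value of the chosen feature.
    split_val = np.median(data_x[:, best_feature])
    return best_feature, split_val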
For this part of the project, your code should build a single tree only (not a forest).  We'll get to forests later in the project. Your code should support exactly the API defined below.  DO NOT import any modules besides those listed in the allowed section below.  You should implement the following functions/methods:
import DTLearner as dt
learner = dt.DTLearner(leaf_size = 1, verbose = False) # constructor
learner.addEvidence(Xtrain, Ytrain) # training step
Y = learner.query(Xtest) # query
Where "leaf_size" is the maximum number of samples to be aggregated at a leaf.  While the tree is being constructed recursively, if there are leaf_size or fewer elements at the time of the recursive call, the data should be aggregated into a leaf.  Xtrain and Xtest should be ndarrays (numpy objects) where each row represents an X1, X2, X3... XN set of feature values.  The columns are the features and the rows are the individual example instances.  Y and Ytrain are single dimension ndarrays that indicate the value we are attempting to predict with X.
If "verbose" is True, your code can print out information for debugging.  If verbose = False your code should not generate ANY output.  When we test your code, verbose will be False.
This code should not generate statistics or charts.
  
 
===Implement RTLearner (15 points)===
 
  
 
===Implement author() Method (Up to 10 point penalty)===
 
===Extra Credit (0 points)===
  
 
===Experiments and Report (50 points)===
 

Revision as of 21:12, 12 January 2020

Revisions

Overview

Template

Tasks

Hints & Resources

Implement DTLearner (15 points)

Implement RTLearner (15 points)

Implement BagLearner (20 points)

Implement InsaneLearner (Up to 10 point penalty)

Implement author() Method (Up to 10 point penalty)

Extra Credit (0 points)

Experiments and Report (50 points)

What to turn in

Rubric

Report

Code

Required, Allowed & Prohibited