MC3-Homework-1

From Quantitative Analysis Software Courses

Draft

The description for this assignment has not been created yet. Once it is finalized, this notice will be removed.

Updates / FAQs

Overview

You will write code to generate your own datasets. This exercise tests your understanding of the strengths and weaknesses of various learners.

Template and Data

Generate your own datasets

Create a Python script called gen_data.py that implements two functions with the following names and API:

X1, Y1 = best4LinReg()
X2, Y2 = best4KNN()

best4LinReg() should return data on which LinRegLearner performs significantly better than KNNLearner. best4KNN() should return data on which KNNLearner performs significantly better than LinRegLearner.
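As a rough illustration of the API above, a minimal sketch of gen_data.py might look like the following. The generating functions (an exact linear combination for best4LinReg, a squared-distance "bowl" for best4KNN), the row count of 100, and the optional seed argument are all illustrative assumptions rather than part of the assignment. Whether the resulting gap counts as "significant" depends on the grading learners and their settings, which are not described here.

```python
import numpy as np


def best4LinReg(seed=0):
    # seed is a hypothetical optional argument; the stated API takes no arguments,
    # so calling best4LinReg() with the default still matches it.
    rng = np.random.RandomState(seed)
    X = rng.uniform(-1.0, 1.0, size=(100, 2))  # 100 rows, 2 columns
    # Y is an exact linear function of X, which a linear model can fit closely.
    Y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0
    return X, Y


def best4KNN(seed=0):
    # Same hypothetical seed argument as above.
    rng = np.random.RandomState(seed)
    X = rng.uniform(-1.0, 1.0, size=(100, 2))
    # Squared distance from the origin: roughly symmetric around zero,
    # so a straight-line fit captures little of the variation in Y.
    Y = X[:, 0] ** 2 + X[:, 1] ** 2
    return X, Y
```

Both functions return their values silently, with no printing or plotting and no data-reading code.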

Each data set should include at least 2 columns in X and one column in Y, and should contain between 10 and 1000 rows (inclusive).
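The size constraints above can be checked with a small helper before submitting. This is only a sketch: check_dataset is a hypothetical name, and the requirement that X and Y have the same number of rows is an assumption that the text implies but does not state. The helper is for local use and should not be part of the submitted file.

```python
def check_dataset(X, Y, min_rows=10, max_rows=1000, min_cols=2):
    """Return True if (X, Y) satisfies the stated size constraints."""
    n_rows = len(X)
    # Row count must fall within the inclusive range.
    if not (min_rows <= n_rows <= max_rows):
        return False
    # Assumption: Y has one value per row of X.
    if len(Y) != n_rows:
        return False
    # Every row of X needs at least min_cols columns.
    return all(len(row) >= min_cols for row in X)
```

The helper accepts nested lists as well as two-dimensional NumPy arrays, since it relies only on len().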

What to turn in

Be sure to follow these instructions diligently!

Submit the following via T-Square as an attachment (no zip files; refer to the schedule for the deadline):

  • Your code as gen_data.py

Unlimited resubmissions are allowed up to the deadline for the project.

Rubric

Required, Allowed & Prohibited

Required:

  • Your project must be coded in Python 2.7.x.
  • Your code must run on one of the university-provided computers (e.g. buffet02.cc.gatech.edu), or on one of the provided virtual images.
  • Your code must run in less than 5 seconds on one of the university-provided computers.
  • The code you submit should NOT include any data reading routines. The provided testlearner.py code reads data for you.
  • The code you submit should NOT generate any output: No prints, no charts, etc.
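One way to sanity-check the 5-second limit locally is to time a call yourself. The sketch below assumes a hypothetical helper named runs_within. Timings on a personal machine may differ from those on the university-provided computers, so treat the result as a rough guide only.

```python
import time


def runs_within(func, limit_seconds=5.0):
    """Call func() once; return (finished_in_time, elapsed_seconds)."""
    start = time.time()
    func()
    elapsed = time.time() - start
    return elapsed < limit_seconds, elapsed
```

For example, runs_within(best4KNN) would time one call to a function from gen_data.py. The helper returns its result instead of printing it, and it belongs in a separate local script, not in the submitted file.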

Allowed:

  • You can develop your code on your personal machine, but it must also run successfully on one of the university-provided machines or virtual images.
  • Your code may use standard Python libraries.
  • You may use the NumPy, SciPy, matplotlib and Pandas libraries. Be sure you are using the correct versions.
  • You may reuse sections of code (up to 5 lines) that you collected from other students or the internet.
  • Code provided by the instructor, or allowed by the instructor to be shared.
  • Cheese.

Prohibited:

  • Any other method of reading data besides testlearner.py
  • Any libraries not listed in the "allowed" section above.
  • Any code you did not write yourself (except for the 5 line rule in the "allowed" section).
  • Any classes (other than Random) that create their own instance variables for later use (e.g., learners like kdtree).
  • Code that includes any data reading routines. The provided testlearner.py code reads data for you.
  • Code that generates any output when verbose = False: No prints, no charts, etc.

Legacy

MC3-Homework-1-legacy