Difference between revisions of "MC2-Homework-1"

From Quantitative Analysis Software Courses

Latest revision as of 22:07, 29 September 2016

Overview: Machine Learning Question

The purpose of this assignment is to help you study for the midterm by involving you in the creation of the midterm. The TAs and the instructors will select the best questions from this pool to be added to the actual exam. Overall, the exam is expected to consist of 10 Python questions, 25 ML questions, and 25 Finance questions.

Task

You are to create a multiple choice question regarding the ML content of the course up to and including MC3-Project-1 for the midterm. You should provide:

  • The question itself.
  • 4 possible answers labeled a) through d)
  • A short, complete explanation of the correct answer.

Your 4 answers should include one unambiguously correct response and at least one other attractive answer that might be selected if the test taker is not well informed. The intent is that these questions should be easy if the student has been following along in the class and hard if they have not. I do NOT want these to be trick questions, or questions that require encyclopedic knowledge.

Submit your response as plain text via SurveyMonkey (https://www.surveymonkey.com/r/8XBB9S8). We do not want PDFs, image files, or Word documents.

Topic for your question

The topic for your question depends on the first letter of your last name:

  • S (except Sa): Types of learning problems: Regression versus Classification.
  • L: Supervised versus Unsupervised.
  • C, X, M, V: Compare properties of kNN, decision trees, and linear regression (training cost, query cost, prediction accuracy).
  • W, F, K, N: Compare different methods of building a decision tree.
  • D, E, I: Parameterized models versus instance-based models.
  • G, P, H, J, O: Overfitting.
  • B, T, Sa: Measuring the quality of predictions: RMSE, correlation, and other measures.
  • A, Y, Q: Bagging.
  • Z, R: Boosting.
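For the prediction-quality topic, the two metrics named above can be sketched in NumPy as follows. This is a minimal illustration, not course-provided code; the function names `rmse` and `correlation` are hypothetical. The example data are chosen to show that a perfect correlation does not imply a small RMSE.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def correlation(y_true, y_pred):
    """Pearson correlation: off-diagonal entry of the 2x2 matrix from np.corrcoef."""
    return np.corrcoef(y_true, y_pred)[0, 1]

y_true = [1.0, 2.0, 3.0, 4.0]
y_scaled = [2.0, 4.0, 6.0, 8.0]  # perfectly linear in y_true, but far off in value

print(rmse(y_true, y_scaled))         # sqrt(mean([1, 4, 9, 16])) = sqrt(7.5)
print(correlation(y_true, y_scaled))  # 1.0, up to floating-point rounding
```

Correlation measures only how well the predictions track the actual values linearly, while RMSE measures how far off they are in absolute terms, so the two can disagree.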

Disclaimer

If your question is selected for use in the exam, we may not use it verbatim. It might be modified slightly for clarity, the parameters might be changed slightly, or it may be modified to make it more suitable for the exam format.

What to turn in

  • Submit your response as plain text via SurveyMonkey (https://www.surveymonkey.com/r/8XBB9S8).

Sharing and discussing questions

Unlike other assignments in this class, it is OK to post and discuss your prospective "answer" to this assignment (your question) on Piazza. However, keep in mind that if you copy someone else's question from Piazza, it will be considered plagiarism.

Rubric

The question will be scored from 0 to 100%. 20% will be deducted for each criterion not met.

For the question:

  • Is the question unambiguous? There should be only one possible interpretation of the meaning of the question.
  • Are there multiple plausible answers? A test taker who makes a wrong assumption or a math mistake might choose one of the wrong alternatives.
  • There should be only one correct answer.
  • The question should not be too hard; e.g., it should not require memorization of Pandas API calls or complex calculations.
  • The question should not be too easy; i.e., it should not be trivial.

For the answer part:

  • Python questions must be validated with transcripts of actual python code and output. The example code should be completely self contained, including import statements, etc.
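As an illustration of "completely self contained", here is a sketch of a transcript for a hypothetical NumPy broadcasting question (not taken from the course). It includes its own imports, uses fixed input values rather than random numbers so that graders can reproduce the output exactly (an assumption about what makes a transcript easy to check), and uses Python 3 `print()`.

```python
import numpy as np

# Fixed input instead of np.random so the output is reproducible
j = np.array([[2.0, 4.0],
              [1.0, 8.0]])

# j[:, 0] has shape (2,), so it broadcasts along the last axis:
# element [i, k] is divided by j[k, 0], not by j[i, 0]
result = j / j[:, 0]
print(result)

# Keeping the column 2-D (shape (2, 1)) divides each row by its own first element
row_scaled = j / j[:, [0]]
print(row_scaled)
```

A question built on this contrast has an attractive wrong answer (expecting row-wise division from `j / j[:, 0]`) that someone who has written similar code would likely avoid.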

Other penalties:

  • Wrong topic -50%
  • Question is fundamentally wrong -50%
  • No answer is provided -50%
  • No python transcript (if the question is python related) -50%

Note that even if the question is "good enough" for use in the exam it may not actually be used.

Example

Which is a better measure of portfolio performance, and why: Sharpe Ratio or cumulative return?

a) Sharpe Ratio is better because it considers P/E ratio and book value.
b) Cumulative return is better because it includes consideration of risk.
c) Cumulative return is better because risk does not matter.
d) Sharpe Ratio is better because it considers risk and return.

Correct answer: d), because the Sharpe Ratio accounts for both return and risk: Sharpe Ratio = sqrt(sampling_frequency) * mean(daily_returns - rfr) / stdev(daily_returns)
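The formula in the example answer can be checked with a short NumPy sketch. This is a minimal illustration under stated assumptions: the risk-free rate `rfr` is a constant per-period rate, `sampling_frequency` defaults to 252 (trading days per year, for daily data), and `stdev` is taken as the sample standard deviation (`ddof=1`). The return values are hypothetical.

```python
import numpy as np

def sharpe_ratio(daily_returns, rfr=0.0, sampling_frequency=252):
    """Annualized Sharpe ratio following the formula in the example answer.

    Assumptions: rfr is a constant per-period risk-free rate, and
    sampling_frequency=252 corresponds to daily data.
    """
    r = np.asarray(daily_returns, dtype=float)
    excess = r - rfr
    # Sample standard deviation (ddof=1); the formula only says "stdev"
    return np.sqrt(sampling_frequency) * excess.mean() / r.std(ddof=1)

rets = [0.01, -0.01, 0.02, 0.0]  # hypothetical daily returns
print(sharpe_ratio(rets))        # about 6.15
```

Because the denominator is the volatility of the returns, two portfolios with the same cumulative return can have very different Sharpe Ratios, which is the distinction option d) relies on.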

Legacy

MC2-Homework-1-Legacy