Difference between revisions of "MC2-Homework-1"

From Quantitative Analysis Software Courses
 
(7 intermediate revisions by the same user not shown)
Line 17: Line 17:

  ==Topic for your question==

- If your last name begins with these letters: Your topic should be drawn from this book chapter
+ The topic for your question depends on the first letter of your last name:

- * KEG: Chapter 4: Market-Making Mechanics or Chapter 2: So You Want to Be a Hedge Fund Manager (your choice)
- * HALF: Chapter 5: Introduction to Company Valuation
- * WORD: Chapter 7: Framework for Investing: The Capital Asset Pricing Model (CAPM)
- * ZINC: Chapter 8: The Efficient Market Hypothesis (EMH)-Its Three Versions
- * PUB: Chapter 9: The Fundamental Law of Active Portfolio Management
- * STYX: Chapter 10: Modern Portfolio Theory: The Efficient Frontier and Portfolio Optimization
- * QVMJ: Chapter 12: Overcoming Data Quirks to Design Trading Strategies (your choice)
+ * S (except Sa): Types of learning problems: Regression versus Classification.
+ * L: Supervised versus Unsupervised.
+ * C,X,M,V: Compare properties of kNN versus, decision trees, and linear regression (training cost, query cost, prediction accuracy).
+ * W,F,K,N: Compare different methods of building a decision tree.
+ * D,E,I: Parameterized models versus instance-based models.
+ * G, P, H,J,O: Overfitting.
+ * B,T,Sa: Measuring the quality of predictions: RMSE, correlation, other?
+ * A,Y,Q: Bagging.
+ * Z,R : Boosting.

  ==Disclaimer==
Line 33: Line 35:

  ==What to turn in==

- * Submit your question as a single file <tt>question.txt</tt> via t-square. It is essential that you use that name exactly.
- * Do not submit other files.
- * Don not submit word documents, image files, zip files or PDFs.
- * Make sure your file is named correctly.
- * Under no circumstance should you submit a word document.
+ * Submit your response as text only via survey monkey [https://www.surveymonkey.com/r/8XBB9S8].

  ==Sharing and discussing questions==
Line 45: Line 43:

  ==Rubric==

- The question will be scored from 0 to 100%. 10% will be deducted for each criteria not met.
+ The question will be scored from 0 to 100%. 20% will be deducted for each criteria not met.

  For the question:

Latest revision as of 21:07, 29 September 2016

Overview: Machine Learning Question

The purpose of this assignment is to help you study for the midterm by involving you in the creation of the midterm. The TAs and the instructors will select the best questions from this pool to be added to the actual exam. Overall, the exam is expected to consist of 10 Python questions, 25 ML questions, and 25 Finance questions.

Task

You are to create a multiple-choice question for the midterm covering the ML content of the course, up to and including MC3-Project-1. You should provide:

  • The question itself.
  • 4 possible answers labeled a) through d).
  • A short, complete explanation for the correct answer.

Your 4 answers should include one unambiguously correct response and at least one other attractive answer that might be selected if the test taker is not well informed. The intent is that these questions should be easy if the student has been following along in the class and hard if they have not. I do NOT want these to be trick questions, or questions that require encyclopedic knowledge.

Submit your response as text only via SurveyMonkey [1]. We do not want PDFs, image files, or Word documents.

Topic for your question

The topic for your question depends on the first letter of your last name:

  • S (except Sa): Types of learning problems: Regression versus Classification.
  • L: Supervised versus Unsupervised.
  • C, X, M, V: Compare properties of kNN, decision trees, and linear regression (training cost, query cost, prediction accuracy).
  • W, F, K, N: Compare different methods of building a decision tree.
  • D, E, I: Parameterized models versus instance-based models.
  • G, P, H, J, O: Overfitting.
  • B, T, Sa: Measuring the quality of predictions: RMSE, correlation, other?
  • A, Y, Q: Bagging.
  • Z, R: Boosting.
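
As an illustration of the kind of material the "quality of predictions" topic covers, here is a minimal sketch of RMSE and Pearson correlation using only the Python standard library. The function names and the sample values are hypothetical and are not taken from the course materials. The example shows that the two measures can disagree: predictions offset by a constant correlate perfectly with the true values yet still have a large RMSE.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical data: every prediction is too high by exactly 10.
y_true = [1.0, 2.0, 3.0, 4.0]
y_shifted = [11.0, 12.0, 13.0, 14.0]

print("RMSE:", rmse(y_true, y_shifted))
print("Correlation:", pearson_correlation(y_true, y_shifted))
```

A question built on this contrast could ask which measure penalizes a constant bias in the predictions.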

Disclaimer

If your question is selected for use in the exam, we may not use it verbatim. It might be modified slightly for clarity, the parameters might be changed slightly, or it may be modified to make it more suitable for the exam format.

What to turn in

  • Submit your response as text only via SurveyMonkey [2].

Sharing and discussing questions

Unlike other assignments in this class, it is OK to post and discuss your prospective "answer" to this assignment on Piazza. However, keep in mind that copying someone else's question from Piazza will be considered plagiarism.

Rubric

The question will be scored from 0 to 100%. 20% will be deducted for each criterion not met.

For the question:

  • Is the question unambiguous? There should be only one possible interpretation of the meaning of the question.
  • Are there multiple plausible answers? If one made a wrong assumption or math mistake they might choose the alternative, wrong answer.
  • There should be only one correct answer.
  • The question should not be too hard; for example, it should not require memorization of Pandas API calls or complex calculations.
  • The question should not be too easy; that is, it should not be trivial.

For the answer part:

  • Python questions must be validated with transcripts of actual Python code and output. The example code should be completely self-contained, including import statements, etc.

Other penalties:

  • Wrong topic: -50%
  • Question is fundamentally wrong: -50%
  • No answer is provided: -50%
  • No Python transcript (if the question is Python related): -50%

Note that even if the question is "good enough" for use in the exam it may not actually be used.

Example

Which is a better measure of portfolio performance, and why: Sharpe Ratio or cumulative return?

a) Sharpe Ratio is better because it considers P/E ratio and book value.
b) Cumulative return is better because it includes consideration of risk.
c) Cumulative return is better because risk does not matter.
d) Sharpe Ratio is better because it considers risk and return.

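The correct answer is d) because the Sharpe Ratio divides the mean excess return by the volatility of returns, so it accounts for both return and risk: Sharpe Ratio = sqrt(sampling_frequency) * mean(daily_returns - rfr) / stdev(daily_returns)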
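
The formula in the example answer can be checked with a short, self-contained script. This is only a sketch under stated assumptions: 252 samples per year, a risk-free rate of 0, the sample standard deviation from the standard library, and two made-up return series. The helper names `sharpe_ratio` and `cumulative_return` are hypothetical.

```python
import math
import statistics


def sharpe_ratio(daily_returns, rfr=0.0, sampling_frequency=252):
    """Sharpe ratio following the formula in the example answer.

    Assumptions: rfr is a constant per-period risk-free rate, and
    stdev is the sample standard deviation.
    """
    excess = [r - rfr for r in daily_returns]
    return (math.sqrt(sampling_frequency)
            * statistics.mean(excess)
            / statistics.stdev(daily_returns))


def cumulative_return(daily_returns):
    """Total return from compounding each period's return."""
    return math.prod(1.0 + r for r in daily_returns) - 1.0


# Hypothetical daily returns for two portfolios.
steady = [0.001, 0.002, 0.001, 0.002]
volatile = [0.03, -0.02, 0.04, -0.03]

for name, returns in [("steady", steady), ("volatile", volatile)]:
    print(name,
          "cumulative return:", cumulative_return(returns),
          "Sharpe ratio:", sharpe_ratio(returns))
```

With these made-up numbers the volatile portfolio has the higher cumulative return but the lower Sharpe ratio, which is the distinction answer d) relies on.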

Legacy

MC2-Homework-1-Legacy