MC2-Homework-1
Overview: Machine Learning Question
The purpose of this assignment is to help you study for the midterm by involving you in the creation of the midterm. The TAs and the instructors will select the best questions from this pool to be added to the actual exam. Overall, the exam is expected to consist of 10 Python questions, 25 ML questions, and 25 Finance questions.
Task
You are to create a multiple-choice question for the midterm covering the ML content of the course up to and including MC3-Project-1. You should provide:
- The question itself.
- 4 possible answers labeled a) through d).
- A short, complete explanation for the correct answer.
Your 4 answers should include one unambiguously correct response and at least one other attractive answer that might be selected if the test taker is not well informed. The intent is that these questions should be easy if the student has been following along in the class and hard if they have not. I do NOT want these to be trick questions, or questions that require encyclopedic knowledge.
Submit your response as text only via SurveyMonkey [1]. We do not want PDFs, image files, or Word documents.
Topic for your question
The topic for your question depends on the first letter of your last name:
- S (except Sa): Types of learning problems: Regression versus Classification.
- L: Supervised versus Unsupervised.
- C,X,M,V: Compare properties of kNN, decision trees, and linear regression (training cost, query cost, prediction accuracy).
- W,F,K,N: Compare different methods of building a decision tree.
- D,E,I: Parameterized models versus instance-based models.
- G,P,H,J,O: Overfitting.
- B,T,Sa: Measuring the quality of predictions: RMSE, correlation, other?
- A,Y,Q: Bagging.
- Z,R: Boosting.
Disclaimer
If your question is selected for use in the exam, we may not use it verbatim. It might be modified slightly for clarity, the parameters might be changed slightly, or it may be modified to make it more suitable for the exam format.
What to turn in
- Submit your response as text only via SurveyMonkey [2].
Sharing and discussing questions
Unlike other assignments in this class, it is OK to post and discuss your prospective "answer" to this assignment on Piazza. However, keep in mind that if you copy someone else's question from Piazza, it will of course be considered plagiarism.
Rubric
The question will be scored from 0 to 100%. 20% will be deducted for each criterion not met.
For the question:
- Is the question unambiguous? There should be only one possible interpretation of the meaning of the question.
- Are there multiple plausible answers? A test taker who makes a wrong assumption or a math mistake should be drawn to an alternative, wrong answer.
- There should be only one correct answer.
- The question should not be too hard; i.e., it should not require memorization of Pandas API calls or complex calculations.
- The question should not be too easy. i.e., it should not be trivial.
For the answer part:
- Python questions must be validated with transcripts of actual Python code and output. The example code should be completely self-contained, including import statements, etc.
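As a rough illustration of what a self-contained validation could look like, here is a minimal sketch for a question about RMSE. The function name `rmse` and the sample data are made up for this illustration and are not part of the assignment; a real submission would include the actual output alongside the code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data: three perfect predictions and one that is off by 2.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(rmse(predicted, actual))  # prints 1.0
```

Everything the code needs (the import, the function, and the data) appears in the snippet, so a grader can paste it into a fresh interpreter and reproduce the result.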
Other penalties:
- Wrong topic -50%
- Question is fundamentally wrong -50%
- No answer is provided -50%
- No Python transcript (if the question is Python related) -50%
Note that even if the question is "good enough" for use in the exam it may not actually be used.
Example
Which is a better measure of portfolio performance, and why: Sharpe Ratio or cumulative return?
a) Sharpe Ratio is better because it considers P/E ratio and book value.
b) Cumulative return is better because it includes consideration of risk.
c) Cumulative return is better because risk does not matter.
d) Sharpe Ratio is better because it considers risk and return.
Correct answer is d) because Sharpe Ratio = sqrt(sampling_frequency) * mean(daily_returns - rfr) / stdev(daily_returns)
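The formula in the explanation can be checked with a short script. The following is only a sketch under stated assumptions: the function names, the default of 252 samples per year, the use of the sample standard deviation, and both return series are invented for illustration.

```python
import math
import statistics


def sharpe_ratio(daily_returns, rfr=0.0, sampling_frequency=252):
    """sqrt(sampling_frequency) * mean(daily_returns - rfr) / stdev(daily_returns).

    Assumption: stdev is the sample standard deviation (statistics.stdev).
    """
    excess = [r - rfr for r in daily_returns]
    return (math.sqrt(sampling_frequency)
            * statistics.mean(excess)
            / statistics.stdev(daily_returns))


def cumulative_return(daily_returns):
    """Total return from compounding each daily return in sequence."""
    return math.prod(1.0 + r for r in daily_returns) - 1.0


# Hypothetical daily returns for two portfolios.
steady = [0.001, 0.002, 0.001, 0.002]
volatile = [0.05, -0.04, 0.06, -0.05]

for name, returns in [("steady", steady), ("volatile", volatile)]:
    print(name, cumulative_return(returns), sharpe_ratio(returns))
```

With this made-up data, the volatile portfolio has the higher cumulative return but the lower Sharpe Ratio, which is the distinction answer d) relies on: the ratio penalizes return that comes with more risk.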