Difference between revisions of "Machine Learning for Trading Course"

From Quantitative Analysis Software Courses
 
(77 intermediate revisions by 6 users not shown)
Line 8: Line 8:
 
* Mini-course 2: [[Computational Investing]]
 
* Mini-course 2: [[Computational Investing]]
 
* Mini-course 3: [[Machine Learning Algorithms for Trading]]
 
* Mini-course 3: [[Machine Learning Algorithms for Trading]]
 +
 +
A set of course notes and example code can be found here: [[http://quantsoftware.gatech.edu/images/ML4TNotes2016.zip]]
 +
 +
==Video Content==
 +
 +
The video content for this course is available for free at  [[https://www.udacity.com/course/machine-learning-for-trading--ud501 Udacity]].
  
 
==Important note==
 
==Important note==
Line 18: Line 24:
 
Professor, Interactive Computing at Georgia Tech<BR>
 
Professor, Interactive Computing at Georgia Tech<BR>
 
CS 7646 Course Designer<BR>
 
CS 7646 Course Designer<BR>
CS 7646 Instructor: Spring 2016, Fall 2016<BR>
+
CS 7646 Instructor: Spring 2016, Fall 2016, Spring 2017, Summer 2017 (online), Fall 2017, Spring 2018, Summer 2018, Fall 2018<BR>
 +
CIOS reviews [[Media:2017SummerCIOS.pdf]]<BR>
 +
 
 +
Maria Hybinette<BR>
 +
Associate Professor, Computer Science, University of Georgia<BR>
 +
CS 4646 Instructor: Summer 2018<BR>
  
 
David Byrd<BR>
 
David Byrd<BR>
 
Research Scientist, Interactive Media Technology Center at Georgia Tech<BR>
 
Research Scientist, Interactive Media Technology Center at Georgia Tech<BR>
CS 7646 Instructor: Summer 2016<BR>
+
CS 7646 On Campus Instructor: Summer 2016, Summer 2017, Spring 2018, Fall 2019<BR>
CS 7646 Head TA: Spring 2016<BR>
+
CS 7646 Head TA: Spring 2016, Fall 2016, Fall 2017<BR>
  
==Syllabus==
+
David Joyner<BR>
 +
CS 7646 Online Instructor: Spring 2019, Summer 2019, Fall 2019, Spring 2020<BR>
  
* [[CS7646_Spring_2016]]
+
Joshua Fox<BR>
* [[CS7646_Summer_2016]]
+
CS 7646 Head TA: Fall 2019, Spring 2020<BR>
  
==2016 Spring Schedule==
+
==Syllabi and schedule for specific semesters==
  
* [[https://docs.google.com/spreadsheets/d/16FPrMo7iUXJHKUjbSt3gMgzxYDC7vnr_CdVMOqh53TY/pubhtml?gid=0&single=true 2016 Spring Schedule]]
+
* [[CS7646_Fall_2019]] (also CS 4646)
 +
* [[CS7646_Summer_2019]]
 +
* [[CS7646_Spring_2019]]
 +
* [[CS7646_Fall_2018]] also includes details for CS 4646
 +
* [[http://cobweb.cs.uga.edu/~maria/classes/0-4646-Summer-2018/schedule.html Undergrad summer 2018]]
 +
* [[CS7646_Summer_2018]]
 +
* [[CS7646_Spring_2018]]
 +
* [[CS7646_Fall_2017]]
 +
* [[CS7646_Summer_2017]]
 +
* [[CS7646_Spring_2017]]
 +
* [[CS7646_Fall_2016]]
 +
* [[CS7646_Summer_2016]]
 +
* [[CS7646_Spring_2016]]
  
==Assignments==
+
==Textbooks, Software & Other Resources==
 
 
* [[http://quantsoftware.gatech.edu/MC1-Homework-1 MC1-Homework-1: Implement standard deviation]]
 
* [[http://quantsoftware.gatech.edu/MC1-Homework-2 MC1-Homework-2: Install VM]]
 
* [[http://quantsoftware.gatech.edu/MC1-Project-1 MC1-Project-1: Assess portfolio]]
 
* [[http://quantsoftware.gatech.edu/MC1-Project-2 MC1-Project-2: Optimize a portfolio]]
 
* [[http://quantsoftware.gatech.edu/MC1-Homework-3 MC1-Homework-3: Create a Python midterm question]]
 
* [[http://quantsoftware.gatech.edu/MC2-Project-1 MC2-Project-1: Build a market simulator]]
 
* [[http://quantsoftware.gatech.edu/MC2-Project-2 MC2-Project-2: Implement bollinger bands, and create a simple trading strategy]]
 
* [[http://quantsoftware.gatech.edu/MC2-Homework-1 MC3-Homework-1: Create a Finance midterm question]]
 
 
 
* [[Midterm Study Guide]]
 
 
 
* [[http://quantsoftware.gatech.edu/MC3-Project-1 MC3-Project-1]]
 
* [[http://quantsoftware.gatech.edu/MC3-Project-2 MC3-Project-2]]
 
* [[http://quantsoftware.gatech.edu/MC3-Project-3 MC3-Project-3]]
 
 
 
==Textbooks & Other Resources==
 
  
 
We will use the following textbooks:
 
We will use the following textbooks:
Line 61: Line 68:
 
** Buy a paperback version for $61.78. IMPORTANT WARNINGS: 1) They only ship to the US 2) It takes them 3 weeks to print the book.  If you order from outside the US they will quietly accept your money but never ship the book: [http://shop.mheducation.com/mhshop/productDetails?isbn=1259712346 less expensive version at mcgraw hill]
 
** Buy a paperback version for $61.78. IMPORTANT WARNINGS: 1) They only ship to the US 2) It takes them 3 weeks to print the book.  If you order from outside the US they will quietly accept your money but never ship the book: [http://shop.mheducation.com/mhshop/productDetails?isbn=1259712346 less expensive version at mcgraw hill]
 
** Buy a paperback international version for $19.10. I am not certain about the reliability of this company: [http://www.abebooks.com/servlet/BookDetailsPL?bi=18281220882&searchurl=x%3D50%26y%3D7%26sts%3Dt%26tn%3DMachine%2520Learning%26an%3DMitchell%252C%2520Tom%26vci%3D59667732 international]
 
** Buy a paperback international version for $19.10. I am not certain about the reliability of this company: [http://www.abebooks.com/servlet/BookDetailsPL?bi=18281220882&searchurl=x%3D50%26y%3D7%26sts%3Dt%26tn%3DMachine%2520Learning%26an%3DMitchell%252C%2520Tom%26vci%3D59667732 international]
 +
 +
Software:
 +
 +
* Follow these instructions to set up the software: [[ML4T_Software_Setup]]
  
 
Other resources:
 
Other resources:
  
* Pandas documentation: [[http://pandas.pydata.org/pandas-docs/version/0.16.2/index.html pandas.pydata.org]]
+
* Course notes developed by Octavian Blaga [[https://docs.google.com/document/d/1BpDrMJDqx3sGt5-hoSTF3hJZVOf04hO_8ERdLHWP5A0/edit?usp=sharing docs.google.com]]
 +
* Pandas documentation: [[https://pandas.pydata.org/pandas-docs/version/0.24/index.html pandas.pydata.org]]
 +
* David Byrd's slides on how to vectorize technical analysis methods: [[media:CDB_vectorize_me.pptx]]
  
 
==Prerequisites/Co-requisites==
 
==Prerequisites/Co-requisites==
Line 75: Line 88:
 
* Do you understand the difference between geometric mean and arithmetic mean?
 
* Do you understand the difference between geometric mean and arithmetic mean?
 
* Do you have strong programming skills? Take this quiz [[compinvesti-prog-quiz]] if you would like help determining the strength of your programming skills.
 
* Do you have strong programming skills? Take this quiz [[compinvesti-prog-quiz]] if you would like help determining the strength of your programming skills.
 +
* Are you competent with the Unix command line?
  
 
Who this course is for: The course is intended for people with strong software programming experience and introductory level knowledge of investment practice. A primary prerequisite is an interest and excitement about the stock market.  
 
Who this course is for: The course is intended for people with strong software programming experience and introductory level knowledge of investment practice. A primary prerequisite is an interest and excitement about the stock market.  
Line 80: Line 94:
 
Software we'll use: In order to complete the programming assignments you will need a development environment that you're comfortable with.  We use Unix, but you can also work with Windows and Mac OS environments.  You must download and install a set of Python modules to your computer (including NumPy, SciPy, and Pandas).
 
Software we'll use: In order to complete the programming assignments you will need a development environment that you're comfortable with.  We use Unix, but you can also work with Windows and Mac OS environments.  You must download and install a set of Python modules to your computer (including NumPy, SciPy, and Pandas).
  
How to install the software: [[ML4T Software Installation]]
+
How to install the software: [[ML4T Software Setup]]
  
 
==Logistics==
 
==Logistics==
  
* We will use Udacity for lecture videos.
+
* OMSCS: We will use Udacity for lecture videos.
 
** Login here using your GT account: [https://login.gatech.edu/cas/login?service=http%3A%2F%2Fweb.iam.gatech.edu%2Fudacity-login%2F GT-Udacity Login] ([https://www.youtube.com/watch?v=pyqirZW_sT8 instruction video])<br />'''Note''': DO NOT log in using your personal Udacity account, in case you have one.
 
** Login here using your GT account: [https://login.gatech.edu/cas/login?service=http%3A%2F%2Fweb.iam.gatech.edu%2Fudacity-login%2F GT-Udacity Login] ([https://www.youtube.com/watch?v=pyqirZW_sT8 instruction video])<br />'''Note''': DO NOT log in using your personal Udacity account, in case you have one.
 
** Go to the course on Udacity (or navigate through My Courses): https://www.udacity.com/course/viewer#!/c-ud501
 
** Go to the course on Udacity (or navigate through My Courses): https://www.udacity.com/course/viewer#!/c-ud501
* We will use T-Square for submission of code and reports: [https://t-square.gatech.edu/portal T-Square] (pick appropriate course site)
+
* If you have trouble accessing Udacity content, please share your problem via email with gtech-support@udacity.com
* We will use Piazza for interaction and discussion: [https://piazza.com/ Fall 2016 Piazza forum]
+
* We will use Canvas for ALL submissions: [https://gatech.instructure.com/ Canvas] (pick appropriate course site)
 +
* We will use Piazza for interaction and discussion. Consult the page for the current semester for a link.
  
 
==Grading==
 
==Grading==
 +
* A: 90.0% and above
 +
* B: 80.0% and above
 +
* C: 70.0% and above
 +
* D: 60.0% and above
 +
* F: below 60.0%
  
* Mini-course 1: Two homework assignments and two programming projects.
+
Students taking the course Pass/Fail must earn at least a 75% to pass.
* Mini-course 2: Two programming projects, and a midterm.
 
* Mini-course 3: Three programming projects (no final).
 
  
Weightings:
+
We do not encourage "audit" students.  If you are in the course on audit status, you must earn at least a "B" on the midterm.
  
*MC1-Homework-1: 2.5%
+
See semester syllabus for assignment weights.
*MC1-Homework-2: 2.5%
 
*MC1-Homework-3: 2.5%
 
*MC1-Project-1: 5%
 
*MC1-Project-2: 5%
 
*MC2-Project-1: 15%
 
*MC2-Project-2: 10%
 
*MC3-Homework-1: 2.5%
 
*Midterm: 20%
 
*MC3-Project-1: 10%
 
*MC3-Project-2: 10%
 
*MC3-Project-3: 15%
 
 
 
Thresholds:
 
 
 
* A: 90% and above
 
* B: 80% and above
 
* C: 70% and above
 
* D: 60% and above
 
* F: below 60%
 
  
 
==Minimum technical requirements==
 
==Minimum technical requirements==
Line 125: Line 124:
 
* Hardware: A computer with at least 4GB of RAM and CPU speed of at least 2.5GHz.
 
* Hardware: A computer with at least 4GB of RAM and CPU speed of at least 2.5GHz.
  
* OS:
+
* For code development and testing, these three configurations will work
 
** PC: Windows XP or higher with latest updates installed
 
** PC: Windows XP or higher with latest updates installed
 
** Mac: OS X 10.6 or higher with latest updates installed
 
** Mac: OS X 10.6 or higher with latest updates installed
 
** Linux: Any recent distribution that has the supported browsers installed
 
** Linux: Any recent distribution that has the supported browsers installed
 +
 +
* For online test taking (proctortrack) you will need one of:
 +
** PC: Windows XP or higher with latest updates installed
 +
** Mac: OS X 10.6 or higher with latest updates installed
 +
** Linux is NOT supported.
  
 
==Office hours==
 
==Office hours==
Line 137: Line 141:
  
 
In most cases I expect that all submitted code will be written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images and write-ups you provide should have been created by you alone.
 
In most cases I expect that all submitted code will be written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images and write-ups you provide should have been created by you alone.
 +
 +
If we discover that you have submitted assignment material created by another student, either from a previous semester or in the current session, you will be assigned a 0 for the relevant project.
  
 
==Class Policies==
 
==Class Policies==
  
* Official communication is by email: We use piazza for discussions, but it is not an official communications channel. All official communications to you will be sent via t-square to your official GT email address.
+
* For Pass/Fail students: Your overall grade must be 75% or higher to get a passing grade.
  
* Student responsibilities: Be aware of the deadlines posted on the schedule.  Read your GT email every day.  Start work on projects even if they are not open on t-square.
+
* Official communication is by email: We use piazza for discussions, but it is not an official communications channel. 
 +
 
 +
* Student responsibilities: Be aware of the deadlines posted on the schedule.  Start work on projects even if they are not open on Canvas.  
  
 
* Grade contest period: After a project grade is released you have 7 days to contest the grade.  After that time projects will not be reevaluated. You must have a very specific issue with a compelling argument as to why your grade is incorrect.  Example compelling argument: "The TA took 10 points off because I was missing a chart, but the chart is visible on page 5."  Example not compelling argument: "I think I should have gotten more points, please regrade my project."
 
* Grade contest period: After a project grade is released you have 7 days to contest the grade.  After that time projects will not be reevaluated. You must have a very specific issue with a compelling argument as to why your grade is incorrect.  Example compelling argument: "The TA took 10 points off because I was missing a chart, but the chart is visible on page 5."  Example not compelling argument: "I think I should have gotten more points, please regrade my project."
  
* Grade contest process: You must enter your request for reevaluation via the online form, which is here.  You will be contacted by your TA about it later on.  Do not request reevaluations on piazza.
+
* Grade contest process: Instruction to be released prior to Project 1 grade release.
 +
 
 +
* Late policy: Late assignments will not be graded unless a ''prior'' arrangement has been made with the instructors. See [[CS7646_Fall_2019#Late_Work]]
 +
 
 +
* Exam scheduling: Exams will be held on specific days at specific times.  If there is an emergency or other issue that requires changing the date of an exam for you, you will need to have it approved by the Dean of Students.  You can apply for that here: http://www.deanofstudents.gatech.edu (under Resources -> Class Absences)
 +
 
 +
* Each project for this course has its own page on this wiki. That description includes a list of specific deliverables and usually a rubric.  Be sure to double check your submission against those so you don't miss anything.
  
* Student responsibilities: Monitor the schedule so that you know when assignments are due. The due dates are usually set early in the semester.
+
* <s>Many of the projects will be revised somewhat. They are finalized when they are released on Canvas.</s>
  
* Late policy: Assignments are due at 11:55PM Eastern Time on the assignment due date.  Assignments turned in after 11:55PM are considered late.  Assignments may be turned in up to one day late with a 10% penalty.
+
* We require that your code run properly on one of the servers we have set up at GT.  
  
* Exam scheduling: Exams will be held on specific days at specific times. If there is an emergency or other issue that requires changing the date of an exam for you, you will need to have it approved by the Dean of Students. You can apply for that here: http://www.deanofstudents.gatech.edu (under Resources -> Class Absences)
+
* If a problem exists with your submitted code we will not consider reassessing it if it has not been tested as described above.
 +
 
 +
* Most projects will be accompanied by template code and grading code that you can use to test your project. It is necessary that your code passes the grading checks we provide, but the final batch tests may be more rigorous. Be sure to examine the rubrics in the project description to be sure your code meets them.
 +
 
 +
* Once you are satisfied with your code, submit the EXACT same working code via Canvas.
 +
 
 +
* It is a good idea to submit a version of your working code early (before the deadline) in case some problem arises with your internet connection or Canvas.
  
==Legacy==
+
* The latest timestamp on any part of your submission will be used as the time of submission for your whole project.  Accordingly, do not resubmit anything after the deadline, or it will be considered late.
  
* Legacy: [[https://docs.google.com/spreadsheets/d/1JlhlQ1D4bmwP6THAcVKGo8PNxWjbUQazFeJl8Mz-ZSw/pubhtml old schedule]]
+
* After the submission deadline we will test your code on one of our servers which is configured identically to the ones available for your test.

Latest revision as of 21:30, 5 January 2020

Overview

This course introduces students to the real-world challenges of implementing machine-learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider approaches such as linear regression, Q-Learning, KNN and regression trees, and how to apply them to actual stock trading situations.

This course is composed of three mini-courses:

A set of course notes and example code can be found here: [[1]]

Video Content

The video content for this course is available for free at [Udacity].

Important note

This course ramps up in difficulty towards the end. The projects in the final 1/3 of the course are challenging. Be prepared.

Instructor information

Tucker Balch, Ph.D.
Professor, Interactive Computing at Georgia Tech
CS 7646 Course Designer
CS 7646 Instructor: Spring 2016, Fall 2016, Spring 2017, Summer 2017 (online), Fall 2017, Spring 2018, Summer 2018, Fall 2018
CIOS reviews Media:2017SummerCIOS.pdf

Maria Hybinette
Associate Professor, Computer Science, University of Georgia
CS 4646 Instructor: Summer 2018

David Byrd
Research Scientist, Interactive Media Technology Center at Georgia Tech
CS 7646 On Campus Instructor: Summer 2016, Summer 2017, Spring 2018, Fall 2019
CS 7646 Head TA: Spring 2016, Fall 2016, Fall 2017

David Joyner
CS 7646 Online Instructor: Spring 2019, Summer 2019, Fall 2019, Spring 2020

Joshua Fox
CS 7646 Head TA: Fall 2019, Spring 2020

Syllabi and schedule for specific semesters

Textbooks, Software & Other Resources

We will use the following textbooks:

  • For Mini-course 1: Python for Finance by Yves Hilpisch amazon.com (optional)
  • For Mini-course 2: What Hedge Funds Really Do by Romero and Balch amazon.com (required)
  • For Mini-course 3: Machine Learning by Tom Mitchell (optional)
    • Buy it for $218.00 at: amazon.com
    • Buy a paperback version for $61.78. IMPORTANT WARNINGS: 1) They only ship to the US 2) It takes them 3 weeks to print the book. If you order from outside the US they will quietly accept your money but never ship the book: less expensive version at mcgraw hill
    • Buy a paperback international version for $19.10. I am not certain about the reliability of this company: international

Software:

Other resources:

Prerequisites/Co-requisites

All types of students are welcome! The Machine Learning topics might be "review" for CS students, while finance parts will be review for finance students. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.

If you answer "no" to the following questions, it may be beneficial to refresh your knowledge of the prerequisite material prior to taking CS 7646:

  • Do you have a working knowledge of basic statistics, including probability distributions (such as normal and uniform) and the calculation of, and differences between, mean, median and mode?
  • Do you understand the difference between geometric mean and arithmetic mean?
  • Do you have strong programming skills? Take this quiz compinvesti-prog-quiz if you would like help determining the strength of your programming skills.
  • Are you competent with the Unix command line?
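
As a quick self-check for the geometric versus arithmetic mean question above, here is a minimal Python sketch. The function names and the sample returns are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
def arithmetic_mean(returns):
    """Plain average of a list of periodic returns (0.05 means +5%)."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """Constant per-period return that compounds to the same total growth."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r  # compound each period's return
    return growth ** (1.0 / len(returns)) - 1.0


# Hypothetical example: gain 50% in one period, then lose 50% in the next.
sample_returns = [0.5, -0.5]
print(arithmetic_mean(sample_returns))        # average return is zero
print(geometric_mean_return(sample_returns))  # negative: the portfolio lost value
```

The arithmetic mean suggests the portfolio broke even, while the geometric mean reflects that 1.5 × 0.5 = 0.75 of the starting value remains after the two periods.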

Who this course is for: The course is intended for people with strong software programming experience and introductory level knowledge of investment practice. A primary prerequisite is an interest and excitement about the stock market.

Software we'll use: In order to complete the programming assignments you will need a development environment that you're comfortable with. We use Unix, but you can also work with Windows and Mac OS environments. You must download and install a set of Python modules to your computer (including NumPy, SciPy, and Pandas).

How to install the software: ML4T Software Setup
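
After following the setup instructions, one way to confirm that the modules named above are visible to your Python interpreter is sketched below. This is only an illustrative check, not part of the official setup: the helper name `missing_modules` is hypothetical, and the required versions are defined on the setup page, not here.

```python
import importlib.util

# Modules named in the "Software we'll use" paragraph (import names).
REQUIRED = ("numpy", "scipy", "pandas")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules can be located; it does not verify their versions or that they match the grading servers.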

Logistics

  • OMSCS: We will use Udacity for lecture videos.
  • If you have trouble accessing Udacity content, please share your problem via email with gtech-support@udacity.com
  • We will use Canvas for ALL submissions: Canvas (pick appropriate course site)
  • We will use Piazza for interaction and discussion. Consult the page for the current semester for a link.

Grading

  • A: 90.0% and above
  • B: 80.0% and above
  • C: 70.0% and above
  • D: 60.0% and above
  • F: below 60.0%

Students taking the course Pass/Fail must earn at least a 75% to pass.

We do not encourage "audit" students. If you are in the course on audit status, you must earn at least a "B" on the midterm.

See semester syllabus for assignment weights.
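
The thresholds in this section can be sketched as a small Python function. This is an illustration only: it assumes the overall percentage has already been computed from the semester's assignment weights and that no rounding is applied before comparison, which this page does not specify. The function names are hypothetical.

```python
def letter_grade(percent):
    """Map an overall course percentage to a letter grade."""
    if percent >= 90.0:
        return "A"
    if percent >= 80.0:
        return "B"
    if percent >= 70.0:
        return "C"
    if percent >= 60.0:
        return "D"
    return "F"


def passes_pass_fail(percent):
    """Pass/Fail students need an overall grade of at least 75%."""
    return percent >= 75.0


print(letter_grade(82.5))      # B
print(passes_pass_fail(74.9))  # False
```

Note that the Pass/Fail bar (75%) sits above the cutoff for a letter grade of C (70%).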

Minimum technical requirements

  • Browser and connection speed: An up-to-date version of Chrome or Firefox is strongly recommended. We also support Internet Explorer 9 and the desktop versions of Internet Explorer 10 and above (not the metro versions). 2+ Mbps recommended; at minimum 0.768 Mbps download speed.
  • Hardware: A computer with at least 4GB of RAM and CPU speed of at least 2.5GHz.
  • For code development and testing, these three configurations will work
    • PC: Windows XP or higher with latest updates installed
    • Mac: OS X 10.6 or higher with latest updates installed
    • Linux: Any recent distribution that has the supported browsers installed
  • For online test taking (proctortrack) you will need one of:
    • PC: Windows XP or higher with latest updates installed
    • Mac: OS X 10.6 or higher with latest updates installed
    • Linux is NOT supported.

Office hours

To be determined.

Plagiarism

In most cases I expect that all submitted code will be written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images and write-ups you provide should have been created by you alone.

If we discover that you have submitted assignment material created by another student, either from a previous semester or in the current session, you will be assigned a 0 for the relevant project.

Class Policies

  • For Pass/Fail students: Your overall grade must be 75% or higher to get a passing grade.
  • Official communication is by email: We use piazza for discussions, but it is not an official communications channel.
  • Student responsibilities: Be aware of the deadlines posted on the schedule. Start work on projects even if they are not open on Canvas.
  • Grade contest period: After a project grade is released you have 7 days to contest the grade. After that time projects will not be reevaluated. You must have a very specific issue with a compelling argument as to why your grade is incorrect. Example compelling argument: "The TA took 10 points off because I was missing a chart, but the chart is visible on page 5." Example not compelling argument: "I think I should have gotten more points, please regrade my project."
  • Grade contest process: Instruction to be released prior to Project 1 grade release.
  • Late policy: Late assignments will not be graded unless a prior arrangement has been made with the instructors. See CS7646_Fall_2019#Late_Work
  • Exam scheduling: Exams will be held on specific days at specific times. If there is an emergency or other issue that requires changing the date of an exam for you, you will need to have it approved by the Dean of Students. You can apply for that here: http://www.deanofstudents.gatech.edu (under Resources -> Class Absences)
  • Each project for this course has its own page on this wiki. That description includes a list of specific deliverables and usually a rubric. Be sure to double check your submission against those so you don't miss anything.
  • Many of the projects will be revised somewhat. They are finalized when they are released on Canvas.
  • We require that your code run properly on one of the servers we have set up at GT.
  • If a problem exists with your submitted code we will not consider reassessing it if it has not been tested as described above.
  • Most projects will be accompanied by template code and grading code that you can use to test your project. It is necessary that your code passes the grading checks we provide, but the final batch tests may be more rigorous. Be sure to examine the rubrics in the project description to be sure your code meets them.
  • Once you are satisfied with your code, submit the EXACT same working code via Canvas.
  • It is a good idea to submit a version of your working code early (before the deadline) in case some problem arises with your internet connection or Canvas.
  • The latest timestamp on any part of your submission will be used as the time of submission for your whole project. Accordingly, do not resubmit anything after the deadline, or it will be considered late.
  • After the submission deadline we will test your code on one of our servers which is configured identically to the ones available for your test.
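
The timestamp rule above can be illustrated with a short Python sketch. It assumes each part of a submission carries its own upload time; the file names, dates and function names are hypothetical, and this is not the graders' actual tooling.

```python
from datetime import datetime


def submission_time(part_timestamps):
    """The latest timestamp on any part counts for the whole project."""
    return max(part_timestamps.values())


def is_late(part_timestamps, deadline):
    """A project is late if any part was (re)submitted after the deadline."""
    return submission_time(part_timestamps) > deadline


# Hypothetical example: the code was on time, but the report was re-uploaded late.
deadline = datetime(2020, 1, 26, 23, 55)
parts = {
    "code.py": datetime(2020, 1, 25, 18, 30),
    "report.pdf": datetime(2020, 1, 27, 0, 10),
}
print(is_late(parts, deadline))  # True: the resubmitted report makes everything late
```

This is why resubmitting a single file after the deadline affects the whole project, even if the other parts were uploaded early.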