Undergrad ML4T
Overview
This course introduces students to the real-world challenges of implementing machine-learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches such as linear regression, Q-Learning, KNN, and regression trees.
Roughly half of the course is spent on machine learning topics and half on computational finance topics. We start with some very basic computational finance, then cover basic ML, then more advanced finance, and finally more advanced ML with a final project that actually applies machine learning to trading.
Important note
This course ramps up in difficulty towards the end. The projects in the final third of the course are challenging. Be prepared.
Instructor
David Byrd
Research Scientist, Interactive Media Technology Center at Georgia Tech
Course Designer
Tucker Balch, Ph.D.
Professor, Interactive Computing at Georgia Tech
Syllabi and schedule for specific semesters
Textbooks, Software & Other Resources
There is one required textbook for the class:
- What Hedge Funds Really Do by Romero and Balch [1] - Kindle version usually ~$10
The following textbooks are also helpful:
- Python for Finance by Yves Hilpisch [2]
- Machine Learning by Tom Mitchell
Software:
- Follow these instructions to set up the software: ML4T_Software_Setup
Other resources:
- Course notes developed by Octavian Blaga [docs.google.com]
- Pandas documentation: [pandas.pydata.org]
- David Byrd's slides on how to vectorize technical analysis methods: media:CDB_vectorize_me.pptx
Prerequisites & Target Audience
This is the new undergraduate flavor of ML4T, CS 4646. To keep the class moving at a good pace, we make some assumptions about what a CS undergraduate student should know before taking our course. Specifically, you should have taken CS 1332 (data structures) and CS 3600 (intro AI). For example, we expect you to understand tree-based data structures (from 1332), object-oriented programming (from 1331), basic Markov Decision Processes (from 3600), and basic probability and statistics (from 3600 or a math class).
If you have previously taken a machine learning class, you may find much of the ML material to be review. The same goes for the finance material if you have taken quantitative finance classes. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.
If you answer "no" to the following questions, it may be beneficial to refresh your knowledge of the prerequisite material prior to taking CS 4646:
- Do you have a working knowledge of basic statistics, including probability distributions (such as normal and uniform) and the calculation of, and differences between, mean, median, and mode?
- Do you understand the difference between geometric mean and arithmetic mean?
- Do you have strong programming skills? Take this quiz compinvesti-prog-quiz if you would like help determining the strength of your programming skills.
- Are you competent with the Unix command line?
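As a quick self-check on the geometric versus arithmetic mean question above, here is a minimal sketch in plain Python. The function names and the two-period return series are hypothetical illustrations, not course code. It shows why the two means differ for investment returns: the arithmetic mean averages the per-period returns, while the geometric mean is the constant per-period return that compounds to the same final value.

```python
import math


def arithmetic_mean(returns):
    """Simple average of the per-period returns."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """Constant per-period return that compounds to the same final value."""
    # Compound the growth factors (1 + r), then take the n-th root.
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (1.0 / len(returns)) - 1.0


# Hypothetical example: gain 50% in one period, then lose 50% in the next.
returns = [0.50, -0.50]
print(arithmetic_mean(returns))        # 0.0
print(geometric_mean_return(returns))  # about -0.134
```

The arithmetic mean suggests the portfolio broke even, but it actually ended at 75% of its starting value (1.5 x 0.5 = 0.75), which the negative geometric mean reflects.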
Who this course is for: The course is intended for people with strong software programming experience and introductory level knowledge of investment practice. A primary prerequisite is an interest in, and excitement about, the stock market.
Software we'll use: In order to complete the programming assignments you will need a development environment that you're comfortable with. We use Unix, but you can also work with Windows and Mac OS environments. You must download and install a set of Python modules to your computer (including NumPy, SciPy, and Pandas).
How to install the software: ML4T Software Setup
Logistics
- Refer to this wiki for assignment instructions, syllabus, and course information.
- We will use T-Square for ALL submissions: T-Square (pick appropriate course site)
- We will use Piazza for interaction and discussion. Consult the page for the current semester for a link.
Grading
- A: 90.0% and above
- B: 80.0% and above
- C: 70.0% and above
- D: 60.0% and above
- F: below 60.0%
Students taking the course Pass/Fail must earn at least a 75% to pass.
See semester syllabus for assignment weights.
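The grading scale above can be expressed as a small lookup. This is only a sketch: the names are hypothetical, and it assumes the overall percentage is compared against the cutoffs without any rounding, which this page does not specify.

```python
# Cutoffs copied from the grading scale above, highest first.
LETTER_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]
PASS_FAIL_CUTOFF = 75.0  # minimum overall percentage for Pass/Fail students


def letter_grade(percent):
    """Map an overall percentage to a letter grade (assumes no rounding)."""
    for cutoff, letter in LETTER_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


def passes(percent):
    """Pass/Fail outcome for the same overall percentage."""
    return percent >= PASS_FAIL_CUTOFF


print(letter_grade(89.95), passes(89.95))  # B True
print(letter_grade(72.0), passes(72.0))    # C False
```

Note that under these cutoffs a Pass/Fail student can earn the equivalent of a C and still not pass, because the Pass/Fail threshold is 75%.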
Minimum technical requirements
- Browser and connection speed: An up-to-date version of Chrome or Firefox is strongly recommended. We also support Internet Explorer 9 and the desktop versions of Internet Explorer 10 and above (not the metro versions). 2+ Mbps recommended; at minimum 0.768 Mbps download speed.
- Hardware: A computer with at least 4GB of RAM and CPU speed of at least 2.5GHz.
- For code development and testing, these three configurations will work:
  - PC: Windows XP or higher with latest updates installed
  - Mac: OS X 10.6 or higher with latest updates installed
  - Linux: Any recent distribution that has the supported browsers installed
Office hours
To be determined.
Plagiarism
In most cases I expect that all submitted code will be written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images and write-ups you provide should have been created by you alone.
Collaboration is permitted only at the "whiteboard level". You may discuss general approaches and algorithms, including high-level pseudocode. You should not share code or line-by-line level pseudocode with other students.
Do not turn in code you found on the web. Do not push your code to any public repository. I also use github and StackOverflow...
There are no group projects in this class.
Class Policies
- For Pass/Fail students: Your overall grade must be 75% or higher to get a passing grade.
- Official communication is by email: We use Piazza for discussions, but it is not an official communications channel. All official communications to you will be sent via T-Square to your official GT email address. Similarly, you should communicate important items to us by email as well.
- Student responsibilities: Be aware of the deadlines posted on the schedule. Read your GT email every day. Start work on projects even if they are not open on T-Square.
- Grade contest period: After a project grade is released you have 7 days to contest the grade. After that time projects will not be reevaluated. You must have a very specific issue with a compelling argument as to why your grade is incorrect. Example of a compelling argument: "The TA took 10 points off because I was missing a chart, but the chart is visible on page 5." Example of an argument that is not compelling: "I think I should have gotten more points, please regrade my project."
- Grade contest process: Email your TA about the situation within 7 days of grades being released.
- Late policy: Assignments are due at 11:55PM Eastern Time on the assignment due date. We do not use other timezones or GMT. Don't go by the time on your machine or by any other timezone you may have configured in T-Square. Assignments turned in after 11:55PM ET are considered late. Late assignments will not be graded unless a prior arrangement has been made with the instructors.
- Exam scheduling: Exams will be held on specific days at specific times. If there is an emergency or other issue that requires changing the date of an exam for you, you will need to have it approved by the Dean of Students. You can apply for that here: http://www.deanofstudents.gatech.edu (under Resources -> Class Absences)
- Each project for this course has its own page on this wiki. That description includes a list of specific deliverables and usually a rubric. Be sure to double-check your submission against those so you don't miss anything.
- Many of the projects will be revised somewhat. While they are under revision, they will have a "DRAFT" note on the wiki. Once we're done with any revisions we will remove the "DRAFT" note and open submissions on T-Square.
- We require that your code run properly on one of the servers we have set up at GT. To assist you with this, and to help you test your code for each assignment, we have equipped these servers with a server process that will run your code against a set of test cases.
- If a problem crops up with your submitted code we will not consider reassessing it if it has not been tested as described above.
- Once you are satisfied with your code, submit the EXACT same working code via T-Square.
- It is a good idea to submit a version of your working code early (before the deadline) in case some problem arises with your internet connection or T-Square.
- If you submit your code multiple times (perfectly fine) it is very important that you first delete the files that are there, then submit your new code. If you don't, our grading software won't know which files to use.
- The latest timestamp on any part of your submission will be used as the time of submission for your whole project. Accordingly, do not resubmit anything after the deadline, or it will be considered late.
- After the submission deadline we will test your code on one of our servers, which is configured identically to the ones available for your testing.
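The late policy and the latest-timestamp rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual grading software: the function names are hypothetical, the file timestamps are assumed to be timezone-aware, and "late" is taken to mean strictly after 11:55:00 PM Eastern on the due date.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")  # deadlines are always Eastern Time
DUE_TIME = time(23, 55)                 # 11:55 PM


def deadline_for(due_date):
    """Timezone-aware deadline for a due date (a datetime.date)."""
    return datetime.combine(due_date, DUE_TIME, tzinfo=EASTERN)


def is_late(file_timestamps, due_date):
    """True if the newest file in the submission is past the deadline.

    file_timestamps: timezone-aware datetimes, one per submitted file.
    The latest one counts as the submission time for the whole project.
    """
    submission_time = max(file_timestamps)
    return submission_time > deadline_for(due_date)


# Hypothetical submission: one file on time, one resubmitted after the deadline.
due = date(2017, 3, 1)
first_file = datetime(2017, 3, 1, 21, 30, tzinfo=EASTERN)
resubmitted = datetime(2017, 3, 2, 5, 0, tzinfo=timezone.utc)  # midnight Eastern
print(is_late([first_file], due))               # False
print(is_late([first_file, resubmitted], due))  # True
```

Because aware datetimes compare by absolute time, a timestamp recorded in another timezone is still judged against the Eastern deadline, and a single late resubmission makes the whole project late.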