CS7646 Summer 2016
Overview
This course introduces students to the real-world challenges of implementing machine learning based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches such as linear regression, Q-Learning, KNN, and regression trees, and how to apply them to actual stock trading situations.
This summer, the course will follow this broad outline:
- Brief introduction to Manipulating Financial Data in Python
- Introduction to Machine Learning
- Computational Investing
- Machine Learning Algorithms for Trading
Instructor information
David Byrd
Research Scientist, Interactive Media Technology Center at Georgia Tech
Course Designer
Tucker Balch, Ph.D.
Professor, Interactive Computing at Georgia Tech
2016 Summer Schedule
Class meets TTH 2-3:45 in College of Computing 102
Week 1
2016-05-17 (Tuesday)
Course Overview/Admin, Machine Learning Overview, Finance Overview
2016-05-19 (Thursday)
Project 1 assigned (Analyze a Portfolio)
Market Price/Data Basics
Working with Pandas
Reading CSV files
Formatting custom DataFrames
Building complex DataFrames with .join()
Basic plotting with Pandas and PyPlot (matplotlib)
Week 2
2016-05-24 (Tuesday)
Common Market Data Visualizations (OHLC, Candlesticks)
What is "the price" of a stock? (quote vs trade)
Simple comparative plotting (normalizing time series)
Working with Numpy
Time Series
Incomplete Data
Week 3
Plots
Portfolio Statistics
Regression vs Classification
Supervised vs Unsupervised ML
Assessing Learners
Cross-Validation
Batch vs Online Learning
RMS, Pearson's r
Overfitting
Project 1 Due Tuesday (2016-05-31)
Week 4
2016-06-07 (Tuesday)
P1 brief discussion
Linear Regression (pseudo-code)
KNN (pseudo-code)
Decision Trees (with pseudo-code)
2016-06-09 (Thursday)
Finish CART Decision Trees
Bagging
Boosting
Week 5
2016-06-14 (Tuesday)
Market History, Actors
Brokers, Market Makers
Order Book Intro
Quiz 2
2016-06-16 (Thursday)
Order Types
Moving the Market
Shady Dealers
Visualizing Price Movement
Project 2 Due Tuesday (2016-06-14)
Week 6
2016-06-21 (Tuesday)
Markets, Orders, Crashes, Valuation
Time Value of Money
2016-06-23 (Thursday)
Intrinsic Value
Market Capitalization
Quiz 3
Week 7
2016-06-28 (Tuesday)
MIDTERM
2016-06-30 (Thursday)
Book Value
Three types of Valuation
Call / Put Options
Buying / Writing Options
Covered Call Strategy
Leverage
Project 3 Due Tuesday (2016-06-28)
Week 8
2016-07-05 (Tuesday)
HOLIDAY
2016-07-07 (Thursday)
Review Midterm
Technical Analysis
Options (CALL/PUT)
Week 9
2016-07-12 (Tuesday)
OHLC/Candlestick Chart Patterns
Capital Assets Pricing Model
2016-07-14 (Thursday)
CAPM Part 2
Efficient Market Hypothesis
Project 4 Due Thursday (2016-07-14)
Week 10
2016-07-12 (Tuesday)
Fundamental Law of Active Portfolio Management
Efficient Frontier
Finite Automata
Pushdown Automata
Turing Machine
2016-07-14 (Thursday)
Markov Decision Problems
Value Iteration
Reinforcement Learning
Q-Learning Gridland
Week 11
Final Instruction Days
2016-07-19 (Tuesday)
Final day of class
Dyna
Q-Learning Gridland
Time permitting:
Random Forests? PERT?
Advanced options strategies?
Week 12
Finals (no final exam in this class)
Project 5 Due
Assignments
This is a project-heavy class (with no final exam). There will be 6 projects this semester, due every two weeks. Assignment details will be added here.
Project 1: Assess a Portfolio (Due Tuesday 2016-05-31)
Project 2: Instance / Ensemble Learners (Due Tuesday 2016-06-14)
Project 3: Market Simulator (Due Thursday 2016-06-30)
Project 4: Trading Learners (Due Sunday 2016-07-17)
Project EX: Bollinger Bands / Simple Trading Strategy (Due Sunday 2016-07-17)
Project 5: Reinforcement Learning (Due Thursday 2016-07-28)
Textbooks & Other Resources
Required Textbook:
What Hedge Funds Really Do by Romero and Balch amazon.com
Optional Textbooks:
Python for Finance by Yves Hilpisch amazon.com (optional)
Machine Learning by Tom Mitchell (optional)
- Buy it for $218.00 at: amazon.com
- Buy a paperback version for $61.78: less expensive version at mcgraw hill. IMPORTANT WARNINGS: 1) They only ship to the US. 2) It takes them 3 weeks to print the book. If you order from outside the US, they will quietly accept your money but never ship the book.
- Buy a paperback international version for $19.10: international. I am not certain about the reliability of this company.
Other resources:
- Pandas documentation: [pandas.pydata.org]
Prerequisites/Co-requisites
All types of students are welcome! The Machine Learning topics might be "review" for CS students, while finance parts will be review for finance students. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.
If you answer "no" to the following questions, it may be beneficial to refresh your knowledge of the prerequisite material prior to taking CS 7646:
- Do you have a working knowledge of basic statistics, including probability distributions (such as normal and uniform) and how to calculate and distinguish between mean, median, and mode?
- Do you understand the difference between geometric mean and arithmetic mean?
- Do you have strong programming skills? Take this quiz compinvesti-prog-quiz if you would like help determining the strength of your programming skills.
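As a quick refresher on the geometric versus arithmetic mean question above, here is a minimal sketch applied to periodic returns. The function names and the sample returns are hypothetical and chosen only for illustration; they are not part of the course materials.

```python
def arithmetic_mean_return(returns):
    """Plain average of the per-period returns."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """Constant per-period return that produces the same compounded growth."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r  # compound each period's return
    return growth ** (1.0 / len(returns)) - 1.0


# Hypothetical example: gain 50% in one period, then lose 50% in the next.
sample_returns = [0.50, -0.50]
arithmetic = arithmetic_mean_return(sample_returns)
geometric = geometric_mean_return(sample_returns)
print(arithmetic, geometric)
```

With these sample values the arithmetic mean is zero, while the geometric mean is negative, because a 50% gain followed by a 50% loss leaves the portfolio at 75% of its starting value.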
Who this course is for: The course is intended for people with strong software programming experience and introductory level knowledge of investment practice. A primary prerequisite is an interest and excitement about the stock market.
Software we'll use: In order to complete the programming assignments you will need a development environment that you're comfortable with. We use Unix (which includes Mac OS these days), but you can also work with Windows environments. You must download and install a set of Python modules to your computer (including NumPy, SciPy, and Pandas).
You may develop your software however you like, but you must test it using the provided VM or campus UNIX machine prior to turning it in. If your code does not run in our environment, you will be penalized. Improve your chances by ensuring you turn in raw python text files only (.py) and do not import libraries not explicitly allowed.
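One possible way to catch a disallowed import before submitting is to scan the file with Python's standard `ast` module. This is only a sketch: the `ALLOWED` set below is an assumption based on the modules named on this page, the function name is hypothetical, and the actual allowed list for each project may differ. Standard library modules are flagged too unless you add them to the set.

```python
import ast

# Assumption: allowed top-level packages, taken from the modules named on this page.
ALLOWED = {"numpy", "scipy", "pandas"}


def disallowed_imports(source, allowed=ALLOWED):
    """Return sorted top-level package names imported by `source` but not in `allowed`."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b as c" -> top-level package "a"
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" -> top-level package "a" (relative imports skipped)
            found.add(node.module.split(".")[0])
    return sorted(found - set(allowed))


# Hypothetical submission text used only to demonstrate the check.
example_source = "import numpy as np\nimport sklearn\nfrom scipy import stats\n"
flagged = disallowed_imports(example_source)
print(flagged)
```

A check like this does not replace testing on the provided VM or campus UNIX machine; it only flags import statements.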
How to install the software: ML4T Software Installation
Logistics
- Lectures take place in College of Computing 101 on Tues/Thurs at 2:00 PM.
- First day of class is May 17, 2016.
- We will use T-Square for submission of code and reports: T-Square (pick appropriate course site)
- We will use Piazza for interaction and discussion: CS 7646 Summer 2016 at Piazza
- Official communication will come via your @gatech.edu e-mail address only.
Grading
Weighting:
- Projects: 65%
- 5 projects, 10% - 20% each
- Analyze a Portfolio (10%)
- Instance/Ensemble Learners (10%)
- Market Simulator (10%)
- Trading with Learners (15%)
- Q-Learner (20%)
- Midterm: 20%
- Week 7 (Tuesday), will cover first 6 weeks, all ML, Finance, Python topics
- Quizzes: 10%
- ~5-6, lowest will be dropped
- Participation: 5%
- Show up, pay attention
Note: The final project is quite challenging and takes the place of a Final Exam in this class. As such, it will be due either during Final Instruction Days or at the time of the Final Exam Period for this class.
Thresholds:
- A: 90% and above
- B: 80% and above
- C: 70% and above
- D: 60% and above
- F: below 60%
Note: Due to the project-oriented nature of the class, CS 7646 takes quite a bit of effort, but the grade distribution is normally quite high. In the past, most (> 50%) students have received an "A" with no grade adjustment. You should not expect there to be a curve in the class. (89.99% == B)
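The weighting and thresholds above can be sketched as a small calculation. This is an illustration only, not the official grading script: the function names are hypothetical, every component score is assumed to be on a 0-100 scale, quizzes are assumed to be equally weighted, and at least one quiz score is assumed to exist. No rounding is applied, which matches the "89.99% == B" note.

```python
# Weights (in percentage points) copied from the Grading section.
PROJECT_WEIGHTS = {
    "Analyze a Portfolio": 10,
    "Instance/Ensemble Learners": 10,
    "Market Simulator": 10,
    "Trading with Learners": 15,
    "Q-Learner": 20,
}
MIDTERM_WEIGHT = 20
QUIZ_WEIGHT = 10
PARTICIPATION_WEIGHT = 5


def course_percentage(project_scores, midterm, quizzes, participation):
    """Combine component scores (each 0-100) into an overall percentage."""
    weighted = sum(PROJECT_WEIGHTS[name] * project_scores[name]
                   for name in PROJECT_WEIGHTS)
    # Drop the lowest quiz before averaging (assumes equally weighted quizzes).
    kept = sorted(quizzes)[1:] if len(quizzes) > 1 else list(quizzes)
    quiz_average = sum(kept) / len(kept)
    weighted += MIDTERM_WEIGHT * midterm
    weighted += QUIZ_WEIGHT * quiz_average
    weighted += PARTICIPATION_WEIGHT * participation
    return weighted / 100.0


def letter_grade(percentage):
    """Map an overall percentage to a letter using the thresholds above."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percentage >= cutoff:
            return letter
    return "F"
```

For example, `letter_grade(89.99)` yields a B under these thresholds, since nothing is rounded up.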
Minimum technical requirements
- You must have a computer that meets the typical requirements for a CS course at Georgia Tech.
- Hardware: A computer with at least 4GB of RAM and CPU speed of at least 2.5GHz.
- OS:
- PC: Windows XP or higher with latest updates installed
- Mac: OS X 10.6 or higher with latest updates installed
- Linux: Any recent distribution that can support the required code libraries.
- Your computer must run Python, Numpy, Scipy, Pandas.
Office hours
Malcolm Haynes (TA), Tues/Thurs 1PM-2PM, CCB 1F commons area
David Byrd (Instructor), Tues/Thurs 4PM-5PM, CCB 1F commons area
Plagiarism
In most cases I expect that all submitted code will be written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images and write-ups you provide should have been created by you alone.
Collaboration is permitted only at the "whiteboard level". You may discuss general approaches and algorithms, including high-level pseudocode. You should not share code or line-by-line level pseudocode with other students.
Do not turn in code you found on the web. I also use github and StackOverflow...
There are no group projects in this class.
Class Policies
- Official communication is by email: We use Piazza for discussions, but it is not an official communication channel. All official communications to you will be sent via T-Square to your official GT email address. Similarly, you should communicate important items to us by email.
- Student responsibilities: Be aware of the deadlines posted on the schedule. Read your GT email every day. Start work on projects when they are announced, even if they are not open on T-Square.
- Grade contest period: After a project grade is released you have 7 days to contest the grade. After that time projects will not be reevaluated. You must have a very specific issue with a compelling argument as to why your grade is incorrect. Example compelling argument: "The TA took 10 points off because I was missing a chart, but the chart is visible on page 5." Example not compelling argument: "I think I should have gotten more points, please regrade my project."
- Grade contest process: You must contact your grading TA via e-mail to contest a grade. Your TA will promptly let you know that your request has been received, but the resolution may take some time. Your assignment will be completely re-evaluated and your revised grade could be higher or lower. For auto-graded code assignments, if the TA corrects errors in your code or submission to make it run properly, you will be penalized a minimum of 10 points per error. For reports, only grading errors by the TA may be contested.
- Late policy: Assignments are due at 11:55PM Eastern Time on the assignment due date. Assignments turned in after 11:55PM are considered late. Assignments may be turned in up to three days late with a 10% penalty per day (1m - 23h59m late == one "day" late, etc). Assignments more than 72 hours late will not be accepted. There is no grace period for assignments already three days late.
- Exam scheduling: Exams will be held on specific days at specific times. If there is an emergency or other issue that requires changing the date of an exam for you, you will need to have it approved by the Dean of Students. You can apply for that here: http://www.deanofstudents.gatech.edu (under Resources -> Class Absences)
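The late-policy arithmetic from the list above can be sketched as follows. The function name is hypothetical, and two points are assumptions rather than stated policy: a submission exactly 24 hours late is counted as one day late (by analogy with the rule that only submissions more than 72 hours late are rejected), and the result is reported as penalty percentage points without saying how they are applied to the score.

```python
import math

MINUTES_PER_DAY = 24 * 60
MAX_DAYS_LATE = 3
PENALTY_PER_DAY = 10  # percentage points per late "day"


def late_penalty_percent(minutes_late):
    """Return the late penalty in percentage points, or None if not accepted."""
    if minutes_late <= 0:
        return 0  # on time: no penalty
    # Any part of a day counts as a full day (1m to 23h59m late is one "day").
    days_late = math.ceil(minutes_late / MINUTES_PER_DAY)
    if days_late > MAX_DAYS_LATE:
        return None  # more than 72 hours late: not accepted
    return PENALTY_PER_DAY * days_late
```

Under these assumptions, a submission one minute late and one 23 hours 59 minutes late both receive the same one-day penalty.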