Manipulating Financial Data in Python
Revision as of 12:57, 3 March 2015
Module 1: Reading, slicing and plotting stock data
- Overview of data we'll be working with: AAPL.csv, SPY.csv (note date order)
  - Meaning of various columns
- The Pandas dataframe
- Read CSV into a dataframe (AAPL example)
- Slice according to dates
- [quiz: read SPY.csv and slice against different dates]
- Plot (note that the date order is wrong)
- Sort
- Plot
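The read, sort, and slice steps of Module 1 might look roughly like the sketch below. It is only an illustration: the CSV text is made-up data standing in for AAPL.csv, the column names (Date, Open, High, Low, Close, Volume, Adj Close) are an assumption about the file layout, and `read_prices` is a hypothetical helper name.

```python
import io

import pandas as pd

# Made-up rows standing in for AAPL.csv (assumed column layout).
# The newest date comes first, which is the "date order" issue the outline notes.
CSV_TEXT = """Date,Open,High,Low,Close,Volume,Adj Close
2012-01-06,10.0,10.6,9.9,10.5,1000,10.5
2012-01-05,10.2,10.4,9.8,10.0,1200,10.0
2012-01-04,10.1,10.3,10.0,10.2,900,10.2
2012-01-03,9.9,10.2,9.8,10.1,1100,10.1
"""


def read_prices(source):
    """Hypothetical helper: read a price CSV with Date as a DatetimeIndex."""
    return pd.read_csv(source, index_col="Date", parse_dates=True)


df = read_prices(io.StringIO(CSV_TEXT))

# Sort into ascending date order before slicing or plotting.
df = df.sort_index()

# Label-based slicing on a sorted DatetimeIndex includes both endpoints.
window = df.loc["2012-01-04":"2012-01-05"]
print(window[["Close", "Adj Close"]])

# If matplotlib is installed, window["Adj Close"].plot() would draw the slice.
```

With a real file, `read_prices("AAPL.csv")` would replace the in-memory text.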
Module 2: Building a dataframe with lots of stocks
- What we want to end up with: Rows: Dates, Columns: Symbols
- Step by step how to build it
- SPY.csv will be our reference -- it trades every day the market is open.
- Read SPY.csv, slice to date range, sort
- Read AAPL.csv, merge() into existing dataframe
- Repeat with GLD, IBM, GOOG
- Plot and display legend
- Observe: Scale not good, let's normalize
- Print some of the numbers
- Plot after normalization
- [quiz: normalize at a different date]
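One way to sketch the Module 2 build-up is shown below, under stated assumptions: the three CSV strings are made-up stand-ins for SPY.csv, AAPL.csv and GLD.csv, only an assumed "Adj Close" column is kept, and `read_symbol` is a hypothetical helper. SPY is read first so that its dates define the rows, and each other symbol is merged onto that index.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the real files (assumed columns).
FILES = {
    "SPY": "Date,Adj Close\n2012-01-05,102.0\n2012-01-04,101.0\n2012-01-03,100.0\n",
    "AAPL": "Date,Adj Close\n2012-01-05,55.0\n2012-01-04,52.0\n2012-01-03,50.0\n",
    # GLD deliberately lacks 2012-01-04 to show what a left merge does.
    "GLD": "Date,Adj Close\n2012-01-05,153.0\n2012-01-03,150.0\n",
}


def read_symbol(symbol):
    """Hypothetical helper: one column named after the symbol, sorted by date."""
    df = pd.read_csv(
        io.StringIO(FILES[symbol]),
        index_col="Date",
        parse_dates=True,
        usecols=["Date", "Adj Close"],
    )
    return df.rename(columns={"Adj Close": symbol}).sort_index()


# SPY is the reference: its dates become the rows of the combined frame.
prices = read_symbol("SPY").loc["2012-01-03":"2012-01-05"]

# A left merge on the index keeps every SPY date; gaps show up as NaN.
for symbol in ["AAPL", "GLD"]:
    prices = prices.merge(
        read_symbol(symbol), how="left", left_index=True, right_index=True
    )

# Normalize by the first row so every symbol starts at 1.0.
normalized = prices / prices.iloc[0]
print(normalized)
```

Normalizing at a different date, as in the quiz, amounts to dividing by a different row, for example `prices / prices.iloc[1]`.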
Module 3: Numpy fundamentals
- Numpy relationship to Pandas
- Creating Arrays
  - empty, zeros, ones
- Basic Indexing and Slicing (start at 0 not 1)
- [quiz: print 2nd & 3rd columns]
- Index one array by another
- Reshaping
- Data Processing using Arrays
- Sum rows, Sum columns
- Statistics on columns: Mean, Median, stddev
- See: http://wiki.quantsoftware.org/index.php?title=Numpy_Tutorial_1
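The NumPy topics in Module 3 can be tried out with a few lines such as the following. This is a minimal sketch on a small made-up array; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Creating arrays: empty leaves memory uninitialized, zeros and ones fill it.
empty = np.empty((2, 3))
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))

# Reshaping: twelve consecutive integers arranged as 3 rows by 4 columns.
a = np.arange(12).reshape(3, 4)

# Indexing starts at 0, so the 2nd and 3rd columns are indices 1 and 2.
middle = a[:, 1:3]

# Index one array by another: pick rows 0 and 2.
idx = np.array([0, 2])
picked = a[idx]

# Sums along each axis: axis=1 sums across a row, axis=0 down a column.
row_sums = a.sum(axis=1)
col_sums = a.sum(axis=0)

# Statistics on columns.
col_mean = a.mean(axis=0)
col_median = np.median(a, axis=0)
col_std = a.std(axis=0)

# Relationship to Pandas: a DataFrame of numbers exposes a NumPy array.
frame = pd.DataFrame(a, columns=["A", "B", "C", "D"])
as_array = frame.to_numpy()

print(middle)
print(row_sums, col_sums)
```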
Module 4: Statistical analysis of time series data
- Rolling statistics example
  - Read SPY
  - 20-day rolling average
  - Rolling average ± 2 × the 20-day rolling standard deviation
  - Plot the above as Bollinger bands
- Discussion of daily returns, what they are, how to calculate
- [quiz: compute and plot SPY and XOM daily returns]
- Scatter plot (plot SPY vs XOM)
- Compare SPY vs GLD. How can we quantify these differences?
- Fit a line and plot it, print slope and corrcoef for SPY & XOM
- Discussion: correlation is not the same as slope
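The Module 4 calculations could be sketched as follows. The prices here are synthetic random walks generated with a fixed seed, not real SPY or XOM data, and the "XOM" series is constructed to co-move with "SPY" purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic prices (assumption: not real market data).
rng = np.random.default_rng(0)
dates = pd.date_range("2012-01-02", periods=120, freq="B")
spy_ret = rng.normal(0.0005, 0.01, len(dates))
xom_ret = 0.8 * spy_ret + rng.normal(0.0, 0.005, len(dates))
prices = pd.DataFrame(
    {
        "SPY": 100 * np.cumprod(1 + spy_ret),
        "XOM": 80 * np.cumprod(1 + xom_ret),
    },
    index=dates,
)

# Rolling statistics over a 20-day window; the first 19 values are NaN.
rolling_mean = prices["SPY"].rolling(window=20).mean()
rolling_std = prices["SPY"].rolling(window=20).std()

# Bollinger bands: rolling mean plus and minus two rolling standard deviations.
upper = rolling_mean + 2 * rolling_std
lower = rolling_mean - 2 * rolling_std

# Daily return: today's price divided by yesterday's, minus one.
daily = (prices / prices.shift(1) - 1).iloc[1:]

# Fit a line XOM = slope * SPY + intercept, and compute the correlation.
slope, intercept = np.polyfit(daily["SPY"], daily["XOM"], 1)
corr = np.corrcoef(daily["SPY"], daily["XOM"])[0, 1]
print("slope:", slope, "corrcoef:", corr)

# If matplotlib is installed, daily.plot(kind="scatter", x="SPY", y="XOM")
# would draw the scatter plot.
```

For a least-squares line, the slope equals the correlation multiplied by the ratio of the two standard deviations, which is one way to see that slope and correlation measure different things.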
Module 5: Incomplete data
- [for this lesson: need to create 4 assets: SPY no missing, X ends midway, Y begins midway, Z has periodic outages]
- Read SPY
  - 20-day rolling average
  - Plot
- Attempt above with X, what happens? (incomplete data)
- Look at data; NaN!
- What to do?
- Discussion & drawing of the types of incomplete data characterized by 4 examples above.
- What is the proper way to handle?
- [quiz: implement and plot fill forward on X]
- Show Pandas methods for forward fill and backward fill, plot rolling averages
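The four kinds of asset described for Module 5 can be simulated as below. This is a sketch on made-up numbers: X, Y and Z are built by blanking out parts of a simple ramp, and filling forward first and then backward is shown as one common convention, not necessarily the answer the lesson settles on.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2012-01-02", periods=10, freq="B")
base = np.arange(10, dtype=float) + 100.0  # made-up prices 100.0 .. 109.0

df = pd.DataFrame({"SPY": base, "X": base, "Y": base, "Z": base}, index=dates)
df.loc[dates[6:], "X"] = np.nan          # X ends midway
df.loc[dates[:4], "Y"] = np.nan          # Y begins midway
df.loc[dates[[3, 4, 7]], "Z"] = np.nan   # Z has periodic outages

# A rolling average over data with gaps: any window touching a NaN is NaN.
rolling_x = df["X"].rolling(window=3).mean()
print(rolling_x)

# Forward fill carries the last known price into later gaps;
# backward fill then covers gaps at the start, where nothing precedes them.
filled = df.ffill().bfill()
print(filled)
```

Filling forward before backward avoids copying a future price into the past wherever an earlier price is available.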
Module 6: Date and time
- Creating Date and Time
- Date Mathematics
- Time Series Plotting
- Pandas date time features
- Slice dataframes at different frequencies (daily, weekly, monthly)
- Merge data together sampled at different frequencies
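A brief sketch of the Module 6 topics follows. All values are made up, the "RATE" column is a hypothetical monthly series, and the choice of resampling, grouping by period, and reindexing with forward fill is only one of several ways to do this in Pandas.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Creating dates and date mathematics with the standard library.
start = dt.datetime(2012, 1, 3)
later = start + dt.timedelta(days=30)
gap = later - start  # a timedelta

# A made-up daily series on business days.
dates = pd.date_range("2012-01-02", periods=60, freq="B")
daily = pd.DataFrame({"SPY": np.linspace(100.0, 110.0, len(dates))}, index=dates)

# Partial-string slicing selects a whole month.
february = daily.loc["2012-02"]

# Weekly view: last value in each calendar week.
weekly = daily.resample("W").last()

# Monthly view: group by calendar month and take the last value.
monthly = daily.groupby(daily.index.to_period("M")).last()

# Merging different frequencies: a hypothetical monthly series is aligned
# to the daily dates, carrying each monthly value forward until the next one.
monthly_rate = pd.Series(
    [0.1, 0.2, 0.3],
    index=pd.date_range("2012-01-01", periods=3, freq="MS"),
    name="RATE",
)
merged = daily.join(monthly_rate.reindex(daily.index, method="ffill"))
print(merged.head())
```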