Manipulating Financial Data in Python
Revision as of 09:17, 12 March 2015
Contents
- 1 Lesson 1: Reading, slicing and plotting stock data
- 2 Lesson 2: Working with many stocks at once
- 3 Lesson 3: The power of Numpy
- 4 Lesson 4: Statistical analysis of time series
- 5 Lesson 5: Incomplete data
- 6 Lesson 6: Computing statistics on a portfolio
- 7 Lesson 7: Optimizers: Building a parameterized model
- 8 Lesson 8: Optimizers: How to optimize a portfolio
Lesson 1: Reading, slicing and plotting stock data
- Overview of the data we'll be working with (from Yahoo!)
- Introduction to our primary library: Pandas
- Reading CSV data into Pandas
- Filtering to specific dates
- Sorting
- Plotting
Reading: "Python for Finance", Chapter 6: Financial time series
script
- Overview of data we'll be working with: AAPL.csv, SPY.csv (note date order)
- Meaning of various columns
- The Pandas dataframe
- Read CSV into a dataframe (AAPL example)
- Slice according to dates
- [quiz: read SPY.csv and slice against different dates]
- Plot (note that the date order is wrong)
- Sort
- Plot
- Important note: Do not use "live" data for this course.
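The read, slice and sort steps above can be sketched as follows. This is a minimal illustration, not the course's actual script: the CSV text is synthetic (made-up prices laid out with the columns a Yahoo-style file might have, newest row first) and stands in for AAPL.csv so the example runs without any file. Sorting is done before slicing here because label slicing on dates is simplest when the index is in ascending order.

```python
import io

import pandas as pd

# Synthetic stand-in for AAPL.csv: made-up prices, newest row first.
CSV_TEXT = """Date,Open,High,Low,Close,Volume,Adj Close
2012-01-06,10.4,10.6,10.3,10.5,1200,10.5
2012-01-05,10.2,10.5,10.1,10.4,1100,10.4
2012-01-04,10.1,10.3,10.0,10.2,900,10.2
2012-01-03,10.0,10.2,9.9,10.1,1000,10.1
"""

# Read the CSV with the Date column parsed into a DatetimeIndex.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col="Date", parse_dates=True)

# Sort so the oldest date comes first.
df = df.sort_index()

# Slice by date range; both end dates are included.
jan_4_to_5 = df.loc["2012-01-04":"2012-01-05"]
print(jan_4_to_5["Adj Close"])

# Plotting (requires matplotlib), left commented out so the sketch runs headless:
# df["Adj Close"].plot()
```

With a real file, `io.StringIO(CSV_TEXT)` would be replaced by the path to the CSV.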
Lesson 2: Working with many stocks at once
- Our target dataframe structure
- Reading data for multiple stocks into the structure
- Plotting
- Normalizing
- Memoizing
script
- What we want to end up with: Rows: Dates, Columns: Symbols
- Step by step how to build it
- SPY.csv will be our reference -- it trades every day the market is open.
- Read SPY.csv, slice to date range, sort
- Read AAPL.csv, merge() into existing dataframe
- Repeat with GLD, IBM, GOOG
- Plot and display legend
- Observe: the price scales differ too much to compare, so let's normalize
- Print some of the numbers
- Plot after normalization
- [quiz: normalize at a different date]
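One way to build the target structure (rows are dates, columns are symbols) is sketched below. The prices are synthetic and built in memory rather than read from CSV files, and the variable names are illustrative. SPY acts as the reference: a left merge on the index keeps exactly SPY's dates, so a symbol with a missing day shows up as NaN instead of shrinking the frame.

```python
import pandas as pd

dates = pd.date_range("2012-01-03", periods=4, freq="B")

# Synthetic adjusted closes. AAPL lacks the third day on purpose.
spy = pd.DataFrame({"SPY": [100.0, 101.0, 102.0, 104.0]}, index=dates)
aapl = pd.DataFrame({"AAPL": [50.0, 51.0, 55.0]}, index=dates[[0, 1, 3]])
gld = pd.DataFrame({"GLD": [150.0, 149.0, 151.0, 153.0]}, index=dates)

# Start from the reference symbol and merge the others in, one column each.
prices = spy
for other in (aapl, gld):
    prices = prices.merge(other, how="left", left_index=True, right_index=True)

# Normalize so every symbol starts at 1.0 on the first date.
normalized = prices / prices.iloc[0]
print(normalized)
```

To normalize at a different date, divide by `prices.loc[some_date]` instead of `prices.iloc[0]`.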
Lesson 3: The power of Numpy
- What Numpy is and how it relates to Pandas
- Why is Numpy powerful/important?
- Creating Numpy arrays
- Indexing and slicing Numpy arrays
- Important data processing on Numpy arrays
- Example use with pandas too
Reading: "Python for Finance", Chapter 4: Data types and structures
script
- Numpy relationship to Pandas
- Creating Arrays
- empty, zeros, ones
- Basic Indexing and Slicing (start at 0 not 1)
- [quiz: print 2nd & 3rd columns]
- Index one array by another
- Reshaping
- Data Processing using Arrays
- Sum rows, Sum columns
- Statistics on columns: Mean, Median, Stddev
- See: http://wiki.quantsoftware.org/index.php?title=Numpy_Tutorial_1
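The array operations listed above can be tried on a small example. This is a sketch with arbitrary values; the array `m` is chosen only so that the results are easy to check by hand.

```python
import numpy as np

# Creating arrays.
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))
empty = np.empty((2, 3))  # contents are uninitialized, do not rely on them

# Reshaping: 0..11 laid out as 3 rows by 4 columns.
m = np.arange(12).reshape(3, 4)

# Indexing starts at 0, so the 2nd and 3rd columns are columns 1 and 2.
second_and_third = m[:, 1:3]

# Index one array by another: pick rows 2 and 0, in that order.
row_picker = np.array([2, 0])
picked_rows = m[row_picker]

# Sums along each axis.
row_sums = m.sum(axis=1)  # one value per row
col_sums = m.sum(axis=0)  # one value per column

# Statistics on columns.
col_mean = m.mean(axis=0)
col_median = np.median(m, axis=0)
col_std = m.std(axis=0)  # population standard deviation (ddof=0)

print(second_and_third)
print(row_sums, col_sums)
```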
Lesson 4: Statistical analysis of time series
- Rolling statistics on dataframes
- Mean
- Stdev
- Max
- Gross statistics on dataframes
- Sum
- Mean
- Stdev
- Distribution (histogram)
script
- Rolling statistics example
- Read SPY
- 20-day rolling average
- 20-day rolling average +/- 2 * 20-day rolling stddev
- Plot above as Bollinger bands
- Discussion of daily returns, what they are, how to calculate
- [quiz: compute and plot SPY and XOM daily returns]
- Show time series plot of daily returns
- Show histogram of daily returns
- Scatter plot (plot SPY vs XOM)
- Compare SPY vs GLD. How can we quantify these differences?
- Fit a line and plot it, print slope and corrcoef for SPY & XOM
- Discussion: correlation is not the same as slope
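The rolling statistics, daily returns, and line fit from this lesson can be sketched as below. The prices are synthetic random walks, not real SPY or XOM data, and the code assumes a pandas version that provides the `.rolling()` method. Plot calls are omitted so the sketch runs without a display.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2012-01-02", periods=60, freq="B")

# Synthetic daily returns; the XOM series is built to move partly with SPY.
spy_rets = rng.normal(0.0005, 0.01, size=60)
xom_rets = 0.8 * spy_rets + rng.normal(0.0, 0.005, size=60)
prices = pd.DataFrame(
    {"SPY": 100 * np.cumprod(1 + spy_rets), "XOM": 80 * np.cumprod(1 + xom_rets)},
    index=dates,
)

# Bollinger bands: 20-day rolling mean plus and minus 2 rolling stddevs.
rolling_mean = prices["SPY"].rolling(window=20).mean()
rolling_std = prices["SPY"].rolling(window=20).std()
upper_band = rolling_mean + 2 * rolling_std
lower_band = rolling_mean - 2 * rolling_std

# Daily returns: today's price over yesterday's, minus 1.
# The first row has no previous day, so it is dropped.
daily_rets = prices.pct_change().iloc[1:]

# Fit a line XOM = slope * SPY + intercept, and compute the correlation.
x = daily_rets["SPY"].to_numpy()
y = daily_rets["XOM"].to_numpy()
slope, intercept = np.polyfit(x, y, 1)
corr = np.corrcoef(x, y)[0, 1]
print("slope:", slope, "correlation:", corr)
```

The slope says how much XOM moves per unit move in SPY, while the correlation says how tightly the points hug the fitted line. For a least-squares line the two are linked by slope = correlation * stddev(y) / stddev(x), so they coincide only when both series have the same standard deviation.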
Lesson 5: Incomplete data
- How incomplete data arises in financial data
- Different approaches to dealing with it
script
- [for this lesson: need to create 4 assets: SPY no missing, X ends midway, Y begins midway, Z has periodic outages]
- Read SPY
- 20-day rolling average
- Plot
- Attempt above with X, what happens? (incomplete data)
- Look at data; NaN!
- What to do?
- Discussion & drawing of the types of incomplete data characterized by 4 examples above.
- What is the proper way to handle it?
- [quiz: implement and plot fill forward on X]
- Show Pandas methods for forward fill and backward fill, plot rolling averages
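A small sketch of forward and backward fill on the four kinds of assets described above. The values are synthetic, and the window is shortened to 3 days to suit the tiny sample. Forward fill is applied first so that gaps are filled with the last known price, and backward fill is used only for the stretch before an asset's first price, where no earlier value exists.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2012-01-02", periods=6, freq="B")
nan = np.nan

# Synthetic prices: SPY complete, X ends midway, Y begins midway, Z has outages.
prices = pd.DataFrame(
    {
        "SPY": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
        "X": [10.0, 11.0, 12.0, nan, nan, nan],
        "Y": [nan, nan, nan, 20.0, 21.0, 22.0],
        "Z": [30.0, nan, 32.0, nan, 34.0, 35.0],
    },
    index=dates,
)

# A rolling mean over the raw data is NaN wherever its window contains a NaN.
raw_rolling = prices.rolling(window=3).mean()

# Fill forward first, then backward for any leading gaps.
filled = prices.ffill().bfill()
filled_rolling = filled.rolling(window=3).mean()

print(filled)
```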
Lesson 6: Computing statistics on a portfolio
- Average daily return
- Volatility: stddev of daily return (don't count first day)
- Cumulative return
- Relationship between cumulative and daily
script
- Statistics we'll look at:
- Average daily return
- Stddev of daily return: volatility
- Total return
- Sharpe ratio
- Buy and hold: SPY, GLD, GOOG, XOM, $100K each
- Show how to read assets, normalize so each starts at $100K, goes forward.
- [quiz: compute total daily portfolio value and daily returns for portfolio]
- Show how to compute average daily return, stddev, total return, and Sharpe ratio
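The buy-and-hold statistics can be sketched as follows. The prices are synthetic random walks, not real data for these symbols. The Sharpe ratio below assumes a zero risk-free rate and 252 trading days per year; both are assumptions of this sketch, not something the outline specifies.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2012-01-02", periods=252, freq="B")
symbols = ["SPY", "GLD", "GOOG", "XOM"]

# Synthetic prices built from random daily returns.
rets = rng.normal(0.0005, 0.01, size=(252, 4))
prices = pd.DataFrame(100 * np.cumprod(1 + rets, axis=0), index=dates, columns=symbols)

# Normalize so each position starts at $100K, then let it ride.
start_value = 100_000.0
position_values = prices / prices.iloc[0] * start_value

# Total portfolio value per day is the sum across positions.
port_value = position_values.sum(axis=1)

# Daily returns of the portfolio; the first day has no return, so drop it.
daily_rets = port_value.pct_change().iloc[1:]

avg_daily_ret = daily_rets.mean()
volatility = daily_rets.std()
total_ret = port_value.iloc[-1] / port_value.iloc[0] - 1

# Annualized Sharpe ratio, assuming a zero risk-free rate.
sharpe = np.sqrt(252) * avg_daily_ret / volatility

print(avg_daily_ret, volatility, total_ret, sharpe)
```

The relationship between cumulative and daily returns is multiplicative: compounding the daily returns, `(1 + daily_rets).prod() - 1`, reproduces the total return.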
Lesson 7: Optimizers: Building a parameterized model
- Problem statement for an optimizer (inputs, outputs, assumptions)
- How to build a parameterized model from real data using an optimizer
script
- What does an optimizer do?
- Show syntax of optimizer use
- Create fake data
- Try to fit it using an optimizer
- [quiz: add another type of curve to fit (e.g., sine)]
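As one possible illustration of the steps above, the sketch below creates fake data from a known line plus noise and asks an optimizer to recover the line's parameters. The outline does not say which optimizer is used; `scipy.optimize.minimize` with the SLSQP method is an assumption here, and the function names `line` and `error` are made up for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Fake data: a known line (slope 4, intercept 2) plus Gaussian noise.
x = np.linspace(0.0, 10.0, 50)
true_params = np.array([4.0, 2.0])
y = true_params[0] * x + true_params[1] + rng.normal(0.0, 0.5, size=x.size)


def line(params, x):
    """Parameterized model: params[0] is the slope, params[1] the intercept."""
    return params[0] * x + params[1]


def error(params, x, y):
    """Sum of squared differences between the data and the model."""
    return np.sum((y - line(params, x)) ** 2)


# The optimizer varies params, starting from a guess, to minimize error().
initial_guess = np.array([0.0, 0.0])
result = minimize(error, initial_guess, args=(x, y), method="SLSQP")
fitted = result.x

print("fitted slope and intercept:", fitted)
```

Fitting a different curve, such as a sine, only requires swapping the model function and giving the optimizer one initial value per parameter.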
Lesson 8: Optimizers: How to optimize a portfolio
- Framing the portfolio problem for an optimizer
- Constraints for an optimizer
- Optimizing a portfolio
script
- Frame the portfolio optimization problem for the optimizer
- Add target return
- Plug the parts together in code for 4 assets
- [quiz: Add constraints on holdings]
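One common way to frame the problem for an optimizer is sketched below: choose weights for 4 assets that minimize portfolio volatility, subject to the weights summing to 1, each holding staying between 0 and 1, and the expected return meeting a target. This framing, the use of `scipy.optimize.minimize` with SLSQP, and the synthetic returns are all assumptions of the sketch; the course may frame the objective differently. Returns are expressed in percent so the numbers are reasonably scaled for the optimizer.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Synthetic daily returns in percent for 4 assets (500 days, one column each).
asset_means = [0.04, 0.06, 0.08, 0.10]
asset_vols = [0.8, 1.0, 1.2, 1.5]
daily_rets = rng.normal(asset_means, asset_vols, size=(500, 4))

mean_rets = daily_rets.mean(axis=0)
cov = np.cov(daily_rets, rowvar=False)

# Target return chosen so that equal weights meet it exactly (always feasible).
target = mean_rets.mean()


def volatility(weights):
    """Standard deviation of the portfolio's daily return."""
    return np.sqrt(weights @ cov @ weights)


constraints = [
    # Weights must sum to 1 (fully invested).
    {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
    # Expected portfolio return must be at least the target.
    {"type": "ineq", "fun": lambda w: w @ mean_rets - target},
]

# Constraints on holdings: each weight between 0 and 1 (no shorting, no leverage).
bounds = [(0.0, 1.0)] * 4

initial_weights = np.full(4, 0.25)
result = minimize(
    volatility, initial_weights, method="SLSQP", bounds=bounds, constraints=constraints
)
weights = result.x

print("weights:", weights.round(3))
print("volatility:", volatility(weights), "expected return:", weights @ mean_rets)
```

Tighter limits on holdings, for example at most 40% in any one asset, can be expressed by changing `bounds` to `[(0.0, 0.4)] * 4`.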