Manipulating Financial Data in Python

From Quantitative Analysis Software Courses

Revision as of 23:18, 3 March 2015

Lesson 1: Reading, slicing and plotting stock data

  • Overview of the data we'll be working with (from Yahoo!)
  • Introduction to our primary library: Pandas
  • Reading CSV data into Pandas
  • Filtering to specific dates
  • Sorting
  • Plotting

script

  • Overview of data we'll be working with: AAPL.csv, SPY.csv (note date order)
    • Meaning of various columns
  • The Pandas dataframe
  • Read CSV into a dataframe (AAPL example)
  • Slice according to dates
  • [quiz: read SPY.csv and slice against different dates]
  • Plot (note that the date order is wrong)
  • Sort
  • Plot
  • Important note: Do not use "live" data for this course.
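
A minimal pandas sketch of the read, sort and slice steps, assuming the Yahoo!-style column layout (Date, Open, High, Low, Close, Volume, Adj Close) with the newest rows first. An inline string with made-up prices stands in for AAPL.csv so the example runs on its own; in the lesson you would pass the file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Placeholder rows standing in for AAPL.csv (values are illustrative, not real data).
# The rows are listed newest first, as assumed for the Yahoo! files.
CSV_TEXT = """Date,Open,High,Low,Close,Volume,Adj Close
2012-01-06,10.0,11.0,9.5,10.5,1000,10.5
2012-01-05,9.8,10.2,9.6,10.0,1200,10.0
2012-01-04,9.5,9.9,9.4,9.8,1100,9.8
2012-01-03,9.0,9.6,8.9,9.5,900,9.5
"""

# With a real file: pd.read_csv("AAPL.csv", index_col="Date", parse_dates=True)
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col="Date", parse_dates=True)

# Sort so dates run oldest to newest before slicing or plotting.
df = df.sort_index()

# Label-based date slice; both endpoints are included.
window = df.loc["2012-01-04":"2012-01-05"]
print(window[["Close", "Adj Close"]])

# df["Adj Close"].plot() would draw the series (requires matplotlib).
```

Sorting first matters because label-based date slicing assumes an ascending index.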

Lesson 2: Working with many stocks at once

  • Our target dataframe structure
  • Reading data for multiple stocks into the structure
  • Plotting
  • Normalizing

script

  • What we want to end up with: Rows: Dates, Columns: Symbols
  • Step by step how to build it
  • SPY.csv will be our reference -- it trades every day the market is open.
  • Read SPY.csv, slice to date range, sort
  • Read AAPL.csv, merge() into existing dataframe
  • Repeat with GLD, IBM, GOOG
  • Plot and display legend
  • Observe: the price scales differ widely, so let's normalize
  • Print some of the numbers
  • Plot after normalization
  • [quiz: normalize at a different date]
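
One way to build the "Rows: Dates, Columns: Symbols" frame and normalize it, sketched with synthetic prices. The outline names `merge()`; this sketch uses `DataFrame.join`, which merges on the index. The helper `fake_prices` is hypothetical and stands in for reading each `<SYMBOL>.csv`.

```python
import numpy as np
import pandas as pd


def fake_prices(days, start, step):
    """Hypothetical stand-in for reading <SYMBOL>.csv: a simple linear price path."""
    return pd.Series(start + step * np.arange(len(days)), index=days, dtype=float)


calendar = pd.date_range("2012-01-02", "2012-01-10", freq="D")  # includes weekends
trading_days = pd.bdate_range("2012-01-02", "2012-01-10")       # weekdays only

prices = {
    "SPY": fake_prices(trading_days, 125.0, 0.5),
    "AAPL": fake_prices(trading_days, 400.0, 2.0),
    "GLD": fake_prices(trading_days, 155.0, -0.3),
}

# Start from an empty frame indexed by calendar dates.
df = pd.DataFrame(index=calendar)

# SPY is the reference: an inner join keeps only the dates SPY traded.
df = df.join(prices["SPY"].rename("SPY"), how="inner")

# Left-join the other symbols onto SPY's trading days.
for symbol in ["AAPL", "GLD"]:
    df = df.join(prices[symbol].rename(symbol), how="left")

# Normalize so every column starts at 1.0 on the first date.
normed = df / df.iloc[0]
print(normed.head())
```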

Lesson 3: The power of Numpy

  • What is Numpy and how it relates to Pandas
  • Why is Numpy powerful/important?
  • Creating Numpy arrays
  • Indexing and slicing Numpy arrays
  • Important data processing on Numpy arrays

script

  • Numpy relationship to Pandas
  • Creating Arrays
    • empty, zeros, ones
  • Basic indexing and slicing (indices start at 0, not 1)
  • [quiz: print 2nd & 3rd columns]
  • Index one array by another
  • Reshaping
  • Data Processing using Arrays
    • Sum rows, Sum columns
    • Statistics on columns: Mean, Median, stddev
  • See: http://wiki.quantsoftware.org/index.php?title=Numpy_Tutorial_1
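
A compact NumPy sketch of the operations in this script: creating, indexing, reshaping and aggregating arrays. The array values are arbitrary. Note that `ndarray.std` defaults to the population standard deviation (`ddof=0`), whereas pandas' `std` defaults to the sample version (`ddof=1`).

```python
import numpy as np

# Creating arrays
e = np.empty((2, 3))                 # uninitialized: contents are arbitrary
z = np.zeros((2, 3))                 # all 0.0
o = np.ones((2, 3), dtype=np.int64)  # all 1, integer type

# A 3x4 array of illustrative values
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# Indexing starts at 0, so arr[0, 0] is the top-left element
first = arr[0, 0]
second_row = arr[1, :]
top_left = arr[0:2, 0:2]             # rows 0-1, columns 0-1 (slice end is exclusive)

# Index one array by another: pick rows 0 and 2
idx = np.array([0, 2])
picked_rows = arr[idx]

# Reshape the same 12 values into 2 rows of 6
reshaped = arr.reshape(2, 6)

# Data processing along axes
row_sums = arr.sum(axis=1)           # one sum per row
col_sums = arr.sum(axis=0)           # one sum per column
col_means = arr.mean(axis=0)
col_medians = np.median(arr, axis=0)
col_stds = arr.std(axis=0)           # population stddev (ddof=0) by default
```

Integer-array indexing such as `arr[idx]` returns a copy, while basic slices such as `arr[0:2, 0:2]` return views of the original data.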

Lesson 4: Statistical analysis of time series

  • Rolling statistics on dataframes
    • Mean
    • Stdev
    • Max
  • Gross statistics on dataframes
    • Sum
    • Mean
    • Stdev
    • Distribution (histogram)
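
A short sketch of rolling and gross statistics with pandas and NumPy, using a synthetic random-walk series in place of real SPY prices. The 20 day window matches this lesson's script; the dates, seed and bin count are arbitrary illustrative choices.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a price column (not real market data).
dates = pd.bdate_range("2012-01-02", periods=60)
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates, name="SPY")

# Rolling statistics over a 20 day window; the first 19 values are NaN
# because the window is not yet full.
rolling_mean = prices.rolling(window=20).mean()
rolling_std = prices.rolling(window=20).std()
rolling_max = prices.rolling(window=20).max()

# Gross (whole-series) statistics
total = prices.sum()
mean = prices.mean()
std = prices.std()  # sample stddev (ddof=1) by default in pandas

# Distribution: counts per bin, as a histogram would show them
counts, bin_edges = np.histogram(prices, bins=10)
# prices.hist(bins=10) would plot the same distribution (requires matplotlib).
```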

script

  • Rolling statistics example
    • Read SPY
    • 20 day rolling average
    • Upper and lower bands: 20 day rolling average ± 2 × 20 day rolling stdev
    • Plot above as Bollinger bands
  • Discussion of daily returns, what they are, how to calculate
  • [quiz: compute and plot SPY and XOM daily returns]
  • Show time series plot of daily rets
  • Show histogram of daily rets
  • Scatter plot (plot SPY vs XOM)
  • Compare SPY vs GLD. How can we quantify these differences?
  • Fit a line and plot it, print slope and corrcoef for SPY & XOM
  • Discussion: correlation is not the same as slope
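
A sketch of the script's main computations: Bollinger bands, daily returns, and a line fit with its slope and correlation. SPY and XOM are simulated here (XOM returns are built as 1.2 × SPY returns plus noise), so the numbers are illustrative only. `np.polyfit` with degree 1 fits the line and `np.corrcoef` gives the correlation coefficient.

```python
import numpy as np
import pandas as pd

# Synthetic, correlated stand-ins for SPY and XOM prices (illustrative only).
rng = np.random.default_rng(42)
dates = pd.bdate_range("2012-01-02", periods=250)
spy_rets = rng.normal(0.0005, 0.01, len(dates))
xom_rets = 1.2 * spy_rets + rng.normal(0, 0.005, len(dates))
prices = pd.DataFrame(
    {"SPY": 100 * np.cumprod(1 + spy_rets), "XOM": 50 * np.cumprod(1 + xom_rets)},
    index=dates,
)

# Bollinger bands: 20 day rolling mean +/- 2 rolling standard deviations.
rm = prices["SPY"].rolling(window=20).mean()
rstd = prices["SPY"].rolling(window=20).std()
upper_band = rm + 2 * rstd
lower_band = rm - 2 * rstd

# Daily return: r[t] = price[t] / price[t-1] - 1. The first day has no
# prior price, so its NaN row is dropped.
daily_rets = (prices / prices.shift(1) - 1).dropna()

# Fit XOM returns against SPY returns with a degree-1 polynomial (a line).
slope, intercept = np.polyfit(daily_rets["SPY"], daily_rets["XOM"], 1)

# Correlation measures how tightly the points cluster around a line;
# it is a different quantity from the slope.
corr = np.corrcoef(daily_rets["SPY"], daily_rets["XOM"])[0, 1]
print(f"slope={slope:.3f} intercept={intercept:.5f} corr={corr:.3f}")
```

pandas' `DataFrame.pct_change()` computes the same daily returns as the explicit shift-and-divide form.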

Lesson 5: Incomplete data

  • How incomplete data arises in financial data
  • Different approaches to dealing with it

script

  • [for this lesson: need to create 4 assets: SPY (no missing data), X (ends midway), Y (begins midway), Z (periodic outages)]
  • Read SPY
    • 20 day rolling average
    • Plot
  • Attempt the above with X: what happens? (incomplete data)
  • Look at data; NaN!
  • What to do?
  • Discussion & drawing of the types of incomplete data characterized by the 4 examples above.
  • What is the proper way to handle?
  • [quiz: implement and plot fill forward on X]
  • Show Pandas methods for forward fill and backward fill, plot rolling averages
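
A sketch of forward and backward filling with pandas, using a tiny hypothetical frame shaped like the lesson's X, Y and Z assets. Filling forward first and then backward is assumed here as the handling order: the forward fill never uses a future price, and the backward fill only covers dates before a series begins.

```python
import numpy as np
import pandas as pd

dates = pd.bdate_range("2012-01-02", periods=8)
# Hypothetical assets mirroring the lesson's setup (values are made up):
#   X ends midway, Y begins midway, Z has a periodic gap.
df = pd.DataFrame(
    {
        "X": [10.0, 10.5, 11.0, 11.2, np.nan, np.nan, np.nan, np.nan],
        "Y": [np.nan, np.nan, np.nan, 20.0, 20.4, 20.1, 20.8, 21.0],
        "Z": [30.0, 30.5, np.nan, np.nan, 31.0, 31.5, np.nan, 32.0],
    },
    index=dates,
)

# Forward fill: carry the last known price forward into each gap.
filled = df.ffill()
# Backward fill: cover the dates before a series begins (Y's first rows).
filled = filled.bfill()

# Rolling averages now have values wherever the window is full.
rolling = filled.rolling(window=3).mean()
```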

Lesson 6: Computing statistics on a portfolio

  • Average daily return
  • Volatility: stddev of daily return (don't count first day)
  • Cumulative return
  • Relationship between cumulative and daily
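
A small worked example of how daily and cumulative returns relate, using made-up closing values. Cumulative return compounds the daily returns rather than summing them, so the endpoint formula and the compounded product give the same number.

```python
import pandas as pd

# Illustrative daily closing values (not real data).
values = pd.Series([100.0, 102.0, 101.0, 104.0, 103.0])

# Daily returns; the first day has no prior value, so drop it
# (the "don't count first day" rule).
daily = (values / values.shift(1) - 1).iloc[1:]

avg_daily = daily.mean()
volatility = daily.std()

# Cumulative return from the endpoints...
cum_from_prices = values.iloc[-1] / values.iloc[0] - 1
# ...equals the compounded daily returns.
cum_from_daily = (1 + daily).prod() - 1
```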

script

  • Statistics we'll look at:
    • Average daily return
    • Stddev of daily return: volatility
    • Total return
    • Sharpe ratio
  • Buy and hold: SPY, GLD, GOOG, XOM, $100K each
  • Show how to read the assets and normalize so each starts at $100K, then track values going forward.
  • [quiz: compute total daily portfolio value and daily rets for portfolio]
  • Show how to compute avg daily rets, stdev, total ret, Sharpe ratio
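
A sketch of the buy-and-hold statistics for four assets, with simulated prices in place of the real SPY, GLD, GOOG and XOM files. The Sharpe ratio here assumes a risk-free rate of 0 and annualizes daily figures by sqrt(252), a common convention for roughly 252 trading days per year; neither value comes from this outline.

```python
import numpy as np
import pandas as pd

# Synthetic prices for four assets (illustrative only, not real data).
rng = np.random.default_rng(7)
dates = pd.bdate_range("2012-01-02", periods=120)
symbols = ["SPY", "GLD", "GOOG", "XOM"]
rets = rng.normal(0.0004, 0.01, size=(len(dates), len(symbols)))
prices = pd.DataFrame(100 * np.cumprod(1 + rets, axis=0), index=dates, columns=symbols)

# Buy and hold: $100K in each asset, so each position starts at $100K.
start_value = 100_000
positions = prices / prices.iloc[0] * start_value
portfolio = positions.sum(axis=1)

# Daily portfolio returns, excluding the first day.
daily_rets = (portfolio / portfolio.shift(1) - 1).iloc[1:]

avg_daily_ret = daily_rets.mean()
std_daily_ret = daily_rets.std()
total_ret = portfolio.iloc[-1] / portfolio.iloc[0] - 1

# Assumptions: risk-free rate of 0 and 252 trading days per year.
risk_free_daily = 0.0
sharpe = np.sqrt(252) * (daily_rets - risk_free_daily).mean() / std_daily_ret
```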

Lesson 7: Optimizers: Building a parameterized model

  • Problem statement for an optimizer (inputs, outputs, assumptions)
  • How to build a parameterized model from real data using an optimizer

script

  • What does an optimizer do?
  • Show syntax of optimizer use
  • Create fake data
  • Try to fit it using an optimizer
  • [quiz: add another type of curve to fit (e.g., sine)]
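
A minimal sketch of fitting a parameterized model with an optimizer, assuming SciPy's `scipy.optimize.minimize` as the optimizer. The fake data is the line y = 2x + 5 plus noise, and the error function (mean squared error) is one reasonable choice among several.

```python
import numpy as np
from scipy.optimize import minimize

# Create fake data: a known line plus noise.
rng = np.random.default_rng(1)
true_m, true_b = 2.0, 5.0
x = np.linspace(0, 10, 50)
y = true_m * x + true_b + rng.normal(0, 0.5, x.size)


def error(params, x, y):
    """Mean squared error between the data and the line m*x + b."""
    m, b = params
    return np.mean((y - (m * x + b)) ** 2)


# Start from a deliberately poor guess and let the optimizer adjust m and b.
initial_guess = np.array([0.0, 0.0])
result = minimize(error, initial_guess, args=(x, y))
m_fit, b_fit = result.x
print(f"fitted m={m_fit:.3f}, b={b_fit:.3f}")
```

Fitting a different curve only requires a new model inside the error function and a matching number of parameters in the initial guess.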

Lesson 8: Optimizers: How to optimize a portfolio

  • Framing the portfolio problem for an optimizer
  • Constraints for an optimizer
  • Optimizing a portfolio

script

  • Frame the portfolio optimization problem for the optimizer
  • Add target return
  • Plug the parts together in code for 4 assets
  • [quiz: Add constraints on holdings]
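
A sketch of framing portfolio optimization for `scipy.optimize.minimize` with the SLSQP method: minimize portfolio volatility, require the weights to sum to 1, and require expected return to meet a target. Returns are simulated and annualized with an assumed 252 trading days, and the target (the average of the assets' expected returns) is a hypothetical choice. No bounds on individual holdings are set, so weights may go negative (short positions); the quiz above asks for those constraints.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize

# Synthetic daily returns for 4 assets (illustrative only, not real data).
rng = np.random.default_rng(3)
symbols = ["SPY", "GLD", "GOOG", "XOM"]
mean_true = np.array([0.0004, 0.0002, 0.0006, 0.0003])
vol_true = np.array([0.010, 0.008, 0.015, 0.011])
daily_rets = pd.DataFrame(rng.normal(mean_true, vol_true, size=(500, 4)), columns=symbols)

# Annualize (assumed 252 trading days) so the optimizer works on well-scaled numbers.
mu = daily_rets.mean().to_numpy() * 252   # expected annual return per asset
cov = daily_rets.cov().to_numpy() * 252   # annualized covariance matrix

# Hypothetical target: the average of the assets' expected returns.
target_return = mu.mean()


def portfolio_volatility(weights):
    """Annualized standard deviation of portfolio return for given weights."""
    return np.sqrt(weights @ cov @ weights)


constraints = [
    {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},           # fully invested
    {"type": "ineq", "fun": lambda w: w @ mu - target_return},  # return >= target
]

# Equal weights satisfy both constraints, so they make a feasible start.
initial_weights = np.full(len(symbols), 1.0 / len(symbols))
result = minimize(portfolio_volatility, initial_weights, method="SLSQP",
                  constraints=constraints)
weights = pd.Series(result.x, index=symbols)
print(weights.round(3))
```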