Manipulating Financial Data in Python

From Quantitative Analysis Software Courses

Latest revision as of 19:10, 24 August 2016

Lesson 1: Reading, slicing and plotting stock data

  • Overview of the data we'll be working with (from Yahoo!)
  • Introduction to our primary library: Pandas
  • Reading CSV data into Pandas
  • Filtering to specific dates
  • Plotting

Reading: "Python for Finance", Chapter 6: Financial time series
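
As a rough illustration of the reading and date-filtering steps in this lesson, the sketch below parses a small CSV into a Pandas dataframe and slices it by date. The rows are made-up sample values in a Yahoo!-style column layout (an assumption, not course data), and the text is read from memory so the example runs without any files.

```python
import io
import pandas as pd

# Made-up sample rows, newest first (illustrative values only, not real quotes).
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2012-01-06,10.4,10.6,10.3,10.5,1200,10.5
2012-01-05,10.2,10.5,10.1,10.4,1100,10.4
2012-01-04,10.1,10.3,10.0,10.2,900,10.2
2012-01-03,10.0,10.2,9.9,10.1,1000,10.1
"""

# Parse the Date column and use it as the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Sort oldest-to-newest so date slicing and plotting run left to right in time.
df = df.sort_index()

# Label-based slicing on a date index includes both endpoints.
window = df.loc["2012-01-04":"2012-01-05", ["Adj Close"]]
print(window)

# window.plot() would draw this slice when matplotlib is installed.
```

With a real file, the same read_csv call would take a path such as "AAPL.csv" in place of the in-memory text.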

Lesson 2: Working with many stocks at once

  • Our target data frame structure
  • Addressing the reverse date-order issue
  • Reading data for multiple stocks into the structure
  • Date slicing
  • Symbol slicing
  • Plotting
  • Normalizing
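
One possible way to build the target structure (rows are dates, columns are symbols) is sketched below. The prices are invented for illustration, the choice of SPY as the reference symbol is an assumption, and joining each symbol onto a fixed date index is only one of several ways to line the series up.

```python
import io
import pandas as pd

# Made-up adjusted closes, newest first (illustrative values only).
raw = {
    "SPY": "Date,Adj Close\n2012-01-05,102.0\n2012-01-04,101.0\n2012-01-03,100.0\n",
    "XOM": "Date,Adj Close\n2012-01-05,52.0\n2012-01-03,50.0\n",
}

# Start from an empty dataframe indexed by the dates of interest (oldest first).
dates = pd.date_range("2012-01-03", "2012-01-05")
prices = pd.DataFrame(index=dates)

for symbol, text in raw.items():
    col = pd.read_csv(io.StringIO(text), index_col="Date", parse_dates=True,
                      usecols=["Date", "Adj Close"])
    col = col.rename(columns={"Adj Close": symbol})
    # A left join aligns on dates, so the reversed file order no longer matters.
    prices = prices.join(col)

# Keep only the days on which the reference symbol traded.
prices = prices.dropna(subset=["SPY"])

# Date slicing and symbol slicing in one step.
subset = prices.loc["2012-01-03":"2012-01-04", ["SPY"]]

# Normalize: divide each column by its first row so every series starts at 1.0.
normalized = prices / prices.iloc[0]
print(normalized)
```

Normalizing this way puts symbols with very different price levels on a common scale before plotting.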

Lesson 3: The power of Numpy

  • What is Numpy and how it relates to Pandas
  • Why is Numpy powerful/important?
  • Creating Numpy arrays
  • Indexing and slicing Numpy arrays
  • Important data processing on Numpy arrays
  • Example of using Numpy together with Pandas

Reading: "Python for Finance", Chapter 4: Data types and structures
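
The array operations listed for this lesson can be tried on a small array, as in the sketch below. The array contents and variable names are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

# Creating arrays.
empty = np.empty((2, 3))   # uninitialized memory; contents are arbitrary
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))

# A 3x4 array holding the values 0..11, filled row by row.
a = np.arange(12).reshape(3, 4)

# Indexing starts at 0, so columns 1 and 2 are the 2nd and 3rd columns.
middle = a[:, 1:3]

# Index one array by another: pick columns 3 and 0 from the first row.
picked = a[0, np.array([3, 0])]

# Data processing along an axis.
col_sums = a.sum(axis=0)    # one value per column
row_sums = a.sum(axis=1)    # one value per row
col_means = a.mean(axis=0)

# A Pandas dataframe can hand its values back as a Numpy array.
df = pd.DataFrame(a, columns=list("ABCD"))
values = df.to_numpy()
print(col_sums, row_sums, col_means)
```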

Lesson 4: Statistical analysis of time series

  • Gross statistics on dataframes
  • Rolling statistics on dataframes
  • Plotting a technical indicator (Bollinger Bands)

Reading: "Python for Finance", Chapter 6: Financial time series
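
A minimal sketch of gross statistics, rolling statistics, and Bollinger Bands follows. It assumes a 20-day window and bands two rolling standard deviations from the rolling mean, which are common conventions rather than settings stated in this outline, and it uses a synthetic random-walk price series instead of real quotes.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices on business days (not real market data).
rng = np.random.default_rng(0)
dates = pd.date_range("2012-01-02", periods=60, freq="B")
prices = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(),
                   index=dates, name="price")

# Gross statistics describe the whole series with a single number each.
print(prices.mean(), prices.median(), prices.std())

# Rolling statistics produce one value per day from the trailing window.
window = 20
rolling_mean = prices.rolling(window).mean()
rolling_std = prices.rolling(window).std()

# Bollinger Bands: rolling mean plus and minus two rolling standard deviations.
upper = rolling_mean + 2 * rolling_std
lower = rolling_mean - 2 * rolling_std

bands = pd.DataFrame({"price": prices, "mean": rolling_mean,
                      "upper": upper, "lower": lower})
print(bands.dropna().head())

# bands.plot() would draw the price together with its bands.
```

The first 19 rows of the rolling columns are NaN because a full 20-day window is not yet available.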

Lesson 5: Incomplete data

  • How incomplete data arises in financial data
  • Different approaches to dealing with it
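
One commonly used approach to gaps, shown here as a sketch rather than as the course's prescribed method, is to fill forward first (carry the last known price across an outage) and then fill backward (cover the period before a series begins). The column names and prices are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2012-01-02", periods=6, freq="B")
prices = pd.DataFrame(
    {
        # Hypothetical asset that starts trading midway through the range.
        "Y": [np.nan, np.nan, 20.0, 21.0, 22.0, 23.0],
        # Hypothetical asset with periodic outages.
        "Z": [30.0, np.nan, 31.0, np.nan, np.nan, 32.0],
    },
    index=dates,
)

# Forward fill carries the last known value into each gap; backward fill then
# covers any leading gap that has no earlier value to carry forward.
filled = prices.ffill().bfill()
print(filled)
```

Filling forward before filling backward avoids copying future prices into the past wherever an earlier value exists.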

Lesson 6: Histograms and scatter plots

  • Histogram of daily returns
  • Compare SPY with XOM
  • Scatter plots
  • Correlation is not slope!
  • Compare the SPY vs XOM scatter plot with the SPY vs GLD scatter plot

Reading: "Python for Finance", Chapter 5: Data Visualization
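
To make the "correlation is not slope" point concrete, the sketch below fits a line to two synthetic return series and prints slope and correlation side by side. The series are generated from random numbers and stand in for real symbols; the names market, steady, and noisy are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 250

# Made-up daily returns (synthetic, not market data).
market = rng.normal(0.0005, 0.01, n)
steady = 0.5 * market                           # low slope, perfectly correlated
noisy = 1.5 * market + rng.normal(0, 0.02, n)   # high slope, weaker correlation

rets = pd.DataFrame({"market": market, "steady": steady, "noisy": noisy})

# Histogram of daily returns as bin counts; rets["market"].hist(bins=20) plots it.
counts, edges = np.histogram(rets["market"], bins=20)

results = {}
for name in ["steady", "noisy"]:
    # Degree-1 polyfit returns the slope and intercept of the fitted line.
    slope, intercept = np.polyfit(rets["market"], rets[name], 1)
    # Correlation measures how tightly the points hug that line.
    corr = np.corrcoef(rets["market"], rets[name])[0, 1]
    results[name] = (slope, corr)
    print(name, "slope:", round(slope, 2), "correlation:", round(corr, 2))

# rets.plot(kind="scatter", x="market", y="noisy") would draw the scatter plot.
```

The series with the steeper fitted line is the one with the lower correlation, so the two numbers answer different questions.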

Lesson 7: Sharpe ratio & other portfolio statistics

  • Speed up reading data by memoizing
  • Average daily return
  • Volatility: standard deviation of daily returns (exclude the first day)
  • Cumulative return
  • Relationship between cumulative and daily
  • Sharpe Ratio
  • How to model a buy and hold portfolio
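
The portfolio statistics in this lesson can be sketched for a small buy and hold portfolio as follows. The symbols AAA and BBB, their prices, the equal allocations, the zero risk-free rate, and the factor of 252 trading days used to annualize the Sharpe ratio are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical symbols and made-up prices.
dates = pd.date_range("2012-01-02", periods=4, freq="B")
prices = pd.DataFrame({"AAA": [10.0, 11.0, 12.1, 12.1],
                       "BBB": [50.0, 50.0, 45.0, 49.5]}, index=dates)

start_value = 200_000.0
allocations = np.array([0.5, 0.5])   # fraction of starting cash per symbol

# Buy and hold: normalize, scale by allocation and cash, then sum across symbols.
normalized = prices / prices.iloc[0]
position_values = normalized * allocations * start_value
portfolio_value = position_values.sum(axis=1)

# Daily returns; the first day has no previous day, so it is dropped.
daily_returns = portfolio_value.pct_change().iloc[1:]

avg_daily_return = daily_returns.mean()
volatility = daily_returns.std()
cumulative_return = portfolio_value.iloc[-1] / portfolio_value.iloc[0] - 1

# Compounding the daily returns reproduces the cumulative return.
compounded = (1 + daily_returns).prod() - 1

# Annualized Sharpe ratio, assuming a zero risk-free rate and daily sampling.
sharpe = np.sqrt(252) * avg_daily_return / volatility
print(cumulative_return, avg_daily_return, volatility, sharpe)
```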

Lesson 8: Optimizers: Building a parameterized model

  • What does an optimizer do?
  • Syntax of optimizer use
  • Problem statement for an optimizer (inputs, outputs, assumptions)
  • How to find X that minimizes f(X) with a minimizer
  • How to build a parameterized polynomial model from real data using an optimizer
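
A minimal sketch of both uses of a minimizer follows, assuming SciPy's scipy.optimize.minimize as the optimizer (this outline does not name a library). The function f, the quadratic coefficients, and the noise level are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Part 1: find the X that minimizes f(X).
def f(x):
    # x arrives as a 1-element array; the minimum is at x = 1.5.
    return (x[0] - 1.5) ** 2 + 0.5

result = minimize(f, x0=[2.0])   # x0 is the initial guess
print("minimum near x =", result.x[0])

# Part 2: fit a parameterized polynomial model to noisy data.
rng = np.random.default_rng(2)
true_coeffs = np.array([1.5, -10.0, -5.0])   # highest power first
xs = np.linspace(-5, 5, 21)
ys = np.polyval(true_coeffs, xs) + rng.normal(0, 1.0, xs.size)

def squared_error(coeffs):
    # Sum of squared differences between the data and the model's predictions.
    return np.sum((ys - np.polyval(coeffs, xs)) ** 2)

fit = minimize(squared_error, x0=np.zeros(3))
print("fitted coefficients:", fit.x)
```

The optimizer only needs an error function and a starting guess, so swapping np.polyval for another model changes the kind of curve being fitted.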

Lesson 9: Optimizers: How to optimize a portfolio

  • What does it mean to "optimize" a portfolio
  • Framing the problem for an optimizer
  • Constraints on X for an optimizer
  • Ranges on X for an optimizer

Reading: "Python for Finance", Chapter 11: Statistics-Portfolio Optimization
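
The framing above can be sketched as follows: X is the vector of allocations, the constraint forces the allocations to sum to 1, and the ranges keep each allocation between 0 and 1. Maximizing the Sharpe ratio (by minimizing its negative) is one possible objective and is an assumption here, as are the use of SciPy's SLSQP method and the synthetic return data.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic daily returns for four hypothetical assets (not market data).
rng = np.random.default_rng(3)
n_days, n_assets = 500, 4
means = np.array([0.0004, 0.0006, 0.0002, 0.0008])
vols = np.array([0.01, 0.015, 0.005, 0.02])
daily_returns = rng.normal(means, vols, size=(n_days, n_assets))

def negative_sharpe(weights):
    # Portfolio daily return is the weighted sum of the asset returns.
    port = daily_returns @ weights
    # Minimizing the negative Sharpe ratio maximizes the Sharpe ratio.
    return -np.sqrt(252) * port.mean() / port.std(ddof=1)

equal = np.full(n_assets, 1.0 / n_assets)   # initial guess: equal allocations

result = minimize(
    negative_sharpe,
    x0=equal,
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n_assets,                              # ranges on X
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},  # sum(X) == 1
)
weights = result.x
print("allocations:", np.round(weights, 3))
print("Sharpe ratio:", -negative_sharpe(weights))
```

A different objective, such as minimum volatility, would reuse the same constraint and ranges with another function in place of negative_sharpe.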