Difference between revisions of "Manipulating Financial Data in Python"

From Quantitative Analysis Software Courses
Line 1: Line 1:
==Module 1: Reading, Slicing and Plotting Stock Data==
+
==Module 1: Reading, slicing and plotting stock data==
 
* Overview of data we'll be working with: AAPL.csv, SPY.csv (note date order)
 
* Overview of data we'll be working with: AAPL.csv, SPY.csv (note date order)
 
** Meaning of various columns
 
** Meaning of various columns
Line 10: Line 10:
 
* Plot
 
* Plot
  
==Module 2: Building and plotting a dataframe with lots of stocks==
+
==Module 2: Building a dataframe with lots of stocks==
* Overview of what we want to end up with: Rows: Dates, Columns: Symbols
+
* What we want to end up with: Rows: Dates, Columns: Symbols
 
* Step by step how to build it
 
* Step by step how to build it
 
* SPY.csv will be our reference -- it trades every day the market is open.
 
* SPY.csv will be our reference -- it trades every day the market is open.
Line 23: Line 23:
 
* [quiz: normalize at a different date]
 
* [quiz: normalize at a different date]
  
==Module 3: Numpy Fundamentals==
+
==Module 3: Numpy fundamentals==
 +
*Numpy relationship to Pandas
 
*Creating Arrays
 
*Creating Arrays
 
**empty, zeros, ones
 
**empty, zeros, ones
Line 35: Line 36:
 
*See: http://wiki.quantsoftware.org/index.php?title=Numpy_Tutorial_1
 
*See: http://wiki.quantsoftware.org/index.php?title=Numpy_Tutorial_1
  
==Module 2: Pandas DS- Series==
+
==Module 4: Statistical analysis of time series data==
*Working with index
+
*Rolling statistics example
*Operations
+
**Read SPY
*Filtering
+
**20 day rolling average
*Handling Incomplete Data
+
**+- 20 day rolling stdev * 2
 +
**Plot above as Bollinger bands
 +
*Discussion of daily returns, what they are, how to calculate
 +
*[quiz: compute and plot SPY and XOM daily returns]
 +
*Scatter plot (plot SPY vs XOM)
 +
*Compare SPY vs GLD.  How can we quantify these differences?
 +
*Fit a line and plot it, print slope and corrcoef for SPY & XOM
 +
*Discussion of correlation not the same as slope
  
==Module 3: Pandas DS- Data Frame==
+
==Module 5: Incomplete data==
*Creating Data frame
+
*[for this lesson: need to create 4 assets: SPY no missing, X ends midway, Y begins midway, Z has periodic outages]
*Operations
+
*Read SPY
*Columns and rows
+
**20 day rolling average
*Essential Function
+
**Plot
*Reindexing
+
*Attempt above with X, what happens? (incomplete data)
*Indexing and Filtering
+
*Look at data; NaN!
 +
*What to do?
 +
*Discussion & drawing of the types of incomplete data characterized by 4 examples above.
 +
*What is the proper way to handle?
 +
*[quiz: implement and plot fill forward on X]
 +
*Show Pandas methods for forward fill and backward fill, plot rolling averages
  
==Module 4: Data Analysis- Reading/Writing Data==
+
==Module 6: Date and time==
*Importing Data using Pandas
 
*Importing data without pandas
 
*Saving and exporting data using pandas
 
*Saving and exporting data without pandas
 
 
 
==Module 5==
 
*Pre-processing Data
 
*Statistical Functions for Analysis
 
 
 
==Module 6: Date And Time==
 
 
*Creating Date and Time
 
*Creating Date and Time
 
*Date Mathematics
 
*Date Mathematics
 
*Time Series Plotting
 
*Time Series Plotting
 +
*Pandas date time features
 +
*Slice dataframes at different frequencies (daily, weekly, monthly)
 +
*Merge data together sampled at different frequencies
  
 
==Module 7: Graphs Part I==
 
==Module 7: Graphs Part I==
  
 
==Module 8: Graphs Part II==
 
==Module 8: Graphs Part II==

Revision as of 12:57, 3 March 2015

Module 1: Reading, slicing and plotting stock data

  • Overview of data we'll be working with: AAPL.csv, SPY.csv (note date order)
    • Meaning of various columns
  • The Pandas dataframe
  • Read CSV into a dataframe (AAPL example)
  • Slice according to dates
  • [quiz: read SPY.csv and slice against different dates]
  • Plot (note date order wrong)
  • Sort
  • Plot
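
The read, sort and slice steps above can be sketched as follows. This is a minimal illustration, not the course's own code: the inline CSV text stands in for AAPL.csv, and its column names (Yahoo-style `Date`, `Adj Close`, and so on) and prices are assumptions made up for the example.

```python
import io

import pandas as pd

# Hypothetical stand-in for AAPL.csv. Rows are newest-first, which is the
# "date order" issue the outline points out. Columns and values are assumed.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2012-01-06,10.0,10.5,9.9,10.4,1000,10.4
2012-01-05,9.8,10.1,9.7,10.0,1200,10.0
2012-01-04,9.6,9.9,9.5,9.8,900,9.8
2012-01-03,9.5,9.7,9.4,9.6,1100,9.6
"""

# Parse the Date column and use it as the index so rows can be sliced by date.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Sort into chronological order; plots and date slices then behave predictably.
df = df.sort_index()

# Slice by date labels. With a sorted DatetimeIndex both endpoints are included.
window = df.loc["2012-01-04":"2012-01-05"]
print(window["Adj Close"])
```

With matplotlib installed, `df["Adj Close"].plot()` would draw the sorted series. Sorting before slicing is a choice made here for clarity; the outline plots the unsorted data first to show the problem.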

Module 2: Building a dataframe with lots of stocks

  • What we want to end up with: Rows: Dates, Columns: Symbols
  • Step by step how to build it
  • SPY.csv will be our reference -- it trades every day the market is open.
  • Read SPY.csv, slice to date range, sort
  • Read AAPL.csv, merge() into existing dataframe
  • Repeat with GLD, IBM, GOOG
  • Plot and display legend
  • Observe: Scale not good, let's normalize
  • Print some of the numbers
  • Plot after normalization
  • [quiz: normalize at a different date]
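
One way to build the dates-by-symbols dataframe is sketched below. The prices are invented in-memory values rather than the real CSV files, and AAPL is deliberately missing one day to show why SPY is used as the reference index. The `how="left"` merge on the index is one reasonable reading of the `merge()` step, not necessarily the exact call used in the lesson.

```python
import pandas as pd

# Assumed trading days and made-up prices, for illustration only.
dates = pd.date_range("2012-01-03", periods=4, freq="B")
spy = pd.DataFrame({"SPY": [100.0, 101.0, 102.0, 103.0]}, index=dates)
aapl = pd.DataFrame({"AAPL": [50.0, 55.0, 60.0]}, index=dates[[0, 1, 3]])
gld = pd.DataFrame({"GLD": [150.0, 153.0, 156.0, 159.0]}, index=dates)

# Start from SPY (the reference) and merge each symbol onto its dates.
# A left merge keeps every SPY date and leaves NaN where a symbol has no row.
prices = spy.copy()
for other in (aapl, gld):
    prices = prices.merge(other, left_index=True, right_index=True, how="left")

# Normalize so every column starts at 1.0 and the scales become comparable.
normalized = prices / prices.iloc[0]
print(normalized)
```

To normalize at a different date, as the quiz asks, divide by the row for that date instead, for example `prices / prices.loc["2012-01-04"]`.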

Module 3: Numpy fundamentals

  • Numpy relationship to Pandas
  • Creating Arrays
    • empty, zeros, ones
  • Basic Indexing and Slicing (indices start at 0, not 1)
  • [quiz: print 2nd & 3rd columns]
  • Index one array by another
  • Reshaping
  • Data Processing using Arrays
    • Sum rows, Sum columns
    • Statistics on columns: Mean, Median, stddev
  • See: http://wiki.quantsoftware.org/index.php?title=Numpy_Tutorial_1
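
The array operations listed above might look like this in practice. The array shape and contents are arbitrary choices for the example.

```python
import numpy as np

# Array creation: zeros and ones are initialized, empty is not
# (its contents are whatever happened to be in memory).
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))
scratch = np.empty((2, 3))

# A small 3x4 array to index into: rows are [0..3], [4..7], [8..11].
a = np.arange(12).reshape(3, 4)

# Indices start at 0, so the 2nd and 3rd columns are columns 1 and 2.
cols = a[:, 1:3]

# Index one array by another: pick rows 2 and 0, in that order.
picked = a[np.array([2, 0])]

# Sums along each axis: axis=1 sums across a row, axis=0 down a column.
row_sums = a.sum(axis=1)
col_sums = a.sum(axis=0)

# Per-column statistics.
col_mean = a.mean(axis=0)
col_median = np.median(a, axis=0)
col_std = a.std(axis=0)
```

On the relationship to Pandas: a dataframe's numeric contents can be obtained as a NumPy array with `DataFrame.to_numpy()`, so the same operations apply to price data.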

Module 4: Statistical analysis of time series data

  • Rolling statistics example
    • Read SPY
    • 20 day rolling average
    • Rolling average ± 2 × 20 day rolling stdev
    • Plot above as Bollinger bands
  • Discussion of daily returns, what they are, how to calculate
  • [quiz: compute and plot SPY and XOM daily returns]
  • Scatter plot (plot SPY vs XOM)
  • Compare SPY vs GLD. How can we quantify these differences?
  • Fit a line and plot it, print slope and corrcoef for SPY & XOM
  • Discussion of correlation not the same as slope
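
A compact sketch of the statistics in this module follows. The SPY and XOM prices are synthetic random walks generated for the example (the seed, drift and volatility are arbitrary assumptions), so the printed numbers say nothing about the real symbols.

```python
import numpy as np
import pandas as pd

# Synthetic prices: XOM's returns are built to move partly with SPY's.
rng = np.random.default_rng(0)
dates = pd.date_range("2012-01-02", periods=120, freq="B")
spy_ret = rng.normal(0.0005, 0.01, len(dates))
xom_ret = 0.8 * spy_ret + rng.normal(0.0, 0.005, len(dates))
prices = pd.DataFrame(
    {"SPY": 100 * np.cumprod(1 + spy_ret), "XOM": 80 * np.cumprod(1 + xom_ret)},
    index=dates,
)

# Rolling statistics and Bollinger bands: mean +/- 2 standard deviations.
window = 20
rolling_mean = prices["SPY"].rolling(window).mean()
rolling_std = prices["SPY"].rolling(window).std()
upper = rolling_mean + 2 * rolling_std
lower = rolling_mean - 2 * rolling_std

# Daily return: today's price divided by yesterday's, minus 1.
# The first row has no "yesterday", so it is dropped.
daily = (prices / prices.shift(1) - 1).iloc[1:]

# Fit XOM returns against SPY returns with a degree-1 polynomial (a line).
x = daily["SPY"].to_numpy()
y = daily["XOM"].to_numpy()
slope, intercept = np.polyfit(x, y, 1)

# Correlation measures how tightly the points hug the line, not its steepness.
corr = np.corrcoef(x, y)[0, 1]
print(f"slope={slope:.3f} corr={corr:.3f}")
```

The slope and the correlation are linked by `slope = corr * std(y) / std(x)`, which is why two pairs of assets can share a slope yet differ widely in correlation. With matplotlib installed, `daily.plot(kind="scatter", x="SPY", y="XOM")` would draw the scatter plot.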

Module 5: Incomplete data

  • [for this lesson: need to create 4 assets: SPY no missing, X ends midway, Y begins midway, Z has periodic outages]
  • Read SPY
    • 20 day rolling average
    • Plot
  • Attempt above with X, what happens? (incomplete data)
  • Look at data; NaN!
  • What to do?
  • Discussion & drawing of the types of incomplete data characterized by 4 examples above.
  • What is the proper way to handle?
  • [quiz: implement and plot fill forward on X]
  • Show Pandas methods for forward fill and backward fill, plot rolling averages
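
The incomplete-data cases above can be reproduced with a few made-up series. X, Y and Z below are hypothetical assets with invented prices that mimic the three gap patterns. Filling forward first and backward second is one common convention (it avoids using future prices except before an asset's first observation), and the outline does not say which order the lesson settles on.

```python
import numpy as np
import pandas as pd

nan = np.nan
dates = pd.date_range("2012-01-02", periods=8, freq="B")
prices = pd.DataFrame(
    {
        "X": [10.0, 11.0, 12.0, 13.0, nan, nan, nan, nan],    # ends midway
        "Y": [nan, nan, nan, 20.0, 21.0, 22.0, 23.0, 24.0],   # begins midway
        "Z": [30.0, nan, 32.0, nan, nan, 35.0, 36.0, nan],    # periodic outages
    },
    index=dates,
)

# A rolling mean is NaN for any window that contains a missing value.
raw_roll = prices["X"].rolling(3).mean()


def fill_forward(series):
    """Hand-written forward fill: carry the last seen value into each gap."""
    out = series.copy()
    last = np.nan
    for i in range(len(out)):
        if pd.isna(out.iloc[i]):
            out.iloc[i] = last
        else:
            last = out.iloc[i]
    return out


# The built-in equivalents: forward fill, then backward fill leading gaps.
filled = prices.ffill().bfill()
filled_roll = filled["X"].rolling(3).mean()
print(filled)
```

After filling, only the first two rolling-mean values remain NaN, because a 3-day window is not yet full there.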

Module 6: Date and time

  • Creating Date and Time
  • Date Mathematics
  • Time Series Plotting
  • Pandas date time features
  • Slice dataframes at different frequencies (daily, weekly, monthly)
  • Merge data together sampled at different frequencies
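
The date handling topics above could be illustrated along these lines. The dates, prices and the weekly `signal` column are all invented for the example, and joining then forward filling is just one plausible way to combine data sampled at different frequencies.

```python
import pandas as pd

# Creating dates and doing date arithmetic.
start = pd.Timestamp("2012-01-02")          # a Monday
later = start + pd.Timedelta(days=10)
gap = later - start

# Two weeks of made-up daily prices on business days.
dates = pd.date_range(start, periods=10, freq="B")
daily = pd.DataFrame({"SPY": [float(v) for v in range(100, 110)]}, index=dates)

# Down-sample to weekly data: keep the last price of each week ending Friday.
weekly = daily.resample("W-FRI").last()

# A hypothetical weekly series, stamped on Mondays.
weekly_signal = pd.DataFrame(
    {"signal": [1.0, 2.0]},
    index=pd.to_datetime(["2012-01-02", "2012-01-09"]),
)

# Merge weekly data onto the daily index, then carry each weekly value
# forward until the next one arrives.
merged = daily.join(weekly_signal, how="left")
merged["signal"] = merged["signal"].ffill()
print(merged)
```

Monthly sampling follows the same pattern with a month-based frequency alias in place of `"W-FRI"`.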

Module 7: Graphs Part I

Module 8: Graphs Part II