Difference between revisions of "Marketsim"

From Quantitative Analysis Software Courses
Jump to navigation Jump to search
 
(13 intermediate revisions by 4 users not shown)
Line 1: Line 1:
==Draft==
+
==Updates & FAQs==
  
This project is in draft mode.  When it is finalized, this notice will be removed.
+
Updates:
  
==Updates & FAQs==
+
* 2018-03-08: Second FAQ answer revised.
 +
* 2018-10-16: Finalized
  
 
FAQs:
 
FAQs:
Line 9: Line 10:
 
* Q: How can I be sure that my function is returning a DataFrame? Q: Check the type of the returned item foo with <tt>type(foo)</tt>
 
* Q: How can I be sure that my function is returning a DataFrame? Q: Check the type of the returned item foo with <tt>type(foo)</tt>
  
* Q: Do we have to reject the order if the stock is not trading on that day?  A: Yes, but you should fill the data forward and back first, which should eliminate this possibility.
+
* Q: Do we have to reject the order if the stock is not trading on that day?  A: You should fill orders for any day the market is open, and reject orders for any day the market is closed. Suggested operation: Drop days when the market is not open (check by looking at SPY), fill forward, then fill back.
 
   
 
   
 
* Q: Is it true that the orders might not be in order in the csv? A: Yes.
 
* Q: Is it true that the orders might not be in order in the csv? A: Yes.
Line 31: Line 32:
  
 
Instructions:
 
Instructions:
* Download the appropriate zip file [[Media:marketsim.zip]]
+
* Download the appropriate zip file [[File:18fall_marketsim.zip]]
 
* Implement the <tt>compute_portvals()</tt> function in the file <tt>marketsim/marketsim.py</tt>.
 
* Implement the <tt>compute_portvals()</tt> function in the file <tt>marketsim/marketsim.py</tt>.
 
* The grading script is <tt>marketsim/grade_marketsim.py</tt>. For more details see here: [[ML4T_Software_Setup#Running_the_grading_scripts]]
 
* The grading script is <tt>marketsim/grade_marketsim.py</tt>. For more details see here: [[ML4T_Software_Setup#Running_the_grading_scripts]]
Line 126: Line 127:
 
Example orders files are available in the orders subdirectory.
 
Example orders files are available in the orders subdirectory.
  
Here are some additional test cases: [[Media:Testcases_mc2p1.zip|testcases_mc2p1.zip]]
+
Here are some additional test cases:  
 +
* [[Media:Testcases_mc2p1.zip|testcases_mc2p1.zip]]
 +
* [[Media:additional_orders.zip|additional_orders.zip]]
  
 
==Short example to check your code==
 
==Short example to check your code==
Line 176: Line 179:
  
 
<pre>
 
<pre>
Data Range: 2011-01-10 to 2011-12-20
+
Data Range: 2011-01-05 00:00:00 to 2011-01-20 00:00:00
  
 
Sharpe Ratio of Fund: -1.00015025363
 
Sharpe Ratio of Fund: -1.00015025363
Sharpe Ratio of $SPX: 0.0183389807443
+
Sharpe Ratio of $SPX: 0.882168679776
  
 
Cumulative Return of Fund: -0.00447059537671
 
Cumulative Return of Fund: -0.00447059537671
Cumulative Return of $SPX: -0.0224059854302
+
Cumulative Return of $SPX: 0.00289841448894
  
 
Standard Deviation of Fund: 0.00678073274458
 
Standard Deviation of Fund: 0.00678073274458
Standard Deviation of $SPX: 0.0149716091522
+
Standard Deviation of $SPX: 0.00544933521991
  
 
Average Daily Return of Fund: -0.000427210193308
 
Average Daily Return of Fund: -0.000427210193308
Average Daily Return of $SPX: 1.7295909534e-05
+
Average Daily Return of $SPX: 0.000302827205547
  
 
Final Portfolio Value: 993036.375
 
Final Portfolio Value: 993036.375
Line 199: Line 202:
 
We provide an example, <tt>orders.csv</tt> that you can use to test your code, and compare with others. All of these runs assume a starting portfolio of 1000000 ($1M).
 
We provide an example, <tt>orders.csv</tt> that you can use to test your code, and compare with others. All of these runs assume a starting portfolio of 1000000 ($1M).
 
<pre>
 
<pre>
Data Range: 2011-01-10 to 2011-12-20
+
Data Range: 2011-01-10 00:00:00 to 2011-12-20 00:00:00
  
 
Sharpe Ratio of Fund: 0.997654521878
 
Sharpe Ratio of Fund: 0.997654521878
Line 221: Line 224:
  
 
<pre>
 
<pre>
Data Range: 2011-01-10 to 2011-12-20
+
Data Range: 2011-01-14 00:00:00 to 2011-12-14 00:00:00
  
 
Sharpe Ratio of Fund: 0.552604907987
 
Sharpe Ratio of Fund: 0.552604907987
Sharpe Ratio of $SPX: 0.0183389807443
+
Sharpe Ratio of $SPX: -0.177203019906
  
 
Cumulative Return of Fund: 0.0538411196951
 
Cumulative Return of Fund: 0.0538411196951
Cumulative Return of $SPX: -0.0224059854302
+
Cumulative Return of $SPX: -0.0629581516192
  
 
Standard Deviation of Fund: 0.00728172910323
 
Standard Deviation of Fund: 0.00728172910323
Standard Deviation of $SPX: 0.0149716091522
+
Standard Deviation of $SPX: 0.0150564855724
  
 
Average Daily Return of Fund: 0.000253483085898
 
Average Daily Return of Fund: 0.000253483085898
Average Daily Return of $SPX: 1.7295909534e-05
+
Average Daily Return of $SPX: -0.000168071648902
  
 
Final Portfolio Value: 1051088.0915
 
Final Portfolio Value: 1051088.0915
Line 251: Line 254:
 
Be sure to follow these instructions diligently!
 
Be sure to follow these instructions diligently!
  
Via T-Square, submit as attachment (no zip files; refer to schedule for deadline):
+
Via Canvas, submit as attachment (no zip files; refer to schedule for deadline):
  
 
* Your code as <tt>marketsim.py</tt> (only the function <tt>compute_portvals()</tt> will be tested)
 
* Your code as <tt>marketsim.py</tt> (only the function <tt>compute_portvals()</tt> will be tested)
Line 290: Line 293:
 
* Unladen African swallows.
 
* Unladen African swallows.
 
* You may use the NumPy, SciPy, matplotlib and Pandas libraries.  Be sure you are using the correct versions.
 
* You may use the NumPy, SciPy, matplotlib and Pandas libraries.  Be sure you are using the correct versions.
* You may reuse sections of code (up to 5 lines) that you collected from other students or the internet.
 
 
* Code provided by the instructor, or allowed by the instructor to be shared.
 
* Code provided by the instructor, or allowed by the instructor to be shared.
 
* To read in orders files pandas.read_csv() is allowed.
 
* To read in orders files pandas.read_csv() is allowed.
Line 298: Line 300:
 
* Reading in data by any means other than the functions in util.py or pandas.read_csv()
 
* Reading in data by any means other than the functions in util.py or pandas.read_csv()
 
* Any libraries or modules not listed in the "allowed" section above.
 
* Any libraries or modules not listed in the "allowed" section above.
* Any code you did not write yourself (except for the 5 line rule in the "allowed" section).
+
* Any code you did not write yourself.
 
* Code that takes longer than 10 seconds to run.
 
* Code that takes longer than 10 seconds to run.
 
* Any Classes (other than Random) that create their own instance variables for later use (e.g., learners like kdtree).
 
* Any Classes (other than Random) that create their own instance variables for later use (e.g., learners like kdtree).

Latest revision as of 15:49, 16 October 2018

Updates & FAQs

Updates:

  • 2018-03-08: Second FAQ answer revised.
  • 2018-10-16: Finalized

FAQs:

  • Q: How can I be sure that my function is returning a DataFrame? Q: Check the type of the returned item foo with type(foo)
  • Q: Do we have to reject the order if the stock is not trading on that day? A: You should fill orders for any day the market is open, and reject orders for any day the market is closed. Suggested operation: Drop days when the market is not open (check by looking at SPY), fill forward, then fill back.
  • Q: Is it true that the orders might not be in order in the csv? A: Yes.
  • Q: Are we supposed to sort the date column after? A: Yes.
  • Q: If we were to sort, we can assume that multiple orders on the same date won't have any side-effect if they are executed in random orders? A: Yes.
  • Q: Should we assume the prices stay the same the whole day? A: Assume execution prices are adjusted close.
  • Q: So buying any stocks won't change our portfolio's value (but selling will), correct? A: If you mark the value of the portfolio to the market as of close. Neither buying or selling will affect the value.
  • Q: Do we return portvals daily for the entire periods including weekends, holidays and non-trading days? A: Only consider trading days. This should happen for you automatically when using get_data() from util.py
  • Q: It seems like my avg_daily_returns and std are off compared to the wiki's because of this. If not, do we count trading days only for when list of symbols (or SPY) are traded A: Assume the market was open iff SPY was traded.

Overview

In this project you will create a market simulator that accepts trading orders and keeps track of a portfolio's value over time and then assesses the performance of that portfolio.

Template

Instructions:

Part 1: Basic simulator (90 points)

Your job is to implement your market simulator as a function, compute_portvals() that returns a dataframe with one column. You should implement it within the file marketsim.py. It should adhere to the following API:

def compute_portvals(orders_file = "./orders/orders.csv", start_val = 1000000, commission = 9.95, impact = 0.005):
    # TODO: Your code here
    return portvals

The start date and end date of the simulation are the first and last dates with orders in the orders_file. The arguments are as follows:

  • orders_file is the name of a file from which to read orders, and
  • start_val is the starting value of the portfolio (initial cash available)
  • commission is the fixed amount in dollars charged for each transaction (both entry and exit)
  • impact is the amount the price moves against the trader compared to the historical data at each transaction

Return the result (portvals) as a single-column pandas.DataFrame (column name does not matter), containing the value of the portfolio for each trading day in the first column from start_date to end_date, inclusive.

The files containing orders are CSV files with the following columns:

  • Date (yyyy-mm-dd)
  • Symbol (e.g. AAPL, GOOG)
  • Order (BUY or SELL)
  • Shares (no. of shares to trade)

For example:

Date,Symbol,Order,Shares
2008-12-3,AAPL,BUY,130
2008-12-8,AAPL,SELL,130
2008-12-5,IBM,BUY,50

Your simulator should calculate the total value of the portfolio for each day using adjusted closing prices. The value for each day is cash plus the current value of equities. The resulting data frame should contain values like this:

2008-12-3 1000000
2008-12-4 1000010
2008-12-5 1000250
...

How it should work

Your code should keep account of how many shares of each stock are in the portfolio on each day and how much cash is available on each day. Note that negative shares and negative cash are possible. Negative shares mean that the portfolio is in a short position for that stock. Negative cash means that you've borrowed money from the broker.

When a BUY order occurs, you should add the appropriate number of shares to the count for that stock and subtract the appropriate cost of the shares from the cash account. The cost should be determined using the adjusted close price for that stock on that day.

When a SELL order occurs, it works in reverse: You should subtract the number of shares from the count and add to the cash account.

Evaluation

We will evaluate your code by calling compute_portvals() with multiple test cases. No other function in your code will be called by us, so do not depend on "main" code being called. Do not depend on global variables.

For debugging purposes, you should write your own additional helper function to call compute_portvals() with your own test cases. We suggest that you report the following factors:

  • Plot the price history over the trading period.
  • Sharpe ratio (Always assume you have 252 trading days in an year. And risk free rate = 0) of the total portfolio
  • Cumulative return of the total portfolio
  • Standard deviation of daily returns of the total portfolio
  • Average daily return of the total portfolio
  • Ending value of the portfolio

Part 2: Transaction Costs (10 points)

Note: We strongly encourage you to get the basic simulator working before you implement the transaction cost and market impact components of this project. Ok, now on to transaction costs.

Transaction costs are an important consideration for investing strategy. Transaction costs include things like commissions, slippage, market impact and tax considerations. High transaction costs encourage less frequent trading, and accordingly a search for strategies that pay out over longer periods of time rather than just intraday or over several days. For this project we will consider two components of transaction cost: Commissions and market impact:

  • Commissions: For each trade that you execute, charge a commission according to the parameter sent. Treat that as a deduction from your cash balance.
  • Market impact: For each trade that you execute, assume that the stock price moves against you according to the impact parameter. So if you are buying, assume the price goes up before your purchase. Similarly, if selling, assume the price drops 50 bps before the sale. For simplicity treat the market impact penalty as a deduction from your cash balance.

Both of these penalties should be applied for EACH transaction, for instance, depending on the parameter settings, a complete entry and exit will cost 2 * $9.95, plus 0.5% of the entry price and 0.5% of the exit price.

Part 3: Implement author() function (deduction if not implemented)

You should implement a function called author() that returns your Georgia Tech user ID as a string. This is the ID you use to log into t-square. It is not your 9 digit student number. Here is an example of how you might implement author():

    def author():
        return 'tb34' # replace tb34 with your Georgia Tech username.

And here's an example of how it could be called from a testing program:

    import marketsim as ms
    print ms.author()

Check the template code for examples. We are adding those to the repo now, but it might not be there if you check right away. Implementing this method correctly does not provide any points, but there will be a penalty for not implementing it.

Orders files to run your code on

Example orders files are available in the orders subdirectory.

Here are some additional test cases:

Short example to check your code

Here is a very very short example that you can use to check your code. Starting conditions:

start_val = 1000000

For the orders file orders-short.csv, the orders are:

Date,Symbol,Order,Shares
2011-01-05,AAPL,BUY,1500
2011-01-20,AAPL,SELL,1500

The daily value of the portfolio (spaces added to help things line up):

2011-01-05     997495.775
2011-01-06     997090.775
2011-01-07    1000660.775
2011-01-10    1010125.775
2011-01-11    1008910.775
2011-01-12    1013065.775
2011-01-13    1014940.775
2011-01-14    1019125.775
2011-01-18    1007425.775
2011-01-19    1004725.775
2011-01-20     993036.375

For reference, here are the adjusted close values for AAPL on the relevant days:

              AAPL
2011-01-05  332.57
2011-01-06  332.30
2011-01-07  334.68
2011-01-10  340.99
2011-01-11  340.18
2011-01-12  342.95
2011-01-13  344.20
2011-01-14  346.99
2011-01-18  339.19
2011-01-19  337.39
2011-01-20  331.26

The full results:

Data Range: 2011-01-05 00:00:00 to 2011-01-20 00:00:00

Sharpe Ratio of Fund: -1.00015025363
Sharpe Ratio of $SPX: 0.882168679776

Cumulative Return of Fund: -0.00447059537671
Cumulative Return of $SPX: 0.00289841448894

Standard Deviation of Fund: 0.00678073274458
Standard Deviation of $SPX: 0.00544933521991

Average Daily Return of Fund: -0.000427210193308
Average Daily Return of $SPX: 0.000302827205547

Final Portfolio Value: 993036.375

More comprehensive examples

orders.csv

We provide an example, orders.csv that you can use to test your code, and compare with others. All of these runs assume a starting portfolio of 1000000 ($1M).

Data Range: 2011-01-10 00:00:00 to 2011-12-20 00:00:00

Sharpe Ratio of Fund: 0.997654521878
Sharpe Ratio of $SPX: 0.0183389807443

Cumulative Return of Fund: 0.108872698544
Cumulative Return of $SPX: -0.0224059854302

Standard Deviation of Fund: 0.00730509916835
Standard Deviation of $SPX: 0.0149716091522

Average Daily Return of Fund: 0.000459098655493
Average Daily Return of $SPX: 1.7295909534e-05

Final Portfolio Value: 1106025.8065

orders2.csv

The other sample file is orders2.csv that you can use to test your code, and compare with others.

Data Range: 2011-01-14 00:00:00 to 2011-12-14 00:00:00

Sharpe Ratio of Fund: 0.552604907987
Sharpe Ratio of $SPX: -0.177203019906

Cumulative Return of Fund: 0.0538411196951
Cumulative Return of $SPX: -0.0629581516192

Standard Deviation of Fund: 0.00728172910323
Standard Deviation of $SPX: 0.0150564855724

Average Daily Return of Fund: 0.000253483085898
Average Daily Return of $SPX: -0.000168071648902

Final Portfolio Value: 1051088.0915

Hints & resources

Here is a video outlining an approach to solving this problem [youtube video].

Hint, use code like this to read in the orders file:

orders_df = pandas.read_csv(orders_file, index_col='Date', parse_dates=True, na_values=['nan'])

In terms of execution prices, you should assume you get the adjusted close price for the day of the trade.

What to turn in

Be sure to follow these instructions diligently!

Via Canvas, submit as attachment (no zip files; refer to schedule for deadline):

  • Your code as marketsim.py (only the function compute_portvals() will be tested)

Unlimited resubmissions are allowed up to the deadline for the project.

Rubric

Out of a total of 100 points:

  • Basic simulator: 90 points: 10 test cases: We will test your code against 10 cases (9 points per case) without transaction costs. Points per case are allocated as follows:
    • 2.0: Correct number of days reported in the dataframe (should be the number of trading days between the start date and end date, inclusive).
    • 5.0: Correct portfolio value on the last day +-0.1%
    • 1.0: Correct Sharpe Ratio +-0.1%
    • 1.0: Correct average daily return +-0.1%
  • Transaction costs: 10 points: 5 test cases: We will test your code against 5 cases as follows:
    • 2.0: Two test cases that evaluate commissions only (impact = 0)
    • 2.0: Two test cases that evaluate impact only (commission = 0)
    • 1.0: One test case with non-zero commission and impact.

Test Cases

Here are the test cases we used while grading. These are updated each semester, and released after grading.

Required, Allowed & Prohibited

Required:

  • Your project must be coded in Python 2.7.x.
  • Your code must run on one of the university-provided computers (e.g. buffet02.cc.gatech.edu).
  • When utilizing any of the example orders files code must run in less than 10 seconds on one of the university-provided computers.
  • To read in stock data, use only the functions in util.py,

Allowed:

  • You can develop your code on your personal machine, but it must also run successfully on one of the university provided machines or virtual images.
  • Your code may use standard Python libraries.
  • Unladen African swallows.
  • You may use the NumPy, SciPy, matplotlib and Pandas libraries. Be sure you are using the correct versions.
  • Code provided by the instructor, or allowed by the instructor to be shared.
  • To read in orders files pandas.read_csv() is allowed.

Prohibited:

  • Global variables.
  • Reading in data by any means other than the functions in util.py or pandas.read_csv()
  • Any libraries or modules not listed in the "allowed" section above.
  • Any code you did not write yourself.
  • Code that takes longer than 10 seconds to run.
  • Any Classes (other than Random) that create their own instance variables for later use (e.g., learners like kdtree).

Legacy