2.0 Helper Columns

This notebook adds “helper columns” which may need to be used in any strategy.

Import Packages

In [1]:
import numpy as np
import cPickle as pickle
shortUniverse = False

Load Pickle Data

First, we’ll load the pickle we developed in the previous section. The new prices dataframe is given a new name to reflect edits made during this notebook.

In [2]:
if shortUniverse:
    with open('intermediaries/pricesRaw.p', 'rb') as handle:
        prices = pickle.load(handle)
else:
    with open('intermediaries/pricesRaw-full.p', 'rb') as handle:
        prices = pickle.load(handle)

Adjust for Splits and Dividends

Although Yahoo includes an “adjusted close” column, they do not adjust the open, high and low columns. This is accomplished in the following function, and the adjusted close column is removed.

In [3]:
def adjust_prices(df):
    '''Adjusts df prices for dividends and splits.'''
    df['AdjFactor'] = df.groupby('symbol')['Adj Close'].shift(1) / df.groupby('symbol')['Close'].shift(1)
    df['Open'] = df['Open'] * df['AdjFactor']
    df['High'] = df['High'] * df['AdjFactor']
    df['Low'] = df['Low'] * df['AdjFactor']
    df['Close'] = df['Close'] * df['AdjFactor']
    new_df = df[['sector', 'symbol', 'Open', 'High', 'Low', 'Close', 'Volume']]
    new_df.columns = ['sector', 'symbol', 'open', 'high', 'low', 'close', 'volume']
    return new_df
In [4]:
prices = adjust_prices(prices)

Generate Returns

The following returns are generated for each observation:

  1. ret_cc: Return from yesterday’s Close through today’s Close.
  2. ret_co: Return from yesterday’s Close through today’s Open.
  3. ret_oc: Return from today’s Open through today’s Close.

All returns are log returns.

In [5]:
def add_returns(df):
    df['lastClose'] = df.groupby('symbol')['close'].shift(1)
    df['ret_cc'] = np.log(df['close'] / df['lastClose'])
    df['ret_oc'] = np.log(df['close'] / df['open'])
    df['ret_co'] = np.log(df['open'] / df['lastClose'])
    df.drop('lastClose', 1, inplace=True)
    return df
In [6]:
prices = add_returns(prices)

Output Example

The first 20 observations are displayed below.

In [7]:
prices.head(20)
Out[7]:
sector symbol open high low close volume ret_cc ret_oc ret_co
Date
2012-01-03 XLP PG NaN NaN NaN NaN 11565900 NaN NaN -725.284337
2012-01-04 XLP PG 58.168537 58.386789 57.915364 58.316954 10595400 NaN 0.002548 -725.435742
2012-01-05 XLP PG 58.002670 58.186002 57.836797 58.072506 10085300 -4.200510e-03 0.001203 -6.722692
2012-01-06 XLP PG 58.063780 58.168537 57.557434 57.932828 8421200 -2.408137e-03 -0.002258 NaN
2012-01-09 XLP PG 57.836796 58.273300 57.810607 58.177269 7836100 4.210499e-03 0.005870 -5.137983
2012-01-10 XLP PG 58.360600 58.378064 57.679655 57.906639 10318900 -4.662662e-03 -0.007809 NaN
2012-01-11 XLP PG 57.705844 57.889175 57.155851 57.339182 14676800 -9.847850e-03 -0.006374 NaN
2012-01-12 XLP PG 57.618545 57.618545 57.234419 57.452671 29758100 1.977309e-03 -0.002883 NaN
2012-01-13 XLP PG 57.234419 57.470135 57.068552 57.452671 33539100 -5.687474e-09 0.003806 -5.571164
2012-01-17 XLP PG 57.583623 58.063780 57.574891 57.845528 35728800 6.814655e-03 0.004538 -5.395281
2012-01-18 XLP PG 57.531246 57.732034 57.461404 57.644734 10386800 -3.477244e-03 0.001971 -6.229368
2012-01-19 XLP PG 58.043521 58.298717 57.568331 58.149122 9346600 8.711868e-03 0.001818 -6.310196
2012-01-20 XLP PG 57.753125 58.298716 57.594729 58.281119 12190100 2.267407e-03 0.009101 -4.699402
2012-01-23 XLP PG 57.999527 58.025925 57.005145 57.198741 14828100 -1.874629e-02 -0.013903 NaN
2012-01-24 XLP PG 56.881947 56.978746 56.353960 56.758751 9742400 -7.722049e-03 -0.002168 NaN
2012-01-25 XLP PG 56.653150 57.330739 56.477156 57.181143 10913100 7.414341e-03 0.009277 -4.680260
2012-01-26 XLP PG 57.286737 57.533131 56.961149 57.022747 11932100 -2.773915e-03 -0.004619 NaN
2012-01-27 XLP PG 56.591552 57.172342 56.019566 56.582757 10979900 -7.745980e-03 -0.000155 NaN
2012-01-30 XLP PG 56.142763 56.142763 55.465178 55.623575 12492000 -1.709717e-02 -0.009291 NaN
2012-01-31 XLP PG 55.808369 55.922767 55.174786 55.473979 11358600 -2.693050e-03 -0.006010 NaN

Save Output

Output is written to disk using the pickle package.

In [8]:
if shortUniverse:
    with open('intermediaries/prices.p', 'wb') as handle:
        pickle.dump(prices, handle)
else:
    with open('intermediaries/prices-full.p', 'wb') as handle:
        pickle.dump(prices, handle)

Potential Extensions

This notebook will one day include windsorized and scaled returns for use in constructing meta-models. This step will help create fits which are robust to outliers.

This notebook will one day include capm-adjusted returns.