1.0 Download Data

In [1]:
import datetime as dt
import pandas as pd
from pandas_datareader import data
import cPickle as pickle

Load Universe File

I created a CSV listing all stocks included in SPDR’s sector funds as of January 24th, 2016. It is loaded as a pandas dataframe. In order to make calls to Yahoo, all periods are replaced with hyphens.

Set shortUniverse=True to limit to first four stocks and a shorter timeframe. This is useful for debugging purposes.

In [2]:
universe = pd.read_csv('inputs/universe.csv', index_col='symbol')
universe.index = map(lambda x: x.replace('.', '-'), universe.index)
In [3]:
shortUniverse = True

Pull Price History

Price history is pulled from Yahoo and concatenated into a single multi-index dataframe. Stocks were pulled from January 1st 2010 through December 31st, 2015.

The head of the dataframe is displayed below.

In [4]:
def pull_universe(uni, sdate, edate):
    '''Expects a universe csv. Returns a dataframe of prices.
    
    Might not work if an error is generated on the first stock in the index.
    Performance and stability could be improved through list comprehension
    and a single pd.concat operation.'''
    
    # Start by pulling one stock.
    df = data.DataReader(uni.index[0],data_source='yahoo', start=sdate, end=edate)
    df['symbol'] = uni.index[0]
    df['sector'] = uni.loc[uni.index[0],'sector']
    
    #Loop through remaining stocks. Append to prices file.
    for Tic in uni.index[1:]:
        try:
            df_new = data.DataReader(Tic,data_source='yahoo', start=sdate, end=edate)
            df_new['symbol'] = Tic
            df_new['sector'] = uni.loc[Tic,'sector']
            df = pd.concat([df, df_new])
        except IOError:
            pass

    return df
In [5]:
if shortUniverse:
    prices = pull_universe(uni=universe.iloc[:3, ], sdate=dt.datetime(2015, 1, 1), edate=dt.datetime(2015, 12, 31))
else:
    prices = pull_universe(uni=universe, sdate=dt.datetime(2000, 1, 1), edate=dt.datetime(2015, 12, 31))
In [6]:
prices.head()
Out[6]:
Open High Low Close Volume Adj Close symbol sector
Date
2015-01-02 90.839996 91.000000 89.919998 90.440002 7251400 86.817403 PG XLP
2015-01-05 90.230003 91.000000 89.849998 90.010002 8626100 86.404627 PG XLP
2015-01-06 90.309998 90.559998 89.260002 89.599998 7791200 86.011046 PG XLP
2015-01-07 89.940002 90.370003 89.559998 90.070000 5986600 86.462221 PG XLP
2015-01-08 90.480003 91.230003 90.129997 91.099998 6823300 87.450963 PG XLP

Save Output

Output is written to disk using the pickle package.

In [7]:
if shortUniverse:
    with open('intermediaries/pricesRaw.p', 'wb') as handle:
        pickle.dump(prices, handle)
else:
    with open('intermediaries/pricesRaw-full.p', 'wb') as handle:
        pickle.dump(prices, handle)

Notes on Working with Price Dataframe

Please note that the shift method will shift column values from one symbol to the next. Proceed all shift methods with the groupby method. For example:

prices['last_volume'] = prices.groupby(level='symbol')['Volume'].shift(1)

As you can see, last_volume is NaN for the first observation of all stocks.

In [13]:
prices['last_volume'] = prices.groupby('symbol')['Volume'].shift(1)
prices.loc[prices['symbol']==universe.index[7], :].head()
Out[13]:
Open High Low Close Volume Adj Close symbol sector lastVol last_volume
Date
2012-01-03 33.450001 33.700001 33.040001 33.060001 7248100 30.228668 WBA XLP NaN NaN
2012-01-04 33.009998 33.080002 32.410000 32.830002 8165300 30.018366 WBA XLP 7248100 7248100
2012-01-05 32.490002 33.150002 32.459999 32.720001 9351200 29.917786 WBA XLP 8165300 8165300
2012-01-06 32.730000 33.200001 32.320000 33.080002 8825400 30.246955 WBA XLP 9351200 9351200
2012-01-09 33.099998 33.520000 32.840000 33.500000 7636500 30.630984 WBA XLP 8825400 8825400