3.0 Testing Strategies

This notebook illustrates how signals will be judged and defines formulas which will be used repeatedly.

Import Packages

This package import includes functions developed in section 2.

In [1]:
import numpy as np
import pandas as pd
import datetime as dt
import seaborn as sns
import math
import helper_functions as hf
%matplotlib inline
from scipy import stats
import cPickle as pickle

shortUniverse = True

Load Pickle Data

First, we’ll load the pickle we developed in the previous section. The new prices dataframe is given a new name to reflect edits made during this notebook.

In [2]:
if shortUniverse:
    with open('intermediaries/prices.p', 'rb') as handle:
        prices = pickle.load(handle)
else:
    with open('intermediaries/prices-full.p', 'rb') as handle:
        prices = pickle.load(handle)

Example Strategy

For illustrative purposes, a random “signal” has been generated for every symbol for every day with the following probabilities:

  1. bull – 20%
  2. bear – 10%
  3. neut – 70%
In [3]:
def add_fake_signals(df):
    df['rand'] = np.random.uniform(size=len(df.index))
    df['fake_signal'] = 'neut'
    df.loc[df['rand'] <= .2, 'fake_signal'] = 'bull'
    df.loc[df['rand'] >= .9, 'fake_signal'] = 'bear'
    df.drop('rand', axis=1, inplace=True)
    return df
In [4]:
prices = add_fake_signals(prices)

Signal Age Columns

For now, a strategy will assume that we can go long or short at the closing price on the day a signal occurs. In practice, this can be achieved by looking for signals shortly before the market closes and then executing. Subsequent daily returns are then tracked. To prevent double counting, positions are held until a signal is confirmed in either the bullish or bearish direction.

If a “bull” signals occur on Wednesday and Friday, then we track daily returns from Wednesday close to Friday close to judge the quality of signal 1, and from Friday close onwards to judge the quality of signal 2.

Two additional columns are added to aid in this calculation:

  1. bull_age – Tracks the age of the most recent signal if it is a bull signal.
  2. bear_age – Tracks the age of the most recent signal if it is a bear signal.
In [5]:
def add_age_num(df, cType, strat):
    """Adds the age of a given fake_signal.
    
    cType should be either 'bull' or 'bear'.
    strat should be the name of a column of fake_signals with 'bull' and 'bear' values.
    Can be improved through parallel functions. This is an expensive function to run.
    """
    if cType == 'bull':
        same, diff = 'bull', 'bear'
    else:
        same, diff = 'bear', 'bull'

    df[strat + '_age_' + cType] = np.nan
    for i in range(1, len(df[strat + '_age_' + cType]) - 1):
    	# if new stock, then age = 0
        if df.ix[i - 1, 'symbol'] != df.ix[i, 'symbol']:
            df.ix[i, strat + '_age_' + cType] = 0
        # if last fake_signal was same, age = 1
        elif df.ix[i - 1, strat] == same:
            df.ix[i, strat + '_age_' + cType] = 1
        # if last fake_signal was diff, age = 0
        elif df.ix[i - 1, strat] == diff:
            df.ix[i, strat + '_age_' + cType] = 0
        # if last fake_signal was neut and last age >0, age=last age + 1
        elif ((df.ix[i - 1, strat] == 'neut') & (df.ix[i - 1, strat + '_age_' + cType] > 0)):
            df.ix[i, strat + '_age_' + cType] = df.ix[i - 1, strat + '_age_' + cType] + 1
        # if last fake_signal was neut and last age == 0, age = 0
        elif ((df.ix[i - 1, strat] == 'neut') & (df.ix[i - 1, strat + '_age_' + cType] == 0)):
            df.ix[i, strat + '_age_' + cType] = 0
        # This should be triggered near the beginning when we haven't seen anything yet.
        else:
            df.ix[i, strat + '_age_' + cType] = 0

    return df

def add_age_nums(df, strat):
    """Wrapper function to generate both bull and bear signals.
    
    This function is currently iterative and can take a long time to run."""
    df = add_age_num(df, 'bull', strat)
    df = add_age_num(df, 'bear', strat)
    return df
In [6]:
prices = add_age_nums(prices, strat='fake_signal')

Judging Returns

Now that we know the age and type of the most recent signal, we can simply group by this column to determine the expected return of a given signal at a given price.

This is demonstrated for our random strategy below. As you would expect, this strategy’s mean return is basically zero.

The columns testStat and p-value give the two-tailed t-test p-value that the mean is different from zero. If a p-value is below 0.05, then the mean is statistically different from zero at the 95% confidence level.

In [7]:
# http://libguides.library.kent.edu/SPSS/OneSampletTest

def get_results_table(df, strat, lag):
    resultsBull = df[[strat + '_age_' + 'bull', 'ret_cc']].groupby(strat + '_age_' + 'bull').describe().unstack(1)
    resultsBull = resultsBull[0:lag]['ret_cc']
    resultsBull['testStat'] = resultsBull['mean'] / resultsBull['std'] / np.sqrt(resultsBull['count'])
    resultsBull['p-value'] = stats.t.sf(np.abs(resultsBull['testStat']), resultsBull['count']-1)*2
    resultsBull.drop(['25%', '75%'], axis=1, inplace=True)
    
    resultsBear = df[[strat + '_age_' + 'bear', 'ret_cc']].groupby(strat + '_age_' + 'bear').describe().unstack(1)
    resultsBear = resultsBear[0:lag]['ret_cc']
    resultsBear['testStat'] = resultsBear['mean'] / resultsBear['std'] / np.sqrt(resultsBear['count'])
    resultsBear['p-value'] = stats.t.sf(np.abs(resultsBear['testStat']), resultsBear['count']-1)*2
    resultsBear.drop(['25%', '75%'], axis=1, inplace=True)
    
    results = pd.concat({'bull':resultsBull, 'bear':resultsBear}, axis=1)
    results.index.name = 'signal_age'
    return results

Putting it all together

The above steps are combined into one convenient wrapper function.

In [8]:
def strat_results(df, strat, lag):
    '''Returns a results table for a given strategy for a given number of lags.
    
    The prices table fed into df must contain a column strat.
    Column strat should contain "bull" and "bear" values.'''
    df = add_age_nums(df, strat)
    results = get_results_table(df, strat, lag)
    
    return results

results_fake_signal = strat_results(prices, 'fake_signal', 5)
results_fake_signal
Out[8]:
bear bull
count mean std min 50% max testStat p-value count mean std min 50% max testStat p-value
signal_age
0 508 0.000146 0.009944 -0.040889 -0.000248 0.031200 0.000650 0.999482 246 0.000276 0.011781 -0.033144 -3.223172e-04 0.083806 0.001491 0.998811
1 71 0.000823 0.015472 -0.033144 -0.000608 0.083806 0.006310 0.994984 147 -0.000127 0.009662 -0.035103 -2.420437e-04 0.020647 -0.001083 0.999137
2 46 0.000323 0.008976 -0.025476 0.000611 0.016849 0.005310 0.995787 114 0.000009 0.010248 -0.040889 -4.729074e-04 0.027221 0.000086 0.999932
3 36 0.001569 0.010141 -0.014224 -0.000845 0.030039 0.025785 0.979575 75 -0.000688 0.011220 -0.038308 -1.635698e-03 0.031200 -0.007082 0.994368
4 25 -0.000671 0.009802 -0.027563 -0.000708 0.017720 -0.013682 0.989197 48 0.000322 0.009619 -0.020973 -8.267891e-04 0.025958 0.004831 0.996166
5 15 -0.001259 0.011595 -0.025365 -0.001511 0.016179 -0.028032 0.978033 37 0.001969 0.010544 -0.015899 1.364359e-08 0.026662 0.030701 0.975678

Save Formulas

No new price file is created. However, add_age_num, add_age_nums, get_results_table and strat_results have been added to the separate document “helper_functions.py” which can be imported into future tests.

For workbooks which test an individual strategy, a separate script will run on the entire universe of stocks and export a pickle file for each result table. iPython notebooks will analyze the resulting file.

Potential Extensions

We need to compute beta-adjusted returns and evaluate strategies based on them.

A parallel version of the signal age formula is currently being developed.