# Do company valuations react to market sentiment with a lag?

According to their website, Duff & Phelps “is the premier global valuation and corporate finance advisor with expertise in complex valuation” and other fields. According to their annual Valuation Handbook – Guide to Cost of Capital, most stock valuations react to market news with a lag.

Often, valuators use conclusions of this model to make meaningful decisions about pricing company risk.

If the model isn’t true, then we will need to rethink the way we value companies, likely in a way that increases the value of closely held businesses. Valuators who rely on it will systematically overstate the risk of publicly traded corporations and therefore understate the value of closely held firms.

If the model were true, one could utilize the predictions to almost effortlessly make untold fortunes in the stock market.

I’m publicly sharing the results of my research, so I imagine you can guess where I stand on the question.

I am not comfortable using the “sum beta” methodology to estimate the riskiness of an individual stock. Any observed difference between the normal beta and the sum beta is likely the result of statistical noise. Unfortunately, this invalidates the conclusions of the Duff & Phelps methodology. Their industry risk estimates are likely biased in an upward direction.
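To make the noise argument concrete, here is a minimal sketch on simulated data. It assumes the sum beta is computed the way it is commonly described: regress the stock's returns on the current and the one-period-lagged market return, then add the two coefficients. The return volatilities, the 60-month window, and the helper names are all illustrative assumptions, and the simulated stock has no lagged reaction at all.

```python
import numpy as np

def ols_slopes(y, X):
    """Return OLS slope coefficients of y on X (an intercept is added, then dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1:]

def beta_gap(rng, n_months=60, true_beta=1.0):
    """Sum beta minus ordinary beta for one simulated stock with NO lagged reaction."""
    market = rng.normal(0.005, 0.04, n_months + 1)  # assumed monthly market returns
    stock = true_beta * market[1:] + rng.normal(0.0, 0.08, n_months)
    normal_beta = ols_slopes(stock, market[1:])[0]
    # Regress on the current and the one-month-lagged market return, then add the slopes.
    sum_beta = ols_slopes(stock, np.column_stack([market[1:], market[:-1]])).sum()
    return sum_beta - normal_beta

rng = np.random.default_rng(0)
gaps = np.array([beta_gap(rng) for _ in range(2000)])
print(f"mean gap: {gaps.mean():+.3f}, std of gap: {gaps.std(ddof=1):.3f}")
```

In this setup the gap averages out to roughly zero across simulations, yet any single stock can show a sizable gap purely by chance. That is the pattern one would expect if observed differences between the two betas were mostly noise.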

I didn’t want this to be the result. I wanted to find an easy way to consistently beat the stock market, but I’m going to have to keep looking.

# Introducing Analyzing Technical Analysis

I’m happy to announce a new series of pages: Analyzing Technical Analysis.

I frequently read people talking about Moving Average Crossovers, Bollinger Bands, Relative Strength Indexes and other indicators that allegedly allow traders to beat the market. The default algorithm at QuantConnect uses volume-weighted moving averages to beat the market. A few weeks ago, my Uber driver told me he uses Average True Ranges to trade the currency market.

It’s hard to believe all these strategies are being discussed if there’s nothing there. Then again, it’s also hard to believe a successful currency trader would drive Uber.

This series of posts is an attempt to rigorously and thoroughly check the effectiveness of technical analysis indicators through a series of iPython notebooks. In the notebooks, I’ll do my best to set my biases aside and objectively test the facts. In posts such as these, I’ll share my conclusions.

In the initial block of posts, I tested how returns react in the days after moving averages cross each other and when prices “bounce” off moving averages. (Click here for results and here for a guide to interpreting results.) In every strategy for every horizon, I observed nothing close to statistically significant returns. The strategies failed so spectacularly that I almost felt guilty about making my computer subset white noise so many times.
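A test of this kind can be sketched in a few lines. The example below is not the code from the notebooks. It is a simplified illustration that generates white-noise returns, finds the days on which a fast moving average crosses above a slow one, and computes a t-statistic on the returns that follow. The window lengths (10 and 50 days), the 5-day horizon, and the simulated prices are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.01, 5000)          # white-noise daily log returns
prices = 100 * np.exp(np.cumsum(returns))

def moving_average(x, window):
    """Trailing mean; entry i is the average of x[i : i + window]."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

fast_w, slow_w, horizon = 10, 50, 5
slow = moving_average(prices, slow_w)                    # first entry is day slow_w - 1
fast = moving_average(prices, fast_w)[slow_w - fast_w:]  # trimmed to the same days

above = fast > slow
# Upward cross: fast average at or below the slow one yesterday, above it today.
cross_idx = np.flatnonzero(~above[:-1] & above[1:]) + 1
days = cross_idx + slow_w - 1                  # convert to positions in `prices`
days = days[days + horizon < len(prices)]      # keep crosses with a full horizon ahead

fwd = np.log(prices[days + horizon] / prices[days])
t_stat = fwd.mean() / (fwd.std(ddof=1) / np.sqrt(len(fwd)))
print(f"{len(fwd)} crossovers, mean {horizon}-day return {fwd.mean():+.4f}, t = {t_stat:+.2f}")
```

Because the prices are built from pure noise, any apparent edge here is luck. Running the same calculation on real prices and comparing the t-statistic against this baseline is one way to judge whether a signal carries information.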

Regardless, it has been a good exercise. The past two weeks haven’t taught me how to beat the market, but I did finally have a good excuse to make the jump from R to Python, I learned how to write bash scripts, and I developed an objective way to test trading strategies.

With some luck, the next strategy I try will make me fabulously wealthy.

# Forecasting Ebola Deaths Using Time Series Regressions

Johnny Voltz, an old college friend of mine who was once voted most likely to have a superhero named after him, sent in a great question about the recent, tragic Ebola outbreak in West Africa:

I know very little about math, and even less about medicine. From the data I have, it would suggest that the Ebola virus is growing at a logarithmic pace. Would it be fair to predict continued growth at this rate considering lack of medical care in Africa? Would 100,000 be an exaggerated number for March 2015? What are your thoughts?

He then linked to a chart which showed the logarithm of Ebola Deaths and an Excel fit.

This is a classic time series problem, and I’d like to use it to illustrate the process, merits and accuracy of fitting time trends through regressions. As a final step, we’ll produce an estimate of cumulative Ebola deaths in March 2015. But first, let’s talk about regressions in general. Continue reading →
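As a rough illustration of fitting a time trend by regression, the sketch below regresses the logarithm of a cumulative count on time and extrapolates. The data are synthetic and generated inside the script. They are not Ebola figures, and the growth rate, noise level, and forecast day are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic cumulative counts observed every 10 days (NOT real outbreak data).
days = np.arange(0, 200, 10)
true_a, true_b = np.log(50.0), 0.02
cumulative = np.exp(true_a + true_b * days + rng.normal(0.0, 0.05, days.size))

# Fit log(count) = a + b * day by ordinary least squares.
b_hat, a_hat = np.polyfit(days, np.log(cumulative), 1)

doubling_time = np.log(2) / b_hat              # days for the fitted trend to double
forecast_day = 300                             # hypothetical forecast date
forecast = np.exp(a_hat + b_hat * forecast_day)

print(f"daily growth rate: {b_hat:.4f}, doubling time: {doubling_time:.1f} days")
print(f"extrapolated count on day {forecast_day}: {forecast:,.0f}")
```

A straight line in log space means exponential growth in levels, so small errors in the fitted slope compound quickly as the forecast horizon grows. Extrapolations like this are best read as a scenario under an unchanged growth rate.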

# Predicting NFL Scores: A Bayesian Approach

With the 2013 football season behind us, I’ve been spending my weekends developing a Bayesian model of the NFL with my college friend and UT math PhD candidate Chris White.

To generate a Bayesian model, one first comes up with a parametric model that could generate the observed data. For simple problems, one uses Bayes’ rule to calculate the probability of parameters equaling certain values given the observed data. For complicated models, this is an extremely complicated task, but there are some Monte Carlo methods, such as Gibbs sampling, which produce satisfactory approximations of the actual distributions. There are R packages available that make Gibbs sampling fun.

I want to touch on many of these topics later in more in-depth posts, but first, here are some results.

As of the end of the 2013 regular season, this is how our model ranked the NFL teams.

These are the mean estimates of our Bayesian model trained on the 2013 NFL regular season. Our model predicts that, on a neutral playing field, each team will score its oPower less its opponent’s dPower. The former is the mean of a Poisson process and the latter is the mean of a normal distribution. Homefield advantage is worth an additional 3.5 net points, most of which comes from decreasing the visitor’s score.

So we predict the Rams would beat the Lions by a score of 27.4 to 25.8 on a neutral field. Since we know the underlying distributions, we can also calculate prediction intervals.
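A prediction interval of this kind can be approximated by simulation. The sketch below takes one possible reading of the model: a team's score is a Poisson draw with mean oPower minus a normal draw with mean equal to the opponent's dPower. The power ratings and the standard deviation of the defensive effect are made-up values for illustration, not the model's estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
n_sims = 100_000

# Hypothetical power ratings, NOT taken from the model's actual output.
o_power_a, d_power_a = 28.0, 3.0
o_power_b, d_power_b = 24.0, 1.0
d_sd = 2.0   # assumed standard deviation of the defensive effect

# Offense is a Poisson draw; the opponent's defense subtracts a normal draw.
score_a = rng.poisson(o_power_a, n_sims) - rng.normal(d_power_b, d_sd, n_sims)
score_b = rng.poisson(o_power_b, n_sims) - rng.normal(d_power_a, d_sd, n_sims)

margin = score_a - score_b
low, high = np.percentile(margin, [2.5, 97.5])   # 95% interval for the margin
win_prob_a = np.mean(margin > 0)

print(f"expected score: {score_a.mean():.1f} to {score_b.mean():.1f}")
print(f"95% interval for the margin: [{low:.1f}, {high:.1f}]")
print(f"P(team A wins) = {win_prob_a:.2f}")
```

The point estimate is only the centre of a wide distribution. Even with a clear gap in mean scores, the simulated interval for the margin comfortably includes a loss for the favourite.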

Here’s the same model trained on both the regular season and the post-season. The final column shows the change in total power.

I have a list of ways I want to improve the model, but here’s where it stands now. Before next season, I want to have a handful of models whose predictions are weighted using Bayes factors.

# What is partial autocorrelation?

This is the final post I have planned in the autocorrelation series. I’ll introduce a new concept called partial autocorrelation and show a couple examples of the concept in action. We’ll learn some important things about the stock market.

Without getting into the math, here are a couple of definitions you should commit to memory:

1. Autocorrelation (“ACF”) measures the correlation between observations in a time series and observations a fixed unit of time away.
2. Partial autocorrelation (“PACF”) measures the correlation between observations in a time series and observations a fixed unit of time away while controlling for indirect effects.

For example, here are both the ACF and PACF for monthly U.S. unemployment.

This is a pattern you’ll often see in the field: a slow decay in ACF and a single spike in PACF at a lag of one month. Starting at a lag of two months, PACF is basically zero.

This means that January’s unemployment influences March’s unemployment, but only because January influences February and February influences March. If unemployment were high in January and low in February, we would predict March’s unemployment to also be low.
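The same pattern can be reproduced with simulated data. The sketch below generates a first-order autoregressive series, in which each value depends directly only on the one before it, and computes ACF and PACF by hand with NumPy. The coefficient of 0.9 and the series length are arbitrary assumptions. PACF at lag k is taken as the last coefficient from regressing the series on its first k lags, which is one standard way to define it.

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi = 5000, 0.9

# AR(1): each observation depends directly only on the previous one.
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    d = x - x.mean()
    return np.dot(d[lag:], d[:-lag]) / np.dot(d, d)

def pacf(x, lag):
    """Coefficient on the lag-th lag when regressing x_t on x_{t-1}, ..., x_{t-lag}."""
    d = x - x.mean()
    X = np.column_stack([d[lag - j: len(d) - j] for j in range(1, lag + 1)])
    coefs, *_ = np.linalg.lstsq(X, d[lag:], rcond=None)
    return coefs[-1]

for lag in range(1, 6):
    print(f"lag {lag}: ACF = {acf(x, lag):+.3f}, PACF = {pacf(x, lag):+.3f}")
```

ACF decays slowly (roughly 0.9, 0.81, 0.73, and so on) because the influence of old observations is passed along the chain. PACF is large at lag one and close to zero afterwards, because once the previous value is known, older values add nothing.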

In the Box-Jenkins methodology, ACF and PACF plots like this one are used to determine the correct model. Applying the same plots to the residuals of a preexisting model can reveal seasonal patterns that the model has not yet addressed.

A few notes: Continue reading →

# Grand Theft Autocorrelation

In my last post, I discussed the concept of correlation. In my free time since then, I’ve been playing a lot of Grand Theft Auto V. It’s time to merge these noble pursuits.

As you may know, GTA V includes an online stock market that allows players to invest their ill-gotten gains in fictitious companies. Naturally, a Reddit user has created an updating database of stock market prices. I play on a 360, so I’ve analyzed the Xbox prices. I’ve developed a strategy that will earn money in the long run, but first let’s do some learning.

As we previously discussed, correlation is a measure of how well the highs of one series line up with the highs of another series. Autocorrelation is a measure of how well the highs of a series line up with the highs of the same series one observation earlier. Continue reading →

# What is Correlation?

Besides not being causation, many pedantically smart laymen don’t know what correlation is. I’m here to fix that with a mathematical definition, an intuitive explanation and a brief philosophical comment.

### Intuitive Explanation

Correlation is a quantitative measure of how well the highs line up with the highs and the lows line up with the lows between two arrays. Correlation will always be between -1 and 1, inclusive. A correlation of 1 indicates a perfect, positive linear relationship between two variables. A correlation of zero indicates no linear relationship. A correlation of negative one indicates a perfect, negative linear relationship.
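These boundary cases are easy to check numerically. The sketch below computes the standard Pearson correlation by hand on three small, made-up arrays. The function name `correlation` is just a label for this example.

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation: co-movement of deviations from the mean, scaled to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_up = correlation(x, 2 * x + 1)       # highs line up with highs
r_down = correlation(x, -3 * x)        # highs line up with lows
r_sym = correlation(x, (x - 3) ** 2)   # y depends on x, but not linearly

print(r_up, r_down, r_sym)
print(np.corrcoef(x, 2 * x + 1)[0, 1])  # NumPy's built-in gives the same value as r_up
```

The third case is worth a second look: `y` is completely determined by `x`, yet the correlation is zero because the relationship is symmetric instead of linear. Zero correlation rules out a linear relationship only.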

### Mathematical Definition

This value can be computed in Excel through the CORREL() function, but stepping into the formula helps enhance the understanding. Feel free to skim over this part to the applications and philosophy sections. Mathematically, correlation can be computed as follows: Continue reading →

# Unit Roots and Economic Recovery

This may seem like an esoteric subject, but the concept of “unit roots” has implications for practitioners who debate time series forecasts.

Consider the following two models of economic growth:

$y_t = y_0 e^{\theta t}\varepsilon_t$

$y_t = y_{t-1} e^{\theta}\varepsilon_t$

where $y_t$ is GDP at time $t$, $\theta$ is a long-term exponential growth rate, and $\varepsilon_t$ is a stochastic driver with mean 1.

The first model multiplies an initial value by an exponential growth trend and a random variable. In every period, the $\varepsilon_t$ term pushes $y_t$ above or below the long-term trend. However, these deviations are temporary and will have no impact on $y_{t+1}$.

In the second model, the most recent observation is multiplied by the growth rate and a new error term. As in the previous model, $y_t$ is affected by growth and random shocks. However, rather than $\varepsilon_t$ generating a temporary deviation from a trend, $\varepsilon_t$ generates a deviation from the trended prior observation. Because $y_{t-1}$ is dependent on $\varepsilon_{t-1}$, $y_t$ depends on all prior shocks. In this model, shocks have a permanent impact on future $y$.

The first model is said to be trend-stationary because the expected value and variance are identical for all $t$ after adjusting for the trend. The second model is harder to forecast. It will resemble a random walk with an upward drift.

The above graph from Wikipedia shows what a trend-stationary and a unit-root time series might look like after a negative shock. The blue line is trend-stationary and will return to the long-term trend. The green line has a unit root: it will continue to drift upwards at the previous growth rate, but from its new, lower level, so it never closes the gap.
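The difference is easy to reproduce numerically. The sketch below runs both models through a single 10% negative shock, with every other shock switched off (set to 1) so the effect can be seen in isolation. The growth rate, starting value, and timing of the shock are arbitrary assumptions.

```python
import numpy as np

theta, y0, T, shock_t = 0.02, 100.0, 100, 50

eps = np.ones(T + 1)
eps[shock_t] = 0.9            # one-off 10% negative shock; all other shocks are 1

t = np.arange(T + 1)
trend = y0 * np.exp(theta * t)

# Model 1 (trend-stationary): y_t = y_0 * exp(theta * t) * eps_t
y_ts = trend * eps

# Model 2 (unit root): y_t = y_{t-1} * exp(theta) * eps_t
y_ur = np.empty(T + 1)
y_ur[0] = y0
for i in range(1, T + 1):
    y_ur[i] = y_ur[i - 1] * np.exp(theta) * eps[i]

# Log deviation from the pre-shock trend.
gap_ts = np.log(y_ts / trend)
gap_ur = np.log(y_ur / trend)

print(f"gap at the shock:    TS {gap_ts[shock_t]:+.3f}, unit root {gap_ur[shock_t]:+.3f}")
print(f"gap 50 periods later: TS {gap_ts[-1]:+.3f}, unit root {gap_ur[-1]:+.3f}")
```

Both series fall by the same amount when the shock hits. The trend-stationary series is back on trend one period later, while the unit-root series carries the full shortfall forward indefinitely.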

Surprisingly, there isn’t a consensus among economists over whether economic growth has a unit root. For example, in March 2009, Paul Krugman and Greg Mankiw had a public disagreement over whether GDP has a unit root. Mankiw believed the damage caused by the financial crisis would be persistent. Krugman entitled his response “roots of evil.” Although Krugman never took Mankiw’s subsequent offer to wager, Mankiw likely would have won the bet. The sustained recession appears to be the predictions of the Unit Root model playing out.

In the less-glamorous world of litigation support and business valuation, I’ve seen the following graph used to argue that area businesses were due a high growth trend as the local economy would soon be “reverting to the trend” but for the business interruption.

This is a valid point if, as the expert was implicitly assuming, the time series is trend-stationary. However, if Corporate Income Tax Revenue has a unit root, then there is no long-term trend to which the economy can revert.

The most common way to test whether a time series has a unit root, rather than being trend-stationary, is to use a Dickey-Fuller test. In my opinion, applying this test should be one of the first things an aspiring analyst does before working on a time series. Specifying the wrong model when a unit-root relationship is present can lead to spurious results and bad forecasts.
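For a feel of what the test does, here is a bare-bones sketch of the basic Dickey-Fuller regression with a constant and a time trend, written with NumPy only. It omits the lagged differences used in the augmented version, and the simulated series, sample size, and helper name `df_tstat` are assumptions for illustration. The critical value of -3.41 is the usual asymptotic 5% value for the constant-plus-trend case. In practice a library routine such as `adfuller` in `statsmodels.tsa.stattools` (with `regression="ct"`) is the more convenient choice.

```python
import numpy as np

def df_tstat(y):
    """t-statistic on y_{t-1} in: diff(y)_t = a + b*t + g*y_{t-1} + e_t."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), np.arange(len(dy)), y[:-1]])
    coefs, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coefs
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coefs[2] / np.sqrt(cov[2, 2])

rng = np.random.default_rng(4)
T, theta = 500, 0.005
shocks = rng.normal(0.0, 0.02, T)            # plays the role of log(eps_t)

log_ts = theta * np.arange(T) + shocks       # log of the trend-stationary model
log_ur = np.cumsum(theta + shocks)           # log of the unit-root model

CRIT_5PCT = -3.41   # asymptotic 5% critical value, constant + trend case

for name, series in [("trend-stationary", log_ts), ("unit root", log_ur)]:
    stat = df_tstat(series)
    verdict = "reject a unit root" if stat < CRIT_5PCT else "cannot reject a unit root"
    print(f"{name}: t = {stat:.2f} -> {verdict}")
```

The null hypothesis is that the series has a unit root, so a statistic far below the critical value is evidence for trend-stationarity. The trend-stationary series here produces a strongly negative statistic. A true unit-root series will usually fail to reject, although by construction it will do so about 5% of the time.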