Regime Switching Models With Hidden Markov Models: Bull Or Bear Market?

May 12, 2018
Data Science Finance

Financial markets can be in various states, such as recession/expansion or bearish/bullish/sideways. Different (algorithmic) trading strategies might perform differently depending on which state (hereafter called “regime”) the market is in. Regime switching is therefore a well-researched topic in finance. This article explores some simple models that try to detect when these regimes occur and how probable each one is given the market data.

The Data

The S&P 500 is an index of 500 large US companies, and thus serves as a benchmark that individual companies are compared against. It is also commonly used as the reference when comparing (algorithmic) trading strategies. If a strategy performs worse than or roughly equal to the index, we could just as well have invested in an ETF (exchange-traded fund) such as the SPDR® S&P 500 ETF (ticker SPY), which replicates the performance of the whole index. That’s why SPY will be the reference for this analysis as well.

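# load the packages used throughout
library(quantmod)   # getSymbols, Cl, chartSeries
library(depmixS4)   # depmix, fit, posterior
library(ggplot2)
library(ggthemes)   # theme_hc

# get data
getSymbols("SPY")
## [1] "SPY"
# make return time series (log returns; the first value is NA)
returns <- as.numeric(diff(log(Cl(SPY))))
z.returns <- zoo(returns)
index(z.returns) <- index(SPY)
chartSeries(SPY, theme='white')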

The plot above visualizes the performance of the SPY ETF as well as its volume. One time span stands out immediately: the year 2009, when the recession induced by the subprime crisis was well underway. The price drop and the spike in volume show how the recession manifested itself in the market, with a lot of players buying the dip on the assumption that what goes down must come back up, at least most of the time.
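The `returns` series built above consists of log returns, i.e. the difference of consecutive log closing prices. A minimal sketch on made-up prices (the vector `close` is hypothetical, not SPY data) shows how they relate to simple returns and why they are convenient: log returns add up over time.

```r
# Hypothetical closing prices, for illustration only
close <- c(100, 102, 101, 105)

# Log returns: r_t = log(P_t) - log(P_{t-1})
log_ret <- diff(log(close))

# Simple returns for comparison: P_t / P_{t-1} - 1
simple_ret <- close[-1] / close[-length(close)] - 1

# Log returns are additive: their sum is the log return over the whole span
total_log_ret <- sum(log_ret)
print(cbind(log_ret, simple_ret))
print(total_log_ret)
```

On a plain vector `diff()` returns one element fewer than its input. On an xts object such as `Cl(SPY)` the first row is padded with `NA` instead, which is why `returns` has the same length as `index(SPY)`.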

Hidden Markov models

Hidden Markov models (HMMs) allow us to model state-based systems from probabilistic observations. I won’t go deep into the theory; there are plenty of good articles on the web that cover it at various levels of detail. There are even plenty of articles about HMMs for regime switching [1].

The main idea is that the market is in one of several states (in our case bear and bull), but we don’t know the process that drives the transitions between them. We can only observe its results (the closing price, returns and so on), never the process itself, which is why we call it hidden. HMMs allow us to model these transitions based on the observations we can make. The fitted model provides a probability for each state given the observed data, so for any point in time we can assume that the state with the highest probability is the one the hidden process is in (we can’t check anyway; it is hidden, after all).
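To make the idea concrete, here is a minimal base R sketch that simulates a two-state hidden process and then recovers state probabilities from the observations alone. Everything in it is an assumption for illustration: the transition matrix `trans`, the volatilities `sigma`, and the use of a simple forward (filtering) pass with known parameters. A fitted HMM has to estimate these parameters from the data first, which this sketch skips.

```r
set.seed(1)

# Assumed (hypothetical) parameters: state 1 = calm, state 2 = turbulent
trans <- matrix(c(0.98, 0.02,
                  0.05, 0.95), nrow = 2, byrow = TRUE)  # rows: from, columns: to
mu    <- c(0, 0)         # both states have zero mean return
sigma <- c(0.007, 0.02)  # the states differ only in volatility
n     <- 1000

# Simulate the hidden state path as a Markov chain
state <- integer(n)
state[1] <- 1
for (t in 2:n) {
  state[t] <- sample(1:2, size = 1, prob = trans[state[t - 1], ])
}

# Simulate the observations: one Gaussian return per day, depending on the state
obs <- rnorm(n, mean = mu[state], sd = sigma[state])

# Forward pass: probability of each state given the observations up to day t
filt  <- matrix(NA_real_, nrow = n, ncol = 2)
prior <- c(0.5, 0.5)
for (t in 1:n) {
  pred <- if (t == 1) prior else as.vector(filt[t - 1, ] %*% trans)
  lik  <- dnorm(obs[t], mean = mu, sd = sigma)
  filt[t, ] <- pred * lik / sum(pred * lik)  # normalise so each row sums to 1
}

# Pick the state with the highest probability and compare with the truth
guess    <- max.col(filt, ties.method = "first")
accuracy <- mean(guess == state)
print(accuracy)
```

In the simulation we can check the guess against `state`; with real market data there is no such ground truth.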

Two State HMM

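# two-state HMM with Gaussian emissions and an intercept-only mean per state
hmm.1 <- depmix(returns ~ 1, family = gaussian(), nstates = 2, data=data.frame(returns=returns))
hmmfit.1 <- fit(hmm.1, verbose = FALSE)
## converged at iteration 44 with logLik: 9165.435
# data frame with the decoded state plus one probability column per state (S1, S2)
post_probs.1 <- posterior(hmmfit.1)
# most likely state for each day
regimes.1 <- post_probs.1$state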
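As the column names used later in this article suggest (`state`, `S1`, `S2`), the posterior data frame holds a decoded state plus one probability column per state. The sketch below uses a hypothetical data frame `post` of that shape (the numbers are made up) to show how to pick the most likely state from the probability columns only, and how confident that pick is.

```r
# Hypothetical data frame shaped like the posterior output (values are made up)
post <- data.frame(state = c(1, 1, 2, 2),
                   S1    = c(0.98, 0.70, 0.10, 0.45),
                   S2    = c(0.02, 0.30, 0.90, 0.55))

# Restrict to the probability columns before comparing them
prob_cols <- as.matrix(post[, c("S1", "S2")])

# Index of the column with the highest probability in each row
most_likely <- max.col(prob_cols, ties.method = "first")

# The probability attached to that choice
confidence <- apply(prob_cols, 1, max)

print(data.frame(most_likely, confidence))
```

Taking the row-wise maximum over all three columns would also return the state label, but only because labels are at least 1 and probabilities are at most 1; restricting to the probability columns keeps the intent explicit.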

We can plot the probabilities of the two states to see how distinct they are:

df.probs <- data.frame(x=index(SPY), probs=post_probs.1)
# one line per state, colored by the state whose probability is shown
ggplot(df.probs, aes(x=x)) +
  geom_line(aes(y=probs.S1, color="1")) +
  geom_line(aes(y=probs.S2, color="2")) +
  labs(color="State") +
  theme_hc() +
  xlab("Date") +
  ylab("State Probability")

Most of the time the two probabilities are clearly distinct, so the model separates the two states with rather high confidence.
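One way to put a number on “most of the time” is the share of days on which the more likely state has a probability above some threshold. The sketch below is only an illustration: the data frame `probs` contains made-up values and the threshold of 0.9 is an arbitrary assumption.

```r
# Made-up state probabilities for five days (each row sums to 1)
probs <- data.frame(S1 = c(0.99, 0.97, 0.60, 0.05, 0.02),
                    S2 = c(0.01, 0.03, 0.40, 0.95, 0.98))

# Probability of the more likely state on each day
top <- pmax(probs$S1, probs$S2)

# Share of days on which the model is "confident" (threshold is an assumption)
threshold <- 0.9
share_confident <- mean(top > threshold)
print(share_confident)
```

With the fitted model, the same calculation could be run on the `S1` and `S2` columns of `post_probs.1`.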

The returns colored by state look like this:

lr <- as.numeric(diff(log(SPY$SPY.Close), lag=1)$SPY.Close)
df.1 <- data.frame(date=index(SPY), return=returns, state=regimes.1, log.return=lr)

g1 <- ggplot(df.1, aes(x=date, y=return, color=factor(state), group=1)) +
  geom_line() +
  xlab("Date") +
  ylab("Return") +
  theme_bw()
g1

The next thing we’ll have a look at is what constitutes the states. What’s the return distribution? Is it high/low volatility, bear/bull market or something entirely different?

ggplot(df.1) +
  geom_density(aes(x=return, group=state, fill=factor(state)), alpha=.5, adjust=2) +
  xlab("Returns") +
  ylab("Density") +
  theme_bw()

The returns in both states are roughly [2] normally distributed around zero. We can therefore use the moments of the normal distribution as an approximation to compare the states. Both means are close to zero, so neither state shows a pronounced trend; what separates them is the spread. The volatility [3] of the first state is 0.006800447, whereas for the second state it is 0.02139266, roughly three times as high.
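Per-state means and volatilities like the ones quoted above can be computed with a grouped summary. The sketch below runs on simulated data, not on SPY: the data frame `toy`, its sample sizes and its standard deviations are assumptions chosen to resemble the two regimes.

```r
set.seed(42)

# Simulated returns for two hypothetical regimes (not SPY data)
toy <- data.frame(
  return = c(rnorm(500, mean = 0, sd = 0.007),   # calm regime
             rnorm(200, mean = 0, sd = 0.021)),  # turbulent regime
  state  = rep(c(1, 2), times = c(500, 200))
)

# Mean and standard deviation (volatility) of the returns within each state
state_mean <- tapply(toy$return, toy$state, mean, na.rm = TRUE)
state_vol  <- tapply(toy$return, toy$state, sd,   na.rm = TRUE)

print(rbind(mean = state_mean, volatility = state_vol))
```

On the article’s data the equivalent call would be `tapply(df.1$return, df.1$state, sd, na.rm = TRUE)`; `na.rm = TRUE` matters there because the first return is `NA`.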

The original time series (closing prices) with the states as colors now looks like this:

df.close <- data.frame(date=index(SPY), return=df.1$return, close=as.numeric(SPY$SPY.Close), state=df.1$state)

ggplot(df.close) +
  geom_line(aes(x=date, y=close, color=factor(state), group=1)) +
  xlab("Date") +
  ylab("Closing Price") +
  theme_bw()

It seems the model picked up some kind of distinction between bearish and bullish markets! This supports the saying that “bearish markets are high volatility” [4]. We found a simple model that we can use to differentiate between a bear and a bull market, which allows us to choose different strategies based on how they perform in each of the two. Additionally, we can now easily generate data showing which date period corresponds to which market type, without having to rely on intuition or external data.
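A minimal sketch of how such date periods could be derived: collapse the daily state labels into consecutive runs and report the start and end date of each run. The vectors `dates` and `state` below are made up for illustration.

```r
# Made-up daily dates and state labels
dates <- as.Date("2008-01-01") + 0:9
state <- c(1, 1, 1, 2, 2, 2, 2, 1, 1, 1)

# Run-length encoding collapses consecutive identical states into runs
runs      <- rle(state)
end_idx   <- cumsum(runs$lengths)
start_idx <- end_idx - runs$lengths + 1

# One row per regime period
periods <- data.frame(state = runs$values,
                      start = dates[start_idx],
                      end   = dates[end_idx],
                      days  = runs$lengths)
print(periods)
```

With the data frame built earlier in the article, `df.close$date` and `df.close$state` would take the place of the two toy vectors.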


  1. see this overview

  2. except for the long tails; which distribution returns really follow seems to be an ongoing research topic

  3. defined as the standard deviation of log returns

  4. also see this article over at Bloomberg
