All You Need To Know About Markov Chains

Jonathan Ho
Apr 13, 2024



“Simons concluded that markets didn’t always react in explainable or rational ways to news or other events, making it difficult to rely on traditional research, savvy and insights. Yet, financial prices did seem to feature at least some defined patterns, no matter how chaotic markets appeared, much as the apparent randomness of weather patterns can mask identifiable trends. It looks like there’s some structure here” — The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution.

Introduction

That was the birth of Jim Simons’ very first algorithm. Jim Simons is the founder of Renaissance Technologies — a quantitative hedge fund based in East Setauket, New York. He and his fund are known to be quantitative investors, using mathematical models and algorithms to make investment gains from market inefficiencies.

He believed that markets existed in various hidden states that could be identified with mathematical models, and so he developed an algorithm to analyze Markov Chains: sequences of events in which the probability of what happens next depends only on the current state, not on past events. In a Markov Chain, it is impossible to predict future steps with certainty, yet one can observe the chain to make educated guesses about possible outcomes. Baseball can be seen as a Markov game: if a batter has three balls and two strikes, the order in which they came and the number of fouls in between does not matter; if the next pitch is a strike, the batter is out. This article delves into the premise of these models.

What are Markov Chains?

A Markov process is a stochastic process that satisfies the Markov property, sometimes characterized as having "no memory". In simpler terms, it is a process for which predictions can be made about future outcomes based solely on its present state and, most importantly, such predictions are just as good as the ones that could be made knowing the process's full history. In other words, conditional on the present state of the system, its future and past states are independent. This is the Markov property, written formally as follows, where Xn represents the state of the system at time n:

P(Xn+1 | X1, X2, …, Xn) = P(Xn+1 | Xn)

Probability Markov Chains {Xaktly}

The Markov process also operates on a certain set of possible states, known as the state space. Each state represents a particular configuration or condition of the system. The state space can be discrete, where the possible states are countable (e.g., the outcome of a dice roll), or continuous, where the states form a continuous range (e.g., temperature measurements). Markov processes are also characterized by transition probabilities, which specify the likelihood of transitioning from one state to another in a single time step. These probabilities are often represented by a transition matrix or transition function, which describes the probabilities of moving between states. For a discrete-state Markov chain with N possible states, S1, S2, …, SN, the transition probabilities are often represented as a square N × N matrix P, where Pij represents the probability of transitioning from state Si to state Sj.

Transition Matrix {Chat GPT}
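
To make this concrete, here is a minimal NumPy sketch of a discrete-state chain. The three-state "weather" chain and its probabilities are invented for illustration; the point is that each row of the transition matrix sums to 1 and the next state is sampled using only the current state.

```python
import numpy as np

# Hypothetical 3-state weather chain: 0 = sunny, 1 = cloudy, 2 = rainy.
# Row i holds the probabilities of moving from state i to each state j,
# so every row must sum to 1.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

rng = np.random.default_rng(42)

def simulate(P, start, n_steps):
    """Simulate n_steps transitions of the chain starting from `start`."""
    states = [start]
    for _ in range(n_steps):
        current = states[-1]
        # The next state depends only on the current one (Markov property).
        states.append(rng.choice(len(P), p=P[current]))
    return states

print(simulate(P, start=0, n_steps=10))
```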

Stationary Distribution

One key concept is the stationary distribution of a Markov chain, also known as the steady state. For certain Markov processes, there exists a stationary distribution, which represents the long-term proportion of time that the process spends in each state. In the stationary distribution, the probabilities of transitioning between states balance out over time, resulting in a stable distribution. In other words, the system has reached a point where the probabilities of transitioning between states no longer change, resulting in a stable distribution of states.

To get a firm understanding of steady states, we need to solve for the vector of probabilities, π, such that moving the chain forward by one iteration from π keeps the distribution at π.

In discrete time Markov Chains, a steady state is achieved when the distribution of probabilities across different states remains constant from one time step to the next. Mathematically, if π represents the probability distribution vector of the states, and P represents the transition probability matrix, then a steady state π satisfies the condition: πP = π

In other words, multiplying the probability distribution vector π by the transition probability matrix P results in the same probability distribution vector π. This implies that the system’s probabilities remain unchanged over time.
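
As a sketch of how one might find a stationary distribution numerically (reusing the made-up matrix P from the earlier example), we can either apply πP repeatedly until it stops changing, or solve for the left eigenvector of P with eigenvalue 1:

```python
import numpy as np

P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

# Option 1: power iteration -- keep applying pi @ P until it stabilises.
pi = np.ones(len(P)) / len(P)          # start from the uniform distribution
for _ in range(1000):
    pi = pi @ P
print("by iteration:", pi)

# Option 2: the stationary distribution is the left eigenvector of P
# with eigenvalue 1, normalised so its entries sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
stationary = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
stationary = stationary / stationary.sum()
print("by eigenvector:", stationary)

# Check the defining property: pi P = pi.
assert np.allclose(stationary @ P, stationary)
```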

In continuous time Markov Chains, the concept of steady state is similar but is expressed in terms of a probability density function or a probability mass function, depending on whether the state space is continuous or discrete.

For continuous time Markov Chains with a transition rate matrix Q and a time-dependent probability distribution π(t), a steady state π is reached when the system satisfies the condition: πQ = 0

This condition implies that the rate of change of the probability density function with respect to time is zero, indicating that the system has reached a stable distribution of states.
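
A small numerical sketch of this condition, using a made-up 2-state rate matrix Q, solves πQ = 0 together with the constraint that π sums to 1:

```python
import numpy as np

# Hypothetical rate matrix for a 2-state continuous-time chain.
# Off-diagonal entries are transition rates; each row sums to 0.
Q = np.array([
    [-0.5,  0.5],
    [ 0.2, -0.2],
])

# Solve pi Q = 0 subject to sum(pi) = 1 by stacking the normalisation
# constraint onto the transposed system and using least squares.
n = Q.shape[0]
A = np.vstack([Q.T, np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # approximately [0.286, 0.714] for this made-up Q
```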

Intuitively, if we keep iterating the Markov Chain, there is a probability distribution that, once reached, stays the same from one iteration to the next.

Markov Chain Monte Carlo

The Monte Carlo Method within the context of Markov Chain is known as Markov Chain Monte Carlo (MCMC). MCMC is a powerful technique for estimating the properties of complex distributions. By simulating random samples and applying statistical techniques, MCMC methods can estimate parameters that are otherwise difficult to compute directly.

Monte Carlo simulation, a computational method named after the famous casino destination in Monaco, is employed to estimate the probability of outcomes in complex systems or processes. It involves modelling the system using mathematical equations or algorithms and then drawing random samples from probability distributions that represent uncertain variables. These samples are generated using pseudorandom number generators and serve as input to the model, which simulates the system’s behaviour. Through multiple simulations, the results are aggregated and analysed to estimate probabilities, assess risk, or make predictions. Monte Carlo simulation is invaluable in situations where analytical solutions are challenging or impossible to obtain, such as in complex systems with uncertainty or variability, and it serves as an alternative to complex numerical methods. Examples include optimizing the geometry of nuclear reactors and atomic bombs, and simulating stock prices. It essentially uses randomness to estimate some deterministic quantity of interest.
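
As an illustration of the stock-price example, here is a minimal Monte Carlo sketch. The drift, volatility and other parameters are made up; the idea is to simulate many random price paths under a simple geometric-Brownian-motion model and aggregate them to estimate a quantity of interest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for a geometric Brownian motion price model.
s0, mu, sigma = 100.0, 0.05, 0.2   # start price, annual drift, annual volatility
horizon, n_steps, n_paths = 1.0, 252, 10_000
dt = horizon / n_steps

# Draw random daily log-return shocks and build the price paths.
shocks = rng.normal((mu - 0.5 * sigma**2) * dt, sigma * np.sqrt(dt),
                    size=(n_paths, n_steps))
paths = s0 * np.exp(np.cumsum(shocks, axis=1))

# Aggregate the simulations to estimate probabilities of outcomes.
final_prices = paths[:, -1]
print("estimated P(price > 120):", np.mean(final_prices > 120))
print("estimated mean final price:", final_prices.mean())
```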

Combining the two, we have MCMC, which is used to sample from an arbitrary probability distribution. It is most often used to sample from an intractable posterior distribution for the purpose of inference.

The objective is to find the stationary distribution (target distribution) that we can sample from. The samples drawn before the chain settles into its stationary distribution form the "burn-in" period; we discard them and keep only the later samples. How do we check that the chain converges to the target distribution?

We must show that the transition probabilities satisfy the Detailed Balance Condition. This condition states that, under the target distribution π(x), the probability flow from state x to state x′ equals the flow from x′ back to x, that is, π(x) P(x → x′) = π(x′) P(x′ → x). In other words, the chain behaves reversibly with respect to the target distribution. Ensuring detailed balance guarantees that the Markov chain asymptotically converges to the target distribution π(x) as n → ∞, where n is the number of iterations.

Detailed Balance Condition {Chat GPT}

If the condition is satisfied, the chain's stationary distribution is guaranteed to approximately represent the posterior distribution.
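
A quick numerical sketch can check detailed balance directly. The 2-state chain below is made up for illustration (any 2-state chain is reversible): we compute its stationary distribution and verify that the probability flow between every pair of states balances.

```python
import numpy as np

# A made-up 2-state transition matrix.
P = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
])

# Its stationary distribution: left eigenvector of P with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi = pi / pi.sum()

# Detailed balance: flow i -> j equals flow j -> i under pi.
for i in range(2):
    for j in range(2):
        print(i, j, np.isclose(pi[i] * P[i, j], pi[j] * P[j, i]))
```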

Metropolis-Hastings Algorithm

MCMC commonly relies on the Metropolis-Hastings algorithm when sampling from the distribution. The Metropolis-Hastings method handles intractable targets by using a proposal distribution Q that depends on the current state. A simple example: for a given state x, the proposal can be a Gaussian distribution with mean x and standard deviation 1. Unlike importance and rejection sampling, the generation of samples in an iteration of Metropolis-Hastings depends on the sample generated in the previous iteration.

It first proposes a new sample x′ conditioned on the current sample value xᵗ using the proposal distribution Q. It then calculates an acceptance ratio for the proposed sample x′. Next, it generates a random number between 0 and 1; if that number is less than the acceptance ratio, the proposal is accepted, otherwise it is rejected. These steps are repeated so that the sequence of states forms a Markov Chain whose stationary distribution converges to the target distribution.

Acceptance Probability Formula {Chat GPT}

In summary, the Metropolis-Hastings algorithm explores the state space by proposing new states and accepting or rejecting them based on their likelihood under the target distribution. By doing so, the algorithm generates a sequence of samples that asymptotically converges to the desired distribution.
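
Below is a minimal sketch of Metropolis-Hastings. The target density, proposal width, burn-in length and sample counts are all illustrative assumptions; because the Gaussian proposal is symmetric, the acceptance ratio reduces to the ratio of target densities.

```python
import numpy as np

rng = np.random.default_rng(1)

def target(x):
    """Unnormalised target density: a made-up mixture of two Gaussians."""
    return np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * (x + 2) ** 2)

def metropolis_hastings(n_samples, step=1.0, x0=0.0, burn_in=1_000):
    x = x0
    samples = []
    for i in range(n_samples + burn_in):
        # Propose x' from a Gaussian centred at the current state.
        x_new = rng.normal(x, step)
        # Symmetric proposal, so the acceptance ratio is target(x')/target(x).
        accept_ratio = target(x_new) / target(x)
        if rng.random() < accept_ratio:
            x = x_new                      # accept the proposal
        if i >= burn_in:                   # discard the burn-in samples
            samples.append(x)
    return np.array(samples)

samples = metropolis_hastings(50_000)
print("sample mean:", samples.mean())
```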

Hidden Markov Models

Jim Simons used a Hidden Markov (HM) process, in which the chain of events is governed by unknown, underlying parameters or variables. One sees the results of the chain but not the states that explain its progression. We are able to infer the hidden processes from the observed data, even if the processes themselves remain hidden. The algorithm is powerful because it can determine the set of states best fitting the observed pricing data, enabling the model to bet accordingly. "The whys didn’t matter, Simons and his colleagues seemed to suggest, just the strategies to take advantage of the inferred states." It is akin to a game of poker: players surmise the mood of their opponents by judging their behavior and adjust their strategies accordingly. Facing someone in a miserable mood calls for certain tactics; others are optimal if a competitor seems overjoyed and overconfident. Players do not need to know why their opponent is feeling a certain way; they just have to identify the mood and play accordingly. The same approach is used for stock analysis with the HM process.

A Hidden Markov Model (HMM) is a statistical model that represents a sequence of observable events generated by a sequence of hidden states. Unlike a regular Markov chain where all states are observable, in an HMM the states are hidden or unobserved, and only the emissions or observations associated with each state are observable. The goal is to maximize the probability of the hidden-state sequence given the data we observe. The same assumptions of a Markov Chain explained earlier still apply.

Hidden Markov Models Algorithms

Just as MCMC uses Metropolis-Hastings as its sampling algorithm, an HMM relies on one of the following algorithms: the Forward-Backward algorithm, the Viterbi algorithm or the Baum-Welch algorithm.

The Forward-Backward algorithm is used to compute the likelihood of observing a sequence of symbols given an HMM. It works by computing the forward probabilities, which represent the probability of being in a particular state at a specific time given all the observations up to that time. Then, it computes the backward probabilities, which represent the probability of observing the remaining symbols from a particular state at a specific time. These forward and backward probabilities are combined to compute the overall likelihood of the observed sequence using the law of total probability.
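
Here is a minimal sketch of the forward pass. The two hidden states, emission symbols and all probabilities are invented for illustration; the function returns the likelihood of an observation sequence summed over all hidden paths.

```python
import numpy as np

# Hypothetical 2-state HMM with 2 observable symbols.
A = np.array([[0.7, 0.3],     # hidden-state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],     # emission probabilities: B[state, symbol]
              [0.2, 0.8]])
pi0 = np.array([0.6, 0.4])    # initial state distribution

def forward_likelihood(obs):
    """Forward algorithm: P(observations), summed over all hidden paths."""
    alpha = pi0 * B[:, obs[0]]               # probabilities at time 0
    for o in obs[1:]:
        # Propagate one step through A, then weight by the emission.
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward_likelihood([0, 1, 1, 0]))
```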

The Viterbi algorithm is used to find the most likely sequence of hidden states given a sequence of observations. It works by efficiently computing the most likely path through the HMM, taking into account both the transition probabilities between states and the emission probabilities associated with each observation. It does this by maintaining a dynamic programming table that stores the probabilities of the most likely paths ending in each state at each time step. The algorithm then backtracks through this table to find the optimal state sequence that maximizes the probability of observing the given sequence.
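
A compact Viterbi sketch, reusing the made-up HMM parameters from the forward example, recovers the most likely hidden-state path by dynamic programming with back-pointers:

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi0 = np.array([0.6, 0.4])

def viterbi(obs):
    """Return the most likely hidden-state sequence for the observations."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))            # best path probability per state
    psi = np.zeros((T, n_states), dtype=int)   # back-pointers
    delta[0] = pi0 * B[:, obs[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores.max() * B[j, obs[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 1, 1, 0]))
```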

Lastly, the Baum-Welch algorithm, which builds on the Forward-Backward algorithm, is used to train the parameters of an HMM from observed data when the underlying states are not known. It is an iterative algorithm that alternates between an expectation (E) step and a maximization (M) step. In the E-step, it computes the expected counts of transitions and emissions using the Forward-Backward algorithm. In the M-step, it updates the transition and emission probabilities based on these expected counts. This process continues iteratively until convergence, where the likelihood of the observed data under the updated model parameters is maximized. This is the same EM algorithm used for GMMs in my Machine Learning: Unsupervised Learning article.
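
In practice, one would rarely hand-roll Baum-Welch. The sketch below assumes the third-party hmmlearn library is installed and that its CategoricalHMM class and attribute names match recent versions (they may differ in older releases); the observation sequence itself is made up.

```python
import numpy as np
from hmmlearn import hmm   # third-party library, assumed installed

# A made-up sequence of discrete observations (symbols 0/1), shaped (n_samples, 1).
obs = np.array([[0], [1], [1], [0], [0], [1], [0], [1], [1], [1]])

# Fit a 2-hidden-state HMM with Baum-Welch (EM); the states are never observed.
model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
model.fit(obs)

print("learned transition matrix:\n", model.transmat_)
print("learned emission probabilities:\n", model.emissionprob_)
print("most likely hidden states:", model.predict(obs))
```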

Conclusion

In conclusion, Markov Chains provide a powerful framework for modelling sequential processes with memoryless transitions between states, offering insights into a wide array of phenomena across various domains. Markov Chain Monte Carlo (MCMC) methods extend the utility of Markov Chains by enabling efficient sampling from complex probability distributions, facilitating inference and estimation tasks in statistics, machine learning, and Bayesian modelling. Hidden Markov Models (HMMs) introduce a probabilistic approach to modelling sequential data with unobserved states, enabling tasks such as sequence prediction, classification, and anomaly detection. Together, these frameworks form a cornerstone of probabilistic modelling, offering versatile tools for analyzing and understanding diverse datasets and processes.

References

[1] The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution

[2] Markov Chain: Data Science Basics — Ritvikmath

[3] Markov Chain Monte Carlo: Data Science Concepts — Ritvikmath

[4] Hidden Markov Model: Data Science Concepts — Ritvikmath
