Stochastic Processes

Tags: Basics, EE 276

Stochastic processes

A stochastic process is just an indexed sequence of random variables described by a joint probability mass function.

We call a stochastic process stationary if the joint mass function is invariant under shifts in the time index: $p(X_1, \dots, X_n) = p(X_{1+\ell}, \dots, X_{n+\ell})$ for every $n$ and every shift $\ell$. Intuitively, it means that there's some sort of repeating pattern in the distribution. In particular, IID stochastic processes are automatically stationary.

Entropy Rate

So previously, we had a nice notion of entropy because we had IID variables and we could just take $H(X_1)$. Now that we have a stochastic process with potentially evolving variables, how do we give a measure of entropy?

Here, we look at a measure called the entropy rate.

The definition 🐧

We define the entropy rate of a stochastic process as

$$H(\chi) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n)$$

Now this can be interesting. Consider a few examples and types of behavior.

We can also define an alternative entropy rate measurement:

$$H'(\chi) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \dots, X_1)$$

These measure slightly different things, but we will relate them in the next section.

Stationary Stochastic Processes Equate $H$ and $H'$ 🚀

Theorem: if a stochastic process is stationary, then

$$H(\chi) = H'(\chi)$$

Markov Chains

In the AEP section, we talked about IID variables, which is a good start, but we seldom deal with IID characters in real life. Here, we move to another level of complexity, where we have random variables that satisfy the Markov property.

The best way of thinking of a Markov chain is as a journey of samples, taken in accordance with the Markov probabilities. You record the samples down as the stochastic process and look at the general behavior: what does the distribution of such samples look like? At the end of the day, it's just a joint distribution with special properties.

PGM view

Markov chains are just PGMs in which every pair of nodes has only one active path between them.

The key idea

If you take a random slice of a Markov chain, you get some random variable $X$. This $X$ can be distributed like anything. If it's in the stationary distribution, then every slice $X$ shares the same distribution statistics. But even though every slice shares the same statistics, that doesn't mean the process is IID! There is a dependence, which is captured by $p(x' \mid x)$.

Therefore, you might have two variables $X_1, X_n$ that are identically distributed but still dependent.
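A quick simulation makes this concrete. Below is a minimal sketch (the two-state chain and its transition probabilities are hypothetical examples, not from the notes): starting from the stationary distribution, every time slice has the same marginal, yet consecutive slices are clearly dependent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical column-stochastic transition matrix: P[i, j] = P(next = i | current = j).
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# Stationary distribution of this particular chain (it satisfies u = P u).
u = np.array([2/3, 1/3])

# Sample many length-2 trajectories started from the stationary distribution.
n = 100_000
x1 = rng.choice(2, size=n, p=u)
x2 = np.where(rng.random(n) < P[0, x1], 0, 1)  # next state given current state

# Both slices share the same marginal statistics...
print((x1 == 0).mean())  # ≈ 2/3
print((x2 == 0).mean())  # ≈ 2/3

# ...but they are not independent: conditioning on x1 shifts the distribution of x2.
print((x2[x1 == 0] == 0).mean())  # ≈ 0.9, not 2/3
```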

Markov Properties 🐧

Remember that the Markov property means that you can factor the joint distribution as so:

$$p(x_1, x_2, \dots, x_n) = p(x_1) \prod_{i=2}^{n} p(x_i \mid x_{i-1})$$

which means that you can express the following relationship between samples:

$$p(x_{n+1}) = \sum_{x_n} p(x_{n+1} \mid x_n)\, p(x_n)$$

Intuitively, we're just making a joint distribution and marginalizing. This is also equivalent to $Pu$, where $u$ is a vector of probabilities that represents $p(x_n)$.

Markov chains are time invariant if $p(x_{n+1} \mid x_n)$ is agnostic to $n$. This is different from being stationary (more on this in a second). We can express a time-invariant Markov chain through a matrix where $P_{ij} = P(X_{n+1} = i \mid X_n = j)$. All columns of the matrix sum to 1. The matrix multiplication $Pu$ is just a weighted sum of the columns, which is guaranteed to be a distribution.

If the rows sum to 1 instead, then the operation is $uP$, not $Pu$. THIS IS CRITICAL.
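A tiny numpy check of the convention, using a hypothetical two-state transition matrix (not from the notes):

```python
import numpy as np

# Column-stochastic P: columns sum to 1, so distributions evolve as u_new = P @ u.
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
u = np.array([0.5, 0.5])

print(P @ u)          # [0.55, 0.45] -- a weighted sum of the columns
print((P @ u).sum())  # still sums to 1, so it's still a distribution

# Under the row-stochastic convention (rows sum to 1), the same update is
# written u @ P -- which is exactly this computation applied to P.T.
print(u @ P.T)        # identical to P @ u
```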

The Matrix

The matrix $P$ is quite special.

Special Properties 🐧

If a Markov chain is irreducible and aperiodic, then the stationary distribution is unique and is a convergence point.

You can solve for convergence points by using the vector definition: $u = Pu$. Usually this gives you a system that can't be solved directly (it's underdetermined), but you can use the final constraint that $\sum u = 1$. Once you've solved for all of $u$, you have a stationary distribution!
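This recipe translates directly to a linear solve. A minimal sketch, assuming a hypothetical two-state column-stochastic matrix: rewrite $u = Pu$ as $(P - I)u = 0$, then swap one redundant row for the constraint $\sum u = 1$.

```python
import numpy as np

# Hypothetical column-stochastic transition matrix (columns sum to 1).
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
k = P.shape[0]

# u = P u  <=>  (P - I) u = 0. The rows of (P - I) are linearly dependent,
# so replace the last row with the normalization constraint sum(u) = 1.
A = P - np.eye(k)
A[-1, :] = 1.0
b = np.zeros(k)
b[-1] = 1.0

u = np.linalg.solve(A, b)
print(u)  # [0.6667, 0.3333] -- the stationary distribution
```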

Entropy of Markov Chain

In general, you can find the conditional entropy by using the vector definition:

$$H(X_{n+1} \mid X_n) = -\sum_{j} u_j \sum_{i} P_{ij} \log P_{ij}$$

and this is taken directly from the definition of conditional entropy and the meaning of $P$ and $u$.
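In code, this is one weighted sum: the entropy of each column of $P$, weighted by the stationary probability of being in that column's state. A minimal sketch with the same hypothetical two-state chain used for illustration:

```python
import numpy as np

# Hypothetical column-stochastic P and its stationary distribution u.
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
u = np.array([2/3, 1/3])

# Entropy (in bits) of each column j, i.e. H(X_{n+1} | X_n = j).
col_entropies = -(P * np.log2(P)).sum(axis=0)

# H(X_{n+1} | X_n) = sum_j u_j * H(column j)
H_cond = col_entropies @ u
print(H_cond)
```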

Entropy Rates

As long as the distribution is stationary, then

$$\frac{H(X_1, \dots, X_n)}{n} = \frac{H(X_1)}{n} + \frac{n-1}{n} H(X' \mid X)$$

So the limit is just $H(X' \mid X)$, where we draw from the stationary distribution. We can calculate this value by using the stationary distribution and the $p(x' \mid x)$ entries of the Markov matrix.
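The convergence can be checked numerically. By the chain rule for a stationary Markov chain, $H(X_1, \dots, X_n) = H(X_1) + (n-1) H(X' \mid X)$, so dividing by $n$ should approach $H(X' \mid X)$. A sketch with a hypothetical two-state chain:

```python
import numpy as np

# Hypothetical column-stochastic chain; u is its stationary distribution.
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
u = np.array([2/3, 1/3])

H1 = -(u * np.log2(u)).sum()                    # H(X_1) under stationarity
Hc = -((P * np.log2(P)).sum(axis=0) @ u)        # H(X' | X)

for n in (1, 10, 100, 1000):
    # Chain rule: H(X_1, ..., X_n) = H(X_1) + (n - 1) * H(X' | X)
    print(n, (H1 + (n - 1) * Hc) / n)           # approaches Hc as n grows
```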