Applications

Tags: EE 276

Distributed Compression

Say that you have $X, Y$ correlated, with entropies $H(X), H(Y)$, and you want to send both of them to a receiver. We can, of course, just use rates $R_X = H(X), R_Y = H(Y)$. But given that $X, Y$ are correlated, can we do better?

This is actually a hard problem because the encoders for $X$ and $Y$ don't have access to each other's source.

Toy motivation

Say the encoder and decoder both had access to $Y$. Then you could send $X$ at rate $H(X \mid Y)$, which is more efficient.

However, this is a pipe dream because the $X$ encoder doesn't actually have access to $Y$. But we now know there is an interval $[H(X \mid Y), H(X)]$, and the optimal coding rate for $X$ lies somewhere in it.

Slepian & Wolf (Distributed Compression)

In this setup, we assume that the decoder has access to the sent $Y$, and that this side information is error-free. We can essentially use a glorified hash function: we compress $X$ by hashing it, and then use joint typicality at the decoder to infer, among all the hash collisions, which $X$ was actually sent.

The algorithm

Let's assume that we have clean access to $Y$ at the decoder, which means we need $R_Y \geq H(Y)$. If this is satisfied, then we do the following:

  1. Pick a hash function $\phi$ with $2^{nR}$ bins, where we pick $R$ later.
  2. Compute $\phi(x^n)$ and deliver the bin index to the decoder.
  3. At the decoder, look inside the indexed bin and find the element that is jointly typical with $y^n$ (we assume access to the joint distribution). If there is a unique such element, return it; otherwise, declare failure. A toy sketch of this procedure follows the list.
    1. Concretely, there are $2^{nH(X)}$ typical sequences of $X$ and $2^{nH(X \mid Y)}$ bins, so every bin holds about $2^{nI(X;Y)}$ typical sequences of $X$. You can picture this as a grid where one axis is the bin and the other is the index within the bin; the total area is $2^{nH(X)}$, and the grid is populated roughly uniformly because these are typical sequences.
    2. Knowing $Y$ but not the bin index is a really bad situation: you have $2^{nH(X \mid Y)}$ options from the conditionally typical set.
    3. Knowing the bin index but not $Y$ is also a really bad situation: you have $2^{nI(X;Y)}$ options inside the bin.
    4. But what if you know both the bin index and $Y$? Here's the interesting part: there are $2^{nH(X \mid Y)}$ conditionally typical options and $2^{nH(X \mid Y)}$ bins, so in expectation there is only one option per bin. There is only one candidate consistent with both $Y$ and the bin index. This is why decoding is possible!
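For concreteness, here is a brute-force toy sketch of this binning scheme (a minimal sketch under assumed parameters: $X^n$ i.i.d. Bernoulli(0.5), $Y^n$ its output through a BSC(0.1), a tiny block length so the decoder can search exhaustively; none of these specifics come from the lecture):

```python
# Toy Slepian-Wolf binning: X^n i.i.d. Ber(0.5), Y^n = X^n through a BSC(p),
# so H(X|Y) = h(p) ~ 0.47 bits. The encoder sends only a hash (bin index);
# the decoder searches its bin for the sequence "jointly typical" with y^n,
# i.e., whose Hamming distance to y^n is close to n*p.
import itertools
import math
import random

n = 12            # block length (kept tiny so brute force is feasible)
p = 0.1           # BSC crossover probability
R = 0.85          # bits per symbol; should exceed H(X|Y) = h(p)
num_bins = 2 ** math.ceil(n * R)
random.seed(0)

def bin_index(x):
    """Random binning: a fixed pseudo-random map from sequences to bins."""
    return hash(tuple(x)) % num_bins

def decode(b, y, tol=2):
    """Return the unique sequence in bin b that looks jointly typical with y, else None."""
    candidates = []
    for cand in itertools.product([0, 1], repeat=n):
        if bin_index(cand) != b:
            continue
        dist = sum(a != c for a, c in zip(cand, y))
        if abs(dist - n * p) <= tol:          # crude joint-typicality test
            candidates.append(cand)
    return candidates[0] if len(candidates) == 1 else None

x = [random.randint(0, 1) for _ in range(n)]       # Alice's source block
y = [bit ^ (random.random() < p) for bit in x]     # Bob's correlated side information
x_hat = decode(bin_index(x), y)
print("decoded correctly:", x_hat == tuple(x))
```

With $R$ above $H(X \mid Y) = h(0.1) \approx 0.47$, the bin of the true sequence rarely contains another conditionally typical sequence; push $R$ below that and collisions start to dominate.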

Picking $R$

So we can show that this algorithm works if $R > H(X \mid Y)$. There are $2^{nH(X \mid Y)}$ conditionally typical sequences per $y^n$, so there should be at least $2^{nH(X \mid Y)}$ bins. This is very handwavy, and indeed you can't get deterministic guarantees: you essentially want all your hash collisions to be with non-typical sequences, but you can only make that overwhelmingly likely, not certain.
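To make this slightly less handwavy, the usual union-bound calculation over the true sequence's bin reads:

$$
\Pr[\text{error}] \;\lesssim\; \underbrace{2^{nH(X \mid Y)}}_{\text{competing conditionally typical } x^n} \cdot \underbrace{2^{-nR}}_{\text{chance of a hash collision}} \;=\; 2^{-n\left(R - H(X \mid Y)\right)} \;\to\; 0 \quad \text{if } R > H(X \mid Y).
$$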

How good can this algorithm get?

So we can actually establish three restrictions that tell us how good we can get: $R_X \geq H(X \mid Y)$, $R_Y \geq H(Y \mid X)$, and $R_X + R_Y \geq H(X, Y)$.

We know that the point where both individual restrictions are tight has sum rate $H(Y \mid X) + H(X \mid Y) \leq H(Y \mid X) + H(X) = H(X, Y)$, which means that the third (sum-rate) restriction is actually meaningful: it cuts that corner off. At the end of the day, you get the Slepian-Wolf rate region carved out by these three constraints.

Now, it is possible to achieve point $A = (R_X, R_Y) = (H(X \mid Y), H(Y))$ if we transmit $Y$ at full rate and $X$ at the restricted rate via the hash function. Recall that this isn't the same as giving $Y$ directly to the $X$ encoder; rather, we showed that the hash-function approach can achieve the same rate lower bound!

Similarly, it is possible to achieve point $B = (H(X), H(Y \mid X))$ if we transmit $X$ at full rate and $Y$ at the restricted rate.

Now, we can achieve any point on the line connecting $A$ and $B$ by time-sharing between the two strategies: build both encoder structures and use strategy $A$ for some fraction of the blocks and strategy $B$ for the rest (equivalently, flip a weighted coin during transmission).
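As a quick numeric illustration (a hypothetical doubly symmetric binary source, not an example from the lecture), here is how the corner points and a time-shared point all sit on the sum-rate line $R_X + R_Y = H(X, Y)$:

```python
# Slepian-Wolf corner points for X ~ Ber(0.5), Y = X xor Z, Z ~ Ber(p):
# H(X) = H(Y) = 1, H(X|Y) = H(Y|X) = h(p), H(X,Y) = 1 + h(p).
import math

def h(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

p = 0.1
H_XY = 1.0 + h(p)

A = (h(p), 1.0)     # point A: X hashed at H(X|Y), Y sent at full rate H(Y)
B = (1.0, h(p))     # point B: roles swapped
lam = 0.5           # time-sharing fraction between the two strategies
mid = tuple(lam * a + (1 - lam) * b for a, b in zip(A, B))

for name, (Rx, Ry) in [("A", A), ("B", B), ("time-shared", mid)]:
    print(f"{name}: Rx = {Rx:.3f}, Ry = {Ry:.3f}, sum = {Rx + Ry:.3f} (H(X,Y) = {H_XY:.3f})")
```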

Cybersecurity

In cybersecurity, you want to send $W \to C \to W$, where $C$ is assumed visible to an eavesdropper. Now you want $I(W; C) = 0$. How can this be done?

Well, you might have a secret key $k$, and the code comes from $C = f(W, k)$.

Coding theorem

Shannon showed that to have perfect secrecy, you need $H(k) \geq H(W)$. Achieving this is actually quite simple: take $k$ as i.i.d. $\mathrm{Ber}(0.5)$ and XOR it with $W$. If you XOR with such a $k$, the resulting $C$ is also i.i.d. $\mathrm{Ber}(0.5)$ (each bit is flipped uniformly at random, so it doesn't matter what you started from), and the receiver just needs to XOR with $k$ again.
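A minimal sketch of this one-time-pad construction (the message bits below are arbitrary, just for illustration):

```python
# One-time pad: k is i.i.d. Ber(0.5), C = W xor k, and the receiver XORs k again.
import secrets

def xor_with_key(bits, key):
    """XOR a bit list with an equally long key (used for both encrypt and decrypt)."""
    assert len(bits) == len(key)
    return [b ^ k for b, k in zip(bits, key)]

w = [1, 0, 1, 1, 0, 0, 1, 0]               # the message W
k = [secrets.randbits(1) for _ in w]       # key with H(k) = len(w) bits >= H(W)
c = xor_with_key(w, k)                     # ciphertext C, itself i.i.d. Ber(0.5)
assert xor_with_key(c, k) == w             # decryption recovers W exactly
print("ciphertext:", c)
```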

Can we extract this $k$ without sending it?

Common Randomness

So in an ideal world, if we have some correlated $X^n, Y^n$, can we extract some meaningful random $k$ that is shared by the two sides? Intuitively, we can: if there is correlation, there is shared randomness between $X$ and $Y$, as measured by $I(X; Y)$, which we can try to extract.

So this sounds trivial at first: what if we just combined $X, Y$ and generated a random $k$? That would work perfectly, but it would be a security risk. Remember that $k$ represents a secret key and that Eve can look at any communication. To produce this $k$, you would need to share $X, Y$ across the channel, so you can assume that Eve can construct $k$ too, which would make your key meaningless.

So here’s the revised problem statement:

  1. You are given two correlated data streams $X^n, Y^n$. One is sent to Alice, and one is sent to Bob. Don't worry about how these streams are created.
  2. You can communicate some message $m$ across a public channel.
  3. You want to create a shared random key $k$ that both parties hold in common with high probability.
  4. You want the message $m$ to give away as little of $k$ as possible.

The common randomness solution

So here's the thought process: if we can get $X$ to Bob in a convoluted way and have Bob use his knowledge of $Y$ to decode $X$, then we can use the remaining randomness of $X$ as the shared key. Why? Well, in this situation both Bob and Alice have copies of $X$. If they can systematically account for the part of $X$ that was revealed on the public channel, they can both peel back that known randomness to expose a secret core. They do this independently and require no further communication.

All of this sounds fine, but how exactly do we accomplish such a task? Well, recall the Slepian-Wolf encoder. Let Bob be the decoder and Alice be the transmitter of $X$.

If we treat $Y$ as perfectly available at the decoder, then Alice only needs to send at rate $H(X \mid Y)$ to convey $X$ to Bob. Note that, to an outside observer without $Y$, rate $H(X \mid Y)$ is not enough for a noiseless reconstruction of $X$.

Concretely, recall that the SW encoder uses a hash function that puts every $x^n$ into a bin $B$. Inside this bin, we give this $x^n$ an index $J$, so $x^n$ is described perfectly by $(B, J)$. To send $X$ at rate $H(X \mid Y)$, we only send the bin index $B$, as there are $2^{nH(X \mid Y)}$ bins.

On Bob's side, $B$ tells him the right bin to look at. Furthermore, he knows $Y$. So from our previous discussion of Slepian-Wolf, Bob can decode $X$ with high probability. Now Bob has $X$ and Alice has $X$. Both of them know the hash function, so it's trivial for each of them to derive $J$, the part of $X$ that was never transmitted. Let $k = J$.
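Here is a toy sketch of the full key-agreement protocol, reusing the same hypothetical BSC setup and brute-force decoder as in the Slepian-Wolf sketch above (the parameters and the within-bin ranking are illustrative assumptions, not from the lecture):

```python
# Alice holds x^n, Bob holds y^n (x^n through a BSC). Alice publicly sends only the
# bin index B; Bob recovers x^n by searching his bin, and both sides use the
# within-bin index J of x^n as the shared key k.
import itertools
import math
import random

n, p, R = 12, 0.1, 0.85
num_bins = 2 ** math.ceil(n * R)
random.seed(1)

def bin_of(x):
    return hash(tuple(x)) % num_bins

def index_in_bin(x):
    """J = rank of x among all sequences in its bin; computable by anyone who knows x."""
    members = sorted(s for s in itertools.product([0, 1], repeat=n) if bin_of(s) == bin_of(x))
    return members.index(tuple(x))

def sw_decode(b, y, tol=2):
    cands = [s for s in itertools.product([0, 1], repeat=n)
             if bin_of(s) == b and abs(sum(a != c for a, c in zip(s, y)) - n * p) <= tol]
    return cands[0] if len(cands) == 1 else None

x = [random.randint(0, 1) for _ in range(n)]
y = [bit ^ (random.random() < p) for bit in x]

B = bin_of(x)               # the only thing sent over the public channel
x_hat = sw_decode(B, y)     # Bob's reconstruction of x^n
if x_hat is None:
    print("decoding failed; no key this block")
else:
    k_alice, k_bob = index_in_bin(x), index_in_bin(x_hat)
    print("keys match:", k_alice == k_bob)
```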

What is the entropy of $J$? Recall the grid of size $2^{nH(X)}$: it is uniformly populated, so any slice (bin) is also uniform. The size of a slice is $2^{nI(X;Y)}$, so the entropy is just $nI(X;Y)$. This makes intuitive sense: we were able to extract $nI(X;Y)$ bits of shared randomness.
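Written out, using that $(B, J)$ is just a relabeling of the typical $x^n$ and that the bins are roughly uniformly filled:

$$
H(k) = H(J) \;\approx\; H(J \mid B) \;=\; H(X^n) - H(B) \;\approx\; nH(X) - nH(X \mid Y) \;=\; nI(X;Y).
$$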

Proof of optimality

So is this the best we can do? It turns out, yes. Let's prove it via a converse: we will show that $H(k) \leq nI(X; Y)$ (up to vanishing terms).

Let $k$ be the key that Alice gets and $k'$ be the key that Bob gets. We have $k \approx k'$, so $H(k) \approx I(k; k')$. Why? Well, $I(k; k') = H(k) - H(k \mid k') \approx H(k)$.

Now, let's condition on the message. Because $I(k; m) \approx 0$, we can say $I(k; k') \approx I(k; k' \mid m)$ (formally, $I(k; k') \leq I(k; m) + I(k; k' \mid m)$, and the first term is negligible).

Now, we use something tricky. Recall that $X^n, Y^n$ are correlated, and note the Markov chain $k - X^n - Y^n - k'$ given $m$: $k$ is a function of $X^n$, and $k'$ is determined by $(m, Y^n)$, but we have already conditioned the mutual information on $m$. By the data processing inequality, $I(k; k' \mid m) \leq I(X^n; Y^n \mid m)$. We're in the home stretch!

Let's use the chain rule to yield $I(X^n; Y^n \mid m) + I(m; Y^n) = I(X^n, m; Y^n)$. Given that $m$ is a deterministic function of $X^n$, we have $I(X^n, m; Y^n) = I(X^n; Y^n)$. Therefore, we have $I(X^n; Y^n \mid m) \leq I(X^n; Y^n)$.

To recap:

$$
H(k) \;\approx\; I(k; k') \;\approx\; I(k; k' \mid m) \;\leq\; I(X^n; Y^n \mid m) \;\leq\; I(X^n; Y^n) \;=\; nI(X; Y),
$$

as desired.