Bayes Net D-separation and I-maps

TagsCS 228Construction

D-Separation

The cool part is that we learned all that we need to know about bayesian structure because these three patterns form the recursive building blocks for all other bayesian networks. To do this,

But to understand this, we must first understand D-separation. Intuitively, it’s just a generalized concept of conditional independence that we use to describe between sets of variables and not single variables any more.

Formalism

Three sets of nodes Q,W,OQ, W, O are considered D-separated given OO if QQ and WW are not connected by any active path. What does that mean?

Active path

In an active path ( a list of nodes), at least one must be true for ALL consecutive nodes on the path

  1. XYZX → Y → Z or XYZX \leftarrow Y \leftarrow Z and YY is unobserved (i.e. not in OO)
  1. XYZX \leftarrow Y → Z and YY is unobserved
  1. XYZX → Y \leftarrow Z and YY or any of its descendants are observed

This is just a fancy way of saying that the information propagates through the path freely. In other words, the all the variables in that path are dependent on each other; it “passes” dependence and nothing ruins it. When we say “ruin”, we mean observation in some cases, or non-observation in others (see discussion on information passing in other section)

For example, if YY had been observed for 1 and 2, then XX and ZZ become independent, and therefore XX becomes independent of everything after that ZZ; the chain is broken;

D-separation in terms of active paths

Back to D-separation: because we DON’T want an active path, it is quite easy to show D-separation. Just find/make a counterexample for every possible path through the graph.

The general algorithm

If you wanted to see if XYZX \perp Y | Z where X,Y,ZX, Y, Z are general sets of variables, then we need to see if each element of XX is pairwise independent from each element of YY given ZZ. To do this pairwise comparison, a BFS is the best deal

The indented bullet point needs some clarification. At every node, you check if things are blocked. Blocked means two things: if you reach a V structure and this node is not in W, or if you are not in the middle of a V structure and it is in Z. If either of these cases are satisfied, this current path is no good.

Soundness and Completeness of D-seps

We want to use D-separation to encode independencies in the original distribution pp that GG was generated from. Can we do this?

I-maps

In general, when we talk about I(P)I(P), this is the set of variable sets that are conditionally independent of each other in a distribution (can be a huge set because it is a subset of a superset)

Let’s define a similar thing for a graph:

I(G)={(XYZ):X,Y are d-sep given Z}I(G) = \{(X \perp Y | Z) : X, Y \text{ are d-sep given } Z\}

In our case, X,Y,ZX, Y, Z are all sets of nodes (can be the empty set). As such, I(G)I(G) contains all the sets of nodes that are conditionally independent or marginally independent (if ZZ is the empty set)

We call this I(G)I(G) an I-map of GG.

Soundness

It is true that I(G)I(p)I(G) \subseteq I(p). Let’s take this statement by faith right now.

This actually has a pretty significant upshot: any d-separated set of nodes will be independent from each other in the actual distribution!

Completeness

However, it is not the case that I(G)=I(P)I(G) = I(P). In other words, there can be independences that are not captured through the D-separation approach. And, to be more big-picture, these are the independencies that are not captured in the graph itself.

Why is this the case?

While it seems counterintuive at first, it’s actually pretty straightforard why d-separation is not complete. Think about how we claimed that every distribution can be expressed as a fully-connected network? It’s because some of the arrows don’t actually do something. In the table, if the table looks like this

you conclude that BB is independent from AA. And yet, in the graph, AA is still directed to BB, so there is no way to tell, structurally, that there is an independence. The moral of the story is that D-separation cares about the structure, but the structure of a Bayes net only tells half of the story.

The critical upshot is this: graphical models can add dependencies where there are none, but they can’t take away dependencies. That’s why it’s sound but not complete.

Representational power of directed graphs

Almost-completeness

The good news is that, although D-sep is not complete, it is almost complete. For example, it may be that case that XYZX \perp Y | Z in pp doesn’t imply d-separation, but if all distributions that factor over GG are independent in this way, then it must be the case that XX and YY are D-sep given ZZ. In other words, if the structure forbids dependencies, then D-separation comes automatically because it looks at the structure.

The contrapositive, therefore, is also true:

Theorem: If XX and YY are not d-separated given ZZ, then there exists some distribution that factorizes over GG such that XX and YY are dependent given ZZ.

Looking at the chart above, you see how it’s only an edge case that decouples AA and BB. If you change any of those numbers (i.e. make a new distribution with the same graphical factorization), AA and BB become dependent.

Do perfect maps exist?

Is it possible to make I(G)=I(p)I(G) = I(p)? This would be known as a perfect I-map, or a P-map. Well, it’s actually not possible to always find I(G)=I(p)I(G) = I(p). Take the noisy XOR example

In fact, it’s just impossible to express that noisy XOR in its glory at all, using a Bayes framework.

Minimal I-map

A minimal I-map is the closest we can get at times to a P-map. This is when you have some graph GG such that if you remove any edge, I(G)⊈I(p)I(G’) \not \subseteq I(p). In essence, you have the slimmest graph possible given the distribution.

I-equivalence

We compare I-maps of two graphs by doing an element-wise comparison. We call two Bayes nets G1,G2G_1, G_2 as I-equivalent if I(G1)=I(G2)I(G_1) = I(G_2)

Interestingly, two I-equivalent graphs may not be unique. For example, XYX→Y and XYX \leftarrow Y form different graphs but have the same (none) independencies.

Therefore, P-maps also are not unique.

I-equivalence and triplets

In fact, we might have observed something interesting.

In (a, b, c) of these graphs, the variables have the same independencies. Namely, that if ZZ is observed, the other two variables are independent. As such, we can freely change the directions of the arrows AS LONG AS we don’t make a V-structure as in (d). This V-structure is unique and can’t be changed.

The key upshot is that independence, minus the V-structures, do not care about arrow directionality!

Therefore, there exists many, many different maps with the same independencies. Or, to be more precise: if G,GG, G’ have the same skeleton (ignoring the arrows) and the same number of V-structures, then they are I-equivalent.

Intuitively this is true because two graphs with the same skeleton and the same V-structures will have the same D-separations.

Although it is important to be careful about the direction of logic. I-equivalent BN’s will have the same undirected connections but the converse doesn’t hold due to V-structures. Similarly, if two graphs have the same undirected connections and same V-structures, then they are I-equivalent but the converse doesn’t hold. Here’s a counterexample. Basically, the idea is that the v-structure is short-circuited by another edge, so it doesn’t matter where it is.

Here’s the key upshot: two networks with the same skeletons and V-structures have the same independencies, which means that you can’t distinguish them from data alone! You need some other forms of inductive biases. It’s the same thing about how observation can’t form causative relationships because ABA → B is equivalent to BAB→A.

Examples of I-equivalence

These two networks do not have the same I-maps because the number of V-structures are different

A review of vocab and notation

  1. An active path is a path that preserves “information” along the entire path. In other words, the “dependency” travels down the path unhindered.
  1. Two sets of variables are D-separated if they have no active paths between them
  1. The I-set contains all the sets of nodes that are D-separated (i.e. independent)
  1. This I-set is I(G)I(G) or I(p)I(p), and the set contains propositions. For example, XYX \perp Y is a proposition. It can be true or false. If X is truly independent of Y in pp, then we say that XYI(p)X \perp Y \in I(p).
  1. Two graphs with the same independencies are called I-equivalent.