Practice English Speaking&Listening with: Lec 18 | MIT 6.451 Principles of Digital Communication II

Difficulty: 0

PROFESSOR: --wind up doing a substantial part of Chapter

11, although I do tend to be over-optimistic.

So we're handing out Chapter 11, which I expect to revise

still further.

As we get towards these end chapters, these are things

I've only taught from, at most, once before, and so I

have more revisions to do than the earlier chapters which

I've used for many years.

The Problem Set Seven, as we announced last time, you only

needed to hand in the first four problems today, so we're

only handing out the solutions for the first four today.

I haven't made up any additional problems, so

Problem Set Eight is simply to do the last two problems in

Problem Set Seven, due next Wednesday.

And a reminder that we have no class on Monday because of

this peculiar local holiday we have called Patriot's Day,

when we run the marathon.

And I hope you'll all be out there running the marathon.

So enjoy the long weekend.

So what have we been doing in Chapter 10?

We've basically been getting minimal trellis realizations,

or state-state space realizations, in system theory

terms, for linear block codes.

We've been focusing on binary linear block codes, but

clearly everything we've been doing goes to block codes over

any finite field or, in fact, over the

real or complex field.

It's just basic realization theory,

minimal realization theory.

So as long as we have a linear time-varying system, it works.

I had a question after class last time, well, doesn't it

work just as well for nonlinear systems?

Can't we do the trellis in the same way?

In general, not.

If it's a nonlinear system with a group

property, then you can't.

It's really the group property that's key to these


And we seem to have pretty well cracked the problem.

We have this tool called the trellis-oriented or

minimal-span generator matrix, from which we can read all the

parameters of the trellis, from which we

can construct a trellis.

Really, we found that the trellis-oriented generator

matrix contains all the information we need to

construct a minimal trellis.

And it's just a matter of turning the crank

once we have that.

The only degree of freedom that I've allowed so far in

trellis constructions is sectionalization.

I've shown you that if we can choose where to put the state

spaces wherever we like--

typically we put them in the center and other places,

sometimes we put them everywhere we could put them,

that's called an unsectionalized trellis--

and by so doing, we can sometimes reduce the apparent

state-space complexity.

But we can never reduce the branch complexity.

So we're left with branch complexity as a fundamental

complexity parameter for linear block

codes, or so it seems.

Is there any other--


AUDIENCE: Does the complexity of the number of computations

you need to go through the trellis depends only on the

number of branches, right?

PROFESSOR: It depends primarily

on the number branches.

It also depends on the edges.

If you go through a detailed operation count, I think I

give the formula for if you count additions and

comparisons as an addition and selection you don't count as a

logical operation, then I think the total count for the

Viterbi algorithm is twice the number of edges in the graph,

which is the number of branches, minus the number of

vertices, which you lose one for every--

AUDIENCE: What do you mean by vertices?

PROFESSOR: By vertices I mean states, now I'm talking about

it as a graph.

Minus one or plus one or something.

So that's an exact count for the Viterbi algorithm.

But it's dominated by the total number of edges in the

graph which, in turn, is well-estimated by the maximum

branch complexity at any one time.

So there's a bit of a trade-off between an exact

count and just getting a gross handle on the situation.

But I say it's certainly got to exceed the branch


So when I find the branch complexity goes up

exponentially, even for one branch, then I know that the

total Viterbi algorithm complexity has to go up

exponentially with that.

All right?

So is that the whole story?

Do we now have a pretty definitive handle on the,

quote, complexity of at least maximum likelihood decoding,

trellis decoding of a block code.

And if you've read Chapter 11, you know there's one other

degree of freedom that we haven't exploited yet in

representing a linear block code.

Anyone read that far, or are you just going to wait for me

to explain it to you?

Flip the coordinates, very good.

Could that make a difference?

Yes, it certainly could.

Let's take our standard example by now, the 8-4-4 code

with trellis-oriented matrix that looks like this.

OK, and we know everything about this by now.

But suppose, say, we flip these two center coordinates.

Certainly it would be an equivalent code from all of

its n, k, d and that sort of properties.

So I want to take these two coordinates and make them 1,

1, 1, 0, 0, 1, 1, 1.

Let's suppose that's a trellis-oriented generator

matrix for the code.

I think it is.

It's got starting times here, here, here, and now here.

And it's got ending times now here, here, here, and here.


Where I introduce the diverging and merging symbols

that I used last time for starting and ending.

All right, I think that is a

trellis-oriented generator matrix.

Well, it is.

We found a matrix that has distinct starting times and

ending times, so it is.

Now what's its state branch complexity?

The state dimension is 0.

Now what are our spans now?

Two are the same, but two are larger.

OK, so now when we count, we get that the state complexity

goes 0, 1, 2, 3, there are actually four that are now

active in the center.

3, 2, 1, 0.

OK, and that's worse than we had before, at

least in one place.

If we look at the dimensions of the branch spaces, which

I'm arguing is more fundamental, then we get 1, 2,

3, again, 4, 4.

So the branch complexity is now 16 rather than 8.

It's going to be more difficult, definitely, to

decode this code with the Viterbi algorithm.

This is a more complex representation of an

effectively equivalent code.

Well, so is there--

so this is a bad permutation, it's something I

didn't want to do.

Is it possible there's some good permutation that would

reduce the complexity?

In general, when we're given a block code, traditionally at

least, we think of it as the coordinates as just being not

in any particular order.

We can write down a generator matrix, but the view is that

any permutation of the symbol times is perfectly legitimate.

Certainly, if we send it over a memoryless channel, it's not

going to affect the performance of the code, what

order we send the symbols in.

So we should certainly regard all such codes as equivalent.

But we can affect the complexity, for better or for

worse, so we should try to --

presumably, if we want a more fundamental complexity metric

for the code, we want to find the complexity of the least

complex trellis realization under all permutations of the


All right, so that's a good issue, but very little is

known about this issue.

It is known that finding the best permutation of an

arbitrary linear code is an NP-hard problem.

So this is going to be hard.

We know for some specific classes of codes what the best

permutation is.

For instance, the Reed-Muller codes are nice because it's

been proved by Kasami and others that the standard

coordinate ordering, the one that I've been using all along

based on this length doubling construction, the u-u plus v

construction, gives you the minimal trellis complexity.

All right, so we know what the best ordering is for the

Reed-Muller codes.

And in a minute, I'm going to prove what the best possible

we can get bounds--

in particular, the Muder bounds, which I'm going to

discuss next--

which bound the best possible complexity that

you could ever get.

And in some cases, we can find coordinate orderings that meet

that bound, notably in the case of the 24-12-8 Golay

code, we can find a coordinate ordering that's very closely

related to the Reed-Muller ordering such that

the bound is met.

And so we know that we've found the best

complexity for that code.

It couldn't possibly be any better.

But apart from certain very nice cases like those, we

don't know.

For instance, take an arbitrary BCH code.

What's the best coordinate ordering for that?

We don't know.

So this is the open question in this subject, and it's now

been open long enough and enough smart people have

looked at it, and we've got this NP-hardness result, so

that it looks like it's going to stay open indefinitely.

All right, let me discuss, however, this Muder bound,

because it's simple and sometimes definitive.



PROFESSOR: Minimal branch complexity or any other

measure that you like.

They all are quite fungible, it turns out.

OK, The Muder bound.

Let's look at a particular code and suppose we just know

its parameters, n, k, d.

Let's ask what the minimal trellis complexity could be

for a particular coordinate ordering, say, that divides --

we have a certain number of past symbols, call it n_p, and

a certain number of future symbols, n_f.

And I'm going to ask what the minimum state complexity, or I

could also ask what's the minimum

branch complexity for--

let me do now a state complexity, going back to

where we started from, for a a certain division in the past

and future.

To be concrete, let me take the 24-12-8 Golay code.

It turns out that it's been proved that all codes with

these parameters are equivalent.

So we can just call this code the binary Golay code.

And let's look for any division into 16 future

coordinates and 8 past coordinates.

So we don't know what the ordering is going to be, but

we could say a few things.

Remember how we get the state complexity.

We look for what's the dimension of the past subcode.

So here's a generator for the past subcode here, the code

whose support is the past.

We can ask what the dimension of the future subcode is, and

then the dimension of the state space is the dimension

of what's left.

So it's dimension of c minus the dimension of the past

subcode minus the dimension of the future subcode.

State-space theorem.

OK, but we know a few things about this code.

We know what its length is and we know what its minimum

distance is.

So for this particular case, we have that c past is an

8-something-8 code, because its minimum distance has got

to be at least as great as this.

It's a subcode of this code.

So any non-zero code words in this past subcode have to have

minimum weight of at least 8.


All right.

And similarly, the future subcode, in this

case, is a 16-k --

let's call this past --

k-future-8 code.

At least 8.

So that imposes some bounds on the dimension of this code.

How large could the dimension possibly be?

All right, anybody?

For the past subcode its clearly, we have that k_p less

than or equal to 1.

We could have, at most, one generator here.

And if there were one, it would have to be

the all-1 code word.

Or the all-1 symbols here, because it has

to have weight 8.

So that's all we can get.

For the future code--

well, I guess you don't know this, but you can guess that

the Reed-Muller code with length 16 and minimum distance

8 is the best there is, that distance 16, and that happens

to be true.

So k-future can't possibly be better than 5, which is the

dimension of the Reed-Muller code with those parameters.

For more general cases, you have to look up tables on

bounds on the best codes that have ever been found.

But certainly for code lengths less than a 100, by this time

we pretty much know all the best binary codes.

So we can look up in a table what's the best code of length

16 and distance 8?

Well, it has five information symbols, dimension five.

So we couldn't have more than one dimension here and we

couldn't have more than five dimensions here, so the

dimensional code itself is 12.

We have to say that the dimension state-space is now

going to be greater than equal to--

this is k_p minus 1 minus 5--

so, that's 6.

So there have to be at least 6 generators down here that have

support neither wholly on the past nor wholly on the future

and therefore have to be included in the state-space.

All right, that's the idea of the Muder bound.

Now, we could do this for every single state-space.

We simply need to know what the dimension is of the best

code of each length with minimum distance 8.

And we can do a similar thing for every possible partition

in the past and future.

And that will give us a set of bounds, and we go through that

exercise in the notes.

And I can write it on the board, maybe for memory.

And OK, that tells us what the best possible

state profile is.

Let me guess state dimension profile is 0, 1,

2, 3, 4, 5, 6, 7.

Then it goes down to 6, 7, 8, 9, then it goes down to 8.

This is at the central state-space.

This is for dividing it in the way that we just did.

Let's do this here.

For the central state-space, what's the best k_p for a 12,

k_p, 8 code?

Let's make the past and future both 12.

Anyone want to guess what the best dimension is?

Let's think about it.

Let's think of a code that's going to have to have

minimum weight 8.

So let's start out with g1.

And we'll just make that look like that.

It's going to have length 12.

It's going to have to have at least eight 1s, so let's make

them like that.

We want to add another 1 so that the total code has a

minimum distance 8.

Subject to up to permutations, there's only one

thing it can be.

We need another weight-8 generator, and it needs to

differ from this in at least four places so that when we

take the mod 2 sum of these two, it has distance 8.

And we want g1 plus g2.

This is really the only way it can be.

Are you with me?

We have four code words in this dimension 2 code.

And if they're going to have minimum distance 8, they're

going to have to look like this, subject to up to


So from that little argument, we conclude that, in this

case, this is less than or equal to 2.

This is less than or equal to 2, the Muder bound becomes

that the dimension of the state-space has

to be at least 8.

And so in the same way, and then it becomes symmetrical on

the other side.

So I'm pretty sure I get it right.

And from memory, that's the best possible state dimension

profile, dot dot dot.

And we actually know a coordinate ordering for the

Golay code, a generator matrix for the Golay code, that gives

this state dimension profile.

So from that, we can conclude that we know an optimum

coordinate ordering for the Golay code.

I just want you to get the essence of the argument here.

You with me?

For branch spaces, the argument is much the same.

Remember in branch spaces, we basically --

let me say it in a slightly different way than we did

before -- we take the past to be up to time k.

And we have the symbol time, which is just one time in here

for the actual symbol we're looking at.

And the future is one unit shorter than it was for the


So when we compute the branch dimension, the

past goes up to k.

So this is 0 and before k.

And the future goes from k plus one up to n.

So this is what the matrix looks like for this.

And the effect over here is that the future subcode now is

one unit shorter.

So let's do the same calculation here.

If we look at a time where we have 12 in the past, this one

time, the 13th symbol time that we're actually looking at

and then 11 more in the future, the past subcode,

again, could have dimension up to 2 because it has length 12

and minimum distance 8.

The future subcode now has a shorter length and by this

little argument I went through here.

If we take out one of these coordinates, it's

not going to work.

So if we only have length 11, the future subcode is the

dimension --

still can only be 1.

We can't possibly add another second generator in there.

And so the dimension of the branch space has to be vastly

greater than or equal to 9, at this point.

OK, so the Golay code has branch complexity of at least

512, we conclude.

And, in fact, this best possible branch dimension

profile looks like this.

The branches are in the middle here.

They're 1, 2, 3, 4, 5, 6, 6.

Basically we have generators which look like this.

And then there's one that comes down, there are three

more that goes up and one that comes down.

And from that, we get 1, 2, 3, 4, 6, 7, 7, 8 --

that can't be right.

Goes down 7, goes up, it's still 7, then it goes up to 8,

9, and 9, 9, 9.

Something like that.

That one's probably wrong, and symmetrically.

Where this 9 is the one we just calculated.

Similarly, if you come down to the neighbor of this 6, it's

got a merge on one side, a diverge on the other

side, and that's 7.

OK, so we conclude that the Golay code has branch

dimension not less than 512 within 9.


AUDIENCE: Suppose we take past and future separated by 12, 12

in the past, 12 in the future.

Suppose we did it as they did in [UNINTELLIGIBLE] state?

PROFESSOR: OK, but that's not the way we do branches.

We've got to isolate the symbol time.

AUDIENCE: Suppose we took 12-branch-12 and we calculate

the state complexity, that's 8.

And then we took 13 and 11, we get the state complexity 9.

PROFESSOR: All right, you're saying if we do this, we

should get the state complexity 9.

And that's correct, we do.

AUDIENCE: The maximum of those two numbers, is it the same as

branch complexity?

PROFESSOR: It turns out to be in this case.

AUDIENCE: Not always?

PROFESSOR: Not always.

But if we go to 14, 10, then it turns out we can add

another one here because we can add another generator that

looks like something like this.

1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1.

So if we have two more units, we can add another generator.

Sort of Reed-Muller style.

And so this goes up to 3 and this goes down to 8.

AUDIENCE: So the branch complexity has to be at least

the maximum of those two, right?

PROFESSOR: Obviously, right.

And really, together these give you the story of where

the thing has to diverge and merge.

In this case, this is a self-dual code, so you only do

one at each time.

So with this sort of thing, you can

pretty well pin it down.

Again, I don't know how far to take this lecture.

I'm really just trying to get to you the idea of the Muder

bound from this kind of argument, therefore we can get

a bound on the best possible state dimension profile and

the best possible branch complexity profile.

If you compute this bound and you're able to find an actual

generator matrix that meets this bound, you know you've

found the best one.

But this rarely happens, this is a very special code.

It's not the way you prove the optimality of the longer

Reed-Muller codes, they're not good enough to --

this really assumes that you can get the best possible

sub-codes at every point.

And you can imagine that's very hard to do.

It would be a very special case where you can optimize

this for every single possible cut between past and future.

OK, so that's the Muder bound.

And you have a homework exercise on that as well.

Oh, there's another case where we can find what the best

possible trellis complexity is.

It's MDS codes over GFq, and you're going to do that on the

homework as well.

And there, we basically show that an MDS code, its n, k, d

are such that it has to have the worst possible state

complexity and branch complexity.

So because it's such a tightly-packed code, in

Hamming distance sense, it forces you to have the worst

possible complexity that you could have.

In other words, you have diverges immediately and as

long as you can, and then they're all packed together on

the left side and the merges are all packed together on the

right side, and that's the only way it can be.

I'm just speaking very impressionistically.

But you'll see what I mean when you get to the homework,

if you haven't done it already.

OK, so we also know MDS codes, we know their trellis

complexity and it's terrible.

We can't get any effective reduction.

Really, over exhaustive decoding methods with MDS

codes, we find that the trellis complexity is like

either q to the k or q to the n minus k,

depending on which is less.

So that's as bad as it could possibly be.

So if the trellis doesn't help for MDS codes, in general

cyclic codes --

which were subject to a lot of study -- if we put the code

into cyclic form, which if you remember Reed-Solomon codes

and BCH codes can always be put into cyclic form, at least

the shortened versions of them.

In that case, the generator matrix looks

like this one up here.

It looks like x, x, x, x, x, x, x, 0, 0, 0, 0, 0, 0, 0, x,

x, x, x, x, x, x, 0, 0, x, x, x, x, x, x, x.

In other words, the generators are just cyclic

shifts of one another.

Down to x, x, x, x, x, x, x, 0, 0, 0, 0.

That is a generator matrix that is a trellis-oriented

generator matrix because all the starting times are

different and all the stopping times are different.

So this is the trellis-oriented generator

matrix for cyclic code, and it's in the coordinate

ordering in which it is cyclic.

And it's always going to have this worst case -- all the

branch diverges are going to occur over here and you may

get nothing for a while.

And then all the merges over here, which if it's a rate

less than 1/2 code, this means that the trellis complexity is

basically going to be 2 to the k.

Or if it's a high-rate code, it means it's going to be 2 to

the n minus k.

And again, you get no improvement over some

exhaustive minimum distance decoding, or maximum

likelihood decoding.

So cyclic is very bad from a trellis point of view.

So I think that's about all the general

comments I can make.

The very last thing that I want to mention is

we now want to --

all right, so through this nice little theory, we've got

a way of trellis decoding (maximum likelihood decoding),

via the Viterbi algorithm, of block codes.

And now, did we achieve any improvements in terms of

performance versus complexity versus the convolutional codes

that we looked at previously?

And the basic answer is, no.

If you compare the tables,

convolutional are always better.

I'll just say it.

You can imagine that there have been long debates in the

fraternity, and that certainly some of the motivation for

trellis decoding of block codes was, oh, gee, now we

have a good way to decode block codes, just like we did

for convolutional codes.

And maybe now we'll find that block codes are worthwhile.

But the answer is they're not.

I should put this comparison into Chapter 10.

You can actually get this comparison by taking the table

-- for instance, for Reed-Muller codes.

In terms of performance versus complexity, Reed-Muller codes

are very good.

I would say as a general statement, they're the best

block codes that we know of, in terms of performance versus

trellis decoding complexity, because they have this nice

structure that gives them very low-complexity trellises for

their n, k, d parameters.

So in this sense, it's a fair comparison.

Reed-Muller codes are the best general class we know of in

terms of trellis complexity.

We compare them against binary convolutional codes and

they're not nearly as good.

I've already talked about the comparison.

It seems like a very fair comparison to take our rate

1/2 four-state convolutional code example, which had

distance 5, which had not only a nominal coding gain of 4 dB,

but it also has an effective coding gain of 4 dB.

And for a block code, this 8, 4, 4 code is a very fair


It's also rate 1/2, so it has the same spectral efficiency.

It also has a four-state trellis that looks very much

like a rate 1/2 convolutional code trellis.

But it turns out it only has minimum distance of 4 rather

than 5, so its nominal coding gain is only 3 dB

rather than 4 dB.

Its effective coding gain is less.

Its effective coding gain is 2.6 dN because it has 14

nearest neighbors, which is a 3 1/2 nearest neighbors per

bit, and that costs you about 4/10 of a dN.

And so really, you compare the performance of the

convolutional to the block code, that's very much better.

Another head-to-head comparison might be between

the 32, 16, 8 Reed-Muller code.

That's a rate 1/2 code.

It's a very good one, it's known to be the best with

those parameters, n, k, d.

So what's its nominal coding gain?


It has rate 1/2, minimum distance 8, its

nominal coding gain is?


6 dB.

You take the rate times the minimum distance--

1/2 times 8, you get 4.

So that's the nominal coding gain.

A factor of 4, which in dB terms is 6 dB.

But again, it has lots of minimum-weight code words.

It has 620 minimum-weight code words.

And therefore, the number of minimum-weight code words per

bit is 39, which is between five and six factors of 3, so

it costs you 5 and 1/2 times 0.2 dB in terms of effective

coding gain, which brings you down to an effective coding

gain of 4.9 dB.

I assume you're all very fluent in

doing these, as I am.

And what's its complexity?

If you look its state complexity, let's take--

we're looking at the 32, 16, 8.

And all the Reed-Muller codes have four section trellises,

you proved that this week.

If you just measure the state complexity of four places,

equally spaced with four equal sections, then all the

trellises wind up looking like this, for a general

Reed-Muller trellis --

a four-section trellis.

And for the 32, 16, 8, it turns out you get eight

parallel sections like that with eight of these each.

In any case, it's 64 states at each of these boundaries.

So you could say it's a 64-state trellis.

But if you're more honest, just exactly as with the Golay

code calculation we went through, you find that the

maximum branch complexity is 512.

So branch complexity is 512.

OK, and its nominal coding gain we said is 6 dB and its

effective coding gain is 4.9 dB.

OK, so let's look for a convolutional code of rate 1/2

that has a coding gain of about 5 or 6 dB.

And how far do we have to go?

We actually only have to go up to a--

well, a 32-state code has an effective

coding gain of 5.6 dB.

A 64-state code we can get up to an effective coding gain of

about 6 dB.

In fact, the 64-state rate 1/2 convolutional code was a

standard for a very long time, first in space communications

and then in much more general applications.

This was the code that Linkabit Corporation --

Jacobs and Viterbi --

rode to fame and fortune, made their first pile before they

went on to found Qualcomm, where they made

a much bigger pile.

And so this was their--

they call it k equals 7.

But in my terms, nu equals 6, a 64-state rate 1/2

convolutional code.

So let's take that rate 1/2 64-state convolutional code,

just with the standard trellis.

So its branch complexity is 128 at every time.

It has a more regular structure than this guy.

Its nominal coding gain is --

there are actually two of them, and I don't remember.

There's one of them that has a nominal coding gain of 7, the

other one is 6.5.

So it's at least 6.5 dB.

And its effective coding gain is still 6.1 dB and it's

clearly much better than this.

If you kind of try to plot a graph of this, you'll see that

the effective coding gains of convolutional codes with the

same state complexity is at least one dB better, as a

general trend, than that of block codes.

And of course, if you do it in terms of branch complexity,

it's an even better comparison.

So what happened was there was about a decade of effort on

basically finding good trellis representations of block

codes, and the net of it was that they couldn't get

anywhere close to finding anything as good as

convolutional codes.

Although, it's a nice little piece of theory.

And sometimes you want to block codes for other reasons.

Sometimes you want short blocks.

Sometimes you're limited to a block of less than 100.

And in that case, you're certainly not going to do

better in terms of n, k, d than the best block code of a

length 100.

Why are convolutional codes better?

First of all, they obviously naturally are well-adapted to

the trellis idea.

They are linear time invariant machines that just have very

nice, regular trellises.

So they're naturally adapted.

But also, they have exactly the right--

a rate 1/2 code is diverge, merge, diverge, merge,

diverge, merge forever.

If you want to make it into a block code, you can terminate

it or you can tail-bite it, which is something we're going

to talk about shortly.

But whichever way you do it, to make it into a block code,

you lose something.

The termination, you don't lose minimum distance, but you

do lose rate.

You have to put in some dummy information symbols, so the

rate becomes less than 1/2 if you terminate this code.

Nonetheless, as I believe I commented earlier, that's a

good way to come up with good block codes, is to terminate a

good convolutional code.

If you choose the parameters right, you can come up with

pretty good codes.

But nonetheless, in terms of block code parameters, you're

not going to do better than what 50 years of block code,

algebraic coding theory has come up with.

So for short block codes with length less than 100, really

the best we know of right now-- let's say 128--

is to pick a good Reed-Muller code or BCH code, find its

best trellis and do maximum likelihood decoding.

It used to be that you thought of 64 states as a lot of

states, but clearly, technology has progressed.

And now you wouldn't blink at 1,000 states or maybe even

8,000 or 16,000 states.

The biggest Viterbi decoder that was ever built had 2 to

the 14 states for a space program out of JPL.

So anyway, it's doable now.

If you needed the best possible performance and you

were constrained in block length, you'd pick the best

block code at the rate that you wanted, and the best

minimum distance at that rate, and the best effective coding

gain, and you'd do maximum likelihood decoding with

trellis decoding.

But if that's not usually the situation --

the situation in data communications, because

usually more you have a stream of data or a packet length of

at least 1,000, and then it would certainly be better to

use convolutional codes.

And trellis decoding is --

as long as we're in this class of techniques.

Where we're finally going in this course is to the new

codes that have been developed in the past decade,

capacity-approaching codes, turbo codes, low-density

parity-check codes.

These are simple enough to do so that as soon as you get

past the small codes, length less than 100, you would

switch over to that track.

You would use a capacity-approaching code.

But we don't know about them yet.

I guess that's the end of my lecture on this, so it's a

very good place for questions about state-of-the-art, where

we've been, where we're going.

The floor is always open.

But that was all I was going to say about Chapter 10.

Chapter 11, Codes on Graphs.

And now we begin our final stretch drive towards

capacity-approaching codes, which is going to be the

subject of the next three chapters.

Chapter 11 is sort of the beginnings of codes on graphs.

We're going to look at some elementary representations.

One of the ones we're going to look at is a trellis


That's one of the reasons we've been going through this

theory, is we're going to, again, encounter trellises as

a natural, simple, cycle-free, graphical representation that,

in some sense, is about as good as you can ever do, as

long as you don't allow cycles in graphs.

But then we're going to go on to graphs with cycles, which

is really the way to understand these

capacity-approaching codes.

So in this chapter, we'll start the subject, we'll

introduce graphical representations of codes in

more general terms.

In Chapter 12, we'll talk about the generic decoding

algorithms for codes on graphs called sum-product and min-sum

decoding algorithms.

And then in Chapter 13, we'll talk about the actual classes

of capacity-approaching codes that people have developed,

what their graphs are, how you decode them --

I hope in enough detail so that you'll get a good sense

for how all this works.

So that's where we're going.

What are we talking about when we're talking

about codes on graphs?

There are many styles of representations of codes, and

at this point, for linear codes, we have seen a number

of ways we can characterize a code.

One of the ways is just by giving a set of generators for

the code, k generators for an n, k code.

And that sort of characterizes the code.

Another way is to basically give the generator matrix for

the dual code, the h matrix, which becomes the parity-check

matrix for the code.

So we can also characterize the code by a

dual-generator matrix.

That's a particularly efficient way to do it if you

have a high-rate code like a single parity-check code, it's

better to say that the dual code is the repetition code

and it's the set of all code words that are orthogonal to

the all-one code word.

That's the simplest characterization of the single

parity-check code.

In other words, that the sum of all bits is equal to 0.

That's what orthogonal to the all-one code word means.

So that's a simpler representation than giving a

generator matrix formed for the single parity-check code.

Or now we have this trellis representation, which

certainly looks like a more graphical representation.

It certainly characterizes the code in a very direct way.

What does a trellis do?

It basically displays all the code words on a graph.

There's a 1:1 correspondence between the set of all paths,

from the root to the n-node in the graph and the code.

So that's another way to characterize the code.

So now we're going to go to higher level graphical


As a first step, we talk about behavioral realizations.

Again, a term from system theory, closely identified

with Jan Willems, who promoted this way of representing

linear systems.

In linear system theory terms, the basic idea in behavioral

systems theory is you describe a system by the set of all its

possible trajectories.

What are all the things it can do?

And now, how do you characterize the trajectories?

Often, you characterize them by a set of local constraints.

You can think of it as equations that the system has

to satisfy.

Ordinarily, they're differential equations or

partial differential equations or something.

And if it satisfies all of those constraints, then it's a

trajectory that satisfies all these partial differential

equations, is a valid trajectory.

So you can set up the whole system that way.

So it characterizes a system by its trajectories.

That's the fundamental thing.

And it characterizes trajectories --

legitimate, valid trajectories --

by local constraints.

Let me get a little bit more concrete.

In coding terms, by trajectories we

just mean code words.

So this is a very compatible notion with the way we've been

talking in coding.

What is a code?

It's simply a set of code words.

So if we simply say, what are all the possible code words?

We have to find all the trajectories of the code, and

we have a very explicit example of that in trellises.

We've called the paths

trajectories through a trellis.

What is a code?

It's simply the set of all valid code words.

And how do we characterize the valid code words?

Well, in each of these styles --

the generator, parity-check, trellis style --

the code words are the code words that satisfy certain

equations or constraints.

For instance, the generator representation --

we've said a code is the set of all uG, such that u is in a

field to the k.

That's a generator matrix style of representing a code.

In other words, we can say it's the set of all y, such

that y equals uG for some u in Fk.

Or we can say it's all y, such that y minus uG equals 0 for

some u in Fk.

So it's the set of all y that, together with some auxiliary

variables which are not part of the code word --

here they're input variables, if you like, or information

bits, information symbols.

We'll come to think of these as state variables because

they're hidden, they're not visible, they're just an

auxiliary part of the description.

They're code words that, together with some auxiliary

variables that, in this case, can be freely chosen, satisfy

a certain set of linear homogeneous equations.

Something that's even better in the behavioral style is a

parity-check representation.

Let me put that up.

In a parity-check representation, we say the

code is the set of all y, such that y H transpose equals 0,

where H is the generator matrix of the dual code.

This is sometimes called a kernel representation.

y is in the kernel of this linear transformation.

We see H transpose is a linear transformation from n-tuples

to n minus k-tuples, and the ones whose image is 0 under

this transformation, the ones that map to 0 are legitimate

code words.

So this is a kernel representation, this is an

image representation.

Here, we view the code as the image of the linear

transformation, which is characterized by g, where the

inputs are free.

But the parity-check representation is much more

explicitly, we don't need any auxiliary variables here.

It's simply the set of n-tuples that satisfy a

certain set of linear homogeneous equations.

So this maybe is a place where we start and, of course, it's

the place where Bob Gallager started when he did

low-density parity-check codes, which are a principal

class of capacity-approaching codes.

Then in either case, we characterize the code by, in

one case, directly satisfying some equations.

In the other case, here we say that the total behavior is the

set of all y and u, such that y minus uG equals 0.

So we're going to take the set of all n-tuples and k-tuples,

such that y minus uG equals 0.

And we say that's the total behavior of the code.

The set of all y's and u's that satisfy this equation is

the total behavior.

And then we'll say the code word is just the projection of

the behavior on the y's.

In other words, it's the set of all y's, such that this is

satisfied first some u.

And that's an equivalent description.

So the general setup, going back and forth, is that we

have the observed symbols.

And in the coding case, this will always just be the n

symbols in the code word.

These are the ones that we really care about.

But we're also going to have some hidden or auxiliary

symbols or state variables, hidden variables, which, in

this case, for instance, are the u's.

And we can introduce these at our convenience, just to make

a good realization.

So we have two types of symbols: the observed, or

external variables--

sometimes these are called external

variables, internal variables.

Again, in system theory, you're

consistently seeing this.

There are some that we can actually

observe, interact with.

These are the observed symbols, and then there are

internal variables to the system, very often called

state variables, which we can't directly see but which

are part of the description of the system, as in this case.

The whole system, we find the set of all internal and

external variables that satisfy a set of constraints.

But we only care about the patterns of observed variables

that are consistent with some set of internal variables.

So the final element here in this general setup is

constraints, which in this case are simply linear

homogeneous equations on subsets of variables.

So let me go through similar kinds of examples,

as I do in the notes.

But let me maybe build up from the simplest one first,

parity-check representation of our favorite 8, 4, 4 code.

In this case, the parity-check matrix is the same, we can

take it to be the same as the generator matrix.

Since, by now, we like trellis-oriented generator

matrices, we'll take that one.

So to get very explicit, so the code is the set of all y,

such that y H transpose equals 0, and y is 8-tuple.

So it's the set of all code words that satisfy these

parity-check constraints.

The first one is y1 plus y2 plus y3 plus y4 equals 0.

That's orthogonality to this first code word.

The second one is y2 plus y4 plus y5 plus y7 equals 0.

The third equation is y3 plus y4 plus y5 plus y6 equals 0.

And the fourth is y5 plus y6 plus y7 plus y8 equals 0.

If I have any 8-tuple y, such that the elements of the

8-tuples satisfy these four equations, then it's a code

word, if and only if.

A very explicit description of the code.

Now let's go onto a graph of behavioral realization.

So it's very natural to draw the following kind of graph of

this realization.

We let the y's --

y1, y2, y3, y4 --

sorry, I've changed the index set, haven't I?

I get 1 through 8 now.

I'm just going to make these into vertices of my graph.

So here we have symbols, and I'm going to make a bipartite

graph, and over here I'm going to have equations, or

constraints, or checks.

And I have four checks, which I'm going to draw like this.

The first check is that y1 plus y2 plus y3 plus y4 equals

0, so I'll just draw that like this.

That means the same thing as this.

The second one is y2 plus y4 plus y5 plus y7 equals 0.

And the third one is these four --

two, three, four.

And the last one is these four.

This, this, this, this.

It's done more nicely in the notes.

This is called a Tanner graph, after an extremely good paper

by Michael Tanner in 1981, which was about the only paper

of any consequence in this subject between Gallager's

thesis in 1961 and the rediscovery of low-density

parity-check codes around 1995.

That's a graphical picture of these equations.

And what do we think of?

You can think of testing an 8-tuple.

Take an arbitrary, binary 8-tuple, think of these as

memory elements, put the bits of that 8-tuple in here, and

then ask if all these checks are satisfied.

Since we're talking mod 2 arithmetic, we're basically

asking if there are an even number of 1s attached to each

of these checks.

If and only if that's true, we've got a code 8-tuple.

From the set of all 256 8-tuples, we find 16 that

actually satisfy these checks, and that's the code.

That's one way of describing the code.

So that's how the graph is associated with local


Well call them local, each of these constraints, because

each of them only involves four of the symbols.

In a much larger graph, a low-density parity-check code,

the idea is that there are not very many symbols that are

checked by each check.

The graph is sparse, in that sense.

But we're not quite there yet.

So that's pretty simple.

Let me do a generator representation.

This is going to be a little bit more complicated because

we have these auxiliary input variables.

But that doesn't really complicate it very much.

I keep using this 8, 4, 4 code just to emphasize that a

single code can be described in many different ways.

We're going to wind up with half a dozen different

graphical representations of this code, and they can be

grouped according to style.

So again, we could use our favorite generator matrix for

our favorite code.

It looks like that.

Now what do we mean?

What are our constraints now?

Well, now we have some hidden variables, ui, and we have y1

is equal to u1.

We're going to basically multiply this by u1.

I've got some u1, u2, u3, u4 multiplying this.

So y1 is u1, y2 is u1 plus u2, y3 is equal to u1 plus u3, y4

is equal to u1 plus u2 plus u3, so forth.

y5 is u2 plus u3 plus u4, y6 is u2 plus u4.

I jumped one here.

This is u3 plus u4, y7 is u2 plus u4, and y8 equals u8.

And you will agree that the code is described, if I can

find any pair, (y,u), such that this is satisfied, then y

is an element of the code.

I can choose u freely, that is the input.

If I find a y at any u, such that these equations are

satisfied, then I've found a code word.

And that's an if and only if.

So that's another characterization of the code.

And its Tanner graph looks like this.

Here in this case, we have hidden variables.

So let's put the inputs--

or, we'll see they can also be thought of as states, but

they're hidden variables that we don't actually

see in the code word.

u1, u2, u3, u4.

You can think of these as free-driving variables.

And to get y1, here's what we do.

We're going to take y1 and we're going to create a mod 2

sum of some of these inputs.

In this case, it's just u1, so I can draw it

as a straight through.

But in general, I'm going to have to make a combination to

get all of these.

So I have eight symbols over here --

code symbols --

and again, some constraints that have to be satisfied.

Equations, mod 2 sums, y8.

And I just, again, draw a picture.

y2 is u1 plus u2, so u1 plus u2.

these sum to give y2.

And so, again, if I put down some u's, put down some y's,

test whether all these sums are correct, then I will have

tested whether I found a valid behavior or trajectory.

Next one is u1 plus u3.

The next one is u1, u2, u3.

The next one is u2, u3, u4.

The next one is u3, u4.

The next one is u2, u4.

And the last one is just that one.

OK, so that's another graph whose constraints

describe the code word.

Here it's helpful to start to introduce the following


We fill in the circle of the variables that we actually

want to see, and we leave these open, the free

variables over here.

Some people would call it a generalized Tanner graph.

In the notes, I just call this Tanner graph 2.

Tanner didn't actually consider state variables.

This was introduced into graphical models by a guy

named Wiberg in his PhD thesis in Sweden in about 1995.

And that was a huge contribution, to have hidden

variables as well as observed variables, because otherwise

you couldn't really draw a generator representation as a

graph, or a trellis representation.

You need hidden state variables.

So those are two graphical styles, and the idea here is

that every set of 12 variables, such that these

eight equations are satisfied, gives you a

legitimate code word.

Here you can easily think of this as --

you notice that we've described these as passive


It's not an input-output, block diagram type of


It causes cause-effects.

In this case, it implicitly is.

How would you actually implement this generator


You implement it by picking four input bits and then

seeing what code words they generate.

In other words, there's a very definite input-output relation

here, where we could make this into a directed graph by

drawing arrows here.

In general, in our graphical models, we're

not going to do this.

The behavioral style is very much not in this spirit.

But in this case, we clearly can.

So if you actually wanted to generate all the code words --

that's why it's called a generator representation --

you simply jump through all 16 possibilities here and you

generate all 16 code words.

Or if you wanted to do a simulation, generate code

words, this is the way you would do it.

This has easy cause and effect style.

The kernel representation is much more implicit.

What is the cause and the effect here?

In this case, it's actually possible to find a cause and

effect representation.

It turns out that these four bits can be taken as

information bits, or these four places form an

information set.

So suppose we choose these four bits, and it turns out

that for this graph, we can trace through as follows.

These three bits going into this zero-sum,

these all go in.

That determines this output, right?

d minus 1, if d is the degree of the zero-sum node, any d

minus 1 inputs determine the last output, because the

parity-check has to check.

So that determines this one, which then determines y4.

Now, knowing that, we know all but one input here, because I

also have this one.

And that determines this, and I think it probably also

determines this.

I now have three of the four inputs here,

so I can get this.

And anyway, by tracing through it I can create a directed

graph that gives me a cause and effect

relationship from here.

But it's much more implicit and it can't always be done.

Sometimes you can find an information set that works

with these particular set of equations and sometimes you

can't, sometimes you get stuck.

We can come back to this when we talk about binary erasure

channel and stopping sets.

In this case, given these four, you can determine the

remaining four.

If you think of it in equation terms, if you're given y1, y2,

y3, you can basically go through a Gaussian elimination

of these equations and find y4, y6, y7 and y8.

So you can regard y1, y2, y3, and y5 as inputs.

This is much more the behavioral style, where it's

just simply any n-tuple that satisfies the constraints is a

legitimate n-tuple.

This is more cause and effect style over here.

Since we're at the end, let me just introduce one more thing.

This is a bipartite graph.

We basically have a graph with two types of nodes, one

representing variables, one representing constraints.

And the edges always go between a variable and a


So that's what we mean by a bipartite graph.

There are two types of vertices, and all the edges

connect one type of vertex to another type of vertex.

So is this over here.

I've drawn it in a way where it's not so obvious.

In a generalized Tanner graph, you'd put all the variables

over on one side and the constraints over

on the other side.

In this case, these observed variables go naturally with

the constraints, so I put them over here.

But I could have put them over here and you would see this is

also a bipartite graph.

So this is the general character.

What do we think of the edges as doing in this graph?

We think of the edges as conducting the variables that

they're associated with.

So all of these edges, in fact, convey u1 to this


We could, if we wanted to, label all of the edges with

their associated variables.

That's what they mean.

They mean here that u1 equals y1, or that u1

plus u2 equals y2.

So the edges just serve for communication in this graph.

There's another style of graph, which I introduced, and

therefore, people --

not me -- but other people have called it a Forney graph.

I call it a normal graph.

The difference is that basically the vertices all

equal constraints, and the edges represent variables.

And it has certain advantages that have led to it being used

more and more.

Let me just indicate how we go from a graph of this style to

a graph of this style.

We basically just make all the variables into equality


So in this case, I draw y1, y2, and so forth.

I put a little dongle on them to indicate that they

communicate with the outside world.

They are observed, external variables.

And then I put a little equality constraint that says

all of the variables attached to this constraint have to be

the same, which is really doing the same thing.

Otherwise, the topology of the graph looks the same.

These are, again, zero-sum constraints, or parity-check

constraints, and the connections are

done just as before.

I forget.

I'm not going to do them all out.

The only change, in this case, is to replace external

variables by an equality constraint in this.

So the equality constraint basically says this edge still

equals y1, this edge still equals y2.

That's what the equality constraint means.

Everything tied to that equality

constraint has to be equal.

So we can think of these as just replicas of this original

variable out here.

So far, you don't see any reason to prefer this.

If I do it in this graph, then, again, I get equality

constraints here.

I get zero-sum constraints, again, over here.

Eight of them.

And in this case, I can just represent the external

variables directly by these little dongles.

And again, I have the same graph here.

You still probably don't see that there's very much

difference and, in fact, I'd agree.

There's hardly any difference between these two styles.

So forth.

This is the normal graph that's equivalent to this

Tanner graph, it's obtained in the same way.

If I wanted to do it more painstakingly, I would put

equality constraints for each of these, I'd put little

external variables, and then I'd see I don't really need

the equality constraints.

So I can just compress it into y1 through y8.

The advantages, I will say, of the normal graph will appear

as we go along.

The main advantage is that you can prove a very nice duality

theorem for normal graphs --

that basically shows the duality between generator and

parity-check constraints.

But to say anything about that right now would just

hand-waving, so we'll come back to this next week.

See you in a week, have a nice weekend.

The Description of Lec 18 | MIT 6.451 Principles of Digital Communication II