# Practice English Speaking&Listening with: Lec 17 | MIT 6.046J / 18.410J Introduction to Algorithms (SMA 5503), Fall 2005

Normal
(0)
Difficulty: 0

We're going to talk about shortest paths,

and we're going to talk about shortest paths for three

lectures. So, this is a trilogy.

Today will be Shortest Paths One.

I've been watching far too many versions of Star Wars this

weekend. I saw the musical yesterday,

matinee. That was an MIT musical.

That was fun, of all three movies in about

four hours. That was a bit long and then I

saw the one-man show on Friday. One-man Star Wars:

the original three movies in one hour.

That was the opposite of too long.

Both were fun. So I get my trilogy fix.

The New Hope, and we're going to talk about

the shortest paths problem and solve one particular problem of

it, a very interesting version. And then we're going to look at

increasingly more general versions as we go on.

Shortest paths are sort of an application of dynamic

programming, which we saw last week, and greedy algorithms,

which we also saw last week. So, were going to build that

and get some pretty interesting algorithms for an important

problem, which is how to get from Alderon to,

I don't know, Cambridge as quickly as

possible, OK, when you live in a graph.

So, there's geometric shortest paths which is a little bit

harder. Here, we're just going to look

at shortest paths in graphs. Now, hopefully you all know

what a path in a graph is. But, so, very quick review in

particular because we're going to be looking at weighted

graphs. So, the usual setup:

suppose we have directed graph, G, have some vertices,

some edges. We have edge weights,

make it a little more interesting.

So, this is just a real number on each edge.

So, edge weights are usually given by function,

w. For every edge,

you get a real number.

And then, if we look at the paths in the graph,

so we're going to use some simple notation for paths called

a path, p, starts at some vertex, and it goes to some

other vertex, and so on.

Say the last vertex is v_k, and each of these should be a

directed edge in the digraph. So, this is a directed path.

It has to respect edges in here.

And, we'll say that the weight of such a path is just the sum

of the weights of the edges along the path.

And, we'll call that w(p). This is sum,

i equals one to k minus one of w(v_i, v_(i+1)) plus one.

OK, so just to rub it in, and in particular,

how general this can be, we have some path,

it starts at some vertex, there's some edge weights along

the way. This is some arbitrary path in

the graph, in some hypothetical graph.

OK, this is mainly to point out that some of the edge weights

could be negative. Some of them could be zero.

This sum here is minus two. So, the weight of this path is

minus two. And, presumably,

the graph is much bigger than this.

This is just one path in the graph.

We're usually thinking about simple paths that can't repeat a

vertex. But, sometimes we allow that.

And then, what we care about is the shortest path,

or a shortest path. Again, this may not be unique,

but we'll still usually call it the shortest path.

So, we want the shortest path from some A to some B.

Or, we'll call the vertices u and v.

And we want this to be some path of minimum possible weight,

subject to starting at u, and going to v.

OK, so that's what we're looking for.

In general, give you a vertex, u, give you a vertex,

v, find a shortest path as quickly as possible.

What's a good algorithm for that?

That's the topic for the next three lectures.

We'll usually think about a slightly simpler problem,

which is just computing the weight of that path,

which is essentially computing the distance from A to B.

So, we'll call this the shortest path weight from u to

v. And, we'll denote it by delta

of (u,v), delta . So, I mean, it's the weight of

the shortest path, or a weight of every shortest

path. Or, in other words,

it's the Min over the weight of each path from u to v.

So, p here is a path. OK, so you just consider,

there could be a lot of different paths.

There could, in principle,

be infinitely many, if you're allowed to repeat

vertices. You look at all those paths

hypothetically. You take the minimum weight.

Question? Good.

My next question was going to be, when do shortest paths not

exist? And you've hit upon one

version, which is when you have negative edge weights.

So, in principle, when you have negative edge

weights, some shortest paths may not exist in the sense that

there is no shortest paths. There are no shortest paths.

There is no shortest path from u to v.

OK, in particular, if I have two vertices,

u and v, and I want the shortest path between them,

and I have negative edge weights, well,

this is fine. I mean, I can still compute the

weight of a path that has negative weights.

But when specifically won't I have a single shortest path from

u to v? So, go ahead.

Good. So, if I can find the cycle

somewhere along here whose total weight, say, the sum of all the

weights of these images is negative, then I get there,

I go around as many times as I want.

I keep decreasing the weight because the weight is negative.

I decrease it by some fixed amount, and then I can go to v.

So, as long as there is a negative weights cycle reachable

from u that can also reach v, then there's no shortest path

because if I take any particular path, I can make it shorter by

going around a couple more times.

So, in some sense, this is not really a minimum.

It's more like an infimum for those who like to get fancy

about such things. But we'll just say that delta

of (u,v) is minus infinity in this case.

There's a negative weights cycle from u to v.

So, that's one case we have to worry about in some sense.

But, as long as there are no negative weight cycles,

delta of (u,v) will be something bigger than minus

infinity, bounded below by some finite value even if you could

have negative weights, but still no negative weights

cycle for example, there might not be any cycles

in your graph. So that's still interesting.

And, I guess it's useful to note that you can get from A to

B in negative infinite time. It's time travel,

if the weights happen that correspond to time.

But when else might shortest paths not exist?

So, this is one case, but there's another,

simpler case. It's not connected.

There might not be any path from u to v.

This path might be empty. There may be no path from u to

v. Here we have to define what

happens, and here, we'll say it's infinity if

there's no path from u to v. So, there are these exceptional

cases plus infinity and minus infinity, which are pretty

intuitive because it takes a really long time to get from u

to v if there's no path there. You can't get there from here.

OK, but that's the definition. Most of the time,

this is the case we care about, of course.

Usually this is a finite set. OK, good, so that's the

definition. We're going to get a few basic

structural properties about shortest paths that will allow

us to obtain good algorithms finding these paths when they

exist. And, in particular,

we want to use ideas from dynamic programming.

So, if I want to use dynamic programming to solve shortest

paths, what do I need to establish?

What's the first thing I should check?

You've all implemented dynamic programming by now,

so should make complete sense hopefully, at least more sense

than it did a couple of weeks ago, last week,

when we learned it. Dynamic programming is

something that grows on you. Every year I think I understand

it better than the previous year.

But, in particular, when you learned dynamic

programming in this class, there is this nice key property

that you should check. Yeah?

Optimal substructure: good.

This is the phrase you should keep in mind.

It's not really enough for dynamic programming to be useful

in an efficient way, but it at least tells you that

you should be able to try to apply it.

That's a pretty weak statement, but it's something that you

should check. It's definitely pretty much a

necessary condition for dynamic programming to make sense.

And so, optimal some structure here means that if I take some

shortest path, and I look at a subpath of that

shortest path, I claimed that it too is a

shortest path, OK, with its respective

endpoints; obviously not between the same endpoints.

But if I have some shortest path between two endpoints,

I take any subpath and that's also the shortest path.

This is one version of optimal substructure.

This one turns out to be true for this setup.

And, how should I prove an optimal substructure property?

Cut and paste. Yep, that works here too.

I mean, this isn't always true. But it's a good technique here.

So, we're going to think about, and I'll do essentially a proof

by picture here. So, suppose you have some

subpath of some shortest path. So, let's say the subpath is x

to y. And, the path goes from u to v.

So, we assume that (u,v) is a shortest path.

We want to prove that (x,y) is a shortest path.

Well, suppose (x,y) isn't a shortest path.

Then there is some shorter path that goes from x to y.

But, if you have some shorter path from x to y than this one.

Then I should just erase this part of the shortest path from u

to v, and replace it with this shorter one.

So, this is some hypothetical shorter path.

So, suppose this existed. If that existed,

then I should just cut the old path from x to y,

and paste in this new one from x to y.

It's strictly shorter. Therefore, I get a strictly

shorter path from u to v. But I assumed u to v was a

OK, so there is no shorter path.

And that proves the lemma that we have this:

subpaths of shortest paths are shortest paths.

OK, this should now be a pretty familiar proof technique.

But, there is yet another instance of cut and paste.

OK, so that's a good sign for computing shortest paths.

I mean, in terms of dynamic programming, we won't look

directly at dynamic programming here because we are going to aim

for greedy, which is even stronger.

But, next Monday we'll see some dynamic programming approaches.

Intuitively, there are some pretty natural

sub-problems here. I mean, going from u to v,

if I want to find what is the shortest path from u to v,

well, that's a particular problem.

Maybe it involves computing shortest paths from u to some

intermediate point, x, and then from x to u,

something like that. That feels good.

many subproblems. And, V^2 subproblems,

it sounds like that would lead to a dynamic program.

You can make it work out; it's just a little bit trickier

than that. We'll see that next Monday.

something called the triangle inequality.

So, you've probably heard some form of the triangle inequality

before. It holds in all sorts of

geometric spaces, but it also holds for shortest

paths, which is slightly less obvious, or more obvious,

I guess, depending on your inclination.

So, if you have any triple of vertices, the shortest path from

u to v is, at most, the shortest path from u to x

plus the shortest path from x to v.

Of course, here I need a shortest path weight from u to

x, and shortest path weight from x to v.

So, this should be pretty natural just from the statement,

even more natural if you draw the picture.

So, we have some vertex, u.

I'm using wiggly lines to denote potentially long paths as

opposed to edges. We have some intermediate

point, x, and we have some target, v, and we are

considering these three shortest paths.

This is the shortest path from u to v, or this is its weights.

This is the shortest path from u to x.

And here's its weight, and the shortest path from x to

v. And here's its weight.

And, the point is, this should be the shortest

path or a shortest path from u to v.

And, in particular, one such path is you go from u

to x, and then you go from x to v.

So, I mean, this sum is just measuring the length of this

particular path. Take the shortest path here;

take the shortest path here. And, this is supposed to be the

Min over all paths. So, certainly this is,

at most, this particular path, the sum of these two values,

OK, another proof by picture. Clear?

OK, this stuff is easy. I assume we'll get into some

more set exciting algorithms in particular, which is always more

exciting. Today, we're going to look at a

particular version of shortest paths called,

or the shortest paths problem called the single source

shortest path problem. OK, it's a little bit more

general than go from A to B. The problem is,

you're given a source vertex, and you want to know how to get

from that source vertex to everywhere else.

So, we'll call this source vertex s.

And from that source, we want to find,

let's say, the shortest path weights from s to everyone.

In particular, we'd also like to know the

shortest paths, but that isn't too much harder.

So, that's delta of s, v for all vertices,

v. OK, so this is actually a

little bit harder than the problem we started with a

getting from Alderon to Cambridge.

Now, we want to get from Alderon to the entire universe.

OK, it turns out, this is one of the weird things

about shortest paths, according to the

state-of-the-art we know today, it seems like the following

statement will remain true for all time.

But we don't know. The best algorithms for solving

the A to B problem, given s, given t,

go from s to t, is no easier than this problem.

It's the best ways we know how to solve going from A to B is to

solve how to go from A to everywhere else.

So, we sort of can't help ourselves, but to solve this

problem it turns out. Today, we're going to look at a

further restriction on this problem because this is a bit

tricky. Will solve it next class.

But, today we're going to get rid of the negative weight cycle

issue by forbidding negative weights.

So, we're going to assume that all of the edge weights are

nonnegative, so, for all vertices,

u and v. So, in particular,

shortest paths exist, provided paths exist.

And, we don't have to worry about these minus infinities.

Delta of (u,v) is always bigger than minus infinity.

It still might be plus infinity if there is no path,

but this will make life a lot easier.

And the algorithm we'll cover today really requires this

property. You can't get away without it.

Next class, we'll get away without it with a fancier and

slower algorithm. So, as I hinted at,

the main idea we're going to use for the algorithm today is

greedy, which should be faster than dynamic programming

generally. And, the tricky part will be

proving that the greedy algorithm actually works.

So, I think there's pretty much only one natural way to go

let's say. This may be not the obvious

one. So, let me give you a little

bit of setup. The invariant we are going to

maintain is that at all times, we have estimates on the

distances from the source to every vertex.

When I say distance, I mean shortest path weight.

I'm going to use weight and distance interchangeably here

for more intuition. And, in particular,

I want to maintain the set of vertices where those estimates

OK, this is little s. This is big S.

So, the big S will be the set of all vertices where I know the

answer. What is the shortest path

distance from little S to that vertex in big S?

So, for starters, which distance do I know?

Sorry? s.

I know the shortest path distance from s to s because if

I assume that all of my weights are nonnegative,

I really can't get from s to s any faster than not doing

anything. OK, if I had a negative weight

cycle, maybe the distance from s to s is minus infinity.

OK, but I can't have negative weights so there's no way I can

get from s to s any faster than zero time.

There might be a longer path that still has zero cost,

but it can't be any better than zero.

So, in particular, I know that.

So, initially, S is certainly an s.

OK, and the idea is we're going to accumulate more and more

vertices that we know. So, at some point we know the

distances from some of the vertices.

So, we have some cloud here. This is S, and this is

everything else. This is the graph,

G. This is the subset of the

vertices. And, there's some edges that go

out from there. And, so we have estimates on

how to get to these vertices. Some of them,

we may not have even seen yet. They may not be connected to

this portion of S. I mean: not directly.

They might be connected by some longer path.

They might be in a completely different connected component.

We don't know yet. Some of them,

we have estimates for because we've sort of seen how to get

there from S. And the idea is,

among all of these nodes where we have estimates,

and on to get from little S, which is some vertex in here,

to these vertices, we're going to take the one for

which the estimate is smallest. That's the greedy choice.

And, we're just going to add that vertex to S.

So, S grows one vertex per step.

Each step, we're going to add to S, the vertex.

Of course, again, this is not a unique,

it's a vertex, v, in V minus S.

So, it's something we haven't yet computed yet whose estimated

distance from S is minimum. So, we look at all the vertices

we haven't yet added to S. Just take the one where we have

the estimated smallest distance. The intuition is that that

should be a good choice. So, if I pick the one that's

closest to little s among all the ones that I've seen,

among all the paths that I've seen, I sort of have to buy into

that those are good paths. But, I mean,

maybe there's some path I didn't see.

Maybe you go out to here and then you take some other path to

some vertex, which we've already seen.

OK, the worry is, well, I'd better not say that

that's the shortest path because there may have been some other

way to get there. Right, as soon as I add

something to S, I declare I've solved the

problem for that vertex. I can't change my answer later.

OK, the estimates can change until they get added to S.

So, I don't want to add this vertex to S because I haven't

considered this path. Well, if all my weights are

nonnegative, and I take the vertex here that has the

shortest estimate from S, so let's suppose this one is

the shortest one, then this can't be a shorter

path because the distance estimate, at least,

from S to that vertex is larger from S to that vertex.

So, no way can I make the path longer and decrease the

distance. That's the intuition.

OK, it's a little bit fuzzy here because I don't have any

induction hypotheses set up, and it's going to be a lot more

work to prove that. But that's the intuition why

this is the right thing to do. OK, you have to prove something

about the distance estimates for that to be a proof.

But, intuitively, it feels good.

It was a good starting point. OK, and then presumably we have

to maintain these distance estimates.

So, the heart of the algorithm is updating distance estimates,

I mean, choosing the best vertex to add to S,

that's one step. Then, updating the distance

estimates is sort of where the work is.

And, it turns out we'll only need to update distance

estimates of some of the vertices, the ones that are

to S. So, once we add somebody to S,

so we grow S by a little bit, then we look at all the new

edges that go out of S from that vertex.

We update something. That's the idea.

So, that's the idea for how we're going to use greedy.

Now I'll give you the algorithm.

So, this is called Dijkstra's algorithm.

Dijkstra is a famous, recently late,

if that makes sense, computer scientist from the

Netherlands. And, this is probably the

algorithm he is most famous for. So, the beginning of the

algorithm is just some initialization,

not too exciting. OK, but let me tell you what

some of the variables mean. OK, so d is some array indexed

by vertices, and the idea is that d of x is the distance

estimate for x, so, from S to x.

so in particular, it's going to equal the real

shortest path weight from S to x when we've added x to our set

capital, S. OK, so this is,

in particular, going to be the output to the

algorithm. Did you have a question?

Or were you just stretching? Good.

So, in d of x, when we are done,

d of x is the output. For every vertex,

it's going to give us the shortest path weight from S to

that vertex. Along the way,

it's going to be some estimated distance from S to that vertex.

And, we're going to improve it over time.

This is an infinity. So initially,

we know that the distance, we know the distance from S to

S is zero. So, we're going to set that to

be our estimate. It's going to be accurate.

Everything else we're going to just set to infinity because we

may not be connected. From the beginning,

we don't know much. S, initially,

is going to be infinity. Immediately,

we're going to add little s to big S.

And then, the interesting part here is Q, which is going to

consist of, initially all the vertices in the graph.

And, it's going to not just be a queue as the letter suggests.

It's going to be a priority queue.

So, it's going to maintain, in particular,

the vertex that has the smallest distance estimate.

So, this is a priority queue. This is really an abuse of

notation for a data structure. OK, so this could be a heap or

whatever. The vertices are keyed on d,

our distance estimate. So, in particular,

S will have the, this is going to be a Min heap.

S will be the guy who has the minimum.

Everyone else has the same key initially.

And, we're going to repeatedly extract the minimum element from

this queue and do other things. OK, so this is initialization.

OK, I'm going to call that initialization.

It's a pretty simple thing. It just takes linear time,

nothing fancy going on. The heart of the algorithm is

all in six lines. And, so this is not really a

step. The first step here that we

need to do is we take the vertex whose distance estimate is

minimum. So that, among all the

vertices, not yet, and that's currently S is

empty. Q has everyone.

In general, Q will have everyone except S.

So, we'll take the vertex, u, that has the minimum key in

that priority queue. So, extract the Min from Q.

OK. We're going to add a little u

to S, claim that that is now, I mean, that's exactly what

we're saying here. We add to S that vertex that

has minimum distance estimate. And now, we need to update the

distances. So, we're going to look at each

We look at a few distances.

So that's the algorithm or more or less.

This is the key. I should define it a little bit

what's going on here. We talked mainly about

undirected graph last time. Here, we're thinking about

undirected graphs. And, the adjacency list for u

here is just going to mean, give me all the vertices for

which there is an edge from u to v.

So, this is the outgoing adjacency list,

Undirected graphs: you list everything.

Directed graphs: here, we're only going to care

about those ones. So, for every edge,

(u,v), is what this is saying, we are going to compare the

current estimate for v, and this candidate estimate,

which intuitively means you go from s to u.

That's d of u because we now know that that's the right

equals, we hope, assuming the algorithm is

correct, this should be the shortest path weight from s to u

because we just added u to S. And whenever we add something

to S, it should have the right value.

So, we could say, well, you take the shortest

path from S to u, and then you follow this edge

from u to v. That has weight,

w, of (u,v). That's one possible path from S

to v. And, if that's a shorter path

than the one we currently have in our estimate,

if this is smaller than that, then we should update the

estimate to be that sum because that's a better path,

so, add it to our database of paths, so to speak:

OK, very intuitive operation; clearly should not do anything

bad. I mean, these should be paths

that makes sense. We'll prove that in a moment.

That's the first part of correctness, that this never

screws up. And then, the tricky part is to

show that it finds all the paths that we care about.

This step is called a relaxation step.

Relaxation is always a difficult technique to teach to

MIT students. It doesn't come very naturally.

But it's very simple operation. It comes from optimization

terminology, programming terminology, so to speak.

And, does this inequality look familiar at all especially when

you start writing it this way? You say, the shortest path from

S to v and the shortest path from S to u in some edge from u

to v, does that look like anything we've seen?

In fact, it was on this board but I just erased it.

Triangle inequality, yeah.

So, this is trying to make the triangle inequality true.

Certainly, the shortest path from S to v should be less than

or equal to, not greater than. The shortest path from S to u,

plus whatever path from u to v, the shortest path should be,

at most, that. So, this is sort of a somewhat

more general triangle inequality.

And, we want to, certainly it should be true.

So, if it's not true, we fix it.

If it's greater than, we make it equal.

But we don't want to make it less than because that's not

always true. OK, but certainly,

it should be less than or equal to.

So, this is fixing the triangle inequality.

It's trying to make that constraint more true.

In optimization, that's called relaxing the

constraint. OK, so we're sort of relaxing

the triangle inequality here. In the end, we should have all

the shortest paths. That's a claim.

So: a very simple algorithm. Let's try it out on a graph,

and that should make it more intuitive why it's working,

and that the rest of the lecture will be proving that it

works. Yeah, this is enough room.

So, oh, I should mention one other thing here.

Sorry. Whenever we change d of v,

this is changing the key of v in the priority queue.

So, implicitly what's happening here in this assignment,

this is getting a bit messy, is a decreased key operation,

OK, which we talked briefly about last class in the context

of minimum spanning trees where we were also decreasing the key.

The point is we were changing the key of one element industry

like station step in the priority queue so that if it now

becomes the minimum, we should extract here.

And, we are only ever decreasing keys because we are

always replacing larger values with smaller values.

So, we'll come back to that later when we analyze the

running time. But, there is some data

structure work going on here. Again, we are abusing notation

a bit. OK, so here is a graph with

edge weights.

OK, and I want my priority queue over here.

And, I'm also going to draw my estimates.

OK, now I don't want to cheat. So, we're going to run the

algorithm on this graph. s will be A,

and I want to know the shortest path from A to everyone else.

So, you can check, OK, paths exist.

So, hopefully everything should end up a finite value by the

end. All the weights are

nonnegative, so this algorithm should work.

The algorithm doesn't even need connectivity,

but it does mean that all the weights are nonnegative.

So, we run the algorithm. For the initialization,

we set the distance estimate for our source to be zero

because, in fact, there's only one path from A to

A, and that to do nothing, the empty path.

So, I'm going to put the key of zero over here.

And, for everyone else, we're just going to put

infinity because we don't know any better at this point.

So, I'll put keys of infinity for everyone else.

OK, so now you can see what the algorithm does is extract the

minimum from the queue. And, given our setup,

we'll definitely choose s, or in this case,

A. So, it has a weight of zero.

Everyone else has quite a bit larger weight.

OK, so we look at s, or I'll use A here.

So, we look at A. We add A to our set,

S. So, it's now removed from the

queue. It will never go back in

vertices, and extract, and decrease keys.

But we never insert. So, A is gone.

OK, and now I want to update the keys of all of the other

vertices. And the claim is I only need to

look at the vertices that have edges from A.

So, there's an edge from A to B, and that has weight ten.

And so, I compare: well, is it a good idea to go

from A to A, which costs nothing, and then to go along

this edge, AB, which costs ten?

Well, it seems like a pretty good idea because that has a

total weight of zero plus ten, which is ten,

which is much smaller than infinity.

So, I'm going to erase this infinity; write ten,

and over in the queue as well. That's the decreased key

operation. So now, I know a path from A to

B. Good.

A to C is the only other edge. Zero plus three is less than

infinity, so, cool.

I'll put three here for C, and C is there.

OK, the other vertices I don't touch.

I'm going to rewrite them here, but the algorithm doesn't have

to copy them. Those keys were already there.

It's just touching these two. OK, that was pretty boring.

Now we look at our queue, and we extract the minimum

element. So, A is no longer in there,

so the minimum key here is three.

So, the claim is that this is a shortest path;

from A to C, here is the shortest path from

A to C. There's no other shorter way.

You could check that, and we'll prove it in a moment.

Cool, so we'll remove C from the list.

It's gone. Then we look at all of the

outgoing edges from C. So, there's one that goes up to

B, which has weight four, four plus three,

which is the shortest path weight from A to C.

So, going from A to C, and C to B should cost three

plus four, which is seven, which is less than ten.

So, we found an even better path to get to B.

It's better to go like this than it is to go like that.

So, we write seven for B, and there's an outgoing edge

from C to d which costs eight. Three plus eight is 11.

11 is less than infinity last time I checked.

So, we write 11 for d. Then we look at E.

We have three plus two is five, which is less than infinity.

So, we write five for the new key for E.

At this point, we have finite shortest paths

to everywhere, but they may not be the best

ones. So, we have to keep looking.

OK, next round of the algorithm, we extract the

minimum key among all these. OK, it's not B,

which we've seen though probably know the answer to.

But it's E. E has the smallest key.

So, we now declare this to be a shortest path.

The way we got to E was along this path: A to C,

C to E, declare that to be shortest.

We claim we're done with E. But we still have to update.

What about all the outgoing edges from E?

There's only one here. It costs five plus nine,

which is 14, which is bigger than 11.

So, no go. That's not an interesting path.

Our previous path, which went like this at a cost

of the 11, is better than the one we are considering now.

I'm drawing the whole path, but the algorithm is only

adding these two numbers. OK, good.

So, I don't change anything. Seven, 11, and five is removed,

or E is removed. Our new keys are seven and 11.

So, we take the key, seven, here,

which is for element B, vertex B.

We declare the path we currently have in our hands from

A to B, which happens to be this one.

Algorithm can't actually tell this, by the way,

but we're drawing it anyway. This path, A,

C, B, is the candidate shortest path.

The claim is it is indeed shortest.

Now, we look at all the outgoing edges.

There's one that goes back to C at a cost of seven plus one,

which is eight, which is bigger than three,

which is good. We already declared C to be

done. But the algorithm checks this

path and says, oh, that's no better.

And then we look at this other edge from B to d.

That costs seven plus two, which is nine,

which is better than 11. So, we, in fact,

found an even shorter path. So, the shortest path weight,

now, for d, is nine because there is this path that goes A,

C, B, d for a total cost of three plus four plus two is

nine. Cool, now there's only one

element in the queue. We remove it.

d: we look at the outgoing edges.

There's one going here which costs nine plus seven,

which is 16, which is way bigger than five.

So, we're done. Don't do anything.

At this point, the queue is empty.

And the claim is that all these numbers that are written here,

the final values are the shortest path weights.

This looks an awful lot like a five, but it's an s.

It has a weight of zero. I've also drawn in here all the

shortest paths. And, this is not hard to do.

We're not going to talk about it too much in this class,

but it's mentioned in a little bit more detail at the end of

the textbook. And it's something called the

shortest path tree. It's just something good to

know about if you actually want to compute shortest paths.

In this class, we mainly worry about the

weights because it's pretty much the same problem.

The shortest path tree is the union of all shortest paths.

And in particular, if you look at each vertex in

your graph, if you consider the last edge into that vertex that

was relaxed among all vertices, u, you look at the edges,

(u,v), say, was that last one to relax?

So, just look at the last edges we relaxed here.

You put them all together: that's called a shortest path

tree. And, it has the property that

from S to everywhere else, there is a unique path down the

tree. And it's the shortest path.

It's the shortest path that we found.

OK, so you actually get shortest paths out of this

algorithm even though it's not explicitly described.

All we are mainly talking about are the shortest path weights.

Algorithm clear at this point? Feels like it's doing the right

thing? You can check all those numbers

are the best paths. And now we're going to prove

that.

So: correctness.

So the first thing I want to prove is that relaxation never

makes a mistake. If it ever sets d of v to be

something, I want to prove that d of v is always an upper bound

on delta. So, we have this variant.

It's greater than or equal to delta of s, v for all v.

And, this invariant holds at all times.

So, after initialization, it doesn't hold before

initialization because d isn't defined then.

But if you do this initialization where you set S

to zero, and everyone else to infinity, and you take any

sequence of relaxation steps, then this variant will hold

after each relaxation step you apply.

This is actually a very general lemma.

It's also pretty easy to prove. It holds not only for

Dijkstra's algorithm, but for a lot of other

algorithms we'll see. Pretty much every algorithm we

see will involve relaxation. And, this is saying no matter

what relaxations you do, you always have a reasonable

estimate in the sense that it's greater than or equal to the

true shortest path weight. So, it should be converging

from above. So, that's the lemma.

Let's prove it. Any suggestions on how we

should prove this lemma? What technique might we use?

What's that? Cut and paste?

It would be good for optimal substructure.

Cut and paste: maybe sort of what's going on

here but not exactly. Something a little more

general. It's just intuition here;

it doesn't have to be the right answer.

In fact, many answers are correct, have plausible proofs.

Induction, yeah. So, I'm not going to write

induction here, but effectively we are using

induction. That's the answer I was

expecting. So, there is sort of an

induction already in time going on here.

We say after initialization it should be true.

That's our base case. And then, every relaxation we

do, it should still be true. So, we're going to assume by

induction that all the previous relaxations worked,

and then we're going to prove that the last relaxation,

whatever it is, works.

So, first let's do the base case.

So, this is after an initialization,

let's say, initially. So, initially we have d of s

equal to zero. And we have d of v equal to

infinity for all other vertices, for all vertices,

v, not equal to little s. OK, now we have to check that

this inequality holds. Well, we have delta of s,

that's zero. You can't get negative when

there are only nonnegative edge weights.

So, that's the best. So, certainly zero is greater

than or equal to zero. And, we have everything else,

well, I mean, delta of S, v is certainly less

than or equal to infinity. So this holds.

Everything is less than or equal to infinity.

So: base case is done. So, now we do an induction.

And, I'm going to write it as a proof by contradiction.

So, let's say, suppose that this fails to hold

at some point. So, suppose for contradiction

that the invariant is violated. So, we'd like to sue the

So, it's going to be violated. So, let's look at the first

violation, the first time it's violated.

So, this is, essentially,

again, a proof by induction. So, let's say we have some

violation, d of v is less than delta of s, v.

That would be bad if we somehow got an estimate smaller than the

shortest path. Well, then I think about

looking at the first violation is we know sort of by induction

that all other values are correct.

OK, d of v is the first one where we've screwed up.

So, the invariant holds everywhere else.

Well, what caused this to fail, this invariant to be violated,

is some relaxation, OK, on d of v.

So, we had some d of v, and we replaced it with some

other d of u plus the weight of the edge from u to v.

And somehow, this made it invalid.

So, d of v is somehow less than that.

We just set d of v to this. So, this must be less than

delta of s, v. The claim is that that's not

possible because, let me rewrite a little bit.

We have d of u plus w of (u,v). And, we have our induction

hypothesis, which holds on u, u of some other vertex.

We know that d of u is at least delta of s, u.

So, this has to be at least delta of s, u plus w of u,

v? Well, that's some path from u

to v. So, it's got to be bigger than

the shortest path or equal. So certainly,

this is greater than or equal to delta of u,

v. OK, it could be larger if

there's some multi-edged path that has a smaller total weight,

but it's certainly no smaller than delta of u,

v. And, this looks like a good

summation, delta of S to u, and u to v is a triangle

inequality, yeah. So, that is,

it's upside down here. But, the triangle S,

u, u to v, so this is only longer than S to v.

OK, so we have this thing, which is simultaneously greater

than or equal to the shortest path weight from S to v,

and also strictly less than the shortest path weight from S to

Maybe contradiction is the most intuitive way isn't the most

intuitive way to proceed. The intuition,

here, is whatever you assign d of v, you have a path in mind.

You inductively had a path from s to u.

Then you added this edge. So, that was a real path.

We always know that every path has weight greater than or equal

to the shortest path. So, it should be true,

and here's the inductive proof. All right, moving right along,

so this was an easy warm-up. We have greater than or equal

to. Now we have to prove less than

or equal to at the end of the algorithm.

This is true all the time; less than or equal to will only

be true at the end. So, we are not going to prove

less than or equal to quite yet. We're going to prove another

lemma, which again, so both of these lemmas are

useful for other algorithms, too.

So, we're sort of building some shortest path theory that we can

apply later. This one will give you some

it's actually good. Not only does it not screw up

anything, but it also makes progress in the following sense.

So, suppose you knew the shortest path from s to some

vertex. OK, so you go from s to some

other vertices. Then you go to u.

Then you go to v. Suppose that is a shortest path

from s to v. OK, and also suppose that we

already know in d of u the shortest path weight from s to

u. So, suppose we have this

equality. We now know that we always have

a greater than or equal to. Suppose they are equal for u,

OK, the vertex just before v in the shortest path.

OK, and suppose we relax that edge, (u,v), OK,

which is exactly this step. This is relaxing the edge,

(u,v). But we'll just call it

relaxation here. After that relaxation,

d of v equals delta of (s,v). So, if we had the correct

answer for u, and we relax (u,v),

then we get the correct answer for v.

OK, this is good news. It means, if inductively we can

somehow get the right answer for u, now we know how to get the

right answer for v. In the algorithm,

we don't actually know what the vertex just before v in the

shortest path is, but in the analysis we can

pretty much know that. So, we have to prove this

lemma. This is actually even easier

than the previous one: don't even need induction

because you just work through what's going on in relaxation,

and it's true. So, here we go.

So, we're interested in this value, delta of Ss v.

And we know what the shortest path is.

So, the shortest path weight is the weight of this path.

OK, so we can write down some equality here.

Well, I'm going to split out the first part of the path and

the last part of the path. So, we have,

I'll say, the weight from s, so, this part of the path from

s to u, plus the weight of this edge, u, v.

Remember, we could write w of a path, and that was the total

weight of all those edges. So, what is this,

the weight of this path from S to u?

Or, what property should I use to figure out what that value

is? Yeah?

s to v is the shortest path, right?

So, by optimal substructure, from s to u is also a shortest

path. So, this is delta of s,

u. Cool.

We'll hold on for now. That's all we're going to say.

On the other hand, we know from this lemma that

matter what we do, d of v is greater than or equal

to delta of s, v.

So, let's write that down. So, there's a few cases,

and this will eliminate some of the cases.

By that lemma correctness one, we know that d of v is greater

than or equal to delta of s, v.

So, it's either equal or greater than at all times.

So, I'm thinking about the time before we do the relaxation,

this (u,v). So, at that point,

this is certainly true. So, either they're equal before

relaxation or it's greater.

OK, if they are equal before relaxation, we're happy because

relaxation only decreases values by correctness one.

It can't get any smaller than this, so after relaxation it

will also be equal. OK, so in this case we're done.

So, that's a trivial case. So let's now suppose that d of

v is greater than delta of s, v before relaxation.

That's perfectly valid. Hopefully now we fix it.

OK, well the point is, we know this delta s,

v. It is this sum.

OK, we also know this. So, delta of s,

u we know is d of u. And, we have this w u,

v. So, delta of s,

v is d of u plus w of (u,v) because we are assuming we have

this shortest path structure where you go from s to u,

and then you follow the edge, (u,v).

So, we know this. So, we know d of v is greater

than d of u plus w of (u,v). By golly, that's this condition

in relaxation. So, we're just checking,

relaxation actually does something here.

OK, if you had the wrong distance estimate,

this if condition is satisfied. Therefore, we do this.

So, in this case, we relax.

So, I'm just relaxing. Then, we set d of v to d of u

plus WUV, which is what we want. OK, so we set d of v to d of u

plus w of (u,v). And, this equals,

as we said here, delta of S, v,

which is what we wanted to prove.

Done. OK, I'm getting more and more

excited as we get into the meat of this proof.

Any questions so far? Good.

Now comes the hard part. These are both very easy

lemmas, right? I'll use these two boards.

We don't need these proofs anymore.

We just need these statements: correctness one,

correctness lemma; great names.

So, now finally we get to correctness two.

So, we had one and one and a half.

So, I guess correctness is, itself, a mini-trilogy,

the mini-series. OK, so correctness two says

when the algorithm is done, we have the right answer.

This is really correctness. But, it's going to build on

correctness one and correctness lemma.

So, we want d of v to equal delta of s, v for all vertices,

v at the end of the algorithm. That is clearly our goal.

Now, this theorem is assuming that all of the weights are

nonnegative, just to repeat. It doesn't assume anything

else. So, it's going to get the

infinities right. But, if there are minus

infinities, all bets are off. OK, even if there's any

negative weight edge anywhere, it's not going to do the right

thing necessarily. But, assuming all the weights

are nonnegative, which is reasonable if they're

measuring time. Usually it costs money to

travel along edges. They don't pay you to do it.

But who knows? So, I need just to say a few

things. One of the things we've

mentioned somewhere along the way is when you add a vertex to

S, you never change its weight. OK, that actually requires

proof. I'm just going to state it

here. It's not hard to see.

d of v doesn't change. OK, this is essentially an

induction once v is added to S. OK, this will actually followed

by something we'll say in a moment.

OK, so all I really care about is when a vertex is added to S,

we better have the right estimate because after that,

we're not going to change it, let's say.

OK, we could define the algorithm that way.

second. So, all we care about is

whether d of v equals delta of s, v.

That's what we want to prove. So, it's clearly that.

It should be true at the end. But, it suffices to prove that

it holds when v is added to S, to capital S.

OK, this actually implies the first statement.

It has sort of a funny implication.

But, if we can prove this, that d of v equals delta of s,

v, when you add to S, we know relaxation only

decreases value. So, it can't get any smaller.

It would be from correctness one.

Correctness one says we can't get any smaller than delta.

So, if we get a quality at that point, we'll have a quality from

then on. So, that actually implies d of

v never changes after that point.

OK, so we're going to prove this.

Good. Well, suppose it isn't true.

So this would be a proof by a contradiction.

Suppose for contradiction that this fails to hold.

And, let's look at the first failure.

Suppose u is the first vertex --

I want to consider the time right before it's added to S,

for which we don't have what we want.

These are not equal. d of u does not equal delta of

s, u. Well, if they're not equal,

we know from correctness one that d of E is strictly greater

than delta of s, u, so, d of u.

So, we have d of u is strictly greater than delta of s,

u. OK, that's the beginning of the

proof, nothing too exciting yet, just some warm-up.

OK, but this, used already correctness one.

I think that's the only time that we use it in this proof.

OK, so I sort of just want to draw picture of what's going on.

But I need a little bit of description.

So, let's look at the shortest path.

Somehow, d of u is greater than the shortest path.

So, consider the shortest path or a shortest path.

Let p be a shortest path, not just any shortest path,

but the shortest path from s to u.

OK, so that means that the weight of this path is the

shortest path weight. So, we have some equations for

what's going on here. So, we care about delta of s,

u. Here's a path with that weight.

It's got to be one because shortest paths exist here;

slight exceptional cases if it's a plus infinity,

but I'm not going to worry about that.

So, let me draw a picture somewhere.

So, we have s. We have u.

Here is the shortest path from s to u.

That's p. No idea what it looks like so

far. Now, what we also have is the

notion of capital S. So, I'm going to draw capital

S. So, this is big S.

We know that little s is in big S.

We know that u is not yet in big S.

So, I haven't screwed up anything yet,

right? This path starts in S and

leaves it at some point because until we are about to add u to

S, so it hasn't happened yet, so u is not in S.

Fine. What I want to do is look at

the first place here where the path, p, exits S.

So, there is some vertex here. Let's call it x.

There's some vertex here. We'll call it y.

OK, possibly x equals S. Possibly y equals u.

But it's got to exit somewhere, because it starts inside and

ends up outside. And it's a finite path.

OK, so consider the first time it happens; not the second time,

the first. OK, so consider the first edge,

(x,y), where p exits capital S. The shortest path from s to u

exits capital S. It's got to happen somewhere.

Cool, now, what do we know? Little x is in S.

u to S, and that was the first violation of something in S that

has the wrong d of x estimate. So, d of x equals delta of s,

x. Because we are looking at the

first violation, x is something that got added

before. So, by induction on time,

or because we had the first violation, d of x equals the

shortest path weight from S to x.

So, that's good news. Now we are trying to apply this

lemma. It's the only thing left to do.

We haven't used this lemma for anything.

So, we have the setup. If we already know that one of

the d values is the right answer, and we relaxed the edge

that goes out from it, then we get another right

answer. So that's what I want to argue

over here. We know that the d of x equals

this weight because, again, subpaths of shortest

paths are shortest paths. We have optimal substructure,

so this is a shortest path, from S to x.

It might not be the only one, but it is one.

So we know that matches. Now, I want to think about

relaxing this edge, (x,y).

Well, x is in capital S. And, the algorithm says,

whenever you add a vertex, u, to the big set,

S, you relax all the edges that go out from there.

OK, so when we added x to S, and we now look far in the

Right after we added x to S, we relax this edge,

(x,y), because we relaxed every edge that goes out from x,

OK, whatever they were. Some of them went into S.

Some of them went out. Here's one of them.

So, when we added x to S, we got XS.

When we added x to S, we relaxed the edge,

(x,y). OK, so now we're going to use

the lemma. So, by the correctness lemma --

What do you get? Well, we add this correct

shortest path weight to x now. We relax the edge,

(x,y). So, now we should have the

correct shortest path weight for y.

d of y equals delta of s, y.

OK, this is sometime in the past.

In particular, now, it should still be true

because once you get down to the right answer you never change

it. OK, we should be done.

OK, why are we done? Well, what else do we know

here? We assumed something for

We assume somehow, d of u is strictly greater than

delta of s, u. So, d of u here is strictly

greater than the length of this whole path.

Well, we don't really know whether u equals y.

It could, could not. And, but what do we know about

this shortest path from S to y? Well, it could only be shorter

than from S to u because it's a subpath.

And it's the shortest path because it's the subpath of the

shortest path. The shortest path from S to y

has to be less than or equal to the shortest path from S to u.

OK, S to y: less than or equal to s, u, OK, just because the

subpath. I'm closer.

I've got delta of s, u now.

Somehow, I want to involve d of u.

So, I want to relate d of y to d of u.

What do I know about d of u? Yeah?

d of u is smaller because we have a Min heap,

yeah. We always chose,

let's erase, it's way down here.

We chose u. This is the middle of the

algorithm. It's the reason I kept this to

be the minimum key. This is keyed on d.

So, we know that at this moment, when we're trying to add

u to S, right, y is not in S,

and u is not in S. They might actually be the same

vertex. But both of these vertices,

same or not, are outside S.

We chose u because d of u has the smallest d estimate.

So, d of y has to be greater than or equal to d of u.

It might be equal if they're the same vertex,

but it's got to be greater than or equal to.

So, d of y here is greater than or equal to d of u.

So, here we're using the fact that we actually made a greedy

choice. It's the one place we're using

the greedy choice. Better use it somewhere because

you can't just take an arbitrary vertex and declare it to be

done. You've got to take the greedy

one. OK, now we have d of u is less

than or equal to delta of s, u, which contradicts this.

OK, sort of magical that that all just worked out.

But sort of like the previous proofs, you just see what

happens and it works. OK, that's the approximation.

The only real idea here is to look at this edge.

In fact, you could look at this edge too.

But let's look at some edge that comes from S and goes out

of S, and argue that while x has to be correct,

and now, why the hell are we looking at u?

y is the thing you should have looked at.

right answer. If u equals y,

that's fine, or if u and y were sort of

equally good, that's also fine if all these

weights were zero. So, the picture might actually

look like this. But, in that case,

d of u is the correct answer. It was delta SU.

We assumed that it wasn't. That's where we're getting a

Go over this proof. It's a bit complicated,

naturally. OK, we have a little bit more

to cover, some easier stuff. OK, the first thing is what's

the running time of this algorithm?

I'll do this very quick because we're actually seen this many

times before last class. There was some initialization.

The initialization, which is no longer here,

is linear time. No big deal.

OK, extract Min. Well, that's some data

structure. So, we have something like size

of V. Every vertex we extract the Min

once, and that's it. So, size of V,

extract mins. OK, so that's pretty simple.

OK, then we had this main loop. This is a completely conceptual

operation. S is not actually used in the

algorithm. It's just for thinking.

OK, so this takes zero time. Got to love it.

OK, and now the heart is here. So, how many times does this

loop iterate? That's the degree of u.

So, what is the total number of times that we execute a

relaxation step? It doesn't necessarily mean we

do this, but we at least execute this body.

Over the whole algorithm, how many times do we do this?

Every vertex, we look at all the outgoing

edges from there. So, the total would be?

Number of edges, yeah.

So, this number of edges iterations.

OK, this is essentially the handshaking lemma we saw last

time, but for directed graphs. And we are only looking at the

outgoing edges. So, it's not a factor of two

here because you're only outgoing from one side.

So, we have number of reiterations.

In the worst case, we do a decreased key for

everyone. So, at most:

E decreased keys. OK, so the time is,

well, we have v extract Mins, so the time to do an extract

Min, whatever that is. And we have E decreased keys,

whatever that is, and this is exactly the running

time we had for Prim's algorithm for a minimum spanning tree last

time. And, it depends what data

structure you use, what running time you get.

So, I'm going to skip the whole table here.

But, if you use an array, the final running time will be

V^2 because you have order of v extract Min, and you have

constant time decreased key. If you use a binary heap,

which we know and love, then we have order log v for

each operation. And so, this is V plus E log V.

And, so that's what we know how to do.

And, if you use this fancy data structure called a Fibonacci

heap, you get constant time decreased key amortized.

And, you get an E plus v log v worst case bound on the running

time. So, this is the best we know

how to solve shortest paths without any extra assumptions,

single source shortest paths with non-negative edge weights

in general. OK, this is almost as good and

this is sometimes better than that.

But these are essentially irrelevant except that you know

how to do these. You don't know how to do a

Fibonacci heap unless you read that in the chapter of the book.

That's why we mention the top two running times.

OK, I want to talk briefly about a simpler case,

which you may have seen before. And so it's sort of fun to

connect this up to breadth first search in a graph.

So, I mean that ends Dijkstra, so to speak.

But now I want to think about a special case where the graph is

unweighted, meaning w of (u,v) equals one for all vertices,

u and v. OK, suppose we had that

property. Can we do any better than

Dijkstra? Can we do better than this

running time? Well, we probably have to look

at all the edges and all the vertices.

So, the only thing I'm questioning is this log v.

Can I avoid that? I gave away the answer a little

first search, or BFS, which you have probably

seen before. Next to depth first search,

it's one of the standard ways to look at the graph.

But we can say a little bit more than you may have seen

before. Breadth for search is actually

Dijkstra's algorithm: kind of nifty.

There are two changes. First change is that breadth

for search does not use a priority queue.

I'll just tell you what it uses instead.

You can use a queue first in first out honest-to-goodness

queue instead of a priority queue.

OK, it turns out that works. Instead of doing extract Min,

you just take the first thing off the queue.

Instead of doing decreased key, OK, here's a subtlety.

But, this if statement changes a little bit.

So, here is the relaxation step.

So, in order to relax, you say this much simpler

thing. If we haven't visited v yet,

then we declare it to have the shortest path weight,

say, d of v is d of u plus one, which is the weight of the

edge, (u,v). And we add v to the end of the

queue empty. Actually, it will just contain

the vertex, S, because that's the only thing

we know the shortest path for. So, the queue is just for,

I know the shortest path of this thing.

Just deal with it when you can't look at all the outgoing

edges when you can. So, initially that's just S.

You say, well, for all the outgoing edges,

S has zero. All the outgoing edges from

there have weight one. The shortest path weight from

the source is one. You certainly can't do any

better than that if all the weights are one.

OK, so we add all those vertices to the end of the

queue. Then, we process things in

order, and we just keep incrementing,

if their value is d of u, add one to it.

That's d of v. And then we are going to add v

to S what we get to it in the queue.

OK, that is breadth for search, very simple.

And, you can look at the text for the algorithm and for an

example because I don't have time to cover that.

But the key thing is that the time is faster.

The time is order V plus E because as before,

we only look at each edge once we look at all the outgoing

edges from all the vertices. As soon as we set d of v to

something, it will remain that. We never touch it.

We are going to add it to S. That only happens once.

So, this if statement, and so on, in the in-queuing,

is done order E times, or actually E times,

exactly. An in-queuing to a queue,

and de-queuing from a queue, that's what we use instead of

extract Min, take constant time, so the total running time,

number of vertices plus the number of edges.

OK, not so obvious that this works, but you can prove that it

works using the Dijkstra analysis.

All you have to do is prove that the FIFO priority queue.

Once you know that, by the correctness of Dijkstra

you get the correctness of breadth for search.

So, not only is breadth for search finding all the vertices,

which is maybe what you normally use it for,

but it finds the shortest path weights from S to every other

vertex when the weights are all one.

So, there we go: introduction to shortest paths.

Next time we'll deal with negative weights.

The Description of Lec 17 | MIT 6.046J / 18.410J Introduction to Algorithms (SMA 5503), Fall 2005