We're going to talk about shortest paths,
and we're going to talk about shortest paths for three
lectures. So, this is a trilogy.
Today will be Shortest Paths One.
I've been watching far too many versions of Star Wars this
weekend. I saw the musical yesterday,
matinee. That was an MIT musical.
That was fun, of all three movies in about
four hours. That was a bit long and then I
saw the one-man show on Friday. One-man Star Wars:
the original three movies in one hour.
That was the opposite of too long.
Both were fun. So I get my trilogy fix.
All episodes, first we're going to start with
The New Hope, and we're going to talk about
the shortest paths problem and solve one particular problem of
it, a very interesting version. And then we're going to look at
increasingly more general versions as we go on.
Shortest paths are sort of an application of dynamic
programming, which we saw last week, and greedy algorithms,
which we also saw last week. So, we're going to build on that
and get some pretty interesting algorithms for an important
problem, which is how to get from Alderaan to,
I don't know, Cambridge as quickly as
possible, OK, when you live in a graph.
So, there's geometric shortest paths which is a little bit
harder. Here, we're just going to look
at shortest paths in graphs. Now, hopefully you all know
what a path in a graph is. But, so, very quick review in
particular because we're going to be looking at weighted
graphs. So, the usual setup:
suppose we have a directed graph, G, with some vertices,
some edges. We have edge weights,
make it a little more interesting.
So, this is just a real number on each edge.
So, edge weights are usually given by a function,
w. For every edge,
you get a real number.
And then, if we look at the paths in the graph,
so we're going to use some simple notation for paths:
a path, p, starts at some vertex, v_1, and it goes to some
other vertex, v_2, and so on.
Say the last vertex is v_k, and each of these should be a
directed edge in the digraph. So, this is a directed path.
It has to respect edges in here.
And, we'll say that the weight of such a path is just the sum
of the weights of the edges along the path.
And, we'll call that w(p). This is the sum,
for i from one to k minus one, of w(v_i, v_(i+1)).
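As a quick illustration of that definition, here is a minimal Python sketch. The dictionary-of-edges representation, the name `path_weight`, and the example vertices and weights are assumptions made up for illustration; they are not from the lecture.

```python
def path_weight(w, path):
    """Sum the weights of consecutive edges (v_i, v_(i+1)) along a path.

    w    -- dict mapping directed edges (u, v) to real-valued weights
    path -- list of vertices [v_1, ..., v_k]
    """
    # zip pairs each vertex with its successor, giving the k - 1 edges.
    return sum(w[(a, b)] for a, b in zip(path, path[1:]))


# Hypothetical weights; negative and zero weights are allowed.
w = {("a", "b"): 4, ("b", "c"): -2, ("c", "d"): -5, ("d", "e"): 1}
print(path_weight(w, ["a", "b", "c", "d", "e"]))  # 4 - 2 - 5 + 1 = -2
```

A path with a single vertex has no edges, so its weight under this sketch is zero.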
OK, so just to rub it in, and in particular,
how general this can be, we have some path,
it starts at some vertex, there's some edge weights along
the way. This is some arbitrary path in
the graph, in some hypothetical graph.
OK, this is mainly to point out that some of the edge weights
could be negative. Some of them could be zero.
This sum here is minus two. So, the weight of this path is
minus two. And, presumably,
the graph is much bigger than this.
This is just one path in the graph.
We're usually thinking about simple paths that can't repeat a
vertex. But, sometimes we allow that.
And then, what we care about is the shortest path,
or a shortest path. Again, this may not be unique,
but we'll still usually call it the shortest path.
So, we want the shortest path from some A to some B.
Or, we'll call the vertices u and v.
And we want this to be some path of minimum possible weight,
subject to starting at u, and going to v.
OK, so that's what we're looking for.
In general, give you a vertex, u, give you a vertex,
v, find a shortest path as quickly as possible.
What's a good algorithm for that?
That's the topic for the next three lectures.
We'll usually think about a slightly simpler problem,
which is just computing the weight of that path,
which is essentially computing the distance from A to B.
So, we'll call this the shortest path weight from u to
v. And, we'll denote it by delta
of (u,v). So, I mean, it's the weight of
the shortest path, or the weight of every shortest
path. Or, in other words,
it's the Min over the weight of each path from u to v.
So, p here is a path. OK, so you just consider,
there could be a lot of different paths.
There could, in principle,
be infinitely many, if you're allowed to repeat
vertices. You look at all those paths
hypothetically. You take the minimum weight.
Question? Good.
My next question was going to be, when do shortest paths not
exist? And you've hit upon one
version, which is when you have negative edge weights.
So, in principle, when you have negative edge
weights, some shortest paths may not exist, in the sense that
there is no shortest path from u to v.
OK, in particular, if I have two vertices,
u and v, and I want the shortest path between them,
and I have negative edge weights, well,
this is fine. I mean, I can still compute the
weight of a path that has negative weights.
But when specifically won't I have a single shortest path from
u to v? So, go ahead.
Good. So, if I can find the cycle
somewhere along here whose total weight, say, the sum of all the
weights of these edges is negative, then I get there,
I go around as many times as I want.
I keep decreasing the weight because the weight is negative.
I decrease it by some fixed amount, and then I can go to v.
So, as long as there is a negative-weight cycle reachable
from u that can also reach v, then there's no shortest path
because if I take any particular path, I can make it shorter by
going around a couple more times.
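To see why no minimum exists in that situation, here is a small Python sketch. The three weights are hypothetical numbers chosen for illustration: a path from u to the cycle, one trip around the cycle, and a path from the cycle to v.

```python
def walk_weight(loops):
    """Weight of a u -> v walk that goes around a negative cycle `loops` times."""
    to_cycle = 2   # hypothetical weight of the path from u to the cycle
    cycle = -3     # hypothetical total weight of one trip around the cycle
    to_v = 4       # hypothetical weight of the path from the cycle to v
    return to_cycle + loops * cycle + to_v


for k in range(4):
    print(k, walk_weight(k))  # weights 6, 3, 0, -3: each extra loop is shorter
```

Whatever number of loops you pick, one more loop gives a strictly smaller weight, so there is no smallest value.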
So, in some sense, this is not really a minimum.
It's more like an infimum for those who like to get fancy
about such things. But we'll just say that delta
of (u,v) is minus infinity in this case.
There's a negative-weight cycle on the way from u to v.
So, that's one case we have to worry about in some sense.
But, as long as there are no negative weight cycles,
delta of (u,v) will be something bigger than minus
infinity, bounded below by some finite value. You could
have negative weights but still no negative-weight
cycle; for example, there might not be any cycles
in your graph. So that's still interesting.
And, I guess it's useful to note that you can get from A to
B in negative infinite time. It's time travel,
if the weights happen to correspond to time.
But when else might shortest paths not exist?
So, this is one case, but there's another,
simpler case. It's not connected.
There might not be any path from u to v.
This set of paths might be empty. There may be no path from u to
v. Here we have to define what
happens, and here, we'll say it's infinity if
there's no path from u to v. So, there are these exceptional
cases plus infinity and minus infinity, which are pretty
intuitive because it takes a really long time to get from u
to v if there's no path there. You can't get there from here.
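Putting the three cases together, the definition can be written out as follows. This is a summary sketch in LaTeX of what was just said, using the lecture's symbols delta and w.

```latex
\delta(u,v) =
\begin{cases}
+\infty & \text{if there is no path from } u \text{ to } v,\\
-\infty & \text{if some negative-weight cycle is reachable from } u
          \text{ and can reach } v,\\
\min\{\, w(p) : p \text{ is a path from } u \text{ to } v \,\} & \text{otherwise.}
\end{cases}
```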
OK, but that's the definition. Most of the time,
this is the case we care about, of course.
Usually this is a finite set. OK, good, so that's the
definition. We're going to get a few basic
structural properties about shortest paths that will allow
us to obtain good algorithms finding these paths when they
exist. And, in particular,
we want to use ideas from dynamic programming.
So, if I want to use dynamic programming to solve shortest
paths, what do I need to establish?
What's the first thing I should check?
You've all implemented dynamic programming by now,
so this should make complete sense, hopefully, at least more sense
than it did a couple of weeks ago, last week,
when we learned it. Dynamic programming is
something that grows on you. Every year I think I understand
it better than the previous year.
But, in particular, when you learned dynamic
programming in this class, there is this nice key property
that you should check. Yeah?
Optimal substructure: good.
This is the phrase you should keep in mind.
It's not really enough for dynamic programming to be useful
in an efficient way, but it at least tells you that
you should be able to try to apply it.
That's a pretty weak statement, but it's something that you
should check. It's definitely pretty much a
necessary condition for dynamic programming to make sense.
And so, optimal substructure here means that if I take some
shortest path, and I look at a subpath of that
shortest path, I claim that it too is a
shortest path, OK, between its respective
endpoints; obviously not between the same endpoints.
But if I have some shortest path between two endpoints,
I take any subpath and that's also a shortest path.
This is one version of optimal substructure.
This one turns out to be true for this setup.
And, how should I prove an optimal substructure property?
Cut and paste. Yep, that works here too.
I mean, this isn't always true. But it's a good technique here.
So, we're going to think about, and I'll do essentially a proof
by picture here. So, suppose you have some
subpath of some shortest path. So, let's say the subpath is x
to y. And, the path goes from u to v.
So, we assume that (u,v) is a shortest path.
We want to prove that (x,y) is a shortest path.
Well, suppose (x,y) isn't a shortest path.
Then there is some shorter path that goes from x to y.
But, if you have some shorter path from x to y than this one,
then I should just erase this part of the shortest path from u
to v, and replace it with this shorter one.
So, this is some hypothetical shorter path.
So, suppose this existed. If that existed,
then I should just cut the old path from x to y,
and paste in this new one from x to y.
It's strictly shorter. Therefore, I get a strictly
shorter path from u to v. But I assumed u to v was a
shortest path: contradiction.
OK, so there is no shorter path.
And that proves the lemma that we have this:
subpaths of shortest paths are shortest paths.
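The cut-and-paste argument can be written as one chain of inequalities. The labels below are assumptions introduced for this sketch: the shortest path p from u to v is split into pieces p_ux, p_xy, p_yv, and p'_xy is the hypothetical shorter path from x to y.

```latex
w(p') \;=\; w(p_{ux}) + w(p'_{xy}) + w(p_{yv})
      \;<\; w(p_{ux}) + w(p_{xy}) + w(p_{yv})
      \;=\; w(p) \;=\; \delta(u,v)
```

So p' would be a path from u to v with weight below delta of (u,v), which contradicts the definition of delta as the minimum.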
OK, this should now be a pretty familiar proof technique.
But, there is yet another instance of cut and paste.
OK, so that's a good sign for computing shortest paths.
I mean, in terms of dynamic programming, we won't look
directly at dynamic programming here because we are going to aim
for greedy, which is even stronger.
But, next Monday we'll see some dynamic programming approaches.
Intuitively, there are some pretty natural
sub-problems here. I mean, going from u to v,
if I want to find what is the shortest path from u to v,
well, that's a particular problem.
Maybe it involves computing shortest paths from u to some
intermediate point, x, and then from x to v,
something like that. That feels good.
That's like, quadratically,
many subproblems. And, V^2 subproblems,
it sounds like that would lead to a dynamic program.
You can make it work out; it's just a little bit trickier
than that. We'll see that next Monday.
But thinking about this intermediate point we get
something called the triangle inequality.
So, you've probably heard some form of the triangle inequality
before. It holds in all sorts of
geometric spaces, but it also holds for shortest
paths, which is slightly less obvious, or more obvious,
I guess, depending on your inclination.
So, if you have any triple of vertices, the shortest path from
u to v is, at most, the shortest path from u to x
plus the shortest path from x to v.
Of course, here I need a shortest path weight from u to
x, and shortest path weight from x to v.
So, this should be pretty natural just from the statement,
even more natural if you draw the picture.
So, we have some vertex, u.
I'm using wiggly lines to denote potentially long paths as
opposed to edges. We have some intermediate
point, x, and we have some target, v, and we are
considering these three shortest paths.
This is the shortest path from u to v, or this is its weights.
This is the shortest path from u to x.
And here's its weight, and the shortest path from x to
v. And here's its weight.
And, the point is, this should be the shortest
path or a shortest path from u to v.
And, in particular, one such path is you go from u
to x, and then you go from x to v.
So, I mean, this sum is just measuring the length of this
particular path. Take the shortest path here;
take the shortest path here. And, this is supposed to be the
Min over all paths. So, certainly this is,
at most, this particular path, the sum of these two values,
OK, another proof by picture. Clear?
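Written out, the picture argument is a single line. As an assumption for this sketch, p_ux and p_xv denote shortest paths from u to x and from x to v, so both of those distances are taken to be finite.

```latex
\delta(u,v) \;=\; \min_{p \,:\, u \leadsto v} w(p)
\;\le\; w(p_{ux}) + w(p_{xv})
\;=\; \delta(u,x) + \delta(x,v)
```

The middle step holds because p_ux followed by p_xv is one particular path from u to v, and the minimum over all paths can be no larger than the weight of any one of them.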
OK, this stuff is easy. I assume we'll get into some
more exciting algorithms in particular, which is always more
exciting. Today, we're going to look at a
particular version of shortest paths called,
or the shortest paths problem called the single source
shortest path problem. OK, it's a little bit more
general than go from A to B. The problem is,
you're given a source vertex, and you want to know how to get
from that source vertex to everywhere else.
So, we'll call this source vertex s.
And from that source, we want to find,
let's say, the shortest path weights from s to everyone.
In particular, we'd also like to know the
shortest paths, but that isn't too much harder.
So, that's delta of s, v for all vertices,
v. OK, so this is actually a
little bit harder than the problem we started with, of
getting from Alderaan to Cambridge.
Now, we want to get from Alderaan to the entire universe.
OK, it turns out, this is one of the weird things
about shortest paths, according to the
state-of-the-art we know today, it seems like the following
statement will remain true for all time.
But we don't know. With the best algorithms we have, solving
the A to B problem, given s, given t,
go from s to t, is no easier than this problem.
The best way we know to solve going from A to B is to
solve how to go from A to everywhere else.
So, we sort of can't help ourselves, but to solve this
problem it turns out. Today, we're going to look at a
further restriction on this problem because this is a bit
tricky. We'll solve it next class.
But, today we're going to get rid of the negative weight cycle
issue by forbidding negative weights.
So, we're going to assume that all of the edge weights are
nonnegative, so, w(u,v) is at least zero for all vertices,
u and v. So, in particular,
shortest paths exist, provided paths exist.
And, we don't have to worry about these minus infinities.
Delta of (u,v) is always bigger than minus infinity.
It still might be plus infinity if there is no path,
but this will make life a lot easier.
And the algorithm we'll cover today really requires this
property. You can't get away without it.
Next class, we'll get away without it with a fancier and
slower algorithm. So, as I hinted at,
the main idea we're going to use for the algorithm today is
greedy, which should be faster than dynamic programming
generally. And, the tricky part will be
proving that the greedy algorithm actually works.
So, I think there's pretty much only one natural way to go
about, well, there's one way that works to go about greedy,
let's say. This may be not the obvious
one. So, let me give you a little
bit of setup. The invariant we are going to
maintain is that at all times, we have estimates on the
distances from the source to every vertex.
When I say distance, I mean shortest path weight.
I'm going to use weight and distance interchangeably here
for more intuition. And, in particular,
I want to maintain the set of vertices where those estimates
are actually the right answer.
OK, this is little s. This is big S.
So, the big S will be the set of all vertices where I know the
answer: what is the shortest path
distance from little s to that vertex in big S?
So, for starters, which distance do I know?
Sorry? s.
I know the shortest path distance from s to s because if
I assume that all of my weights are nonnegative,
I really can't get from s to s any faster than not doing
anything. OK, if I had a negative weight
cycle, maybe the distance from s to s is minus infinity.
OK, but I can't have negative weights so there's no way I can
get from s to s any faster than zero time.
There might be a longer path that still has zero cost,
but it can't be any better than zero.
So, in particular, I know that.
So, initially, S is just the set containing s.
OK, and the idea is we're going to accumulate more and more
vertices that we know. So, at some point we know the
distances to some of the vertices.
So, we have some cloud here. This is S, and this is
everything else. This is the graph,
G. This is the subset of the
vertices. And, there's some edges that go
out from there. And, so we have estimates on
how to get to these vertices. Some of them,
we may not have even seen yet. They may not be connected to
this portion of S. I mean: not directly.
They might be connected by some longer path.
They might be in a completely different connected component.
We don't know yet. Some of them,
we have estimates for because we've sort of seen how to get
there from S. And the idea is,
among all of these nodes where we have estimates
on how to get from little s, which is some vertex in here,
to these vertices, we're going to take the one for
which the estimate is smallest. That's the greedy choice.
And, we're just going to add that vertex to S.
So, S grows one vertex per step.
Each step, we're going to add to S the vertex,
and of course, again, this is not unique,
so it's a vertex, v, in V minus S,
so, something we haven't computed yet, whose estimated
distance from s is minimum. So, we look at all the vertices
we haven't yet added to S. Just take the one where we have
the estimated smallest distance. The intuition is that that
should be a good choice. So, if I pick the one that's
closest to little s among all the ones that I've seen,
among all the paths that I've seen, I sort of have to buy into
that those are good paths. But, I mean,
maybe there's some path I didn't see.
Maybe you go out to here and then you take some other path to
some vertex, which we've already seen.
OK, the worry is, well, I'd better not say that
that's the shortest path because there may have been some other
way to get there. Right, as soon as I add
something to S, I declare I've solved the
problem for that vertex. I can't change my answer later.
OK, the estimates can change until they get added to S.
So, I don't want to add this vertex to S because I haven't
considered this path. Well, if all my weights are
nonnegative, and I take the vertex here that has the
shortest estimate from s, so let's suppose this one is
the shortest one, then this can't be a shorter
path because the distance estimate
from s to that other vertex is at least as large as the one from s to this vertex.
So, no way can I make the path longer and decrease the
distance. That's the intuition.
OK, it's a little bit fuzzy here because I don't have any
induction hypotheses set up, and it's going to be a lot more
work to prove that. But that's the intuition why
this is the right thing to do. OK, you have to prove something
about the distance estimates for that to be a proof.
But, intuitively, it feels good.
It was a good starting point. OK, and then presumably we have
to maintain these distance estimates.
So, the heart of the algorithm is updating distance estimates,
I mean, choosing the best vertex to add to S,
that's one step. Then, updating the distance
estimates is sort of where the work is.
And, it turns out we'll only need to update distance
estimates of some of the vertices, the ones that are
adjacent to v. v was the vertex we just added
to S. So, once we add somebody to S,
so we grow S by a little bit, then we look at all the new
edges that go out of S from that vertex.
We update something. That's the idea.
So, that's the idea for how we're going to use greedy.
Now I'll give you the algorithm.
So, this is called Dijkstra's algorithm.
Dijkstra is a famous, recently late,
if that makes sense, computer scientist from the
Netherlands. And, this is probably the
algorithm he is most famous for. So, the beginning of the
algorithm is just some initialization,
not too exciting. OK, but let me tell you what
some of the variables mean. OK, so d is some array indexed
by vertices, and the idea is that d of x is the distance
estimate for x, so, from s to x.
So in particular, it's going to equal the real
shortest path weight from s to x when we've added x to our set,
capital S. OK, so this is,
in particular, going to be the output to the
algorithm. Did you have a question?
Or were you just stretching? Good.
So, in d of x, when we are done,
d of x is the output. For every vertex,
it's going to give us the shortest path weight from s to
that vertex. Along the way,
it's going to be some estimated distance from s to that vertex.
And, we're going to improve it over time.
This is an infinity. So initially,
we know the distance from s to
s is zero. So, we're going to set that to
be our estimate. It's going to be accurate.
Everything else we're going to just set to infinity because we
may not be connected. From the beginning,
we don't know much. Big S, initially,
is going to be empty. Immediately,
we're going to add little s to big S.
And then, the interesting part here is Q, which is going to
consist of, initially all the vertices in the graph.
And, it's going to not just be a queue as the letter suggests.
It's going to be a priority queue.
So, it's going to maintain, in particular,
the vertex that has the smallest distance estimate.
So, this is a priority queue. This is really an abuse of
notation for a data structure. OK, so this could be a heap or
whatever. The vertices are keyed on d,
our distance estimate. So, in particular,
this is going to be a min-heap.
Little s will be the guy who has the minimum.
Everyone else has the same key initially.
And, we're going to repeatedly extract the minimum element from
this queue and do other things. OK, so this is initialization.
OK, I'm going to call that initialization.
It's a pretty simple thing. It just takes linear time,
nothing fancy going on. The heart of the algorithm is
all in six lines. And, so this is not really a
step. The first step here that we
need to do is we take the vertex whose distance estimate is
minimum. So, that's among all the
vertices not yet in S, and currently S is
empty. Q has everyone.
In general, Q will have everyone except those in S.
So, we'll take the vertex, u, that has the minimum key in
that priority queue. So, extract the Min from Q.
OK. We're going to add little u
to S, claim that that is now, I mean, that's exactly what
we're saying here. We add to S that vertex that
has minimum distance estimate. And now, we need to update the
distances. So, we're going to look at each
adjacent vertex for each v in the adjacency list for u.
We look at a few distances.
So that's the algorithm, more or less.
This is the key. I should define it a little bit
what's going on here. We talked mainly about
undirected graphs last time. Here, we're thinking about
directed graphs. And, the adjacency list for u
here is just going to mean, give me all the vertices for
which there is an edge from u to v.
So, this is the outgoing adjacency list,
not the incoming adjacency list.
Undirected graphs: you list everything.
Directed graphs: here, we're only going to care
about those ones. So, for every edge,
(u,v), is what this is saying, we are going to compare the
current estimate for v, and this candidate estimate,
which intuitively means you go from s to u.
That's d of u because we now know that that's the right
answer. This, in fact,
equals, we hope, assuming the algorithm is
correct, this should be the shortest path weight from s to u
because we just added u to S. And whenever we add something
to S, it should have the right value.
So, we could say, well, you take the shortest
path from s to u, and then you follow this edge
from u to v. That has weight,
w, of (u,v). That's one possible path from s
to v. And, if that's a shorter path
than the one we currently have in our estimate,
if this is smaller than that, then we should update the
estimate to be that sum because that's a better path,
so, add it to our database of paths, so to speak:
OK, very intuitive operation; clearly should not do anything
bad. I mean, these should be paths
that makes sense. We'll prove that in a moment.
That's the first part of correctness, that this never
screws up. And then, the tricky part is to
show that it finds all the paths that we care about.
This step is called a relaxation step.
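Here is the relaxation step on its own as a minimal Python sketch. The function name `relax`, the dict `d`, and the example numbers are assumptions for illustration; only the comparison and the update follow what was just described.

```python
def relax(d, u, v, weight):
    """Relax edge (u, v): lower d[v] if going through u is better.

    Returns True if d[v] was decreased.
    """
    if d[v] > d[u] + weight:
        d[v] = d[u] + weight
        return True
    return False


d = {"u": 3, "v": 10}
relax(d, "u", "v", 4)  # 3 + 4 = 7 beats 10, so d["v"] becomes 7
relax(d, "u", "v", 9)  # 3 + 9 = 12 does not beat 7, so nothing changes
print(d["v"])          # 7
```

Note that a relaxation can only ever lower an estimate; it never raises one.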
Relaxation is always a difficult technique to teach to
MIT students. It doesn't come very naturally.
But it's very simple operation. It comes from optimization
terminology, programming terminology, so to speak.
And, does this inequality look familiar at all especially when
you start writing it this way? You say, the shortest path from
s to v and the shortest path from s to u and some edge from u
to v, does that look like anything we've seen?
In fact, it was on this board but I just erased it.
Triangle inequality, yeah.
So, this is trying to make the triangle inequality true.
Certainly, the shortest path from s to v should be less than
or equal to, not greater than, the shortest path from s to u
plus whatever path from u to v; the shortest path should be,
at most, that. So, this is sort of a somewhat
more general triangle inequality.
And, we want to, certainly it should be true.
So, if it's not true, we fix it.
If it's greater than, we make it equal.
But we don't want to make it less than because that's not
always true. OK, but certainly,
it should be less than or equal to.
So, this is fixing the triangle inequality.
It's trying to make that constraint more true.
In optimization, that's called relaxing the
constraint. OK, so we're sort of relaxing
the triangle inequality here. In the end, we should have all
the shortest paths. That's a claim.
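The whole algorithm, as described so far, fits in a short Python sketch. This is a direct translation under stated assumptions, not the board pseudocode: the graph is a dict of outgoing adjacency lists, and the priority queue is replaced by a plain set with a linear scan for the minimum, so changing d of v acts as the decrease-key for free. The function name and the small test graph are made up for illustration.

```python
import math


def dijkstra(adj, s):
    """Single-source shortest path weights for nonnegative edge weights.

    adj -- dict: vertex -> list of (neighbor, weight) for outgoing edges
    s   -- source vertex
    """
    # Initialization: d[s] = 0, everything else infinity; S empty; Q = V.
    d = {v: math.inf for v in adj}
    d[s] = 0
    S = set()
    Q = set(adj)

    while Q:
        # Extract-min: a linear scan stands in for a real priority queue.
        u = min(Q, key=lambda v: d[v])
        Q.remove(u)
        S.add(u)
        # Relax every edge leaving u.
        for v, weight in adj[u]:
            if d[v] > d[u] + weight:
                d[v] = d[u] + weight  # implicit decrease-key
    return d


# Small hypothetical graph: going s -> a -> b (1 + 2) beats s -> b (5).
adj = {"s": [("a", 1), ("b", 5)], "a": [("b", 2)], "b": [], "c": []}
print(dijkstra(adj, "s"))  # {'s': 0, 'a': 1, 'b': 3, 'c': inf}
```

The vertex c is not reachable from s, so its estimate stays at infinity, which matches the convention for the no-path case.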
So: a very simple algorithm. Let's try it out on a graph,
and that should make it more intuitive why it's working,
and then the rest of the lecture will be proving that it
works. Yeah, this is enough room.
So, oh, I should mention one other thing here.
Sorry. Whenever we change d of v,
this is changing the key of v in the priority queue.
So, implicitly what's happening here in this assignment,
this is getting a bit messy, is a decrease-key operation,
OK, which we talked briefly about last class in the context
of minimum spanning trees where we were also decreasing the key.
The point is we are changing the key of one element in this
relaxation step in the priority queue so that if it now
becomes the minimum, we should extract it here.
And, we are only ever decreasing keys because we are
always replacing larger values with smaller values.
So, we'll come back to that later when we analyze the
running time. But, there is some data
structure work going on here. Again, we are abusing notation
a bit. OK, so here is a graph with
edge weights.
OK, and I want my priority queue over here.
And, I'm also going to draw my estimates.
OK, now I don't want to cheat. So, we're going to run the
algorithm on this graph. s will be A,
and I want to know the shortest path from A to everyone else.
So, you can check, OK, paths exist.
So, hopefully everything should end up a finite value by the
end. All the weights are
nonnegative, so this algorithm should work.
The algorithm doesn't even need connectivity,
but it does need all the weights to be nonnegative.
So, we run the algorithm. For the initialization,
we set the distance estimate for our source to be zero
because, in fact, there's only one path from A to
A, and that is to do nothing, the empty path.
So, I'm going to put the key of zero over here.
And, for everyone else, we're just going to put
infinity because we don't know any better at this point.
So, I'll put keys of infinity for everyone else.
OK, so now you can see what the algorithm does is extract the
minimum from the queue. And, given our setup,
we'll definitely choose s, or in this case,
A. So, it has a weight of zero.
Everyone else has quite a bit larger weight.
OK, so we look at s, or I'll use A here.
So, we look at A. We add A to our set,
S. So, it's now removed from the
queue. It will never go back in
because we never add anything to the queue, start with all the
vertices, and extract, and decrease keys.
But we never insert. So, A is gone.
OK, and now I want to update the keys of all of the other
vertices. And the claim is I only need to
look at the vertices that have edges from A.
So, there's an edge from A to B, and that has weight ten.
And so, I compare: well, is it a good idea to go
from A to A, which costs nothing, and then to go along
this edge, AB, which costs ten?
Well, it seems like a pretty good idea because that has a
total weight of zero plus ten, which is ten,
which is much smaller than infinity.
So, I'm going to erase this infinity; write ten,
and over in the queue as well. That's the decrease-key
operation. So now, I know a path from A to
B. Good.
A to C is the only other edge. Zero plus three is less than
infinity, so, cool.
I'll put three here for C, and C is there.
OK, the other vertices I don't touch.
I'm going to rewrite them here, but the algorithm doesn't have
to copy them. Those keys were already there.
It's just touching these two. OK, that was pretty boring.
Now we look at our queue, and we extract the minimum
element. So, A is no longer in there,
so the minimum key here is three.
So, the claim is that this is a shortest path;
from A to C, here is the shortest path from
A to C. There's no other shorter way.
You could check that, and we'll prove it in a moment.
Cool, so we'll remove C from the list.
It's gone. Then we look at all of the
outgoing edges from C. So, there's one that goes up to
B, which has weight four, four plus three,
which is the shortest path weight from A to C.
So, going from A to C, and C to B should cost three
plus four, which is seven, which is less than ten.
So, we found an even better path to get to B.
It's better to go like this than it is to go like that.
So, we write seven for B, and there's an outgoing edge
from C to D which costs eight. Three plus eight is 11.
11 is less than infinity last time I checked.
So, we write 11 for D. Then we look at E.
We have three plus two is five, which is less than infinity.
So, we write five for the new key for E.
At this point, we have finite shortest paths
to everywhere, but they may not be the best
ones. So, we have to keep looking.
OK, next round of the algorithm, we extract the
minimum key among all these. OK, it's not B,
which we've seen and probably think we know the answer to.
But it's E. E has the smallest key.
So, we now declare this to be a shortest path.
The way we got to E was along this path: A to C,
C to E, declare that to be shortest.
We claim we're done with E. But we still have to update.
What about all the outgoing edges from E?
There's only one here. It costs five plus nine,
which is 14, which is bigger than 11.
So, no go. That's not an interesting path.
Our previous path, which went like this at a cost
of 11, is better than the one we are considering now.
I'm drawing the whole path, but the algorithm is only
adding these two numbers. OK, good.
So, I don't change anything. Seven, 11, and five is removed,
or E is removed. Our new keys are seven and 11.
So, we take the key, seven, here,
which is for element B, vertex B.
We declare the path we currently have in our hands from
A to B, which happens to be this one.
Algorithm can't actually tell this, by the way,
but we're drawing it anyway. This path, A,
C, B, is the candidate shortest path.
The claim is it is indeed shortest.
Now, we look at all the outgoing edges.
There's one that goes back to C at a cost of seven plus one,
which is eight, which is bigger than three,
which is good. We already declared C to be
done. But the algorithm checks this
path and says, oh, that's no better.
And then we look at this other edge from B to d.
That costs seven plus two, which is nine,
which is better than 11. So, we, in fact,
found an even shorter path. So, the shortest path weight,
now, for d, is nine because there is this path that goes A,
C, B, d for a total cost of three plus four plus two is
nine. Cool, now there's only one
element in the queue. We remove it.
d: we look at the outgoing edges.
There's one going here which costs nine plus seven,
which is 16, which is way bigger than five.
So, we're done. Don't do anything.
At this point, the queue is empty.
And the claim is that all these numbers that are written here,
the final values are the shortest path weights.
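The walkthrough above can be replayed in a few lines of Python. This is only a sketch: the edge weights are reconstructed from the arithmetic in the example, the A-to-B weight of 10 is an assumption (it isn't stated in this part of the lecture), and `heapq` has no decrease-key, so the sketch pushes a fresh entry and skips stale ones instead.

```python
import heapq

# Edge weights reconstructed from the arithmetic in the walkthrough.
# The A->B weight of 10 is an assumption; it is not stated in this passage.
GRAPH = {
    "A": {"B": 10, "C": 3},
    "B": {"C": 1, "D": 2},
    "C": {"B": 4, "D": 8, "E": 2},
    "D": {"E": 7},
    "E": {"D": 9},
}

def dijkstra(graph, source):
    """Return (d, order): final estimates and the order vertices were extracted."""
    d = {v: float("inf") for v in graph}
    d[source] = 0
    done = set()            # the set S from the lecture
    order = []
    heap = [(0, source)]    # no decrease-key in heapq: push again, skip stale entries
    while heap:
        _, u = heapq.heappop(heap)
        if u in done:
            continue        # stale entry for a vertex that is already in S
        done.add(u)
        order.append(u)
        for v, w in graph[u].items():
            if d[v] > d[u] + w:          # the relaxation step
                d[v] = d[u] + w
                heapq.heappush(heap, (d[v], v))
    return d, order

d, order = dijkstra(GRAPH, "A")
print(order)  # ['A', 'C', 'E', 'B', 'D']
print(d)      # {'A': 0, 'B': 7, 'C': 3, 'D': 9, 'E': 5}
```

The extraction order and the final numbers match the ones on the board: C at three, E at five, B at seven, D at nine.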
This looks an awful lot like a five, but it's an s.
It has a weight of zero. I've also drawn in here all the
shortest paths. And, this is not hard to do.
We're not going to talk about it too much in this class,
but it's mentioned in a little bit more detail at the end of
the textbook. And it's something called the
shortest path tree. It's just something good to
know about if you actually want to compute shortest paths.
In this class, we mainly worry about the
weights because it's pretty much the same problem.
The shortest path tree is the union of all shortest paths.
And in particular, for each vertex, v, in
your graph, consider the last edge into that vertex whose
relaxation actually lowered d of v. Among all vertices, u, with an edge
(u,v), which was the last one to do that?
So, just look at the last edges we relaxed here.
You put them all together: that's called a shortest path
tree. And, it has the property that
from S to everywhere else, there is a unique path down the
tree. And it's the shortest path.
It's the shortest path that we found.
OK, so you actually get shortest paths out of this
algorithm even though it's not explicitly described.
All we are mainly talking about are the shortest path weights.
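One way to get the tree out of the algorithm, sketched here under assumptions, is to remember for each vertex the edge whose relaxation last lowered its estimate. The `parent` dictionary and the `path_to` helper are names made up for this illustration, and the graph is the example reconstructed from the walkthrough (the A-to-B weight of 10 is assumed).

```python
import heapq

# Example graph reconstructed from the walkthrough; the A->B weight is assumed.
GRAPH = {
    "A": {"B": 10, "C": 3},
    "B": {"C": 1, "D": 2},
    "C": {"B": 4, "D": 8, "E": 2},
    "D": {"E": 7},
    "E": {"D": 9},
}

def dijkstra_with_tree(graph, source):
    """Return (d, parent), where parent encodes a shortest path tree."""
    d = {v: float("inf") for v in graph}
    parent = {v: None for v in graph}
    d[source] = 0
    done = set()
    heap = [(0, source)]
    while heap:
        _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            if d[v] > d[u] + w:
                d[v] = d[u] + w
                parent[v] = u   # (u, v) is the last edge that lowered d[v] so far
                heapq.heappush(heap, (d[v], v))
    return d, parent

def path_to(parent, v):
    """Walk parent pointers back to the source and reverse them."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]

d, parent = dijkstra_with_tree(GRAPH, "A")
print(path_to(parent, "D"))  # ['A', 'C', 'B', 'D']
```

Following the parent pointers from any vertex gives the unique path down the tree from the source.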
Algorithm clear at this point? Feels like it's doing the right
thing? You can check all those numbers
are the best paths. And now we're going to prove
that.
So: correctness.
So the first thing I want to prove is that relaxation never
makes a mistake. If it ever sets d of v to be
something, I want to prove that d of v is always an upper bound
on delta. So, we have this invariant:
d of v is greater than or equal to delta of s, v for all v.
And, this invariant holds at all times.
So, after initialization, it doesn't hold before
initialization because d isn't defined then.
But if you do this initialization where you set S
to zero, and everyone else to infinity, and you take any
sequence of relaxation steps, then this invariant will hold
after each relaxation step you apply.
This is actually a very general lemma.
It's also pretty easy to prove. It holds not only for
Dijkstra's algorithm, but for a lot of other
algorithms we'll see. Pretty much every algorithm we
see will involve relaxation. And, this is saying no matter
what relaxations you do, you always have a reasonable
estimate in the sense that it's greater than or equal to the
true shortest path weight. So, it should be converging
from above. So, that's the lemma.
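The lemma can be checked empirically before proving it. The sketch below applies relaxation steps in a random order and checks the invariant after every step. The graph is the example reconstructed from the walkthrough (A-to-B weight assumed to be 10), and `true_distances` is a hypothetical helper that gets the reference values by sweeping all edges repeatedly; it is not part of the lecture's algorithm.

```python
import random

# Example graph reconstructed from the walkthrough; the A->B weight is assumed.
EDGES = [("A", "B", 10), ("A", "C", 3), ("B", "C", 1), ("B", "D", 2),
         ("C", "B", 4), ("C", "D", 8), ("C", "E", 2), ("D", "E", 7),
         ("E", "D", 9)]
VERTICES = "ABCDE"

def true_distances(source):
    """Reference values for delta(source, v): |V| - 1 sweeps over all edges."""
    delta = {v: float("inf") for v in VERTICES}
    delta[source] = 0
    for _ in range(len(VERTICES) - 1):
        for u, v, w in EDGES:
            if delta[u] + w < delta[v]:
                delta[v] = delta[u] + w
    return delta

def invariant_holds(source, steps, seed):
    """Relax randomly chosen edges; report whether d[v] >= delta(s, v) throughout."""
    rng = random.Random(seed)
    delta = true_distances(source)
    d = {v: float("inf") for v in VERTICES}
    d[source] = 0                          # the initialization from the lecture
    for _ in range(steps):
        u, v, w = rng.choice(EDGES)        # any edge, in any order
        if d[v] > d[u] + w:                # the relaxation step
            d[v] = d[u] + w
        if any(d[x] < delta[x] for x in VERTICES):
            return False                   # an estimate dropped below the truth
    return True

print(all(invariant_holds("A", 200, seed) for seed in range(20)))  # True
```

No order of relaxations ever pushes an estimate below the true weight, which is what the lemma claims.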
Let's prove it. Any suggestions on how we
should prove this lemma? What technique might we use?
What's that? Cut and paste?
It would be good for optimal substructure.
Cut and paste: maybe sort of what's going on
here but not exactly. Something a little more
general. It's just intuition here;
it doesn't have to be the right answer.
In fact, many answers are correct, have plausible proofs.
Induction, yeah. So, I'm not going to write
induction here, but effectively we are using
induction. That's the answer I was
expecting. So, there is sort of an
induction already in time going on here.
We say after initialization it should be true.
That's our base case. And then, every relaxation we
do, it should still be true. So, we're going to assume by
induction that all the previous relaxations worked,
and then we're going to prove that the last relaxation,
whatever it is, works.
So, first let's do the base case.
So, this is after an initialization,
let's say, initially. So, initially we have d of s
equal to zero. And we have d of v equal to
infinity for all other vertices, for all vertices,
v, not equal to little s. OK, now we have to check that
this inequality holds. Well, we have delta of s,
s. We've already argued that
that's zero. You can't get negative when
there are only nonnegative edge weights.
So, that's the best. So, certainly zero is greater
than or equal to zero. And, we have everything else,
well, I mean, delta of S, v is certainly less
than or equal to infinity. So this holds.
Everything is less than or equal to infinity.
So: base case is done. So, now we do an induction.
And, I'm going to write it as a proof by contradiction.
So, let's say, suppose that this fails to hold
at some point. So, suppose for contradiction
that the invariant is violated. So, we'd like to sue the
violator and find a contradiction.
So, it's going to be violated. So, let's look at the first
violation, the first time it's violated.
So, this is, essentially,
again, a proof by induction. So, let's say we have some
violation, d of v is less than delta of s, v.
That would be bad if we somehow got an estimate smaller than the
shortest path. Well, the nice thing about
looking at the first violation is that we know, sort of by induction,
that all other values are correct.
OK, d of v is the first one where we've screwed up.
So, the invariant holds everywhere else.
Well, what caused this to fail, this invariant to be violated,
is some relaxation, OK, on d of v.
So, we had some d of v, and we replaced it with some
other d of u plus the weight of the edge from u to v.
And somehow, this made it invalid.
So, d of v is somehow less than that.
We just set d of v to this. So, this must be less than
delta of s, v. The claim is that that's not
possible because, let me rewrite a little bit.
We have d of u plus w of (u,v). And, we have our induction
hypothesis, which holds on u; u is some other vertex.
We know that d of u is at least delta of s, u.
So, this has to be at least delta of s, u plus w of u,
v. Now, what about this w of u,
v? Well, that's some path from u
to v. So, it's got to be bigger than
the shortest path or equal. So certainly,
this is greater than or equal to delta of u,
v. OK, it could be larger if
there's some multi-edged path that has a smaller total weight,
but it's certainly no smaller than delta of u,
v. And, this looks like a good
summation: delta of s, u plus delta of u, v. That's the triangle
inequality, yeah. So, that is,
it's upside down here. But, in the triangle s,
u, v, going from s to u and then u to v can only be longer than going from s to v.
OK, so we have this thing, which is simultaneously greater
than or equal to the shortest path weight from S to v,
and also strictly less than the shortest path weight from S to
v. So, that's a contradiction.
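Written out as one chain, with d[v] standing for "d of v" (the bracket notation is an assumption; the board may have used something else), the argument is:

```latex
\[
d[v] \;=\; d[u] + w(u,v)
\;\ge\; \delta(s,u) + w(u,v)
\;\ge\; \delta(s,u) + \delta(u,v)
\;\ge\; \delta(s,v).
\]
```

The first inequality is the induction hypothesis on u, the second says a single edge is one particular path from u to v, and the third is the triangle inequality. Together they contradict the assumed violation, d[v] < \delta(s,v).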
Maybe contradiction isn't the most
intuitive way to proceed. The intuition,
here, is whatever you assign d of v, you have a path in mind.
You inductively had a path from s to u.
Then you added this edge. So, that was a real path.
We always know that every path has weight greater than or equal
to the shortest path. So, it should be true,
and here's the inductive proof. All right, moving right along,
so this was an easy warm-up. We have greater than or equal
to. Now we have to prove less than
or equal to at the end of the algorithm.
This is true all the time; less than or equal to will only
be true at the end. So, we are not going to prove
less than or equal to quite yet. We're going to prove another
lemma, which again, so both of these lemmas are
useful for other algorithms, too.
So, we're sort of building some shortest path theory that we can
apply later. This one will give you some
intuition about why relaxation, not only is it not bad,
it's actually good. Not only does it not screw up
anything, but it also makes progress in the following sense.
So, suppose you knew the shortest path from s to some
vertex. OK, so you go from s to some
other vertices. Then you go to u.
Then you go to v. Suppose that is a shortest path
from s to v. OK, and also suppose that we
already know in d of u the shortest path weight from s to
u. So, suppose we have this
equality. We now know that we always have
a greater than or equal to. Suppose they are equal for u,
OK, the vertex just before v in the shortest path.
OK, and suppose we relax that edge, (u,v), OK,
which is exactly this step. This is relaxing the edge,
(u,v). But we'll just call it
relaxation here. After that relaxation,
d of v equals delta of (s,v). So, if we had the correct
answer for u, and we relax (u,v),
then we get the correct answer for v.
OK, this is good news. It means, if inductively we can
somehow get the right answer for u, now we know how to get the
right answer for v. In the algorithm,
we don't actually know what the vertex just before v in the
shortest path is, but in the analysis we can
pretty much know that. So, we have to prove this
lemma. This is actually even easier
than the previous one: don't even need induction
because you just work through what's going on in relaxation,
and it's true. So, here we go.
So, we're interested in this value, delta of s, v.
And we know what the shortest path is.
So, the shortest path weight is the weight of this path.
OK, so we can write down some equality here.
Well, I'm going to split out the first part of the path and
the last part of the path. So, we have,
I'll say, the weight from s, so, this part of the path from
s to u, plus the weight of this edge, u, v.
Remember, we could write w of a path, and that was the total
weight of all those edges. So, what is this,
the weight of this path from S to u?
Or, what property should I use to figure out what that value
is? Yeah?
s to v is the shortest path, right?
So, by optimal substructure, from s to u is also a shortest
path. So, this is delta of s,
u. Cool.
We'll hold on for now. That's all we're going to say.
On the other hand, we know from this lemma that
no matter what we do, d of v is greater than or equal
to delta of s, v.
So, let's write that down. So, there's a few cases,
and this will eliminate some of the cases.
By that lemma correctness one, we know that d of v is greater
than or equal to delta of s, v.
So, it's either equal or greater than at all times.
So, I'm thinking about the time before we do the relaxation,
this (u,v). So, at that point,
this is certainly true. So, either they're equal before
relaxation or it's greater.
OK, if they are equal before relaxation, we're happy because
relaxation only decreases values, and by correctness one
it can't get any smaller than this, so after relaxation it
will also be equal. OK, so in this case we're done.
So, that's a trivial case. So let's now suppose that d of
v is greater than delta of s, v before relaxation.
That's perfectly valid. Hopefully now we fix it.
OK, well the point is, we know this delta s,
v. It is this sum.
OK, we also know this. So, delta of s,
u we know is d of u. And, we have this w u,
v. So, delta of s,
v is d of u plus w of (u,v) because we are assuming we have
this shortest path structure where you go from s to u,
and then you follow the edge, (u,v).
So, we know this. So, we know d of v is greater
than d of u plus w of (u,v). By golly, that's this condition
in relaxation. So, we're just checking,
relaxation actually does something here.
OK, if you had the wrong distance estimate,
this if condition is satisfied. Therefore, we do this.
So, in this case, we relax.
So, I'm just relaxing. Then, we set d of v to d of u
plus w of (u,v), which is what we want. OK, so we set d of v to d of u
plus w of (u,v). And, this equals,
as we said here, delta of S, v,
which is what we wanted to prove.
Done. OK, I'm getting more and more
excited as we get into the meat of this proof.
Any questions so far? Good.
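For reference, here is the whole proof of the correctness lemma in symbols, a sketch using d[v] for "d of v" and \leadsto for the subpath (both notations are assumptions). Splitting the shortest path at u and using optimal substructure, then the hypothesis d[u] = \delta(s,u):

```latex
\[
\delta(s,v) \;=\; w(s \leadsto u) + w(u,v)
\;=\; \delta(s,u) + w(u,v)
\;=\; d[u] + w(u,v).
\]
```

Before relaxing (u,v), correctness one gives d[v] \ge \delta(s,v). If they are equal, nothing changes. Otherwise d[v] > d[u] + w(u,v), the if condition fires, and the relaxation sets

```latex
\[
d[v] \;\leftarrow\; d[u] + w(u,v) \;=\; \delta(s,v).
\]
```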
Now comes the hard part. These are both very easy
lemmas, right? I'll use these two boards.
We don't need these proofs anymore.
We just need these statements: correctness one,
correctness lemma; great names.
So, now finally we get to correctness two.
So, we had one and one and a half.
So, I guess correctness is, itself, a mini-trilogy,
the mini-series. OK, so correctness two says
when the algorithm is done, we have the right answer.
This is really correctness. But, it's going to build on
correctness one and correctness lemma.
So, we want d of v to equal delta of s, v for all vertices,
v at the end of the algorithm. That is clearly our goal.
Now, this theorem is assuming that all of the weights are
nonnegative, just to repeat. It doesn't assume anything
else. So, it's going to get the
infinities right. But, if there are minus
infinities, all bets are off. OK, even if there's any
negative weight edge anywhere, it's not going to do the right
thing necessarily. But, assuming all the weights
are nonnegative, which is reasonable if they're
measuring time. Usually it costs money to
travel along edges. They don't pay you to do it.
But who knows? So, I need just to say a few
things. One of the things we've
mentioned somewhere along the way is when you add a vertex to
S, you never change its weight. OK, that actually requires
proof. I'm just going to state it
here. It's not hard to see.
d of v doesn't change. OK, this is essentially an
induction once v is added to S. OK, this will actually follow
by something we'll say in a moment.
OK, so all I really care about is when a vertex is added to S,
we better have the right estimate because after that,
we're not going to change it, let's say.
OK, we could define the algorithm that way.
We are not, but we could. I'll say more about this in a
second. So, all we care about is
whether d of v equals delta of s, v.
That's what we want to prove. So, it's clearly that.
It should be true at the end. But, it suffices to prove that
it holds when v is added to S, to capital S.
OK, this actually implies the first statement.
It has sort of a funny implication.
But, if we can prove this, that d of v equals delta of s,
v, when you add to S, we know relaxation only
decreases value. So, it can't get any smaller.
It would be from correctness one.
Correctness one says we can't get any smaller than delta.
So, if we get equality at that point, we'll have equality from
then on. So, that actually implies d of
v never changes after that point.
OK, so we're going to prove this.
Good. Well, suppose it isn't true.
So this would be a proof by a contradiction.
Suppose for contradiction that this fails to hold.
And, let's look at the first failure.
Suppose u is the first vertex --
-- that's about to be added to S.
I want to consider the time right before it's added to S,
for which we don't have what we want.
These are not equal. d of u does not equal delta of
s, u. Well, if they're not equal,
we know from correctness one that d of u is strictly greater
than delta of s, u.
So, we have d of u is strictly greater than delta of s,
u. OK, that's the beginning of the
proof, nothing too exciting yet, just some warm-up.
OK, but this, used already correctness one.
I think that's the only time that we use it in this proof.
OK, so I sort of just want to draw picture of what's going on.
But I need a little bit of description.
So, let's look at the shortest path.
Somehow, d of u is greater than the shortest path.
So, consider the shortest path or a shortest path.
Let p be a shortest path, not just any path,
but a shortest path from s to u.
OK, so that means that the weight of this path is the
shortest path weight. So, we have some equations for
what's going on here. So, we care about delta of s,
u. Here's a path with that weight.
There's got to be one because shortest paths exist here;
slight exceptional cases if it's a plus infinity,
but I'm not going to worry about that.
So, let me draw a picture somewhere.
So, we have s. We have u.
Here is the shortest path from s to u.
That's p. No idea what it looks like so
far. Now, what we also have is the
notion of capital S. So, I'm going to draw capital
S. So, this is big S.
We know that little s is in big S.
We know that u is not yet in big S.
So, I haven't screwed up anything yet,
right? This path starts in S and
leaves it at some point because until we are about to add u to
S, so it hasn't happened yet, so u is not in S.
Fine. What I want to do is look at
the first place here where the path, p, exits S.
So, there is some vertex here. Let's call it x.
There's some vertex here. We'll call it y.
OK, possibly x equals little s. Possibly y equals u.
But it's got to exit somewhere, because it starts inside and
ends up outside. And it's a finite path.
OK, so consider the first time it happens; not the second time,
the first. OK, so consider the first edge,
(x,y), where p exits capital S. The shortest path from s to u
exits capital S. It's got to happen somewhere.
Cool, now, what do we know? Little x is in S.
So, it has the right answer, because u, which we're about to add
to S, is the first violation: the first vertex added to S that
has the wrong d estimate. So, d of x equals delta of s,
x. Because we are looking at the
first violation, x is something that got added
before. So, by induction on time,
or because we had the first violation, d of x equals the
shortest path weight from S to x.
So, that's good news. Now we are trying to apply this
lemma. It's the only thing left to do.
We haven't used this lemma for anything.
So, we have the setup. If we already know that one of
the d values is the right answer, and we relaxed the edge
that goes out from it, then we get another right
answer. So that's what I want to argue
over here. We know that the d of x equals
this weight because, again, subpaths of shortest
paths are shortest paths. We have optimal substructure,
so this is a shortest path, from S to x.
It might not be the only one, but it is one.
So we know that matches. Now, I want to think about
relaxing this edge, (x,y).
Well, x is in capital S. And, the algorithm says,
whenever you add a vertex, u, to the big set,
S, you relax all the edges that go out from there.
OK, so when we added x to S, and we now look far in the
future, we're about to add some other vertex.
Right after we added x to S, we relax this edge,
(x,y), because we relaxed every edge that goes out from x,
OK, whatever they were. Some of them went into S.
Some of them went out. Here's one of them.
So, to repeat: when we added x to S, we relaxed the edge,
(x,y). OK, so now we're going to use
the lemma. So, by the correctness lemma --
What do you get? Well, we had the correct
shortest path weight for x at that point. We relax the edge,
(x,y). So, now we should have the
correct shortest path weight for y.
d of y equals delta of s, y.
OK, this is sometime in the past.
In particular, now, it should still be true
because once you get down to the right answer you never change
it. OK, we should be done.
OK, why are we done? Well, what else do we know
here? We assumed something for
contradiction, so we better contradict that.
We assume somehow, d of u is strictly greater than
delta of s, u. So, d of u here is strictly
greater than the length of this whole path.
Well, we don't really know whether u equals y.
It could, could not. And, but what do we know about
this shortest path from S to y? Well, it could only be shorter
than from S to u because it's a subpath.
And it's the shortest path because it's the subpath of the
shortest path. The shortest path from S to y
has to be less than or equal to the shortest path from S to u.
OK, s to y: less than or equal to s, u, OK, just because it's a
subpath and the remaining edges have nonnegative weight. I'm closer.
I've got delta of s, u now.
Somehow, I want to involve d of u.
So, I want to relate d of y to d of u.
What do I know about d of u? Yeah?
d of u is smaller because we have a Min heap,
yeah. We always chose,
let's erase, it's way down here.
We chose u. This is the middle of the
algorithm. It's the reason I kept this to
be the minimum key. This is keyed on d.
So, we know that at this moment, when we're trying to add
u to S, right, y is not in S,
and u is not in S. They might actually be the same
vertex. But both of these vertices,
same or not, are outside S.
We chose u because d of u has the smallest d estimate.
So, d of y has to be greater than or equal to d of u.
It might be equal if they're the same vertex,
but it's got to be greater than or equal to.
So, d of y here is greater than or equal to d of u.
So, here we're using the fact that we actually made a greedy
choice. It's the one place we're using
the greedy choice. Better use it somewhere because
you can't just take an arbitrary vertex and declare it to be
done. You've got to take the greedy
one. OK, now we have d of u is less
than or equal to delta of s, u, which contradicts this.
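Collected in one line, with d[u] for "d of u" (notation assumed), the facts above chain together as:

```latex
\[
d[u] \;\le\; d[y] \;=\; \delta(s,y) \;\le\; \delta(s,u) \;<\; d[u].
\]
```

From left to right: the greedy choice of u as the minimum key, the correctness lemma applied to the edge (x,y), the fact that a prefix of a shortest path is no longer than the whole path when edge weights are nonnegative, and the assumption made for contradiction. The chain says d[u] < d[u], which is impossible.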
OK, sort of magical that that all just worked out.
But sort of like the previous proofs, you just see what
happens and it works. OK, that's the approximation.
The only real idea here is to look at this edge.
In fact, you could look at this edge too.
But let's look at some edge that comes from S and goes out
of S, and argue that, well, x has to be correct,
and when we made x correct, y had to be correct,
and now, why the hell are we looking at u?
y is the thing you should have looked at.
And, there you get a contradiction because y had the
right answer. If u equals y,
that's fine, or if u and y were sort of
equally good, that's also fine if all these
weights were zero. So, the picture might actually
look like this. But, in that case,
d of u is the correct answer. It was delta of s, u.
We assumed that it wasn't. That's where we're getting a
contradiction. Pretty clear?
Go over this proof. It's a bit complicated,
naturally. OK, we have a little bit more
to cover, some easier stuff. OK, the first thing is what's
the running time of this algorithm?
I'll do this very quickly because we've actually seen this
analysis before, last class. There was some initialization.
The initialization, which is no longer here,
is linear time. No big deal.
OK, extract Min. Well, that's some data
structure. So, we have something like size
of V. Every vertex we extract the Min
once, and that's it. So, size of V,
extract mins. OK, so that's pretty simple.
OK, then we had this main loop. This is a completely conceptual
operation. S is not actually used in the
algorithm. It's just for thinking.
OK, so this takes zero time. Got to love it.
OK, and now the heart is here. So, how many times does this
loop iterate? That's the degree of u.
So, what is the total number of times that we execute a
relaxation step? It doesn't necessarily mean we
do this, but we at least execute this body.
Over the whole algorithm, how many times do we do this?
Every vertex, we look at all the outgoing
edges from there. So, the total would be?
Number of edges, yeah.
So, this is number-of-edges iterations.
OK, this is essentially the handshaking lemma we saw last
time, but for directed graphs. And we are only looking at the
outgoing edges. So, it's not a factor of two
here because you're only outgoing from one side.
So, we have E iterations.
In the worst case, we do a decrease key for
every one. So, at most:
E decrease keys. OK, so the time is,
well, we have V extract Mins, each costing the time to do an extract
Min, whatever that is. And we have E decrease keys,
whatever that is, and this is exactly the running
time we had for Prim's algorithm for a minimum spanning tree last
time. And, it depends what data
structure you use, what running time you get.
So, I'm going to skip the whole table here.
But, if you use an array, the final running time will be
V^2 because each extract Min takes order V, and you have
constant time decrease key. If you use a binary heap,
which we know and love, then we have order log V for
each operation. And so, this is (V plus E) log V.
And, so that's what we know how to do.
And, if you use this fancy data structure called a Fibonacci
heap, you get constant time decrease key, amortized.
And, you get an E plus v log v worst case bound on the running
time. So, this is the best we know
how to solve shortest paths without any extra assumptions,
single source shortest paths with non-negative edge weights
in general. OK, this is almost as good and
this is sometimes better than that.
But these are essentially irrelevant except that you know
how to do these. You don't know how to do a
Fibonacci heap unless you read that in the chapter of the book.
That's why we mention the top two running times.
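To make the first row of that table concrete, here is a sketch of the array version: extract-min is a linear scan and decrease-key is a plain assignment, which gives the V^2 bound. The function name `dijkstra_array` is made up for this illustration, and the graph is the example reconstructed from the walkthrough (A-to-B weight assumed to be 10).

```python
# Example graph reconstructed from the walkthrough; the A->B weight is assumed.
GRAPH = {
    "A": {"B": 10, "C": 3},
    "B": {"C": 1, "D": 2},
    "C": {"B": 4, "D": 8, "E": 2},
    "D": {"E": 7},
    "E": {"D": 9},
}

def dijkstra_array(graph, source):
    """Dijkstra with an unsorted collection as the priority queue: O(V^2 + E)."""
    d = {v: float("inf") for v in graph}
    d[source] = 0
    remaining = set(graph)                       # vertices still in the queue
    while remaining:
        u = min(remaining, key=lambda v: d[v])   # extract-min by linear scan: O(V)
        remaining.remove(u)
        for v, w in graph[u].items():
            if d[v] > d[u] + w:
                d[v] = d[u] + w                  # decrease-key is an assignment: O(1)
    return d

dist = dijkstra_array(GRAPH, "A")
print(dist)  # {'A': 0, 'B': 7, 'C': 3, 'D': 9, 'E': 5}
```

On a dense graph, where E is on the order of V^2, this simple version is as good as the heap versions.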
OK, I want to talk briefly about a simpler case,
which you may have seen before. And so it's sort of fun to
connect this up to breadth first search in a graph.
So, I mean that ends Dijkstra, so to speak.
But now I want to think about a special case where the graph is
unweighted, meaning w of (u,v) equals one for all edges,
(u,v). OK, suppose we had that
property. Can we do any better than
Dijkstra? Can we do better than this
running time? Well, we probably have to look
at all the edges and all the vertices.
So, the only thing I'm questioning is this log v.
Can I avoid that? I gave away the answer a little
bit. The answer is called breadth
first search, or BFS, which you have probably
seen before. Next to depth first search,
it's one of the standard ways to look at the graph.
But we can say a little bit more than you may have seen
before. Breadth first search is actually
Dijkstra's algorithm: kind of nifty.
There are two changes. First change is that breadth
first search does not use a priority queue.
I'll just tell you what it uses instead.
You can use a first-in, first-out, honest-to-goodness
queue instead of a priority queue.
OK, it turns out that works. Instead of doing extract Min,
you just take the first thing off the queue.
Instead of doing decrease key, OK, here's a subtlety.
But, this if statement changes a little bit.
So, here is the relaxation step.
So, in order to relax, you say this much simpler
thing. If we haven't visited v yet,
then we declare it to have the shortest path weight,
say, d of v is d of u plus one, which is the weight of the
edge, (u,v). And we add v to the end of the
queue. So, now, we start with the
queue empty. Actually, it will just contain
the vertex, S, because that's the only thing
we know the shortest path for. So, the queue is just for things where
I know the shortest path.
Just deal with each one when you can: look at all its outgoing
edges when you get to it. So, initially that's just S.
You say, well, for all the outgoing edges,
S has zero. All the outgoing edges from
there have weight one. The shortest path weight from
the source is one. You certainly can't do any
better than that if all the weights are one.
OK, so we add all those vertices to the end of the
queue. Then, we process things in
order, and we just keep incrementing,
if their value is d of u, add one to it.
That's d of v. And then we are going to add v
to S when we get to it in the queue.
OK, that is breadth first search, very simple.
And, you can look at the text for the algorithm and for an
example because I don't have time to cover that.
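For readers without the text at hand, here is a minimal sketch of breadth first search as just described: a first-in, first-out queue, and the simplified relaxation step that fires only for unvisited vertices. The small graph and the name `bfs_distances` are made up for this illustration.

```python
from collections import deque

# A small unweighted digraph, made up for illustration.
GRAPH = {
    "s": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
    "d": ["e"],
    "e": [],
}

def bfs_distances(graph, source):
    """Shortest path weights from source when every edge has weight one."""
    d = {source: 0}
    queue = deque([source])        # FIFO queue in place of the priority queue
    while queue:
        u = queue.popleft()        # replaces extract-min
        for v in graph[u]:
            if v not in d:         # "if we haven't visited v yet"
                d[v] = d[u] + 1    # weight of the edge (u, v) is one
                queue.append(v)    # replaces decrease-key
    return d

dist = bfs_distances(GRAPH, "s")
print(dist)  # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3}
```

Vertices that are unreachable from the source never enter `d` in this sketch; that corresponds to an estimate of infinity.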
But the key thing is that the time is faster.
The time is order V plus E because as before,
we only look at each edge once: we look at all the outgoing
edges from all the vertices. As soon as we set d of v to
something, it will remain that. We never touch it.
We are going to add it to S. That only happens once.
So, this if statement, and so on, and the enqueuing,
is done order E times, or actually E times,
exactly. And enqueuing to a queue
and dequeuing from a queue, that's what we use instead of
extract Min, take constant time, so the total running time,
number of vertices plus the number of edges.
OK, not so obvious that this works, but you can prove that it
works using the Dijkstra analysis.
All you have to do is prove that the FIFO queue acts as a priority queue.
Once you know that, by the correctness of Dijkstra
you get the correctness of breadth first search.
So, not only is breadth first search finding all the vertices,
which is maybe what you normally use it for,
but it finds the shortest path weights from S to every other
vertex when the weights are all one.
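The claim that the FIFO queue behaves like a priority queue can be observed directly: the d values come off the front of the queue in nondecreasing order, so taking the first element is the same as extracting the minimum. The sketch below records those values on a small made-up graph; `bfs_dequeue_keys` is a hypothetical helper, and this is a demonstration, not a proof.

```python
from collections import deque

# A small unweighted digraph, made up for illustration.
GRAPH = {
    "s": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
    "d": ["e"],
    "e": [],
}

def bfs_dequeue_keys(graph, source):
    """Run BFS and return the d values in the order they leave the queue."""
    d = {source: 0}
    queue = deque([source])
    keys = []
    while queue:
        u = queue.popleft()
        keys.append(d[u])          # the key a priority queue would have extracted
        for v in graph[u]:
            if v not in d:
                d[v] = d[u] + 1
                queue.append(v)
    return keys

keys = bfs_dequeue_keys(GRAPH, "s")
print(keys)                  # [0, 1, 1, 2, 2, 3]
print(keys == sorted(keys))  # True
```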
So, there we go: introduction to shortest paths.
Next time we'll deal with negative weights.