MALE SPEAKER: I'd like to now introduce Professor Bill Pugh
from the University of Maryland, who works on the
FindBugs project, among several others.
He has been a consultant at Google before and visited us
several times, and given us a talk on the
FindBugs project before.
But now he would like to update us on what's going on
with FindBugs now.
WILLIAM PUGH: OK.
So I'm going to be talking a lot about
static analysis in general.
Although I'm actually going to be talking a lot about this
specifically in terms of FindBugs.
One of the things I think people are just starting to
really get a broad appreciation for is the fact
that static analysis really can help you improve the
quality of your code.
I mean, I think three years ago, if you'd asked a lot of
developers, are static analyses helpful, they'd say,
oh, there's Lint.
I hate Lint.
Or oh, halting problem.
You guys can't really do anything.
But now I think a lot of people are beginning to
appreciate that static analysis, usefully applied,
useful tools, can actually do things to help you improve the
quality of your code.
Lots of improvements are still needed.
And we're getting to the point now where it's good enough
that developers will talk to the people doing static
analysis, and we can get feedback and
we can start improving.
Three or four years ago, developers were just like, eh,
I don't care.
I'm going to be talking about FindBugs, which is a tool for
doing static analysis of Java.
But there are lots of other static analyzers out there.
And a lot of what I'm going to talk about
applies more generally.
So FindBugs is an open-source static analysis tool.
It's used within Google.
It actually analyzes class files rather than source code.
This is partially a historical accident, but it turned out to
be fortuitous in a number of ways.
We can generate either XML or text output.
We can run it from NetBeans, Eclipse, or Ant, or through its own Swing GUI.
At the moment, we have-- well, actually, this is out of date
as of yesterday, but 225,000 downloads from SourceForge.
It's used by Oracle, Wells Fargo, Bank of America, a
bunch of companies.
So we really tried to look for software defects.
There are also a lot of style checkers.
And actually, one of the things you have to overcome
when you tell people, oh, I've got a static analysis tool
that I want to use.
It's like, oh, I've used style checkers before, I hate them.
There are style detectors, and they're useful in some
contexts, but that's different than
actually defect detection.
We're actually trying to find mistakes that
people made in code.
And we can find hundreds of defects in each of several
large apps that we've looked at, real defects, things that
should be fixed.
Now this is the common wisdom, I think, that a lot of people
have about bugs.
I think this was certainly the common wisdom in the academic
research community, and I think it might have been the
common wisdom in a lot of the developer land.
It's that programmers are smart, smart people don't make
dumb mistakes, and we also have good techniques-- unit
testing, pair programming, code inspections--
for finding bugs.
And so, if there are any defects in my code, they must
be really subtle stuff that would require really
sophisticated analysis in order to be able to find.
But then when you actually look at code, you find code
that looks like this.
If in is equal to null, then try closing in.
And this is, of course, guaranteed to result in a null
pointer exception if that close method is ever invoked.
This was from Eclipse.
And this is one of the things that we find.
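Reconstructed, the pattern looks something like this. This is a sketch of the shape of the bug, not the actual Eclipse source; the method and variable names are illustrative:

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseBug {
    // Buggy: attempts to close the stream only when it is null,
    // so any execution of the body dereferences null.
    public static void closeBuggy(Closeable in) throws IOException {
        if (in == null) {
            in.close(); // guaranteed NullPointerException if reached
        }
    }

    // Intended: close only when the stream is non-null.
    public static void closeFixed(Closeable in) throws IOException {
        if (in != null) {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        closeFixed(null); // safe: does nothing
        try {
            closeBuggy(null);
        } catch (NullPointerException e) {
            System.out.println("buggy version threw NPE, as predicted");
        }
    }
}
```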
Now, this is partially because with FindBugs, we've been
looking for the easy bugs.
We haven't been spending six months doing
context-sensitive, path-sensitive, pointer alias
analysis scheme in order to find subtle bugs.
We've been looking for stupid bugs.
And we have found more stupid bugs than I would have expected.
So, nobody's perfect.
There are a bunch of common types of errors.
Misunderstood language features, misunderstood APIs.
Typos, there are a lot of typos.
Some of them get caught as a syntax error.
Other typos don't get caught as a syntax error.
Say you meant to use equal and instead
you typed not equal.
You have an error.
Misunderstood class or method invariants.
So everybody makes syntax errors.
They rely upon the compiler to catch those.
But what about bugs one step removed from a syntax error?
OK, so, interaction time.
I'm going to put up some code, and I want to see who can
shout out first, what's the bug with this code?
Now, in a couple of cases, you might need to ask a question
about the code.
And if you want to ask a question, feel free to
shout it out.
These are all real code examples.
So shout it out.
AUDIENCE: The "equals null." No object can equal null, so
that's always guaranteed to be false.
WILLIAM PUGH: Right.
So if columnName were actually null, you'd get a
null pointer exception.
So that won't ever work.
Also, comparing with equals rather than equal-equal is
probably a mistake.
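A minimal sketch of that pattern; columnName comes from the talk, but the surrounding method is illustrative:

```java
public class EqualsNull {
    // Buggy: x.equals(null) is specified to return false for any
    // non-null x, and throws NullPointerException when x itself is
    // null -- so this check can never detect a null columnName.
    public static boolean isMissingBuggy(String columnName) {
        return columnName.equals(null);
    }

    // Intended: a reference comparison against null.
    public static boolean isMissingFixed(String columnName) {
        return columnName == null;
    }

    public static void main(String[] args) {
        System.out.println(isMissingBuggy("id")); // always false
        System.out.println(isMissingFixed(null)); // true
    }
}
```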
Here's another one.
AUDIENCE: Infinite recursion.
WILLIAM PUGH: Infinite recursive loop.
Here is a method that, if invoked, is guaranteed to take
the string that was passed to it, append monitor action to
the front of it, and then call itself again, over and over
again, building up a longer and longer string until either
you run out of heap or stack.
And I don't know which one would
happen first. OK.
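A sketch of that shape of bug; the method name and the "monitorAction" prefix are reconstructed from the description, and the demo deliberately catches the resulting StackOverflowError:

```java
public class RecursiveLoop {
    // Buggy: prepends a prefix and calls itself unconditionally.
    // Every invocation recurses with a longer string until the
    // stack (or, in principle, the heap) is exhausted.
    public static String addPrefix(String action) {
        return addPrefix("monitorAction." + action);
    }

    public static void main(String[] args) {
        try {
            addPrefix("start");
        } catch (StackOverflowError e) {
            System.out.println("stack exhausted, as expected");
        }
    }
}
```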
So, if name is not equal to null, or name.length is
greater than 0.
Shout it out.
WILLIAM PUGH: Sorry, what?
AUDIENCE: You never evaluate the second condition.
WILLIAM PUGH: No.
You will evaluate--
WILLIAM PUGH: Right.
So actually, if name is actually null, then the first
part will evaluate to false.
And since this is an or, you only evaluate the second part
when the first part evaluates false.
So if name is equal to null, you'll get a
null pointer exception.
We see a number of variants on that.
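That pattern, sketched with illustrative names; name and the null/length test come from the slide:

```java
public class OrShortCircuit {
    // Buggy: when name is null the first test is false, so ||
    // goes on to evaluate name.length() and throws
    // NullPointerException. (It also returns true for "".)
    public static boolean nonEmptyBuggy(String name) {
        return name != null || name.length() > 0;
    }

    // Intended: && short-circuits, so length() only runs on
    // non-null names.
    public static boolean nonEmptyFixed(String name) {
        return name != null && name.length() > 0;
    }

    public static void main(String[] args) {
        System.out.println(nonEmptyFixed(null));  // false, no exception
        System.out.println(nonEmptyBuggy("abc")); // true
    }
}
```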
This one's a little tricky.
So we have, if some string is equal to the result of
calling getLookAndFeel--
so there's a little bit of a trick here.
It needs some context.
The problem here is, getLookAndFeel returns a
class, and we're comparing it to a string.
So this will always return false.
You see some weird code.
OK, so here's this one.
AUDIENCE: Is end-- what's the type of end?
WILLIAM PUGH: Int.
AUDIENCE: [INAUDIBLE] division [INAUDIBLE]
WILLIAM PUGH: Right.
So what it does here-- because end is int, it's going to take
int and divide it by two using integer division.
Then Math.ceil takes a double, so it's going to
translate the result of that integer division into a
double, pass it to Math.ceil,
which rounds it up--
except it was converted from an integer--
and then it converts it back down to an int.
So these were some of the examples of things we
find in real code.
Now one of the things that we try to do is, rather than just
finding four different things that we look for, we actually
try to identify bug patterns, and to identify a lot of them.
We have over 100, maybe 200, different types of bug
patterns that we look for.
Let me just talk about a couple of them.
So this is actually my favorite bug pattern, the
infinite recursive loop.
And this actually arose out of office hours.
A student came to me saying, I'm having trouble with my
program--
I'm getting a stack overflow error, and I don't understand
what that means.
And so, here was the code.
He had a WebSpider, and I had actually given them-- it was
partially my fault.
I had given them a Javadoc comment that said, construct a
WebSpider for this method.
And so that's what the student proceeded
to do in this method.
He constructed a WebSpider.
Of course, this was the constructor for WebSpider.
The Javadoc should have said, initialize a WebSpider.
And so, after a second student came in with the same bug, I
said, ah, OK.
Well, this would never happen in production code, but
students are making this mistake.
So our students use FindBugs, so let me write this for
FindBugs so I can give the students feedback.
It found three other students with the same bug.
Now, before I actually added this to the FindBugs code
base, I wanted to make sure that I didn't
do something stupid.
I'd make stupid mistakes too.
And I wanted to make sure I didn't do something stupid
that would produce a lot of false positives.
So I ran it across the JDK, because there wouldn't be any
of these in the JDK.
And I found four infinite recursive loops in the JDK,
including one written by Joshua Bloch.
A one-line method that does nothing except return the
result of invoking itself--
now, what actually happened here is there was a variable
called foundType, a field, and the method was supposed to
return the value of the field.
And this is on the back of my shirt, and Josh has one of
these shirts that he proudly wears.
And one of the lessons that I think everybody has to
learn is that smart people make dumb mistakes.
You can't get embarrassed or uptight about the fact that a
tool found a dumb mistake in your code.
Everybody makes them.
If Josh makes a dumb mistake, you are allowed
to make a dumb mistake.
So what you need to do is you need to embrace your dumb
mistakes and fix them.
So this is actually--
I'll talk a little more later about how we can actually
look at stuff over time.
And what this shows is, this is across all the JDK versions
that I have a copy of.
The blue line shows how many infinite recursive loops are
in that version of the JDK.
And the red are how many were in a previous version that
have now been fixed.
So red's guaranteed to be monotonic.
When you fix something, it turns from blue to red.
And you can see, early on-- and I didn't have a whole lot
of versions back in the early days, or I actually probably
would have found more.
And it was right about here, the first builds of 1.6, when
I wrote this and I sent email to Sun, saying, hey, I found
infinite recursive loops in your code.
And real quickly, they fixed all the ones that I found.
There's actually one that they didn't fix, but I didn't find
it right away, because I actually had to improve the
tool to find additional infinite recursive loops.
And I thought, wow, that was great.
I fixed infinite recursive loops.
And then about six months later, they start adding more.
And it's like, hey, guys, you're adding more, why aren't
you using FindBugs?
And I filed bug reports and they eventually fixed them.
And then a couple months ago they added another one.
Actually, this one got fixed in the latest build.
There is still one infinite recursive loop that was
introduced in Java 1.3 that I have not been able
to get them to fix.
And I need to sit down--
it's in code which is actually hard to test, because it's in
the crypto stuff, but I need to actually write a test case
to get them to fix the damn thing.
It is my goal--
I don't think I will accomplish it--
to get 1.6 to ship with no infinite recursive loops.
This is actually something else we can
get out of our tool.
So, this is for each infinite recursive loop.
This shows when the bug appeared, and when it was fixed.
And so you can see that there's one bug that's
basically persisted for the entire duration, and these are
when they where born, when they died, and so forth.
So this is some of the information that we can get
out of looking at these tools.
So for anybody who feels too good about things, there was
one infinite recursive loop in Google Web
Toolkit, version 1.0.20.
It was fixed in version 21.
In TabPanel, there was a getWidgetIndex method, which
returned the result of invoking itself.
And basically, what's happening here-- you actually
see, for infinite recursive loops, there are a couple of
different ways this mistake comes up.
And one of the ways it comes up is you have a decorator
pattern, and you have a method where you forget to say who
you want to delegate the method to, in which case you
wind up delegating to yourself and you get an infinite
And there were also two infinite recursive loops that
are still in the current version, but they're not in the
Toolkit's own code.
They're just code that's shipped with it.
So let me talk about a couple of other bugs.
I want to give you a little bit of a feel for some of the
scope of things we look for.
So in Java, there's a general rule of thumb that, if you
want to have an object that can be properly put in a hash
table or a hash map, and you've defined an equals
method on it, so that two different objects could be
equal, then you need to define a hashcode method, because
otherwise you get the default hashcode method.
And there's really this invariant you're supposed to
have, that if two objects compare as equal, then they
need to have the same hashcode.
And so if you override equals, but you haven't provided any
hashcode method, we signal a warning.
And there are a number of bugs like this still in the JDK.
A couple of examples.
Now, sometimes, when I'll talk to people--
like, I talked to the AWT people about this one.
And they said, well, we don't think anybody's going to put
it in a hash table.
And besides, it would actually be pretty hard to calculate a
good hashcode function.
Actually, in their case, it took me awhile to figure out
what they were thinking of, and what really is a good
hashcode method if you don't think your object will ever
be put in a hash table?
And so, here's the good hashcode method.
You put in an assert false, in case you run it with
assertions enabled, to generate an error.
And otherwise you just return 42.
That satisfies the invariant, that if two objects compare as
equal, they have the same hashcode.
And if it turns out that you put a bunch of these guys in
the hash table, it won't be efficient.
But not being efficient is better than not working.
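A sketch of that "good enough" hashCode, placed on a hypothetical Point class; the assert-false-then-return-42 idea is from the talk, the class itself is illustrative:

```java
public class Point {
    private final int x, y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
    }

    // The "good enough" hashCode from the talk: fail fast if
    // assertions are enabled, otherwise return a constant. Equal
    // objects get equal hash codes, so the contract holds; hash
    // tables merely degrade to linear scans.
    @Override
    public int hashCode() {
        assert false : "hashCode not designed for hashing";
        return 42;
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2), b = new Point(1, 2);
        System.out.println(a.equals(b)); // true
    }
}
```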
And so one of the things we try to do is we--
it's not sufficient to just identify bugs.
You also have to help programmers understand the bug
and understand how to fix it.
And I think one of the reasons people weren't fixing this so
much is they just said, oh, well, but I don't know how to
create a good hash function.
And it just didn't occur to us.
We recently rewrote our stuff to say, just return 42.
Null pointer dereferences.
Now, you can do very sophisticated analysis to try
to find null pointer dereferences.
We do very simple stuff.
We don't do interprocedural analysis--
and basically what we look for are statements or branches
that, if executed, guarantee that a null pointer exception
will be thrown.
So if you've got this, that means either your code could
throw a null pointer exception, or you've got a
branch or statement which is dead.
And either of those are a little questionable.
So here's a case where if c is equal to null and invoking a
method on c returns true, then return.
The and should have been or.
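Sketched with illustrative names; the c == null test and the and-versus-or mix-up are from the slide:

```java
public class AndOrBug {
    // Buggy: when c is null, && goes on to call c.isEmpty() and
    // throws NullPointerException -- the early return for a null
    // argument can never happen.
    public static String describeBuggy(String c) {
        if (c == null && c.isEmpty()) {
            return "nothing";
        }
        return c;
    }

    // Intended: || returns early for null *or* empty.
    public static String describeFixed(String c) {
        if (c == null || c.isEmpty()) {
            return "nothing";
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(describeFixed(null)); // "nothing"
        System.out.println(describeFixed("x"));  // "x"
    }
}
```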
This is a pattern that we see over and over again.
If sig is not equal to null or invoking something on sig--
same thing in the JDK.
A lot of stuff just occurs over and over again.
In this case, what happens is confusion between a variable
name and a field that are named the same.
And so there's a field which is passed in which could be
null, and as a result of this if statement--
the variable could be null--
it always initializes the field to be non-null.
But then here, they're using the variable
which could be null.
And so that could get a null pointer exception.
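A sketch of that field-versus-parameter confusion; the class and names are illustrative, not the JDK code:

```java
public class ShadowBug {
    private String name;

    // The if statement guarantees the *field* this.name is
    // non-null, but the later dereference uses the *parameter*
    // name, which may still be null.
    public ShadowBug(String name) {
        if (name == null) {
            this.name = "unknown";
        } else {
            this.name = name;
        }
        System.out.println(name.length()); // NPE when the parameter was null
    }

    public String getName() {
        return name;
    }

    public static void main(String[] args) {
        new ShadowBug("ok"); // fine; new ShadowBug(null) would throw NPE
    }
}
```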
One of the ones which is a little bit harder to figure
out what to do is a redundant null comparison.
So you see a case where you see a
dereference of a variable.
And then after you see the dereference of the variable,
you see a check to see whether or not the variable's null.
The problem is, it's generally not specified any way that a
static analyzer can easily get at it, whether or not this
method parameter is allowed to be null.
Here at this point, there's some confusion on the code
writer about whether or not this method parameter is
allowed to be null.
So it's not as obvious a problem as the other one, but
it's definitely something to be concerned about.
I mean, I don't--
yeah, method names should start with a lowercase letter.
But you can get into worse problems. You can get into
places where you have a method which has the exact name and
signature as a method in a superclass, except the names
are capitalized differently.
So for example, here there's a method called getOKButton, and
in the subclass, there's a method called getOkButton,
except OK is capitalized differently.
You also see things like-- here's a place where somebody
wrote a hashcode method where hashcode is all lowercase.
So actually, now in Java 5, there's this overrides
annotation that you can put on a method that says, I intend
to override a method in the superclass, and if for some
reason I don't, signal an error, which is a little bit
useful in terms of indicating intention here.
But as I go through these things-- some of the things
that we find are things like, yeah, this is probably a bug,
but it's so fuzzy that I'm sure it should actually be
part of the language spec.
I actually think that almost overriding a method with a
different capitalization should probably actually be a
compiler error, maybe even a violation of the language spec.
Another thing we look for are methods that, if you invoke
them, you ought to be looking at the value
returned by the method.
So for example, in Java, there's a method that reads
into an array of bytes.
And it may not fill up that array, because if you're
reading from a network, it will just say, well, OK, let
me read what's available from this packet.
And rather than waiting for the next packet to arrive,
I'll do a short read.
You asked for 1,000 bytes, I'll give you 417, and you'll
have to come back and ask for more bytes.
And the number of bytes actually returned is the
return value of this method.
Now, in C, this is a big problem, where they don't have
exceptions and people return error codes but people don't
always check error codes.
In Java, we have exceptions, so you
don't have that problem.
But there are a number of methods like this that you
have to watch out for.
But actually it turns out there are a bunch of these.
So for example, in Java, there are a number of classes that
are immutable, things like String and BigInteger.
Like, the replace method returns a new string.
Because String is immutable.
You can't change it.
And so if you see a call to name.replace, and it's not
using a return value, then somebody, we're pretty sure,
is thinking that this is going to modify name.
But it doesn't.
It has no effect.
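Sketched with illustrative method names; String.replace and its discarded return value are the real API behavior:

```java
public class IgnoredReturn {
    // Buggy: String is immutable, so replace computes a new string
    // that is immediately discarded; the input comes back unchanged.
    public static String normalizeBuggy(String name) {
        name.replace('.', '/');
        return name;
    }

    // Intended: keep the value the method returns.
    public static String normalizeFixed(String name) {
        return name.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(normalizeBuggy("foo.bar")); // foo.bar
        System.out.println(normalizeFixed("foo.bar")); // foo/bar
    }
}
```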
Now, another example.
They're calling replace.
This happens with replace a fair bit.
This one was slightly interesting.
So they need to create a thread to read from the error
stream, because otherwise something gets
clogged up in the works.
So they actually create a thread but
they don't start it.
One of the other ones--
I don't think I actually have one of the examples--
is where somebody will say, if such and such, new
IllegalArgumentException.
So they create the exception object, but they
forget to throw it.
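A sketch of that one, with illustrative names:

```java
public class DroppedException {
    // Buggy: the exception object is constructed and immediately
    // discarded; the method happily continues with a bad argument.
    public static int checkBuggy(int size) {
        if (size < 0) {
            new IllegalArgumentException("size < 0: " + size);
        }
        return size;
    }

    // Intended: actually throw it.
    public static int checkFixed(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size < 0: " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(checkBuggy(-1)); // -1 slips straight through
    }
}
```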
Here's a substring.
So like I said, there are a lot of these places where
there's some method that, if you invoke this--
basically, it doesn't make sense to invoke this method as
a procedure, only as a function.
Concurrency. These are hard errors.
Our ability to correctly identify synchronization
deadlock issues is not as good as I would like it to be, and
that's because it's a hard problem.
And it's also an incredibly gnarly problem as far as
testing, as I'm sure most of you are aware, because these
concurrency problems don't reoccur reliably.
But it still turns out that there are some problems you
can find using simple techniques.
And so one of the things we looked for is, OK, if you have
a field, and almost every time you access this field you're
holding a lock, but there's just one method where you are
accessing this field and you're not holding a lock,
then it's quite possibly an error.
So this is inconsistent synchronization, where you're
holding a lock most of the time where
you're accessing it.
So for example, here--
this was from GNU Classpath--
there's a field, elementCount.
And almost all the time when you access elementCount,
you're holding a lock, except for the lastIndexOf method,
where you are not.
And as a result, the lastIndexOf method can throw
an index out of bounds exception.
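A sketch of the shape of that bug, loosely modeled on the description of the Classpath example; this is an illustrative class, not Classpath's code, and the demo below is single-threaded (the actual failure needs two threads racing):

```java
public class SyncList {
    private Object[] data = new Object[4];
    private int elementCount = 0; // guarded by this... almost everywhere

    public synchronized void add(Object o) {
        if (elementCount == data.length) {
            Object[] bigger = new Object[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, elementCount);
            data = bigger;
        }
        data[elementCount++] = o;
    }

    public synchronized Object remove() {
        return data[--elementCount];
    }

    // Buggy: reads elementCount and data without holding the lock,
    // so a concurrent remove() can shrink the count mid-scan and
    // an ArrayIndexOutOfBoundsException (or a stale read) results.
    public int lastIndexOf(Object o) {
        for (int i = elementCount - 1; i >= 0; i--) {
            if (o.equals(data[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        SyncList list = new SyncList();
        list.add("a");
        list.add("b");
        System.out.println(list.lastIndexOf("b")); // 1
    }
}
```

The fix is simply to make lastIndexOf synchronized like its neighbors; FindBugs flags the one unsynchronized access precisely because it is the odd one out.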
So once again, one of the problems-- and this comes back
to one of the things which is tricky with a
lot of these tools--
is we actually don't have the design of the code.
We don't know, is this class intended to be thread-safe.
But we can look and say, gee, it sure looks like you've got
almost all your methods synchronized, and you've got
all these fields that you're protecting using
synchronization.
Maybe you actually intend for this class to be thread-safe.
If we actually had annotations, and some people
have proposed some that would give us some of the design of
the class, then this stuff would be a lot easier to do.
And of course, the question is how much annotation burden can
you put on developers before they start
saying, eh, not for me?
There are a bunch of other gotchas with synchronization.
So here's something that comes up from time to time.
This was an example in JBoss.
So we check a condition.
And if the condition isn't set, then we grab a lock, and
we wait for the condition to be set.
And what can happen is another thread can set the condition,
in between the time when we check it and
when we grab the lock.
And so it will actually set it and do the notification, and
then after that, we get the lock.
And so we will have missed the notification.
And so there are a couple of things here.
One of the things is, a call to wait should generally be in
the loop, and you should be checking some sort of
condition on it, in order to make sure that you will catch
the appropriate changes.
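The recommended shape, sketched on an illustrative class; checking the flag before grabbing the lock, as in the JBoss example, is exactly what this avoids:

```java
public class Gate {
    private boolean set = false;

    // Correct pattern: check the condition while holding the lock,
    // and re-check it in a loop after every wakeup. A notify that
    // happens before we wait cannot be missed, because the flag is
    // examined under the same lock that guards it.
    public synchronized void await() throws InterruptedException {
        while (!set) {
            wait();
        }
    }

    public synchronized void signal() {
        set = true;
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        final Gate gate = new Gate();
        Thread t = new Thread(new Runnable() {
            public void run() {
                gate.signal();
            }
        });
        t.start();
        gate.await(); // returns even if signal ran before await
        t.join();
        System.out.println("done");
    }
}
```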
So those are just a couple of the bug patterns.
Let me show you some overviews of some of the things we found
on a couple different tools.
This is Glassfish, which is Sun's app server.
29 ignored return values that should not be ignored, 59
classes that have an equals but not a hashcode, 9 calls to
equals that will always return false, because they're
comparing a string and a string buffer for equality, 18
statements and 98 branches that, if ever executed, are
guaranteed to throw a null pointer exception, 10 methods
that, if ever called, will invoke themselves again in an
infinite recursive loop, and a check cast that is guaranteed
to always throw a class cast exception.
Here's some stuff from WebLogic.
1,166 ignored return values--
I'll come back to that--
245 classes that define equals but not hashcode, 45 calls to
equals that always return false, 35 statements that are
guaranteed to throw a null pointer exception, 21 infinite
recursive loops, 4 impossible casts.
The ignored return values--
there were like 1,000 calls to substring, and it turns out
that this was automatically generated code.
And actually, one of the things which is interesting
because of where we run is that if you have some process
that takes something from your database schema and uses that
to compile down to bytecode or something like that, we can
actually check that.
So something in their process for automatically generating
calls was actually generating a call to substring where they
were ignoring the return value.
Calls to BigDecimal.setScale.
Exception creations, where they were saying, like, new
runtime exception but not throwing it.
Call to BigInteger.add.
And one of the things we do-- like I said, we don't
just look at a handful of things.
We try to come up with lots of little things, every time we
find some interesting bugs.
Actually, it turns out students are great at making
interesting mistakes.
But developers too.
So here's a case where somebody is trying to take an
array of bytes and turn it into a long.
And actually, this is just bad code in general.
But there's a very specific bug here.
It's that an array of bytes is an array of signed bytes.
And so here, we're taking the result, shifting it by eight
bits, and then or-ing in a signed byte.
And that's just not going to work.
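Sketched with illustrative names; the shift-and-or reconstruction and the missing mask are the pattern from the slide:

```java
public class BytesToLong {
    // Buggy: byte is signed, so a byte like 0xFF widens to the
    // int -1 (0xFFFFFFFF), and the OR smears sign bits across
    // everything accumulated so far.
    public static long toLongBuggy(byte[] bytes) {
        long result = 0;
        for (byte b : bytes) {
            result = (result << 8) | b;
        }
        return result;
    }

    // Intended: mask to the low 8 bits before OR-ing.
    public static long toLongFixed(byte[] bytes) {
        long result = 0;
        for (byte b : bytes) {
            result = (result << 8) | (b & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xFF, 0x01 };
        System.out.println(toLongBuggy(bytes)); // -255: sign bits smeared
        System.out.println(toLongFixed(bytes)); // 65281 (0xFF01)
    }
}
```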
And in a lot of these cases, you have to do tuning in order
to get it right.
Just checking places where you're doing an or with
a signed byte--
that gives too many false positives.
And there's actually a fair bit of skill in terms of
figuring out, well, what sort of heuristics do I use to try
to get as many real bugs as possible and not too many
false positives.
So that's the bug there.
So let me just talk real briefly
about some newish features.
So, behavioral annotations-- actually, I think that's not
the name we're using now.
Annotations for static analysis.
So one of the things, hopefully, we're going to
propose as a JSR--
we've actually got some of these features done--
is you can use annotations in your Java code to indicate
things that the Java libraries will check.
So for example, you can mark a method parameter as being
non-null, and say this parameter should never be
null, or that the return value from this method might be
null, so if you invoke me you'd better check it.
Check return value.
That says, if you invoke this method, you'd better not
ignore the return value.
These are things--
like, for example, for all these methods on String.
Right now, we've hard-coded it, but if somebody doing a
third-party library wants to be able to do that, they'll do
it with annotations.
One of the things-- we haven't done this yet, but for
security purposes, one of the big issues that comes up is
going between untrusted data, data you would get from a web
form, versus things you're passing into the database,
things that you expect to be checked, that can't be just
trusted.
And so it'd be nice to be able to mark things as being
tainted.
Marking something as detainted would be something that you
can write anything into, but what you pull
out of it is untainted.
And that's how you would mark the methods that do your
blacklisting or whitelisting and so forth.
And you might think, well, I could do some of this with
analysis.
I could do interprocedural analysis to figure out which
method parameters are non-null.
And you could, and it's hard, but these are really great,
because they're a way of assigning blame.
So say you do static analysis, and you find out that this
method in Class A is calling a method in Class B, and this
method could pass a null which could cause a null pointer
dereference in Class B.
And these classes were developed by different teams
on different sides of the continent.
Whose fault is it?
Who has to go fix their code?
It's not clear, right?
Whereas, if the person writing this method said, hey, this
parameter better be non-null, then that absolves them from
any involvement in fixing this bug, because you're passing in
something that violates the contract.
And a lot of these things are things that I think will
really help people doing infrastructure libraries.
You put in these annotations and it will help.
Flash back to a story--
when I was a grad student, I was working at Cornell.
I worked on the synthesizer generator, which was a tool
that would generate programming environments.
And I actually did the logic to display the abstract syntax
tree, along with all the attributes and so forth.
And it turned out that, if somebody in an upstream pass
from me had done something that corrupted the tree, it'd
typically blow up in my code.
And I'd get called in all the time because, hey, it's
blowing up in your code.
So I quickly wrote something that would do a whole bunch of
integrity checks, so that I could run the integrity checks
and say, not my fault, somebody is passing me a
corrupted tree.
And that's one of the things you can do with these
annotations, is that you can wall off your code from having
to deal with the warnings being reported by the static
analyzer.
AUDIENCE: Isn't that what assert is for?
I mean you can say assert [INAUDIBLE]
and then it will blow up [INAUDIBLE] with someone else.
WILLIAM PUGH: Well, actually, there's some questions about--
So if you did static analysis and you found out that there
is a specific assertion or a specific check for something
being null, and if something's null then it throws an illegal
argument exception or something like that, then yes,
I think then you could say that's the equivalent of
marking something non-null.
AUDIENCE: [INAUDIBLE] this as well [INAUDIBLE]
WILLIAM PUGH: Well, another question is, how does this
work with inheritance, right?
WILLIAM PUGH: Right.
And another thing you can do, and we've actually got some
mechanisms that will allow you to say, within this package,
all method arguments should be non-null unless I say
otherwise explicitly for that method argument.
Whether or not people believe that method arguments should
be allowed to be null is like a great religious debate.
The fact that Java collections, that a map can
allow you to have null as a key--
people have very strong opinions as to whether or not
that was a good idea, but it does.
But some people just say, I don't like null.
I'm just going to declare my package as saying, no null
unless I say you're allowed to.
AUDIENCE: So are you proposing this only for
[INAUDIBLE]?
Or are there also going to be some static checks
[INAUDIBLE]?
WILLIAM PUGH: OK, so, actually,
we've got some of these.
IntelliJ also has some of these.
Nobody is really eager to put in both the annotations from
FindBugs and from IntelliJ.
So what we really need to do is, first off, we need to
get a JSR.
We need to have this part of the Java process, so that we
can come up with standard names and actually really
agree on what "could be null" means, or "nullable" or
whatever it is.
What exactly that means.
And then it's really--
I mean, it's part of the documentation, but in a way
that could be checked by tools.
So for example, you could use it in a static analyzer, but
you could also imagine doing something, saying, OK, we're
going to do some rewriting of the class files so that if
you've marked something as non-null, an assertion will be
thrown if any of these things are null.
So, I mean, the problem is Javadoc sometimes describes
stuff well to humans, but it's really hard to get good stuff
out of that for automatic tools.
So it's documentation that can be used by tools for either
static or dynamic analysis.
AUDIENCE: How about [INAUDIBLE] some of these
features into the library itself, such as [INAUDIBLE]
API sometimes some API [INAUDIBLE]
WILLIAM PUGH: So the question is actually improving the
language to fix this.
The hurdles involved in actually changing the Java
language are substantial.
They're not completely impossible, but they are
significant.
I think the one thing that we may see is that there's a
proposal afoot to also expand the number of places an
annotation can occur, to basically allow you to do some
custom typing and so forth.
So, right now, you can't say that this is an array of
non-null strings, or an array list of non-null strings.
Because you can't put an annotation there.
You can't put it on a type parameter.
So one of the things to do would be to expand where the
annotations can go.
And the other thing is that if you make it part of the
language, then you're really saying that, OK, whatever list
of things we come up with, that's it, period.
Whereas, if you make it a common set
of annotations, then it's like, well, next
version, maybe we add two more annotations, and then a
couple more, and a couple more after that.
Are you familiar with Guy Steele's talk
on growing a language?
It's a great talk, but basically Guy said, rather
than putting a lot of features in the language, what you
should try to do is you should try to make the language so
that it's possible for users of the language to extend it.
And I think what we should be thinking about is, how do we
change the Java language so that these annotations are
useful and can be applied to all the places where you'd
want to do them, but trying to hardcode into the Java
language what the set of annotations is that you'd
want or need--
that would be limiting.
People have thought about, do we really want to put
read-only or non-null into the language?
You might come up with two or three.
But there are certainly other things, like
check return value.
That's not going to be part of the language.
Not in Java.
Maybe the next language.
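The placement gap being described is easy to see in a small sketch. The @NonNull annotation below is hypothetical, standing in for whatever common set might be standardized; in the Java of this talk (pre-JSR 308), it can target declarations but not type uses:

```java
import java.lang.annotation.*;

// Hypothetical quality annotation of the kind being discussed.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.PARAMETER, ElementType.METHOD})
@interface NonNull {}

public class AnnotationPlaces {
    @NonNull String name = "ok";                 // legal: field

    @NonNull String greet(@NonNull String who) { // legal: return type, parameter
        return "hello " + who;
    }

    // ...but NOT type arguments, which is exactly the gap:
    // List<@NonNull String> names;  // illegal before JSR 308

    public static void main(String[] args) {
        System.out.println(new AnnotationPlaces().greet("world"));
    }
}
```

The JSR 308 proposal mentioned above is what would make the commented-out form legal.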
So another thing we've got--
it's one of the least-known features of FindBugs, for
those of you who've actually used it.
A couple of people I've talked to here have said, I didn't
know you had that.
We can keep track of a historical record of bugs.
So you run FindBugs on one version of the code, and now
the next night you do another build of your software.
You run FindBugs, and then you can actually combine those
results and do this every night or every build, so you
get a history.
And you can say, OK, now let me see, what are the FindBugs
warnings that have been introduced into my code since
the last time we did a release?
And those are the ones that you want to perhaps give
greater attention to, because it's often the case that you
run FindBugs over a large piece of code and you get a
lot of warnings like, oh, I don't want to deal with these.
But I think that saying, well, let me just look at what's
been introduced since the last time we
released this to users.
And that will hopefully be a pretty small list, and might
be something worth looking at.
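At its core, that history trick is a set difference over stable warning signatures. FindBugs' real tooling does this matching over its XML output; the sketch below is a simplified, hypothetical illustration of the idea, with made-up warning names:

```java
import java.util.*;

// Toy sketch of the warning-history idea: keep each warning's
// stable signature, then report only warnings that are new since
// the last release. (Signatures here are made-up strings; the
// real tool matches structured warnings across versions.)
public class WarningDiff {
    static Set<String> newSince(Set<String> release, Set<String> nightly) {
        Set<String> fresh = new TreeSet<>(nightly);
        fresh.removeAll(release);   // drop everything already known
        return fresh;
    }

    public static void main(String[] args) {
        Set<String> release = new HashSet<>(Arrays.asList(
                "NullDeref:Foo.bar", "TransientNotRestored:Baz"));
        Set<String> nightly = new HashSet<>(Arrays.asList(
                "NullDeref:Foo.bar", "InfiniteRecursion:Qux.qux"));
        // Only the warning introduced since the release remains.
        System.out.println(newSince(release, nightly));
    }
}
```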
So I've talked a little bit about what I've done.
Let me talk about some of the things I'm
trying to figure out.
What is the fruit distribution?
I've always described to people--
FindBugs looks for low-hanging fruit.
I've shown the detectors we look for.
They're all simple.
But I want more fruit.
Where do I go to get it?
Do we write more sophisticated analysis to do interprocedural
stuff that understands fields and so forth, do we go higher
up in the tree?
Or do we write more shallow but general bug detectors?
Or do we write
application-specific bug detectors?
It's really a question of, where are the useful bugs that
can be found with static analysis?
If I knew that having somebody slave for 12 months over doing
a very fancy context-sensitive pointer alias analysis scheme
would allow me to find 10 times as many important bugs,
then I might decide, oh, yeah, that would be a good
investment of time.
But it may turn out that doing all that sophistication only
gives you 20% more real bugs, in which case that's not a
good investment of time.
What kinds of bugs can be found by static analysis?
I would never have thought to look for infinite recursive
loops, unless some student came to me in office hours and
had that problem.
This is true for a lot of the things.
The thing about doing an integer division, converting
the result to a double, passing the result to [INAUDIBLE]--
I didn't just think, oh, that would be a good bug.
I actually saw that in code.
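That pattern, for concreteness, looks like this: the division happens in integer arithmetic before the widening conversion, so the fraction is silently discarded.

```java
public class IntDivDemo {
    // Buggy: integer division runs first, truncating the result,
    // and only then is it widened to double.
    static double buggyAverage(int sum, int count) {
        return sum / count;          // 5 / 2 == 2, widened to 2.0
    }

    // Fixed: widen one operand before dividing.
    static double fixedAverage(int sum, int count) {
        return sum / (double) count; // 5 / 2.0 == 2.5
    }

    public static void main(String[] args) {
        System.out.println(buggyAverage(5, 2)); // 2.0
        System.out.println(fixedAverage(5, 2)); // 2.5
    }
}
```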
So it's relatively easy to measure false positives.
You can run your analyzer over a bunch of code, look at all
the reports, go through and say, which of
these are real errors?
What's hard to get a handle on is, what are the types of
errors that could be detected by static analysis but aren't?
And it's often the case, as you look at these, you can
say, oh, I understand what it would take to catch this.
And you try it.
You find out, oh, a gazillion false positives, and then you
say, oh, I've got to tune it.
And sometimes you successfully tune it down to a good
detector, sometimes you don't.
And part of this is trying to turn bug
instances into bug patterns.
And one of the things I think we need to do, as far as
software testing and so forth, is we need to change our
development process so that we learn from our mistakes.
We need to evaluate bugs and see if they can be turned into
a general bug pattern.
A lot of the bugs that occur will occur elsewhere in a
large code base.
So for example the flaw that Josh
identified in binary search.
How many people know about this issue Josh found in
binary search?
OK, so the binary search implementation in Java, and in
a lot of textbooks, including the textbook Josh was taught
from when he was at CMU, is wrong, because it takes the
lower bound and the upper bound, which are of type int,
adds them together, and divides the result by two.
If you get an integer overflow when doing that addition, you
get a negative result.
And it turns out--
like, in Java, what you just do is you add them together,
and then you do an unsigned right shift, and
it works just fine.
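A minimal sketch of the overflow and the unsigned-shift fix:

```java
public class MidpointDemo {
    // Buggy: low + high overflows int once both indexes are
    // near Integer.MAX_VALUE, producing a negative midpoint.
    static int buggyMid(int low, int high) {
        return (low + high) / 2;
    }

    // Fixed: the unsigned right shift reinterprets the
    // overflowed sum as unsigned, so the midpoint is correct.
    static int fixedMid(int low, int high) {
        return (low + high) >>> 1;
    }

    public static void main(String[] args) {
        int low = 1 << 30, high = (1 << 30) + 10; // past 2^30 elements
        System.out.println(buggyMid(low, high));  // negative: overflow
        System.out.println(fixedMid(low, high));  // 1073741829, correct
    }
}
```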
And somebody, maybe somebody at Google, was doing a binary
search in an array with more than 2 to the 30th elements
and got hit by this.
But I took that and turned it into a bug detector
in FindBugs, and we now look for it.
An example of trying to do this-- one of the things I do
is I get every build of the JDK, and they have a list of
what bugs were fixed.
And I look through them and say, OK, are any of these
things-- and a lot of them are like, oh, cleaned up Javadoc.
Yeah, I don't care.
But I look through and say, OK, is this something that
could be part of a bug pattern that I could try to fix?
So in build 89, they fixed a serialization bug in [INAUDIBLE].
And basically, it took me five hours of work from start to
finish to write and tune a bug detector to find that problem.
And I actually found 17 other erroneous classes in the JDK
that all had the same problem.
So the problem was it was a class that was serializable
but it had a transient field.
A transient field is one that is not saved by the
serialization mechanism.
And it wasn't being reset when you deserialized an object by
either a readObject or readResolve method.
And so you first do that and you find out, oh, it's giving
too many false positives.
There's this whole tweaking process.
So for example, it turned out in some of them, it was a
field that was set to minus 1 in the
constructor, but it was transient.
And so when it was deserialized it was set to 0,
and that gave the wrong semantics.
It was a cached hashcode, but for some reason they decided
that the value representing an uninitialized, uncomputed
hashcode would be minus 1.
And so, for example, if the transient field is set to a
non-default value in the constructor, then I have to
elevate the priority.
And this was able to give us those bugs.
And that was 17 defective classes, and I think it was 49
classes where we got the warning.
So more than 1/3 were real errors.
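A simplified reconstruction of that pattern (not the actual JDK code): the constructor establishes the -1 sentinel, but deserialization bypasses the constructor, so the transient field silently becomes 0, a value the class treats as a real hash.

```java
import java.io.*;

public class CachedHash implements Serializable {
    private final String data;
    private transient int cachedHash = -1; // -1 means "uncomputed"

    public CachedHash(String data) { this.data = data; }

    public int hash() {
        if (cachedHash == -1) cachedHash = data.hashCode();
        return cachedHash;
    }

    // The fix the detector pushes you toward: reset the sentinel.
    // private void readObject(ObjectInputStream in)
    //         throws IOException, ClassNotFoundException {
    //     in.defaultReadObject();
    //     cachedHash = -1;
    // }

    static CachedHash roundTrip(CachedHash c) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(c);
            oos.close();
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            return (CachedHash) in.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        CachedHash copy = roundTrip(new CachedHash("abc"));
        // After deserialization the transient field is 0, not -1,
        // so hash() wrongly returns 0 instead of "abc".hashCode().
        System.out.println(copy.hash());
    }
}
```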
How can we make it easy to write bug detectors?
Now this was something that, when we were starting to do
this research, we said, well, why don't we start by figuring
out nice high-level patterns?
But I was like, you know, I don't really know the scope of
what types of bugs there are to find.
Let's just write bug detectors, and we can use Java code
to try out our bug detectors.
Now I think we've got enough of a scope that we can start
trying to figure that out.
So like I said, I can do this for the JDK.
I don't really want to be doing it for the JDK.
I want Martin or somebody else to be doing it for the JDK.
And I don't want to be doing it for Google.
And if we get lots of developers doing this--
and some of them may be company-specific practices,
some of them may be general--
I think this will be very helpful.
So I think we have enough examples to start thinking
about that.
How can we make static analysis pervasive?
So, state-of-the-art static analysis has a lot to offer,
more than I think many people suspected.
What are the practical and cultural issues that need to
be surmounted to make it pervasive?
Now, I am quite aware that there are things about
FindBugs that probably suck, things that make it really
hard for people to use.
And one of the things I'm hoping to do is figure out,
what are the points of pain in trying to use this?
Unlike companies that have field engineers, I haven't
done as many trips to companies to
see how that works.
That's one of the things I'm doing here this summer.
One of them is false positive suppression.
People worry about, with these tools, false positives--
well, if you deliver no more than 50% false positives,
that's not too bad.
The thing is, nobody wants to look at the same false
positive more than once.
It's one thing to look at it once.
People will tolerate that if you don't have too high a
false positive rate.
But if you run a tool and you fix all the real bugs, then
all you have left are false positives.
And the next time you run the tool, maybe you've introduced
one error, but you still have 500 false positives that
weren't fixed from the last time, and now nobody wants to
use the tool.
What are the other points of pain?
So a couple quick announcements.
So I'm here all summer.
I'm taking refuge from the heat and humidity in
Washington DC, and I hope to develop some substantial
collaborations at Google.
I'm trying to get everything set up so I'll be able to work
with people, look at the Google code base,
sit down with people.
This is partially not only to understand FindBugs and so
forth, but also as an academic--
I tell students how to develop code, how to get prepared.
I really feel like I need to take some time, immerse myself
in the real world so I'm not just talking out
of the ivory tower.
So I'll be here.
I want to really understand how software development,
software quality works at Google.
Send me email, and I'll be here and we'll set up visits.
AUDIENCE: By here, do you mean, living locally or
working at Google?
WILLIAM PUGH: I am living in Palo Alto.
And I hopefully will be visiting Google about three
days a week for the next six weeks.
AUDIENCE: Under what non-disclosure [INAUDIBLE]
WILLIAM PUGH: We're working that out.
But hopefully I will be under an NDA that will allow me to
look at Google code.
Hopefully we'll get that all figured out today.
And now let me just briefly tell you about
one other cool project.
And hopefully I may get to do a TechTalk on this later
during my visit.
This is the Marmoset project.
So the Marmoset project is a total rethinking of how
student programming projects are submitted and tested.
So it's designed to provide students, instructors, and
research with lots of feedback, including feedback
before the submission deadline.
We're also collecting large datasets of student efforts
and we're starting to learn lots of things about how
students learn to program.
And I don't have a slide on this, but I've got a couple
of things to mention.
So let me tell you the really cool feature about how
it works.
So traditionally, the way programming projects get
assessed for correctness is that the instructor publishes,
here's the project description.
Here are three sample test cases.
Go work on it.
And students work on it and they submit it electronically
somehow, and then there's a deadline.
And you give a tarball to the TA, who runs the whole thing
against a bunch of secret instructor test cases.
And a week later, maybe, he comes back with the results
and it's emailed to the students.
And the students don't get feedback from how they did on
the project until after they're already working on the
next project.
The instructors only really get feedback in terms of
questions in office hours and in lecture.
So what happens with Marmoset instead--
it works the same.
You post the project description, you post a couple
of sample test cases.
These are the public test cases.
Students go to work on the project.
They can submit whenever they want.
After they submit, they can go to a web server, where they
can see the results of compiling their program and
running it against the public test cases.
It shouldn't be a surprise, because they've got these, but
sometimes it turns up platform dependencies.
And then, if a student submission passes all of the
public test cases, then the students get an option, would
you like to release test your submission?
And if the student says, yes, I want to
perform a release test--
maybe this is the poker game project.
And it will say, OK, we ran the release tests
against your code.
There were nine tests.
You failed four of them.
The names of the first two tests you failed are
test_full_house and test_four_of_a_kind, plus two
more tests where the QA team couldn't be bothered to write
down the details.
Now, students can think about this.
Oh, full house, yeah, I think I know what I did wrong.
They can go make a quick change and resubmit.
They can resubmit as often as they want, but performing a
release test requires a token.
Students are given some limited number of tokens, like
two or three, and they regenerate 24 hours after
they're used.
So students, within a 24-hour period, they only have a
limited number of times to get feedback
from the release tests.
So this encourages students to start programming early, which
is a good thing.
It gets them into that mode of, OK, it's the last day,
I've only got one more release token left.
Have I done everything I can do to ensure the quality of my
code before I'm willing to burn up my last token?
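The token mechanics described above can be sketched in a few lines. The names and numbers here are illustrative, not Marmoset's actual implementation:

```java
import java.util.*;

// Minimal sketch of Marmoset-style release tokens: a student
// holds a few tokens, and each one becomes usable again
// 24 hours after it was spent.
public class ReleaseTokens {
    private static final long DAY_MS = 24L * 60 * 60 * 1000;
    private final int maxTokens;
    private final Deque<Long> usedAt = new ArrayDeque<>(); // spend times

    ReleaseTokens(int maxTokens) { this.maxTokens = maxTokens; }

    int available(long now) {
        // A token regenerates 24 hours after the use that burned it.
        while (!usedAt.isEmpty() && now - usedAt.peekFirst() >= DAY_MS)
            usedAt.removeFirst();
        return maxTokens - usedAt.size();
    }

    boolean spend(long now) {
        if (available(now) == 0) return false;
        usedAt.addLast(now);
        return true;
    }

    public static void main(String[] args) {
        ReleaseTokens t = new ReleaseTokens(3);
        t.spend(0); t.spend(0); t.spend(0);
        System.out.println(t.available(0));        // 0 left today
        System.out.println(t.available(DAY_MS));   // all 3 back tomorrow
    }
}
```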
So it's both a throwback to the idea of releasing your
code-- either putting it on the website or handing it over
to the QA team.
You really want to test your own stuff first.
Or, for us old-timers, and I'm barely an old-timer, it's
handing your card deck over to the computer operator, knowing
that it's going to take three hours to get your answer back,
and you really want to desk-check your code, because
it's really embarrassing to turn in a deck with a stupid
error and have to wait for that long turnaround.
And the other thing is instructors get feedback.
We run all the tests as soon as stuff is submitted, so
instructors can see, at any time, how many students are
passing each test case.
And you can get all sorts of feedback from that.
But we also collect snapshots of student code from every
time they save their files.
And here's some of the data.
I know Google people like data, so this is a reasonably
large dataset.
From four semesters of a CS 2 course, we had 147,000
snapshots of student effort, where a snapshot is all the
files from one student working on a project.
We take each one of those snapshots and we run them
against all the test cases.
So we have over 2 million runs of student code.
And then we can run our static analyzer over all
147,000 snapshots.
And we can actually look at what exceptions occur.
So it turned out that, out of those 2 million runs--
actually I think these exceptions are per snapshot,
not per run.
So if a snapshot got a null pointer exception in multiple
test runs, it's only counted once here.
So the most common runtime exception was a
null pointer exception.
Surprisingly, the second most common was
a class cast exception.
Index out of bounds, array index out of
bounds, stack overflows.
A stack overflow is typically a recursive loop.
And so one of the things we were actually able to do with
FindBugs is we were able to check, where are we predicting
an infinite recursive loop?
Where are students encountering a stack overflow?
What are the places where the students are getting a stack
overflow, but we're not predicting an infinite
recursive loop?
And you can just eyeball that.
And in some cases, you can say, oh, well, this is mutual
recursion and our analysis doesn't detect that.
But in a couple cases, we were able to use that to improve
the analysis.
And actually, improving the analysis to catch more of the
student errors is what led us to improve it to find more
infinite recursive loops in the JDK.
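The bug pattern itself is tiny, which is part of what makes it such good fruit for static analysis: a method whose every path immediately calls itself, so any invocation is guaranteed to end in a StackOverflowError.

```java
public class RecursionBug {
    private final String name;

    RecursionBug(String name) { this.name = name; }

    // Classic instance of the pattern: meant to return the field,
    // accidentally calls the getter again.
    String getNameBuggy() {
        return getNameBuggy();   // should have been: return name;
    }

    String getNameFixed() {
        return name;
    }

    public static void main(String[] args) {
        RecursionBug r = new RecursionBug("ok");
        System.out.println(r.getNameFixed());
        try {
            r.getNameBuggy();
        } catch (StackOverflowError e) {
            System.out.println("stack overflow, as predicted");
        }
    }
}
```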
So that's Marmoset.
And I'm done.
AUDIENCE: How does FindBugs [INAUDIBLE]
interact with test coverage?
Is it your experience that code that's more [INAUDIBLE]
WILLIAM PUGH: Yeah.
It's certainly the case that a lot of the errors we find
indicate code that has not been tested.
I mean, when you find a method that, if invoked, is
guaranteed to throw a stack overflow, obviously something
was lacking in the coverage.
And actually, in some ways, there's sort of
an interesting trade-off.
The nice thing about static analysis is you don't need to
have test cases.
And generally, in the stuff which has been tested well,
you're not going to find a lot of things that we look for.
One of the things which gets tricky
sometimes is library code.
What happens when you write a library?
Can you anticipate all the different uses of it?
So I think testing versus static analysis are very
complementary.
AUDIENCE: You mentioned a question about [INAUDIBLE]
It seems like the Marmoset project is going to allow you
to get some measure of that.
WILLIAM PUGH: Right.
So with FindBugs, we've been able to do lots of stuff in
terms of looking at open-source projects.
But generally, what we don't have for open-source projects
is, well, what are actually the errors in this stuff?
To the extent that they actually have test cases, you
find that if you want to pull a bunch of different versions
of Apache off the source and actually run them all, it's
really hard, because they all have all sorts of
dependencies.
Whereas here, we have something that allows us to
say, oh, we've got all these versions.
We've got these test cases that are designed to be
exhaustive tests of the functionality of the code.
And we can look--
OK, they made this two-line change and it caused a warning
to go away and at the same time, three test
cases started working.
Maybe there's a correlation there.
So yes, I think that is very helpful.
It is student code.
The projects are smaller.
We found more connection than we would have thought between
bugs students make and bugs we can find in production code.
But yes, that is one of the reasons we're very--
AUDIENCE: Do you have a feel for just seeing the tip of the
iceberg when you do FindBugs, or [INAUDIBLE]
WILLIAM PUGH: Yeah.
So a lot of what FindBugs finds are
really coding errors.
What the person had in their mind didn't get translated
correctly into the code.
Or that they just misunderstood some API.
Whereas if you've got something like--
one of the projects was Sudoku.
And if they simply didn't understand the logic of how to
do Sudoku, well, they're doing some different game.
It's not Sudoku, it's something else.
And we're not going to find that with FindBugs because we
don't understand the domain.
So I'd say that a majority of errors--
probably not a huge majority, but I'm guessing a majority--
are domain-specific, if you look at all
the errors in code.
I don't really know.
Somewhere between 1/2 and 2/3 of all errors
require domain knowledge.
Total wild-ass guess.
Maybe someone else has a different opinion.
And you're not going to be able to find that with a tool
like FindBugs.
But the stuff that you can easily find--
AUDIENCE: You talked about improving the language.
WILLIAM PUGH: Yeah.
AUDIENCE: You also complained that people weren't using [INAUDIBLE]
WILLIAM PUGH: That's starting to change.
AUDIENCE: And you've said that most of the stuff you're doing
is easy to do [INAUDIBLE].
How many of those things would you like to see
put into the compiler?
Not changing the language.
Just making additional checks for the compiler [INAUDIBLE]?
WILLIAM PUGH: The problem with putting something into the
compiler is you've really got to have an insanely low false
positive rate.
People really get cranky if the compiler won't compile
something that they know to be perfectly correct code.
So I think there are a couple of cases, like the overriding
a method except you've mis-capitalized the name.
That's something that, even if you think it should work, it
shouldn't, because it's going to confuse the hell out of the
person who comes after you.
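A sketch of that mis-capitalization trap, and the @Override annotation that catches it at compile time:

```java
public class CaseTypo {
    // Compiles fine, but overrides nothing: HashMap and friends
    // will never call this method.
    public int hashcode() {
        return 42;
    }

    // The cheap defense: @Override makes the compiler reject the
    // method if its name doesn't match a superclass method.
    @Override
    public int hashCode() {
        return 42;
    }

    public static void main(String[] args) {
        System.out.println(new CaseTypo().hashCode()); // 42
    }
}
```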
And there are a number of these checks that the
compiler should do.
I mean, should, for example, the JDK refuse to compile code
where there's a statement that, if executed, is
guaranteed to throw a null pointer exception?
I don't know.
I don't know.
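The kind of statement in question, for concreteness: the null check proves the reference is null, and then the very next statement dereferences it.

```java
public class GuaranteedNpe {
    static int length(String s) {
        if (s == null) {
            // The test just proved s is null, so this dereference
            // cannot succeed -- FindBugs flags exactly this shape.
            return s.length();
        }
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(length("abc")); // 3
        try {
            length(null);
        } catch (NullPointerException e) {
            System.out.println("guaranteed NPE");
        }
    }
}
```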
AUDIENCE: I've seen a lot of test code that intentionally tries
to throw an exception, catches it, [UNINTELLIGIBLE]
WILLIAM PUGH: Well, and you know--
one of the things we look for is a checked cast that's
guaranteed to throw an error.
And so I look for that.
And I found a piece of code that said, hey, you're going
to get one of these errors.
And it says, OK, cast this to that, and there's a comment--
force a checked cast exception here.
Yeah, I think some of this stuff could be moved in.
In some ways, it gives me more freedom to innovate, to not be
part of javac.
But certainly I think there are a number of things that
would be worthwhile moving in.
It's not where I am probably going to be expending my
effort.
I'm probably more thinking for the next language.
And part of it is, Java--
I beat my head against changing Java in a few areas,
and I do it when there's a good payoff, and I don't think
that's a good payoff.
The next language, I think [INAUDIBLE].
A modern language is a hard language, and people are going
to make stupid mistakes.
That's the thing.
Either it's not possible, or if it was possible, it
wouldn't be desirable, to have a programming language in
which you couldn't make stupid mistakes.
And one of the things Java has done is it's limited the
impact of errors.
This is one of the things-- in C and C++, when you make a
stupid error, you can corrupt your memory in subtle ways
that can't be detected until far away.
And it's very hard.
Whereas in Java, a lot of stupid mistakes turn into
exceptions.
And those are easier to catch, easier to debug.
So I'd say that the biggest problem as far as where
FindBugs needs to be better aware, software quality needs
to be better, is going to be concurrency.
Suddenly everybody is going to be running on 2- and 4- and 8-
and 16-way processors, and all this code that worked fine
when you were running it on a single-processor machine is
going to start falling down.
I talked with Cliff Click, who works at Azul Systems, and he
says, every time they have some customer that comes in,
and they say, well, let's see how our software scales on
their hardware.
And they run it on their 384-processor system, and the
software immediately falls over.
It's like, oh, you've got a crappy machine.
It's just like, OK, let's turn on a couple of switches to
oversynchronize everything, and it runs just fine.
So it's, ah, you have a synchronization
problem in your code.
So I think trying to deal with the fact that, within 10
years, 8-way and 16-way laptops might
be headed our way--
I think trying to figure out that and figure out the right
way to teach concurrency and write frameworks for
concurrency-- that's a big problem.
And a lot of these other things.
I haven't talked about all the problems with web frameworks
and so forth, and all the types of errors and so forth
that you can make there.
OK, well, I'll be here for lunch.
If anybody wants to join us for lunch, come
on up to the front.