Using Static Analysis For Software Defect Detection

MALE SPEAKER: I'd like to now introduce Professor Bill Pugh

from the University of Maryland, who works on the

FindBugs project, among several others.

He has been a consultant at Google before and visited us

several times, and given us a talk on the

FindBugs project before.

But now he would like to update us on what's going on

with FindBugs now.

WILLIAM PUGH: OK.

Thank you.

OK.

So I'm going to be talking a lot about

static analysis in general.

Although I'm actually going to be talking a lot about this

specifically in terms of FindBugs.

One of the things I think people are just starting to

really get a broad appreciation for is the fact

that static analysis really can help you improve the

quality of your code.

I mean, I think three years ago, if you'd asked a lot of

developers, are static analyses helpful, they'd say,

oh, there's Lint.

I hate Lint.

Or oh, halting problem.

You guys can't really do anything.

But now I think a lot of people are beginning to

appreciate that static analysis, usefully applied,

useful tools, can actually do things to help you improve the

quality of your code.

Lots of improvements are still needed.

And we're getting to the point now where it's good enough

that developers will talk to the people doing static

analysis, and we can get feedback and

we can start improving.

Three or four years ago, developers were just like, eh,

I don't care.

I'm going to be talking about FindBugs, which is a tool for

doing static analysis of Java.

But there are lots of other static analyzers out there.

And a lot of what I'm going to talk about

applies more generally.

So FindBugs is an open-source static analysis tool.

It's used within Google.

It actually analyzes class files rather than source code.

This is partially a historical accident, but it turned out to

be fortuitous in a number of ways.

We can generate either XML or text output.

We can run it in NetBeans, Swing, Eclipse, Ant.

At the moment, we have-- well, actually, this is out of date

as of yesterday, but 225,000 downloads from SourceForge.

It's used by Oracle, Wells Fargo, Bank of America, a

bunch of companies.

So we really tried to look for software defects.

There are also a lot of style checkers.

And actually, one of the things you have to overcome

when you tell people, oh, I've got a static analysis tool

that I want to use.

It's like, oh, I've used style checkers before, I hate them.

There are style detectors, and they're useful in some

contexts, but that's different than

actually defect detection.

We're actually trying to find mistakes that

people made in code.

And we can find hundreds of defects in each of several

large apps that we've looked at, real defects, things that

should be fixed.

OK.

Now this is the common wisdom, I think, that a lot of people

have about bugs.

I think this was certainly the common wisdom in the academic

research community, and I think it might have been the

common wisdom in a lot of the developer land.

It's that programmers are smart, smart people don't make

dumb mistakes, and we also have good techniques-- unit

testing, pair programming, code inspections--

for finding bugs.

And so, if there are any defects in my code, they must

be really subtle stuff that would require really

sophisticated analysis in order to be able to find.

But then when you actually look at code, you find code

that looks like this.

If in is equal to null, then try closing in.

And this is, of course, guaranteed to result in a null

pointer exception if that close method is ever invoked.

This was from Eclipse.

And this is one of the things that we find.
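
Roughly, the shape of that bug is something like this (a minimal sketch with made-up surrounding code, not the actual Eclipse source; assumes java.io imports):

    void cleanup(InputStream in) throws IOException {
        if (in == null) {   // BUG: the test is inverted; it should be "in != null"
            in.close();     // if this branch is ever taken, in is null, so this
        }                   // call is guaranteed to throw a NullPointerException
    }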

Now, this is partially because with FindBugs, we've been

looking for the easy bugs.

We haven't been spending six months doing

context-sensitive, path-sensitive, pointer alias

analysis scheme in order to find subtle bugs.

We've been looking for stupid bugs.

And we have found more stupid bugs than I would have

suspected existed.

OK.

So, nobody's perfect.

There are a bunch of common types of errors.

Misunderstood language features, misunderstood APIs.

Typos, there are a lot of typos.

Some of them get caught as a syntax error.

Other typos don't get caught as a syntax error.

Instead, simply--

well, you meant to use equal and instead

you typed not equal.

You have an error.

Misunderstood class or method invariants.

So everybody makes syntax errors.

They rely upon the compiler to catch those.

But what about bugs one step removed from a syntax error?

OK, so, interaction time.

Programming puzzlers.

I'm going to put up some code, and I want to see who can

shout out first, what's the bug with this code?

Now, in a couple of cases, you might need to ask a question

about the code.

And if you want to ask a question, feel free to shout that out.

These are all real code.

So shout it out.

AUDIENCE: The "equals null." No object can equal null, so

that's always guaranteed to be false.

WILLIAM PUGH: Right.

So if columnName were actually null, you'd get a

null pointer exception.

So that won't ever work.

Also, comparing with equals rather than with equal-equal is probably a mistake.
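
Reconstructed as a sketch (the surrounding statement is made up):

    if (columnName.equals(null)) {   // always false; and if columnName itself is
        return;                      // null, this throws NullPointerException
    }
    // Almost certainly what was meant:
    if (columnName == null) {
        return;
    }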

OK.

Here's another one.

AUDIENCE: Infinite recursion.

WILLIAM PUGH: Infinite recursive loop.

Here is a method that, if invoked, is guaranteed to take

the string that was passed to it, append monitor action to

the front of it, and then call itself again, over and over

again, building up a longer and longer string until either

you run out of heap or stack.

And I don't know which one would

happen first. OK.

So, if name is not equal to null, or name.length is

greater than 0.

Shout it out.

AUDIENCE: [UNINTELLIGIBLE]

WILLIAM PUGH: Sorry, what?

AUDIENCE: It never evaluates the second condition.

WILLIAM PUGH: No.

You will evaluate--

AUDIENCE: [INAUDIBLE]

WILLIAM PUGH: Right.

So actually, if name is actually null, then the first

part will evaluate to false.

And since this is an or, you only evaluate the second part

when the first part evaluates false.

So if name is equal to null, you'll get a

null pointer exception.

We see a number of variants on that.
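
That puzzle, reconstructed as a sketch:

    if (name != null || name.length() > 0) {   // BUG: || should be &&
        // when name is null, the left-hand side is false, so the right-hand
        // side is evaluated anyway and dereferences null
    }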

This one's a little tricky.

So we have, if some string is equal to the result of calling

getLookAndFeel--

so there's a little bit of a trick here.

It needs some context.

The problem here is, getLookAndFeel returns a

class, and we're comparing it to a string.

So this will always return false.
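
Assuming this is Swing's UIManager, a sketch of the comparison (the variable name is made up):

    // UIManager.getLookAndFeel() returns a LookAndFeel object, not a String:
    if (lafName.equals(UIManager.getLookAndFeel())) {
        // always false: a String can never be equal to a LookAndFeel
    }
    // More plausibly intended:
    if (lafName.equals(UIManager.getLookAndFeel().getName())) {
        // ...
    }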

You see some weird code.

OK, so here's this one.

AUDIENCE: Is end-- what's the type of end?

WILLIAM PUGH: Int.

AUDIENCE: [INAUDIBLE] division [INAUDIBLE]

WILLIAM PUGH: Right.

So what it does here-- because end is int, it's going to take

int and divide it by two using integer division.

Then Math.ceiling takes a double, so it's going to

translate the result of that integer division into a

double, pass it to Math.ceiling,

which rounds it up--

except it was converted from an integer--

and then it converts it back down to an int.
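
As a sketch (the method name is made up; the real library method is Math.ceil):

    static int halfRoundedUp(int end) {
        return (int) Math.ceil(end / 2);      // BUG: end / 2 is integer division,
                                              // so Math.ceil has nothing to round up
        // return (int) Math.ceil(end / 2.0); // presumably what was intended
    }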

OK.

So these were some of the examples of things we

find in real code.

Now one of the things that we try to do is, rather than just having a few different things that we look for, we actually try to identify bug patterns, and try to identify a lot of them.

We have over 100, maybe 200, different types of bug

patterns that we look for.

Let me just talk about a couple of them.

So this is actually my favorite bug pattern, the

infinite recursive loop.

And this actually arose out of office hours.

A student came to me saying, I'm having trouble with my

constructor.

I'm getting a stack overflow error, and I don't understand

what that means.

And so, here was the code.

He had a WebSpider, and I had actually given them-- it was

partially my fault.

I had given them a Javadoc comment that said, construct a

WebSpider for this method.

And so that's what the student proceeded

to do in this method.

He constructed a WebSpider.

Of course, this was the constructor for WebSpider.

The Javadoc should have said, initialize a WebSpider.

And so, after a second student came in with the same bug, I

said, ah, OK.

Well, this would never happen in production code, but

students are making this mistake.

So our students use FindBugs, so let me write this for

FindBugs so I can give the students feedback.

It found three other students with the same bug.

Now, before I actually added this to the FindBugs code

base, I wanted to make sure that I didn't

do something stupid.

I'd make stupid mistakes too.

And I wanted to make sure I didn't do something stupid

that would produce a lot of false positives.

So I ran it across the JDK, because there wouldn't be any

of these in the JDK.

And I found four infinite recursive loops in the JDK,

including one written by Joshua Bloch.

A one-line method that does nothing except return the

result of invoking itself--

now, what actually happened here is there was a variable

called foundType, a field, and the method was supposed to

return the value of the field.
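
A sketch of what that looked like (names and types here are guesses, not the actual JDK source):

    private Object foundType;

    public Object getFoundType() {
        return getFoundType();   // BUG: invokes itself forever (StackOverflowError)
        // return foundType;     // what was meant: return the field
    }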

And this is on the back of my shirt, and Josh has one of

these shirts that he proudly wears.

And one of the lessons that I think everybody has to

understand--

smart people make dumb mistakes.

You can't get embarrassed or uptight about the fact that a

tool found a dumb mistake in your code.

Everybody makes this.

If Josh makes a dumb mistake, you are allowed

to make a dumb mistake.

So what you need to do is you need to embrace your dumb

mistakes and fix them.

So this is actually--

I'll talk a little more, where we can actually look at stuff

across time.

And what this shows is, this is across all the JDK versions

that I have a copy of.

The blue line shows how many infinite recursive loops are

in that version of the JDK.

And the red are how many were in a previous version that

have now been fixed.

So red's guaranteed to be monotonic.

When you fix something, it turns from blue to red.

And you can see, early on-- and I didn't have a whole lot

of versions back in the early days, or I actually probably

would have found more.

And it was right about here, the first builds of 1.6, when

I wrote this and I sent email to Sun, saying, hey, I found

infinite recursive loops in your code.

And real quickly, they fixed all the ones that I found.

There's actually one that they didn't fix, but I didn't find

it right away, because I actually had to improve the

tool to find additional infinite recursive loops.

And I thought, wow, that was great.

I fixed infinite recursive loops.

And then about six months later, they start adding more.

And it's like, hey, guys, you're adding more, why aren't

you using FindBugs?

And I filed bug reports and they eventually fixed them.

And then a couple months ago they added another one.

Actually, this one got fixed in the latest build.

There is still one infinite recursive loop that was

introduced in Java 1.3 that I have not been able

to get them to fix.

And I need to sit down--

it's in code which is actually hard to test, because it's in

the crypto stuff, but I need to actually write a test case

to get them to fix the damn thing.

It is my goal--

I don't think I will accomplish it--

to get 1.6 to ship with no infinite recursive loops.

OK.

This is actually something else we can

get out of our tool.

So, this is for each infinite recursive loop.

This shows when the bug appeared, and when it

disappeared.

And so you can see that there's one bug that's

basically persisted for the entire duration, and these are

when they were born, when they died, and so forth.

So this is some of the information that we can get

out of looking at these tools.

So for anybody who feels too good about things, there was

one infinite recursive loop in Google Web

Toolkit, version 1.0.20.

It was fixed in version 21.

In TabPanel, there was a getWidgetIndex method, which

returned the result of invoking itself.

And basically, what's happening here-- you actually

see, for infinite recursive loops, there are a couple of

different ways this mistake comes up.

And one of the ways it comes up is you have a decorator

pattern, and you have a method where you forget to say who

you want to delegate the method to, in which case you

wind up delegating to yourself and you get an infinite

recursive loop.

And there were also two infinite recursive loops that

are still in the current version, but they're not

Google code.

They're just code that's shipped with it.

OK.

So let me talk about a couple of other bugs.

I want to give you a little bit of a feel for some of the

scope of things we look for.

So hashcode/equals.

So in Java, there's a general rule of thumb that, if you

want to have an object that can be properly put in a hash

table or a hash map, and you've defined an equals

method on it, so that two different objects could be

equal, then you need to define a hashcode method, because

otherwise you get the default hashcode method.

And there's really this invariant you're supposed to

have, that if two objects compare as equal, then they

need to have the same hashcode.

And so if you override equals, but you haven't provided any

hashcode method, we signal a warning.

And there are a number of bugs like this still in the JDK.

A couple of examples.

Now, sometimes, when I'll talk to people--

like, I talked to the AWT people about this one.

And they said, well, we don't think anybody's going to put

it in a hash table.

And besides, it would actually be pretty hard to calculate a

good hashcode function.

Actually, in their case, it would be hard to come up with what they were thinking of. And it actually took me a while to figure out: what is really a good hashcode method if you don't think your object will ever be put in a hash table?

And so, here's the good hashcode method.

You put in an assert false, in case you run it with

assertions enabled, to generate an error.

And otherwise you just return 42.

That satisfies the invariant, that if two objects compare as

equal, they have the same hashcode.

And if it turns out that you put a bunch of these guys in

the hash table, it won't be efficient.

But not being efficient is better than not working.
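
In code, the "good enough" hashcode being described looks roughly like this:

    @Override
    public int hashCode() {
        assert false : "hashCode not designed for this class";
        return 42;   // every instance gets the same hash code, which satisfies
                     // the equals/hashCode contract; hash lookups degrade to
                     // linear scans, but they still work correctly
    }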

And so one of the things we try to do is we--

it's not sufficient to just identify bugs.

You also have to help programmers understand the bug

and understand how to fix it.

And I think one of the reasons people weren't fixing this so

much is they just said, oh, well, but I don't know how to

create a good hashcode.

And it just didn't occur to us.

We recently rewrote our stuff to say, just return 42.

That works.

OK.

Null pointer dereferences.

Now, you can do very sophisticated analysis to try

to find null pointer dereferences, do

interprocedural stuff.

We do very simple stuff.

We don't do interprocedural--

and basically what we look for are statements or branches

that, if executed, guarantee that a null pointer exception

will occur.

So if you've got this, that means either your code could

throw a null pointer exception, or you've got a

branch or statement which is dead.

And either of those are a little questionable.

So here's a case where if c is equal to null and invoking a

method on c returns true, then return.

The and should have been or.
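
Sketched out (the method name is made up):

    if (c == null && c.isClosed()) {   // BUG: && should be ||; the method call
        return;                        // only runs when c is null, so this branch
    }                                  // can only throw a NullPointerException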

This is a pattern that we see over and over again.

If sig is not equal to null or invoking something on sig--

same thing in the JDK.

A lot of stuff just occurs over and over again.

In this case, what happens is confusion between a variable

name and a field that are named the same.

And so there's a variable which is passed in which could be

null, and as a result of this if statement--

the variable could be null--

it always initializes the field to be non-null.

But then here, they're using the variable

which could be null.

And so that could get a null pointer exception.
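
That confusion looks roughly like this (a sketch with made-up names):

    class Holder {
        private String name;

        Holder(String name) {
            if (name == null) {
                this.name = "";       // the FIELD is always made non-null here
            } else {
                this.name = name;
            }
            int len = name.length();  // BUG: uses the PARAMETER, which can still
            // ...                    // be null at this point
        }
    }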

One of the ones which is a little bit harder to figure

out what to do is a redundant null comparison.

So you see a case where you see a

dereference of a variable.

And then after you see the dereference of the variable,

you see a check to see whether or not the variable's null.

The problem is, it's generally not specified in any way that a

static analyzer can easily get at, whether or not this

method parameter is allowed to be null.

Here at this point, there's some confusion on the part of the

code writer about whether or not this method parameter is

allowed to be null.

So it's not as obvious a problem as the other one, but

it's definitely something to be concerned about.

Confusing/bad names.

I mean, I don't--

yeah, method names should start with a lowercase letter.

Yeah.

But you can get into worse problems. You can get into

places where you have a method which has the exact name and

signature as a method in a superclass, except the names

are capitalized differently.

So for example, here there's a method called getOKButton, and

in the subclass, there's a method called getOkButton,

except OK is capitalized differently.

You also see things like-- here's a place where somebody

wrote a hashcode method where hashcode is all lowercase.

So actually, now in Java 5, there's this overrides

annotation that you can put on a method that says, I intend

to override a method in the superclass, and if for some

reason I don't, signal an error, which is a little bit

useful in terms of indicating intention here.
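
A sketch of how that plays out with the getOKButton example (placeholder bodies, fully qualified names to avoid imports):

    class BaseDialog {
        javax.swing.JButton getOKButton() { return null; }
    }

    class MyDialog extends BaseDialog {
        @Override                            // with @Override, the compiler rejects
        javax.swing.JButton getOkButton() {  // this method: it overrides nothing,
            return null;                     // because "Ok" is capitalized
        }                                    // differently than "OK"
    }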

But as I go through these things-- some of the things

that we find are things like, yeah, this is probably a bug,

but it's so fuzzy that I'm not sure it should actually be

part of the language spec.

I actually think that almost overriding a method with a

different capitalization should probably actually be a

compiler error, maybe even against the language spec.

OK.

Another thing we look for are methods that, if you invoke

them, you ought to be looking at the value

returned by the method.

So for example, in Java, there's a method that reads

into an array of bytes.

And it may not fill up that array, because if you're

reading from a network, it will just say, well, OK, let

me read from what's available from this packet.

And rather than waiting for the next packet to arrive,

I'll do a short read.

You asked for 1,000 bytes, I'll give you 417, and you'll

have to come back and ask for more bytes.

And the number of bytes actually returned is the

return value of this method.
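
As a sketch, here's why ignoring read()'s return value bites, and the usual loop for handling short reads (the stream variable is assumed to be an InputStream):

    byte[] buf = new byte[1000];
    in.read(buf);                    // BUG: may fill only part of the buffer,
                                     // and the caller never learns how much

    int off = 0;                     // the usual way to actually fill the buffer
    while (off < buf.length) {
        int n = in.read(buf, off, buf.length - off);
        if (n < 0) {
            break;                   // end of stream reached early
        }
        off += n;
    }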

Now, in C, this is a big problem, where they don't have

exceptions and people return error codes but people don't

always check error codes.

In Java, we have exceptions, so you

don't have that problem.

But there are a number of methods like this that you

have to watch out for.

But actually it turns out there are a bunch of these.

So for example, in Java, there are a number of classes that

are immutable, things like String, BigInteger,

BigDecimal.

Like, the replace method returns a new string.

Because String is immutable.

You can't change it.

And so if you see a call to name.replace, and it's not

using a return value, then somebody, we're pretty sure,

is thinking that this is going to modify name.

But it doesn't.

It has no effect.
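
For example (a sketch):

    name.replace('/', '.');          // BUG: Strings are immutable, so this builds
                                     // a new String and immediately discards it
    name = name.replace('/', '.');   // what was intended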

Now, another example.

They're calling replace.

This happens with replace a fair bit.

This one was slightly interesting.

Here--

so they need to create a thread to read from the error

stream, because otherwise something gets

clogged up in the works.

So they actually create a thread but

they don't start it.
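
Roughly (the runnable body is a made-up stand-in):

    Thread drainer = new Thread(new Runnable() {
        public void run() {
            // read and discard the process's error stream here ...
        }
    });
    // BUG: drainer.start() is never called, so the thread never runs
    // and the error stream never gets drained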

One of the other ones--

I don't think I actually have one of the examples--

is where somebody will say, if such and such, new illegal

argument exception.

So they create the exception object, but they

forget to throw it.
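
That pattern looks like this (a sketch):

    if (count < 0) {
        new IllegalArgumentException("count must be non-negative");
        // BUG: the exception object is constructed but never thrown;
        // the missing keyword is "throw"
    }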

Here's a substring.

So like I said, there are a lot of these places where

there's some method that, if you invoke this--

basically, it doesn't make sense to invoke this method as

a procedure, only as a function.

OK.

Synchronization, concurrency.

These are hard errors.

Our ability to correctly identify synchronization

deadlock issues is not as good as I would like it to be, and

that's because it's a hard problem.

And it's also an incredibly gnarly problem as far as

testing, as I'm sure most of you are aware, because these

concurrency problems don't reoccur reliably.

But it still turns out that there are some problems you

can find using simple techniques.

And so one of the things we looked for is, OK, if you have

a field, and almost every time you access this field you're

holding a lock, but there's just one method where you are

accessing this field and you're not holding a lock,

then it's quite possibly an error.

So this is inconsistent synchronization, where you're

holding a lock most of the time where

you're accessing it.

So for example, here--

this was from GNU Classpath--

there's a field, elementCount.

And almost all the time when you access elementCount,

you're holding a lock, except for the lastIndexOf method,

where you are not.

And as a result, the lastIndexOf method can throw

an index out of bounds exception.
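
A sketch of the shape of that bug (a made-up class, not the actual GNU Classpath code):

    class SimpleVector {
        private Object[] data = new Object[100];
        private int elementCount;

        public synchronized void add(Object o) {
            data[elementCount++] = o;   // elementCount is guarded by the lock here
        }

        // BUG: not synchronized, so it reads elementCount and data while another
        // thread may be in the middle of updating them
        public int lastIndexOf(Object o) {
            for (int i = elementCount - 1; i >= 0; i--) {
                if (o.equals(data[i])) {
                    return i;
                }
            }
            return -1;
        }
    }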

So once again, one of the problems-- and this comes back

to one of the things which is tricky with a

lot of these tools--

is we actually don't have the design of the code.

We don't know, is this class intended to be thread-safe.

But we can look and say, gee, it sure looks like you've got

almost all your methods synchronized, and you've got

all these fields that you're protecting using

synchronization.

Maybe you actually intend for this class to be thread-safe.

If we actually had annotations, and some people

have proposed some that would give us some of the design of

the class, then this stuff would be a lot easier to do.

And of course, the question is how much annotation burden can

you put on developers before they start

saying, eh, not for me?

There are a bunch of other gotchas with synchronization.

So here's something that comes up from time to time.

This was an example in JBoss.

So we check a condition.

And if the condition isn't set, then we grab a lock, and

we wait for the condition to be set.

And what can happen is another thread can set the condition,

in between the time when we check it and

when we grab the lock.

And so it will actually set it and do the notification, and

then after that, we get the lock.

And so we will have missed the notification.

And so there are a couple of things here.

One of the things is, a call to wait should generally be in

the loop, and you should be checking some sort of

condition on it, in order to make sure that you will catch

the appropriate changes.
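
The fix being described, as a sketch (the lock and condition field names are assumed):

    void awaitCondition() throws InterruptedException {
        synchronized (lock) {
            while (!conditionSet) {   // re-check the condition on every wakeup
                lock.wait();          // checking and waiting while holding the lock
            }                         // means the notification can't be missed
        }
    }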

OK.

So those are just a couple of the bug patterns.

Let me show you some overviews of some of the things we found

on a couple different tools.

This is Glassfish, which is Sun's app server.

29 ignored return values that should not be ignored, 59

classes that have an equals but not a hashcode, 9 calls to

equals that will always return false, because they're

comparing a string and a string buffer for equality, 18

statements and 98 branches that, if ever executed, are

guaranteed to throw a null pointer exception, 10 methods

that, if ever called, will invoke themselves again in an

infinite recursive loop, and a check cast that is guaranteed

to always throw a class cast exception.

Here's some stuff from WebLogic.

1,166 ignored return values--

I'll come back to that--

245 classes that define equals but not hashcode, 45 calls to

equals that always return false, 35 statements that are

guaranteed to throw a null pointer exception, 21 infinite

recursive loops, 4 impossible casts.

The ignored returned values--

there were like 1,000 calls to substring, and it turns out

that this was automatically generated code.

And actually, one of the things which is interesting

because of where we run is that if you have some process

that takes something from your database schema and uses that

to compile down to bytecode or something like that, we can

actually check that.

So something in their process for automatically generating

calls was actually generating a call to substring where they

were ignoring the return value.

Calls to BigDecimal.setScale.

Exception creations, where they were saying, like, new

runtime exception but not throwing it.

Call to BigInteger.add.

OK.

And one of the things we do-- like I said, when we don't

just look at a handful of things.

We try to come up with lots of little things, every time we

find some interesting bugs.

Actually, it turns out students are great at

generating bugs.

But developers too.

So here's a case where somebody is trying to take an

array of bytes and turn it into a long.

And actually, this is just bad code in general.

But there's a very specific bug here.

it's that an array of bytes is an array of signed bytes.

And so here, we're taking the result, shifting it by eight

bits, and then or-ing in a signed byte.

And that's just not going to work.
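
The pattern, sketched:

    static long toLong(byte[] b) {
        long result = 0;
        for (int i = 0; i < 8; i++) {
            result = (result << 8) | b[i];   // BUG: b[i] is sign-extended, so a
                                             // negative byte sets all the high
                                             // bits of result to 1
            // the usual fix: (result << 8) | (b[i] & 0xFF)
        }
        return result;
    }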

And in a lot of these cases, you have to do tuning in order

to do it. Just checking places where you're doing an or with

a signed byte--

that gives too many false positives.

And there's actually a fair bit of skill in terms of

figuring out, well, what sort of heuristics do I use to try

to get as many real bugs as possible and not too many

false positives.

So that's the bug there.

OK.

So let me just talk real briefly

about some newish features.

So, behavioral annotations-- actually, I think that's not

the name we're using now.

Annotations for static analysis.

So one of the things, hopefully, we're going to

propose as a JSR--

we've actually got some of these features done--

is you can use annotations in your Java code to indicate

things that the Java libraries will check.

So for example, you can mark a method parameter as being

non-null, and say this parameter should never be

null, or that the return value from this method might be

null, so if you invoke me you'd better check it.

Check return value.

That says, if you invoke this method, you'd better not

ignore the return value.

These are things--

like, for example, for all these methods on String.

Right now, we've hard-coded it, but if somebody doing a

third-party library wants to be able to do that, they'll do

those annotations.
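
A sketch of what those annotations might look like in use; the exact names and packages were still being worked out, so treat these as illustrative rather than the final standard:

    public class Directory {
        @CheckForNull                               // callers must handle a null result
        public String lookup(@NonNull String key) { // key must never be passed as null
            return null;                            // placeholder body
        }

        @CheckReturnValue                           // ignoring the result is a bug
        public String normalized(String key) {
            return key.trim();
        }
    }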

One of the things-- we haven't done this yet, but for

security purposes, one of the big issues that comes up is

going between untrusted data, data you would get from a web

form, versus things you're passing into the database,

things that you expect to be checked, that can't be just

arbitrary strings.

And so it'd be nice to be able to mark things as being

tainted, untainted.

Marking something as detainted would be something that you

can write anything into, but what you pull

out of it is untainted.

And that's how you would mark the methods that do your

blacklist or whitelisting and so forth.

And you might think, well, I could do some of this with

static analysis.

I could do interprocedural analysis to figure out which

method parameters are non-null.

And you could, and it's hard, but these are really great,

because they're a way of assigning blame.

So say you do static analysis, and you find out that this

method in Class A is calling a method in Class B, and this

method could pass a null which could cause a null pointer

dereference in Class B.

And these classes were developed by different teams

on different sides of the continent.

Whose fault is it?

Who has to go fix their code?

It's not clear, right?

Whereas, if the person writing this method said, hey, this

parameter better be non-null, then that absolves them from

any involvement in fixing this bug, because you're passing in

something that violates the contract.

And a lot of these things are things that I think will

really help people doing infrastructure libraries.

You put in these annotations and it will help.

Flash back to a story--

when I was a grad student, I was working at Cornell.

I worked on the synthesizer generator, which was a tool

that would generate programming environments.

And I actually did the logic to display the abstract syntax

tree, along with all the attributes and so forth.

And it turned out that, if somebody in an upstream pass

from me had done something that corrupted the tree, it'd

typically blow up in my code.

And I'd get called in all the time because, hey, it's

blowing up in your code.

So I quickly wrote something that would do a whole bunch of

integrity checks, so that I could run the integrity checks

and say, not my fault, somebody is passing me a

corrupted tree.

And that's one of the things you can do with these

annotations, is that you can wall off your code from having

to deal with the warnings being reported by the static

analysis tools.

Yeah?

AUDIENCE: Isn't that what assert is for?

I mean you can say assert [INAUDIBLE]

and then it will blow up [INAUDIBLE] with someone else.

WILLIAM PUGH: Well, actually, there's some questions about--

right.

So if you did static analysis and you found out that there

is a specific assertion or a specific check for something

being null, and if something's null then it throws an illegal

argument exception or something like that, then yes,

I think then you could say that's the equivalent as

marking something non-null.

AUDIENCE: [INAUDIBLE]

this as well [INAUDIBLE]

WILLIAM PUGH: Well, another question is, how does this

work with inheritance, right?

AUDIENCE: [INAUDIBLE]

WILLIAM PUGH: Right.

And another thing you can do, and we've actually got some

mechanisms that will allow you to say, within this package,

all method arguments should be non-null unless I say

otherwise explicitly for that method argument.
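
A sketch of what such a package-level default might look like (the annotation name here is illustrative, not necessarily what was standardized):

    // package-info.java
    @ParametersAreNonnullByDefault   // every method parameter in this package is
                                     // treated as non-null unless marked otherwise
    package com.example.service;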

Whether or not people believe that method arguments should

be null is like a great religious debate.

The fact that Java collections, that a map can

allow you to have null as a key--

people have very strong opinions as to whether or not

that was a good idea, but it does.

But some people just say, I don't like null.

I'm just going to declare my package as saying, no null

unless I say you're allowed to.

Yeah?

AUDIENCE: So are you proposing this only for

documentation purposes?

Or are there also going to be some static checks

[INAUDIBLE]?

WILLIAM PUGH: OK, so, actually,

we've got some of these.

IntelliJ also has some of these.

Nobody is really eager to put in both the annotations from

FindBugs and from IntelliJ.

So what we really need to do is, first off, we need to get

a standard.

We need to have this part of the Java process, so that we

can come up with standard names and actually really

agree on what "could be null" means, or "nullable" or

whatever it is.

What exactly that means.

And then it's really--

I mean, it's part of the documentation, but in a way

that could be checked by tools.

So for example, you could use it in a static analyzer, but

you could also imagine doing something, saying, OK, we're

going to do some rewriting of the class files so that if

you've marked something as non-null, an assertion will be

thrown if any of these things are null.

So, I mean, the problem is Javadoc sometimes describes

stuff well to humans, but it's really hard to get good stuff

out of that for automatic tools.

So it's documentation that can be used by tools for either

static or dynamic analysis.

AUDIENCE: How about [INAUDIBLE] some of these

features into the library itself, such as [INAUDIBLE]

API sometimes some API [INAUDIBLE]

WILLIAM PUGH: So the question is actually improving the

language to fix this.

The hurdles involved in actually changing the Java

language are substantial.

They're not completely impossible, but they are

substantial.

I think the one thing that we may see is that there's a

proposal afoot to also expand the number of places an

annotation can occur, to basically allow you to do some

custom typing and so forth.

So, right now, you can't say that this is an array of

non-null strings, or an array list of non-null strings.

Because you can't put an annotation there.

You can't put it on a type parameter.

So one of the things to do would be to expand where the

annotations can go.

And the other thing is that if you make it part of the

language, then you're really saying that, OK, whatever list

of things we come up with, that's it, period.

Whereas, if you do make it like, well, it's a common set

of annotations, then it's like, well, and then next

version, maybe we add two more annotations, and a couple more

annotations, and you have a couple more.

Are you familiar with Guy Steele's talk

on growing a language?

It's a great talk, but basically Guy said, rather

than putting a lot of features in the language, what you

should try to do is you should try to make the language so

that it's possible for users of the language to extend it.

And I think what we should be thinking about is, how do we

change the Java language so that these annotations are

useful and can be applied to all the places where you'd

want to do them, but trying to hardcode into the Java

language, what are the set of annotations

you'd want or need--

that would be limiting.

People have thought about, do we really want to put

read-only or non-null into the language?

You might come up with two or three.

But there are certainly other things, like

check return value.

That's not going to be part of the language.

And tainted?

Not in Java.

Maybe the next language.

So another thing we've got--

it's one of the least-known features of FindBugs, for

those of you who've actually used it.

A couple of people I've talked to here have said, I didn't

know you had that.

We can keep track of a historical record of bugs.

So you run FindBugs on one version of the code, and now

the next night you do another build of your software.

You run FindBugs, and then you can actually combine those

results and do this every night or every build, so you

get a history.

And you can say, OK, now let me see, what are the FindBugs

warnings that have been introduced into my code since

the last time we did a release?

And those are the ones that you want to perhaps give

greater attention to, because it's often the case that you

run FindBugs over a large piece of code and you get a

lot of warnings like, oh, I don't want to deal with these.

But I think that saying, well, let me just look at what's

been introduced since the last time we

released this to users.

And that will hopefully be a pretty small list, and might

be something worth looking at.

OK.

So I've talked a little bit about what I've done.

Let me talk about some of the things I'm

trying to figure out.

What is the fruit distribution?

I've always described to people--

FindBugs looks for low-hanging fruit.

I've shown the detectors we look for.

They're all simple.

But I want more fruit.

Where do I go to get it?

Do we write more sophisticated analysis to do interprocedural

stuff that understands fields and so forth, do we go higher

up in the tree?

Or do we write more shallow but general bug detectors?

Or do we write

application-specific bug detectors?

It's really a question of, where are the useful bugs that

can be found with static analysis?

If I knew that having somebody slave for 12 months over doing

a very fancy context-sensitive pointer alias analysis scheme

would allow me to find 10 times as many important bugs,

then I might decide, oh, yeah, that would be a good

investment of time.

But it may turn out that doing all that sophistication only

gives you 20% more real bugs, in which case that's not a

good investment of time.

What kinds of bugs can be found by static analysis?

I would never have thought to look for infinite recursive

loops, unless some student came to me in office hours and

had that problem.

This is true for a lot of the things.

The thing about doing an integer division, converting

the result to a double, passing the result to

Math.ceiling.

I didn't just think, oh, that would be a good bug.

I actually saw that in code.

So it's relatively easy to measure, to do something about

false positives.

You can run your analyzer over a bunch of code, look at all

the reports, go through and say, which of

these are real errors?

What's hard to get a handle on is, what are the types of

errors that could be detected by static analysis but aren't?

And it's often the case, as you look at these, you can

say, oh, I understand what it would take to catch this.

And you try it.

You find out, oh, a gazillion false positives, and then you

say, oh, I've got to tune it.

And sometimes you successfully tune it down to a good

detector, sometimes you don't.

And part of this is trying to turn bug

instances into bug patterns.

And one of the things I think we need to do, as far as

software testing and so forth, is we need to change our

development process so that we learn from our mistakes.

We need to evaluate bugs and see if they can be turned into

a general bug pattern.

A lot of the bugs that occur will occur elsewhere in a

large code base.

So for example the flaw that Josh

identified in binary search.

How many people know about this issue Josh found in

binary search?

OK, so the binary search implementations in Java, and in a lot of textbooks, including the textbook Josh was taught from when he was at CMU, are bad, because they take the lower bound and the upper bound, which are of type int, add them together, and divide the result by two.

If you get an integer overflow when doing that addition, you

get a negative result.

And it turns out--

like, in Java, what you just do is you add them together,

and then you do an unsigned right shift, and

it works just fine.
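
In code, the flaw and the fixes:

    int mid = (low + high) / 2;         // BUG: low + high can overflow int and
                                        // go negative for very large arrays
    int mid2 = (low + high) >>> 1;      // unsigned shift reinterprets the
                                        // overflowed sum correctly
    int mid3 = low + (high - low) / 2;  // another common fix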

And somebody, maybe somebody at Google, was doing a binary

search in an array with more than 2 to the 30th elements

and got hit by this.

But I took that and turned it into a bug detector

in FindBugs, and we now look for it.

An example of trying to do this-- one of the things I do

is I get every build of the JDK, and they have a list of

what bugs were fixed.

And I look through them and say, OK, are any of these

things-- and a lot of them are like, oh, cleaned up Javadoc.

Yeah, I don't care.

But I look through and say, OK, is this something that

could be part of a bug pattern that I could try to fix?

So in build 89, they fixed a serialization bug in

ArrayBlockingQueue.

And basically, it took me five hours of work from start to

finish to write and tune a bug detector to find that problem.

And I actually found 17 other erroneous classes in the JDK

that all had the same problem.

So the problem was it was a class that was serializable

but it had a transient field.

A transient field is one that is not saved by the

serialization mechanism.

And it wasn't being reset when you deserialized an object by

either a readObject or readResolve method.
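
A sketch of the bug pattern (a made-up class, not ArrayBlockingQueue itself):

    class Registry implements java.io.Serializable {
        private transient java.util.Map<String, String> cache =
                new java.util.HashMap<String, String>();

        // BUG: there is no readObject() or readResolve() to re-create the
        // transient field, so after deserialization cache is null and the
        // first call that touches it throws a NullPointerException
        public String get(String key) {
            return cache.get(key);
        }
    }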

And so you first do that and you find out, oh, it's giving

too many false positives.

There's this whole tweaking process.

So for example, it turned out in some of them, it was a

field that was set to minus 1 in the

constructor, but it was transient.

And so when it was deserialized it was set to 0,

and that gave the wrong semantics.

It was a cached hashcode, but for some reason they decided

to have the value that represents an uninitialized, uncomputed

hashcode be minus 1.

And so, for example, if the transient field is set to a

non-default value in the constructor, then I have to

elevate the priority.

And this was able to give us those bugs.

And that was 17 defective classes, and I think it was 49

classes where we got the warning.

So more than 1/3 were real errors.

How can we make it easy to write bug detectors?

Now this was something that, when we were starting to do

this research, we said, well, why don't we start by figuring

out nice high-level patterns?

But I was like, you know, I don't really know the scope of

what are the types of bugs to find?

Let's just write detectors, and we can use Java-- it's

Turing-complete--

to try out our bug detectors.

Now I think we've got enough of a scope that we can start

trying to figure that out.

So like I said, I can do this for the JDK.

I don't really want to be doing it for the JDK.

I want Martin or somebody else to be doing it for the JDK.

And I don't want to be doing it for Google.

And if we get lots of developers doing this--

and some of them may be company-specific practices,

some of them may be general--

I think this will be very helpful.

So I think we have enough examples to start thinking

about this.

How can we make static analysis pervasive?

So, state-of-the-art static analysis has a lot to offer,

more than I think many people suspected.

What are the practical and cultural issues that need to

be surmounted to make it pervasive?

Now, I am quite aware that there are things about

FindBugs that probably suck, things that make it really

hard for people to use.

And one of the things I'm hoping to do is figure out,

what are the points of pain in trying to use this?

Unlike companies that have field engineers, I haven't

done as many trips to companies to

see how that works.

That's one of the things I'm doing here this summer.

One of them is false positive suppression.

People worry about, with these tools, false positives--

well, if you deliver no more than 50% false positives,

that's not too bad.

The thing is, nobody wants to look at the same false

positive more than once.

It's one thing to look at it once.

People will tolerate that if you don't have too high a

false positive rate.

But if you run a tool and you fix all the real bugs, then

all you have left are false positives.

And the next time you run the tool, maybe you've introduced

one error, but you still have 500 false positives that

weren't fixed from the last time, and now nobody wants to

use the tool.

What are the other points of pain?

OK.

So a couple quick announcements.

So I'm here all summer.

I'm taking refuge from the heat and humidity in

Washington DC, and I hope to develop some substantial

collaborations at Google.

I'm trying to get everything set up so I'll be able to work

with people, look at Google Code Base,

sit down with people.

This is partially not only to understand FindBugs and so

forth, but also as an academic--

I tell students how to develop code, how to get prepared.

I really feel like I need to take some time, immerse myself

in the real world so I'm not just talking out

of the ivory tower.

So I'll be here.

I want to really understand how software development,

software quality works at Google.

Send me email, and I'll be here and we'll set up visits.

AUDIENCE: By here, do you mean, living locally or

working at Google?

WILLIAM PUGH: I am living in Palo Alto.

And I hopefully will be visiting Google about three

days a week for the next six weeks.

AUDIENCE: Under what non-disclosure [INAUDIBLE]

WILLIAM PUGH: We're working that out.

But hopefully I will be under an NDA that will allow me to

look at Google code.

Hopefully we'll get that all figured out today.

OK.

And now let me just briefly tell you about

one other cool project.

And hopefully I may get to do a TechTalk on this later

during my visit.

This is the Marmoset project.

So the Marmoset project is a total rethinking of how

student programming projects are submitted and tested.

So it's designed to provide students, instructors, and

researchers with lots of feedback, including feedback

before the submission deadline.

We're also collecting large datasets of student efforts

and we're starting to learn lots of things about how

students learn to program.

And I don't have a slide on this, but I've got a couple

extra minutes.

So let me tell you the really cool feature about how

Marmoset works.

So traditionally, the way programming projects get

assessed for correctness is that the instructor publishes,

here's the project description.

Here are three sample test cases.

Go work on it.

And students work on it and they submit it electronically

somehow, and then there's a deadline.

And you give a tarball to the TA, who runs the whole thing

against a bunch of secret instructor test cases.

And a week later, maybe, he comes back with the results

and it's emailed to the students.

And the students don't get feedback from how they did on

the project until after they're already working on the

next project.

The instructors only really get feedback in terms of

questions in office hours and in lecture.

So what happens with Marmoset instead--

it works the same.

You post the project description, you post a couple

of sample test cases.

These are the public test cases.

Students go to work on the project.

They can submit whenever they want.

After they submit, they can go to a web server, where they

can see the results of compiling their program and

running it against the public test cases.

It shouldn't be a surprise, because they've got these, but

sometimes it turns up platform dependencies.

And then, if a student submission passes all of the

public test cases, then the students get an option, would

you like to release test your submission?

And if the student says, yes, I want to

perform a release test--

maybe this is the poker game project.

And it will say, OK, we ran the release tests

against your code.

There were nine tests.

You failed four of them.

The names of the first two tests you failed are

test_full_house and test_four_of_a_kind plus two

more tests where the QA team couldn't be bothered to write

down the details.

Now, students can think about this.

Oh, full house, yeah, I think I know what I did wrong.

They can go make a quick change and resubmit.

They can resubmit as often as they want, but performing a

release test requires a token.

Students are given some limited number of tokens, like

two or three, and they regenerate 24 hours after

they're used.

So students, within a 24-hour period, they only have a

limited number of times to get feedback

from the release tests.

So this encourages students to start programming early, which

is a good thing.

It gets them into that mode of, OK, it's the last day,

I've only got one more release token left.

Have I done everything I can do to ensure the quality of my

code before I'm willing to burn up my last token?

So it's both a throwback to the idea of releasing your

code-- either putting it on the website or handing it over

to the QA team.

You really want to do your own stuff.

Or, for us old-timers, and I'm barely an old-timer, it's

turning your card deck over to the computer operator, knowing

that it's going to take three hours to get your answer back,

and you really want to desk-check your code, because

it's really embarrassing to turn in a deck with a stupid

error and have to wait for that long turnaround.

And the other thing is instructors get feedback.

We run all the tests as soon as stuff is submitted, so

instructors can see, at any time, how many students are

passing each test case.

And you can get all sorts of feedback from that.

But we also collect snapshots of student code from every

time they save their files.

And here's some of the data.

I know Google people like data, so this is reasonably

large data.

From four semesters of a CS 2 course, we had 147,000

snapshots of student effort, all the files from one student

working on a project.

We take each one of those snapshots and we run them

against all the test cases.

So we have over 2 million runs of student code.

And then we can run our static analyzer over 147,000 things.

And we can actually look at what exceptions occur.

So it turned out that, out of those 2 million runs--

actually I think these exceptions are per snapshot,

not per run.

So if a snapshot got a null pointer exception in multiple

test runs, it's only counted once here.

So the most common runtime exception was a

null pointer exception.

Surprisingly, the second most common was

a class cast exception.

Index out of bounds, array index out of

bounds, stack overflows.

A stack overflow is typically a recursive loop.

And so one of the things we were actually able to do with

FindBugs is we were able to check, where are we predicting

an infinite recursive loop?

Where are students encountering a stack overflow?

What are the places where the students are getting a stack

overflow, but we're not predicting an infinite

recursive loop?

And you can just eyeball that.

And in some cases, you can say, oh, well, this is mutual

recursion and our analysis doesn't detect that.

But in a couple cases, we were able to use that to improve

the analysis.

And actually, improving the analysis to catch more of the

student errors is what led us to improve it to find more

infinite recursive loops in the JDK.

So that's Marmoset.

And I'm done.

Questions?

AUDIENCE: How does FindBugs [INAUDIBLE]

interact with test coverage?

Is it your experience that code that's more [INAUDIBLE]

bugs?

WILLIAM PUGH: Yeah.

It's certainly the case that a lot of the errors we find

indicate code that has not been tested.

I mean, when you find a method that, if invoked, is

guaranteed to throw a stack overflow, obviously something

was lacking in the coverage.

And actually, in some ways, there's sort of

an interesting trade-off.

The nice thing about static analysis is you don't need to

have test cases.

And generally, in the stuff which has been tested well,

you're not going to find a lot of things that we look for.

One of the things which gets tricky

sometimes is library code.

What happens when you write a library?

Can you anticipate all the different uses of it?

So I think testing and static analysis are very

complementary.

Yeah?

AUDIENCE: You mentioned a question about [INAUDIBLE]

fruit distribution.

It seems like the Marmoset project is going to allow you

to get some measure of that.

WILLIAM PUGH: Right.

So with FindBugs, we've been able to do lots of stuff in

terms of looking at open-source projects.

But generally, what we don't have for open-source projects

is, well, what are actually the errors in this stuff?

To the extent that they actually have test cases, you

find that if you want to pull a bunch of different versions

of Apache off the source and actually run them all, it's

really hard, because they all have all sorts of

dependencies.

Whereas here, we have something that allows us to

say, oh, we've got all these versions.

We've got these test cases that are designed to be

exhaustive tests of the functionality of the code.

And we can look--

OK, they made this two-line change and it caused a warning

to go away and at the same time, three test

cases started working.

Maybe there's a correlation there.

So yes, I think that is very helpful.

It is student code.

The projects are smaller.

We found more connection than we would have thought between

bugs students make and bugs we can find in production code.

But yes, that is one of the reasons we're very--

AUDIENCE: Do you have a feel for just seeing the tip of the

iceberg when you do FindBugs, or [INAUDIBLE]

WILLIAM PUGH: Yeah.

So a lot of what FindBugs finds are

really coding errors.

What the person had in their mind didn't get translated

correctly into the code.

Or that they just misunderstood some API.

Whereas if you've got something like--

one of the projects was Sudoku.

And if they simply didn't understand the logic of how to

do Sudoku, well, they're doing some different game.

It's not Sudoku, it's something else.

And we're not going to find that with FindBugs because we

don't understand the domain.

So I'd say that a majority of errors--

probably not a huge majority, but I'm guessing a majority--

are domain-specific, if you look at all

the errors in code.

I don't really know.

Wild-ass guess.

Somewhere between 1/2 and 2/3 of all errors

require domain knowledge.

Total wild-ass guess.

Maybe someone else has a different opinion.

And you're not going to be able to find that with a tool

like FindBugs.

But the stuff that you can easily find--

yeah.

AUDIENCE: You talked about improving the language.

WILLIAM PUGH: Yeah.

AUDIENCE: You also complained that people weren't using

FindBugs now.

WILLIAM PUGH: That's starting to change.

AUDIENCE: And you've said that most of the stuff you're doing

is easy to do [INAUDIBLE].

How many of those things would you like to see

put into the compiler?

Not changing the language.

Just making additional checks for the compiler [INAUDIBLE]?

WILLIAM PUGH: The problem with putting something into the

compiler is you've really got to have an insanely low false

positive rate.

People really get cranky if the compiler won't compile

something that they know to be perfectly correct code.

So I think there are a couple of cases, like the overriding

a method except you've mis-capitalized the name.

That's something that, even if you think it should work, it

shouldn't, because it's going to confuse the hell out of the

person who comes after you.

And there are a number of these that should do.

I mean, should, for example, the JDK refuse to compile code

where there's a statement that, if executed, is

guaranteed to throw a null pointer exception?

I don't know.

I don't know.

AUDIENCE: I've seen a lot of test code that intentionally tries

to throw an exception, catches it, [UNINTELLIGIBLE]

WILLIAM PUGH: Well, and you know--

one of the things we look for is a checked cast that's

guaranteed to throw an error.

And so I look for that.

And I found a piece of code that said, hey, you're going

to get one of these errors.

And it says, OK, cast this to that, and there's a comment--

force a checked cast exception here.

Yeah, I think some of this stuff could be moved in.

In some ways, it gives me more freedom to innovate, to not be

part of javac.

But certainly I think there are a number of things that

would be worthwhile moving in.

It's not where I am probably going to be expending my

energy, though.

I'm probably more thinking for the next language.

And part of it is, Java--

I beat my head against changing Java in a few areas,

and I do it when there's a good payoff, and I don't think

that's a good payoff.

The next language I think [INAUDIBLE]

modern language is a hard language, and people are going

to make stupid mistakes.

That's the thing.

Either it's not possible, or if it was possible, it

wouldn't be desirable, to have a programming language in

which you couldn't make stupid mistakes.

And one of the things Java has done is it's limited the

impact of errors.

This is one of the things-- in C and C++, when you make a

stupid error, you can corrupt your memory in subtle ways

that can't be detected until far away.

And it's very hard.

Whereas in Java, a lot of stupid mistakes turn into

runtime exceptions.

And those are easier to catch, easier to diagnose.

So I'd say that the biggest problem, as far as where FindBugs needs to be better and where software quality needs to be better, is going to be concurrency.

Suddenly everybody is going to be running on 2- and 4- and 8-

and 16-way processors, and all this code that worked fine

when you were running it on a single-processor machine is

going to start falling down.

I talked with Cliff Click, who works at Azul Systems, and he

says, every time they have some customer that comes in,

and they say, well, let's see how our software scales on

your system.

And they run it on their 384 processor system, and the

software immediately falls over.

It's like, oh, you've got a crappy machine.

And, like--

no.

It's just like, OK, let's turn on a couple of switches to

oversynchronize everything, and it runs just fine.

So it's, ah, you have a synchronization

problem in your code.

So I think trying to deal with the fact that, within 10

years, 8-way and 16-way laptops might

be headed our way--

I think trying to figure out that and figure out the right

way to teach concurrency and write frameworks for

concurrency-- that's a big problem.

And a lot of these other things.

I haven't talked about all the problems with web frameworks

and so forth, and all the types of errors and so forth

that you can make there.

Other questions?

OK, well, I'll be here for lunch.

If anybody wants to join us for lunch, come

on up to the front.

Thank you.
