Google I/O 2011: 3D Graphics on Android: Lessons learned from Google Body

NICO WEBER: Welcome to the last time slot at Google I/O.

I'm Nico.

I'm glad you guys could make it.

So did you have a good time so far?

Any favorite sessions so far?


NICO WEBER: Yeah I saw that one.

I really liked that.

So you might have noticed it's kind of hard to

get tickets for I/O.

One of the easiest ways for me to get in was to give a talk.

So that's why I'm here.

And it's very different to watch all the talks when you

know you'll be talking later.

So one thing I've been paying attention to a lot is what the

presenters do with their hands, knowing that I have to

do something with my hands.

So all the professional presenters did like this.

I guess they read some book on body language and read that

this means open and relaxed.

And all the more engineering types of guys were like this.

And then they did a stance in between.

And I guess my conclusion is, talks are more fun if you

don't pay attention to the hands of the

presenter so much.

So don't look at my hands.

I'm going to be talking a little bit about 3D graphics

on Android.

Earlier this year I ported Google Body to Android 3.0.

And I'll just share my experience there, I guess.

So who here has Google Body?

Quite a few.

So for those who haven't, if you just do a web search for

Google Body and pick the first hit on most search engines,

you'll go to this thing here.

And it's basically a human anatomy app.

Users call it Google Earth for the body.

So there's a 3D app of the human body.

You can zoom in, pan around.

There's a transparency slider on the left here, where you

can look at skeleton and whatnot.

There's a search box up here where you

can search for stuff.

And you might not have known that the liver has kind of an

interesting 3D structure from the back.

You can click on things to learn what they're called.

So this is the colon.

And that's basically Google Body.

For April 1 we had Google Cow, which was kind of popular.

So it will be a little while until it loads.

So that's the same thing for a cow.

It was pretty popular so we left it in the app.

So that's Google Body.

So Google Body is obviously a web app.

It lives in the browser.

And for 3D display, it uses a fairly new technology

called WebGL, which was also demoed in the keynote

this morning and there were a few talks on that.

So there's no plugins or anything needed for that.

You just need a new browser.

So for example, Chrome supports WebGL.

Firefox 4 does.

WebKit, which is the Safari prerelease version, I guess,

kind of, supports WebGL.

There's an Opera 11 preview that

supports WebGL on Windows.

But sadly the Android browser does not support WebGL yet.

Google Body is a 20% project by about

five people at Google.

So Google has this concept of 20% time.

One day of the week you can work on whatever you

want, if you want.

And they were looking for someone to make Google Body

happen on Android.

So I figured, yeah that sounds like fun.

I'll do that.

And let me show you how it looks.

So Google Body for tablets is available in the market today.

So if you're looking for something to do with your

tablets, you can download this.

And it's basically the same thing.

So there's a 3D view of a model that

you can move around.

You can zoom in, zoom out.

Look at different layers up here.

You have a search box where you can--

I don't know-- search for skull.

Oh, it's right there.

You can tap on things.

So these things here are called teeth.

And there's a fun bug where I don't do modular arithmetic on the rotation angle.

So if you spin the model a bit and then click on the reset

view button it spins a bit too often.

So that's basically Google Body for Android.

The cow is not in there yet.

But it'll come eventually.

So that's what I did.

And currently this is tablets only and currently I am

working on getting this to work on phones.

And I'd like just to share my experience writing this a

little bit.

So Google Body was released December 2010.

I did the port after that.

So they send out a mail saying, anyone interested in

porting this to Android?

And I was like, yeah, if nobody else stands up.

Sure I'll do it.

And then they told me, awesome.

And we want this for tablets and you have two weeks and go.

So my point is, I don't have a ton of Android experience.

So I'm not on the Android team.

What I'm saying is my personal opinion; not an official statement.

It might be factually wrong.

Parts are, probably.

And what I'm mostly focusing on is doing

3D graphics on Android.

I kind of assume that you are somewhat

familiar with Android.

So who here knows what an activity is?



Who here has used OpenGL before in any form?

Also most people.


Who here has done OpenGL on Android?

OK, so not as many.

That's perfect.

So I think this talk is perfect for you if you have

some experience with Android, some experience with OpenGL,

but not so much with the combination.

And if you are completely new to Android, I gave a version

of this talk at the Game Developers Conference

earlier this year.

And if you just do a web search for GDC 2011 Android

OpenGL, you'll find this page, which has a slightly more

basic version of this talk with uglier slides.

So Google Body for Android is a native Java app and it uses

OpenGL ES 2.0 for the 3D display.

So let's see what I'll be talking about.

So I'll very quickly tell you what OpenGL ES 2.0 is.

It's actually faster than saying the whole

word, OpenGL ES 2.0.

And then I'll give you a very, very rough

mental model of GPUs.

Tell you a few pitfalls with textures.

A few best practices and pitfalls with geometry, that

is Vertex Buffer Objects.

Then I'll tell you quickly how to quickly get data into

Vertex Buffer--

into ByteBuffers, which you need to

upload them to the GPU.

And then I'll say a few words about performance tweaks.

So OpenGL ES 2.0.

So I guess everyone here knows OpenGL.

It looks like this, right?

Looks familiar to anyone?


So OpenGL is basically the 3D API.

There are implementations on Windows,

MacOS, Linux, many phones.

It's been around forever, so it's very versatile.

As I said, it's been around for a long

time, 20 years I think.

And it has accumulated some crap during that time.

And they are currently cleaning that up, but by the

time they wanted to do 3D on phones, OpenGL was kind of

messy, so they decided to release mostly a subset.

OpenGL for Embedded Systems. That's what the ES stands for.

And OpenGL ES is basically OpenGL with fewer functions.

So they got rid of glBegin and many other things.

And there are two versions of OpenGL ES: OpenGL ES 1

corresponds to OpenGL 1, more or less.

And it has a fixed-function pipeline.

So that means every model you draw will do vertex transform

and rasterization and some predefined lighting

functions and so on.

And there's OpenGL ES 2, which roughly corresponds to OpenGL

2, which has fully programmable vertex and

fragment shaders and all that.

And Android supports both OpenGL ES1 and OpenGL ES 2.

If you do a web search for Android OpenGL you'll find

some official Android documentation that proudly tells you

Android supports OpenGL ES 1.

And that's factually not wrong, I guess, but it also

supports OpenGL ES 2 and that's what you

want to use in practice.

And I think they are updating their documentation there but

they are not there yet.

And just as an aside, WebGL is basically a binding

for OpenGL ES 2.0.

So in theory, mobile browsers could support WebGL in the

future but they don't yet.

And WebGL is very exciting but nothing that I'll talk about

in this talk.

Because Body for Android doesn't use it.

So as I said, I'm currently porting

Body to mobile phones.

So I kind of need to decide which Android versions I want

to support.

If you're just writing a tablet app, you just support

Android 3.0. That's easy.

But for phones you need to take a look at this chart,

which is at

And Android 1.5 and 1.6 are less than--

I think about 5% of the market share these days.

So I don't think it's really worth supporting.

Android 2.1 is, I think, about 24%.

Which is pretty sizable.

Android 2.2 is at 65-ish% and Android 2.3 is 4%.

And that adds up to about 100, I hope.

So Android 2.1 is the first version of Android that

supports OpenGL ES2.

But only in the native code.

So there are no Java bindings or anything like that.

So if you want to do OpenGL ES 2.0 and support Android 2.1

you need to add your own driver bindings, which is not

hard but annoying.

And I personally haven't used Android 2.1 at all yet.

So I won't say a lot about Android 2.1 or anything.

Android 2.2 is the first version that adds Java

bindings for OpenGL ES2.

So I think that's a reasonable lower bound, at least for the

first iteration of your project to target.

It also added support for compressed textures, or added

API support for compressed textures.

And many other cool things.

And finally, Android 2.3.

From a Java OpenGL perspective,

added only bug fixes.

If you're writing native code 2.3 added a lot of cool stuff.

But for just graphics applications like Google Body,

I think Java is fast enough.

You're just pushing data through the

graphics card anyway.

And Java is kind of like, the better paved way to write

Android applications.

So Google Body is written in Java.

And my plan is to port it to 2.3 first and then if stuff

works there reasonably well then get it working on 2.2 and

maybe eventually on to 2.1.

I think no new project should use OpenGL ES1.

I think 90% of all phones support OpenGL ES2.0.

All new phones support it.

And if you feel that you really want to support the

last 10% that don't support the OpenGL ES2 these phones

are also pretty slow.

Weak CPUs, weak RAM.

So you probably are writing a second, lo-res version of the

app anyway.

So I think every new app should go OpenGL ES2.

So let's take a little look at how you actually do this.

So the class that does OpenGL rendering in Android is GLSurfaceView.

And actually it's pretty easy to use.

In your activity, in your onCreate method, you just

create a GLSurfaceView.

And then you say setEGLContextClientVersion to

inform the view that you want to use ES 2.0, which has the

programmable shaders and all that.

And then you set a renderer object, which is your own

class that implements

the GLSurfaceView.Renderer interface.

We'll get to that in a second.

And then you also forward onPause and onResume to the

view so that when your application goes in the

background it stops rendering and that stuff.

So that's all you have to do in your activity.
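
Put together, the activity side is roughly this (an untested sketch against the Android APIs; BodyActivity and MyRenderer are hypothetical names):

```java
import android.app.Activity;
import android.opengl.GLSurfaceView;
import android.os.Bundle;

public class BodyActivity extends Activity {
    private GLSurfaceView view;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        view = new GLSurfaceView(this);
        view.setEGLContextClientVersion(2);  // ask for an ES 2.0 context
        view.setRenderer(new MyRenderer());  // your GLSurfaceView.Renderer
        setContentView(view);
    }

    // Forward lifecycle events so rendering pauses in the background.
    @Override
    protected void onPause() {
        super.onPause();
        view.onPause();
    }

    @Override
    protected void onResume() {
        super.onResume();
        view.onResume();
    }
}
```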

Then in your manifest you just add uses-feature with GL ES version

2.0 and required set to true.

And that way the market knows that the application requires

OpenGL ES2 and it will only show it to phones

that support that.
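
In the manifest that looks roughly like this (a sketch; the attribute names follow the Android manifest schema):

```xml
<!-- Tell the market this app needs an ES 2.0-capable GPU. -->
<uses-feature
    android:glEsVersion="0x00020000"
    android:required="true" />
```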

And finally you need to write your own little renderer.

So if you're using OpenGL ES2 you call static

functions on GLES20.

So I recommend doing an import static for

everything in there.

And then you can just do normal OpenGL calls like you

used to do that in other languages.

So you don't have to write GLES20.glClear or whatever.

You can just write glClear.

And this interface has three methods.

One is onSurfaceCreated, which is called when your

context is first created and then a couple more times.

We'll get to that in a second.

And there's onDrawFrame, which is called every time you

should render.

By default this is called 60 times per second.

But you can also tell the system to only draw

your view on demand.

And there's onSurfaceChanged, which is not very interesting

in practice.

So I'd like to do a tiny demo of how this looks in practice.

Coworkers informed me that it's too risky to switch back to

Eclipse for a demo.

So I'll just do this right on my slide.

So onSurfaceCreated will do glClearColor with--

some reddish shade of gray.

And I'll also tell the view to not draw at 60 frames per

second but only when needed.

And in here we'll just clear the background.
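
A renderer along the lines of this demo could look like the following (untested sketch; note that render-on-demand is set on the view with setRenderMode(GLSurfaceView.RENDERMODE_WHEN_DIRTY), not on the renderer):

```java
import javax.microedition.khronos.egl.EGLConfig;
import javax.microedition.khronos.opengles.GL10;
import android.opengl.GLSurfaceView;
import static android.opengl.GLES20.*;

public class MyRenderer implements GLSurfaceView.Renderer {
    @Override
    public void onSurfaceCreated(GL10 unused, EGLConfig config) {
        // Called when the context is created (and possibly again later).
        glClearColor(0.5f, 0.4f, 0.4f, 1.0f);  // some reddish shade of gray
    }

    @Override
    public void onDrawFrame(GL10 unused) {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    }

    @Override
    public void onSurfaceChanged(GL10 unused, int width, int height) {
        glViewport(0, 0, width, height);
    }
}
```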

If I click this run button, hopefully the code will be

copied into some Java file in the background and then

uploaded to the tablet.

So it still says compiling.

So it says uploading.

Let's switch to the other box.

Now I just opened the IO OpenGL app.

I switched slightly too slowly to see it starting.

And that's a hardware-accelerated flashlight app.

So, mission accomplished.

It draws just one frame because the render mode is when-dirty.

So that's the OpenGL Hello world, I guess.

And that's about 20 lines.

Not too bad.

So one cool thing that GLSurfaceView gives you is

that it creates a dedicated renderer thread for you.

All the GL stuff will execute on the renderer thread.

Which means that if your UI thread is overloaded, you

still have smooth rendering.

And if your rendering is kind of slow, your app is still

responsive to tap events and all that.

So one thing that you need to do every now and then is to

relay an event from the UI thread to the OpenGL thread.

Because UI land and GL land are kind of single threaded,

so every OpenGL call has to be done on the GL thread.

Every UI call has to be done on the UI thread.

So for example, onClick-- I guess that should be onTouch

or something like that.

When onTouch is called on your UI thread, you might want

to tell the renderer to draw--

I don't know-- a particle system at the touch location

or something.

So you need to somehow relay the event from the UI thread

to the renderer.

So the way you do this, you just call

.queueEvent on the GL view.

And pass a Runnable.

And then this will be executed on the GL thread.

So if you want to, for example, access item in here,

then Java has this limitation that the int has to be final.

So you just put a final in there.

And then you can just use item in here on the other thread.

One little pitfall there is, if you want to use an object that's

passed in, for example a touch event, and you just do

final TouchEvent event.

And then use event down here.

And then by the time the GL thread executes the runnable,

the UI thread has already reused the

touch event up here.

Changed it internally and passed it on

to a different view.

Because the UI thread reuses objects so it doesn't allocate

memory all that much.

So by the time your renderer looks at the touch event

object, the data is all wrong for you to use.

So you should make a copy of all parameters and then have a

final local variable and use that in the runnable.
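
The copy-the-fields pattern can be shown in plain Java, without any Android classes (TouchEvent and run are hypothetical stand-ins; the Runnable plays the role of the one you'd hand to queueEvent):

```java
public class EventCopyDemo {
    // Hypothetical stand-in for a framework event object that gets reused.
    static class TouchEvent {
        float x, y;
    }

    // Returns {copiedX, copiedY, staleX, staleY}.
    public static float[] run() {
        TouchEvent event = new TouchEvent();
        event.x = 10;
        event.y = 20;

        final float x = event.x;            // safe: primitives copied now
        final float y = event.y;
        final TouchEvent captured = event;  // unsafe: object will be reused

        final float[] out = new float[4];
        Runnable glWork = () -> {           // what you'd pass to queueEvent
            out[0] = x;
            out[1] = y;
            out[2] = captured.x;
            out[3] = captured.y;
        };

        // The UI thread recycles the event before the GL thread runs:
        event.x = -1;
        event.y = -1;
        glWork.run();                       // later, on the GL thread
        return out;
    }
}
```

The copied primitives survive; the captured object shows the recycled values.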

The other direction from the renderer thread to the UI

thread isn't needed all that often.

Just for completeness you can do activity dot run on UI

thread and pass in the runnable and then this is

executed in the UI thread.

In Body I used this, for example, when you touch

muscles, and I need to see which muscle was tapped.

And so I basically render the polygons in some made up

colors and then I read the screen and see what color was

below the finger and then have a mapping from colors to objects

and then tell the UI thread, this thing was touched.

Use GLSurfaceView.

My advice.

That earns you a happy Android, which is, I guess, kind

of like a gold star.

It's very easy to use.

It gives you a dedicated renderer thread for free and

it's very well tested.

So some people on the internet recommend that you run your

own little surface holder thing.

For example, Chris Pruett, who talked here earlier today, has

an open source game called Replica Island and he has his

own GLSurfaceView fork.

And he has one screen full of comments about something that

went wrong.

Like, a few graphics drivers misbehaved under very specific

circumstances and it took him two weeks to track that down.

So don't be Chris Pruett.

Use GLSurfaceView.

A little word of warning though.

GLSurfaceView loses its OpenGL context very often.

So every time you call onPause it'll forget all OpenGL state,

like uploaded pictures and so on.

It'll call onSurfaceCreated on your renderer object and then you

need to re-upload all your pictures and so on and that

can be slow.

So make that fast. If you're targeting 3.0 or later you can

call setPreserveEGLContextOnPause.

But if your device supports only one OpenGL context and

the user switches to another app that uses OpenGL and he

switches back to your app then your stuff is gone anyway.

So make loading your data fast is the lesson here, I guess.

Alright so that's basically the OpenGL Hello world.

Here's a very high level picture about how GPUs work.

So up there there's the CPU, which executes your Java code.

And then there's this OpenGL API, where all the data that

needs to be rendered needs to be pushed through.

And then the data ends up in graphics memory here.

And then the GPU reads vertex data from there, runs

vertex shaders, rasterizes all the varyings, sends them

through the fragment processor, which runs all your

fragment shaders.

And that's written to the frame buffer.

So as I said, this is very simplistic.

There's no blending stage in here.

On some GPUs, vertex processors and fragment

processors are executed on the same

silicon and thus are shared.

But basically my point is, you don't want to send a lot of data

over this bus because that's very slow.

And also many GPUs cache vertex data pre-transform,

post-transform, they cache textures.

So to make these cache efficiently, you also want to

keep your data very small.

And that's basically how GPUs work.

Now you know.

My point basically is, don't send lots of data to the GPU

on every frame.

And if you do, then don't do it in many small calls.

Just do big, bursty calls.

So here's a piece of OpenGL 101 that I

think everyone knows.

If you do glTexImage2D with the texture data at the end to

basically set the current texture and then you draw your

model, then this will upload the current

texture every frame.

And that's expensive, so don't do that.

Instead, in your own surface creator method, you create an

identifier for the texture, which is just an int.

You tell OpenGL make this, make texture--

I don't know-- number five current.

Then you upload the data once into texture number five.

And then in your onDraw method you just bind

texture number five once.

And then you draw your model.

So everybody knows that.

I'm saying this because nearly the same is true for vertex

buffer objects later and it's not as well known there.
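
In code, the create-once, bind-per-frame pattern looks roughly like this (an untested sketch against the GLES20 static API; TextureHolder is a hypothetical name):

```java
import static android.opengl.GLES20.*;
import java.nio.ByteBuffer;

public class TextureHolder {
    private int textureId;

    public void onSurfaceCreated(int width, int height, ByteBuffer pixels) {
        int[] ids = new int[1];
        glGenTextures(1, ids, 0);            // ask GL for an identifier
        textureId = ids[0];
        glBindTexture(GL_TEXTURE_2D, textureId);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        // Upload the pixel data once, not every frame:
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
                     GL_RGB, GL_UNSIGNED_BYTE, pixels);
    }

    public void onDrawFrame() {
        glBindTexture(GL_TEXTURE_2D, textureId);  // cheap: just a bind
        // ... draw the model ...
    }
}
```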

You should also use texture compression.

So ETC, which is short for Ericsson Texture Compression.

So what the heading says is use Ericsson Texture

Compression for RGB texture compression.

It's an extension to OpenGL ES2 that's supported on virtually

all devices out there.

Or on all devices that I know of.

And if you use ETC, every pixel needs only four bits.

So that's compared to 16 bits per pixel.

That's a 75% memory win.

And sadly, this isn't documented very well.

So I didn't know about this.

So I launched Google Body without doing this.

Then I read about this, enabled texture compression

and that saved, like, 10 megabytes of RAM, which is

quite a bit.

So there's this etc1tool binary in the Android SDK tools

folder that I didn't know about.

So when I used this the first time I did a web search for

ETC1 compression and I found some binary on some Ericsson

website that ran only on Windows, and the

source code didn't build on MacOS, so I patched

that and used it.

Turns out there's a binary in the Android SDK.

It's just nobody tells you.

Nobody told me, at least. And if you then add

supports-gl-texture to your manifest, then the market

knows about this.

And the Android is happy again.
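
The manifest line is roughly this (a sketch; the texture name is the standard one for ETC1):

```xml
<!-- Tell the market this app ships ETC1-compressed textures. -->
<supports-gl-texture android:name="GL_OES_compressed_ETC1_RGB8_texture" />
```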


And it's very easy to load textures.

So on your I/O thread you just do ETC1Util.createTexture and

pass in an input stream.

This loads the texture into

memory, and then on your GL thread you can upload it.

Obviously you never want to do I/O on the UI thread or the GL

thread because I/O can be unpredictable and might just

take 100 milliseconds and you don't want your UI or your

rendering to stutter, so you should always have a

dedicated I/O thread.
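
As a sketch (untested; ETC1Util lives in android.opengl, and the fallback format arguments only matter on devices without ETC1 support):

```java
import android.opengl.ETC1Util;
import java.io.IOException;
import java.io.InputStream;
import static android.opengl.GLES20.*;

public class EtcLoader {
    private ETC1Util.ETC1Texture texture;

    // I/O thread: parse the compressed stream into memory.
    public void load(InputStream in) throws IOException {
        texture = ETC1Util.createTexture(in);
    }

    // GL thread: upload to the currently bound texture object.
    public void upload() {
        ETC1Util.loadTexture(GL_TEXTURE_2D, 0, 0,
                             GL_RGB, GL_UNSIGNED_SHORT_5_6_5, texture);
    }
}
```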

One small word of warning.

If the width or the height is not a multiple of four, then

the PowerVR GPUs just display noise for your texture.

PowerVR is used on

the Nexus S, for example.

In practice that's not a huge problem because most textures

are power of two sized anyway.

And most powers of two are also multiples of four.

And for heads-up displays, you can make your texture sizes a

multiple of four.

Something to keep in mind.

So now we know how to upload textures.

Now the same for geometry.

So same thing as for textures, if you do

glVertexAttribPointer and pass the attrib data in the last

parameter here, then this uploads all the

vertex data to the GPU.

And if you do this on every frame, then you are copying

lots of data around.

So don't do that.

And this is, for some reason, less well known.

The OpenGL ES 2.0 example in the Android SDK does this.

So don't look at that example.

I guess the excuse is OpenGL ES1 only supported this way

and they haven't updated this since then, I guess.

So instead, just like with textures, you create a numeric

ID then you bind this.

So GL_ARRAY_BUFFER is used for attribute data like positions.

And then you do glBufferData with your data.

And the same for the indices.

Then at run time you just bind the array buffer and the

element array buffer.

And you pass zero for the last parameter instead of data.

And that's way faster.

So two things to keep in mind here.

One is you need to have this attrib data and this index

data somehow.

And these need to be direct ByteBuffers, which I'll talk

about in a second.

And then, as I said, you call glVertexAttribPointer with

a zero back here and glDrawElements

with a zero back here.
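
Put together, the upload-once, bind-per-frame version looks roughly like this (an untested sketch; a single position attribute and short indices are assumed, and Mesh is a hypothetical name):

```java
import static android.opengl.GLES20.*;
import java.nio.ByteBuffer;

public class Mesh {
    private int vbo, ibo, indexCount;

    // Once, e.g. in onSurfaceCreated: copy the data into GPU buffers.
    public void upload(ByteBuffer attribData, ByteBuffer indexData, int indexCount) {
        int[] ids = new int[2];
        glGenBuffers(2, ids, 0);
        vbo = ids[0];
        ibo = ids[1];
        this.indexCount = indexCount;

        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, attribData.capacity(), attribData, GL_STATIC_DRAW);

        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexData.capacity(), indexData, GL_STATIC_DRAW);
    }

    // Every frame: only bind IDs and pass 0 as the offset, no re-upload.
    public void draw(int positionAttrib) {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glVertexAttribPointer(positionAttrib, 3, GL_FLOAT, false, 0, 0);
        glEnableVertexAttribArray(positionAttrib);
        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);
    }
}
```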

And if you run this on FroYo, your application will crash.

And the reason for that is that they forgot to add the

bindings for these two method calls.

So it compiles just fine.

But at run time, when Android tries to call the C method

that backs this OpenGL draw, it doesn't find anything.

Which is a bit annoying but it's pretty easy to fix.

You basically need to add your own bindings

for these two functions.

So if you're familiar with the NDK, that's pretty easy and if

not then I guess it's kind of magic.

You just copy paste and you're done.

Who here has used the NDK?

Not many people.

OK, basically what you do is you create a normal Java class

and then you put your method there.

But instead of putting implementation there you put

native in front.

And this tells Java that this method exists.

It takes these parameters and that it should look somewhere

else for the implementation.

It's not implemented in Java.

And the same for the other function that's missing.

And then you do a System.loadLibrary down here in the

static initializer.
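
The Java side of such a fix might look like this (a hypothetical GLES20Fix class; the native bodies live in a small NDK library, not shown here):

```java
// Declares the two bindings missing from the FroYo Java API.
// The implementations are provided in C via the NDK; loadLibrary pulls
// in libgles20fix.so, built with ndk-build.
public class GLES20Fix {
    static {
        System.loadLibrary("gles20fix");
    }

    public static native void glDrawElements(
            int mode, int count, int type, int offset);

    public static native void glVertexAttribPointer(
            int index, int size, int type,
            boolean normalized, int stride, int offset);
}
```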

And then this is something you write with the NDK.

So you create a jni subfolder in your project and you paste

in this bit of code.

So there's a function with this weird naming convention--

Java_com_example_io_GLES20Fix_glDrawElements.

And Java uses this function name to

associate it with a class.

So it starts with Java then it has the package name,

then it has the class name, and then it has the method name.

In here you put the implementation of your method.

So the first two parameters of a JNI method are always

JNIEnv* env and jclass c.

And then the rest are the parameters from the functions.

So this is int, int, int, int just like here,

int, int, int, int.

Only with a j in front.

So JNI is Java Native Interface, which is basically

the technology you use to call C from Java.

And then we just call the C function

for our GL Draw elements.

And exactly the same for vertex attrib pointer.

And then you copy this file, put that also

in your JNI folder.

Go into that folder and then you call NDK build from the

NDK and this will create some library.

And then you do a clean build in Eclipse, which will pick up

that library and copy it into your APK and then you can call

GLES20Fix.glDrawElements and then that works on FroYo.

So that's that.

Let me say a few words about filling ByteBuffers.

So ByteBuffers are the things that you pass to

glBufferData.

And it's basically a block of raw C memory.

So if you're not familiar with--

Who here knows C?

Are the same people who used the NDK, roughly.

No surprise.

So Java obviously has managed memory.

C doesn't.

And in some JVMs, these memories live

in different areas.

And OpenGL needs to have the raw C memory, for some reason.

You just need to know you need to use direct

ByteBuffers for that.

And it turns out, doing element-wise access on these

is pretty slow.

So you get a ByteBuffer by doing

ByteBuffer.allocateDirect and then some size.

And then if you want to load data from a resource into a

direct ByteBuffer, don't just get the input stream and

element-wise put stuff.

Basically don't read one byte from the input stream and put

it into the direct ByteBuffer.

This is very slow for some reason.

Behind the scenes this does several method calls.

One JNI hop and so on.

It's much better to do this in blocks.

So in Body I think I used four kilobyte blocks.

And this sped up loading by, I think, 8 seconds.

So it's still a bit slow, but it's done in parallel so

that's fine.
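
A block-wise loader can be sketched in plain Java (the 4 KB chunk size matches what I used in Body; BufferLoader is a hypothetical helper name):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class BufferLoader {
    // Reads a whole stream into a direct ByteBuffer in 4 KB chunks.
    // Each bulk put() is one JNI crossing, versus one crossing per byte
    // with single-byte puts.
    public static ByteBuffer load(InputStream in, int size) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(size);
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.put(chunk, 0, n);
        }
        buf.rewind();
        return buf;
    }
}
```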

And you can do even better than that if you are willing

to make some compromises.

So as you might know, APK files are just Zip files.

And if you give your resources some magic extensions, your

resources won't be compressed in the zip file.

They will just be an uncompressed part of the zip

file somewhere.

So for example, PNGs and JPGs are compressed already, so

they aren't recompressed again.

And also the extension, JET, is one of these magic extensions.

I have no idea what file format this actually is.

But if I want to have a resource that's not compressed

I call it dot JET and put it in my resource folder and then

it's not compressed.

And the cool thing about uncompressed resources is that

they are basically just a chunk of your APK file.

And you can get a file handle through that.

You can then call openFd, which gives you an asset

file descriptor, from which you can get a file input stream

instead of just an input stream.

And from a file input stream you can then get a channel and

the channel you can mmap.

And mmap returns a MappedByteBuffer and

MappedByteBuffers are always direct.

So in this case no conversions at all have to be done.

You can just use this and pass this through GL buffer data.

And this is another 10x or so faster than

the previous thing.

So if you're willing to not to compress your resources you

can have really, really fast loading this way.
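
Here's the idea in plain Java (a sketch; a temp file stands in for the APK, and on Android the offset and length would come from the AssetFileDescriptor returned by openFd):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapDemo {
    // Maps [offset, offset+length) of a file straight into memory.
    // MappedByteBuffers are always direct, so the result can go to
    // glBufferData with no copying at all.
    public static MappedByteBuffer map(File f, long offset, long length)
            throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r");
             FileChannel ch = raf.getChannel()) {
            return ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
        }
    }
}
```

The mapping stays valid after the channel is closed, so the buffer can be handed around freely.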

So a small word of warning.

ByteBuffer dot allocateDirect allocates more memory than you

tell it to.

So if you just do a tiny test program that does ByteBuffer

dot allocateDirect with 15 megabytes and then look at

logcat, then logcat will tell you it

allocated 60 megabytes.

So it overallocates by a factor of four.

Which is an Android bug that's being fixed, I think.

But not yet.

So keep your buffers small, I guess.

In Google Body if you look at the market page, there are two

one-star comments that tell you this app is crap.

It crashes all the time.

And that was because of this bug.

Basically when Body was loading and people pressed on

the screen, it crashed a lot of the time.

So as I said, to do touch detection I basically rendered

the whole scene into a back buffer.

And I created an off-screen buffer for the whole screen,

which is about a million pixels.

And then two bytes per pixel for color, two bytes per pixel

for depth for that buffer.

So that's about four megabytes.

With overallocation by 4x that's 16 megabytes.

And if loading's going on in parallel,

that's too much memory.

So Body crashed with out of memory.

And I fixed that by not rendering the whole screen

into back buffer but only the 20x20 pixels

around the touch event.

So just something to keep in mind.

And one thing that I also learned is that, if you don't

have many users and you get two one-star ratings, that

really hurts.

I used to have a 4.5 average and then it went down.

Tough times.

Another pitfall: compressed files can be,

at most, one megabyte

uncompressed, on Android 2.2.

And the reason that is, I guess, is because the Android

guys have a static buffer that's one megabyte that they

use to uncompress in.

And if the uncompressed size is larger than that they say,

sorry you can't do that.

So the things you can do there are, split your files into one

megabyte chunks.

Which kind of sucks, so I wouldn't do that.

Or you can basically use uncompressed resources.

And then if you really need the compression you can

compress them yourself and uncompress them yourself and

you can be smarter than the Android guys.

I hope everybody knows how to write a decompressor.

Or how to use zlib, which does the decompression for you but

doesn't have a static max-size buffer.

So they fixed that in 2.3.

And that's that about ByteBuffer.

So that's already our last section.

We're doing fantastic on time.

So I'd like to say a few words about performance here.

The first word is measure.

So if you're trying to do performance improvements,

always measure if they actually help and if they

don't then don't do them.

And I have a little demo for that.

About a little pitfall, I guess, when you're measuring

performance, that I found.

So this little program here, basically just clears the

color and the depth buffer seven times per frame.

Which is obviously not a very useful thing to do.

But it's interesting for measuring performance.

As you might know, tablets are fill rate limited, and this

can give you an idea of how much fill rate you can get in

the best case.

So it turns out seven clear screens is the upper bound you

can do to still get 60 frames.

So if you draw every pixel more than seven times per frame, you

probably won't get 60 frames per second.

And that's with the cheapest filling possible, right?

Normally you'd also do some geometry

transforms and whatnot.

So I was interested in finding out what this number here is.

So I wrote this program.

And let's run this.

So what this will do, it will, again, compile the thing and

upload it to the device.

The device will measure how fast it's drawing and send

that back to the laptop and it'll hopefully

show up here on screen.

And for demonstration purposes, the app

measures the frame time every frame and sends it.

So normally you'd want to measure for the last second

and display an average for the last second.
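
Such a once-per-second average can be sketched like this (FrameTimer is a hypothetical helper; you'd feed it System.nanoTime() from onDrawFrame):

```java
public class FrameTimer {
    private long windowStartNs = -1;
    private int frames;

    // Call once per frame with a monotonic timestamp. Returns the average
    // frame time in milliseconds once per second, or -1 while accumulating.
    public double onFrame(long nowNs) {
        if (windowStartNs < 0) {
            windowStartNs = nowNs;
            frames = 0;
            return -1;
        }
        frames++;
        long elapsed = nowNs - windowStartNs;
        if (elapsed >= 1_000_000_000L) {
            double avgMs = (elapsed / 1e6) / frames;
            windowStartNs = nowNs;
            frames = 0;
            return avgMs;
        }
        return -1;
    }
}
```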

But if you do this for every frame, you'll

see a curious thing.

Every frame either takes exactly 1/60 of a second or

1/30 of a second.

So that oscillates between 60 frames per second and 30

frames per second.

Or if you think milliseconds per frame is better, either 16

or 32 milliseconds per frame.

And I'm not sure why that is, exactly.

But my theory is that the, oh--

gold star!

Someone suggests it's the Vsync.

So Vsync is what the old tube monitors use, I guess.

So I guess there's some kind of double buffering going on

and the compositor that draws the Android interface

basically only wants to render at 60 hertz.

And if your frame takes just a millisecond longer than 16

milliseconds, then you have to wait for the next time Android

allows you to paint.

And this makes it kind of hard to do performance

measurement, right?

Because if you're one millisecond too slow then you

pay another 16 milliseconds for your frame.

And that makes it hard to evaluate whether any rendering changes actually improve performance.

And as it turns out there's some hack that happens to undo

this effect somehow.

So I guess it somehow enables triple buffering, but I don't

know what's going on there exactly.

I stumbled upon this.

So this hack is done by this function, which I'll show on

the next slide.

So if you do this call here.

And it's compiling again, uploading again.

And now you see that this is a pretty constant function, just a little bit over 16 milliseconds, which is what caused this jittering before.

So since I don't really know what this function does up there, I wouldn't recommend using it in your shipping app.
But it's pretty useful for doing performance

measurements, right?

So I guess double buffering is what's causing this, somehow.

But who knows?

If you call egl--

you need to call some function that's not

exposed through Java.

So you need JNI again.

All this code is on some Google code site and I'll post

the link at the end so you can play with this at home.

So if you call eglSurfaceAttrib with EGL_SWAP_BEHAVIOR set to EGL_BUFFER_PRESERVED, then this somehow magically disables something, or enables something, that allows you to do better performance measurements.

If you do a web search for swap behavior, then I think

there's one page on this.

And this page tells you to never use EGL_BUFFER_PRESERVED because it makes things slow.

And I guess that's true.

But on some hardware it allows you to do useful time measurements.
So this is on the Tegra 2 on tablets.

I guess the Samsung tablet that you got also uses a Tegra 2.

If you run this on a [UNINTELLIGIBLE] it doesn't support this attribute and just crashes.

So it's very dangerous but useful for measurement.

So measure your stuff.

Now that we know how to measure, let's see how we can

improve performance.

So here are the basics.

You always want to use vertex buffer objects.

So don't upload your vertex data every frame.

Instead upload it into a VBO once and then only pass an integer handle to OpenGL.

Always use indexed geometry.

So as most of you will know, when you render two triangles

that are right next to each other, you basically first

send these three vertices to the GPU and

then these three vertices.

And if you send the four vertices that way, then you're basically sending this and this vertex twice.

And that's expensive.

So in practice you usually only send indices.

You say, draw a triangle with vertices one, two, three and then one, three, four.

And that way you only transfer the small index twice instead of the whole vertex, which is almost always a win.

So do that.
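The savings are easy to sketch. Here's an illustrative calculation with made-up sizes, 32-byte vertices and 16-bit indices, counting the bytes sent for a mesh of quads; the numbers are mine, not from the talk:

```java
// Sketch: bytes sent for a grid of quads, unindexed vs indexed.
// Sizes are illustrative: 32-byte vertices, 16-bit (GL_UNSIGNED_SHORT) indices.
public class IndexedGeometry {
    static final int VERTEX_BYTES = 32; // e.g. position + normal + texcoord
    static final int INDEX_BYTES = 2;

    // Two triangles per quad, three vertices each, no sharing.
    static int unindexedBytes(int quads) {
        return quads * 6 * VERTEX_BYTES;
    }

    // Four shared vertices per quad plus six small indices.
    static int indexedBytes(int quads) {
        return quads * (4 * VERTEX_BYTES + 6 * INDEX_BYTES);
    }
}
```

For one quad that's 192 bytes unindexed versus 140 indexed, and the gap only grows once vertices are shared between quads too.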

OpenGL gives you the flexibility to order your vertices so that you have one chunk of memory where all the vertex positions are, then another chunk of memory where all the normals are, another chunk where all the texture coordinates are.

But don't do that.

You should always keep each vertex's data in one small, contiguous piece of memory.

So you want to have vertex position right next to normal

or texture coordinate.
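Here's a minimal sketch of building such an interleaved buffer in Java, with a layout I made up for illustration: three position floats, three normal floats, and two texture coordinates per vertex.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: one interleaved vertex buffer (position, normal, texcoord per
// vertex) instead of three separate arrays. The layout is my choice.
public class InterleavedVertices {
    // 3 position + 3 normal + 2 texcoord floats = 8 floats = 32 bytes.
    static final int FLOATS_PER_VERTEX = 8;
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * 4;

    static ByteBuffer interleave(float[] pos, float[] nrm, float[] uv) {
        int n = pos.length / 3;
        ByteBuffer buf = ByteBuffer.allocateDirect(n * STRIDE_BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < n; i++) {
            buf.putFloat(pos[3 * i]).putFloat(pos[3 * i + 1]).putFloat(pos[3 * i + 2]);
            buf.putFloat(nrm[3 * i]).putFloat(nrm[3 * i + 1]).putFloat(nrm[3 * i + 2]);
            buf.putFloat(uv[2 * i]).putFloat(uv[2 * i + 1]);
        }
        buf.rewind();
        return buf;
    }
}
```

You'd then point glVertexAttribPointer at this one buffer with the 32-byte stride and the per-attribute offsets.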

And then, as I said, the caches on some of these GPUs aren't very big.

So you want to keep your attributes small.

So for normals you can usually get away with just a signed byte; that's usually enough resolution for a normal.

For texture coordinates, you might get away with half floats.

So half floats are not officially supported by ES 2.0, but like ETC textures they are supported virtually everywhere.


So think about doing this.
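Here's a quick sketch of the signed-byte idea, quantizing a normal component the way a normalized GL_BYTE attribute would be reconstructed on the GPU. The class and helper names are mine:

```java
// Sketch: quantize a unit normal component from float to a signed byte,
// as for a normalized GL_BYTE vertex attribute. Names are illustrative.
public class ByteNormals {
    static byte quantize(float v) {   // v in [-1, 1]
        return (byte) Math.round(v * 127.0f);
    }

    static float dequantize(byte b) { // roughly what the GPU reconstructs
        return Math.max(b / 127.0f, -1.0f);
    }
}
```

The round trip is accurate to within about 1/127, which is plenty for lighting.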

Also, since your code will run on different devices, with

different frame rates, you should make animation

time-based not frame rate-based.

So if you have some animation and some device renders your

app at 30 frames and the next at 60 frames, the animation

should take the same length and not be twice as fast just

because the device renders twice as fast.
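A minimal sketch of time-based animation, with names I made up: progress depends only on the clock, so a 30 fps device and a 60 fps device finish the animation at the same wall-clock time.

```java
// Sketch: drive animation by elapsed time, not by frame count.
public class TimedAnimation {
    final long startMs;
    final long durationMs;

    TimedAnimation(long startMs, long durationMs) {
        this.startMs = startMs;
        this.durationMs = durationMs;
    }

    // Progress in [0, 1]; depends only on the clock, not on how many
    // frames have been rendered so far.
    float progressAt(long nowMs) {
        float t = (nowMs - startMs) / (float) durationMs;
        return Math.min(Math.max(t, 0f), 1f);
    }
}
```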

So that's the basics, basically.

And now once you've written your app and it's kind of

slow, the first thing you do is you set the glViewport to a

1x1 pixel thingee.

And then either frame rate goes up or it doesn't.

If it does go up, then you are either fragment processor

bound or texture fetch bound.

And you differentiate that by making all your textures

really small.

And if stuff gets--

if that doesn't help then you are fragment processor bound.

And if that helps, you're texture bound.

So if you're fragment processor bound, there are a

few things you can do.

You can move work from the fragment shader to the vertex shader.

In my experience, fragment shaders on mobile devices have

to be, like, one or two lines.

So you can't do lots of fancy effects there.

If you want to do very fancy lighting you can basically pre-compute all your lighting formulas into a texture and then do a texture lookup instead of doing the calculations.
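Here's a sketch of that baking idea, using a specular power term as the example formula; that's my choice of example, not Body's actual lighting:

```java
// Sketch: bake an expensive lighting term (here pow() for specular) into a
// lookup table, so the shader does one texture fetch instead of the math.
public class SpecularLut {
    static float[] bake(int size, float shininess) {
        float[] lut = new float[size];
        for (int i = 0; i < size; i++) {
            float nDotH = i / (float) (size - 1); // the texture coordinate
            lut[i] = (float) Math.pow(nDotH, shininess);
        }
        return lut; // upload as a 1D (or Nx1) luminance texture
    }
}
```

The fragment shader then samples this table with N·H as the coordinate instead of calling pow.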

You shouldn't draw backfacing triangles, the ones that face away from the camera.

And you shouldn't use discard in your fragment shaders.

But the main point is: do less work in your fragment shaders here.

If you're texture fetch bound, if you're not using texture

compression yet, you should.

One thing that also helps is to use mipmaps because of

cache coherency.

And of course use smaller textures.

One thing I forgot to mention on the ETC slide, on the

texture compression slide is that ETC doesn't support an

alpha channel.

So if you have textures that use an alpha channel, then

there's not a single compressed texture format that

works on all devices.

So in that case, you probably have to download the right

compressed textures on first run, depending

on the device type.

Or, if you don't have many alpha textures, just not use compressed textures for those.


But if you're running into this problem and not all your

textures are compressed then try that first.

So if you're not fragment processor bound, you're

probably vertex processor bound.

So if using a very small viewport doesn't really help

you, you're probably vertex processor bound.

In that case, use fewer, smaller attributes.

So try using signed bytes for your normals, and so on.

You can play with the precision qualifiers, the precision keyword, in OpenGL ES.

So instead of doing lighting per vertex, instead of transforming the light vector into model space at every vertex, you can transform the light vector once and then just read the transformed light vector.
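A sketch of that trick: transform the direction once on the CPU with a column-major 3x3 matrix and hand the result to the shader as a uniform. The helper is hypothetical:

```java
// Sketch: transform the light direction once on the CPU instead of
// per vertex in the shader. m is a column-major 3x3 matrix.
public class LightTransform {
    static float[] transform(float[] m, float[] v) {
        return new float[] {
            m[0] * v[0] + m[3] * v[1] + m[6] * v[2],
            m[1] * v[0] + m[4] * v[1] + m[7] * v[2],
            m[2] * v[0] + m[5] * v[1] + m[8] * v[2],
        };
    }
}
```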

You can use level of detail.

And you can cull objects that are outside of the viewport.

So that's all the--

I guess, pretty normal--

performance stuff that's also true on normal devices.

Finally if you are CPU bound, then use less CPU.

So one thing that's expensive, can be expensive, is if you

allocate memory a lot in your inner loops.

In that case, reuse memory.

Batch draw calls.

So don't have a for loop in your draw method that basically tells the GPU: draw this triangle, now this, now this, now this, looping over all triangles.

Instead have one call that draws all triangles.
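Here's a minimal sketch of the memory-reuse idea: a scratch buffer that only grows, so steady-state frames allocate nothing and the garbage collector stays out of your render loop. The name is mine:

```java
// Sketch: reuse one scratch array across frames instead of allocating
// in the inner loop, so the GC doesn't kick in during rendering.
public class ScratchBuffer {
    private float[] scratch = new float[0];

    // Grows only when needed; steady-state frames allocate nothing.
    float[] acquire(int size) {
        if (scratch.length < size) {
            scratch = new float[size];
        }
        return scratch;
    }
}
```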

And if all else fails, you can look at the NDK and try to

write native code for your time-critical functions.

In my experience, that doesn't help all that much.

And that's that, I think.

So thanks for listening.


Watch me type my password.

So code, slides and so on are available at this website.

If you do a web search for io 2011 OpenGL Android

it might show up.

So the project used to be hidden earlier today.

I don't know if it's visible now.

So we have these feedback links that are completely

impossible to pronounce.

So, if you want to tell me anything.

And that's that.

And I'll download Body for Android and play with it a

little bit.

So do we have any questions?


AUDIENCE: Do you know how to do OpenGL to a widget?

NICO WEBER: I don't.

I haven't looked at the widget stuff at all, yet.

AUDIENCE: Aside from using compressed textures, how can I

speed up the process of reloading my textures when my

surface is recreated?

NICO WEBER: How do you do the reading?

Do you just use the ETC1 texture [UNINTELLIGIBLE]

to read the texture, or?


AUDIENCE: I'm writing for older versions of Android.

NICO WEBER: So one thing that I think might work, which I want to do for Body but haven't done yet, is basically you read all your texture data from the application's data once and then you keep it in a memory cache ready for upload.

AUDIENCE: A memory cache?

NICO WEBER: Yeah, basically.

You keep them around so you can just upload them immediately.

And if your activity's onLowMemory is called, you drop this cache.

And then basically you have them in memory already and you don't need to reload them.

That's something I would try.
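A minimal sketch of that cache, with names I made up; the real thing would hold the decoded texture bytes and be wired up to Activity.onLowMemory():

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the cache idea from the answer: keep decoded texture bytes
// in memory so surface recreation can re-upload without re-reading
// files, and drop everything when the system reports low memory.
public class TextureCache {
    private final Map<String, byte[]> decoded = new HashMap<String, byte[]>();

    void put(String name, byte[] data) { decoded.put(name, data); }

    byte[] get(String name) { return decoded.get(name); } // null = reload from disk

    // Call this from your Activity's onLowMemory().
    void onLowMemory() { decoded.clear(); }
}
```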

AUDIENCE: I'd like to get some detail on that.


Maybe later.



Do you have any issues with transparency?

Because I know that it looks like Body makes pretty heavy

use of showing some kind of opaque model of

a translucent shell.

And in GL that can be tricky to get order right.


NICO WEBER: So that's a known deficiency with OpenGL.

So Body, I think, just doesn't care that much.

So it doesn't look perfect.

But it looks good enough, I think.

So basically, the usual way to do this is to draw your non-transparent stuff first, and then sort your transparent [UNINTELLIGIBLE] on the CPU and draw them back to front.

And that's slow, because you need to sort stuff.

There's this depth peeling technique by Cass Everitt, but that means you need to render the scene [UNINTELLIGIBLE] for that.

So I don't think there's a good general

answer to that question.

You need to see what works for your app.

In Body, I just don't really care at the moment.

AUDIENCE: So what did you do for Body?

Did you just--

NICO WEBER: I just say GL blend mode one source of--

AUDIENCE: But you drew the opaque part and then just--

NICO WEBER: No, I just draw everything.

So Body basically has these layers.

There's organs, skeleton and so on.

And I draw the inner layer first and the outer layer transparently, doing the [UNINTELLIGIBLE].

But per layer I just say, transparency

on and do your thing.

And I think I draw the opaque things first.



AUDIENCE: So have you considered using something

like the [? SP3s ?] so you can get the transparency right?

NICO WEBER: Yeah, I have considered it.

But it doesn't seem like the most critical thing I should

be working on right now.

So as I said--

AUDIENCE: Right, right.

As a 20% thing.

Your spare time, other than sleeping.

NICO WEBER: Well it's my Friday, basically.

So I thought about that, but I haven't done it yet.

AUDIENCE: But I guess the question is, do you see

problems in trying to take that approach

with Java on GL 2.0?

Are you going to get hung up on computation?

Are you going to get hung up on

pushing the indices through?

NICO WEBER: Try it, I guess.

So writing a demo for that should take, maybe, two hours?

And then you know.

That's what I would do.

But I don't know.

So I guess if stuff turns out to be slow you can always go

to native code.

But it worked on really slow machines 12 years ago, or even

longer than that, so I guess it should work fine.

AUDIENCE: Have you considered or looked at Renderscript, by the way?

NICO WEBER: So when I wrote this, for 3.0, there was even less documentation on Renderscript than there is today.

I think I had heard of a name, but nothing else.

So not really.

And also I think Renderscript is 3.0-only and

Android-only and so on.

So I think, not yet.

AUDIENCE: OK, thanks.


AUDIENCE: When you're using GLSurfaceView and on top of

which, you might want to use an Android 2-D graphics widget

or ListView, let's say, the performance drops


I can understand there's 2D computation, there's the 3D

computation in the background happening.

But have you guys thought about it?

Like how do you deal with this in the future?

How about combining 2D graphics APIs and 3D graphics?

NICO WEBER: So with you guys, do you mean me the Google Body

developer or us the Android framework guys?

AUDIENCE: Generally Android framework.

NICO WEBER: So I have no idea what the

Android guys are doing.

I'm sorry.

AUDIENCE: Any tips and tricks you might have seen?


NICO WEBER: OK, so I am told to recommend the Office Hours.

So what Google Body does, if you tap things it draws these

little text widgets.

And I'm using OpenGL text just for that because I didn't want

to deal with mixing 3D and 2D.

But I think Google Maps puts 2D widgets on top of the map, and the map is a GLSurfaceView, so it kind of works.

So I guess it depends on if you're writing a game where

you really need that 60 frames per second.

And in that case, you don't want to put anything on top of

your thing.

Or if you're writing an app, in that case it might be fine.

AUDIENCE: OK, thanks.

NICO WEBER: More questions?

AUDIENCE: I've got a question about the cow.


AUDIENCE: Specifically, why are its teats on show,

compared to the female model?

NICO WEBER: So I don't know.

The web version did that.

I hadn't ported the cow yet to the tablet version.

So I haven't looked into that yet.

Though I can see the cow on the tablet being really useful

if you go to a steak house you can be like,

can I have this piece?

So that's my motivation, there.

But I haven't had time yet.

What's that?

Oh yeah, that's pretty fancy, huh?

So locally on the notebook, there's a little ghost server

running that basically--

so I have a web socket connection to the local ghost server, and then it copies that into a Java file, invokes ant to compile this thing, then invokes adb to copy it over, and then adb logcat to grab the output with the frame stuff and send it back up the web socket.

Yeah, that's also on the slide project.

And this took way longer to do than it was useful.

But oh, well.

Yeah, the question was, how did the run button work in the demo.


More questions?

Come on guys, we have five minutes left.

No questions?

All right, then.

Thanks for listening, again.


NICO WEBER: Oh, there's one last question.

AUDIENCE: In the fill rate example, you cleared both the

color buffer and the depth buffer.

I mean if you, actually I was confused.

Does the fill rate work-- were you trying look for both depth

and color buffer?

NICO WEBER: Yeah so if you just clear the color buffer

then you can go higher than seven.

So that's faster, if that's the question.

But you can just try it yourself.

AUDIENCE: I mean like the depth buffer is like, you

don't write to it so often, is what I got.

It's much smaller than the color buffer, so--

NICO WEBER: I think the Tegra 2 usually uses 16-bit colors for the color buffer, for performance.

And the Tegra also only has 16-bit z-buffers, so they're the same size.

AUDIENCE: Both are 16-bit?

NICO WEBER: Both are 16-bit per pixel.


Thank you.

NICO WEBER: And phones actually have, I guess, 32-bit depth per pixel, so you can get some z-fighting artifacts on Tegra if you're not careful.

All right then, I'll just say thank you, again.

And usually someone else pops up.

AUDIENCE: It's not about the cow this time.

The question was on, did you try the fill buffer test with

textures as well and see what the throughput was on that?

NICO WEBER: I think I did and I think it was identical.

AUDIENCE: It was lower than 7 per?

NICO WEBER: I think it was the same.

AUDIENCE: There was no difference?

So the fill rate is identical for texture and

also uniform color?

NICO WEBER: I think so, yes.

Pretty sure I tried that.

But don't believe me anything.

Just try everything yourself.

It's easy and quick to do.

All right, that's all folks.

