Google I/O 2008 - Painless Python Part 2 of 2

Alex Martelli: Welcome back to the second part,

and now we're starting to speak about functions.

A function is defined by the keyword def,

def, name, open paren, parameters, close paren,

colon, body.

Note that the parentheses are mandatory.

Even if you have no parameters,

you just write def, name, open, close.

That's all there is to it.

The body gets compiled

at the time you execute def.

It gets compiled and connected

to the function object that is created.

The function object is also assigned

to the name that you're giving.

It's not executed at this time.

Later, as in not necessarily after that in your program text,

but later in time, you can call the function.

The call is-- the syntax is simply

name, open paren, arguments, close paren.

Again, the parentheses are what makes a call.

This is going to be familiar to any C or C++ programmer.

If you just mention the name, you're mentioning the object.

You could put the function object

in a list or something.

If you want to call it,

you have to have parentheses there.
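
A minimal sketch of the def and call syntax just described. The function names greet and double are made up for illustration, and the code is written for Python 3 rather than the Python 2.5 of the talk.

```python
def greet():                 # parentheses are mandatory even with no parameters
    return "hello"

def double(x):
    return 2 * x

# Mentioning a name without parentheses refers to the function object itself,
# so function objects can be stored in a list like any other object.
functions = [greet, double]

# The parentheses are what make a call.
first_result = functions[0]()       # calls greet
second_result = functions[1](21)    # calls double(21)
```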

The arguments and parameters must correspond.

Arguments are what you are--

are values that you are initializing the names

in the parameters with.

Their correspondence is typically one to one,

but there are a few ways

that you can use more flexibility there.

The parameters at the end

can have default values like in C++.

So, you can use the syntax parameter name equal expression

meaning, unless otherwise specified,

this object is the value for this parameter.

Be careful, this expression is evaluated once

at the time you run the def statement

and that object is going to be used for every call

that doesn't specify that name.

You may or may not override this

by explicitly specifying a corresponding argument

in the call.
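
The warning about default values can be illustrated with a small sketch. The function append_to is a hypothetical example, not from the talk's slides, and the code targets Python 3.

```python
def append_to(item, bucket=[]):    # the [] is evaluated once, when def runs
    bucket.append(item)
    return bucket

first = append_to(1)
second = append_to(2)              # reuses the very same default list object

fresh = append_to(3, [])           # an explicit argument overrides the default
```

Because the default object is shared between calls, first and second end up being the same list, which is usually a surprise the first time it happens.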

Similarly, the last few arguments

you give can be named.

Instead of just taking the positional correspondence,

first argument goes to first parameter,

second argument goes to second parameter,

you can say explicitly

what parameter name you're setting

to the value you're passing.

Finally, you may define a function

that takes arbitrary positional

and/or arbitrary named arguments.

The syntax is star name.

It will give you a tuple, name, of extra positional arguments,

and star, star, name will give you a dictionary

of optional named args.

So, there's a lot of flexibility there,

which is typically used, for example,

to wrap a function in another function

and pass arguments through

without needing to know which ones they are.
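
A sketch of named arguments and of the star and star-star forms used to wrap one function in another. The names describe, logged and calls are assumptions made for this example; the code is Python 3.

```python
def describe(name, size=1, color="grey"):
    return "%s/%s/%s" % (name, size, color)

# A named argument picks its parameter explicitly instead of by position.
by_name = describe("box", color="red")

calls = []

def logged(func, *args, **kwargs):
    # args is a tuple of the extra positional arguments,
    # kwargs is a dictionary of the extra named arguments.
    calls.append((args, kwargs))
    return func(*args, **kwargs)   # pass everything through unchanged

wrapped = logged(describe, "crate", 3, color="blue")
```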

Here's a simple example. A totally elementary function,

takes two arguments, does the sum of squares.

So, that's all there is to it,

but it's not very general.

It will work if we're in the middle

of computing a hypotenuse,

but what if we have more than two numbers

and we want the sum of their squares?

This is the Pythonic way of expressing that.

The sum of squares, arbitrary arguments,

is the sum of the square

for every item in the arguments.

So, the sum of all squares

is the sum of the square of each item.

That's just about as high-level

as you can get and stay general.

If you're used to lower level languages,

you may think in these terms.

Well, start with a total of zero,

then go over all arguments

incrementing the total by the square,

and finally return the total.

That will work, too.

As I said, we do bend over backwards

to try and support people who are so attuned

to lower level programming languages

that they can't just express their problem anymore.

But the problem you're trying to solve is this one,

and this is much faster and much more natural in Python.

So, if you find yourself basically writing

Fortran in Python, try and write Python instead.

It's going to be much faster and make you much happier.
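
The three versions discussed above might look roughly like this. The slides are not part of the transcript, so the function names are assumptions; the code is Python 3.

```python
def sumsq(a, b):
    # elementary version: exactly two arguments
    return a * a + b * b

def sumsq_general(*args):
    # the sum of the square of each item in the arguments
    return sum(x * x for x in args)

def sumsq_lowlevel(*args):
    # the same result, spelled out step by step
    total = 0
    for x in args:
        total += x * x
    return total
```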

Generators are functions

which instead of returning a value

yield values.

The keyword is yield, y-i-e-l-d,

instead of return.

What's the difference?

A return gives a result

and terminates the execution of the function.

A yield gives a result

and suspends the execution of the function

at that very spot until further notice.

Specifically, the mechanism we've used

to let you resume and get the next result,

the next thing to be yielded, is iterators,

which are basically objects with a next method.

So, when you call the generator, you get the iterator.

Every time you call next on the iterator,

you get the next thing to be yielded.

And when everything is done,

when there is no more next value,

the iterator will raise a StopIteration exception.

If you use a for loop, remember the definition of

the for loop that we gave in the first half,

this is basically done for you for free.
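
A small sketch of the mechanism. The generator countdown is a made-up example. In Python 3, used here, the iterator's method is spelled __next__ and is normally reached through the built-in next(); in the Python 2.5 of the talk it would be it.next().

```python
def countdown(n):
    while n > 0:
        yield n        # hand back a value and suspend right here
        n -= 1

it = countdown(2)      # calling the generator returns an iterator
a = next(it)
b = next(it)

try:
    next(it)           # nothing left to yield
    finished = False
except StopIteration:
    finished = True

# A for loop (or a comprehension) handles StopIteration for you.
collected = [x for x in countdown(3)]
```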

So, enumerate, which I mentioned before, is actually a built-in,

but if it wasn't a built-in,

these are two ways we could actually program it.

One is the lower-level way,

the other one is the higher abstraction

and faster way.

So, the job of enumerate is yielding each item

of the sequence with its index.

So, zero, first item, that's the first yield.

One, second item, second yield and so on.

So, you could think, well, I'll start at zero,

loop over the sequence,

and yield the current index, n, and the item,

and then increment for the next item.

So, when the yield executes, the function freezes.

The execution of that particular run

of the function freezes.

And when the next method is called on the iterator,

it continues by incrementing and going back.

This is the higher abstraction way of doing it.

It uses the standard library module itertools,

these tools for iteration,

by doing an element-by-element parallel zipping with count,

which is an iterator which yields the unending sequence

zero, one, two, three, and so on forever.

What we're doing is zipping the sequence,

zero, one, two, three,

with the sequence we were giving as an argument.

So, this single statement does the job of these four.

Again, think in these terms when you're programming Python.

You'll be so much happier.
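
The two ways of programming enumerate could be sketched as follows. The function names are assumptions. The code is Python 3, where the built-in zip is already lazy; in Python 2.5 the lazy equivalent would be itertools.izip.

```python
import itertools

def enumerate_lowlevel(seq):
    n = 0
    for item in seq:
        yield n, item      # the function freezes here until the next request
        n += 1

def enumerate_highlevel(seq):
    # zip the unending sequence 0, 1, 2, ... with the given sequence;
    # zipping stops as soon as the shorter one (seq) is exhausted
    return zip(itertools.count(), seq)
```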

Let's give a slightly less elementary example.

One of my favorite mathematicians of all times,

Leonardo Pisano known as Fibonacci,

wrote an exercise in his "Liber Abaci"

about computing the number of rabbits produced

by a hypothetical rabbit warren.

You're maybe familiar with the Fibonacci sequence,

basically the sequence here.

So, we start with one, one,

and then every item is the sum of the previous two.

So, this is a low-level,

but perfectly acceptable way of programming.

We'll start with i and j both equal to one.

Note something I hadn't mentioned before,

you can chain assignment like in C.

i equal j equal one gives both names, i and j,

to the object one.

And then while true, which of course means

this loop will never terminate, not per se,

we do another form of multiple assignment.

We assign i, j, and i plus j to r, i, and j respectively.

If this isn't perfectly obvious,

the right hand side with the multiple values

is evaluated left to right.

So, i and then j and then the sum,

and after all the evaluations, then all the name bindings.

So, r is i,

i is j,

j is i plus j.

You could write it as three assignments,

but there's no point.

And finally, we yield this current result.

If you just did a for on this, this will run forever.

So, it is crucial that the for

has a break inside it at some point

when you have enough stuff.

So, this is, as I mentioned,

the purpose of the exercise was computing

the number of rabbits born in a farm, in a certain year.

So, for rabbits in fibonacci,

note we have to call it to get the iterator,

we just print it with a comma to avoid breaking the line,

and then we decide when to break.

We break when we have more than 100 rabbits

because that's a large enough farm

and this is the result sequence.
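
A sketch of the generator and the loop as described. The talk prints each value with Python 2's print statement; this Python 3 version collects the values in a list instead, and the list name rabbits_seen is an assumption.

```python
def fibonacci():
    i = j = 1                    # chained assignment
    while True:
        # the right-hand side is evaluated completely, then the names are bound
        r, i, j = i, j, i + j
        yield r

rabbits_seen = []
for rabbits in fibonacci():      # call it to get the iterator
    rabbits_seen.append(rabbits)
    if rabbits > 100:            # without a break this loop would never end
        break
```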

Another trick,

which will be very familiar to JavaScript programmers,

but is not common in other languages,

def is an executable statement.

I did mention there's no such thing as a declaration,

everything is a statement.

Def is an executable statement.

Every time def executes,

it builds a new function object and binds it to that name.

Also, scoping is lexical.

That is, if I'm looking for a variable name

and it's not bound in my current context,

I go to the context containing it,

where contexts are functions in Python.

So, consider this: Def makeAdder given an addend.

When def makeAdder executes, that is,

when the outer def executes, it just compiles this.

When makeAdder gets called, this def executes,

meaning this code gets compiled and bound to name adder

and that name adder, of an inner function,

is returned from the outer function.

This is known as a closure.

The inner function adder doesn't have anything named addend,

and so it looks outside

and finds it in the outer function.

So, for example, a23 equal makeAdder(23).

Now, a23 is a function

that will sum 23 to whatever it's given.

And similarly makeAdder(42), so now a42 is function

that will sum 42 to whatever is given.

So, we can print a23(100), a42(100) or compose them,

they're totally independent now, because every def,

every time the inner def runs, it builds a new function object,

but you don't interfere,

so we have the expected results.
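
The closure example, reconstructed from the description. makeAdder, adder, addend, a23 and a42 are named in the talk; the inner parameter name augend and the tuple results are assumptions. Python 3.

```python
def makeAdder(addend):
    def adder(augend):
        # addend is not local to adder, so it is looked up
        # in the enclosing function: this is the closure
        return augend + addend
    return adder

a23 = makeAdder(23)      # each call runs the inner def again,
a42 = makeAdder(42)      # building a new, independent function object

results = (a23(100), a42(100), a23(a42(100)))
```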

A syntax form known as decorators makes

higher order functions even smoother to use in Python.

Basically, whenever you write at decorator def something

or in Python 2.6, which is about to go beta now,

similarly for at decorator class something,

but we won't be covering anything beyond 2.5.

Whenever you write this,

it's as if you've done your def, so created your function object,

and then rebound the same name, which you used in the def,

to whatever results from passing the function object

as the argument to the decorator.

So, basically, decorator has to be a higher order function

or it could be a call to a function

returning a higher order function,

a higher order squared function.

This is a very handy syntax for all sorts of things.

Particularly it's used inside classes as we'll see.
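
A sketch of the decorator syntax and its hand-written equivalent. The decorators shout and repeat and the decorated functions are invented for illustration; repeat shows the case of a call that returns a higher-order function. Python 3.

```python
def shout(func):
    # a higher-order function: takes a function, returns a function
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return "hello, " + name

# The same thing without the @ syntax: def first, then rebind the name.
def greet_plain(name):
    return "hello, " + name
greet_plain = shout(greet_plain)

def repeat(times):
    # a call to repeat returns the actual decorator
    def decorator(func):
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator

@repeat(2)
def ping():
    return "pong"
```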

Okay, there's new-style classes and old-style classes.

I will not be mentioning old-style classes at all.

They're totally obsolete,

only exist for backwards compatibility.

We'll talk about new-style ones.

A class is similar to a def.

In a sense, it's also a keyword.

Class, the name, and then bases in parentheses.

If you have no bases--

Well, you should always have at least object as a base.

So, at the very least,

you'll have open paren, object, close paren, and then a body.

So, it's different from a def in this sense:

As I mentioned, when you call-- when you execute def,

the body only gets compiled, not executed yet.

When you execute class, the body gets executed.

The body's typically a series of assignments and defs

and possibly other statements

and it's executed at the time you execute class.

It basically binds names to values

and those names become attributes of the class.

So, functions, for example, become methods for the class.

Also, anything that is an attribute of a base

is also found when you look up things on the class itself,

except that you can override just like in C++ or Java.

If something is defined in a base class

and you define it differently in the derived class,

the derived class wins.

So, compared with C++, this is slightly different

because there you explicitly have to say

that something is virtual. In Python,

like in Java, there's no such need.

Everything can be overridden.

Let's give a very boring example just to show.

So, class eg, that's just an example,

inherits from object, that means it has no real bases,

just the most generic object of them all.

It starts with assigning an empty list to cla,

so that's a class attribute.

The class itself holds the list,

and then it has a def

with the special name dunder init dunder.

That's the initializer

for the instance of the class which is called self.

And so it binds the new dictionary,

empty dictionary, to self.ins,

which is an attribute of the instance,

not of the whole class.

And then, we have two methods.

One does append to the class attribute,

and this one does an insertion into the instance attribute.

How we instantiate classes is simply by calling them.

There is no need

for such redundant operators as new.

We just call the class

and each time we call the class, we get a fresh new instance.

So, having just instantiated them,

I see that the cla of the instances

is the empty list

because since the instance doesn't have one,

it goes back to the class

and the ins are empty dictionaries,

there's one per instance.

Once I've called a few methods,

note that the cla is the same for both

while the ins has changed because one is a class attribute

and the other is the instance attribute.

So, if I ask are they identical objects?

Are they the same object for the cla?

The answer is true because they both go to the class,

while if I ask it for the ins,

which is by instance, then the answer is false.

They're not the same object.
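
A reconstruction of the example from the spoken description. The names eg, cla and ins come from the talk; the method names meth1 and meth2, the instance names es1 and es2 and the argument values are assumptions. Python 3.

```python
class eg(object):
    cla = []                   # class attribute: one list held by the class

    def __init__(self):
        self.ins = {}          # instance attribute: one dict per instance

    def meth1(self, x):
        self.cla.append(x)     # appends to the shared class attribute

    def meth2(self, y, z):
        self.ins[y] = z        # inserts into this instance's own dict

es1 = eg()                     # instantiate by calling the class
es2 = eg()
es1.meth1(1)
es2.meth2(2, 3)

same_cla = es1.cla is es2.cla  # both lookups end up at the class
same_ins = es1.ins is es2.ins  # each instance has its own dict
```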

When, uh-- Python is kind of peculiar

among very high-level languages because it tries to make

everything very explicit for you.

Remember that rule in the Zen of Python:

explicit is better than implicit.

In particular, what it means

to look up an attribute on something

is very clearly, very detailedly specified.

So, for example,

normally something like inst, dot, method, arg1, arg2

will be exactly like looking at the method

on the type of the instance and passing the instance

as an implied first argument.

More generally, whenever you do inst dot name,

whether you are about to call it or not doesn't matter.

There's a single namespace

for executable and non-executable attributes.

First of all, the string name is looked up

as a possible key into a dictionary

which belongs to the instance.

Every object, just about every object,

owns a dictionary with a special name,

dunder, dict, dunder, which is basically

where all of its attributes are organized.

If it's not there, then we try the same thing

for the type that is the class.

Type and class are more or less synonyms in Python,

except class is a keyword

and type is a normal identifier.

If it's not there, then we try exactly the same thing

along all the bases.

The base classes are recorded as one

of the attributes of the type,

as part of the class statement.

If we haven't found it in any of the bases,

then we try a special method, dunder getattr dunder,

which is basically there to let you compute

attributes on the fly

just in time when necessary.

If that works, then it's a result,

otherwise you finally get an attribute error

which is an exception meaning

that attribute name does not exist anywhere

in the search space, for this instance.
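
The lookup order can be observed with a small sketch. The classes Base and Lazy and all attribute names are hypothetical, and this simplified picture ignores descriptors such as properties. Python 3.

```python
class Base(object):
    from_base = "base"

class Lazy(Base):
    from_class = "class"

    def __getattr__(self, name):
        # called only after the normal lookup has failed everywhere
        if name.startswith("computed_"):
            return name[len("computed_"):].upper()
        raise AttributeError(name)

    def method(self, a, b):
        return (self, a, b)

inst = Lazy()
inst.own = "instance"          # stored in the instance's __dict__

# inst.method(1, 2) is like looking the method up on the type
# and passing the instance as the implied first argument.
same_call = inst.method(1, 2) == type(inst).method(inst, 1, 2)
```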

Now, subclassing,

and this is where overriding gets in,

means you can specify that eg is a base class of sub,

and then you can define the same thing,

the same name, meth2 in this case,

that you had already defined in the base class

and this is an override.

In the override, you can call the method

on the base class explicitly by base class dot meth2.

And remember, in this case,

you have to pass self explicitly

or you can do it implicitly with a super,

which is a kind of magical piece of--

a magic built-in which does the look-up

in the base classes for you.

So, another example is that you can override data as well.

So, I've defined a subclass of list

where every append is actually done twice.

So, basically every time you append something,

it's appended twice to the list.

Not particularly useful, just an example.

The point is to show that when you subclass,

you can override a piece of data,

in this case cla,

just as much as you can override a method.

Again, there is no distinction,

just define the same name with a different value.

Every use of that attribute

will use the overridden attribute.
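
A sketch of the overriding examples. eg, sub, meth2 and cla appear in the talk; repeater, data_overrider and the exact method bodies are assumptions reconstructed from the description. The explicit super(sub, self) form also works in Python 2; Python 3, used here, additionally allows a bare super().

```python
class eg(object):
    cla = []
    def __init__(self):
        self.ins = {}
    def meth2(self, y, z):
        self.ins[y] = z

class sub(eg):
    def meth2(self, w, x, y, z):          # overrides eg.meth2
        eg.meth2(self, w, x)              # explicit: pass self yourself
        super(sub, self).meth2(y, z)      # via super: self is implied

class repeater(list):
    def append(self, x):                  # every append is done twice
        for _ in (1, 2):
            list.append(self, x)

class data_overrider(sub):
    cla = repeater()                      # overrides a piece of data

s = sub()
s.meth2(1, 2, 3, 4)
r = repeater()
r.append("a")
```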

Another very useful concept is that of a property.

Suppose you have two methods, call them getter and setter.

Getter just takes the object and returns a value.

Setter takes the object and a value

and somehow sets the value appropriately

into the object.

You can give a very nice syntax

by calling the built-in property with a getter and setter,

as part of the class assigning it to a name.

Now, every time you access that name,

Python internally calls the getter method for you.

And every time you assign something to that name,

it calls the setter method for you.

So, that's very nice syntax

to be substituted for function calls,

for method calls.

Note that I did say you cannot override the equal.

That only applies

when the left hand side is a simple identifier.

When the left hand side is like here,

an attribute access, you can actually do

pretty advanced stuff even to the assignment,

and property does it for you.
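
A minimal sketch of the property built-in. The class Account, its attribute names and the writes counter are hypothetical; the counter is there only to make the hidden setter calls visible. Python 3.

```python
class Account(object):
    def __init__(self):
        self._balance = 0
        self.writes = 0

    def _get_balance(self):
        return self._balance

    def _set_balance(self, value):
        self.writes += 1              # count how often the setter runs
        self._balance = value

    # getter and setter dropped into a property, assigned to a name
    balance = property(_get_balance, _set_balance)

acct = Account()
acct.balance = 10      # Python calls _set_balance(acct, 10)
acct.balance += 5      # one getter call, then one setter call
current = acct.balance # Python calls _get_balance(acct)
```

Client code keeps using plain attribute syntax, so a class can start with an ordinary attribute and switch to a property later without changing its callers.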

Why am I dwelling so much on properties?

Because there's a meme going around

particularly in Java,

but to some extent in C++ as well,

that you should not expose attributes.

To keep flexibility, you should hide attributes

behind getter and setter methods.

Thanks to properties, this is useless in Python.

You just expose the interesting attributes directly.

If and when you need a getter and a setter,

you write them

and you drop them into a property,

so that all client-code using your class

doesn't need to be changed.

It can still assign to attributes, access attributes,

and then method calls will happen intrinsically

on your behalf.

Avoid boilerplate. Don't waste pixels.

Do not code this kind of thing, just name the attribute

without the leading underscore so it's visible

and let people use them, and wrap them,

if and when that becomes necessary.

I did mention Python has operator overloading.

How it does operator overloading

is by defining a huge number of special names.

Special names start and end with two underscores.

Two underscores,

you could say underscore-underscore,

but that's a bit long,

so a common way of pronouncing it

is dunder, for double-underscore.

Anything beginning and ending with dunder is reserved

to the Python language.

Do not use this form of identifier

for your own arbitrary names.

They could conflict with special names

in the future.

There's a lot of things you can do with special methods.

There's a constructor, new, initializer, init.

Note there is a difference

between constructing and initializing.

If you're familiar

with the so-called two-step constructor design pattern,

Python gives that to you.

New actually makes a new object that's still bare,

and init lets you initialize the object.

Del, which is what happens when the object goes away.

It's not a destructor, it's more of a finalizer.

If you're familiar with C++ destructors,

that's a bit different.

This is more like a Java finalizer.

And then, there are ways to convert things,

not just to repr and string, but to int, float,

complex, and so on.

Many ways to compare things:

less than, greater than, equal.

A lot of methods for arithmetic:

addition, subtraction, multiplication.

Methods to make it like a function,

so callable, hashable,

so it can go into a set or dictionary.

Dealing with attempts to get set and delete attributes

or items as in a container, other container stuff.

Get and set to define what are called descriptors.

Enter and exit describe what are called contexts.

So, there's a huge number of special methods

you may want to define.

The point to retain,

you will find out the special methods,

which are basically a high syntax convenience,

if and when you need them.

Python will call the type's special method for you

when you attempt the appropriate operation.

So, for example,

when you write foo, open, close,

Python looks up the type of foo,

finds the dunder call method, and that's what gets called.

It would be like operator, open, close, paren in C++

and sometimes it does so in a more structured way.

For example, if you ask if a greater than b,

and a doesn't have a gt method,

Python continues by looking if b has an lt method,

so that it's basically going to compute

as a second possibility if b is less than a

rather than if a is greater than b.
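
Both behaviours can be sketched briefly. Doubler and Weight are invented classes, and lt_calls is a counter added only to show which method ran. In Python 3, used here, every object inherits a default gt that reports "not implemented", which is what triggers the fallback to lt on the other operand.

```python
class Doubler(object):
    def __call__(self, x):        # makes instances callable like functions
        return 2 * x

class Weight(object):
    lt_calls = 0                  # counts how often __lt__ runs

    def __init__(self, kg):
        self.kg = kg

    def __lt__(self, other):      # only "less than" is defined
        Weight.lt_calls += 1
        return self.kg < other.kg

doubled = Doubler()(21)           # looks up __call__ on the type

# No useful __gt__ on Weight, so Python evaluates Weight(3) < Weight(5) instead.
heavier = Weight(5) > Weight(3)
```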

More generally, built-ins do things right for you,

but let's get a simple example first.

Remember the Fibonacci generator we did a few minutes ago?

This is the same thing done much more detailedly as a class.

At initialization, we set one to i and j.

It's got to be self.i and self.j

because they're instance attributes.

We have to specify dunder iter, saying:

if somebody wants to iterate, use me directly.

And then there's the next,

which is unfortunately not marked by dunder.

This is fixed in Python 3.0, but this is in Python 2.

We didn't place the double-underscore,

but it is a special method because it's called,

for example, by the for loop.

And this does essentially the same thing

that we did more simply in the generator,

but it does it explicitly with self dot variables,

and this is basically intended to be used

in exactly the same way as the generator I gave before.

So, it's exactly the same semantics.

Basically, what the generator does

is generate for you an object which is more or less like this.
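
A sketch of the class-based version. In Python 3, used here, the method is spelled __next__; in the Python 2 of the talk it is plain next, as the speaker notes. The list name rabbits_seen is an assumption.

```python
class fibonacci(object):
    def __init__(self):
        self.i = self.j = 1       # instance attributes, hence self.

    def __iter__(self):
        return self               # to iterate on me, use me directly

    def __next__(self):           # called by the for loop on each step
        r, self.i, self.j = self.i, self.j, self.i + self.j
        return r

rabbits_seen = []
for rabbits in fibonacci():
    rabbits_seen.append(rabbits)
    if rabbits > 100:
        break
```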

Okay, built-in functions are what calls the special methods.

You never call a special method directly, essentially.

Think of the double-underscore

as a way to make special methods ugly,

so you're not even tempted to call them.

Never call x, dot, dunder, len, open, close.

Call len, open, x, close, it will do the right job,

which in this case is calling dunder len.

Another example, abs, don't call this directly.

You don't really know what the abs built-in does.

It will probably call dunder abs,

but suppose it doesn't find it?

It may be able to do a sign test and,

if less than zero, change the sign for you.

It may or may not,

but always go through the built-in function.
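
A short sketch of going through the built-ins rather than calling the dunder methods yourself. Playlist and Offset are hypothetical classes. Python 3.

```python
class Playlist(object):
    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):            # what the len built-in ends up calling
        return len(self.songs)

class Offset(object):
    def __init__(self, delta):
        self.delta = delta

    def __abs__(self):            # what the abs built-in ends up calling
        return Offset(abs(self.delta))

size = len(Playlist(["a", "b", "c"]))    # not Playlist(...).__len__()
distance = abs(Offset(-4)).delta         # not Offset(-4).__abs__()
```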

There are a lot of built-in functions,

not just ones corresponding directly to special methods.

We'll see some of them in examples.

And also, these are just the ones

that are always available to you,

but there's a lot in the standard library

that you want to use just as much,

and these are all absolutely crucial modules

in the standard library

you'll need to be very familiar with

to make effective use of Python.

As I said, rather than going into these in details,

I like to give an example.

Suppose we have a readable file somewhere.

We'd like to make an index that is a map

from the words in the file

to the line numbers where that word is found.

So, first we build the map, and then we emit it.

To build a map, we start with an empty dictionary.

A dictionary is a natural way to represent a mapping.

We use with open, filename, as f,

syntax to open it and guarantee it's closed

as soon as we're out of this block.

We use enumerate to get line numbers

and generally looping on a file

gives you the lines of the file as strings.

And then, we call the split method

which breaks a line into a list of words

and we loop over that.

And here, we use the setdefault, which is a bit complicated

because basically what it does is

it looks up this key in the dictionary.

If it's there, it returns what corresponds to it,

the corresponding value.

If it's not, then it sets the second argument

as the new value for the key.

So, it's kind of complicated,

but basically you take either a list

already corresponding to the word

or a new empty list.

In either case, you append the line number to it,

so the line numbers accumulate.

And then, once we have our index,

we can just emit it to standard output

in alphabetical order.

So, for that, we use a sorted built-in

which lets you loop on it in a sorted way,

so sorted order for strings is alphabetical,

so you loop on the words in order and you print them.

I'm using the percent formatting here

just to have a colon attached to it,

and then we print the line numbers.

Note the comma here, so we don't break the line,

and finally a print without anything

just to break the line at the very end.
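
A sketch of the index example. The slide prints with Python 2's print statement and trailing commas; this Python 3 version builds the output lines as strings instead. The function names, the temporary file and its contents are assumptions made so the sketch is self-contained.

```python
import os
import tempfile

def build_index(filename):
    indx = {}
    with open(filename) as f:                 # closed when the block ends
        for n, line in enumerate(f):          # n is the line number, from 0
            for word in line.split():
                # existing list for this word, or a new empty one
                indx.setdefault(word, []).append(n)
    return indx

def emit_index(indx):
    lines = []
    for word in sorted(indx):                 # alphabetical order
        numbers = " ".join(str(n) for n in indx[word])
        lines.append("%s: %s" % (word, numbers))
    return lines

# Demo on a throwaway file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the hare runs\nthe rabbit and the hare\n")
indx = build_index(tmp.name)
os.remove(tmp.name)
report = emit_index(indx)
print("\n".join(report))
```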

We can do it slightly simpler

if we're familiar with the standard library

because the collections module has a sub-type of dict

known as a defaultdict.

A defaultdict is something that if you try to index it

and the index isn't there, the key isn't there,

instead of raising a key error

it calls something to make the new item.

In this case, we want to call the type list without arguments

to make an empty list and set it there.

So, once we've made a collections defaultdict list

instead of a plain dictionary, we can simply use

index word append(n) instead of having to go

the complicated set default route.

Everything else,

every other bit of code in this slide,

is just the same.
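
The defaultdict variant, sketched over a list of strings rather than an open file so that it stands alone; that substitution and the sample lines are assumptions. Python 3.

```python
import collections

def build_index(lines):
    # a missing key calls list() to make a new empty list and stores it
    indx = collections.defaultdict(list)
    for n, line in enumerate(lines):
        for word in line.split():
            indx[word].append(n)
    return indx

indx = build_index(["the hare runs", "the rabbit and the hare"])
```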

Other things we could do that could be interesting:

once we have this index, what about getting

the seven most popular words in the text file?

What are the seven words

that appear on most lines in this file?

Well, for this purpose, we want to use heapq.

Typically, it's a typical priority heap operation,

if you're familiar with your algorithms,

and specifically it exposes heapq.nlargest.

Give me the n largest items of a certain collection

and the n we want is seven, the collection is indx,

and we get to define what is the key extractor

for the comparison.

So, key equal indx.get means

we're not getting the seven alphabetically largest,

but the seven whose corresponding value in indx

is largest, and this will work.
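
A sketch of heapq.nlargest on a small hand-made index. Note a deliberate difference from the talk: the talk passes key=indx.get, which compares the lists of line numbers themselves, while this sketch ranks by the number of distinct lines, which is an assumption about the intent. The sample data and the choice of two instead of seven are also assumptions. Python 3.

```python
import heapq

# hypothetical index: word -> line numbers where it occurs
indx = {
    "the": [0, 1, 1, 2, 3],
    "hare": [0, 1, 3],
    "runs": [0, 2],
    "rabbit": [1],
}

def line_count(word):
    # number of distinct lines the word appears on
    return len(set(indx[word]))

# iterating over a dict yields its keys, so the candidates are the words
top_two = heapq.nlargest(2, indx, key=line_count)
```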

And this is an "and" lookup, that is--

I don't want all the lines with the word rabbit

or all the lines with the word hare.

I want the lines with both rabbit and hare.

How am I gonna find them?

This is a set intersection problem.

So, I'm basically going to simply make a list

of all the words I'm looking for,

pop the first item,

and make a set out of the corresponding line numbers,

and then I'm using the ampersand operator

to do intersection

specifically in the form ampersand equals,

so intersect in place.

Basically, all binary operators have an equal form,

so, for example, plus equal increments in place

and ampersand equal intersects in place.

And, in the end, of course, I'm careful to sort it

so it will print out nicely in alphabet, in line order.

Because the set, by making a set,

I get very fast intersection and so on,

but I lose significant order

because a set is a hash table, so it basically, as I mentioned,

doesn't have an intrinsic order.

But when I do need it, I can always sort it on the fly.
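
The intersection approach might be sketched like this. The function name lines_with_all and the sample index are assumptions. Python 3.

```python
def lines_with_all(indx, words):
    words = list(words)                   # the words being looked for
    result = set(indx[words.pop(0)])      # lines containing the first word
    for word in words:
        result &= set(indx[word])         # intersect in place
    return sorted(result)                 # sets are unordered: sort on the fly

# hypothetical index: word -> line numbers
indx = {"rabbit": [1, 3, 5], "hare": [0, 1, 3], "the": [0, 1, 2, 3]}
both = lines_with_all(indx, ["rabbit", "hare"])
```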

So, I mentioned that the standard library

is full of very useful modules.

To access anything in the standard library

or any other module, you have to import the module

and this is how you do it.

I already showed it in a couple of examples,

but import modulename is the canonical way.

So, for example,

when I wanted to use something in collections,

I started with import collections.

This basically makes the name collections available

to my program.

And the name collections refers to that module,

so then I can use the attributes of the module

in my program.

If a module is contained in a package,

we will see exactly what a package is in a second,

then I cannot just import it

because it's hiding inside a package.

I have to tell Python

from what package to import it.

This would be a package named package

inside a package named given

inside a package named some.

So, some dot given dot package,

Python would look for package inside of given,

inside of some, and inside that for modulename.

In either way, you basically get the modulename

as part of your namespace and you can access it.

And then, if you want function blah

from the module,

just do modulename dot blah and you're done.

So, there are some other ways.

I did mention that we'd rather have only one way,

but sometimes it's kind of inevitable

to offer more than one.

Although in practice these two will

make you happy all the time,

you should be aware of the others

because you'll see them used in Python examples and code.

One possibility is you can import the module

under an assumed name.

Basically, put a false mustache on the module

so it makes believe in your namespace

that it's named something different

and that is the as clause.

So, for example,

suppose somebody gives you a name that is--

this name is far too long

and this name is far too long to use conveniently,

and you would like to have a shorter name for it.

Well, this is how you do it.

Import thisnameisfartoolong as z,

and now instead of thisnameisfartoolong dot blah,

you can use z dot blah.

And this is indeed sometimes useful

if you ever have to handle modules

whose names are far too long.

One thing I don't recommend is

instead of getting the module into your namespace,

reach into the module namespace and grab one thing into yours

from thisnameisfartoolong import blah.

This will work, but throughout the rest of your code

people who are reading it, including yourself

when you're maintaining it in six months,

will wonder where's this blah from,

and you will have to go and look for when it was imported.

So, I'd rather always see modulename.blah or z.blah

to get the immediate thing.

It's not a top level name, it's coming from something else.

And the very worst thing is to import star,

which basically grabs all of the items in the namespace

and injects them into yours.

Don't do that.

That is a guaranteed way to cause yourself headaches.

It's handy if you're at the interpreter prompt

because say you're doing a lot of math at the interpreter prompt,

you don't want to say

math dot sin, math dot cos, math dot tan, math dot atan.

You want sin, cos, tan, atan

and all the mathematical functions

to be part of the top namespace

because it's too much typing otherwise.

That's possibly the only reasonable way

to use the import star.

And this is, to clarify what I just--

So, I want to compute an arctan with two inputs.

The normal way is to import math,

and then use math.atan2 of x and y,

which in this case is this value.

If I tried even after importing to just use atan2,

this would give a NameError exception

because, in my current namespace,

there is no name atan2 defined.

Remember, I got math into my namespace,

but atan2 is still within namespace math.

I could do, from math import, atan2.

In this case, this would inject atan2

in my namespace.

Sometimes, when you're doing this interactively

like at the prompt, it's kind of nice.

In real programs,

it tends to be confusing and this goes squared.

From math import star,

now you have 25 mathematical functions

in your namespace which is handy,

but don't do it in real programs,

only in interactive use.
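
The import forms can be tried out in a short sketch. The argument values 1 and 2 are arbitrary, and using collections under the alias z is only a stand-in for the overly long module name in the talk. Python 3.

```python
import math

angle = math.atan2(1, 2)          # the normal way: module name, dot, function

try:
    atan2(1, 2)                   # the bare name is not in this namespace yet
    bare_name_worked = True
except NameError:
    bare_name_worked = False

from math import atan2            # injects atan2 into this namespace
same_value = atan2(1, 2) == angle

import collections as z           # a module under an assumed, shorter name
alias_ok = z.defaultdict is __import__("collections").defaultdict
```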

So, this is nice about how I use other modules,

but how do I make my own modules?

Well, that's easy.

Any Python program is a module.

Any dot py--wot dot py is a module.

Just put it in the appropriate directory,

say the same directory as the one

it's being imported from, the rules depend a bit on--

but for Google App Engine that's what you do.

And then, any other module in that directory

can do import wot

and Python will look for wot.py.

The directories are listed in sys.path,

and they need not be directories strictly.

In general, they could be zip files.

So, instead of having a file system,

you can basically zip everything up

and Python will find it anyway.

That's kind of handy.

There are other things that you can import.

You could import byte code files.

You could import files coded in C

or other languages for Python.

So, the extensions are pyc, pyd,

or sometimes dot so on Linux,

but this wouldn't work in Google App Engine.

Google App Engine only accepts the dot py files,

so you want to make sure you place the dot py itself.

It handles the compilation to bytecode

and making sure the bytecode is secure

and everything else under the covers.

So, this variant is nice

for other kind of uses of Python,

but don't rely on it on the application engine.

So, what's a module?

Well, it's an object. Everything is an object.

It's got attributes.

That's basically everything it has.

It doesn't have anything sophisticated,

only attributes.

The attributes of the module object

are what you could see as the top-level names

of the module as a source.

So, for example, say we have a module

whose dot py source is only x equals 23.

This is called wot dot py.

If we import wot,

then the only thing there is in it is wot dot x,

which is 23 and that's all.
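
The wot example can be tried out roughly like this. Writing the file into a temporary directory and putting that directory on sys.path are assumptions made so the sketch runs anywhere; in practice wot.py would simply sit next to the importing module.

```python
import importlib
import os
import sys
import tempfile

# Create wot.py, whose entire source is "x = 23".
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "wot.py"), "w") as f:
    f.write("x = 23\n")

# Make the directory importable, then import the module by name.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import wot

# Apart from the dunder attributes Python adds itself,
# the only attribute of the module object is x.
public_names = [name for name in vars(wot) if not name.startswith("__")]
print(wot.x, public_names)
```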

Besides assignment, names can be bound

by class, def, import, and from.

Class name binds the name, def name binds the name,

import name binds the name,

from foo bar import name binds the name.

The attributes of a module are also known

as the global variables of that module.

Note there are no such things as global globals in Python.

Globals are always per module.

Within that module, you can of course access them

as bare names; from other modules,

you will access them by modulename dot variablename.

Note again, well, variables,

but can they be functions or classes?

Yes, there's no distinction.

Names are names are names.

Whether they're callable or not, they're just attributes.
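
A small sketch of the point that assignment, def, class, and import all just bind module attributes. The module here is built in memory from a source string, and the name demo is hypothetical; a real module would be a .py file.

```python
import types

# Source of a hypothetical module; each statement binds one top-level name.
source = '''
import math
x = 23
def double(n):
    return 2 * n
class Point:
    pass
'''

demo = types.ModuleType("demo")  # "demo" is an assumed name for this sketch
exec(source, demo.__dict__)

# Every binding statement produced one attribute; callable or not, same thing.
names = sorted(name for name in vars(demo) if not name.startswith("__"))
print(names)

# From outside, access goes through modulename.attributename.
result = demo.double(demo.x)

# The globals seen by code in the module are the module's own attributes.
globals_are_per_module = demo.double.__globals__ is vars(demo)
```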

You can also bind and unbind module attributes

from the outside,

a practice known as monkey patching.

I would strongly recommend against that

in production code.

Unless you're fixing some bug in a library

which you're not allowed to edit, don't do it.

Sometimes it's handy for testing,

but there are better ways.

Use the dependency injection design pattern,

you will do your testing in a much more systematic way.
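
To make the contrast concrete, here is a hedged sketch of both styles. The module clocklib and the functions stamp_global, stamp_injected, and fake_clock are all hypothetical names invented for this illustration; clocklib is built in memory to stand in for a library you cannot edit.

```python
import types

# Hypothetical third-party module, built in memory so the sketch is self-contained.
clocklib = types.ModuleType("clocklib")
clocklib.now = lambda: 1234.5


def stamp_global(message):
    # Hard-wired dependency: only monkey patching can swap the clock.
    return "%.1f %s" % (clocklib.now(), message)


def stamp_injected(message, clock=None):
    # Injected dependency: the caller may pass any zero-argument callable.
    if clock is None:
        clock = clocklib.now
    return "%.1f %s" % (clock(), message)


def fake_clock():
    return 0.0


# Monkey patching: rebind the module attribute from outside, then restore it.
original_now = clocklib.now
clocklib.now = fake_clock
try:
    patched_result = stamp_global("hello")
finally:
    clocklib.now = original_now

# Dependency injection: no module is modified at all.
injected_result = stamp_injected("hello", clock=fake_clock)
print(patched_result, injected_result)
```

The injected version needs no save-and-restore dance, and nothing else in the program can observe the fake clock.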

I would strongly recommend avoiding monkey patching.

If you look for monkey patching on the web,

you'll find very strong diatribes for and against it

for every language that supports it besides Python

including Ruby, JavaScript, and so on.

It moves a lot of emotion.

Basically, people who are programmers at heart

don't want to do it,

but all these languages are used by people

who need to program but aren't programmers.

They're like hardware experts

or web experts and so on, and they don't see why

they should do things the proper way,

but I'll leave that to the web.

It's important to notice that modules are singletons.

They are the most natural and Pythonic form of singletons.

They're automatic singletons. What does it mean?

It means that if you import a module more than once,

the first import is treated very differently

from all the following ones.

The first import finds the module somewhere,

loads it, compiles it on the fly if needed

or takes the compiled form,

executes the body of the module,

so the module object is properly populated,

and places the module in a system dictionary

known as sys dot modules.

Sys is itself a module, so you can import sys

if you want to play around with it,

and sys dot modules is a dictionary.

The keys are the names of modules.

The values are the module objects.

So, when you import the module again,

Python will first look into sys.modules.

It's very, very fast to check if something is in a dictionary.

If it finds the name,

that's what it gives you immediately,

basically one instruction.

No second loading, no further execution--instant.

Your singleton is right there.
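
A quick way to see this for yourself, sketched with the standard math module: a repeated import hands back the very object already stored in sys.modules.

```python
import sys

import math
import math as math_again  # second import: served straight from sys.modules

# sys.modules is a plain dictionary keyed by module name.
is_dict = isinstance(sys.modules, dict)
cached = sys.modules["math"]

# All three names refer to one and the same module object.
same_object = (math is math_again) and (math is cached)
print(is_dict, same_object)
```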

So, some people who love the singleton design pattern

say, "Well, yeah, this kind of works,

"but it only gives me a module.

It can't--I can't use a class instance there."

So, for example, I want my singleton object

to support addition or greater than comparison,

and I can't do that because modules are simple objects.

Even if you define a function called

under, under, add, under, under, it won't do anything.

It's not special

because it's not part of the type of the module.

Remember, the special methods only work

if they're in the type, not in the instance.

So, this is how you stick something that's not a module

into sys.modules.

Some people consider it a trick,

I consider it a perfectly legitimate use

of a mechanism that Python exposes for your use.

Of course, it's a very advanced one,

but it's a way to make--

Oh, by the way, dunder name is a special attribute,

which is the name of the current module.

And that's how you basically make your class

with an underscore in front

just to indicate it's actually private

and intended for internal purposes,

and there you go.
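
The slide itself is not reproduced in the transcript, so what follows is only a sketch of the technique under assumptions: the class name _Singleton, its methods, and the registered name one_and_only are all hypothetical. In a real module file the last line would be sys.modules[__name__] = _Singleton(); here a fixed name is used so the example runs in one file.

```python
import sys


class _Singleton(object):
    """Private class (leading underscore); its one instance stands in for a module."""

    def __init__(self):
        self.value = 0

    # Special methods work because they live on the type, not the instance.
    def __add__(self, other):
        return self.value + other

    def __gt__(self, other):
        return self.value > other


# Inside a real module file this would read:
#     sys.modules[__name__] = _Singleton()
# "one_and_only" is an assumed name so the sketch is self-contained.
sys.modules["one_and_only"] = _Singleton()

# Import finds the name in sys.modules and returns the instance as is.
import one_and_only

one_and_only.value = 40
total = one_and_only + 2
is_greater = one_and_only > 10
print(total, is_greater)
```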

Packages are essentially modules containing other modules

and, of course, there is no limit

to nesting you can do this way.

So, you can have sub-packages and so on.

In practice, it lives typically in a file system

or it could be in a zipped file.

In a directory--

And how you distinguish directories,

which are just directories

from directories which are packages,

is by the presence of a file called

dunder init: under, under, init, under, under, dot py.

Python will only consider as a package

a directory which has a file by that name.

So, that file contains the module body.

It's often empty because you don't necessarily want

the package to do anything else

than containing other modules.

And so, you don't need to do anything in init py,

but you still need to have it there

because it acts as a flag to Python

that, yes, this is not just any directory,

this is a package.

The modules inside the packages are

basically all the py files in that directory.

And that is done for you by Python,

so you don't need to worry about it.

And if you want sub-packages,

you will have sub-directories which in turn need to contain

the special dunder init dunder file.

Note that the parent directory of the directory package

must be on sys.path.

And once you have a package with a module bar inside it,

you could import foo dot bar,

that will work, it will bind foo dot bar.

So, you'll bind foo and give it an attribute bar,

but a more normal, more common approach is

to use a from foo import bar

that only binds the name bar directly.

So, basically, I like my imports

to always bind a module to a name,

and this is what it does.
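
The package layout just described can be sketched as follows. The package foo and module bar use the names from the talk; the function hello and the temporary directory are assumptions added so the example runs on its own.

```python
import importlib
import os
import sys
import tempfile

# Lay out:  <root>/foo/__init__.py  (empty flag file)
#           <root>/foo/bar.py       (a module inside the package)
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "foo")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "bar.py"), "w") as f:
    f.write("def hello():\n    return 'hello from foo.bar'\n")

# The parent directory of the package directory must be on sys.path.
sys.path.insert(0, root)
importlib.invalidate_caches()

# Binds the name foo, with bar as an attribute of it.
import foo.bar

# Binds the name bar directly to the same module object.
from foo import bar

print(foo.bar is bar, bar.hello())
```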

Okay, now we've basically covered,

well, so to speak the Python language

and a little bit of the built-ins

and a very tiny fraction of the standard library.

One of the mottos of Python is "Batteries Included."

It means that the standard Python library

has more than 200 modules.

That's production modules.

There are many more for unit-tests,

encoders, decoders, demos of various kinds, and so on.

Some are pure Python, some are coded in C.

The application engine will support

any pure Python module and most C-coded modules

that are part of the standard library.

There are specific limitations

that will be covered in the App Engine sessions.

App Engine does not allow you to do threads,

does not allow you to do sockets.

That's not because they're coded in C,

it's because of its very specific execution model.

Moreover, App Engine will add some specific APIs

such as datastore, users, urlfetch, mail,

and will support any pure Python module you use.

You just put as part of your App Engine application

any dot py that doesn't use anything except other dot pys,

and you'll be fine.

It will be supported.

But back to the standard library,

the standard library is so much larger than the language

that the time it takes for an expert programmer

to master Python the language is maybe a couple of days.

Unfortunately, when he's done that,

he's still got a lot of work to do

because he probably wants to know the built-ins

and the special methods,

and the metaprogramming introspection and so on

and that takes another say three days.

Good, but now he needs to get started

with a standard library.

Out of the 200 modules, there's maybe 20 or 30

which are an absolute must

and those will take another 10 days or a bit more.

And when those are done,

to really claim they've mastered Python

as opposed to the Python language,

they probably really want to have a pretty good idea

of what's everywhere in the standard library

that's longer than all of the above.

And when they're done with all this,

well, there's third-party offerings.

I don't know how familiar you are with Monty Python,

in honor of which Python the language is named.

This is a scene from their cheese shop sketch,

which I strongly recommend.

Cheeseshop dot python dot org is a little bit less green,

but better supplied than this cheese shop.

It's got 4,000 packages and counting,

but I've given up on printing the exact number

because whatever I looked up yesterday

would probably already be false today.

There's more than one package being added every day.

It's always fun to roll your own.

I mean, if we're programmers,

it is because we like to program.

Say I need this functionality, I'll roll my own.

Unfortunately, that's not a very professional attitude.

You probably want to see who else already spent

person-years of effort into doing this,

and this is likely to be on the cheese shop.

How long does it take to learn all 4,000 packages?

Well, I have no idea.

I suspect that if you put yourself to it,

by the time you're finished there will be 4,000 more,

so you'd never be finished.

So, I have no real suggestion here except ask around.

Go to the mailing list or go to the usenet group

and just ask, "I need to do this and that.

Is there some good module you could recommend?"

You'll probably get several recommendations.

That's still a lot of work for you to pick the best one,

but better than nothing.

Remember, online resources, don't forget those.

They're all great.

You really need that

to do really good work in Python.

And here we are.

We're ready for questions and answers.

We've got about eight minutes, so I can field quite a few.

If you can please walk to the mic in the corridor

and speak your question in the mic.

man: One question.

Google is hosting Dojo and jQuery,

the JavaScript libraries.

Martelli: I'm sorry?

man: Google is right now hosting for JavaScript development--

Martelli: I can barely-- There's a lot of--

man: Is this better?

Martelli: Yes, please.

man: Okay.

Google is hosting some JavaScript libraries

that you can load directly from Google

like jQuery and Dojo and so forth.

I wonder if there's plans for hosting

the stack of third-party modules from Python

that you can pull directly?

Martelli: The reason it makes sense--

I'm not a JavaScript expert, but I believe

that the JavaScript libraries you require, you mentioned,

would typically be mentioned in another JavaScript file

or a web page, and having a URL

at which they're certain to be is of some use.

Python doesn't work like that.

Python doesn't load code transparently

from across the web.

You put your code right there.

So, there's a lot of hosting of open source code

that we do at code.google.com,

but that's something you download

and integrate with your programs.

It doesn't live at any specific url

because that basically would serve no useful purposes.

man: Yeah, I know.

I just thought it would be a cool idea, sorry.

Second man: I see a lot of similarities

between Python and Ruby,

and I'm wondering what you think are

the major distinctions between the two languages?

Martelli: You could look for my name

and the words Ruby and Python,

and you'll find about half a dozen places

where I've written pretty long essays.

But to summarize, if I was doing a spanning tree

on the DAG of distances of programming languages,

I think the first two to qualify as being the closest

would be Ruby and Python.

From the point of view of just about anybody experienced

in just about any other languages,

they're closer

than any two other languages could possibly be.

My favorite, being Italian, is that they're like

spaghettini and capelli d'angelo.

Yes, I could explain the difference.

It's gonna be hard if you're not Italian,

but I could.

Like whether the microscopical round

are rounded or squared,

but in practice, if you're eating them,

I challenge you to tell the difference

unless you're Italian, in which case you can.

But it's like in the genes.

Second man chuckling: So, I guess my question--

Martelli: So, they're so similar.

You can dwell on the differences,

but why?

In terms of practical consideration,

Python is much more mature,

so the implementations are of a much higher quality

because they've been around so much longer

and Ruby's very fashionable.

So, do you want something solid and working

or do you want something that's, like,

in the news and top of fashion?

Your pick.

[audience laughing]

I'm an engineer. I want solid working stuff.

Somebody who'd rather be cool can make a different choice.

Second man: Are Rails and Django also sort of analogs?

Martelli: Yes, except in this case

Rails has been around longer.

So, in this case, Rails is more mature than Django.

Django's getting there,

but it's not quite as rich as Rails is.

Second man: Thank you.

Martelli: Mm-hmm.

Third man: How much of this would we have to unlearn

when Python 3 comes out?

Martelli: [sighs] So, I did specifically mention

that you can finally unlearn the distinction

between plain strings and unicode strings.

Just about everything else I mentioned

is still true in Python 3.

Python 3 removes a lot of stuff

that has to be kept around for backwards compatibility

because as long as we're talking about Python 2-dot-something,

well, it has to keep compatibility.

That's a constraint we gave ourselves.

Python 3 can break compatibility

and therefore can eliminate redundant ways to do things

that you don't really need anymore.

You do have--I did mention that next gains

double-underscores before and after,

that's about it.

Third man: Well, I've heard that very simple programs

like the "Hello World" program--

Martelli: Oh, yeah, print becomes a function

instead of a statement, right,

so you'd need parentheses then.

Third man: Okay. All right.

Martelli: Pretty simple.
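
Both Python 3 changes mentioned in this exchange fit in a few lines. This is a sketch written for Python 3; the Countdown class is a hypothetical example, not something from the talk.

```python
class Countdown:
    """Hypothetical iterator used to show the renamed special method."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # spelled next() in Python 2
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


# print is a function in Python 3, so the parentheses are required.
print("Hello World")

values = list(Countdown(3))
print(values)
```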

Fourth man: So, you mentioned

that monkey patching is considered bad, but--

Martelli: By some,

and it's considered wonderful by others and--

Fourth man: In the Python community,

I guess you in particular have an issue

with monkey patching, but not directly assigning

to a module to add additional functionality.

Is there a fundamental difference there

that you see or--

Martelli: Fundamental difference between what and what?

Fourth man: Between monkey patching

and directly assigning to a module?

Um, or are they--

From my point of view, they're both--

Martelli: Assigning-- assigning--

modifying a module from outside

is monkey patching.

That's my definition of monkey patching.

Fourth man: I guess, in particular,

is there a reason that one is a useful feature

of the language and one is something to avoid?

Martelli: Mm, I-- It's useful

to be able to fix bugs without editing the buggy code,

but it's not something that you should need to do

in the long run in production.

I'd much rather fix the code,

edit the source code and fix it.

Fourth man: Okay, thank you.

Fifth man: Are there any plans

to formally support the notion of interfaces?

I know frameworks like--

Martelli: Python 3 has a standard library module

called Abstract Base Classes, ABC,

which is more powerful than interfaces in some ways,

although some people think it's less powerful in others.

That's basically what we'll have as part

of the standard library.

Otherwise, you can get third-party packages

which support interfaces strictly,

but there's some stuff

that adds absolutely no functionality,

but it makes you look good.

Fifth man: Is this somehow at odds

with the idea that you want to trust programmers?

Martelli: No, actually it's--it's--

There's nothing against trusting programmers

and letting people use abstract base classes.

The point is to eliminate some redundancy.

An interface per se is a structural--

well, a way to structure your stuff.

It doesn't really eliminate much redundancy.

So, I'd normally rather--

I mean, if you look at some of the ABCs in Python 3.0,

they don't actually use the ability

to add functionality.

They work as if they were interfaces, so...

And you can use--

So, basically, instead of just having one way

when you're asking me is something a container,

I can tell you, well, it is a container

if and only if it matches the Container

abstract base class, that kind of thing.
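
In today's Python 3 that container check can be sketched like this, using collections.abc.Container from the standard library. The Evens class is a hypothetical example; it never inherits from Container, yet it is recognized because it defines the contains special method.

```python
from collections.abc import Container


class Evens:
    """Hypothetical class: supports "in" but inherits from nothing special."""

    def __contains__(self, n):
        return n % 2 == 0


# One uniform question: is this thing a container?
evens_is_container = isinstance(Evens(), Container)
list_is_container = isinstance([1, 2, 3], Container)
int_is_container = isinstance(42, Container)

print(evens_is_container, list_is_container, int_is_container)
print(4 in Evens())
```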

Fifth man: Thanks.

woman: We're out of time.

Martelli: Okay. Sorry.

Right, so we're done. Thank you very much.

[audience applauding]
