Alex Martelli: Welcome back to the second part,
and now we're starting to speak about functions.
A function is defined by the keyword def,
def, name, open paren, parameters, close paren,
colon, body.
Note that the parentheses are mandatory.
Even if you have no parameters,
you just write def, name, open, close.
That's all there is to it.
The body gets compiled
at the time you execute def.
It gets compiled and connected
to the function object that is created.
The function object is also assigned
to the name that you're giving.
It's not executed at this time.
Later, as in not necessarily after that in your program text,
but later in time, you can call the function.
The call is-- the syntax is simply
name, open paren, arguments, close paren.
Again, the parentheses are what makes a call.
This is going to be familiar to any C or C++ programmer.
If you just mention the name, you're mentioning the object.
You could put the function object
in a list or something.
If you want to call it,
you have to have parentheses there.
The arguments and parameters must correspond.
Arguments are what you are--
are values that you are initializing the names
in the parameters with.
Their correspondence is typically one to one,
but there are a few ways
that you can use more flexibility there.
The parameters at the end
can have default values like in C++.
So, you can use the syntax parameter name equal expression
meaning, unless otherwise specified,
this object is the value for this parameter.
Be careful, this expression is evaluated once
at the time you run the def statement
and that object is going to be used for every call
that doesn't specify that name.
You may or may not override this
by explicitly specifying a corresponding argument
in the call.
Similarly, the last few arguments
you give can be named.
Instead of just taking the positional correspondence,
first argument goes to first parameter,
second argument goes to second parameter,
you can say explicitly
what parameter name you're setting
to the value you're passing.
Finally, you may define a function
that takes arbitrary positional
and/or arbitrary named arguments.
The syntax is star name.
It will give you a tuple, name, of extra positional arguments,
and star, star, name will give you a dictionary
of optional named args.
So, there's a lot of flexibility there,
which is typically used, for example,
to wrap a function in another function
and pass arguments through
without needing to know which ones they are.
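As a rough sketch of the parameter forms just described (the names greet, append_to and passthrough are hypothetical, not from the talk, and the code is written for Python 3 rather than the 2.5 the talk covers):

```python
def greet(name, greeting="Hello", *extra, **options):
    # name: plain positional parameter
    # greeting: parameter with a default value
    # extra: tuple of any further positional arguments
    # options: dictionary of any further named arguments
    parts = [greeting, name] + list(extra)
    return options.get("sep", " ").join(parts)

def append_to(item, bucket=[]):
    # The default list is built once, when def runs, and is then
    # shared by every call that does not pass bucket explicitly.
    bucket.append(item)
    return bucket

def passthrough(*args, **kwargs):
    # A wrapper can forward arguments without knowing what they are.
    return greet(*args, **kwargs)

print(greet("Ann"))                   # default greeting
print(greet("Ann", greeting="Hi"))    # named argument
print(passthrough("Ann", "Hi", "and", "Bob", sep="-"))
print(append_to(1) is append_to(2))   # same default object both times
```

The last line is the "be careful" case: both calls return the very same list, because the default expression was evaluated only once.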
Here's a simple example. A totally elementary function,
takes two arguments, does the sum of squares.
So, that's all there is to it,
but it's not very general.
It will work if we're in the middle
of computing a hypotenuse,
but what if we have more than two numbers
and we want the sum of their squares?
This is the Pythonic way of expressing that.
The sum of squares, arbitrary arguments,
is the sum of the square
for every item in the arguments.
So, the sum of all squares
is the sum of the square of each item.
That's just about as high-level
as you can get and stay general.
If you're used to lower level languages,
you may think in these terms.
Well, start with a total of zero,
then go over all arguments
incrementing the total by the square,
and finally return the total.
That will work, too.
As I said, we do bend over backwards
to try and support people who are so attuned
to lower level programming languages
that they can't just express their problem anymore.
But the problem you're trying to solve is this one,
and this is much faster and much more natural in Python.
So, if you find yourself basically writing
Fortran in Python, try and write Python instead.
It's going to be much faster and make you much happier.
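The slides are not part of this transcript, so the following is only a sketch of the three versions being described; the function names are assumptions.

```python
def sumsq(a, b):
    # Elementary version: exactly two arguments.
    return a * a + b * b

def sumsq_general(*args):
    # High-level version: the sum of the square of each item.
    return sum(x * x for x in args)

def sumsq_lowlevel(*args):
    # Lower-level version: start at zero, loop, accumulate, return.
    total = 0
    for x in args:
        total += x * x
    return total

print(sumsq(3, 4), sumsq_general(1, 2, 3), sumsq_lowlevel(1, 2, 3))
```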
Generators are functions
which instead of returning a value
yield values.
The keyword is yield, y-i-e-l-d,
instead of return.
What's the difference?
A return gives a result
and terminates the execution of the function.
A yield gives a result
and suspends the execution of the function
at that very spot until further notice.
Specifically, the mechanism we've used
to let you resume and get the next result,
the next thing to be yielded, is iterators,
which are basically objects with a next method.
So, when you call the generator, you get the iterator.
Every time you call next on the iterator,
you get the next thing to be yielded.
And when everything is done,
when there is no more next value,
the iterator will raise a StopIteration exception.
If you use a for loop, remember the definition of
the for loop that we gave in the first half,
this is basically done for you for free.
So, enumerate, which I mentioned before, is actually a built-in,
but if it wasn't a built-in,
these are two ways we could actually program it.
One is the lower-level way,
the other one is the higher-abstraction
and faster way.
So, the job of enumerate is yielding each item
of the sequence with its index.
So, zero, first item, that's the first yield.
One, second item, second yield and so on.
So, you could think, well, I'll start at zero,
loop over the sequence,
and yield the current index, n, and the item,
and then increment for the next item.
So, when the yield executes, the function freezes.
The execution of that particular run
of the function freezes.
And when the next method is called on the iterator,
it continues by incrementing and going back.
This is the higher abstraction way of doing it.
It uses the standard library module itertools,
its tools for iteration,
by doing an element-by-element parallel zipping with count,
which is an iterator which yields the unending sequence
zero, one, two, three, and so on forever.
What we're doing is zipping the sequence,
zero, one, two, three,
with the sequence we were giving as an argument.
So, this single statement does the job of these four.
Again, think in these terms when you're programming Python.
You'll be so much happier.
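A sketch of the two approaches, with hypothetical function names. It assumes Python 3, where the built-in zip is already lazy; the Python 2.5 of the talk would need the lazy zipping function from itertools instead.

```python
import itertools

def enumerate_lowlevel(seq):
    # Start at zero, loop, yield (index, item), then increment.
    n = 0
    for item in seq:
        yield n, item   # execution freezes here until next() is called
        n += 1

def enumerate_zipped(seq):
    # Zip the unending 0, 1, 2, ... with the given sequence.
    return zip(itertools.count(), seq)

it = enumerate_lowlevel("ab")
print(next(it), next(it))            # two yields
print(list(enumerate_zipped("ab")))  # a for loop or list() drives next() for you
```

Calling next(it) a third time would raise StopIteration, which is what a for loop catches to know it is done.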
Let's give a slightly less elementary example.
One of my favorite mathematicians of all time,
Leonardo Pisano known as Fibonacci,
wrote an exercise in his "Liber Abaci"
about computing the number of rabbits produced
by a hypothetical rabbit warren.
You're maybe familiar with the Fibonacci sequence,
basically the sequence here.
So, we start with one, one,
and then every item is the sum of the previous two.
So, this is a low-level,
but perfectly acceptable way of programming.
We'll start with i and j both equal to one.
Note something I hadn't mentioned before,
you can chain assignment like in C.
i equal j equal one gives both names, i and j,
to the object one.
And then while true, which of course means
this loop will never terminate, not per se,
we do another form of multiple assignment.
We assign i, j, and i plus j to r, i, and j respectively.
If this isn't perfectly obvious,
the right hand side with the multiple values
is evaluated left to right.
So, i and then j and then the sum,
and after all the evaluations, then all the name bindings.
So, r is i,
i is j,
j is i plus j.
You could write it as three assignments,
but there's no point.
And finally, we yield this current result.
If you just did a for on this, this will run forever.
So, it is crucial that the for
has a break inside it at some point
when you have enough stuff.
So, this is, as I mentioned,
the purpose of the exercise was computing
the number of rabbits born in a farm, in a certain year.
So, for rabbits and Fibonacci,
note we have to call it to get the iterator,
we just print it with a comma to avoid breaking the line,
and then we decide when to break.
We break when we have more than 100 rabbits
because that's a large enough farm
and this is the result sequence.
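Reconstructed from the description above as a sketch: the generator name fibonacci and the list rabbits_seen are assumptions, and the Python 2 "print with a trailing comma" is replaced by collecting values and printing them on one line in Python 3.

```python
def fibonacci():
    i = j = 1                      # chained assignment
    while True:                    # never terminates by itself
        r, i, j = i, j, i + j      # right side evaluated first, then bound
        yield r

rabbits_seen = []
for rabbits in fibonacci():        # call it to get the iterator
    rabbits_seen.append(rabbits)
    if rabbits > 100:              # the caller decides when to stop
        break

print(*rabbits_seen)
```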
Another trick,
which will be very familiar to JavaScript programmers,
but is not common in other languages,
def is an executable statement.
I did mention there's no such thing as a declaration,
everything is a statement.
Def is an executable statement.
Every time def executes,
it builds a new function object and binds it to that name.
Also, scoping is lexical.
That is, if I'm looking for a variable name
and it's not bound in my current context,
I go to the context containing it,
where contexts are functions in Python.
So, consider this: Def makeAdder given an addend.
When def makeAdder compiles, it just compiles this--
when def, the outer def, executes, it just compiles this.
When makeAdder gets called, this def executes,
meaning this code gets compiled and bound to name adder
and that name adder, of an inner function,
is returned from the outer function.
This is known as a closure.
The inner function adder doesn't have anything named addend,
and so it looks outside
and finds it in the outer function.
So, for example, a23 equal makeAdder(23).
Now, a23 is a function
that will sum 23 to whatever it's given.
And similarly makeAdder(42), so now a42 is a function
that will sum 42 to whatever is given.
So, we can print a23(100), a42(100) or compose them,
they're totally independent now, because every def,
every time the inner def runs, it builds a new function object,
and they don't interfere,
so we have the expected results.
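A minimal sketch of the closure being walked through. makeAdder, adder, addend, a23 and a42 are the names used in the talk; the parameter name augend is an assumption.

```python
def makeAdder(addend):
    # The inner def runs each time makeAdder is called,
    # building a fresh function object every time.
    def adder(augend):
        # addend is not local here, so it is found in the enclosing function.
        return augend + addend
    return adder

a23 = makeAdder(23)
a42 = makeAdder(42)
print(a23(100), a42(100), a23(a42(100)))
```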
A syntax form known as decorators makes
higher order functions even smoother to use in Python.
Basically, whenever you write at decorator def something
or in Python 2.6, which is about to go beta now,
similarly for at decorator class something,
but we won't be covering anything beyond 2.5.
Whenever you write this,
it's as if you've done your def, so created your function object,
and then rebound the same name, which you used in the def,
to whatever results from passing the function object
as the argument to the decorator.
So, basically, decorator has to be a higher order function
or it could be a call to a function
returning a higher order function,
a higher order squared function.
This is a very handy syntax for all sorts of things.
Particularly it's used inside classes as we'll see.
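To make the rebinding concrete, here is a small sketch. The decorators shout and repeat and the functions they wrap are hypothetical examples, not from the talk.

```python
def shout(func):
    # A higher-order function: takes a function, returns a function.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

def repeat(times):
    # A call that returns a decorator ("higher order squared").
    def decorator(func):
        def wrapper(*args, **kwargs):
            return " ".join(func(*args, **kwargs) for _ in range(times))
        return wrapper
    return decorator

@shout
def greet(name):
    return "hello " + name

def greet_plain(name):
    return "hello " + name
greet_plain = shout(greet_plain)   # what the @shout line does for you

@repeat(2)
def cheer():
    return "hip"

print(greet("ann"), greet_plain("ann"), cheer())
```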
Okay, there's new-style classes and old-style classes.
I will not be mentioning old-style classes at all.
They're totally obsolete,
only exist for backwards compatibility.
We'll talk about new-style ones.
A class is similar to a def, in a sense.
It's also a keyword.
Class, the name, and then bases in parentheses.
If you have no bases--
Well, you should always have at least object as a base.
So, at the very least,
you'll have open paren, object, close paren, and then a body.
So, it's different from a def in this sense:
As I mentioned, when you call-- when you execute def,
the body only gets compiled, not executed yet.
When you execute class, the body gets executed.
The body's typically a series of assignments and defs
and possibly other statements
and it's executed at the time you execute class.
It basically binds names to values
and those names become attributes of the class.
So, functions, for example, become methods for the class.
Also, anything that is an attribute of a base
is also found when you look up things on the class itself,
except that you can override just like in C++ or Java.
If something is defined in a base class
and you define it differently in the derived class,
the derived class wins.
So, C++ is slightly different
because you explicitly have to say
that something is virtual. In Python,
like in Java, there's no such need.
Everything can be overridden.
Let's give a very boring example just to show.
So, class eg, that's just an example,
inherits from object, that means it has no real bases,
just the most generic object of them all.
It starts with assigning an empty list to cla,
so that's a class attribute.
The class itself holds the list,
and then it has a def
with the special name dunder init dunder.
That's the initializer
for the instance of the class which is called self.
And so it binds the new dictionary,
empty dictionary, to self.ins,
which is an attribute of the instance,
not of the whole class.
And then, we have two methods.
One does append to the class attribute,
and this one does an insertion into the instance attribute.
How we instantiate classes is simply by calling them.
There is no need
for such redundant operators as new.
We just call the class
and each time we call the class, we get a fresh new instance.
So, having just instantiated them,
I see that the cla of the instances
is the empty list
because since the instance doesn't have one,
it goes back to the class
and the ins are empty dictionaries;
there's one per instance.
Once I've called a few methods,
note that the cla is the same for both
while the ins has changed because one is a class attribute
and the other is the instance attribute.
So, if I ask are they identical objects?
Are they the same object for the cla?
The answer is true because they both go to the class,
while if I ask it for the ins,
which is by instance, then the answer is false.
They're not the same object.
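A sketch of the class being described. The names eg, cla, ins and meth2 appear in the talk; meth1, es1, es2 and the specific values passed are assumptions.

```python
class eg(object):
    cla = []                     # class attribute: one list, held by the class

    def __init__(self):
        self.ins = {}            # instance attribute: a new dict per instance

    def meth1(self, x):
        self.cla.append(x)       # appends to the shared class attribute

    def meth2(self, y, z):
        self.ins[y] = z          # inserts into this instance's own dict

es1 = eg()                       # calling the class makes an instance
es2 = eg()
print(es1.cla, es2.cla, es1.ins, es2.ins)

es1.meth1(1); es2.meth2(2, 3)
es2.meth1(4); es1.meth2(5, 6)
print(es1.cla, es2.cla, es1.ins, es2.ins)

print(es1.cla is es2.cla)        # both lookups fall back to the class
print(es1.ins is es2.ins)        # one dict per instance
```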
When, uh-- Python is kind of peculiar
among very high-level languages because it tries to make
everything very explicit for you.
Remember that rule in the Zen of Python:
explicit is better than implicit.
In particular, what it means
to look up an attribute on something
is very clearly, very detailedly specified.
So, for example,
normally something like inst dot method, open paren, arg1, arg2, close paren,
will be exactly like looking up the method
on the type of the instance and passing the instance
as an implied first argument.
More generally, whenever you do inst dot name,
whether you are about to call it or not doesn't matter.
There's a single namespace
for executable and non-executable attributes.
First of all, the string name is looked up
as a possible key into a dictionary
which belongs to the instance.
Every object, just about every object,
owns a dictionary with a special name,
dunder, dict, dunder, which is basically
where all of its attributes are organized.
If it's not there, then we try the same thing
for the type, that is, the class.
Type and class are more or less synonyms in Python,
except class is a keyword
and type is a normal identifier.
If it's not there, then we try exactly the same thing
along all the bases.
The base classes are recorded as one
of the attributes of the type,
as part of the class statement.
If we haven't found it in any of the bases,
then we try a special method, dunder getattr dunder,
which is basically there to let you compute
attributes on the fly
just in time when necessary.
If that works, then that's the result,
otherwise you finally get an attribute error
which is an exception meaning
that attribute name does not exist anywhere
in the search space, for this instance.
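The lookup order can be observed with a small sketch. The classes Base and Demo and all attribute names are hypothetical, and this follows the simplified order given in the talk (it leaves out descriptors, which come up later).

```python
class Base(object):
    from_base = "base"

class Demo(Base):
    from_class = "class"

    def __init__(self):
        self.from_instance = "instance"

    def describe(self, suffix):
        return self.from_instance + suffix

    def __getattr__(self, name):
        # Reached only after instance, class and bases all failed.
        if name.startswith("computed_"):
            return "computed:" + name
        raise AttributeError(name)

d = Demo()
print("from_instance" in d.__dict__)   # step 1: the instance dictionary
print(d.from_class)                    # step 2: the type
print(d.from_base)                     # step 3: the bases
print(d.computed_x)                    # step 4: __getattr__
print(d.describe("!") == type(d).describe(d, "!"))
```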
Now, subclassing,
and this is where overriding gets in,
means you can specify that eg is a base class of sub,
and then you can define the same thing,
the same name, meth2 in this case,
that you had already defined in the base class
and this is an override.
In the override, you can call the method
on the base class explicitly by base class dot meth2.
And remember, in this case,
you have to pass self explicitly
or you can do it implicitly with super,
which is a kind of magical piece of--
a magic built-in which does the look-up
in the base classes for you.
So, another example is that you can override data as well.
So, I've defined a subclass of list
where every append is actually done twice.
So, basically every time you append something,
it's appended twice to the list.
Not particularly useful, just an example.
The point is to show that when you subclass,
you can override a piece of data,
in this case cla,
just as much as you can override a method.
Again, there is no distinction,
just bind the same name to a different value.
Every use of that attribute
will use the overridden attribute.
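A sketch of both kinds of override. eg, sub, meth2 and cla are names from the talk; sub_super, repeater and data_overrider are assumed names, and the doubling inside the overrides is an invented behaviour just to make the effect visible.

```python
class eg(object):
    cla = []
    def __init__(self):
        self.ins = {}
    def meth2(self, y, z):
        self.ins[y] = z

class sub(eg):
    def meth2(self, y, z):
        eg.meth2(self, y, z * 2)                # explicit base call, self passed by hand

class sub_super(eg):
    def meth2(self, y, z):
        super(sub_super, self).meth2(y, z * 2)  # super does the lookup in the bases

class repeater(list):
    def append(self, x):
        for _ in range(2):                      # every append is done twice
            list.append(self, x)

class data_overrider(sub):
    cla = repeater()                            # overrides a piece of data, not a method

s = sub(); s.meth2("k", 5)
t = sub_super(); t.meth2("k", 5)
d = data_overrider(); d.cla.append("x")
print(s.ins, t.ins, d.cla, eg.cla)
```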
Another very useful concept is that of a property.
Suppose you have two methods, call them getter and setter.
Getter just takes the object and returns a value.
Setter takes the object and a value
and somehow sets the value appropriately
into the object.
You can give a very nice syntax
by calling the built-in property with a getter and setter,
as part of the class assigning it to a name.
Now, every time you access that name,
Python internally calls the getter method for you.
And every time you assign something to that name,
it calls the setter method for you.
So, that's very nice syntax
that gets substituted for function calls,
for method calls.
Note that I did say you cannot override the equal.
That only applies
when the left hand side is a simple identifier.
When the left hand side is like here,
an attribute access, you can actually do
pretty advanced stuff even to the assignment,
and property does it for you.
Why am I dwelling so much on properties?
Because there's a meme going around
particularly in Java,
but to some extent in C++ as well,
that you should not expose attributes.
To keep flexibility, you should hide attributes
behind getter and setter methods.
Thanks to properties, this is useless in Python.
You just expose the interesting attributes directly.
If and when you need a getter and a setter,
you write them
and you drop them into a property,
so that all client-code using your class
doesn't need to be changed.
It can still assign to attributes, see attributes,
and then method calls will happen intrinsically
on your behalf.
Avoid boilerplate. Don't waste pixels.
Do not code this kind of thing, just name the attribute
without the leading underscore so it's visible
and let people use it, and wrap it
if and when that becomes necessary.
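As an illustration of wrapping an attribute only when it becomes necessary, here is a sketch with a hypothetical Account class. It imagines that balance started life as a plain attribute and later needed validation; client code keeps using acct.balance unchanged.

```python
class Account(object):
    def __init__(self, balance):
        self.balance = balance          # already goes through the setter

    def get_balance(self):
        return self._balance

    def set_balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

    # The built-in property ties getter and setter to one attribute name.
    balance = property(get_balance, set_balance)

acct = Account(10)
acct.balance = 25                       # calls set_balance for you
print(acct.balance)                     # calls get_balance for you
```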
I did mention Python has operator overloading.
How it does operator overloading
is by defining a huge number of special names.
Special names start and end with two underscores.
Two underscores,
you could say underscore-underscore,
but that's a bit long,
so a common way of pronouncing it
is dunder, for double-underscore.
Anything beginning and ending with dunder is reserved
to the Python language.
Do not use this form of identifier
for your own arbitrary names.
They could conflict with special names
in the future.
There's a lot of things you can do with special methods.
There's a constructor, new, initializer, init.
Note there is a difference
between constructing and initializing.
If you're familiar
with the so-called two-step constructor design pattern,
Python gives that to you.
New actually makes a new object that's still bare,
and init lets you initialize the object.
Del, which is what happens when the object goes away.
It's not a destructor, it's more of a finalizer.
If you're familiar with C++ destructors,
that's a bit different.
This is more like a Java finalizer.
And then, there are ways to convert things,
not just to repr and string, but into float,
to complex, and so on.
Many ways to compare things:
less than, greater than, equal.
A lot of methods for arithmetic:
addition, subtraction, multiplication.
Methods to make it like a function,
so callable, hashable,
so it can go into a set or a dictionary.
Dealing with attempts to get, set, and delete attributes
or items as in a container, other container stuff.
Get and set to define what are called descriptors.
Enter and exit describe what are called contexts.
So, there's a huge number of special methods
you may want to define.
The point to retain,
you will find out the special methods,
which are basically a high syntax convenience,
if and when you need them.
Python will call the type's special method for you
when you attempt the appropriate operation.
So, for example,
when you write foo, open, close,
Python looks up the type of foo,
finds the dunder call method, and that's what gets called.
It would be like operator, open, close, paren in C++
and sometimes it does so in a more structured way.
For example, if you ask if a greater than b,
and a doesn't have a gt method,
Python continues by looking if b has an lt method,
so that it's basically going to compute
as a second possibility if b is less than a
rather than if a is greater than b.
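Both behaviours can be seen in a short sketch. Height is a hypothetical class that defines only the call and less-than special methods.

```python
class Height(object):
    def __init__(self, cm):
        self.cm = cm

    def __call__(self):
        # Makes instances callable: tall() ends up here.
        return "%d cm" % self.cm

    def __lt__(self, other):
        return self.cm < other.cm

tall, short = Height(180), Height(165)
print(tall())            # looked up on type(tall), then called
print(tall > short)      # no greater-than defined here, so short < tall is tried
```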
More generally, built-ins do things right for you,
but let's get a simple example first.
Remember the Fibonacci generator we did a few minutes ago?
This is the same thing done much more detailedly as a class.
At initialization, we set one to i and j.
It's got to be self.i and self.j
because they're instance attributes.
We have to specify that this is an iterator by saying,
if somebody wants to iterate, just use me directly.
And then there's the next,
which is unfortunately not marked by dunder.
This is fixed in Python 3.0, but this is in Python 2.
We didn't place the double-underscore,
but it is a special method because it's called,
for example, by the for loop.
And this does essentially the same thing
that we did more simply in the generator,
but it does it explicitly with self dot variables,
and this is basically intended to be used
in exactly the same way as the generator I gave before.
So, it's exactly the same semantics.
Basically, what the generator does
is generate for you an object which is more or less like this.
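A sketch of the class version, under the assumption that it mirrors the generator shown earlier; the class name Fibonacci and the list first_six are made up here. In Python 3 the method is spelled with the double underscores, and the alias keeps the Python 2 spelling the talk refers to.

```python
class Fibonacci(object):
    def __init__(self):
        self.i = self.j = 1          # instance attributes, so self.i and self.j

    def __iter__(self):
        return self                  # "if you want to iterate, use me directly"

    def __next__(self):
        r, self.i, self.j = self.i, self.j, self.i + self.j
        return r

    next = __next__                  # the Python 2 name for the same method

first_six = []
for rabbits in Fibonacci():          # the for loop calls the method for you
    first_six.append(rabbits)
    if len(first_six) == 6:
        break
print(first_six)
```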
Okay, built-in functions are what calls the special methods.
You never call a special method directly, essentially.
Think of the double-underscore
as a way to make special methods ugly,
so you're not even tempted to call them.
Never call x, dot, dunder, len, open, close.
Call len, open, x, close, it will do the right job,
which in this case is calling dunder len.
Another example, abs, don't call this directly.
You don't really know what the abs built-in does.
It will probably call dunder abs,
but suppose it doesn't find it?
It may be able to do a sign test and,
if less than zero, change the sign for you.
It may or may not,
but always go through the built-in function.
There are a lot of built-in functions,
not just ones corresponding directly to special methods.
We'll see some of them in examples.
And also, these are just the ones
that are always available to you,
but there's a lot in the standard library
that you want to use just as much,
and these are all absolutely crucial modules
in the standard library
you'll need to be very familiar with
to make effective use of Python.
As I said, rather than going into these in detail,
I like to give an example.
Suppose we have a readable file somewhere.
We'd like to make an index that is a map
from the words in the file
to the line numbers where that word is found.
So, first we build the map, and then we emit it.
To build a map, we start with an empty dictionary.
A dictionary is a natural way to represent a mapping.
We use the with open, filename, as f
syntax to open it and guarantee it's closed
as soon as we're out of this block.
We use enumerate to get line numbers
and generally looping on a file
gives you the lines of the file as strings.
And then, we call the split method
which breaks a line into a list of words
and we loop over that.
And here, we use the setdefault, which is a bit complicated
because basically what it does is
it looks up this key in the dictionary.
If it's there, it returns what corresponds to it,
the corresponding value.
If it's not, then it sets the second argument
as the new value for the key.
So, it's kind of complicated,
but basically you take either a list
already corresponding to the word
or a new empty list.
In either case, you append the line number to it,
so the line numbers accumulate.
And then, once we have our index,
we can just emit it to standard output
in alphabetical order.
So, for that, we use a sorted built-in
which lets you loop on it in a sorted way,
so sorted order for strings is alphabetical,
so you loop on the words in order and you print them.
I'm using the percent formatting here
just to have a colon attached to it,
and then we print the line numbers.
Note the comma here, so we don't break the line,
and finally a print without anything
just to break the line at the very end.
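A sketch of the two steps, build then emit. To stay self-contained it reads from an in-memory file object instead of using with open on a real file name, and it returns formatted rows rather than printing with Python 2's trailing comma; build_index, format_index and the sample text are assumptions.

```python
import io

def build_index(lines):
    indx = {}
    for n, line in enumerate(lines):          # n is the line number
        for word in line.split():
            # Existing list for this word, or a new empty one; then append.
            indx.setdefault(word, []).append(n)
    return indx

def format_index(indx):
    rows = []
    for word in sorted(indx):                 # alphabetical order
        numbers = " ".join(str(n) for n in indx[word])
        rows.append("%s: %s" % (word, numbers))
    return rows

sample = io.StringIO(u"the rabbit ran\nthe hare sat\n")
indx = build_index(sample)                    # an open file works the same way
for row in format_index(indx):
    print(row)
```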
We can do it slightly simpler
if we're familiar with the standard library
because the collections module has a sub-type of dict
known as a defaultdict.
A defaultdict is something that if you try to index it
and the index isn't there, the key isn't there,
instead of raising a key error
it calls something to make the new item.
In this case, we want to call the type list without arguments
to make an empty list and set it there.
So, once we've made a collections defaultdict list
instead of a plain dictionary, we can simply use
indx[word].append(n) instead of having to go
the complicated setdefault route.
Everything else,
every other bit of code in this slide,
is just the same.
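The same build step with collections.defaultdict might look like this sketch (the function name build_index and the sample lines are assumptions):

```python
import collections

def build_index(lines):
    # A missing key calls list() with no arguments and stores the result.
    indx = collections.defaultdict(list)
    for n, line in enumerate(lines):
        for word in line.split():
            indx[word].append(n)
    return indx

indx = build_index(["the rabbit ran", "the hare sat"])
print(sorted(indx.items()))
```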
Other things we could do that could be interesting:
once we have this index, what about getting
the seven most popular words in the text file?
What are the seven words
that appear on most lines in this file?
Well, for this purpose, we want to use heapq.
It's a typical priority heap operation,
if you're familiar with your algorithms,
and specifically it exposes heapq.nlargest.
Give me the n largest items of a certain collection
and the n we want is seven, the collection is indx,
and we get to define what is the key extractor
for the comparison.
So, key equal indx.get means
we're not getting the seven alphabetically largest,
but the seven whose corresponding value in indx
is largest, and this will work.
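A hedged sketch of the heapq.nlargest call on a tiny made-up index, asking for two words instead of seven. Note one deliberate difference: the key here is the length of each word's line list, which is one reading of "appear on most lines". Using the dictionary's get method as the key, as described above, would compare the line-number lists themselves element by element.

```python
import heapq

# Hypothetical index: word -> line numbers where it appears.
indx = {"the": [0, 1, 2, 3], "rabbit": [0, 2], "hare": [1, 2, 3], "sat": [3]}

def line_count(word):
    return len(indx[word])

# Iterating a dict yields its keys, so the collection is the words.
most_common = heapq.nlargest(2, indx, key=line_count)
print(most_common)
```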
And this is an end walk form that is--
I don't want all the lines with the word rabbit
or all the lines with the word hare.
I want the lines with both rabbit and hare.
How am I gonna find them?
This is a set intersection problem.
So, I'm basically going to simply make a list
of all the words I'm looking for,
pop the first item,
and make a set out of the corresponding line numbers,
and then I'm using the ampersand operator
to do intersection
specifically in the form ampersand equals,
so intersect in place.
Basically, all binary operators have an equal form,
so, for example, plus equal increments in place
and ampersand equal intersects in place.
And, in the end, of course, I'm careful to sort it
so it will print out nicely in alphabet, in line order.
Because the set, by making a set,
I get very fast intersection and so on,
but I lose significant order
because a set is a hash table, so it basically, as I mentioned,
doesn't have an intrinsic order.
But when I do need it, I can always sort it on the fly.
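The intersection approach can be sketched as follows. The function name lines_with_all and the sample index are assumptions, and the function expects at least one word.

```python
def lines_with_all(indx, *words):
    words = list(words)
    # Start from the lines of the first word ...
    result = set(indx.get(words.pop(0), []))
    for word in words:
        # ... and intersect in place with the lines of each other word.
        result &= set(indx.get(word, []))
    return sorted(result)          # sets have no order, so sort at the end

indx = {"the": [0, 1, 2, 3], "rabbit": [0, 2], "hare": [1, 2, 3]}
print(lines_with_all(indx, "rabbit", "hare"))
```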
So, I mentioned that the standard library
is full of very useful modules.
To access anything in the standard library
or any other module, you have to import the module
and this is how you do it.
I already showed it in a couple of examples,
but import modulename is the canonical way.
So, for example,
when I wanted to use something in collections,
I started with import collections.
This basically makes the name collections available
to my program.
And the name collections refers to that module,
so then I can use the attributes of the module
in my program.
If a module is contained in a package,
you will see exactly what a package is in a second,
then I cannot just import it
because it's hiding inside a package.
I have to tell Python
from what package to import it.
This would be a package named package
inside a package named given
inside a package named some.
So, some dot given dot package,
Python would look for some, inside that for given,
inside that for package, and inside that for modulename.
In either way, you basically get the modulename
as part of your namespace and you can access it.
And then, if you want function blah
from the module,
just do modulename dot blah and you're done.
So, there are some other ways.
I did mention that we'd rather have only one way,
but sometimes it's kind of inevitable
to offer more than one.
Although in practice these two will
make you happy all the time,
you should be aware of the others
because you'll see them used in Python examples and code.
One possibility is you can import the module
under an assumed name.
Basically, put a false mustache on the module
so it makes believe in your namespace
that it's named something different
and that is the as clause.
So, for example,
suppose somebody gives you a name that is--
this name is far too long
and this name is far too long to use conveniently,
and you would like to have a shorter name for it.
Well, this is how you do it.
Import thisnameisfartoolong as z,
and now instead of thisnameisfartoolong dot blah,
you can use zed dot blah.
And this is indeed sometimes useful
if you ever have to handle modules
whose names are far too long.
One thing I don't recommend is
instead of getting the module into your namespace,
reach into the module namespace and grab one thing into yours
from thisnameisfartoolong import blah.
This will work, but throughout the rest of your code
people who are reading it, including yourself
when you're maintaining it in six months,
will wonder where's this blah from,
and you will have to go and look for where it was imported.
So, I'd rather always see modulename.blah or z.blah
to get the immediate thing.
It's not a top level name, it's coming from something else.
And the very worst thing is to import star,
which basically grabs all of the items in the namespace
and injects them into yours.
Don't do that.
That is a guaranteed way to cause yourself headaches.
It's handy if you're at the interpreter prompt
because say you're doing a lot of math at the interpreter prompt,
you don't want to say
math dot sin, math dot cos, math dot tan, math dot atan.
You want sin, cos, tan, atan
and all the mathematical functions
to be part of the top namespace
because it's too much typing otherwise.
That's possibly the only reasonable way
to use the import star.
And this is, to clarify what I just--
So, I want to compute an arctan with two inputs.
The normal way is to import math,
and then use math.atan2 of x and y,
which in this case is this value.
If I tried even after importing to just use atan2,
this would give a NameError exception
because, in my current namespace,
there is no name atan2 defined.
Remember, I got math into my namespace,
but atan2 is still within namespace math.
I could do, from math import, atan2.
In this case, this would inject atan2
in my namespace.
Sometimes, when you're doing this interactively
like at the prompt, it's kind of nice.
In real programs,
it tends to be confusing and this goes squared.
From math import star,
now you have 25 mathematical functions
in your namespace which is handy,
but don't do it in real programs,
only in interactive use.
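The import forms discussed here, side by side as a sketch. The alias m and the flag bare_name_worked are arbitrary choices for illustration, and the star form is left out on purpose.

```python
import math                      # the module name enters this namespace

angle = math.atan2(1, 2)         # atan2 is reached through the module

try:
    atan2(1, 2)                  # bare name: not in this namespace yet
    bare_name_worked = True
except NameError:
    bare_name_worked = False

from math import atan2           # injects just that one name
import math as m                 # same module object under a shorter name

print(bare_name_worked, atan2(1, 2) == angle, m is math)
```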
So, this is nice about how I use other modules,
but how do I make my own modules?
Well, that's easy.
Any Python program is a module.
Any dot py--wot dot py is a module.
Just put it in the appropriate directory,
say the same directory as the one
it's being imported from, the rules depend a bit on--
but for Google App Engine that's what you do.
And then, any other module in that directory
can do import wot
and Python will look for wot.py.
The directories are listed in sys.path,
and they need not be directories strictly.
In general, they could be zipped files.
So, instead of having a file system,
you can basically zip everything up
and Python will find it anyway.
That's kind of handy.
There are other things that you can import.
You could import byte code files.
You could import files coded in C
or other languages for Python.
So, the extensions are pyc, pyd,
or sometimes dot so on Linux,
but this wouldn't work in Google App Engine.
Google App Engine only accepts the dot py files,
so you want to make sure you place the dot py itself.
It handles the compilation to bytecode
and making sure the bytecode is secure
and everything else under the covers.
So, this variant is nice
for other kind of uses of Python,
but don't rely on it on the application engine.
So, what's a module?
Well, it's an object. Everything is an object.
It's got attributes.
That's basically everything it has.
It doesn't have anything sophisticated,
only attributes.
The attributes of the module object
are what you could see as the top-level names
of the module as a source.
So, for example, say we have a module
whose dot py source is only x equals 23.
This is called wot dot py.
If we import wot,
then the only thing there is in it is wot dot x,
which is 23 and that's all.
Besides assignment, names can be bound
by class, def, import, and from.
Class name binds the name, def name binds the name,
import name binds the name,
from foo bar import name binds the name.
The attributes of a module are also known
as the global variables of that module.
Note there are no such things as global globals in Python.
Globals are always per module.
Within that module, you can of course access them
as bare names; from other modules,
you will access them by modulename dot variablename.
Note again, well, variables,
but can they be functions or classes?
Yes, there's no distinction.
Names are names are names.
Whether they're callable or not, they're just attributes.
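To illustrate that assignment, def, class, import, and from all bind names in the same per-module namespace, here is a small sketch. It builds a module object in memory with types.ModuleType and exec as a stand-in for a .py file; the name wot_like and the contents of the source string are made up for the example.

```python
import types

# Source text standing in for a hypothetical .py file.
source = """
import os
from math import pi
x = 23
def f():
    return globals()
class C(object):
    pass
"""

# Create an empty module object and run the source inside its namespace.
wot_like = types.ModuleType("wot_like")
exec(source, wot_like.__dict__)

# Every binding statement produced an attribute; none is treated specially.
top_level = sorted(name for name in vars(wot_like) if not name.startswith("__"))
print(top_level)

# The function's globals are exactly the attributes of its own module.
print(wot_like.f() is vars(wot_like))
```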
You can also bind and unbind module attributes
from the outside,
a practice known as monkey patching.
I would strongly recommend against that
in production code.
Unless you're fixing some bug in a library
which you're not allowed to edit, don't do it.
Sometimes it's handy for testing,
but there are better ways.
Use the dependency injection design pattern,
you will do your testing in a much more systematic way.
I would strongly recommend avoiding monkey patching.
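The contrast between the two testing approaches might look something like this. The function names and the fake clock are invented for the sketch; the point is only that the injected version takes its dependency as a parameter, while the monkey-patched version needs a module attribute rebound from the outside and restored afterwards.

```python
import time

def stamp_patchable():
    # Looks up time.time at call time, so a test has to patch the module.
    return "t=%d" % time.time()

def stamp_injectable(clock=time.time):
    # The clock is a parameter: a test just passes a different one.
    return "t=%d" % clock()

def fake_clock():
    return 42

# Monkey patching: rebind the module attribute, then restore it.
original_time = time.time
time.time = fake_clock
try:
    patched_result = stamp_patchable()
finally:
    time.time = original_time

# Dependency injection: no module is modified at all.
injected_result = stamp_injectable(clock=fake_clock)

print(patched_result)
print(injected_result)
```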
If you look for monkey patching on the web,
you'll find very strong diatribes for and against it
for every language that supports it besides Python
including Ruby, JavaScript, and so on.
It moves a lot of emotion.
Basically, people who are programmers at heart
don't want to do it,
but all these languages are used by people
who need to program but aren't programmers.
They're like hardware experts
or web experts and so on, and they don't see why
they should do things the proper way,
but I'll leave that to the web.
It's important to notice that modules are singletons.
They are the most natural and Pythonic form of singletons.
They're automatic singletons. What does it mean?
It means that if you import a module more than once,
the first import is treated very differently
from all the following ones.
The first import finds the module somewhere,
loads it, compiles it on the fly if needed
or takes the compiled form,
executes the body of the module,
so the module object is properly populated,
and places the module in a system dictionary
known as sys dot modules.
Sys is itself a module, so you can import sys
if you want to play around with it,
and sys dot modules is a dictionary.
The keys are the names of modules.
The values are the module objects.
So, when you import the module again,
Python will first look into sys.modules.
It's very, very fast to check if something is in a dictionary.
If it finds the name,
that's what it gives you immediately,
basically one instruction.
No second loading, no further execution--instant.
Your singleton is right there.
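A quick way to see the singleton behavior, using the standard library's json module purely as a convenient example:

```python
import sys

import json
first = json

# A second import statement does not load anything again;
# it just fetches the existing object from sys.modules.
import json
second = json

print(first is second)
print(sys.modules["json"] is first)
print(type(sys.modules) is dict)
```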
So, some people who love the singleton design pattern
say, "Well, yeah, this kind of works,
"but it only gives me a module.
It can't--I can't use a class instance there."
So, for example, I want my singleton object
to support addition or greater than comparison,
and I can't do that because modules are simple objects.
Even if you define a function called
under, under, add, under, under, it won't do anything.
It's not special
because it's not part of the type of the module.
Remember, the special methods only work
if they're in the type, not in the instance.
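The point about special methods can be checked with a small sketch. It uses an in-memory module object named plain (a made-up name) and attaches a function called __add__ to it; the + operator ignores that attribute because it looks on the type, not on the instance.

```python
import types

plain = types.ModuleType("plain")

# Attach a function named __add__ directly to the module object.
plain.__add__ = lambda other: 42

# The + operator looks for __add__ on type(plain), not on plain itself.
try:
    plain + 1
    outcome = "addition worked"
except TypeError:
    outcome = "TypeError"

print(outcome)

# Called explicitly by name, it is just an ordinary attribute.
print(plain.__add__(1))
```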
So, this is how you stick something that's not a module
into sys.modules.
Some people consider it a trick,
I consider it a perfectly legitimate use
of a mechanism that Python exposes for your use.
Of course, it's a very advanced one,
but it's a way to make--
Oh, by the way, dunder name is a special attribute,
which is the name of the current module.
And that's how you basically make your class,
with an underscore in front
just to indicate it's actually private
and intended for internal purposes,
and there you go.
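The slide being described is not part of this transcript, so the following is only a reconstruction of the general idea under assumptions: the module name single, the class name _Singleton, and its methods are all invented. The module's last line replaces its own entry in sys.modules with an instance of a private class, and the importer then receives that instance.

```python
import importlib
import os
import sys
import tempfile

# Source of a hypothetical module, single.py, that swaps itself
# for a class instance by assigning to sys.modules[__name__].
source = '''
import sys

class _Singleton(object):
    value = 23

    def __add__(self, other):
        return self.value + other

    def __gt__(self, other):
        return self.value > other

sys.modules[__name__] = _Singleton()
'''

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "single.py"), "w") as source_file:
    source_file.write(source)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import single

# The special methods live on the instance's type, so operators work.
print(type(single).__name__)
print(single + 1)
print(single > 10)
```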
Packages are essentially modules containing other modules
and, of course, there is no limit
to nesting you can do this way.
So, you can have sub-packages and so on.
In practice, it lives typically in a file system
or it could be in a zipped file.
In a directory--
And how you distinguish directories,
which are just directories
from directories which are packages,
is by the presence of a file called
dunder init: under, under, init, under, under, dot py.
Python will only consider as a package
a directory which has a file by that name.
So, that file contains the module body.
It's often empty because you don't necessarily want
the package to do anything else
than containing other modules.
And so, you don't need to do anything in init py,
but you still need to have it there
because it acts as a flag to Python
that, yes, this is not just any directory,
this is a package.
The modules inside the packages are
basically all the py files in that directory.
And that is done for you by Python,
so you don't need to worry about it.
And if you want sub-packages,
you will have sub-directories which in turn need to contain
the special dunder init dunder file.
Note that the parent directory of the directory package
must be on sys.path.
And once you have a package with a module bar inside it,
you could import foo dot bar,
that will work, it will bind foo dot bar.
So, you'll bind foo and give it an attribute bar,
but a more normal, more common approach is
to use a from foo import bar
that only binds the name bar directly.
So, basically, I like my imports
to always bind a module to a name,
and this is what it does.
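A minimal sketch of the package layout just described, built in a temporary directory so it can be run as is. The names foo and bar come from the talk; the attribute answer is invented. Note that Python 3.3 and later can also import a directory without the flag file as a namespace package, so the strict rule applies to the Python 2 of the talk.

```python
import importlib
import os
import sys
import tempfile

# Layout:  <parent>/foo/__init__.py  (empty flag file)
#          <parent>/foo/bar.py
parent_dir = tempfile.mkdtemp()
package_dir = os.path.join(parent_dir, "foo")
os.mkdir(package_dir)
open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(package_dir, "bar.py"), "w") as source_file:
    source_file.write("answer = 23\n")

# The parent of the package directory is what goes on sys.path.
sys.path.insert(0, parent_dir)
importlib.invalidate_caches()

# Binds the name foo and gives it an attribute bar.
import foo.bar
print(foo.bar.answer)

# Binds the name bar directly to the same module object.
from foo import bar
print(bar is foo.bar)
```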
Okay, now we've basically covered,
well, so to speak the Python language
and a little bit of the built-ins
and a very tiny fraction of the standard library.
One of the mottos of Python is "Batteries Included."
It means that the standard Python library
has more than 200 modules.
That's production modules.
There is many more for unit-tests,
encoders, decoders, demos of various kinds, and so on.
Some are pure Python, some are coded in C.
The application engine will support
any pure Python module and most C-coded modules
that are part of the standard library.
There are specific limitations
that will be covered in the App Engine sessions.
App Engine does not allow you to do threads,
does not allow you to do sockets.
That's not because they're coded in C,
it's because of its very specific execution model.
Moreover, App Engine will add some specific APIs
such as datastore, users, urlfetch, mail,
and will support any pure Python module you use.
You just put as part of your App Engine application
any dot py that doesn't use anything except other dot pys,
and you'll be fine.
It will be supported.
But back to the standard library,
the standard library is so much larger than the language
that the time it takes for an expert programmer
to master Python the language is maybe a couple of days.
Unfortunately, when he's done that,
he's still got a lot of work to do
because he probably wants to know the built-ins
and the special methods,
and the metaprogramming introspection and so on
and that takes another say three days.
Good, but now he needs to get started
with a standard library.
Out of the 200 modules, there's maybe 20 or 30
which are an absolute must
and those will take another 10 days or a bit more.
And when those are done,
to really claim they've mastered Python
as opposed to the Python language,
they probably really want to have a pretty good idea
of what's everywhere in the standard library
that's longer than all of the above.
And when they're done with all this,
well, there's third-party offerings.
I don't know how familiar you are with Monty Python,
in honor of which Python the language is named.
This is a scene from their cheese shop sketch,
which I strongly recommend.
Cheeseshop dot python dot org is a little bit less green,
but better supplied than this cheese shop.
It's got 4,000 packages and counting,
but I've given up on printing the exact number
because whatever I looked up yesterday
would probably already be false today.
There's more than one package being added every day.
It's always fun to roll your own.
I mean, if we're programmers,
it is because we like to program.
Say I need this functionality, I'll roll my own.
Unfortunately, that's not a very professional attitude.
You probably want to see who else already spent
person-years of effort into doing this,
and this is likely to be on the cheese shop.
How long does it take to learn all 4,000 packages?
Well, I have no idea.
I suspect that if you put yourself to it,
by the time you're finished there will be 4,000 more,
so you'd never be finished.
So, I have no real suggestion here except ask around.
Go to the mailing list or go to the usenet group
and just ask, "I need to do this and that.
Is there some good module you could recommend?"
You'll probably get several recommendations.
That's still a lot of work for you to pick the best one,
but better than nothing.
Remember, online resources, don't forget those.
They're all great.
You really need that
to do really good work in Python.
And here we are.
We're ready for questions and answers.
We've got about eight minutes, so I can field quite a few.
If you can please walk to the mic in the corridor
and speak your question in the mic.
man: One question.
Google is hosting Dojo and jQuery,
the JavaScript libraries.
Martelli: I'm sorry?
man: Google is right now hosting for JavaScript development--
Martelli: I can barely-- There's a lot of--
man: Is this better?
Martelli: Yes, please.
man: Okay.
Google is hosting some JavaScript libraries
that you can load directly from Google
like jQuery and Dojo and so forth.
I wonder if there's plans for hosting
the stack of third-party modules from Python
that you can pull directly?
Martelli: The reason it makes sense--
I'm not a JavaScript expert, but I believe
that the JavaScript libraries you require, you mentioned,
would typically be mentioned in another JavaScript file
or a web page and having a url,
at which there's certain to be some use.
Python doesn't work like that.
Python doesn't load code transparently
from across the web.
You put your code right there.
So, there's a lot of hosting of open source code
that we do at code.google.com,
but that's something you download
and integrate with your programs.
It doesn't live at any specific url
because that basically would serve no useful purpose.
man: Yeah, I know.
I just thought it would be a cool idea, sorry.
Second man: I see a lot of similarities
between Python and Ruby,
and I'm wondering what you think are
the major distinctions between the two languages?
Martelli: You could look for my name
and the words Ruby and Python,
and you'll find about half a dozen places
where I've written pretty long essays.
But to summarize, if I was doing a spanning tree
on the dag of distances of programming languages,
I think the first two to qualify as being the closest
would be Ruby and Python.
From the point of view of just about anybody experienced
in just about any other languages,
they're closer
than any two other languages could possibly be.
My favorite, being Italian, is that they're like
spaghettini and capelli d'angelo.
Yes, I could explain the difference.
It's gonna be hard if you're not Italian,
but I could.
Like whether the microscopical round
are rounded or squared,
but in practice, if you're eating them,
I challenge you to tell the difference
unless you're Italian, in which case you can.
But it's like in the genes.
Second man chuckling: So, I guess my question--
Martelli: So, they're so similar.
You can dwell on the differences,
but why?
In terms of practical consideration,
Python is much more mature,
so the implementations are of a much higher quality
because they've been around so much longer
and Ruby's very fashionable.
So, do you want something solid and working
or do you want something that's, like,
in the news and top of fashion?
Your pick.
[audience laughing]
I'm an engineer. I want solid working stuff.
Somebody who'd rather be cool can make a different choice.
Second man: Are Rails and Django also sort of analogs?
Martelli: Yes, except in this case
Rails has been around longer.
So, in this case, Rails is more mature than Django.
Django's getting there,
but it's not quite as rich as Rails is.
Second man: Thank you.
Martelli: Mm-hmm.
Third man: How much of this would we have to unlearn
when Python 3 comes out?
Martelli: [sighs] So, I did specifically mention
that you can finally unlearn the distinction
between plain strings and unicode strings.
Just about everything else I mentioned
is still true in Python 3.
Python 3 removes a lot of stuff
that has to be kept around for backwards compatibility
because as long as we're talking about Python 2-dot-something,
well, it has to keep compatibility.
That's a constraint we gave ourselves.
Python 3 can break compatibility
and therefore can eliminate redundant ways to do things
that you don't really need anymore.
You do have--I did mention that next gains
double-underscores before and after,
that's about it.
Third man: Well, I've heard that very simple programs
like the "Hello World" program--
Martelli: Oh, yeah, print becomes a function
instead of a statement, right,
so you'd need parentheses then.
Third man: Okay. All right.
Martelli: Pretty simple.
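The two Python 3 changes mentioned in this exchange can be sketched together. The Countdown class is a made-up example: its iterator method is spelled __next__, with an alias under the old name next as one possible way to keep Python 2 callers working, and print is called with parentheses as a function.

```python
class Countdown(object):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 spelling of the iterator method.
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    # Alias under the older Python 2 method name.
    next = __next__

values = list(Countdown(3))

# In Python 3, print is a function, so the parentheses are required.
print("Hello World")
print(values)
```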
Fourth man: So, you mentioned
that monkey patching is considered bad, but--
Martelli: By some,
and it's considered wonderful by others and--
Fourth man: In the Python community,
I guess you in particular have an issue
with monkey patching, but not directly assigning
to a module to add additional functionality.
Is there a fundamental difference there
that you see or--
Martelli: Fundamental difference between what and what?
Fourth man: Between monkey patching
and directly assigning to a module?
Um, or are they--
From my point of view, they're both--
Martelli: Assigning-- assigning--
modifying a module from outside
is monkey patching.
That's my definition of monkey patching.
Fourth man: I guess, in particular,
is there a reason that one is a useful feature
of the language and one is something to avoid?
Martelli: Mm, I-- It's useful
to be able to fix bugs without editing the buggy code,
but it's not something that you should need to do
in the long run in production.
I'd much rather fix the code,
edit the source code and fix it.
Fourth man: Okay, thank you.
Fifth man: Are there any plans
to formally support the notion of interfaces?
I know frameworks like--
Martelli: Python 3 has a standard library module
called Abstract Base Classes, ABC,
which is more powerful than interfaces in some ways,
although some people think it's less powerful in others.
That's basically what we'll have as part
of the standard library.
Otherwise, you can get third-party packages
which support interfaces strictly,
but there's some stuff
that adds absolutely no functionality,
but it makes you look good.
Fifth man: Is this somehow at odds
with the idea that you want to trust programmers?
Martelli: No, actually it's--it's--
There's nothing against trusting programmers
and letting people use abstract base classes.
The point is eliminate some redundancy.
An interface per se is a structural--
well, a way to structure your stuff.
It doesn't really eliminate much redundancy.
So, I'd normally rather--
I mean, if you look at some of the ABCs in Python 3.0,
they don't actually use the ability
to add functionality.
They work as if they were interfaces, so...
And you can use--
So, basically, instead of just having one way
when you're asking me is something a container,
I can tell you, well, it is a container
if and only if it matches the Container
abstract base class, that kind of thing.
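The "is something a container" check can be sketched with the Container abstract base class from collections.abc. The Bag class is invented for the example and does not inherit from Container; it is recognized only because it defines __contains__.

```python
from collections.abc import Container

class Bag(object):
    """Hypothetical class that never mentions Container."""

    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):
        return item in self._items

bag = Bag([1, 2, 3])

# Container behaves like an interface check here:
# anything whose type defines __contains__ matches.
print(isinstance(bag, Container))
print(isinstance([1, 2], Container))
print(isinstance(42, Container))
```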
Fifth man: Thanks.
woman: We're out of time.
Martelli: Okay. Sorry.
Right, so we're done. Thank you very much.
[audience applauding]