Thursday, January 8, 2015

PyTutorial: Advanced topics in Python

In this post I want to go over some more advanced Python programming techniques.  Whereas functions, classes, and packages are pretty much standard in every mainstream programming language out there, some of these techniques may not exist in all of them.  Still, they're pretty awesome, and definitely worth learning.

Iterate This!

Iterators are a nice, standard way of getting all the items in a container.  A container can be anything with items in it, and items can be anything.  In a strange twist, a container can even have an indefinite (even endless) number of items.  But how can you get infinite items?  Well, the idea is that you don't get all these items at once.  Instead, you get them one at a time.

Imagine you have a list of things.  You can ask the list, "give me your first thing."  And then ask for the "next thing", and the "next thing after that", etc. etc.  This way, the list can keep returning you things, and if it ever runs out, it'll just say "Stop!  That's enough!"   As you can see, because you keep asking for the "next" item in the container, in theory, the container can be endless.  It'll just keep giving you more and more junk.   If you can't imagine how this works yet, it's ok.  Just bear with me, the examples will enlighten.
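Here is a minimal sketch of that conversation, using Python's built-in iter() and next() functions.  The list contents and variable names are made up for illustration.

```python
things = ["apple", "banana", "cherry"]

it = iter(things)        # ask the list for an iterator
first = next(it)         # "give me your first thing"
second = next(it)        # "next thing"
third = next(it)         # "next thing after that"

try:
    next(it)             # there is nothing left to give
    ran_out = False
except StopIteration:    # this is the list saying "Stop!  That's enough!"
    ran_out = True
```

A for loop does exactly this asking behind the scenes, and quietly ends when it sees StopIteration.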

It's important to note that iterators don't really exist in and of themselves.  They are just a standard method for programming a way of using things that contain data.  (Whoa! That was a long sentence.)  If this sounds "abstract", well, it's because it is. :)

The good news is that you're already quite familiar with many containers that make an iterator interface available, as well as how to use iterators.  Really, you do!

Some of these containers are lists, dicts, sets, and even strings!  And how do you iterate through them?  Well, you use that trusted for loop! :)  Check this out:
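Something along these lines (the sample values are made up, and print is written with parentheses so it also runs on Python 3):

```python
for item in [1, 2, 3]:           # a list hands you its elements
    print(item)

for key in {"a": 1, "b": 2}:     # a dict hands you its keys
    print(key)

for member in {"x", "y"}:        # a set hands you its members, in no promised order
    print(member)

for char in "hey":               # a string hands you its characters
    print(char)
```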


Now you probably feel cheated.  How is this in any way advanced?  Well, for starters, just because you know about it and Python makes it easy, doesn't mean it's not advanced. :)  

But here is the challenge: how do you create an iterable container?  And why would you even want to?  :)

I'll start with the second answer.  You won't always want to.  In fact, in most cases, you can just return whatever data you want to iterate in another container that is iterable (such as a list, dict, set, etc.)  This is usually the easiest, most elegant solution for most simple problems.  But sometimes, this approach breaks.  One way this can break is when the data is indefinite. For example, say you want to iterate through all the letters that someone types in and capitalize them.  Without iterators, you have to wait until the user has finished typing in all her letters, and only then capitalize them.  But what if you want to iterate through them as she types them in? One at a time?  Well, for that, you need iterators.  That's because you only keep on asking for the "next" character that she typed, not for all of them at once.

Another example is if you need to iterate through a lot of data.  Say you have a file that contains 1,000,000,000,000 lines of code.  (fun!) And you want to analyze each line of code to see if it's potentially valid Python.  Without iterators, you'd have to load the entire file into memory before you would be able to analyze it.  And loading a file that large will simply crash your computer.  With iterators, you can analyze each line as you read it, one at a time.  That's because you only keep on asking for the "next" line in the file, not the whole file.

These are practical considerations, but there are also situations in which using iterators will make your code a lot more flexible as well as more elegant. But more on this soon.

So how the heck do we create iterators??  Well, the hard way is to use Python's iterator interface.  It's really not that hard, I'm only saying that it is because Python has an even easier and much cooler way of creating iterators..........
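For reference, here is a rough sketch of what the hard way can look like.  The class name CountUpTo is made up for illustration; the only real requirements are the __iter__() and __next__() methods (Python 2 calls the latter next(), so the sketch defines both).

```python
class CountUpTo(object):
    """Hands out 1, 2, ..., limit, one item at a time (hypothetical example)."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        # An iterator simply returns itself here.
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration      # "Stop!  That's enough!"
        self.current += 1
        return self.current

    next = __next__                  # Python 2 spells this method next()


for n in CountUpTo(3):
    print(n)
```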

Generate This!

Generators are awesome.  You can think of them as on-the-fly list creators.  Or more precisely, on-the-fly iterator creators.  Basically, generators are a type of function that, instead of returning something and being done, continuously yields things before being done.  It sounds more complicated than it really is.  Check this out:
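A minimal version might look like this.  The four yielded values are made up; any four will do.

```python
def generate_this():
    yield "one"
    yield 2
    yield "three"
    yield 4.0


for x in generate_this():
    print(x)
```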

Type it in and see what happens.  In this example, generate_this() yields 4 items.  Later, when generate_this() is used in the for loop, x gets all four items!  What's interesting here is that you can normally call a function in a for loop only if it returns an iterable container (such as a list).  But now, our function does no such thing!  Instead, it yields 4 different items!  What the heck?  Want to see something even cooler?  Add a print statement (for debugging) between each yield statement, and see what happens.  Neat huh?  Python magic.   In a nutshell, by having a function yield values, instead of returning them, the function becomes a generator: a container that can be iterated through!

Such generators can yield whatever they want and in any way they want.  Here's another example:
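For instance, a generator can yield from inside a loop and then yield something completely different at the end.  The name countdown is just an illustration:

```python
def countdown(n):
    while n > 0:
        yield n              # hand out n, then pick up right here next time
        n -= 1
    yield "liftoff!"         # one last item after the loop is done


for x in countdown(3):
    print(x)
```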

And another example:
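This sketch yields only some of the items it looks at, skipping the rest.  The function name vowels_in is hypothetical:

```python
def vowels_in(text):
    for char in text:
        if char.lower() in "aeiou":
            yield char       # only vowels make it out of the generator


for v in vowels_in("Generators"):
    print(v)
```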

Generators can even call each other:
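A small sketch of one generator feeding on another (both names are made up):

```python
def gen_numbers(limit):
    for i in range(1, limit + 1):
        yield i


def gen_doubled(limit):
    # Iterate over another generator and yield a modified version of each item.
    for n in gen_numbers(limit):
        yield n * 2


for x in gen_doubled(3):
    print(x)
```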

Try them out!  
Assignment 1: Play around with generators.  Make them yield the world!
Remember my example of iterating through what someone types in and then capitalizing it?  Well, you can write a generator that, in an endless while loop, keeps getting some text from the user (using raw_input, which is called input in Python 3).  If the user types in "done", the loop breaks and the generator ends.  Otherwise, the generator yields that user input.  Then, you can iterate through the values of that generator using a for loop, and do stuff with that input.  For instance, you can print the capitalized text, or the length of the text, or whatever you want.  Get it?
Assignment 2: do that. :)
So far so good.  Now I want to give you an example of how generators can make your code more elegant.  Do you remember that good old Fibonacci Series?  That endless series of numbers that for some reason computer scientists always love to compute?  Great!

First, let's do things the hard and ugly way:
Assignment 3: write a function, fib1, that takes in an argument n, and returns the first n items in the series.  For example fib1(6) will return a list, [1, 1, 2, 3, 5, 8]
Assignment 4: write a function, fib2, that takes in an argument n, and returns a list of all the elements that are smaller than n.  For example fib2(20) will return [1, 1, 2, 3, 5, 8, 13]
Assignment 5: write a function, fib3, that takes in an argument n, and returns all the even elements that are smaller than n.  For example fib3(20) will return [2, 8].
Do you see where I'm going here?  All your functions, fib1, fib2, and fib3, just want to do stuff with the Fibonacci Series, but because of the other constraints, each function gets ugly and error-prone.  What if there was a way that you could write the code for the Fibonacci Series once, and then use it as you see fit?  Hmmmm......
Assignment 6: write a generator, gen_fib, that takes no arguments, and yields all the items in the series.  This generator never ends...
Now you can call gen_fib() in a for loop and do with the items whatever you want!  When you break out of the for loop, you're done, and the generator basically "goes away."  (That is, you don't have to worry about something running forever).  The nice thing here, though, is that you can decide when to break out of the loop outside of the gen_fib function!  This lets you reuse the Fibonacci Series generator code in different situations.
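So as not to spoil the assignment, here is the same idea sketched with a different endless series.  The generator gen_powers_of_two is made up for illustration; the point is that the caller, not the generator, decides when to stop.

```python
def gen_powers_of_two():
    value = 1
    while True:              # this generator never ends on its own
        yield value
        value *= 2


small = []
for p in gen_powers_of_two():
    if p > 100:
        break                # the caller decides when enough is enough
    small.append(p)

print(small)
```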
Assignment 7: re-write fib1 from above to take advantage of your gen_fib generator.  Just keep appending the first n items that gen_fib yields into a list, and then return that list.  You don't have to use Python's built-in function, enumerate, but it could make your code that much cleaner.
Assignment 8: re-write fib2 & fib3 to take advantage of gen_fib.
Wonderful! :) Isn't that nicer?  But can you think of how to make your redo of fib3 even cooler??  What if, instead of returning all the even elements that are smaller than n, it would yield all the even elements, indefinitely?    The way to do this is rather simple: as you iterate through the items that gen_fib yields, you can check if that item is even.  If not, ignore it and continue with the next iteration in the loop.  But if it's even, yield it, and then continue with the loop.
Assignment 9: do that.  But call that function gen_fib_even()
Assignment 10: rewrite fib3 to take advantage of gen_fib_even().
Way cool.  If you want to see more cool generator examples, check out David Beazley's PDF.

But if you think that generators can make your code all pretty, wait until you hear about decorators... :)

Decorate That!

Put simply, decorators are cute little ways of letting you add features to a function (or even a class).  More technically, a decorator is a function designed to modify the behavior of other functions.  This may sound a bit strange, and in some ways it is. :)  But a few use-cases may help explain.

Say you have some great big program, perhaps a Minesweeper game, and it's not working all too well.  In fact, you want to debug it a bit.  To do so, you want to see a trace of which functions are called when.   Wouldn't it be nice if you could just tell Python to print something when a function is called, and then print something when it exits?  Say you had a function called some_func; wouldn't it be nice to be able to do this:

And then, later, when calling some_func(...), trace information would magically appear as well?

Well, guess what?!  Decorators let you do just that!  In Python, you can decorate a function by using the at, '@', symbol with the decorator name right before the function definition.

Unfortunately, writing decorators can be a bit of a pain.  Fortunately, there is a decorator out there that helps us write simple decorators! (After all, decorators are just functions, and thus can be decorated.)  Check this out:

There are a few things to note here:

  • To make the trace function a decorator, we are decorating it with "@decorator" (decorator is the name of the decorator that turns functions into decorators.)
  • When writing decorators with decorator, the function (in our example, trace) takes in 3 kinds of parameters:
    1. a function
    2. *args: all the positional parameters
    3. **kwargs: all the keyword parameters
  • You can have the decorator function do anything you want, but you need to remember to call the function that it's decorating.  That's done by calling f(*args, **kwargs).
    (You can read more about *args and **kwargs here and here.)
  • In our example, we also want to return that function's return value.
Now you can use your decorator to decorate other functions, as I demonstrated above on some_func().
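If you'd like to try the idea without installing anything, here is a rough hand-written equivalent.  Note that this is not the helper decorator described above: it is a sketch that uses only functools.wraps from the standard library, and the exact trace messages are made up.

```python
import functools


def trace(f):
    @functools.wraps(f)      # keep f's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        print("calling %s with args %s, %s" % (f.__name__, args, kwargs))
        result = f(*args, **kwargs)      # call the function being decorated
        print("%s returned %r" % (f.__name__, result))
        return result                    # pass its return value along
    return wrapper


@trace
def some_func(a, b=2):
    return a + b


answer = some_func(1, b=5)
```

Writing "@trace" above the definition is just shorthand for some_func = trace(some_func).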

All this may seem a bit cryptic, and it is.  Unfortunately, creating decorators isn't super simple, and there is more than one way to write them.  My goal here was to expose you to them, and show you how to use them, as well as how to write simple ones.  If you want to write more complicated ones, such as decorators that accept parameter values, think:

@trace(verbose=True)
def my_func(...):
   ....

Well, I recommend you read some other people's docs.  There is some documentation on the @decorator that I showed you (I also took the "trace" decorator example from there, and they have some other good ones too).  And here is a nice discussion that goes deeper on what decorators are and how they work.  There's also a really great Python Wiki full of useful decorators (and their source code, of course.)
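To give a rough idea of what a decorator that accepts parameter values can look like, here is one common pattern, sketched with only the standard library.  It is an assumption about how you might write it, not the approach taken by the docs mentioned above: trace becomes a function that returns the real decorator.

```python
import functools


def trace(verbose=False):
    # trace(...) is called first and returns the actual decorator.
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            if verbose:
                print("calling %s with args %s, %s" % (f.__name__, args, kwargs))
            return f(*args, **kwargs)
        return wrapper
    return decorator


@trace(verbose=True)
def my_func(x):
    return x * 2


result = my_func(21)
```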

But ultimately, it's more important that you know how to use them (@decorator_name), and to keep in mind that there is a way to create them as well.  But there is no need to memorize how to create them.

Now, decorators are interesting in theory, but I want to show you one more somewhat-practical example. The next section discusses another neat feature in Python, and in it, another simple decorator awaits...

Putting it in Context!

Another pretty cool feature in Python is with.  Its goal is to help you properly deal with resources.  A resource is anything that you need to "take", and eventually "return".  A typical example is a file.  If you want to read it, you need to first open it (take the resource), and after reading it, you should close the file (return the resource.)

Here's the less amazing way to read stuff from a file:
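Roughly like this.  The first few lines only create a small sample file so the sketch can run; the file name and contents are made up.

```python
import os
import tempfile

# Setup: create a throwaway sample file to read from.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
f = open(path, "w")
f.write("hello\nworld\n")
f.close()

f = open(path)                           # take the resource
for line in f:
    print(line.strip().capitalize())     # do something with each line
f.close()                                # return the resource
```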
This way of coding is OK, so long as what you do is fairly simple, and nothing can go wrong.  Imagine a hypothetical in which the method capitalize() raises an exception.  It would mean that the above code will finish without closing the file.  Or look at a slightly more complicated example:
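A possible version of such an example (the function name read_until_stop and the sample file are made up):

```python
import os
import tempfile

# Setup: create a throwaway sample file to read from.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
f = open(path, "w")
f.write("hello\nstop here\nworld\n")
f.close()


def read_until_stop(path):
    result = []
    f = open(path)
    for line in f:
        if "stop here" in line:
            return result        # oops: we leave without calling f.close()
        result.append(line.strip().capitalize())
    f.close()
    return result


print(read_until_stop(path))
```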
Not only will the function return without closing the file if an exception is raised, but also if a line that contains "stop here" is reached.  While these examples are rather contrived, you can see how this can be a problem.  Of course, there are work-arounds, like using a try-finally clause, like this:
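A sketch of that work-around, with a made-up function name and sample file:

```python
import os
import tempfile

# Setup: create a throwaway sample file to read from.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
f = open(path, "w")
f.write("hello\nstop here\nworld\n")
f.close()


def read_until_stop(path):
    result = []
    f = open(path)
    try:
        for line in f:
            if "stop here" in line:
                return result
            result.append(line.strip().capitalize())
        return result
    finally:
        f.close()    # runs on an early return, on an exception, or at the end


print(read_until_stop(path))
```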
The problem is that this sort of work-around may be confusing, because the try-except-finally clause is normally used for error handling, not resource handling.  Another problem with this approach shows up when handling multiple resources at once: you need to have lots of try-finally clauses.  It may get a little hairy.

SOOOO.... what's Python's solution?  As you may have guessed, it's the with clause. :)  The above example can be re-written like this:
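For example (again with a made-up function name and sample file):

```python
import os
import tempfile

# Setup: create a throwaway sample file to read from.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as out:
    out.write("hello\nstop here\nworld\n")


def read_until_stop(path):
    result = []
    with open(path) as f:        # the file is closed when this block ends
        for line in f:
            if "stop here" in line:
                return result    # the file still gets closed on the way out
            result.append(line.strip().capitalize())
    return result


print(read_until_stop(path))
```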
One nice thing to notice is that there is no need for the f.close() call.  The reason is that with takes care of all that for you.  A resource that supports with (and just about all of them do) has a built-in mechanism that tells Python how to release it.  In our file example, it calls close() automatically when the code inside the with ends, whether all the lines were read, a return was hit, or an exception was raised.  Using with pretty much guarantees that a resource will be closed appropriately.

A nice little feature that with has is allowing the use of multiple resources simultaneously.  Say you wanted to do something with three files, you could do this:
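A sketch with three made-up sample files (the setup lines just create them so the example can run):

```python
import os
import tempfile

# Setup: create three throwaway sample files.
folder = tempfile.mkdtemp()
paths = [os.path.join(folder, name) for name in ("a.txt", "b.txt", "c.txt")]
for p in paths:
    with open(p, "w") as out:
        out.write("contents of " + os.path.basename(p) + "\n")

# One with statement, three resources.  All three are closed when it ends.
with open(paths[0]) as f1, open(paths[1]) as f2, open(paths[2]) as f3:
    combined = f1.read() + f2.read() + f3.read()

print(combined)
```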
It doesn't get much better than this.

Of course, if one of these files can't be opened for some reason (perhaps it doesn't exist?) then you would still get an exception that you may want to handle appropriately.  But at least you can rest assured that all the other files, if they got opened in the process, are now closed.  Like this:
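In this sketch the second file deliberately doesn't exist (both file names are made up).  The first file gets opened, the second open fails, and with closes the first one before the exception reaches the except clause.

```python
import os
import tempfile

# Setup: one file that exists and one path that doesn't.
folder = tempfile.mkdtemp()
existing = os.path.join(folder, "a.txt")
missing = os.path.join(folder, "does_not_exist.txt")
with open(existing, "w") as out:
    out.write("hello\n")

failed = False
try:
    with open(existing) as f1, open(missing) as f2:
        print(f1.read() + f2.read())     # never reached
except IOError as error:
    failed = True
    print("Could not open a file: %s" % error)

print(f1.closed)                         # the file that did open is closed again
```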

Such is life.

Now you may say: that's all nice and well, but how do I create a resource that with can handle?  The answer to that, quite naturally, is ... there are two ways of doing it! :)

The first approach requires you to create a class that defines two methods: __enter__() and __exit__().  (See the docs for details.)  But just to give you an idea, if you were going to implement the file open function in this way, __enter__() would open the file and hand it back, and __exit__() would close it.
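A rough sketch of such a class follows.  OpenFile is a made-up name and a simplified stand-in, not how Python's own open is actually implemented.

```python
import os
import tempfile


class OpenFile(object):
    """A simplified stand-in for open() (hypothetical, for illustration)."""

    def __init__(self, path, mode="r"):
        self.path = path
        self.mode = mode
        self.file = None

    def __enter__(self):
        self.file = open(self.path, self.mode)   # take the resource
        return self.file                         # this is what "as f" receives

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()                        # return the resource
        return False                             # don't swallow exceptions


path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with OpenFile(path, "w") as f:
    f.write("hello\n")
with OpenFile(path) as f:
    contents = f.read()
```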
Assignment 11: write a Timer class, that on __enter__() starts the timer (you can use time.time() to see what time it is) and on __exit__()  stops the timer.  This way, you can start & stop the timer (your resource) using with.
The other approach is, in a way, much simpler.  Instead of creating a class, you create a decorated generator.  :)  (You must have felt this was coming, haha.)  You can see the docs for details.  But it basically goes something like this:
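A minimal sketch, assuming a made-up Resource class with take() and release() methods.  The try/finally is my addition: it makes sure the resource is released even if the code inside the with block raises an exception.

```python
import contextlib


class Resource(object):
    """A made-up resource that just remembers whether it is in use."""

    def __init__(self):
        self.in_use = False

    def take(self):
        self.in_use = True

    def release(self):
        self.in_use = False


@contextlib.contextmanager
def use_resource(resource):
    resource.take()          # take the resource
    try:
        yield                # the body of the with block runs here
    finally:
        resource.release()   # release it, no matter how the body ended
```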


First, your generator is decorated with contextlib.contextmanager.  Your generator may or may not require parameters, it's up to you.  Typically, if you expect your resource to already exist, but you're just using it now, then you probably want to have that resource as your parameter. :)  Your generator first takes the resource.  In the timer assignment, this is where you would start your timer.  Then your generator yields.  It can yield nothing or something, depending on your needs (in the next example I show you how to yield something.)  Finally, you release the resource (in the timer example, this is where you stop the timer.)

The above generator can be used like this:

with use_resource(x):
   ...
Assignment 12: create a generator called use_timer that takes in a timer object, and starts / stops it appropriately.
But, as I hinted above, this generator can also create a resource and then take it:
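Something like this sketch, where the Resource class and its open() and close() methods are made up for illustration (and, as an extra precaution, the release sits in a finally clause):

```python
import contextlib


class Resource(object):
    """A made-up resource that just remembers whether it is open."""

    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False


@contextlib.contextmanager
def use_resource():
    resource = Resource()    # create the resource...
    resource.open()          # ...and open it
    try:
        yield resource       # hand the resource to the with block
    finally:
        resource.close()     # release it when the with block ends
```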


In this example, no parameter may be required (unless you need some to help you create the resource.)  The resource is first created, and then opened.  After that, the generator yields the resource!  This way, someone can use this generator like this:

with use_resource() as x:
   ...

Notice that x gets whatever your generator yields.  Finally, just as before, you release your resource.

That's it for now!

I may come back here and add more interesting topics. I'll leave you with a few things to read about:
  • If you're not familiar with Python's string formatting abilities, this is probably as good a time to learn as any. 
  • Python has some nice collections that can help play with data.
  • Jeff wrote a nice, short book: Writing Idiomatic Python.  It's a nice way to quickly learn some neat tricks of the Python trade...