Monday, November 30, 2015

Unit-Testing Techniques

I'm a big proponent of unit-testing. In a nutshell, unit-testing means writing one program to test that another works as expected. While not perfect (who tests the unit testers?), in my humble experience, unit tests achieve two very important results:
  1. It dramatically increases the quality of your code (i.e., fewer bugs.)
  2. It dramatically increases your confidence when updating your code.
The first result is pretty self-explanatory: when you test your code, you are likely to find bugs, fix them, and thus your code becomes less buggy. One important twist here is that in complicated software, there are parts that are very ineffective or impossible to test manually, and the only way to really tell that they're working well is by having another program test them (more on this later.)

The other result is less obvious. Because unit-testing means writing code to test your code (as opposed to manually running your code to see that it works alright), you end up with the ability to quickly verify that your code is working as expected. This ability is very important when you want to make changes to your code long after you initially wrote it. At that point, no one will remember all the details of what that particular code was designed to do, and everyone fears that fixing one bug will end up creating new ones. It's unfortunate that in many organizations, the most important part of the code is the most buggy and the least touched. The reason for this is the fear that fixing one bug there will create a catastrophic new one, and thus the code remains as it is. Unit testing helps prevent this, because you will have the confidence that any updates you make will at the very least meet previous expectations.

Now, writing unit tests is a skill in and of itself. In many cases, writing the code to test the software may be dramatically more complicated than writing the software itself! The good news is that Python has some great tools to help you unit-test your work...

Unittest and nose

Two great tools to be familiar with are unittest and nose. While each can be used stand-alone, I find that they complement each other well.

Let's start with unittest. It's built into Python, and thus you don't have to install anything to use it. Its docs are great, and I highly recommend you look them over. But before you do that, I want to point out some of its highlights.

To use it, one typically creates a new Python file for testing purposes. You can name it "test_some_other_file.py" if you like. In it, you need to import the unittest module, and then create a class that is derived from unittest.TestCase. Then, to actually run the tests, you need to call unittest.main().  Here is a simple example that doesn't do anything:
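import unittest

class Tests(unittest.TestCase):
    pass

if __name__ == '__main__':
    unittest.main()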


If you run it (by running "python test_some_other_file.py"), you should get something about running 0 tests OK. Great!

Now, let's make it actually test stuff. To add a test, one only needs to add a method that starts with "test". Thus "testThis" and "test_now" are tests, while "notAtest" is not:
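import unittest

class Tests(unittest.TestCase):

    def testThis(self):
        pass

    def test_now(self):
        pass

    def notAtest(self):
        pass

if __name__ == '__main__':
    unittest.main()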
So if you run it now, it should say something about passing two tests. Yippie!

But wait, how does it know that a test passes or fails? Basically, if it calls a test method, and that method doesn't raise an exception, then it is assumed that the test passes. If it raises an AssertionError, then it assumes that the test failed. If it raises any other exception, it assumes that something else went wrong, and the test errored-out. In other words, you can think of an AssertionError as a controlled failure, and any other error as an uncontrolled failure. Both are bad. :)

An AssertionError is typically raised by an assert statement; let's take a look:
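import unittest

class Tests(unittest.TestCase):

    def test_this(self):
        assert 1 + 1 == 2

    def test_that(self):
        assert 1 + 1 == 3, "problem!!"

    def test_another(self):
        1 / 0  # not an AssertionError -- this test errors-out

if __name__ == '__main__':
    unittest.main()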
If you run it now, you should see a bunch of information. It should say that it ran 3 tests, and that test_another() got an error, with details about it. It should also say that test_that() failed. Notice that it gives you "problem!!" as added information. In an assert statement, the first expression is the test, and if it's True, then it "passes" the test. If it's False, then it fails, and if there is a comma, then what comes after it is basically a note with additional info for the tester.
Assignment 1: Create a test file that tests a bunch of random stuff.
Now, asserts are nice and all, but when they fail, they don't give you much information. It's up to you to manually add it as a "helper note" after the comma. To make life easier for you, unittest.TestCase offers a bunch of helper "assert" methods. Let's take a look at one that I really like:
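import unittest

class Tests(unittest.TestCase):

    def test_annoying(self):
        assert {'hi': 1, 'there': 2} == {'hi': 1, 'there': 3}

    def test_nice(self):
        self.assertEqual({'hi': 1, 'there': 2}, {'hi': 1, 'there': 3})

if __name__ == '__main__':
    unittest.main()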

Notice how the two tests are basically the same (and both fail). But when test_annoying() fails, you don't get much information about why, whereas when test_nice() fails, it tells you exactly why: the "there" entries have different values! How nice is that?
Assignment 2: Explore the different helper "assert" methods that unittest has to offer. Play around with a few that look particularly interesting to you.

A "real" example

So far, it's all been very theoretical. In practice, you're going to want to test actual code. Say you have a Python file called "module1.py", which contains:
def square(num):
    return num*num

def is_even(num):
    return num % 2

(You can copy-paste this into such a file.)

To test it, you may create a "test_module1.py" which contains:
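import unittest

from module1 import *

class Tests(unittest.TestCase):

    def test_square(self):
        self.assertEqual(square(3), 9)
        self.assertEqual(square(-2), 4)

    def test_even(self):
        pass

if __name__ == '__main__':
    unittest.main()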
Go ahead and run it. It should say that it passed 2 tests OK. Great! So is all good? Of course not! Our test_even() method doesn't really test anything. :)
Assignment 3: fill in test_even() with some tests (is_even() should return True if the number is even, and False if it's odd.)
Great! Did it pass your tests? If not, did you find the bug in is_even()? Unit testing is great because it helps you discover subtle (and sometimes not so subtle) bugs.

The nose knows...

Remember that I mentioned nose earlier? It's ok if you didn't. :) I won't be offended, really.

Well, nose has a few nice benefits on top of unittest. The most important one (in my opinion) is that it lets you organize your tests better. Instead of putting your tests in the same folder as your code, nose lets you easily place them in a sub-folder, called "tests". It's important for both your main and your "tests" folders to contain an "__init__.py" file, which will be empty, in order for you to easily access the files that you want to test from within your test files. An example will clarify.
So say your folder looks like this:

+ my_code
   - module1.py
   - module2.py
   - main.py

Then, to add tests, you may want to do something like this:
+ my_code
   - __init__.py
   - module1.py
   - module2.py
   - main.py
   + tests
      - __init__.py
      - test_module1.py
      - test_module2.py
      - test_main.py

Now we can take our test_module1.py file from our previous example, and modify it like so:
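import unittest

from ..module1 import *

class Tests(unittest.TestCase):

    def test_square(self):
        self.assertEqual(square(3), 9)
        self.assertEqual(square(-2), 4)

    def test_even(self):
        pass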
Please notice two main differences:
  1. Instead of the line "from module1 import *" we now have "from ..module1 import *". The ".." are needed because now test_module1.py is in a sub-folder (tests). This is only possible to run properly if you have the two "__init__.py" files.
  2. There is no longer a call to unittest.main().
With nose, all we need to do to run the tests is run the command "nosetests". We can run it either from the main "my_code" folder, or from the "tests" folder within. Nosetests looks for any file that begins with "test" and checks to see if it has any tests in it. If it does, it runs them for you! The big advantage here is that in a large project, you can have dozens of different test files, and now, with a single command, you can run all of them at once!
Assignment 4: install nose, and put some stuff in "module2.py" that you can test in "test_module2.py". See how running nosetests runs all your tests! 
If you want to run a test in just one file, you can run "nosetests test_module1.py". If you want to run a particular class, you can do "nosetests test_module1.py:Tests" or "nosetests test_module1.py:Tests.test_square".

If you want to display the output (if any) of your script as it runs, run "nosetests -s".

This is a no-conflict zone...

An important matter when writing unit tests is making sure that they don't conflict. Not with each other, not with tests in other test files, and not with your system in general. Because the order in which tests run is pretty much arbitrary, it's good practice to make sure each test runs in a "bubble", one that doesn't depend on other tests running before it, and doesn't mess things up for tests running after it.

Creating such a bubble for your tests to run in is not always an easy feat to accomplish. There are some tools that I will talk about that help you achieve this, but it's something that always has to be in your head.

Not every test is difficult to build safely. Testing the square() and is_even() functions above requires no special effort because they don't interact with the environment in any way. But say you had a test that had to read a file, and say the test depends on that file having particular data in it in order to run OK. You may just write the file somewhere once, and then test on it. But what happens if another test modifies or erases that file? In this case, your test will fail, and it may not always be obvious why. A good approach would be to set up the test by re-creating this file before running the actual test; this way you always ensure that the file is the way you expect it to be. Even better would be to clean up the test by removing this file, or perhaps restoring the file that was originally there.

To help you with this, unittest.TestCase has four methods to be aware of: setUp(), tearDown(), setUpClass(), and tearDownClass().

setUp() and tearDown() run right before and right after each test is run (i.e. test_square() and test_even() from our example above.) setUpClass() and tearDownClass(), as the name implies, run at the start and finish of running a class full of tests.

Thus if you need to have a certain file in a certain place, you can create it in setUp(), and remove it in tearDown(). If for some reason creating this file is very time consuming, you may want to create it once in setUpClass(), and then copy it to where the tests are expecting it in setUp(). Just don't forget to remove it in tearDownClass().
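For instance (the file name here is just for illustration):

import os
import unittest

class Tests(unittest.TestCase):

    def setUp(self):
        # Re-create the file each test expects, so every test starts in a known state.
        with open('data.txt', 'w') as f:
            f.write('expected contents\n')

    def tearDown(self):
        # Clean up after ourselves so we don't mess things up for other tests.
        os.remove('data.txt')

    def test_read(self):
        with open('data.txt') as f:
            self.assertEqual(f.read(), 'expected contents\n')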

Files are just one type of resource that may require set-up and clean-up. If you're working with a database, then it may need to be set up and cleaned up properly as well. A good database framework, like Django, will help you out with this.

The key is, when running a test, to ask yourself how it depends on or affects the environment, and to see if, with proper set-up and clean-up, you can remove that dependency / effect.

Let's play pretend...

Code in the real world can get quite complicated, and testing it can become a very dirty feat. What if, say, you want to test a function that behaves differently at different times of the day?
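Something like this, perhaps (the exact greetings here are my own invention):

from datetime import datetime

def greetings():
    hour = datetime.now().hour
    if 6 <= hour < 12:
        return "Good morning!"
    elif 12 <= hour < 18:
        return "Good afternoon!"
    else:
        return "Go to sleep!"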

You can see how testing this may be a problem. Now, you may be able to think of different solutions to this, such as passing hour as a parameter to the function. That may be fine, and in some cases, re-structuring the code to make it cleaner and more testable is a good idea. But in some cases, restructuring is not a good option. Perhaps a better illustration is this:
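import urllib2  # in Python 3, use urllib.request instead

def greetings():
    try:
        # The URL is made up -- imagine a web service that serves up greetings.
        return urllib2.urlopen("http://example.com/greeting").read()
    except urllib2.URLError:
        return "Hello."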
In this contrived example, greetings() first attempts to get an appropriate greeting from the internet, but if the internet connection is down, then it returns "Hello." Here, the structure of the function is not the problem; the problem is the unreliability of the internet.

Testing these sort of functions can get messy. If you find yourself putting test-specific code in your code, you are probably doing something wrong:
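from datetime import datetime

def greetings(testing=False):
    if testing:
        hour = 7  # pretend it's morning whenever we're testing... yuck!
    else:
        hour = datetime.now().hour
    if 6 <= hour < 12:
        return "Good morning!"
    else:
        return "Go to sleep!"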
The above is not only ugly, but may actually introduce new bugs to your code!! In situations like these, it's good to be able to play pretend.

unittest.mock, or if you're using a Python older than version 3.3, the standalone mock package, is an extremely powerful tool in the tester's arsenal.  Besides the library's documentation, there is a great tutorial that I highly recommend that you go over. Here I will only offer a little glimpse to better understand its power:
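Here's a "tests/test_greetings.py" along those lines, testing the time-of-day greetings() from above (I'm assuming the project package is called "my_code", as in the earlier folder example):

import datetime
import unittest

import mock  # on Python 3.3+, you can do: from unittest import mock

from .. import greetings

class Tests(unittest.TestCase):

    @mock.patch('my_code.greetings.datetime')
    def test_greetings(self, mock_datetime):
        # Pretend it's around 7:15 in the morning...
        mock_datetime.now.return_value = datetime.datetime(2015, 11, 30, 7, 15)
        self.assertEqual(greetings.greetings(), "Good morning!")

        # ...and now pretend it's 1:15 at night.
        mock_datetime.now.return_value = datetime.datetime(2015, 11, 30, 1, 15)
        self.assertEqual(greetings.greetings(), "Go to sleep!")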
The above example lets me "mock" different return values for datetime.datetime.now(). First, I set it to around 7:15am, and then to 1:15am. Each time, greetings.greetings() is called in the same way, but returns a different (and appropriate) value.

But to get things to work well with nose, notice a few things here:

  1. I'm patching (using '@mock.patch') the datetime module as imported in greetings. This is important because if I mock 'datetime.datetime', it won't affect the greetings module. Relating to this, I had to import the greetings module by doing "from .. import greetings".
  2. Decorating a test function with @mock.patch will cause a mock parameter to be passed to the test function. In this example, I call it mock_datetime.
  3. To change the return value of a function inside a mocked module, simply change the appropriate mocked parameter. Thus to change greetings.datetime.now(), I had to mock greetings.datetime as mock_datetime, and then set mock_datetime.now.return_value to whatever I wanted.
  4. Notice that you can set the return_value as many times as you want within a test function.
  5. The decorator @mock.patch automatically "unmocks" the module as soon as the test function exits, thus there is no additional set-up or clean-up to perform.
Now there is a lot more you can do with the mock module! But I just wanted to give you a little practical glimpse. The beauty here is that mocking lets us test complicated code without having to change it.


Final thoughts...

But more than just a skill, unit-testing is a mindset. For it to have its impact on a project, it's important to understand that writing well-tested code is an investment. If one puts in the time to test the code, one saves time later by dealing with fewer bugs.

But as with any software project, it's important to have good tools at your disposal. Unittest, nose, and mock are such tools. But they are not the only tools out there.

For instance, when looking for a good Python framework for writing web applications, of the many advantages that Django offers, not least is its well thought-out and integrated unit-testing capability. This means that making sure your code works is all that much easier using Django. All the better!

Now, I haven't covered everything that unit-testing may involve. Some other tools / techniques that may (or may not) be of interest are:

  • doctest: a neat way of having the unit tests be a part of the documentation!! (A tiny taste follows below.)
  • Code coverage: a convenient way to spot parts of the code that aren't being tested. Of course, this doesn't mean that the parts of the code that are being tested are being tested properly.
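To give you that taste of doctest, the tests live right inside the docstring, and doctest runs them:

def square(num):
    """Returns num squared.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return num * num

if __name__ == '__main__':
    import doctest
    doctest.testmod()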





Monday, November 16, 2015

Django Hints & Tips

So calling what I'm about to do here a tutorial is a bit of an over-statement. Since Django has so many online tutorials (including Django Girls for Python beginners), books, a great reference, and no less than a gazillion packages, I figured trying to compete with all that will not only be a lot of work, but also kind of pointless.

And of course, there are also the topics and howto, which should be explored.

In fact, because of the enormity of the resources out there, I figured that perhaps a few hints and tips about a particular few gems that I like could be nice. So here goes (in no particular order):

Grappelli


Grappelli is a cute little Django-app that lets you upgrade the look & feel of the admin site with very little effort. Its biggest drawback is that it may take them a short bit to update their app to work with the latest Django version, but I found this not to be a big deal at all.

Django-solo

Django-solo is another Django-app, one that lets you work with models that should only have one record. You either need this or you don't.

Django-apptemplates

Django-apptemplates is a big deal. Django's built-in template inheritance is nice, but this really makes it complete (and I feel should be part of Django's core.) Typically, when extending a template, you write something like this:
{% extends "admin/base.html" %}
The above means you want to extend "base.html", and you're hinting that it's in the "admin" app. But there are two problems with it:
    1. It only lets you extend templates, not modify them. This means that if you're trying to create your own version of "admin/base.html", this will fail. It confuses your "admin/base.html" with the one you're trying to extend.
    2. Say you want to modify a template that is already modified by another app (such as an admin template that is modified by Grappelli.) In this case, the standard approach either fails or is very confusing to work with.
Django-apptemplates solves all this by letting you explicitly specify which app you want to extend the template from. Thus to update the admin's "base.html", all you have to do is this:
{% extends "admin:admin/base.html" %}
Or to update Grappelli's version of it, do this:
{% extends "grappelli:admin/base.html" %}
QED.

Create your own template tags and filters

This is part of Django's core features that I like. It lets you customize what you can do from within your Django templates, which sometimes can come in handy. You can read the full details in the howto. But here are a few pointers on how to do it (a small sketch follows the list):
  • Directory structure is important, make sure to put them in a folder called "templatetags"
  • Don't forget to load your custom tags & filters from within the template:
    {% load my_tags_file %}
  • You need to register your tags & filters, but that's easy using the decorator:
    @register.filter
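Putting those pointers together, a minimal "templatetags/my_tags_file.py" might look like this (the filter itself is just for show):

from django import template

register = template.Library()

@register.filter
def shout(value):
    # Usage in a template: {{ name|shout }}
    return ("%s" % value).upper()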

Making complicated queries

Django's ORM has a simple yet powerful mechanism (Q objects) to enable arbitrarily complex queries. Check it out, you can do things like:
Poll.objects.get(
    Q(question__startswith='Who'),
    Q(pub_date=date(2005, 5, 2)) | Q(pub_date=date(2005, 5, 6))
)

Upgrade your debugging experience!

Django's default debugger is "OK". But if you really want to enjoy hunting-down bugs, get Werkzeug!

The easiest way to do this is to install django-extensions and Werkzeug (don't forget to update your settings.py file appropriately.)
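For django-extensions, updating settings.py mostly means adding it to INSTALLED_APPS:

INSTALLED_APPS = (
    # ... your other apps ...
    'django_extensions',
)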

That's it, now just run:
python manage.py runserver_plus

The advantage of going the django-extensions approach is that you get a lot of other nice goodies that way.

OK, fine! I'll talk a bit about databases...

There is one major problem with Django: it's too damn good! It takes care of so many things for you that it's easy to forget a few important things. Specifically, Django attempts to shield you as much as possible from the underlying database (whichever you may choose to use), so much so that it's easy to forget that the database is even there!

But it is there. And it has its demands.

So, what is a database? Basically, it's a way to manage a BUNCH of information efficiently. By BUNCH, I mean BUNCH. As in, as much as you want. Megabytes? Gigabytes? Terabytes? Petabytes? Yumibytes? You got it!

With databases, it's fairly easy to store, look-up, update and delete information, without having to worry too much about things like running out of memory, which is nice.

Now, while databases tend to do this fairly efficiently, efficiency has its costs, and therefore it leaves it up to the database architect (or Django programmer) to tell it how to make it efficient. So... how do you tell a database to be efficient?

It's actually very simple, but before I get into it, I want to tell you a little bit about databases. Databases have tables (a.k.a "models" in Django), and each table has a bunch of columns ("fields" in Django), and a bunch of rows ("objects" in Django.) I'm going to use these terms interchangeably now just to throw you off. Now, because a table may have billions of rows, not all of them can be in memory at once. In fact, most of them remain in a file, and only when a particular row is fetched, does the database go to the right place in the file to read it. 

So far so good. But how does the database know where in the file that particular row resides? Does the database need to search the whole file to get to that row?

Obviously, the answer is not quite "yes", but not quite "no", either. :)

An index for your thoughts...

Let's imagine that you have a table called "people" with columns such as "first_name", "last_name", "date_of_birth", "city", and "gender".

To create such a table in Django, you need to specify a model, "People" with the mentioned fields.
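Something like this (the field sizes are my own guesses):

from django.db import models

class People(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    date_of_birth = models.DateField()
    city = models.CharField(max_length=100)
    gender = models.CharField(max_length=20)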

Now imagine you have all the people of the world who ever lived in this table. And now you want to find out when "Ido Carmi" was born. So you do something like this in Django:

for p in People.objects.all():
    if p.first_name == "Ido" and p.last_name == "Carmi":
        print "DOB:", p.date_of_birth

Great! Except that if you do that, Django has to retrieve all the billions of rows from the file just so you can check (in Python) if that particular person is the one you are looking for. It will work, it will just be slow.

So you say, I'll just do this:
for p in People.objects.filter(first_name="Ido", last_name="Carmi"):
    print "DOB:", p.date_of_birth

Much better! Except that now, instead of Django getting all of the rows, the database will have to get all the rows to do the check for you, and will only pass the matching rows to Django. This is already much faster because it removes all the overhead of translating all the database rows to Django objects, but the burdensome task of going through all the rows in the file still occurs. Why is that?

The short answer is that the database doesn't inherently know where the "Ido Carmi" rows exist. The file isn't sorted in a meaningful way. So to make sure it didn't miss any, the database has to examine the file in its entirety, even if it found a few rows at the start.

This, clearly, is not so great, and surely there must be a way to fix that! There is, and it's quite simple. It's called an "Index".  Remember that. It's key to database performance.

An index is implemented differently in different databases, but it's basically a sophisticated Python dictionary, with the keys being one or more column values, and the values being the positions in the huge file where the rows with the matching column values are stored. This way, all the database needs to do is look up the very efficient index to see where in the large file the data that you want is stored.

Now the catch is that maintaining such an index has its costs. Otherwise, one would index every field in the database! So you want to index fields that you "look up" often. 

For instance, if you look up people by city very often, then you probably want to have that field indexed. You do that in Django by adding db_index=True to the model's field, like so:

class People(models.Model):
    ...
    city = models.CharField(max_length=100, db_index=True)

Now that column will be indexed, and the look-ups will be very fast!

But, you say, in the original example the lookup was on two columns, what do we do then? Do we add an index to each column individually? Well, that is better than nothing, but not the best. Adding two indexes, one for first_name and the other for last_name, means that the database will have to consult two indexes in order to figure out where to look. Not the end of the world, but in large tables this can still be a little slow. It's better to build one index on two fields! In Django, you use the Meta option, index_together, like so:

class People(models.Model):
    ...
    class Meta:
        index_together = ["first_name", "last_name"]

Now the database will build a super-index that will be able to find all the "Ido Carmi"s as quickly as the database can muster.

But what if you don't really care about when all the Ido Carmis were born, and you just want to know how many of them there are? In this case, you can do something like this:

num_idos = People.objects.filter(first_name="Ido", last_name="Carmi").count()

"Duh!" you say. "What's the big deal about THAT?" Well, the big deal here is that Django's magic is at play here. You may be imagining that 

People.objects.filter(first_name = "Ido", last_name="Carmi")

is returning a list of matching objects, and then the standard len() function just returns the length of that list. It's a nice thought, a very nice thought, but wrong. :)

Doing its magic in magical Django ways, People.objects.filter() is designed to adapt itself to how it is being used. Because it's within a len(), instead of returning a list of objects, it actually asks the database to simply count the number of matching rows (which it can do by consulting the index, without ever looking at the huge file). This is super fast and efficient, and should make you smile inside.

A final note...

One final note to consider with indexes... Say you want to find all the people that are currently "female". Sure, you can put an index on that field, but it won't help you much. The reason is that there are basically around 10 different genders out there, so each index entry would point to about a billion rows! Sure, the database won't have to scan 10 billion rows now, "only" 1 billion, but since maintaining an index has its costs, and the benefit now isn't that significant, you may just choose to avoid this index and hope for the best. Generally, it's better to index columns that have many distinct values (like "first_name", "last_name", "date_of_birth", and "city").

It's time for the Prime!

Now that we got indexes out of the way, we can talk a bit about "Primary Keys". Well... later... I'm sleepy now. :)

Monday, November 2, 2015

Good vs Perfect

This is the map of the West Bank that was proposed back in 2000. In exchange for Israeli recognition of a Palestinian state in just about the entire West Bank, Israel asked for an "end of the conflict."
Many Palestinian moderates of the time (including Edward Said) adamantly rejected this proposal because it didn't allow for millions of Palestinians to "return" to Israeli lands (in quotes because most would have been descendants who were not actually born in Israel) -- a "return" that would have destroyed Israel as the homeland of the Jewish people.
It's been 15 years since that proposal. Many wars have been waged since, and many lives and opportunities lost. Although Israelis have suffered greatly during these 15 years of wars, I can't help but get upset at the thought of how the Palestinians have been betrayed by Arafat and the rest of their leadership.
15 years!! An entire generation of Palestinians could have been raised as citizens in their own state, free to build a better future for themselves, had this very reasonable proposal been accepted. Instead, all they got was stagnation, corruption, wars, and humiliation.
Now I'm not saying that this proposal was perfect, nor that it's reasonable to implement it now. It seems like so much has changed for both Israelis and Palestinians.
A wise person once told me "not to let the perfect get in the way of the good." It's a shame that 15 years ago, perfection got its way.

Thursday, January 8, 2015

PyTutorial: Advanced topics in Python

In this post I want to go over some more advanced Python programming techniques.  Whereas functions and classes and packages are pretty much standard for every normal programming language out there, some of these techniques may not exist in all of them.  Still, they're pretty awesome, and definitely worth learning.

Iterate This!

Iterators are a nice, standard way of getting all the items in a container.  A container can be anything with items, and items can be anything.  In a strange twist, a container can even have an indefinite (even endless) number of items.  But how can you get infinite items?  Well, the idea is that you don't get all these items at once.  Instead, you get them one at a time.

Imagine you have a list of things.  You can ask the list, "give me your first thing."  And then ask for the "next thing", and the "next thing after that", etc. etc.  This way, the list can keep returning you things, and if it ever runs out, it'll just say "Stop!  That's enough!"   As you can see, because you keep asking for the "next" item in the container, in theory, the container can be endless.  It'll just keep giving you more and more junk.   If you can't imagine how this works yet, it's ok.  Just bear with me, the examples will enlighten.

It's important to note that iterators don't really exist in and of themselves.  They are just a standard method for programming a way of using things that contain data.  (Whoa! That was a long sentence.)  If this sounds "abstract", well, it's because it is. :)

The good news is that you're already quite familiar with many containers that make an iterator interface available, as well as how to use iterators.  Really, you do!

Some of these containers are lists, dicts, sets, and even strings!  And how do you iterate through them?  Well, you use that trusted for loop! :)  Check this out:
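for item in [1, 2, 3]:
    print item

for letter in "hello":
    print letter

for key in {'a': 1, 'b': 2}:
    print key

for thing in set(["spam", "eggs"]):
    print thing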


Now you probably feel cheated.  How is this in any way advanced?  Well, for starters, just because you know about it and Python makes it easy, doesn't mean it's not advanced. :)  

But here is the challenge: how do you create an iterable container?  And why would you even want to?  :)

I'll start with the second answer.  You won't always want to.  In fact, in most cases, you can just return whatever data you want to iterate in another container that is iterable (such as a list, dict, set, etc.)  This is usually the easiest, most elegant solution for most simple problems.  But sometimes, this approach breaks.  One way this can break is when the data is indefinite. For example, say you want to iterate through all the letters that someone types in and capitalize them.  Without iterators, you have to wait until the user finished typing in all her letters, and only then capitalize them.  But what if you want to iterate through them as she types them in? One at a time?  Well, for that, you need iterators.  That's because you only keep on asking for the "next" character that she typed, not for all of them at once.

Another example is if you need to iterate through a lot of data.  Say you have a file that contains 1,000,000,000,000 lines of code.  (fun!) And you want to analyze each line of code to see if it's potentially valid Python.  Without iterators, you'd have to load the entire file into memory before you would be able to analyze it.  And loading a file that large will simply crash your computer.  With iterators, you can analyze each line as you read it, one at a time.  That's because you only keep on asking for the "next" line in the file, not the whole file at once.

These are practical considerations, but there are also situations in which using iterators will make your code a lot more flexible as well as more elegant. But more on this soon.

So how the heck do we create iterators??  Well, the hard way is to use Python's iterator interface.  It's really not that hard, I'm only saying that it is because Python has an even easier and much cooler way of creating iterators...

Generate This!

Generators are awesome.  You can think of them as on-the-fly list creators.  Or more precisely, on-the-fly iterator creators.  Basically, generators are a type of function that, instead of returning something and being done, continuously yields things before being done.  It sounds more complicated than it really is.  Check this out:
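def generate_this():
    yield 1
    yield 2
    yield "three"
    yield [4]

for x in generate_this():
    print x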

Type it in and see what happens.  In this example, generate_this() yields 4 items.  Later, when generate_this() is used in the for loop, x gets all four items!  What's interesting here is that you can normally call a function in a for loop only if it returns an iterable container (such as a list).  But now, our function does no such thing!  Instead, it yields 4 different items!  What the heck?  Want to see something even cooler?  Add a print statement (for debugging) between each yield statement, and see what happens.  Neat huh?  Python magic.   In a nutshell, by having a function yield values, instead of returning them, the function becomes a generator: a container that can be iterated through!

Such generators can yield whatever they want and in any way they want.  Here's another example:
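def count_up_to(n):
    # Yields the numbers 1 through n, one at a time.  (My own example.)
    i = 1
    while i <= n:
        yield i
        i += 1

for x in count_up_to(5):
    print x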

And another example:
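def endless_evens():
    # This generator never ends -- it's up to the caller to stop asking!
    i = 0
    while True:
        yield i
        i += 2

for x in endless_evens():
    if x > 10:
        break  # we decide when we've had enough
    print x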

Generators can even call each other:
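def evens_up_to(n):
    # One generator iterating through another.
    for i in count_up_to(n):
        if i % 2 == 0:
            yield i

for x in evens_up_to(10):
    print x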

Try them out!  
Assignment 1: Play around with generators.  Make them yield the world!
Remember my example of iterating through what someone types in and then capitalizing it?  Well, you can write a generator that, in an endless while loop, keeps getting some text from the user (using raw_input).  If the user types in "done", the loop breaks and the generator ends.  Otherwise, the generator yields that user input.  Then, you can iterate through the values of that generator using a for loop, and do stuff with that input.  For instance, you can print the capitalized text, or the length of the text, or whatever you want.  Get it?
Assignment 2: do that. :)
So far so good.  Now I want to give you an example of how generators can make your code more elegant.  Do you remember that good old Fibonacci Series?  That endless series of numbers that for some reason computer scientists always love to compute?  Great!

First, let's do things the hard and ugly way:
Assignment 3: write a function, fib1, that takes in an argument n, and returns the first n items in the series.  For example fib1(6) will return a list, [1, 1, 2, 3, 5, 8]
Assignment 4: write a function, fib2, that takes in an argument n, and returns a list of all the elements that are smaller than n.  For example fib2(20) will return [1, 1, 2, 3, 5, 8, 13]
Assignment 5: write a function, fib3, that takes in an argument n, and returns all the even elements that are smaller than n.  For example fib3(20) will return [2, 8].
Do you see where I'm going here?  All your functions, fib1, fib2, and fib3, just want to do stuff with the Fibonacci Series, but because of the other constraints, each function gets ugly and error-prone.  What if there was a way that you could write the code for the Fibonacci Series once, and then use it as you see fit?  Hmmmm......
Assignment 6: write a generator, gen_fib, that takes no arguments, and yields all the items in the series.  This generator never ends...
Now you can call gen_fib() in a for loop and do with the items whatever you want!  When you break out of the for loop, you're done, and the generator basically "goes away."  (That is, you don't have to worry about something running forever).  The nice thing here, though, is that you can decide when to break out of the loop outside of the gen_fib function!  This lets you reuse the Fibonacci Series generator code in different situations.
Assignment 7: re-write fib1 from above to take advantage of your gen_fib generator.  Just keep appending the first n items that gen_fib returns into a list, and then return that list.  You don't have to use Python's built-in function, enumerate, but it could make your code that much cleaner.
Assignment 8: re-write fib2 & fib3 to take advantage of gen_fib.
Wonderful! :) Isn't that nicer?  But can you think of how to make your redo of fib3 even cooler??  What if instead of returning all the even elements that are smaller than n, it would instead yield all the even elements, indefinitely?    The way to do this is rather simple: as you iterate through the items that gen_fib yields, you can check if each item is even.  If not, ignore it and continue with the next iteration in the loop.  But if it's even, yield it, and then continue with the loop.
Assignment 9: do that.  But call that function gen_fib_even()
Assignment 10: rewrite fib3 to take advantage of gen_fib_even().
Way cool.  If you want to see more cool generator examples, check out David Beazley's PDF.

But if you think that generators can make your code all pretty, wait until you hear about decorators... :)

Decorate That!

Put simply, decorators are cute little ways of letting you add features to a function (or even a class).  More technically, it's a function designed to modify the behavior of other functions.  This may sound a bit strange, and in some ways it is. :)  But a few use-cases may help explain.

Say you have some great big program, perhaps a Minesweeper game, and it's not working all too well.  In fact, you want to debug it a bit.  To do so, you want to see a trace of which functions are called when.   Wouldn't it be nice if you could just tell python to print something when a function is called, and then print something when it exits?  Say you had a function, called some_func, wouldn't it be nice to be able to do this:
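@trace
def some_func(x, y):
    return x + y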

And then, later, when calling some_func(...), trace information would magically appear as well?

Well, guess what?!  Decorators let you do just that!  In Python, you can decorate a function by using the at, '@', symbol with the decorator name right before the function definition.

Unfortunately, writing decorators can be a bit of a pain.  Fortunately, there is a decorator out there that helps us write simple decorators! (After all, decorators are just functions, and thus can be decorated.)  Check this out:
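from decorator import decorator  # the third-party "decorator" package

@decorator
def trace(f, *args, **kwargs):
    # Announce the call, make it, and pass the return value along.
    print "calling %s with args %s, %s" % (f.__name__, args, kwargs)
    return f(*args, **kwargs)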

There are a few things to note here:

  • To make the trace function a decorator, we are decorating it with "@decorator" (decorator, from the third-party decorator package, is the decorator that turns functions into decorators.)
  • When writing decorators with decorator, the function (in our example, trace) takes in 3 kinds of parameters:
    1. a function
    2. *args: all the positional parameters
    3. **kwargs: all the keyword parameters
  • You can have the decorator function do anything you want, but you need to remember to call the function that it's decorating.  That's done by calling f(*args, **kwargs).
    (You can read more about *args and **kwargs here and here.)
  • In our example, we also want to return that function's return value.
Now you can use your decorator to decorate other functions, as I demonstrated above on some_func().

All this may seem a bit cryptic, and it is.  Unfortunately, creating decorators isn't super simple, and there is more than one way to write them.  My goal here was to expose you to them, and show you how to use them, as well as how to write simple ones.  If you want to write more complicated ones, such as decorators that accept parameter values, think:

@trace(verbose=True)
def my_func(...):
   ....

Well, I recommend you read some other people's docs.  There is some documentation on the @decorator that I showed you (I also took the "trace" decorator example from there, and they have some other good ones too).  And here is a nice discussion that goes deeper on what decorators are and how they work.  There's also a really great Python Wiki full of useful decorators (and their source code, of course.)

But ultimately, it's more important that you know how to use them (@decorator_name), and to keep in mind that there is a way to create them as well.  But there is no need to memorize how to create them.

Now, decorators are interesting in theory, but I want to show you one more somewhat-practical example. The next section discusses another neat feature in Python, and in it, another simple decorator awaits...

Putting it in Context!

Another pretty cool feature in Python is with.  Its goal is to help you properly deal with resources.  A resource is anything that you need to "take", and eventually "return".  A typical example is a file.  If you want to read it, you need to first open it (take the resource), and after reading it, you should close the file (return the resource.)

Here's the less amazing way to read stuff from a file:
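f = open("some_file.txt")  # the file name is just for illustration
for line in f:
    print line.capitalize()
f.close()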
This way of coding is OK, so long as what you do is fairly simple, and nothing can go wrong.  Imagine a hypothetical in which the method capitalize() raises an exception.  It would mean that the above code will finish without closing the file.  Or look at a slightly more complicated example:
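def print_until_stop():
    f = open("some_file.txt")
    for line in f:
        if "stop here" in line:
            return  # oops -- we never close the file!
        print line.capitalize()
    f.close()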
Not only will the function return without closing the file if an exception is raised, but also if a line that contains "stop here" is reached.  While these examples are rather contrived, you can see how this can be a problem.  Of course, there are work-arounds, like using a try-finally clause, like this:
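def print_until_stop():
    f = open("some_file.txt")
    try:
        for line in f:
            if "stop here" in line:
                return
            print line.capitalize()
    finally:
        f.close()  # this runs no matter how we leave the try block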
The problem is that this sort of work-around may be confusing, because the try-except-finally clause is normally used for error-handling, not resource-handling.  Another problem with this approach is handling multiple resources at once: you need lots of try-finally clauses, and it may get a little hairy.

SOOOO.... what's Python's solution?  As you may have guessed, it's the with clause. :)  The above example can be re-written like this:
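def print_until_stop():
    with open("some_file.txt") as f:
        for line in f:
            if "stop here" in line:
                return
            print line.capitalize()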
One nice thing to notice is that there is no need for the f.close() call.  The reason is that with takes care of all that for you.  A resource that supports with (and just about all of them do) has a built-in mechanism that tells Python how to release it.  In our file example, it calls close() automatically when the code inside the with ends, whether all the lines were read, a return was hit, or an exception was raised.  Using with pretty much guarantees that a resource will be closed appropriately.

A nice little feature that with has is allowing the use of multiple resources simultaneously.  Say you wanted to do something with three files, you could do this:
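with open("a.txt") as f1, open("b.txt") as f2, open("c.txt") as f3:
    print f1.read(), f2.read(), f3.read()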
It doesn't get much better than this.

Of course, if one of these files can't be opened for some reason (perhaps it doesn't exist?) then you would still get an exception that you may want to handle appropriately.  But at least you can rest assured that all the other files, if they got opened in the process, are now closed.  Like this:
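try:
    with open("a.txt") as f1, open("b.txt") as f2, open("c.txt") as f3:
        print f1.read(), f2.read(), f3.read()
except IOError:
    print "Couldn't open one of the files!"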

Such is life.

Now you may say: that's all nice and well, but how do I create a resource that with can handle?  The answer to that, quite naturally, is ... there are two ways of doing it! :)

The first approach requires you to create a class that defines two methods: __enter__() and __exit__().  (See the docs for details.)  But just to give you an idea, if you were going to implement the file open function in this fashion, it might look something like this:
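class OpenFile(object):
    # A toy version -- the real open() already supports with, of course.

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        self.f = open(self.name)
        return self.f  # this is what the "as" variable gets

    def __exit__(self, exc_type, exc_value, traceback):
        self.f.close()

with OpenFile("some_file.txt") as f:
    print f.read()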
Assignment 11: write a Timer class, that on __enter__() starts the timer (you can use time.time() to see what time it is) and on __exit__() stops the timer.  This way, you can start & stop the timer (your resource) using with.
The other approach is, in a way, much simpler.  Instead of creating a class, you create a decorated generator.  :)  (You must have felt this was coming, haha.)  You can see the docs for details.  But it basically goes something like this:
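import contextlib

@contextlib.contextmanager
def use_resource(x):
    # open() and close() here are stand-ins for however you take and release x.
    x.open()   # take the resource
    yield      # the body of the with statement runs here
    x.close()  # release the resource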


First, your generator is decorated with contextlib.contextmanager.  Your generator may or may not require parameters, it's up to you.  Typically, if you expect your resource to already exist, but you're just using it now, then you probably want to have that resource as your parameter. :)  Your generator first takes the resource.  In the timer assignment, this is where you would start your timer.  Then your generator yields.  It can yield nothing or something, depending on your needs (in the next example I show you how to yield something.)  Finally, you release the resource (in the timer example, this is where you stop the timer.)

The above generator can be used like this:

with use_resource(x):
   ...
Assignment 12: create a generator called use_timer that takes in a timer object, and starts / stops it appropriately.
But, as I hinted above, this generator can also create a resource and then take it:
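import contextlib

@contextlib.contextmanager
def use_resource():
    x = make_resource()  # make_resource() is a stand-in for however you create it
    x.open()
    yield x              # whoever is using the with statement gets x
    x.close()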


In this example, no parameter may be required (unless you need some to help you create the resource.)  The resource is first created, and then opened.  After that, the generator yields the resource!  This way, someone can use this generator like this:

with use_resource() as x:
   ...

Notice that x gets whatever your generator yields.  Finally, just as before, you release your resource.

That's it for now!

I may come back here and add more interesting topics. I'll leave you with a few things to read about:
  • If you're not familiar with Python's string formatting abilities, this is probably as good a time to learn as any. 
  • Python has some nice collections that can help play with data.
  • Jeff wrote a nice, short book: Writing Idiomatic Python.  It's a nice way to quickly learn some neat tricks of the Python trade...