Monday, November 16, 2015

Django Hints & Tips

So calling what I'm about to do here a tutorial is a bit of an over-statement. Since Django has so many online tutorials (including Django Girls for Python beginners), books, a great reference, and no less than a gazillion packages, I figured trying to compete with all that will not only be a lot of work, but also kind of pointless.

And of course, there are also the topics and howto, which should be explored.

In fact, because of the enormity of the resources out there, I figured that perhaps a few hints and tips about a particular few gems that I like could be nice. So here goes (in no particular order):

Grappelli


Grappelli is a cute little Django-app that lets up upgrade the look & feel of the admin site with very little effort. Biggest draw-back is that it may take them a short bit to update their app to work with the latest Django version, but I found this not to be a big deal at all.

Django-solo

Django-solo is another Django-app, one that lets you work with models that should only have one record. You either need this or you don't.

Django-apptemplates

Django-apptemplates is a big deal. Django's built-in template inheritance is nice, but this really makes it complete (and I feel should be part of Django's core.) Typically, when extending a template, you write something like this:
{% extends "admin/base.html" %}
The above means you want to extend "base.html", and you're hinting that it's in the "admin" app. But there are two problems with it:
    1. It only lets you extend templates, not modify them. This means that if you're trying to create your own version of "admin/base.html", this will fail. It confuses your "admin/base.html" with the one you're trying to extend.
    2. Say you want to modify a template that is already modified by another app (such as an admin template that is modified by Grappelli.) In this case, the standard approach either fails or is very confusing to work with.
Django-apptemplates solves all this by letting you explicitly specify which app you want to extend the template from. Thus to update the admin's "base.html", all you have to do is this:
{% extends "admin:admin/base.html" %}
Or to update Grappelli's version of it, do this:
{% extends "grappelli:admin/base.html" %}
QED.

Create your own template tags and filters

This is part of Django's core features that I like. It lets you customize what you can do from within your Django templates, which sometimes can come in handy. You can read the full details in the howto. But here are a few pointers on how to do it:
  • Directory structure is important, make sure to put them in a folder called "templatetags"
  • Don't forget to load your custom tags & filters from within the template:
    {% load my_tags_file %}
  • You need to register your tags & filters, but that's easy using the decorator:
    @register.filter

Making complicated queries

Django's ORM has a simple yet powerful mechanism (Q objects) to enable arbitrarily complex queries. Check it out, you can do things like:
Poll.objects.get(
    Q(question__startswith='Who'),
    Q(pub_date=date(2005, 5, 2)) | Q(pub_date=date(2005, 5, 6))
)

Upgrade your debugging experience!

Django's default debugger is "OK". But if you really want to enjoy hunting-down bugs, get Werkzeug!

The easiest way to do this is to install django-extensions and Werkzeug (don't forget to update your settings.py file appropriately.)

That's it, now just run:
python manage.py runserver_plus

The advantage of going the django-extensions approach is that you get a lot of other nice goodies that way.

OK, fine! I'll talk a bit about databases...

There is one major problem with Django: it's too damn good! It takes care of so many things for you that it's easy to forget a few important things. Specifically, Django attempts to shield you as much as possible from the underlying database (whichever you may chose to use) that it's easy to forget that the database is even there!

But it is there. And it has its demands.

So, what is a database? Basically, it's a way to manage a BUNCH of information efficiently. By BUNCH, I mean BUNCH. As in, as much as you want. Megabytes? Gigabytes? Terabytes? Petabytes? Yumibytes? You got it!

With databases, it's fairly easy to store, look-up, update and delete information, without having to worry too much about things like running out of memory, which is nice.

Now, while databases tend to do this fairly efficiently, efficiency has its costs, and therefore it leaves it up to the database architect (or Django programmer) to tell it how to make it efficient. So... how do you tell a database to be efficient?

It's actually very simple, but before I get into it, I want to tell you a little bit about databases. Databases have tables (a.k.a "models" in Django), and each table has a bunch of columns ("fields" in Django), and a bunch of rows ("objects" in Django.) I'm going to use these terms interchangeably now just to throw you off. Now, because a table may have billions of rows, not all of them can be in memory at once. In fact, most of them remain in a file, and only when a particular row is fetched, does the database go to the right place in the file to read it. 

So far so good. But how does the database know where in the file that particular row resides? Does the database need to search the whole file to get to that row?

Obviously, the answer is not quite "yes", but not quite "no", either. :)

An index for your thoughts...

Lets imagine that you have a table called "people" with columns such as "first_name", "last_name", "date_of_birth", "city", and "gender". 

To create such a table in Django, you need to specify a model, "People" with the mentioned fields.

Now imagine you have all the people of the world who ever lived in this table. And now you want to find out where "Ido Carmi" was born. So you do something like this in Django:

for p in People.objects.all():
    if p.first_name == "Ido" and p.last_name == "Carmi":
        print "DOB:", p.date_of_birth

Great! Except that if you do that, Django has to retrieve all the billions of rows from the file just so you can check (in Python) if that particular person is the one you are looking for. It will work, it will just be slow.

So you say, I'll just do this:
for p in People.objects.filter(first_name = "Ido", last_name="Carmi"):
   print "DOB:", p.date_of_birth

Much better! Except that now, instead of Django getting all of the rows, the database will have to get all the rows to do the check for you, and will only pass the matches rows to Django. This is already much faster because it removes all the overhead of translating all the database rows to Django objects, but the burdensome task of going through all the rows in the file still occurs. Why is that?

The short answer is that the database doesn't inherently know where the "Ido Carmi" rows exist. The file isn't sorted in a meaningful way. So to make sure it didn't miss any, the database has to examine the file in its entirety, even if it found a few rows at the start.

This clearly, is not so great, and surely there must be a way to fix that! There is, and it's quite simple. It's called an "Index".  Remember that. It's key to database performance. 

And index is implemented differently in different databases, but it's basically a sophisticated Python dictionary, with the keys being one or more column values, and the value being the positions in the huge file that the rows with the matching column values exist. This way, all the database needs to do is look up the very efficient index to see where in the large file the data that you want is stored.

Now the catch is that maintaining such an index has its costs. Otherwise, one would index every field in the database! So you want to index fields that you "look up" often. 

For instance, if you look up people by city very often, then you probably want to have that field be indexed. You do that in Django by adding the db_index=True to the model's field, like so:

class People(models.Model):
    ...
    city = CharField(max_length=100, db_index=True)

Now that column will be indexed, and the look-ups will be very fast!

But, you say, in the original example the lookup was on two columns, what do we do then? Do we add an index to each column individually? Well, that is better then nothing, but not the best. Adding the two indexes, one for first_name and the other on last_name means that the database will have to consult two indexes in order to figure out where to look. Not the end of the world, but in large tables this can still be a little slow. It's better to build one index on two fields! In Django, you use the Meta option, index_together, like so:

class People(models.Model):
    ...
    Meta:
        index_together = ["first_name", "last_name"]

Now the database will build a super-index that will be able to find all the "Ido Carmi"s as quickly as the database can muster.

But what if you don't really care about when all the Ido Carmis were born, you just want to know how many of them there are! In this case, you can do something like this:

num_idos = len(People.objects.filter(first_name = "Ido", last_name="Carmi"))

"Duh!" you say. "What's the big deal about THAT?" Well, the big deal here is that Django's magic is at play here. You may be imagining that 

People.objects.filter(first_name = "Ido", last_name="Carmi")

is returning a list of matching objects, and then the standard len() function just returns the length of that list. It's a nice thought, a very nice thought, but wrong. :)

Doing its magic in magical Django ways, People.objects.filter() is designed to adapt itself to how it is being used. Because it's within a len(), instead of returning a list of objects, it actually asks the database to simply count the number of matching rows (which it can do by consulting the index, without ever looking at the huge file). This is super fast and efficient, and should make you smile inside.

A final note...

One final note to consider with indexes... Say you want to find all the people that are currently "female". Sure, you can put an index on that field, but it won't help you much. The reason is that there are basically around 10 different genders out there, so each index would point to about a billion rows! Sure, the database won't have to scan 10 billion rows now, "only" 1 billion, but since maintaining an index has its costs, and the benefit now isn't that significant, you may just chose to avoid this index and hope for the best. Generally, it's better to index columns that have many different distinct values (like "first_name", "last_name", "date_of_birth", and "city").

It's time for the Prime!

Now that we got indexes out of the way, we can talk a bit about "Primary Keys". Well... later... I'm sleepy now. :)

No comments:

Post a Comment