Saturday, November 9, 2013

PyTutorial: Real Programmers Pack

And by that, I mean, they use packages, of course. :)

This is the 9th in a series of posts.  If you want to go to the start, go here.

What if I asked you to write a program that fetches some information from the internet?  Say, the main page of python.org.  How would you go about doing that?

You might start by learning all about programming for the internet: sockets, DNS, HTTP and what not.  A pretty tough task.  Not impossible, but definitely tough.  There are some big & heavy books on these subjects that you can spend an eternity reading.  Some people like that sort of thing.  That, or you can let a package do the hard work for you, and fetch the page with just a couple of lines of code.

Packages (a.k.a. libraries) are other people's code that help you achieve specific tasks.  You can imagine that there are lots of different things that a programmer would like to achieve, and, this being the case, there are lots of packages available out there to help you do what you want to do.  Python has quite a few packages that come pre-installed when you install Python, and thus they're called "built-in" packages.  But for every package that comes pre-installed with Python, there are hundreds if not thousands of packages out there that you can get (almost always for free, but sometimes for a price), install, and then use to help you in your programming adventures!

One of the advantages of using a package is that you get code that you would have to work very long and hard to write, well, instantly!  Not only that, many of these packages are well tested and many of their bugs have already been fixed, so not only do you get complicated code easily, you get pretty good code at that!

Of course, packages also have their disadvantages: because someone else wrote them, they work the way their authors wanted them to work.  Good packages are pretty easy to work with, but some, even if they're good, may not "fit" the way you want to work with the code.  Just something to keep in mind.  Another problem that comes up is that sometimes packages do have bugs, and if one of these bugs affects you, well, you may have to wait some time until it's fixed.  In this regard, open-source packages are less of a problem: since the code is available, if there is a serious bug, at least there is a chance for you to fix it on your own.

Having said all that, packages are usually simply great to work with.  I highly recommend looking for a package before deciding to write some code yourself.  The time it will save you will be significant.  That, and the quality of your code will improve as well.  As a rule, try to concentrate on the part of the program that only you can write.  Why work to build something that others have already built and made available for you to use?

Going back to the teaser at the start: all it takes is the built-in urllib2 package, and a call to its urlopen function.
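Here's a minimal sketch of that teaser, written so it runs under Python 2.7 (which this tutorial uses) and also under Python 3, where urlopen moved to urllib.request. To keep it runnable without an internet connection, it writes a tiny local page and opens it through a file: URL; with a connection, you'd simply pass "http://python.org" to urlopen instead. The file name index.html is made up for illustration.

```python
import os
import tempfile

try:
    import urllib2                          # Python 2.7, as used in this tutorial
    from urllib import pathname2url
except ImportError:
    import urllib.request as urllib2        # Python 3: urlopen lives here now
    from urllib.request import pathname2url

# Write a tiny local "web page" so this works without an internet connection.
page_path = os.path.join(tempfile.mkdtemp(), "index.html")
with open(page_path, "w") as f:
    f.write("<html><body>Hello from a local page!</body></html>")

# urlopen understands file: URLs as well as http:// ones.
# With an internet connection, try urllib2.urlopen("http://python.org") instead.
response = urllib2.urlopen("file:" + pathname2url(page_path))
content = response.read()   # the raw contents of the page (bytes in Python 3)
response.close()

print(content)
```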

Try it!  Pretty neat, huh? I find it pretty amazing that it's so easy to achieve something as complicated as getting information from the web.  You can imagine that quite a lot has to happen behind the scenes for this to work.  The beauty of using packages is just that: you can let the boring things happen behind the scenes, freeing you to concentrate on what matters to you!

Now, let's go over how this works.  The line "import urllib2" tells Python that you want to use the package urllib2.  If this package isn't installed on your computer, you will get an error message (an ImportError, duh!).  But because urllib2 is a built-in package, well, it should be available for you to use!

Now, every package has its own way of letting you work with it.  In tech-speak, this is called the package's interface.  An interface is usually just a bunch of functions (and classes) that the package makes available for you to use.  Sometimes it includes some pieces of information as well.  For instance, the math package has a variable called pi, which, as you've probably guessed, contains an approximation of pi!  These variables aren't meant to be changed, and are therefore usually called constants.  Pi is pi, and that's that.
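Here's a small sketch of the math package's interface: pi is one of its constants, and sqrt is one of its functions. The built-in dir() function lists the names a package makes available, which is a handy way to peek at its interface. It should behave the same under Python 2.7 and Python 3.

```python
import math

# pi is a constant: a piece of information the math package makes available.
# It's an approximation of pi, stored as an ordinary float.
radius = 2.0
area = math.pi * radius ** 2       # area of a circle with radius 2

# sqrt is one of the functions in the same interface.
side = math.sqrt(16)               # 4.0

# dir() lists the names a package makes available: its interface.
print("pi" in dir(math))           # True
print("sqrt" in dir(math))         # True
```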

In our example, we want to use the "urlopen" function that urllib2 has to offer.  To do that, we just type in the name of the package, then a dot, then the name of the function that we want to use.  Pretty straightforward.
Calling urllib2.urlopen returns an object with a method named read, which you can use to actually get the content of the web page.  Of course, there's plenty more you can do with urllib2, which you can read about here.  When reading such documentation, I often prefer to scroll down to the bottom and quickly glance at the examples that are provided.  Usually, you can get a good feel for what the package does in a few minutes without having to read a whole bunch of boring documentation.

There are plenty more built-in packages that are available to you.  Python has a quick overview as well as a comprehensive list of all of them.  I highly recommend that you take the time now to browse through the packages and see what Python has to offer.  Keep in mind that you're using Python 2.7 when looking for packages, as the packages in Python 3.3 are slightly different.
Assignment 9.1: go through Python's "quick overview" of packages and play around with the examples to see what Python has to offer.  You will notice that Python even has some built-in packages that help you test your code!  That's a good attitude. :)
Assignment 9.2: go through the "comprehensive list" of Python packages.  Just read the table of contents to see what's offered.  I want you to be impressed by all the code that other people wrote that you can so easily use.  If something strikes you as particularly interesting, click on it and explore.  You don't need to memorize anything here, just enjoy all the power you can now wield. :)
Pretty exciting stuff, huh?  Did you notice all the things that you can do with numbers and strings? How about the wealth of data types (beyond numbers, strings, lists, and maps) that are readily available for you (particularly dates and sets)?  There's also that neat little package, pickle, that lets you save your Python data to disk and later read them again!
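Here's a small sketch of pickle saving some data to disk and reading it back, with a set and a date in the mix. The data and the file name data.pickle are made up for illustration. Pickle files should be opened in binary mode ("wb" and "rb"), and, as the pickle documentation warns, you should only load pickles from sources you trust.

```python
import datetime
import os
import pickle
import tempfile

# Some data to save: a dict holding a set, a date, and a list.
data = {
    "tags": set(["python", "packages", "pickle"]),
    "saved_on": datetime.date(2013, 11, 9),
    "numbers": [1, 2, 3],
}

path = os.path.join(tempfile.mkdtemp(), "data.pickle")  # hypothetical file name

# Save: pickle.dump writes the object into a file opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(data, f)

# Later (even in another run of your program), read it back.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == data)  # True
```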

There are also packages that help you test and troubleshoot your code.  The logging package is a great way to write logs.  There are also lots of development as well as debugging and profiling tools that you can use (profiling being tech-speak for finding which parts of your code run fast, and which run slow.)
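Here's a minimal sketch of the logging package in action. The logger name "minesweeper" and the reveal function are hypothetical. basicConfig sets things up once (by default, messages go to the screen through standard error), and the level decides which messages actually show up.

```python
import logging

# Configure logging once, near the start of your program. DEBUG shows every
# message; switch to logging.WARNING to hide the chatty ones.
logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("minesweeper")   # hypothetical logger name


def reveal(row, col):
    """A hypothetical game function that logs what it's doing."""
    log.debug("revealing cell (%d, %d)", row, col)
    if row < 0 or col < 0:
        log.warning("ignoring off-board cell (%d, %d)", row, col)
        return False
    return True


reveal(2, 3)     # logs a DEBUG message
reveal(-1, 0)    # logs a DEBUG and a WARNING message
```

Passing the values as extra arguments (rather than formatting the string yourself) lets logging skip the formatting work for messages that won't be shown.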

More ways than one
Python offers 4 ways of using packages.  The first is as I showed you:
>>> import urllib2
>>> x = urllib2.urlopen(...)

Another way of achieving the same thing is:
>>> from urllib2 import urlopen
>>> x = urlopen(...)

This approach has the advantage of less typing, but the disadvantage of making it less clear where functions come from.  A slight variation of the above is:
>>> from urllib2 import urlopen as foo
>>> x = foo(...)

This approach can be rather confusing, so use it sparingly (if at all).

Finally, there's the wildcard approach:
>>> from urllib2 import *
>>> x = urlopen(...)

I tend to dislike this style, because it dumps all of urllib2's public names into your own code.  This is the most confusing style, because then there is very little clue where a function comes from, especially if you do a wildcard import from more than one package!
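To see why the wildcard style can bite, here's a small sketch that uses the built-in math package instead of urllib2. The math package has its own pow function, which always returns a float, and from math import * silently replaces Python's built-in pow with it.

```python
before = pow(2, 3)   # Python's built-in pow: gives the integer 8

# A wildcard import pulls in every public name from math (pi, sqrt, and
# dozens more), including math's own pow function...
from math import *

after = pow(2, 3)    # ...so this is now math.pow, which gives the float 8.0

print(before)        # 8
print(after)         # 8.0

# A plain "import math" leaves no doubt about where pow comes from.
import math
print(math.pow(2, 3) == after)   # True
```

Nothing warns you that pow changed meaning, which is exactly the kind of surprise that makes wildcard imports hard to debug.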

Even more packages

If you think that there are lots of built-in packages, well, you'll be shocked by the sheer number of packages that are available for you!  If you go to PyPI (Python's Package Index), you'll be able to find about 36 thousand packages that are ready for you to install and use!!  These aren't official Python packages, and some of them are better maintained than others.  But they're there, and easy to download and start using!

To start using packages from PyPI, the easiest thing to do is to first install pip.  If you're using Windows, that means downloading a little program from here.

If you're running Ubuntu, then all you need to do is run the following:
# apt-get -q -y install python-pip python-dev build-essential
# pip install --upgrade pip

Then, to install a package, you first need to know the name of the package that you want.  The package name is the part without the version number.  For instance, I randomly clicked on pw, whose page lists the name along with a version number.

You can see that the package name is simply "pw".  If you scroll down to the bottom of the page, you should see a list of categories that, among other things, shows which versions of Python the package supports.

It's important to see whether or not the package is compatible with Python 2.7.  Most are.  In fact, more are compatible with Python 2 than with Python 3.

Now, just for exercise, let's find some package to install.  Just to make things a bit interesting, I went to:
http://pypi-ranking.info/alltime, a website that shows you popular PyPI packages. On it, I found the "requests" package (8th most popular PyPI package of all time) which seems to be a popular alternative to Python's urllib2.  To install it, all you have to do is run the command:
pip install requests
On Windows, type it into a Command Prompt window.  On Ubuntu, run it as root, like this:
# pip install requests

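Now, you can import requests and use it in your Python code, just like a built-in package.  For instance, requests.get("http://python.org") fetches the page, and the text attribute of what it returns holds the page's content.

It doesn't get much easier than this.
Assignment 9.3: install the requests package using pip.  Then use it to fetch a web page and see how it works!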
To upgrade the package, all you need to do is run:
pip install --upgrade requests 
And if you no longer want the package installed on your system anymore, all you have to do is run:
pip uninstall requests
Of course, there are plenty more Python packages out there that, for one reason or another, can't be found on PyPI.  Some of them are really good, actually.

BTW, I'm not really sure why this package exists, and what its advantages are over Python's built-in urllib2 package.  But if it's interesting to you, I'm sure you can read about it on countless blogs.  Searching for "python urllib2 vs requests" I found quite a few sites that discuss exactly that.

Which brings me to a few important points regarding packages:

Think about what you need.
Before jumping into a package, it's a good habit to think a little about what you want done. What would the "ideal" package contain?  How would you want to use it?  etc. etc. etc.

Don't use the first package you see.
Look around at Python's built-in packages, but also look for other packages.  Say you wanted to find alternatives to urllib2: you could search for "python alternatives for urllib2" or simply "urllib2 vs".  The requests package (and others) would show up...

Now you should do your homework: see what problems they are trying to solve, and how close they are to what you want done.

  • Many times, you will discover that different packages go about solving the same problem in vastly different ways.
  • Some packages are better maintained than others.  Check how many downloads they have, as well as how often they get updated.
  • Some packages are more extensive than others.  Bigger isn't always better.  Some packages try to solve many problems, while others concentrate on a simple task at hand.  See which package best meets your needs.
  • Some packages are better documented, and easier to use than others.


Discover more about what you really need.
As you browse through a package's documentation, you may end up learning more about the topic at hand.  You may discover, for instance, that getting documents off the internet is more complicated than you initially imagined, and thus your initial requirements for an "ideal" package will change.

By learning about competing packages, you will better learn about the concepts and limitations of whatever it is that you're trying to accomplish.

Be reasonable about this.
As you can see, there are tens of thousands of packages on PyPI alone!  You could spend many lifetimes going through them, comparing them, analyzing them, etc. etc.  Don't go nuts with this!

If you're trying to just do something quickly, you can probably use any package you want! But if your program will depend heavily on a particular package, and this program is dearly important to you, well, then it's probably worth it to invest a little of your time.

Life is a work in progress.  You will never find the perfect package.  You will never write the perfect program.  Try to enjoy the ride.

Wow factor
Just to impress you with some of the amazing things out there, check out these insanely fascinating packages (in no particular order):
  • NLTK: an amazing package to help you work with natural language, that is, figure out what people mean when they write stuff in English, or some other human language.
  • SQLAlchemy: a sophisticated package to help you manage the data that you want to save.
  • SciPy.org: a collection of packages that let you perform complicated math and scientific calculations fast.
  • mlpy: a Machine Learning (a branch of Artificial Intelligence) package for Python
  • flask, and django: two packages (among many) with a very different take on how to build web applications in Python.  They're both amazing in their own way.
  • kivy: a relatively new package that lets you write Android & iPhone apps using Python.
  • matplotlib: lets you generate wonderful graphs and charts and other figures using Python.
  • arrow: not so much "wow", as much as making dealing with dates nicer.
And there are more.  Much more.  All you have to do is look around a bit...

Writing your own modules

When writing a large piece of software, you may want to break up your code into modules and packages.  Some advantages in writing your own modules / packages are:
  1. It gets you thinking abstractly about the problem.  A module does something generic, thus now you're trying to solve a generic problem.  This will get you thinking of how to solve that problem, as opposed to the specific coding problem at hand.  Thinking abstractly may help you better understand the requirements of whatever it is that you're trying to achieve.
  2. Easier to test.  Having a generic module is easier to test than some functions that are deeply integrated inside your code.  Easier to test means better tested.  Better tested means better code.
  3. Code reuse.  This is usually highlighted as #1, but for me, it's nowhere near as important as the above two.  Sure, code reuse is great, if you ever end up reusing it.  But most likely, you'll never really need to. :)  
You can think of a module as a small package, or of a package as a collection of modules.  In any case, writing your own module is very simple.

A module is basically just a regular Python file, one with functions and classes, but one that doesn't actually call any of its code.  If you try running it, it doesn't do anything!  Seems rather useless, doesn't it? :)

But, if you import the file (aka module) from another file, then that file can use any of the functions / classes in the module file.  See?

For example, say you had a file, file1.py, with a few functions in it (that's your module file), and another file, file2.py, that imports file1 and calls those functions (that's the file using your module file).
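Here's a minimal, self-contained sketch of that setup. The functions greet and add are hypothetical examples. Since a single snippet can't be two files, it writes file1.py into a temporary folder and adds that folder to sys.path (the list of places Python looks for modules); normally you'd simply save file1.py and file2.py side by side in the same folder.

```python
import os
import sys
import tempfile

# What you'd normally put in file1.py: only definitions, nothing that runs.
# (greet and add are hypothetical example functions.)
FILE1_SOURCE = '''
def greet(name):
    return "Hello, %s!" % name


def add(a, b):
    return a + b
'''

# Write file1.py into a temporary folder and tell Python to look there.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "file1.py"), "w") as f:
    f.write(FILE1_SOURCE)
sys.path.insert(0, folder)

# From here on is what file2.py would contain.
# Note: the import names the module without the ".py" extension.
import file1

print(file1.greet("world"))  # Hello, world!
print(file1.add(2, 3))       # 5
```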

Try running file2.py and see what happens.  Pretty neat, huh?  Did you notice that the import doesn't include the .py extension?  Great! :)

I recommend that you read more about the modules here.
Assignment 9.4: write a module that contains a function that prints a random number to the screen.  Call that function from another file.
Assignment 9.5: add to the module file a class that does something silly.  Use that class in another file. 
On the other hand, writing your own packages is a bit more complicated.  The biggest complication with packages is the way Python figures out where they exist in your file-system.  This is a bit out of the scope of this tutorial, but you can learn more about packages here.  If you choose to write your own packages, then you may want to know how to set the environment variable PYTHONPATH, as well as about relative imports.  Finally, if you want to make a package that can be installed using pip, then you should check out this tutorial.  In any case, writing modules is probably all you'll really need (at least for a while), so don't worry about all that.

Assert This!

Although it's a bit of a tangent, you should really know about the assert keyword.  It's a great way to make sure that your code is working as expected.  To use it, you just write:
>>> assert x == 3, "expecting x to be 3, but got %d instead" % (x,)

or:
>>> assert len(items) > 0, "expected something in this list!"

or, if you're really lazy, just something like this will work too:
>>> assert len(items) > 0

etc. etc.  You get the point.  The key idea here is that if your assertion fails, an exception (an AssertionError) is raised.
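Here's a small sketch of an assert guarding a basic assumption inside a function. The average function is a made-up example. One thing worth knowing: Python skips assert statements entirely when it runs with the -O (optimize) flag, so use them to catch bugs, not to check things like user input.

```python
def average(numbers):
    """Return the average of a non-empty list of numbers."""
    # A basic assumption: we should never be asked to average nothing.
    assert len(numbers) > 0, "expected something in this list!"
    return sum(numbers) / float(len(numbers))


print(average([1, 2, 3]))  # 2.0

try:
    average([])
except AssertionError as error:
    # The message from the assert ends up in the exception.
    print("Assertion failed: %s" % error)
```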

You should use them to test your most basic assumptions about what is going on in your code.  If they fail, you know something major is wrong, and you don't have to wait until some visible symptoms appear (and they will appear whenever it is least convenient for you, of course).  You should read this short little tutorial about using asserts here.

Minesweeper

In this lesson we covered a few packages.  Can you think of ways to use some of them in your Minesweeper code?
You're already using the random package in there, but what else can you use?

Here are some simple suggestions:
  • Use the logging package to add troubleshooting information to your code.
  • Use the pickle package to save and load "high scores" between games.
  • You can use the date object from the datetime package to add a date to the "high scores" board.
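Here's a minimal sketch of a "high scores" board that combines pickle and datetime.date. The function names, the file name, the top-10 limit, and the assumption that a score is the number of seconds it took to clear the board (so lower is better) are all made up for illustration; adapt them to your game.

```python
import datetime
import os
import pickle
import tempfile

MAX_ENTRIES = 10  # hypothetical: keep only the 10 best games


def load_high_scores(path):
    """Return the saved list of (seconds, date) entries, or [] if there's none yet."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return pickle.load(f)


def add_high_score(path, seconds):
    """Record a finished game (fewer seconds is better) and save the board."""
    scores = load_high_scores(path)
    scores.append((seconds, datetime.date.today()))
    scores.sort()                      # fastest games first
    scores = scores[:MAX_ENTRIES]
    with open(path, "wb") as f:
        pickle.dump(scores, f)
    return scores


path = os.path.join(tempfile.mkdtemp(), "high_scores.pickle")
add_high_score(path, 95)
add_high_score(path, 42)
for seconds, day in load_high_scores(path):
    print("%4d seconds  %s" % (seconds, day.isoformat()))
```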
A nice thing that you can do is to separate the logic of the game from the user-interface part of the game by putting it in a separate module.  For instance, you can put things like cell & grid calculations into a separate file called logic.py.  Then, import that module from the part that actually plays the game and handles all the user input and output.  Sometimes it may not be 100% clear where a function or a class should go, and that's fine.  You can either just arbitrarily choose a place for it, or treat it as a sign that you should modify that piece of code so that it better separates the "logic" of the game from the "user interface" part.

Another thing that you should do is find a good way to test your code!!  Read about the built-in doctest and unittest packages, but also look around for alternatives.  Perhaps nose is a good alternative?  What are the differences between them?  Which one do you like the most?  Whichever package you like the most, great.  Now use it to test your Minesweeper code.

While you're looking into unit-testing modules, you should also add a bunch of assert statements to verify that your code is working more-or-less as expected.  As a little bonus, most unit-testing packages out there treat a failed assert as a failed test.  But assertions are great to have even without any unit-testing.
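Here's a sketch of what that could look like with the built-in unittest package. The grid format (a list of strings where "*" marks a mine) and the count_adjacent_mines function are hypothetical stand-ins for your own logic module; the function also uses asserts to check its basic assumptions. In a real test file you'd usually end with unittest.main(), which exits the program when it's done; running the tests explicitly just keeps this sketch self-contained.

```python
import unittest


def count_adjacent_mines(grid, row, col):
    """Count the mines ("*") in the (up to) 8 cells surrounding (row, col)."""
    # Basic assumptions: if these fail, something is badly wrong elsewhere.
    assert 0 <= row < len(grid), "row %d is off the board" % row
    assert 0 <= col < len(grid[0]), "col %d is off the board" % col
    count = 0
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if (r, c) == (row, col):
                continue                       # don't count the cell itself
            on_board = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if on_board and grid[r][c] == "*":
                count += 1
    return count


class CountAdjacentMinesTest(unittest.TestCase):
    # A tiny hypothetical board: "*" is a mine, "." is empty.
    GRID = ["*..",
            ".*.",
            "..."]

    def test_cell_next_to_two_mines(self):
        self.assertEqual(count_adjacent_mines(self.GRID, 0, 1), 2)

    def test_corner_cell(self):
        self.assertEqual(count_adjacent_mines(self.GRID, 2, 2), 1)

    def test_off_board_cell_trips_the_assert(self):
        self.assertRaises(AssertionError, count_adjacent_mines, self.GRID, 5, 0)


# Load and run the tests explicitly instead of calling unittest.main().
suite = unittest.TestLoader().loadTestsFromTestCase(CountAdjacentMinesTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```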

Finally, go ahead and break your code into a few separate modules.  Perhaps you can have one module that contains the code for the logic of the game.  Perhaps another that contains the UI (User Interface, that is, the printing to the screen and input from the keyboard) that you have in the game.  Then, you can have your main minesweeper game import and use these modules.

Great news!
We've come a long way with our Minesweeper game.  But are we satisfied with it?  Never!!!

In the next lesson, we'll take the game to the next level.  And by that, I mean adding neat little graphics and even sounds!   As you can probably imagine, that will require using a package.  Of course, I already have a package in mind that I will want you to use, but as an extra credit:
Assignment 9.6: look for some nice packages that we can use to add graphics for our little Minesweeper game.  What are our requirements?  What are the alternatives?  Seen anything you like? :)
Brace yourself, the next lesson will be more fun than a basket full of barking frogs. :)

