Friday, December 12, 2014

PyTutorial: writing web applications

This post teaches how to do stuff on the I-N-T-E-R-N-E-T.... woa!

But what is stuff?  And what is the internet?  Very good questions!

Since you are able to reach this blog, which is on the internet, I'm going to assume that you have at least a super-basic knowledge of the internet.  That is, that you can use web browsers to visit web sites.

In this post, I'll teach how to write web-applications.  By that, I mean writing web sites that do some fairly complicated tasks.  Another name for web-applications is dynamic web sites.  Basically, it's a web site that interacts with the web user (a.k.a. the client.)  This is contrasted with a web site that does not interact with the clients (a.k.a. a static web site).  For example, this very web-site is very minimally dynamic: you can rate and add comments at the bottom of the page.  If I were to disable those, this web site would be completely static.  In a dynamic web site, the client can change something (such as add a comment, etc.) on the web site.  In a static site, all the web client can do is get stuff.   An example of a static web site is The Best Page In The Universe, as no one can leave comments or change the content in any way.  Examples of dynamic web sites are Google (the page you get depends on the words you typed), Facebook (news feeds, recommendations, ads, and what-not depend on content people gave Facebook), etc. etc. etc.

There is a caveat, however, and to understand it, I'll have to go into how web sites work.

How Web Sites Work

Web sites have two parts:  the part that runs on the servers (the computers that belong to the web site), and the part that runs on the client (the browser running on your computer.)

Clearly, there is some interaction going on there.  To simplify things, it's good to think of what happens as a transaction, that is, the browser asks for a web page from the web server, and the web server responds with that web page.  When the transaction is over, the web server and web browser are free to go on with their business.  They can effectively forget about each other, until the web browser initiates another transaction.  In other words:

  • Only the web browser can initiate a transaction.
  • A transaction happens very quickly.
  • When the transaction is finished, it is finished.
Of course, just because a transaction is finished, it doesn't mean that the browser or server won't remember anything.  They may, but there is nothing "built-in" that forces them to.  For example, a browser can keep a history of places you've visited.  And a web site (especially a dynamic one) may save information from that transaction as well (such as your Twitter tweets.)

Let's "zoom in" on that transaction for a moment.  We said that the browser initiated a transaction.  This may happen by a person typing in "http://www.google.com/" in their address bar thingi.  The browser goes to Google's web site, and gets something back.  That something is called HTML.  If you're using Chrome, and you right-click on any web site, you should see a menu option called "View page source."  If you choose that, you will see the HTML that you got back.

Your trusted web-browser got that HTML, analyzed it, and made it look all nice and pretty: made it look like whatever it is you're seeing when you're browsing around.   HTML can contain links to other web pages.  If you click on such a link, another transaction is initiated and another HTML document is returned to your browser.  HTML can also contain forms, and if you fill one out and hit "submit" or "search" or "whatever", then another transaction is initiated, the browser sends whatever it is you filled out to the web server, and the web server gives back another HTML doc.  That's the way it works.

Sort of.  Let's zoom in a little more.  We mentioned that the web site responds to a web request with HTML.  But how does it figure out which HTML to respond with?  That greatly depends on the nature of the web site, and there are no hard-and-fast rules on how to do this.  For example, a static web site may just be a bunch of HTML files lying around in a computer folder somewhere.  A browser requests a certain file, and the server responds by giving back the contents of that file.
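To make the static case concrete, here is a rough sketch of what "giving back the contents of that file" could look like in Python.  The function name read_static and the idea of a single site folder are my own assumptions for illustration; real web servers do a lot more than this.

```python
from pathlib import Path


def read_static(root, url_path):
    """Return the bytes of the file that url_path points at, or None.

    root is the folder holding the site's files (a hypothetical layout);
    url_path is the path part of the request, such as "/index.html".
    """
    root = Path(root).resolve()
    target = (root / url_path.lstrip("/")).resolve()
    # Refuse paths that try to escape the site folder, e.g. "/../secret.txt".
    if target != root and root not in target.parents:
        return None
    # A missing file (or a folder) has no contents to give back.
    if not target.is_file():
        return None
    return target.read_bytes()
```

A None result is what a real server would turn into a "not found" response.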

A more complicated, or dynamic web site will have some sort of programming running on that server that figures out what to do.  That program can be written in just about any language imaginable!  Python is one of these languages, and lucky for you, Python actually gives you great tools to write sophisticated web sites with relative ease.  But you are not limited to Python.  You can write web sites in C++, Java, and even languages that were specifically invented for writing dynamic web sites (such as PHP and ASP.)  
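Just to give you a taste, here is a minimal sketch of such a program using only what comes with Python.  The names build_page, DynamicHandler, and serve, and the port number 8000, are arbitrary choices of mine.  This is nowhere near what a real framework gives you, but it shows the key idea: the HTML is computed for each request instead of being read from a file.

```python
from datetime import datetime
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_page(path, now):
    """Compute the HTML for one request; the output depends on the inputs."""
    return (
        "<html><body>"
        f"<p>You asked for {escape(path)}</p>"
        f"<p>The server time is {now:%Y-%m-%d %H:%M:%S}</p>"
        "</body></html>"
    )


class DynamicHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # One transaction: the browser asked, so we answer and we are done.
        body = build_page(self.path, datetime.now()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the server until interrupted (Ctrl-C)."""
    HTTPServer(("127.0.0.1", port), DynamicHandler).serve_forever()
```

If you call serve() and visit http://127.0.0.1:8000/anything in your browser, every refresh should show a different time.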

So, you may naturally ask: hey, I've heard of Javascript; is that just such a language?  Unfortunately, the answer to that question would have to be no.  Javascript is actually a part of HTML.  

Say what?

Yes, the web is a dangerous place. :)  Basically, the world wasn't satisfied with boring old transactions as I've just described.  The world wanted web sites to be cool, to seem even more interactive than some dull forms and links.  The people demanded to be able to play Farmville or whatever it is they play these days.  They wanted Google auto-complete searches!  They wanted the world!!! What they got, unfortunately, was Javascript.  In a sentence, Javascript is the programming language you must write in if you want a program to run in the web browser (as opposed to the web server.)

Ah, Javascript.  Unfortunately, any tutorial on web development would be incomplete without it.  Basically, Javascript is one of the worst languages ever invented.  It is horrible at just about everything it does.  But to its credit, it does a lot.

So what is this demon I speak of?  Let me put it this way.  When you wrote your little Python scripts, you downloaded the Python interpreter, wrote the code, and had it run.  You can have it do anything you want, you can have it completely destroy the computer if you wanted to (I leave writing a quick program that tries to erase every file on the computer as an exercise to the student.)  And that's Python, a fairly advanced language.

Javascript is a way to write code that cannot harm the computer.  You can feel safe about browsing any web site in the world without worrying about your computer getting destroyed.  That's one thing that Javascript does well.  Another thing is that you can visit a web site from a Windows, Mac, or Linux computer, or even from an iPhone or an Android device, and that Javascript will work just as well.  Kind of.  This is something that Javascript tries to do well, but in reality it kind of sucks at it.  But again, to its credit, running on every imaginable device and running well is a pretty tough thing to expect.  And it does this, sort-of kind-of well.  Or at least well-enough.  In essence, it does this at the cost of making the Javascript-developer constantly want to throw up.  Or at least that's my experience with it.  :)

The reason that I'm telling you all this is that even though you can write the server-side code in any nice computer language you like (a.k.a. Python), if you want to do cool things on the client-side, you're going to have to do it in Javascript.  

The plus-side of Javascript is that it used to be way worse. :)  Today, you have what I call the new "HTML-Trio": HTML5, CSS3, and modern Javascript that make that whole crappy-mess somewhat less crappy.  Simply put, all three things are basically a part of the HTML standard, but each serves a different purpose:
  • HTML5: that's the basic HTML, what describes the structure of any given web page.  What are its links, where are its images, etc.
  • CSS3: that describes the looks of a web page.  What fonts should be used, how are things spaced, the colors, etc.
  • Javascript: that describes dynamic aspects of the web page itself.  Things that the user can interact with without having to initiate another transaction with the web server.
To make things more difficult, CSS & Javascript can be embedded in HTML (that is, the code is right there inside the HTML.)  Not only that, Javascript can actually modify HTML on-the-fly.  I don't want to get into the details.  No one does.  On that note, I give you your first exercise:
Exercise 1: go to www.w3schools.com and play with their HTML, CSS, and Javascript tutorials.  This may take you some time (as in a few days), but get familiar with the technologies.  No need to be an "expert", just have a good sense of how they work together.
Exercise 2: review the HTML cheatsheet and  HTML Form guide to get a good, quick overview of what HTML has to offer.
BTW, this discussion won't be complete without a little tangent.  Have I mentioned that everyone hates Javascript?  As a result, many have tried to replace it (failed) or make work-arounds for it (mostly failed.)  For example, Google came up with a new programming language, Dart, which lets you write in Dart, and then "convert it" to Javascript.  Google also came up with an interesting technology called GWT that attempts to kill two birds with one stone: you write the server-side code in Java (sorry, not Python), and it automatically generates the required HTML, CSS, and Javascript code for you.  And of course, there is also pyjs, which lets you program in Python and have that turn into Javascript.

The reason I say that at best these technologies have mostly failed is that ultimately, they all generate Javascript.  And as such, if there is a problem, it's going to be in Javascript.  What's worse is that if you want the browser to do things that they didn't think of, you will have to either:

  1. write Javascript
  2. cry a lot, and realize that you can't do it.  Then cry some more.  

Still, GWT has its followers, and Dart may be picking up steam.  Pyjs seems to be less perfected, but if you want to write a little Javascript game, it may be what you're looking for.

One last thing about Javascript before moving on.  Just like in Python, in order to do anything useful you are going to want to use packages that others have written.  Some popular Javascript ones you may want to be familiar with are jquery,  jquerymobile,  bootstrap,  angular,  dojo,  d3,  and many many more.  You may be tempted to think that with all these packages, writing Javascript should be a breeze.  But in that case, you would be forgetting that Javascript is crap, and thus these wonderful and amazing packages tend to break one another in mysterious and creative ways.  Enjoy!

BTW, writing CSS can be a bit of a pain as well (but not nearly as bad as Javascript.)  Still, there are some nice tools out there to make authoring nice CSS all the merrier.  Check out Sass, Compass, and Bourbon.

There are also plenty of HTML authoring tools out there.  They are divided into two major kinds: WYSIWYG (What You See Is What You Get) and those that just help you work with HTML.  WYSIWYG are the most tempting, because, well, you don't need to know any HTML to create HTML.  The problem with them is that after you have something that looks nice, when you see the HTML that it created, well, it will most likely be complete junk.  You'll have 10,000 lines of code for something that can probably be achieved in 100.  This makes changing it, and making it dynamic (with Javascript or with Python) near impossible. This leaves you with editors that just help you work with HTML (syntax highlight & preview), which are cute, but not extremely helpful.  In other words, not much you can do to get around learning HTML.  But then again, you always wanted to learn HTML, so it works out! :)  Of course, you don't need to be an expert at it.  It may be good enough for you to have a basic understanding of it, and then get the help of an expert HTML designer to make it look all spiffy.
Exercise 3: check out Bootstrap's templates that make creating simple, yet beautiful, static web pages easy!  They take advantage of sophisticated JS & CSS libraries, including Font-Awesome, to simplify development.  If you need to add some nice pictures, check out Pexels for some images free of charge.  Makes you want to create a web site, doesn't it? :)

Oh No! Another Tangent!!!

Sorry.  But as with any new programming field, there is a lot to learn.  The good news is that I'm close to getting started on showing you how to write dynamic web applications in Python.  The bad news is that I feel that a little more background information would be useful.  The reason I'm doing this is that although you may not need all the information I will impart to you, you may run into some of the terminology that I'm going to teach you, and I want you to understand what it means.  Nothing here is rocket-science, btw.  Programming in Javascript is a lot harder.

In the previous tangent I discussed the transaction that takes place between a web browser and a web server.  Before going into the details of the server-side in the transaction, I want to give you a little more background about how the internet freaking works!

Say you type "http://www.idocarmi.com/p/pytutorial.html" in your browser and hit go! Many things happen to get that transaction going.

First, the browser needs to find the server.  How can it do that?  Well, http://www.idocarmi.com/p/pytutorial.html is called a URL, and it describes to the browser the steps it needs to take to initiate the transaction.

In that URL, there is www.idocarmi.com, that's the domain name (or host name) of the web server that it needs to find.  The browser can then ask a domain name server (a DNS server) for the IP address that that domain name refers to.  IP what?  Every computer on the internet has a number.  That number is called its IP-address.  Most IP addresses are four numbers separated by periods.  To see what I mean, you can go to host2ip, type in any domain name that you want (www.google.com, www.idocarmi.com, etc.) and it'll give you its IP address (something like 173.194.67.121).  Every computer has a name for itself called localhost, as well as an IP address for itself: 127.0.0.1.  
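You can do the same lookup from Python.  Here is a small sketch using the standard socket and ipaddress modules; the function names resolve and is_loopback are mine.  Keep in mind that the address you get back for a real domain name depends on when and where you ask, so don't expect the number quoted above.

```python
import ipaddress
import socket


def resolve(host):
    """Ask the system's resolver (and through it, DNS) for an IPv4 address."""
    # If host is already an IPv4 address, it is returned unchanged.
    return socket.gethostbyname(host)


def is_loopback(ip_text):
    """True if the address means "this very computer" (such as 127.0.0.1)."""
    return ipaddress.ip_address(ip_text).is_loopback
```

For example, resolve("www.google.com") needs an internet connection and returns whatever address DNS hands out at that moment, while is_loopback("127.0.0.1") is always true.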

Once the browser figures out the IP-address, it can use cool little internet protocols to figure out how to send data to the web server.  How that happens is also very interesting, but for another day (the curious can read about the OSI model.)

It may be shocking to discover that the internet is used for many purposes, and not just web browsing (sending emails is just one of many other things that can be done.)  Each such purpose has an internet protocol that defines how two or more computers on the internet interact with one another.  For web browsing, the internet protocol is called HTTP, hence the "http://" at the start of the URL.  Secure (encrypted) web browsing is governed by the HTTPS (HTTP-Secure) protocol.  Luckily, HTTPS is just HTTP with encryption layered on top of it.  So once you know HTTP, you don't need to know much more to do HTTPS, just everything that there is to know about security.  

But that's not all.  In the basic Python tutorial you should have learned that computers are dumb.  Therefore if the client wants to talk HTTP, but the server is hearing SMTP (used for emails), as you can probably imagine, at best nothing will happen, and at worst, a big mess will happen.  Most protocols aren't designed to let the computer on the other end know which protocol is being used.  It's not like to do HTTP, the first thing the browser says is "I'm going to do HTTP now."  It would be nice if that's how things worked, but it's not.  Instead, another way was devised to enable one server to talk multiple protocols at once.  This is achieved using ports.  

A port is just a number between 0 and 65,535.  The "standard" or "reserved" ports are under 1024.  For example, the standard port for HTTP is 80.  The standard for HTTPS is 443.  The ones for SMTP are 25 and 587.  The significance of these numbers is that now a computer on the internet can be used for many purposes at the same time without confusion about what is happening at any given time.  It's important to remember that these numbers are just standards and recommendations.  No one is stopping you from running an HTTP server on port 8080, or port 25 for that matter.  It's highly advised that you not use a port under 1024 unless it's the standard for that protocol (e.g. 80 for HTTP).  But anything from 1024 up is free for you to do as you please.  For instance, if you wanted to run an HTTP server on port 8888, you can do that, but in order for the web browser to know that the server is listening on port 8888 instead of 80, it will have to be explicitly told so in the URL, like this:

http://www.idocarmi.com:8888/p/pytutorial.html

(Note the colon followed by 8888 at the end of the host name in the URL.)
Of course, there is nothing listening on that port for the www.idocarmi.com server, so if you type in that URL in your browser, you'll get a webpage not available message from your friendly browser.

You can also be explicit about port 80 in the URL:
http://www.idocarmi.com:80/p/pytutorial.html

But it's the same as leaving it out, as it is the standard port for HTTP.  In fact, your browser may conveniently "erase it" for you.

You can also be a bastard, and do something like this:
https://www.idocarmi.com:80/p/pytutorial.html

Now you're telling your browser to talk HTTPS (secure) on port 80 (non-secure HTTP).  It'll try, but it'll fail because the server will talk HTTP back.  You should get an SSL connection error of some sort.  SSL is basically the technology behind the 'S' in HTTPS. :)  While also very interesting, this too shall be for another day (the curious can read about Public-Key Cryptography.)

What's left in the URL is all that stuff after the host.  In my example it's:
http://www.idocarmi.com/p/pytutorial.html

This tells the browser to get the file called "/p/pytutorial.html"  Or more completely, it tells the browser to use the HTTP protocol to ask the server named www.idocarmi.com and which is listening on port 80 (implied) for the file found at path "/p/pytutorial.html".  QED.
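If you'd like to see Python take a URL apart the same way, here is a small sketch using the standard urllib.parse module.  The function name describe and the DEFAULT_PORTS table are my own; the table only covers the two protocols discussed here.

```python
from urllib.parse import urlsplit

# The standard ports mentioned above, used when the URL doesn't name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe(url):
    """Split a URL into (protocol, host name, port, path)."""
    parts = urlsplit(url)
    # parts.port is None when the URL has no explicit ":port" after the host.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path
```

Feeding it the URLs from this section gives port 80 for the plain one and 8888 for the one with ":8888" in it, with the same host name and path in both cases.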

(BTW, I don't want to mislead you into thinking that all internet protocols are transactional in the same way that I described in the previous section.  HTTP and HTTPS are, and these are the protocols that you'll be familiar with in web development.)

So is that it?  Are we FINALLY ready to get started on Python programming for the web?

Almost.  I don't want to get into all the details, but there is more to the HTTP protocol than the URL that I've just described (and more to the URL than I have just described too.)  

HTTP also allows for parameters (can be thought of like parameters of a Python function) to the web server, and thus allow more structured data to go through.
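The most common place to see parameters is the query string, which is the part of a URL after a "?".  Here is a small sketch with the standard urllib.parse module.  The www.example.com URL and the parameter names q and page are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_params(url):
    """Return the URL's parameters as a dictionary of name -> list of values."""
    # Each value is a list because the same name may appear more than once.
    return parse_qs(urlsplit(url).query)


def add_params(base_url, params):
    """Build a URL that carries the given parameters in its query string."""
    return base_url + "?" + urlencode(params)
```

So get_params("http://www.example.com/search?q=python&page=2") gives back {"q": ["python"], "page": ["2"]}.  Note that everything arrives as text; it's up to your code to turn "2" into a number.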

HTTP allows the web server to save cookies over at the browser side.  Cookies are similar to a Python dictionary.  For instance, a cookie name can be "last_visit_date" with a value being "2014-12-13", etc.  A web server can use cookies to keep track of returning visitors, or anything else it finds useful.  The way they work is that the server, in the response of the transaction, basically tells the browser to save a few key-value pairs (like a dictionary.)  Then, in future transactions that the browser initiates to that same web-server only, it will provide those same key-value pairs to the server.
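Python's standard library has a module, http.cookies, for exactly this dictionary-like text.  Below is a rough sketch of both directions; the two function names are mine, and the frameworks discussed later do this for you.

```python
from http.cookies import SimpleCookie


def make_set_cookie_value(name, value):
    """Text the server puts in a Set-Cookie header of its response."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie[name].OutputString()


def parse_cookie_value(header_value):
    """Turn the Cookie header a browser sends back into a plain dictionary."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}
```

With the example above, the server side would produce the text last_visit_date=2014-12-13, and on a later visit the browser sends that same pair back for the server to parse.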

Another feature in HTTP is that with every response, the web server responds with a status code.  For example, 200 means "all is well", while 500 means "something is seriously broken".  404 means "i can't find what you're looking for."  (and thus a broken link.)
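Python knows the official names of these codes through the standard http module.  A tiny sketch (the function name describe_status is mine):

```python
from http import HTTPStatus


def describe_status(code):
    """Return the status code together with its official short phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"
```

For the three codes above you get "200 OK", "500 Internal Server Error", and "404 Not Found".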

That's pretty much it for now.  There are more online explanations of http and cookies.  If you want the actual protocol reference, you can look up the relevant RFC (see http and cookies).  But at this point (and possibly ever) these RFCs may be too much information.

"Please remind me why I'm reading this post?"

To learn to do web development [using Python.]  "Ahhh, now I remember, yes!"

As I hinted previously, Python has some really cool tools to help you in your web development.  Each has a different approach, with its own advantages and disadvantages.

django - a very comprehensive framework.  It has a lot of built-in features that help you build super-complicated stuff super-fast.  The drawback is that it has a somewhat steep learning curve, and making little modifications may be complicated.  You need to understand how it works and work with it, not against it.  It has a bunch of powerful reusable apps that you can download and "plug into" your project.  It also has a built-in admin interface that lets you develop sophisticated data-entry websites quickly and easily.  In a sentence: it's a wonderful world, but one that takes getting used to (for help on that, see my post on Django hints & tips.)

flask - in many ways it's the also-powerful opposite of django.  It has a very gentle learning curve, and comes with some very powerful tools to help you.  But the simplicity comes with a price: you have the tools, but now you need to work hard to use them in order to build something useful.  If you're doing something very unique, this may be your best bet.  Otherwise, you may be longing for a few of those django reusable apps.

There are many more frameworks, but I believe that (at the time of this writing) these two are the dominant ones.   While very different from one another, they both share a few key (and wonderful) features:

ORMs (Object Relational Mappers)

This is a powerful technology that lets you work with databases (huge data stores containing complicated, and related information) without having to know anything about databases!  They let you use object-oriented programming (Python classes) to add, retrieve, update, and delete information.  Django has an ORM built-in, while Flask is very well integrated with sqlalchemy.

Clean & Smart Template Model

As mentioned earlier, the final output of your web application will be some sort of mix of HTML, CSS, and Javascript.  Of those three things, the CSS & Javascript tend to be rather static (that is, the CSS & Javascript files are returned to the browser unmodified.)  What actually makes each transaction different is that the HTML is different (which may invoke the CSS & Javascript differently.)  Thus, in essence, the dynamic part of a web application is actually the different HTML that is generated for every request. 

This can be very hard and messy to achieve.  In the PHP & ASP programming languages, the code is deeply embedded in the HTML, making trouble-shooting and testing quite difficult.  With both Django & Flask, there is a clean separation between the abstract work that the web application is performing (information processing, etc.) and how it is presented to the end-user (HTML generation.)  Django calls this MTV (or MVT). The Model is the basic information structure.  The View describes which information is then presented (that is, given back to the web browser.)  And the Template describes how this information is to be converted to HTML. 

As is the theme with Django and Flask, Django has this template model built-in, while Flask is very well integrated with jinja2.
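To give a feel for that separation without installing anything, here is a toy sketch using string.Template from the standard library.  To be clear, this is a stand-in of mine and not how Django templates or jinja2 work in detail (they are far more capable); the names PAGE and view are made up.

```python
from html import escape
from string import Template

# The "template": how the information is turned into HTML.
PAGE = Template("<h1>Hello, $name!</h1><p>You have $count new comments.</p>")


def view(name, count):
    """The "view": decides which information goes into the template."""
    # Escape user-supplied text so it can't smuggle in HTML of its own.
    return PAGE.substitute(name=escape(name), count=count)
```

Notice that the HTML lives in one place and the Python logic in another, so a designer could change the look of the page without touching the function.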

Integrated Unit-Testing

Both Django and Flask support unit-testing.  They help generate "test databases" and provide tools to help you make sure your web application is working as expected.

Reverse URLs (creating links)

Building dynamic applications is, well, a dynamic process! :)  As your application changes, you want the links that are generated to be good (that is, not broken.)  Both Django & Flask have built-in tools to help ensure that broken links never happen.

Security

Having your web application out there begs hackers to try and mess with it.  Both frameworks give you the tools to prevent many common web attacks.  Of course no defense is perfect, and your web application may still be vulnerable to some attacks.  And although very interesting as well, this discussion will be left for another day.  (The curious can look in OWASP's Top Ten.)

Complementary Technologies

Both support integration with many complementary technologies.  More on that in a bit.

And much, much, more....

Both frameworks offer so many other features that I'll be hard pressed to name them all.  Internationalization -- allowing web sites to support multiple languages / localization. Performance optimizations.  Debugging tools. And of course, great online documentation (as well as a few books.)

Complementary Technologies

(See, that was in a bit!)

HTTP is a complicated protocol. And to implement it well, as well as fast, is hard.  Therefore, a division of labor was established in which certain web servers do the HTTP, while application frameworks communicate with the web servers and let you, the developer, write code mostly oblivious to HTTP.  Common (and free) web servers that are used in production environments are apache as well as nginx.  But don't worry about it for now, as both Django & Flask have built-in web servers that are great for development purposes.  And even if you'll want to go to production, you can leverage services that take care of the web servers as well as other complementary technologies for you.  (A quick search got me django and flask.)
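In the Python world, this division of labor is standardized as WSGI: the web server calls a plain Python function for every request, and the function never touches the network itself.  Here is a bare-bones sketch using only the standard library.  The names app and serve and port 8000 are my own choices, and the frameworks hide all of this from you.

```python
def app(environ, start_response):
    """A WSGI application: the web server calls this once per request."""
    # The server has already parsed the HTTP request into the environ dict.
    path = environ.get("PATH_INFO", "/")
    body = f"You asked for {path}\n".encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def serve(port=8000):
    """Run app on Python's built-in development server until interrupted."""
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", port, app) as httpd:
        httpd.serve_forever()
```

The same app function could be handed to a production-grade server instead of the built-in one without changing a line of it, which is the whole point of the division of labor.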

Another common complementary technology is a database server.  This is the technology that stores all your valuable information.  It is also the technology that the ORM in these frameworks tend to shield you from very effectively.  But even if you're not interacting with them directly, you still need to use them!  A great free such database that is good for development purposes is sqlite.  What's great about it is that it comes built-in with Python, and you don't have to do much to set it up!  Unfortunately it is not so great for many production environments.  In those cases, you may want to use postgresql or mysql (both are free, and work great!)  Again, you don't need to be concerned about this now.  This is just something to keep in mind.
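Since sqlite comes with Python, you can try a database right now.  Here is a quick sketch using the standard sqlite3 module with a throw-away database that lives in memory; the comment table and its columns are made up for illustration.  This is the kind of SQL that an ORM writes on your behalf.

```python
import sqlite3


def demo():
    """Create a table, add one row, and read it back."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    conn.execute(
        "CREATE TABLE comment (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
    )
    # The question marks keep the data separate from the SQL text.
    conn.execute(
        "INSERT INTO comment (author, body) VALUES (?, ?)", ("ido", "First!")
    )
    conn.commit()
    rows = conn.execute("SELECT author, body FROM comment").fetchall()
    conn.close()
    return rows
```

Calling demo() returns a list with the single row that was just inserted.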

These are the big ones.  There are more niche technologies (such as redis), but their use will depend on your specific needs.

Get on already!

Well, now you should be familiar enough with the main aspects relating to web development.  
Exercise 4: go through both the flask tutorial as well as the django tutorial.  Going through both will give you a better feel for what they are like and how they differ.

That's it!

Seriously.  :)  All I can do now is give you some personal recommendations and tid-bits of information.

First, what do I recommend?  Flask or Django?  My honest answer is that if you have the time to learn Django, it's well worth it.  I say this not because I like the learning curve, but because all the re-usable apps and the admin interface are incredibly powerful tools at your disposal.



