Thursday, April 28, 2005

Exception-based Switch-Case and the flavour of Python

There's been a lot of discussion around switch-case flow-control in Python lately. With this recipe, Zoran Isailovski weighs in with what I find is a nice, clean, pythonic way of handling switch statements. He references another recipe by Brian Beck that handles the switch with a for loop (very cool).

I notice a comment on Zoran's recipe that references the C2 wiki's page on not using exceptions for flow control. Now I spend most of my time switching between Java and Python and it's true that I would never dream of using exceptions to control the flow in my Java application, but it's different in Python. In standard Python, iterators raise an exception to stop iteration and this is a pattern that is fairly well-recognized in the community.
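To make the iterator point concrete, here is a small sketch of what a for loop does under the hood, written in present-day Python 3 syntax. The function name first_even is made up for illustration; the only thing it is meant to show is that StopIteration is an ordinary signal, not an error.

```python
def first_even(numbers):
    """Return the first even number, or None when the iterator runs dry."""
    iterator = iter(numbers)
    while True:
        try:
            value = next(iterator)
        except StopIteration:
            # Not a failure: this is how an iterator says "no more items".
            return None
        if value % 2 == 0:
            return value
```

A for loop hides the try/except, but the exception is raised and caught all the same.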

All of which just points to the differences in flavour between Python and Java.

Lexical Analysis, Python-style

I've always wanted to be able to do lexical analysis Python-style and I've always had the feeling that it should be done with regexps instead of character by character. Now Jason Diamond and Fredrik Lundh have presented recipes for doing just that. I'll have to try out these techniques and see if a re-write of the parser for Founder makes things cleaner.
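For flavour, here is roughly what regexp-driven lexing can look like. This is my own minimal sketch and not either of the recipes mentioned above: the token names and the tiny arithmetic language are assumptions, and it relies only on named groups and finditer from the standard re module.

```python
import re

# Hypothetical token set for a tiny arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
# One big alternation; each alternative is a named group.
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs, skipping whitespace."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(
                "unexpected character %r at position %d"
                % (match.group(), match.start())
            )
        yield kind, match.group()
```

The regexp engine does the character-by-character work, so the Python code only has to decide what to do with each match.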

Wednesday, April 27, 2005

Embedding data in HTML documents with the "data" URL scheme

The "data" URL scheme allows you to embed data directly in an HTML document. By encoding the data in Base64 you can even embed binary data straight in the page. The following example illustrates this by embedding the source of an image directly in the image tag itself.


<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAFAAAAAMCAIAA
AB6P612AAAAmklEQVR42uWWSw6AIAxE7VXg/keqV8FPE12AdUYgamDREFomfVBIJaU0jTSEA
xbZbI8zaqXs6OwuUdVbkRDjamfVY4IEUzHILmQ4OubygHPOzwIjGV4CO5y4bkNgSpAApjiR/
CgGZwVxEcD2ulnO/06gT6uylp5dbE1W5SoIoQFwZaL9gAvkgwCfygb8cqeV9wn9ehuTH621X
AArWx8EprTSmAAAAABJRU5ErkJggg==">


The code above produces the following graphic
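Encoding the bytes by hand is no fun, so here is a minimal sketch of generating such a tag from Python using the standard base64 module. The helper names make_data_url and make_img_tag are my own, and the default MIME type is an assumption.

```python
import base64


def make_data_url(data, mime_type="image/png"):
    """Return a base64 "data" URL for the given bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)


def make_img_tag(data, mime_type="image/png"):
    """Wrap the data URL in an img tag like the one shown above."""
    return '<img src="%s">' % make_data_url(data, mime_type)
```

Reading a file in binary mode and passing its contents to make_img_tag should give something you can paste straight into a page.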

Monday, April 18, 2005

How well do you know Python -- Name mangling

In his article Spyced: How well do you know Python, part 1, Jonathan Ellis points out the danger of using exec. This principle shows up in inheritance too if you're not careful (or you come from Java):


>>> class A:
...     def __init__(self):
...         self.__x = 5
...     def getx(self):
...         return self.__x
...
>>> class B(A):
...     def getx(self):
...         return self.__x + 5
...
>>> b = B()
>>> b.getx()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in getx
AttributeError: B instance has no attribute '_B__x'
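The traceback gives the game away: inside class A the name self.__x is stored as _A__x, while the same spelling inside class B is looked up as _B__x, which was never set. Here is a small sketch of two ways around that, in Python 3 syntax. Classes C and D are hypothetical additions for the single-underscore variant.

```python
class A:
    def __init__(self):
        self.__x = 5          # stored on the instance as _A__x

    def getx(self):
        return self.__x       # compiled as self._A__x


class B(A):
    def getx(self):
        # Inside B, self.__x would mean self._B__x, which does not exist.
        # Spelling out the mangled name reaches the attribute A created.
        return self._A__x + 5


class C:
    """Same idea with a single underscore, which is not mangled."""

    def __init__(self):
        self._x = 5


class D(C):
    def getx(self):
        return self._x + 5
```

The single underscore is usually the friendlier choice when subclasses are expected to touch the attribute.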

Friday, April 15, 2005

Duck Typing dilemmas - A strawman for "pure" Python interfaces

The Duck Typing debate is less about type-checking and more about expectations

Cedric Beust blogs about The Perils of Duck Typing and he's not alone. Many people have been talking about dynamic and static typing for quite a while now and it's very close to being an Emacs vs. vi argument.

I don't think the problem here is really the absence of static typing but more the absence of "good" design or reasonable expectation management. Let me give an example (so you can make fun of me if I'm wrong).

In Python, a number of methods and functions take "file-like objects" as parameters. Now some of these methods actually want all the methods that files implement and others are content with just having read and readline. This interface is obviously a little hard to code to since in order to implement it you need to read the specific requirements of the method/function you're calling.

Now the only thing wrong with this whole setup is that you really have no idea what methods to implement and, as Cedric points out, you may omit the implementation of a method that is not called directly from the function you're calling. So here's my strawman proposal to those naysayers who demand more static typing: just implement an empty base class whose methods raise NotImplementedError. Note that in Python this class wouldn't even need to be bundled with the base distribution since Duck Typing works whether you like it or not.

In the function below it doesn't matter what type the fileobject parameter is. It will work fine with a file from the open or file functions as well as with any other iterable object.


def line_count(fileobject):
    count = 0

    for i in fileobject:
        count += 1

    return count
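A quick sketch of that claim in action, using present-day Python 3 and an in-memory io.StringIO in place of a real file (the sample strings are made up):

```python
import io


def line_count(fileobject):
    count = 0
    for _ in fileobject:
        count += 1
    return count


# A file-like object, a plain list and a generator all work the same way.
print(line_count(io.StringIO("a\nb\nc\n")))   # 3
print(line_count(["a", "b"]))                 # 2
print(line_count(ch for ch in "abcd"))        # 4
```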


Now let's assume that we want to make it explicit that you need to pass an iterable object. Most pythonistas will tell you to re-write the function like this.


def line_count(iterable_object):
    count = 0

    for i in iterable_object:
        count += 1

    return count


Now that's nice and clear. We can tell (as long as we have two brain cells to rub together) from the name of the argument what the type should be. Of course then the whole discussion takes a nasty swerve towards the "Hungarian notation debate" side of things.

Putting the notion of encoding the type in the argument name aside we can still provide a nice way to show what type is expected. All we have to do is write our function like this.



class Iterable(object):
    # "next" is the Python 2 iterator method; Python 3 spells it "__next__".
    def next(self):
        raise NotImplementedError

    def __iter__(self):
        return self

def line_count(itobj):
    """Pass me an instance of Iterable, please."""

    count = 0

    for i in itobj:
        count += 1

    return count


Now, because Python supports multiple inheritance, we can just use the Iterable "interface" as a mixin. It doesn't really matter whether we use it or not because the methods that it defines are clearly described and listed in the class definition. In addition, the function will still accept non-Iterable objects that adhere to the "Iterator types" description in the Python Library Reference. It's just that designing like this makes it simpler on the client to use the library you've developed. Note also that any class that implements the "Iterator types" set of methods could also just mix in the Iterable class for the hell of it.
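Here is a sketch of what a client of such an "interface" might look like. It uses the Python 3 spelling __next__ rather than next, and the Countdown class is a hypothetical example of mine, not part of any library.

```python
class Iterable(object):
    """Documents the iterator protocol; meant to be mixed in."""

    def __next__(self):
        raise NotImplementedError

    def __iter__(self):
        return self


class Countdown(Iterable):
    """Hypothetical client class that fills in the 'interface'."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def line_count(itobj):
    """Pass me an instance of Iterable, please."""
    count = 0
    for _ in itobj:
        count += 1
    return count
```

A subclass that forgets to override __next__ fails with NotImplementedError on first use, and line_count still accepts a plain list, since nothing checks the type.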

I know that this seems like a lot of flailing for nothing (since there is still no type-checking until runtime at which point it still happens using Duck Typing) but the point here is that I think the Duck Typing argument is a design argument and not a dynamic vs. static typing argument. You can implement Java-like interfaces in a dynamic language, they just won't be checked at compile-time; but maybe that's OK. The real goal is to: a) make sure that the implementor knows all the methods they need to support and b) make sure that you are not passing the wrong instance by mistake to your function. I think that by adhering to a few simple design guidelines both of those goals are pretty easy to accomplish.

NTP on Windows - Keeping up with the time

You can easily keep your local clock synchronized on recent versions of Windows

The NTP (Network Time Protocol) allows you to synchronize your local time to the time maintained by a set of servers. All those servers are fed (eventually, somewhere up the chain) by a clock of incredible precision (no, not Big Ben). UNIX users have had this feature since its creation and now it's time (no pun intended) for Windows users to have their day in the sun.

Just go to your start menu, select "Run..." and type "cmd" into the little text box. Click OK and type this line into the resulting command line:


net time /setsntp:pool.ntp.org


Your clock should now remain synchronized and up-to-date until the end of time.
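If you're curious what the time service is actually asking for, here is a rough sketch of the simplest form of the exchange (SNTP) in Python. The function names are my own. It only builds the 48-byte request and decodes the transmit timestamp from a reply, and it ignores network delay, which a real client would correct for.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2208988800


def build_request():
    """48-byte SNTP client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # Bytes 40-47: 32-bit seconds and 32-bit fraction, big-endian.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2.0 ** 32
```

To try it against a real server you would send the request over UDP to port 123 and pass the reply to parse_transmit_time.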

Friday, April 08, 2005

Version 0.1.0 of WODFS Released

We finally released a version of the WOD. It still requires a tremendous amount of effort for others to get it to run (you need to install a bunch of dependencies) but we wanted to "release early, release often". The next release should contain a couple of improvements such as Windows support, limited WebDAV support, XMLRPC support and a re-structuring of some of the filesystem level code.

Wednesday, April 06, 2005

Podcasting meets blogging

Jon Aquino has started playing with an automatic blog to podcast script that looks pretty cool.

Tuesday, April 05, 2005

One box where two will do

By adding a keyword search for Google in Firefox you can remove the need for the extra Google search box

All you have to do is go to http://www.google.com and right-click on the form. Select "Add a keyword for this Search" and fill in the boxes in the popup. You can now use the keyword you just created in the address bar instead of using the right-hand-side Google search box.

Saturday, April 02, 2005

Reading between the lines

Reality can never be communicated, only experienced.

Words can never cause another person to experience anything. Only the receiver can perform the act of experiencing. We speak in order to draw smaller and smaller circles around the experience we are trying to provoke in the other person. When the circles are as small as we can go, we have a reasonable assurance that the other person has experienced the same thing that we have.

Writing (or saying) something using a lot of words allows us to refine the various possible meanings of the communication to a point where we not only have been able to express ourselves (i.e. we feel like we have accurately represented the experience) but also have been understood (i.e. we feel that the recipient has accurately reproduced the experience for themselves). This is easy. The way to do it is to start talking (or writing) and not stop until the other person is able to communicate back that they have experienced the same thing.

When we read religious or spiritual texts we often are confronted with the fact that the text is much shorter than what we really need in order to be able to reproduce the author's experience. This is simply because it would be impossible to expect an author to be able to write a text that could be understood by anybody at any time in any situation.

Writers of these texts therefore resort to a form of compression that allows them to accomplish two goals.

The first goal is to express themselves in a form that can be understood. It is not important how long it takes for the understanding to occur and indeed many of the authors (of these texts) that are read today have been dead for some time. The notion that it is irrelevant how much time and effort are required to understand the text is an interesting one since it underlines the fact that it is the understanding of the communication that is the ultimate goal.

The second goal is to allow the communication of the concept to be passed on even by those who do not understand it. In order for this to be possible it is important for the concept to be expressed simply. If the simplicity of the expression is great enough then even the greatest fools of the earth will propagate the message until such time as someone with the capacity to understand it can hear it.

So we have two conflicting goals: understanding and simplicity. How then are we to formulate a concept that is inherently complicated and deep in a way that is simple? Recall that simple in this context refers to the complexity of the message, not the complexity of the concept.

The Tao Te Ching expresses complicated concepts in simple verse. The Bible expresses the truth in the words of Jesus via parables. A high school student could learn and recite each of these stories with little difficulty. However, without the key to what is locked inside, they remain opaque and confusing.

The key is to "explode" the concepts hidden within each story; to tease each sentence apart and to push our understanding of each aspect of the work to its limit. We need to read between the lines. Unfortunately, there are not very many lines so there appears to be a huge number of possible interpretations for each of these esoteric texts. The trick then is to draw as many lines as we can so that the space between them becomes smaller and smaller until, one day, perhaps, we will experience reality.