Jon Udell blogs about AJAX encapsulation with TIBCO General Interface. This is one seriously cool product and I'm now officially itching to get my hands on it.
On the other hand it makes me wonder why we persist in trying to rebuild the functionality of the OS (windowing systems and so on) inside the browser. More and more it seems to me that we need to have a complete re-write of the OS windowing system to take advantage of all the things we have learned about GUIs in the past little while.
Imagine if the "windowing" software for the OS was actually just a giant browser. You'd get automatic "Active Desktop". You'd be able to open frames and dialogs and file choosers and all the other crunchy goodness that we have painstakingly baked into the browser. You'd also stop worrying about whether your application's super-duper feature was going to work on MacOS, Windows and Linux since all three could run this "super-browser" as the base object in the windowing system.
There's probably a really good reason (other than the phenomenal effort required) to not do this... but I can't think of it.
Monday, May 30, 2005
Saturday, May 28, 2005
Interviewed by Richard Jones
I asked Richard Jones to interview me.
I'm sure that my answers (found below) are a bit longer than most people's attention spans. To alleviate my guilt, please follow these instructions and further the meme:
- Leave me a comment saying, "Interview me."
- I will respond by asking you five questions. I get to pick the questions.
- You will update your weblog with the answers to the questions.
- You will include this explanation and an offer to interview someone else in the same post.
- When others comment asking to be interviewed, you will ask them five questions.
1. Why do you write a weblog?
I write to try to capture my thoughts. I suppose that I don't really care about other people reading them but on some level I like the fact that they do. There is an ego trip associated with putting your thoughts out there and having other people read them. At the same time, I get to use readers as a sounding board: anybody who disagrees with what I write is welcome to comment and hopefully give me some insight that I hadn't previously seen.
I decided a while ago that I didn't want to have the kind of blog that goes on and on about what flavour of ice cream I ate this morning or which of my friends I will be seeing tonight. I think that in the future I will come back and read my thoughts from this year and last year and try to see what changes took place in my head and when (and maybe even why). Being able to see what I was thinking when certain events took place is kind of cool. I don't want to look back and see a list of what I was doing - I want to see what I was thinking about.
Finally, I blog because I'm fundamentally a yapper. I have an answer to every question (even though it might not be a great answer) and I sometimes feel the need to get that answer off my chest even if nobody is listening. That's why I sometimes blog about morality or society. My friends laugh at me (kindly) because I always have something to say about everything so I figure that some of that should end up here.
2. What's the coolest online community enabling system right now?
I think that wikis hold the #1 spot for me. They allow users to collaborate, chat and create together which is something that most other systems lack for one reason or another. A lot of features from other systems could be folded into wikis but even the basic (simplest) wiki system is far more powerful than most other systems.
Blogs are cool but they don't really create tight communities. They create clouds of people that read each other's blogs and leave comments for each other but they remain fundamentally un-collaborative publishing mechanisms - a way for me to tell you what I think/feel/do.
IRC has more immediate interaction but loses completely on the asynchronous level. I know, I know, there are bots that will tell you when the last time foxy6969 was in the channel but it's not the same. Wikis are always now.
Usenet has asynchronicity but is more like mailing lists. In fact, let me just lump mailing lists in here as well (even though there are access restrictions to most mailing lists that don't exist for most newsgroups) and say that both allow you to post articles (like a blog) on a given topic. I would say that Usenet fosters more community spirit than blogs but still loses to wikis.
Online games (like MMORPGs and so on) also foster great community spirit but are very limited in their focus. Even though most wikis have a theme, almost all have a very heterogeneous collection of pages.
Tagging services like Flickr and del.icio.us don't create communities. They expose existing interest groups. You and I may tag the same resource the same way but have nothing else to say to each other. Also, tagging is not conversation (no matter what the pundits say) and is an incredibly low-bandwidth medium of communication. The lack of consistent commenting means that tagging services are more like personality quizzes than they are like a community blackboard.
So wikis is my final answer.
3. What on Earth is WOD?
The WODFS is a Write-Once Distributed FileSystem. I wrote it in the winter of last year with my friend Justin and we put a lot of work into our "baby". The system is based on the design of archival storage systems and distributed filesystems and its main goal is to allow users to simulate an infinite, indestructible hard-drive.
The WOD allows you to keep saving data forever to a network of computers that store only the unique parts of your data. This means that each "chunk" of data is only ever stored once (conceptually). Each node in the network caches the most frequently used blocks so that you can achieve a disk access speed that is reasonable.
What's really cool is that since the WOD is write-once, you never actually "lose" any data. Even deleted data can be easily retrieved.
The system includes a set of extensions to the filesystem that allow you to create special folders and access files that contain data from the past. For example, if you have a folder called myprojects where you put all your code, you can open (either by creating it in Windows or by cd'ing to it in UNIX) a folder called myprojects@2004-01-01 and you will see that folder's contents exactly as they were on the first of January, 2004. This works with any folder (even the root) in your filesystem. Individual files track every single saved modification and allow you to browse through the history of the file using dates (as explained above) or by appending @ and a negative number representing the number of steps backwards to take.
I'm working hard right now to finish the code to hook the system into the Windows explorer but right now things run only on UNIX. Performance is not as good as we would like but the whole thing is written in un-optimized Python so that is to be expected.
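To make the @ suffix convention concrete, here is a minimal sketch of how such a path component might be parsed. It is not the WOD's actual code: the Snapshot type and the parse_component function are hypothetical names, and the assumption that anything unrecognised after the @ is treated as part of a plain name is mine.

```python
from datetime import date
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    """Hypothetical parsed form of a WOD-style time-travel path component."""
    name: str
    as_of: Optional[date] = None       # set for name@YYYY-MM-DD
    steps_back: Optional[int] = None   # set for name@-N


def parse_component(component: str) -> Snapshot:
    """Split 'name@suffix' into the name and either a date or a step count."""
    name, sep, suffix = component.rpartition("@")
    if not sep or not name:
        # No '@' at all, or nothing before it: treat as a plain name.
        return Snapshot(component)
    if suffix.startswith("-") and suffix[1:].isdigit():
        # name@-N: go N saved modifications backwards.
        return Snapshot(name, steps_back=int(suffix[1:]))
    try:
        # name@YYYY-MM-DD: contents as they were on that date.
        return Snapshot(name, as_of=date.fromisoformat(suffix))
    except ValueError:
        # Assumption: an unrecognised suffix is just part of an ordinary name.
        return Snapshot(component)
```

With this, parse_component("myprojects@2004-01-01") yields the folder name plus a date, and parse_component("notes.txt@-3") yields the file name plus a step count of 3.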
4. What's your favourite Python hack?
I think that right now, my favourite hack has to be the soft reference module that we developed for the WOD. We hooked into the GC routines of the Python runtime to manage a set of objects that gets collected only when the memory they use is needed.
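The real module hooks into the garbage collector, which is more than a short example can show. As a rough approximation of the same idea using only the standard library, the sketch below keeps the most recently used values alive with strong references and holds everything else only weakly. SoftCache, Block and max_strong are hypothetical names, and a fixed count stands in for "when the memory is needed".

```python
import weakref
from collections import OrderedDict


class Block:
    """Hypothetical cached value; plain bytes cannot be weakly referenced."""

    def __init__(self, data):
        self.data = data


class SoftCache:
    """Hold the `max_strong` most recently used values strongly, the rest weakly."""

    def __init__(self, max_strong=2):
        self.max_strong = max_strong
        self._strong = OrderedDict()                # key -> value in LRU order
        self._weak = weakref.WeakValueDictionary()  # key -> value, no ownership

    def put(self, key, value):
        self._weak[key] = value
        self._touch(key, value)

    def get(self, key):
        value = self._weak.get(key)   # None if the value was already collected
        if value is not None:
            self._touch(key, value)
        return value

    def _touch(self, key, value):
        # Mark as most recently used and evict the oldest strong references.
        self._strong[key] = value
        self._strong.move_to_end(key)
        while len(self._strong) > self.max_strong:
            self._strong.popitem(last=False)
```

A value evicted from the strong set stays retrievable for as long as something else still references it; after that the collector is free to reclaim it.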
Close runners-up include hacks that involve meta-classes or decorators.
I'm not a big fan of cramming as much code as possible into one line. I like to see elegance and consistency. By consistency I mean that even hacks should be done in the spirit of the language used.
I've spent the last years coding in Java and so I really like hacks that show flexible (sometimes even functional) capabilities of languages. I like to see languages pushed to do things that they were not necessarily meant to do in clean and cogent ways.
I've always been a big fan of LISP and Scheme so I like to see hacks that modify language structure or build "small languages" or that modify a set of entities (objects or functions or whatever) transparently and orthogonally. That's why I like playing with decorators and meta-classes.
5. Blogs vs. Usenet, fight to the death. Who wins?
Blogs win. Honestly I think that blogs and Usenet are pretty similar and if bloggers had some more tools (other than Trackback and comments) to allow a threaded style of linking between their posts (like metadata that allowed a tool to generate lists of related posts) then blogs would be a lot like Usenet but without the crappy flamefests and trolling and religious wars.
Blogs mean I get to choose whose writings I read. I like that choice. There's no spam and in general most blogs are consistent enough that I know after a little while whether or not to keep reading the blog.
Usenet can go any way. There are often spam messages posted to newsgroups. There are trolls who spend their time wasting other people's lives. Also, newsgroup organization is more centralized than blog management (says the guy who blogs on blogspot.com).
All in all, blogs are more flexible and allow more personal choice and freedom. They get my leftist vote.
Thursday, May 26, 2005
Tag maintenance
Clay Shirky blogs about the Dynamic growth of Tag Clouds. He put together a script that showed how the cloud of tags associated with a URL on del.icio.us grows over time.
An interesting aspect (mentioned later in his post) is that some of the top tags for a resource only percolate to the top later in the tagging process. The example he uses is the tag "ajax" being applied to the "original Adaptive Path article". The "ajax" tag wasn't used until after more than 1/3 of taggers had already tagged the resource.
This raises two questions: how are users who are unaware of a certain term ever going to use that term as a tag; and how will that term ever be added to the list of tags used by those who have already tagged the resource?
To be clear: I tagged the original article. I did not however use the tag "ajax" since at the time the term meant nothing to me. It had not yet caught on as a global term for the practice of using javascript to communicate with a server behind the scenes. Today, I might very well attempt to find that article using the "ajax" tag. Of course, this wouldn't work since I never used that tag. It would be nice if I could search my tagged resources using others' tags when I don't have the tag I'm searching for. Obviously my own tags should have priority but if I search my resources for the "ajax" tag, the resources that I have tagged and that others have tagged as "ajax" should show up in my search.
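The search behaviour described above can be sketched in a few lines. This is only an illustration: search_by_tag is a hypothetical function, and the two dictionaries are stand-ins for whatever data a tagging service actually holds, not the del.icio.us API.

```python
def search_by_tag(tag, my_tags, others_tags):
    """Return my bookmarked URLs matching `tag`, my own tags first.

    my_tags:     {url: set of tags I applied}
    others_tags: {url: set of tags other users applied}
    Both mappings are hypothetical stand-ins for a tagging service's data.
    """
    # First priority: resources I tagged with this term myself.
    own = [url for url, tags in my_tags.items() if tag in tags]
    # Fallback: resources I bookmarked that only other people tagged this way.
    borrowed = [
        url for url in my_tags
        if url not in own and tag in others_tags.get(url, set())
    ]
    return own + borrowed
```

Resources I never bookmarked are left out even if others tagged them with the term, since the point is to search my own collection with a borrowed vocabulary.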
This points to a general maintenance problem with tagging (and I would venture to guess with classification schemes overall): how can the metadata be maintained cheaply? Currently, tagging a resource into del.icio.us takes me less than 30 seconds. It would be a real pain if I had to review my resources all the time and add new tags.
Maybe another solution (other than the one listed above) would be to produce a feed of resources that I have tagged and an interface that would guess which tags from others would be most useful to me.
It feels like we are moving beyond tagging to searching. So is the issue the technology behind tagging/classification or behind searching?
Thursday, May 19, 2005
Sparklines
I discovered a fantastic page containing a draft chapter from Edward Tufte's new book via Pensieri di un lunatico minori.
Digging deeper, as is my wont, I found a whole whack of cool articles about sparklines and their implementations.
http://bitworking.org/news/Sparklines_in_data_URIs_in_Python
http://www.ietf.org/rfc/rfc2397
http://www.bissantz.de/sparklines/
http://agiletesting.blogspot.com/2005/04/sparkplot-creating-sparklines-with.html
http://dealmeida.net/en/Projects/PyTextile/sparklines.html
http://www.waler.com/flint10.exe
[Update: fixed link to Pensieri di un lunatico minori]
WODFS storage architecture
An off-the-cuff description of the WOD's storage system
OK, I admit it. We shamelessly ripped off the idea for how to break up and store data from the Venti filesystem papers.
So the basic idea is that each file is broken up into a number of blocks. Each block is 1024 bytes in size (for now, more later on variable-sized blocks) and its fingerprint is the SHA-1 hash of the data in the block.
A given block can be either a data block or a pointer block. Pointer blocks have pointers to other pointer blocks or to data blocks. Each file is conceptually a tree of pointer blocks that end up pointing to leaf data blocks. Because each block has as its fingerprint the hash of the data in the block, each block can be interpreted as either a data block or a pointer block. This is not an issue to begin with but as the system fills with data, we project that this will result in considerable savings in terms of space since the data and the data structure are expressed in the same way.
In addition, each file is uniquely identified by a fingerprint which is the hash of its contents: the leaf data blocks are referenced by the pointer blocks all the way up to a root block for the file; that block's fingerprint is the file's fingerprint. Similarly, the entire filesystem is represented by a single root fingerprint. This root fingerprint is stored between sessions and is, in fact, the only item necessary to rebuild the entire system from scratch.
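The block tree described so far can be sketched as follows. This is a simplified illustration and not the WOD's chainer: the names chain, fingerprint and FANOUT are hypothetical, the store is a plain dictionary, and the type flags carried by pointer blocks are left out entirely.

```python
import hashlib

BLOCK_SIZE = 1024                         # block size quoted in the post
FINGERPRINT_SIZE = 20                     # a SHA-1 digest is 20 bytes
FANOUT = BLOCK_SIZE // FINGERPRINT_SIZE   # pointers per pointer block (51)


def fingerprint(block: bytes) -> bytes:
    return hashlib.sha1(block).digest()


def chain(data: bytes, store: dict) -> bytes:
    """Split `data` into blocks, build the pointer tree, return the root fingerprint.

    `store` maps fingerprint -> block, so identical blocks are stored once.
    """
    # Leaf level: fixed-size data blocks (an empty file gets one empty block).
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    level = []
    for block in blocks:
        fp = fingerprint(block)
        store.setdefault(fp, block)
        level.append(fp)
    # Pointer levels: pack fingerprints into pointer blocks until one root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), FANOUT):
            pointer_block = b"".join(level[i:i + FANOUT])
            fp = fingerprint(pointer_block)
            store.setdefault(fp, pointer_block)
            parents.append(fp)
        level = parents
    return level[0]
```

Chaining the same bytes twice returns the same root fingerprint and adds nothing new to the store, which is the property the rest of the design leans on.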
For the moment the type of a block is decided by a flag in the pointer block that points to it. This is a bit fragile since there is a limited amount of space for flags and thus the scheme could become unwieldy at some point in the future.
Since blocks are context-neutral they can be shared by different instances of the WOD running on the same machine.
The blocks are stored on the disk contiguously without any specific order (this sucks because retrieval is via random-access only). To speed up block retrieval, an index maps fingerprints to offsets in the data "file". This index can be rebuilt from the data "file" at any time and so it is not crucial to the functioning of the system. Theoretically the index could be stored within the system itself and loaded after the system is initialized. In this way the bootstrap routines would be a bit slower but once the index was loaded the whole system would speed up.
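Here is a small sketch of that arrangement, with a bytearray standing in for the data "file". The BlockLog class and its 2-byte length prefix are assumptions made for the example (the post does not describe the on-disk record format); the point is only that the index can be thrown away and recomputed by scanning the data.

```python
import hashlib
import struct


class BlockLog:
    """Append-only block storage with a rebuildable fingerprint -> offset index."""

    HEADER = struct.Struct(">H")   # hypothetical 2-byte length prefix per block

    def __init__(self, data=b""):
        self.data = bytearray(data)   # stands in for the on-disk data "file"
        self.index = self.rebuild_index()

    def rebuild_index(self):
        """Scan the data sequentially and recompute every block's offset."""
        index, offset = {}, 0
        while offset < len(self.data):
            (length,) = self.HEADER.unpack_from(self.data, offset)
            start = offset + self.HEADER.size
            block = bytes(self.data[start:start + length])
            index.setdefault(hashlib.sha1(block).digest(), offset)
            offset = start + length
        return index

    def write(self, block: bytes) -> bytes:
        fp = hashlib.sha1(block).digest()
        if fp not in self.index:   # discard writes for blocks already present
            self.index[fp] = len(self.data)
            self.data += self.HEADER.pack(len(block)) + block
        return fp

    def read(self, fp: bytes) -> bytes:
        offset = self.index[fp]
        (length,) = self.HEADER.unpack_from(self.data, offset)
        start = offset + self.HEADER.size
        return bytes(self.data[start:start + length])
```

Constructing a new BlockLog from the raw bytes of an existing one reproduces the same index, which is why losing the index is an inconvenience rather than a disaster.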
Right now the big hit for performance is in what we call the chainer. This part of the system takes a contiguous stream of bytes and breaks it up into a tree of blocks. It would be really nice to figure out a way to derive the "chained fingerprint" (i.e. the root fingerprint for the processed file) from the initial data in order to be able to skip the chaining phase when it is not required. Already the system discards write hits for blocks that are pre-extant in the data "file"; it would be nice to bring that up a level so that entire files could be discarded if their contents were known (this could be trivially done by adding the original hash to the root block since the original hash of all files with the same root block would be identical).
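One way to picture the "bring that up a level" idea is a lookup from a file's plain content hash to its root fingerprint, consulted before any chaining happens. The sketch below uses a side table rather than storing the original hash inside the root block as suggested above, so it is a variation on the idea and not the proposed design; store_file, known_roots and the chain callable are all hypothetical.

```python
import hashlib


def store_file(data, known_roots, chain):
    """Return the root fingerprint for `data`, chaining only if it is new.

    known_roots: {sha1 of whole file -> root fingerprint} (hypothetical side table)
    chain:       callable that builds the block tree and returns its root
    """
    whole = hashlib.sha1(data).digest()
    root = known_roots.get(whole)
    if root is None:
        # First time these contents are seen: do the expensive chaining once.
        root = chain(data)
        known_roots[whole] = root
    return root
```

Hashing the whole file is still a full pass over the data, but it is a single sequential pass with no block splitting or tree building.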
Moral objections to medical procedures
Medical decisions should not be coloured by a doctor's personal beliefs
David Toub blogs about "compulsory abortion training" for OB/GYNs. He mentions that at Yale it is not compulsory for students to learn abortion procedures although they did have to learn how to handle post-abortion patients.
It sickens me to think that there are doctors in the world that may be (by hook or by crook) imposing their own moral views on their patients.
The idea that I would have to change doctors because mine refuses to perform a certain procedure on moral grounds seems to me like a direct violation of the Hippocratic Oath which in theory involves providing the best care possible for a sick patient. It must be said that the original oath contained a clause that specifically forbade abortion; but that clause is absent from most modern versions of the oath.
Should a Jewish doctor be allowed to refuse to perform a porcine heart implant on the grounds that they do not believe that a pig's heart should be in a human body? Should any doctor refuse to perform a blood transfusion because they believe that their blood is their own and that to accept another's blood is to import impurities?
Science and scientists have always been on the razor edge of this knife: can I refuse to employ my knowledge because of a known outcome that I personally find reprehensible?
In many cases the answer is "yes". Engineers can decide to not build bridges that they believe will cause environmental damage, atomic physicists can refuse to build a better bomb, etc. But doctors, even more so than other scientists, are in the tricky business of directly saving lives.
To say that a doctor should be able to pick and choose which conditions he treats would be like saying that a policeman could pick and choose whom he protects; or like saying that a fireman could decide to not put out a fire based on what business took place in the building; or like a teacher being allowed to pick and choose which of his students he teaches.
When your profession places you in a position of responsibility for the well-being of somebody else I think that your decisions for the treatment of that person (or that person's property in the case of the fireman) should be made based on the person's system of beliefs and not on your own. It's one thing for a doctor to advise against a certain operation (although the morality of the thing becomes very grey indeed); it's another entirely to relegate the patient to a lesser (or even just another) doctor just to satisfy one's own sense of moral values.
When somebody depends on you, putting them at risk to satisfy your own subjective reality can hardly be said to be the moral high ground.
Tuesday, May 17, 2005
Serendipity in RSS-land
In Jon Aquino's post Tip: Random-number generator for Firefox, he lists some of the things that he uses the random number for. He makes a choice between emailing a random contact or reading a page out of any one of a number of different sources. It would be kind of cool to have an RSS feed with a random page from a number of sources that would provide a little serendipity.
On a slightly related note, sometimes it's nice to let a couple of days go by without reading a given feed. It lets the juicy bits pile up so you can read them all at once.
Wednesday, May 11, 2005
Bitten by indentation
Python should raise a more appropriate exception when the indentation in a class definition is not consistent.
Python allows you to use either spaces or tabs to indent your code; but you can't use both. You can't mix and match. Now this is not a big deal (although it does sort of violate the "there is one way to do things" rule that Python seems to adhere to fairly consistently) but it does mean that if you cut and paste some code from another file you could be lining yourself up for problems.
The thing is that Python, for some reason, doesn't complain when your indentation is not consistent within the definition of a class; instead, it gives an error that has nothing to do with the actual problem. For example, the following code:
1 class t:
2 [tab]def f():
3 [tab][tab]x = 5
4 [space][space]print x
5
6 [tab]def x():
7 [tab][tab]pass
8
9 t()
This will raise a NameError and say that 'x' is not defined on line 4. I don't understand why this doesn't raise an IndentationError the same way it would outside the class definition.
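One way to catch this kind of problem before Python does is to scan the leading whitespace yourself. The sketch below is only an illustration and not anything built into Python; the name find_mixed_indentation is made up for this example. It flags lines whose indentation mixes tabs and spaces, and lines that indent with a different character than the first indented line did.

```python
def find_mixed_indentation(source):
    """Return 1-based line numbers whose indentation looks inconsistent.

    A line is flagged when its leading whitespace mixes tabs and spaces,
    or when it indents with a different character than the first
    indented line of the source did.
    """
    suspicious = []
    first_char = None  # indentation character used by the first indented line
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines carry no meaningful indentation
        indent = line[: len(line) - len(stripped)]
        if not indent:
            continue  # top-level line, nothing to compare
        if first_char is None:
            first_char = indent[0]
        if ("\t" in indent and " " in indent) or indent[0] != first_char:
            suspicious.append(number)
    return suspicious


# Same shape as the example above: tabs everywhere except line 4.
sample = "class t:\n\tdef f():\n\t\tx = 5\n  print(x)\n"
print(find_mixed_indentation(sample))
```

Running something like this over a file after a cut and paste points at the offending line directly, whatever error the interpreter itself ends up reporting.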
Tuesday, May 10, 2005
RSS Reading in Firefox
I just installed the Firefox Bookmark Synchronizer. I have had Sage installed for a while but now I can actually have my feeds synchronized between home and the office. Next on the list is getting my browser running off my USB stick so I can literally take my browser anywhere I go.
Monday, May 09, 2005
The Whisper Campaign
Dave writes that he has had a change in philosophy. I'm not sure this is really a change so much as a refinement of previous ideas: there are certain types of praise that are more suited to being given in private.
I think the key factor may be intimacy; if I have intimate praise for somebody, sharing it publicly may work against me more often than not. Public praise should focus on shared goals whereas private praise should focus on personal and/or intimate aspects of the relationship. When you tell somebody that they are "amazing" you are referring to some aspect of your relationship with that person that causes you to be amazed. This is not something that needs to be bandied about or else it loses its subtlety.
In social gatherings my girlfriend and I often spend a lot of time apart, flitting from one group of people to the next. We always manage to catch the other's eye, however, and deliver a small, subtle gesture of caring even across a crowded room.
I think private praise gives a feeling of belonging and closeness that is not achieved via public praise.
Thursday, April 28, 2005
Exception-based Switch-Case and the flavour of Python
There's been a lot of discussion around switch-case flow-control in Python lately. With this recipe, Zoran Isailovski weighs in with what I find is a nice, clean, pythonic way of handling switch statements. He references another recipe by Brian Beck that handles the switch with a for loop (very cool).
I notice a comment on Zoran's recipe that references the C2 wiki's page on not using exceptions for flow control. Now I spend most of my time switching between Java and Python and it's true that I would never dream of using exceptions to control the flow in my Java application, but it's different in Python. In standard Python, iterators raise an exception to stop iteration and this is a pattern that is fairly well-recognized in the community.
All of which just points to the differences in flavour between Python and Java.
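To make the iterator point concrete, here is a minimal sketch of roughly what a for loop does under the hood. It is written for Python 3, where the method is spelled __next__ (it was next when this post was written); the Countdown class and the drain helper are names invented for the illustration.

```python
class Countdown:
    """Iterator that yields n, n-1, ..., 1 and then signals exhaustion."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            # The exception is the normal "I'm done" signal, not an error.
            raise StopIteration
        value = self.n
        self.n -= 1
        return value


def drain(iterator):
    """Spell out by hand roughly what a for loop does with an iterator."""
    items = []
    while True:
        try:
            items.append(next(iterator))
        except StopIteration:
            break  # flow control by exception, the same signal a for loop uses
    return items


print(drain(Countdown(3)))
```

The exception here is part of the protocol rather than a sign that something went wrong, which is why the "no exceptions for flow control" rule sits less comfortably in Python than in Java.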
Lexical Analysis, Python-style
I've always wanted to be able to do lexical analysis Python-style and I've always had the feeling that it should be done with regexps instead of character by character. Now Jason Diamond and Fredrik Lundh have presented recipes for doing just that. I'll have to try out these techniques and see if a re-write of the parser for Founder makes things cleaner.
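As a rough illustration of the regexp-driven approach (this is not the code from either recipe, and the token names and the tiny grammar are assumptions made up for the example), a tokenizer can be built from a single alternation of named groups:

```python
import re

# Hypothetical token set for a tiny arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),  # catch-all so bad input is reported, not silently skipped
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs, skipping whitespace."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(
                "unexpected character %r at %d" % (match.group(), match.start())
            )
        yield kind, match.group()


print(list(tokenize("x = 40 + 2")))
```

The regexp engine does the character-by-character work, so the Python code only has to decide what to do with each token.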
Wednesday, April 27, 2005
Embedding data in HTML documents with the "data" URL scheme
The "data" URL scheme allows you to embed data directly in an HTML document. By encoding the data in Base64 you can even embed binary data straight in the page. The following example illustrates this by embedding the source of an image directly in the image tag itself.
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAFAAAAAMCAIAA
AB6P612AAAAmklEQVR42uWWSw6AIAxE7VXg/keqV8FPE12AdUYgamDREFomfVBIJaU0jTSEA
xbZbI8zaqXs6OwuUdVbkRDjamfVY4IEUzHILmQ4OubygHPOzwIjGV4CO5y4bkNgSpAApjiR/
CgGZwVxEcD2ulnO/06gT6uylp5dbE1W5SoIoQFwZaL9gAvkgwCfygb8cqeV9wn9ehuTH621X
AArWx8EprTSmAAAAABJRU5ErkJggg==">
The code above produces the following graphic
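If you want to generate such a URL rather than paste one in, the encoding step is short. The following is a minimal sketch in Python 3; the function names make_data_url and img_tag are made up for this example and the default MIME types are assumptions.

```python
import base64


def make_data_url(data, mime_type="application/octet-stream"):
    """Encode raw bytes as a Base64 "data" URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)


def img_tag(data, mime_type="image/png"):
    """Wrap the data URL in an image tag, as in the example above."""
    return '<img src="%s">' % make_data_url(data, mime_type)


print(make_data_url(b"hello", "text/plain"))
```

To embed a real image you would read the file in binary mode and pass its bytes to img_tag.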
Monday, April 18, 2005
How well do you know Python -- Name mangling
In his article Spyced: How well do you know Python, part 1, Jonathan Ellis points out the danger of using exec. This principle shows up in inheritance too if you're not careful (or you come from Java):
>>> class A:
...     def __init__(self):
...         self.__x = 5
...     def getx(self):
...         return self.__x
...
>>> class B(A):
...     def getx(self):
...         return self.__x + 5
...
>>> b = B()
>>> b.getx()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in getx
AttributeError: B instance has no attribute '_B__x'
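What happened: inside a class body, any identifier of the form __x is rewritten to _ClassName__x, so A stores _A__x while B.getx goes looking for _B__x. A small sketch for Python 3 (the class names mirror the session above; getx_fixed is a made-up name) shows the mangled attribute and one way around the problem:

```python
class A:
    def __init__(self):
        self.__x = 5          # stored on the instance as _A__x

    def getx(self):
        return self.__x       # compiled as self._A__x


class B(A):
    def getx(self):
        return self.__x + 5   # compiled as self._B__x, which was never set

    def getx_fixed(self):
        # Go through the parent's accessor instead of the mangled name.
        return A.getx(self) + 5


b = B()
print(sorted(vars(b)))        # the only instance attribute is the mangled one
print(b.getx_fixed())
try:
    b.getx()
except AttributeError as error:
    print("AttributeError:", error)
```

If subclasses are supposed to touch the attribute, a single leading underscore (self._x) avoids the mangling altogether.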
Friday, April 15, 2005
Duck Typing dilemmas - A strawman for "pure" Python interfaces
The Duck Typing debate is less about type-checking and more about expectations
Cedric Beust blogs about The Perils of Duck Typing and he's not alone. Many people have been talking about dynamic and static typing for quite a while now and it's very close to being an Emacs vs. vi argument.
I don't think the problem here is really the absence of static typing but more the absence of "good" design or reasonable expectation management. Let me give an example (so you can make fun of me if I'm wrong).
In Python, a number of methods and functions take "file-like objects" as parameters. Now some of these methods actually want all the methods that files implement and others are content with just having read and readline. This interface is obviously a little hard to code to since in order to implement it you need to read the specific requirements of the method/function you're calling.
Now the only thing wrong with this whole setup is that you really have no idea what methods to implement and, as Cedric points out, you may omit the implementation of a method that is not called directly from the function you're calling. So here's my strawman proposal to those naysayers who demand more static typing: just implement an empty base class that throws NotImplementedError from your methods. Note that in Python this class wouldn't even need to be bundled with the base distribution since Duck Typing works whether you like it or not.
In the function below it doesn't matter what type the fileobject parameter is. It will work fine with a file from the open or file functions as well as with any other iterable object.
def line_count(fileobject):
    count = 0
    for i in fileobject:
        count += 1
    return count
Now let's assume that we want to make it explicit that you need to pass an iterable object. Most pythonistas will tell you to re-write the function like this.
def line_count(iterable_object):
    count = 0
    for i in iterable_object:
        count += 1
    return count
Now that's nice and clear. We can tell (as long as we have two brain cells to rub together) from the name of the argument what the type should be. Of course then the whole discussion takes a nasty swerve towards the "Hungarian notation debate" side of things.
Putting the notion of encoding the type in the argument name aside we can still provide a nice way to show what type is expected. All we have to do is write our function like this.
class Iterable(object):
    # Python 2 iterator protocol; Python 3 spells this method __next__.
    def next(self):
        raise NotImplementedError
    def __iter__(self):
        return self

def line_count(itobj):
    """Pass me an instance of Iterable, please."""
    count = 0
    for i in itobj:
        count += 1
    return count
Now, because Python supports multiple inheritance, we can just use the Iterable "interface" as a mixin. It doesn't really matter whether we use it or not because the methods that it defines are clearly described and listed in the class definition. In addition, the function will still accept non-Iterable objects that adhere to the "Iterator types" description in the Python Library Reference. It's just that designing like this makes it simpler on the client to use the library you've developed. Note also that any class that implements the "Iterator types" set of methods could also just mix in the Iterable class for the hell of it.
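Here is a minimal sketch of the mixin idea, written for Python 3 (so the method is __next__ rather than next) and raising the built-in NotImplementedError; the Lines class is a hypothetical client invented for the example.

```python
class Iterable(object):
    """Documentation-only "interface": lists what an iterator must provide."""

    def __next__(self):
        raise NotImplementedError

    def __iter__(self):
        return self


class Lines(Iterable):
    """Hypothetical client class that mixes in the interface."""

    def __init__(self, text):
        self._lines = text.splitlines()
        self._position = 0

    def __next__(self):
        if self._position >= len(self._lines):
            raise StopIteration
        line = self._lines[self._position]
        self._position += 1
        return line


def line_count(itobj):
    """Pass me an instance of Iterable, please."""
    count = 0
    for _ in itobj:
        count += 1
    return count


print(line_count(Lines("one\ntwo\nthree")))   # an Iterable subclass
print(line_count(["a", "b"]))                 # a plain list still works
```

A subclass that forgets to override __next__ fails with NotImplementedError on the first iteration, which at least names the missing method instead of leaving the client to guess.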
I know that this seems like a lot of flailing for nothing (since there is still no type-checking until runtime at which point it still happens using Duck Typing) but the point here is that I think the Duck Typing argument is a design argument and not a dynamic vs. static typing argument. You can implement Java-like interfaces in a dynamic language, they just won't be checked at compile-time; but maybe that's OK. The real goal is to: a) make sure that the implementor knows all the methods they need to support and b) make sure that you are not passing the wrong instance by mistake to your function. I think that by adhering to a few simple design guidelines both of those goals are pretty easy to accomplish.
NTP on Windows - Keeping up with the time
You can easily keep your local clock synchronized on recent versions of Windows
The NTP (Network Time Protocol) allows you to synchronize your local time to the time maintained by a set of servers. All those servers are fed (eventually, somewhere up the chain) by a clock of incredible precision (no, not Big Ben). UNIX users have had this feature since its creation and now it's time (no pun intended) for Windows users to have their day in the sun.
Just go to your start menu, select "Run..." and type "cmd" into the little text box. Click OK and type this line into the resulting command line:
net time /setsntp:pool.ntp.org
Your clock should now remain synchronized and up-to-date until the end of time.
Friday, April 08, 2005
Version 0.1.0 of WODFS Released
We finally released a version of the WOD. It still requires a tremendous amount of effort for others to get it to run (you need to install a bunch of dependencies) but we wanted to "release early, release often". The next release should contain a couple of improvements such as Windows support, limited WebDAV support, XMLRPC support and a re-structuring of some of the filesystem level code.
Wednesday, April 06, 2005
Podcasting meets blogging
Jon Aquino has started playing with an automatic blog-to-podcast script that looks pretty cool.
Tuesday, April 05, 2005
One box where two will do
By adding a keyword search for Google in Firefox you can remove the need for the extra Google search box
All you have to do is go to http://www.google.com and right-click on the form. Select "Add a keyword for this Search" and fill in the boxes in the popup. You can now use the keyword you just created in the address bar instead of using the right-hand-side Google search box.
Saturday, April 02, 2005
Reading between the lines
Reality can never be communicated, only experienced.
Words can never cause another person to experience anything. Only the receiver can perform the act of experiencing. We speak in order to draw smaller and smaller circles around the experience we are trying to provoke in the other person. When the circles are as small as we can go, we have a reasonable assurance that the other person has experienced the same thing that we have.
Writing (or saying) something using a lot of words allows us to refine the various possible meanings of the communication to a point where we not only have been able to express ourselves (i.e. we feel like we have accurately represented the experience) but also have been understood (i.e. we feel that the recipient has accurately reproduced the experience for themselves). This is easy. The way to do it is to start talking (or writing) and not stop until the other person is able to communicate back that they have experienced the same thing.
When we read religious or spiritual texts we often are confronted with the fact that the text is much shorter than what we really need in order to be able to reproduce the author's experience. This is simply because it would be impossible to expect an author to be able to write a text that could be understood by anybody at any time in any situation.
Writers of these texts therefore resort to a form of compression that allows them to accomplish two goals.
The first goal is to express themselves in a form that can be understood. It is not important how long it takes for the understanding to occur and indeed many of the authors (of these texts) that are read today have been dead for some time. The notion that it is irrelevant how much time and effort are required to understand the text is an interesting one since it underlines the fact that it is the understanding of the communication that is the ultimate goal.
The second goal is to allow the communication of the concept to be passed on even by those who do not understand it. In order for this to be possible it is important for the concept to be expressed simply. If the simplicity of the expression is great enough then even the greatest fools of the earth will propagate the message until such time as someone with the capacity to understand it can hear it.
So we have two conflicting goals: understanding and simplicity. How then are we to formulate a concept that is inherently complicated and deep in a way that is simple? Recall that simple in this context refers to the complexity of the message, not the complexity of the concept.
The Tao Te Ching expresses complicated concepts in simple verse. The Bible expresses the truth in the words of Jesus via parables. A high school student could learn and recite each of these stories with little difficulty. However, without the key to what is locked inside, they remain opaque and confusing.
The key is to "explode" the concepts hidden within each story; to tease each sentence apart and to push our understanding of each aspect of the work to its limit. We need to read between the lines. Unfortunately, there are not very many lines so there appears to be a huge number of possible interpretations for each of these esoteric texts. The trick then is to draw as many lines as we can so that the space between them becomes smaller and smaller until, one day, perhaps, we will experience reality.