From the hip: May 2005

Monday, May 30, 2005

AJAX encapsulation and some gooey musings

Jon Udell blogs about AJAX encapsulation with TIBCO General Interface. This is one seriously cool product and I'm now officially itching to get my hands on it.

On the other hand it makes me wonder why we persist in trying to rebuild the functionality of the OS (windowing systems and so on) inside the browser. More and more it seems to me that we need to have a complete re-write of the OS windowing system to take advantage of all the things we have learned about GUIs in the past little while.

Imagine if the "windowing" software for the OS was actually just a giant browser. You'd get automatic "Active Desktop". You'd be able to open frames and dialogs and file choosers and all the other crunchy goodness that we have painstakingly baked into the browser. You'd also stop worrying about whether your application's super-duper feature was going to work on MacOS, Windows and Linux since all three could run this "super-browser" as the base object in the windowing system.

There's probably a really good reason (other than the phenomenal effort required) to not do this... but I can't think of it.

Saturday, May 28, 2005

Interviewed by Richard Jones

I asked Richard Jones to interview me.

I'm sure that my answers (found below) are a bit longer than most people's attention spans. To alleviate my guilt, please follow these instructions and further the meme:

Leave me a comment saying, "Interview me."

I will respond by asking you five questions. I get to pick the questions.

You will update your weblog with the answers to the questions.

You will include this explanation and an offer to interview someone else in the same post.

When others comment asking to be interviewed, you will ask them five questions.

1. Why do you write a weblog?

I write to try to capture my thoughts. I suppose that I don't really care about other people reading them but on some level I like the fact that they do. There is an ego trip associated with putting your thoughts out there and having other people read them. At the same time, I get to use readers as a sounding board: anybody who disagrees with what I write is welcome to comment and hopefully give me some insight that I hadn't previously seen.

I decided a while ago that I didn't want to have the kind of blog that goes on and on about what flavour of ice cream I ate this morning or which of my friends I will be seeing tonight. I think that in the future I will come back and read my thoughts from this year and last year and try to see what changes took place in my head and when (and maybe even why). Being able to see what I was thinking when certain events took place is kind of cool. I don't want to look back and see a list of what I was doing - I want to see what I was thinking about.

Finally, I blog because I'm fundamentally a yapper. I have an answer to every question (even though it might not be a great answer) and I sometimes feel the need to get that answer off my chest even if nobody is listening. That's why I sometimes blog about morality or society. My friends laugh at me (kindly) because I always have something to say about everything so I figure that some of that should end up here.

2. What's the coolest online community enabling system right now?

I think that wikis hold the #1 spot for me. They allow users to collaborate, chat and create together which is something that most other systems lack for one reason or another. A lot of features from other systems could be folded into wikis but even the basic (simplest) wiki system is far more powerful than most other systems.

Blogs are cool but they don't really create tight communities. They create clouds of people that read each other's blogs and leave comments for each other but they remain fundamentally un-collaborative publishing mechanisms - a way for me to tell you what I think/feel/do.

IRC has more immediate interaction but loses completely on the asynchronous level. I know, I know, there are bots that will tell you when the last time foxy6969 was in the channel but it's not the same. Wikis are always now.

Usenet has asynchronicity but is more like mailing lists. In fact, let me just lump mailing lists in here as well (even though there are access restrictions to most mailing lists that don't exist for most newsgroups) and say that both allow you to post articles (like a blog) on a given topic. I would say that Usenet fosters more community spirit than blogs but still loses to wikis.

Online games (like MMORPGs and so on) also foster great community spirit but are very limited in their focus. Even though most wikis have a theme, almost all have a very heterogeneous collection of pages.

Tagging services like Flickr and del.icio.us don't create communities. They expose existing interest groups. You and I may tag the same resource the same way but have nothing else to say to each other. Also, tagging is not conversation (no matter what the pundits say) and is an incredibly low-bandwidth medium of communication. The lack of consistent commenting means that tagging services are more like personality quizzes than they are like a community blackboard.

So wikis is my final answer.

3. What on Earth is WOD?

The WODFS is a Write-Once Distributed FileSystem. I wrote it in the winter of last year with my friend Justin and we put a lot of work into our "baby". The system is based on the design of archival storage systems and distributed filesystems and its main goal is to allow users to simulate an infinite, indestructible hard-drive.

The WOD allows you to keep saving data forever to a network of computers that store only the unique parts of your data. This means that each "chunk" of data is only ever stored once (conceptually). Each node in the network caches the most frequently used blocks so that you can achieve a disk access speed that is reasonable.

What's really cool is that since the WOD is write-once, you never actually "lose" any data. Even deleted data can be easily retrieved.

The system includes a set of extensions to the filesystem that allow you to create special folders and access files that contain data from the past. For example, if you have a folder called myprojects where you put all your code, you can open (either by creating it in Windows or by cd'ing to it in UNIX) a folder called myprojects@2004-01-01 and you will see that folder's contents exactly as they were on the first of January, 2004. This works with any folder (even the root) in your filesystem. Individual files track every single saved modification and allow you to browse through the history of the file using dates (as explained above) or by appending @ and a negative number representing the number of steps backwards to take.
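A minimal sketch of how those folder and file names might be interpreted, assuming the two suffix forms described above (a date, or a negative step count). The function names, the tuple layout and the shape of the `history` list are hypothetical, not the WOD's actual code:

```python
import re
from datetime import date

# Assumed suffix forms: "<name>@<YYYY-MM-DD>" or "<name>@-<steps>".
_VERSIONED = re.compile(
    r"^(?P<name>.+)@(?:(?P<date>\d{4}-\d{2}-\d{2})|-(?P<steps>\d+))$"
)


def parse_versioned_name(entry):
    """Split a path component into (name, as_of_date, steps_back)."""
    match = _VERSIONED.match(entry)
    if match is None:
        return entry, None, 0  # plain name: the current version
    if match.group("date"):
        return match.group("name"), date.fromisoformat(match.group("date")), 0
    return match.group("name"), None, int(match.group("steps"))


def select_version(history, as_of=None, steps_back=0):
    """Pick a fingerprint from history, a list of (saved_on, fingerprint)
    pairs sorted oldest first. Returns None if no version qualifies."""
    if as_of is not None:
        candidates = [fp for saved_on, fp in history if saved_on <= as_of]
        return candidates[-1] if candidates else None
    index = len(history) - 1 - steps_back
    return history[index][1] if index >= 0 else None
```

With this, `parse_versioned_name("myprojects@2004-01-01")` yields the folder name plus a date to look up, and `parse_versioned_name("notes.txt@-3")` asks for the version three saves back.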

I'm working hard right now to finish the code to hook the system into the Windows explorer but right now things run only on UNIX. Performance is not as good as we would like but the whole thing is written in un-optimized Python so that is to be expected.

4. What's your favourite Python hack?

I think that right now, my favourite hack has to be the soft reference module that we developed for the WOD. We hooked into the GC routines of the Python runtime to manage a set of objects that gets collected only when the memory they use is needed.
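The real module hooks into the GC and is not reproduced here. As a rough approximation of the same idea using only the standard library, the sketch below keeps recently used objects strongly referenced up to a byte budget and lets everything else survive only as a weak reference. `SoftCache`, `Block` and the budget parameter are all hypothetical names, and a fixed budget is a stand-in for "when the memory is needed":

```python
import weakref
from collections import OrderedDict


class Block:
    """Hypothetical cached object (plain bytes cannot be weakly referenced)."""

    def __init__(self, data):
        self.data = data


class SoftCache:
    """Strong references for recent entries, weak references for the rest."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self._strong = OrderedDict()                # key -> Block, oldest first
        self._weak = weakref.WeakValueDictionary()  # key -> Block, may vanish

    def put(self, key, block):
        old = self._strong.pop(key, None)
        if old is not None:
            self.used -= len(old.data)
        self._weak[key] = block
        self._touch(key, block)

    def get(self, key):
        block = self._weak.get(key)  # None once the object was collected
        if block is not None:
            self._touch(key, block)
        return block

    def _touch(self, key, block):
        if key in self._strong:
            self._strong.move_to_end(key)
        else:
            self._strong[key] = block
            self.used += len(block.data)
        # Over budget: drop the oldest strong references. The objects
        # live on only while something else still points at them.
        while self.used > self.budget and self._strong:
            _, evicted = self._strong.popitem(last=False)
            self.used -= len(evicted.data)
```

An evicted entry can still be returned by `get` if some other part of the program is holding on to it, which is the behaviour that makes soft references attractive for a block cache.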

Close runners-up include hacks that involve meta-classes or decorators.

I'm not a big fan of cramming as much code as possible into one line. I like to see elegance and consistency. By consistency I mean that even hacks should be done in the spirit of the language used.

I've spent the last years coding in Java and so I really like hacks that show flexible (sometimes even functional) capabilities of languages. I like to see languages pushed to do things that they were not necessarily meant to do in clean and cogent ways.

I've always been a big fan of LISP and Scheme so I like to see hacks that modify language structure or build "small languages" or that modify a set of entities (objects or functions or whatever) transparently and orthogonally. That's why I like playing with decorators and meta-classes.

5. Blogs vs. Usenet, fight to the death. Who wins?

Blogs win. Honestly I think that blogs and Usenet are pretty similar and if bloggers had some more tools (other than Trackback and comments) to allow a threaded style of linking between their posts (like metadata that allowed a tool to generate lists of related posts) then blogs would be a lot like Usenet but without the crappy flamefests and trolling and religious wars.

Blogs mean I get to choose whose writings I read. I like that choice. There's no spam and in general most blogs are consistent enough that I know after a little while whether or not to keep reading the blog.

Usenet can go any way. There are often spam messages posted to newsgroups. There are trolls who spend their time wasting other people's lives. Also, newsgroup organization is more centralized than blog management (says the guy who blogs on blogspot.com).

All in all, blogs are more flexible and allow more personal choice and freedom. They get my leftist vote.

Thursday, May 26, 2005

Tag maintenance

Clay Shirky blogs about the Dynamic growth of Tag Clouds. He put together a script that showed how the cloud of tags associated to a URL on del.icio.us grows over time.

An interesting aspect (mentioned later in his post) is that some of the top tags for a resource only percolate to the top later in the tagging process. The example he uses is the tag "ajax" being applied to the "original Adaptive Path article". The "ajax" tag wasn't used until after more than 1/3 of taggers had already tagged the resource.

This raises two questions: how are users who are unaware of a certain term ever going to use that term as a tag, and how will that term ever be added to the list of tags used by those who have already tagged the resource?

To be clear: I tagged the original article. I did not however use the tag "ajax" since at the time the term meant nothing to me. It had not yet caught on as a global term for the practice of using JavaScript to communicate with a server behind the scenes. Today, I might very well attempt to find that article using the "ajax" tag. Of course, this wouldn't work since I never used that tag. It would be nice if I could search my tagged resources using others' tags when I don't have the tag I'm searching for. Obviously my own tags should have priority but if I search my resources for the "ajax" tag, the resources that I have tagged and that others have tagged as "ajax" should show up in my search.
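A minimal sketch of that search, assuming the tagging service could hand over two dictionaries: my own tags per URL and everybody else's tags per URL. Nothing here is a real del.icio.us API; the function name, the data shapes and the example URLs are made up for illustration:

```python
def search_my_bookmarks(tag, my_tags, community_tags):
    """Return my bookmarked URLs matching tag, my own matches first.

    my_tags:        {url: set of tags I applied}
    community_tags: {url: set of tags other people applied}
    Only URLs that I have bookmarked are ever returned.
    """
    own = [url for url, tags in my_tags.items() if tag in tags]
    seen = set(own)
    borrowed = [
        url
        for url in my_tags
        if url not in seen and tag in community_tags.get(url, set())
    ]
    return own + borrowed


my_tags = {
    "http://example.com/ajax-essay": {"javascript", "webdev"},
    "http://example.com/other": {"ajax"},
}
community_tags = {
    "http://example.com/ajax-essay": {"ajax", "javascript"},
    "http://example.com/unseen": {"ajax"},
}
results = search_my_bookmarks("ajax", my_tags, community_tags)
```

Here the essay I never tagged as "ajax" still turns up, after the bookmark I did tag that way, and the resource I never bookmarked stays out of the results.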

This points to a general maintenance problem with tagging (and I would venture to guess with classification schemes overall): how is the metadata maintained cheaply? Currently, tagging a resource into del.icio.us takes me less than 30 seconds. It would be a real pain if I had to review my resources all the time and add new tags.

Maybe another solution (other than the one listed above) would be to produce a feed of resources that I have tagged and an interface that would guess which tags from others would be most useful to me.

It feels like we are moving beyond tagging to searching. So is the issue the technology behind tagging/classification or behind searching?

Thursday, May 19, 2005

Sparklines

I discovered a fantastic page containing a draft chapter from Edward Tufte's new book via Pensieri di un lunatico minori.

Digging deeper as is my wont I found a whole whack of cool articles about sparklines and their implementations.

http://bitworking.org/news/Sparklines_in_data_URIs_in_Python
http://www.ietf.org/rfc/rfc2397
http://www.bissantz.de/sparklines/
http://agiletesting.blogspot.com/2005/04/sparkplot-creating-sparklines-with.html
http://dealmeida.net/en/Projects/PyTextile/sparklines.html
http://www.waler.com/flint10.exe

[Update: fixed link to Pensieri di un lunatico minori]

WODFS storage architecture

An off-the-cuff description of the WOD's storage system

OK, I admit it. We shamelessly ripped off the idea for how to break up and store data from the Venti filesystem papers.

So the basic idea is that each file is broken up into a number of blocks. Each block is 1024 bytes in size (for now, more later on variable-sized blocks) and its fingerprint is the SHA-1 hash of the data in the block.

A given block can be either a data block or a pointer block. Pointer blocks have pointers to other pointer blocks or to data blocks. Each file is conceptually a tree of pointer blocks that end up pointing to leaf data blocks. Because each block has as its fingerprint the hash of the data in the block, each block can be interpreted as either a data block or a pointer block. This is not an issue to begin with but as the system fills with data, we project that this will result in considerable savings in terms of space since the data and the data structure are expressed in the same way.

In addition, each file is uniquely identified by a fingerprint which is the hash of its contents: the leaf data blocks are referenced by the pointer blocks all the way up to a root block for the file; that block's fingerprint is the file's fingerprint. Similarly, the entire filesystem is represented by a single root fingerprint. This root fingerprint is stored between sessions and is, in fact, the only item necessary to rebuild the entire system from scratch.
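To make the tree idea concrete, here is a minimal sketch that uses an in-memory dictionary as the block store. The 1024-byte block size and SHA-1 fingerprints come from the description above; everything else is an assumption: pointer blocks are taken to be plain concatenations of 20-byte fingerprints, the type flags are left out, and the tree depth is returned alongside the root so the reader knows when it has reached the leaves. The function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 1024                     # block size from the post
FP_SIZE = hashlib.sha1().digest_size  # 20 bytes per fingerprint
FANOUT = BLOCK_SIZE // FP_SIZE        # fingerprints per pointer block


def put_block(store, data):
    """Store a block under its SHA-1 fingerprint, only ever once."""
    fp = hashlib.sha1(data).digest()
    store.setdefault(fp, data)
    return fp


def chain(store, data):
    """Break data into a tree of blocks; return (root fingerprint, depth)."""
    level = [
        put_block(store, data[i:i + BLOCK_SIZE])
        for i in range(0, len(data), BLOCK_SIZE)
    ] or [put_block(store, b"")]
    depth = 0
    while len(level) > 1:
        # Each pointer block is the concatenation of its children's fingerprints.
        level = [
            put_block(store, b"".join(level[i:i + FANOUT]))
            for i in range(0, len(level), FANOUT)
        ]
        depth += 1
    return level[0], depth


def read(store, fp, depth):
    """Rebuild the original bytes from a root fingerprint."""
    block = store[fp]
    if depth == 0:
        return block
    return b"".join(
        read(store, block[i:i + FP_SIZE], depth - 1)
        for i in range(0, len(block), FP_SIZE)
    )
```

A file made of four identical 1 KB chunks ends up as just two stored blocks (one data block, one pointer block), which is the "store each chunk once" property at work.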

For the moment the type of a block is decided by a flag in the pointer block that points to it. This is a bit fragile since there is a limited amount of space for flags and thus the scheme could become unwieldy at some point in the future.

Since blocks are context-neutral they can be shared by different instances of the WOD running on the same machine.

The blocks are stored on the disk contiguously without any specific order (this sucks because retrieval is via random-access only). To speed up block retrieval, an index maps fingerprints to offsets in the data "file". This index can be rebuilt from the data "file" at any time and so it is not crucial to the functioning of the system. Theoretically the index could be stored within the system itself and loaded after the system is initialized. In this way the bootstrap routines would be a bit slower but once the index was loaded the whole system would speed up.
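A minimal sketch of that layout, assuming each record in the data "file" is a 4-byte length followed by the block's bytes (the record format is invented here, not taken from the WOD). Because the fingerprint can be recomputed from the payload, the index really is disposable:

```python
import hashlib
import io
import struct


def append_block(data_file, index, payload):
    """Append payload unless its fingerprint is already indexed."""
    fp = hashlib.sha1(payload).digest()
    if fp in index:  # block already stored: discard the write
        return fp
    data_file.seek(0, io.SEEK_END)
    index[fp] = data_file.tell()
    data_file.write(struct.pack(">I", len(payload)) + payload)
    return fp


def read_block(data_file, index, fp):
    """Fetch one block by fingerprint using its recorded offset."""
    data_file.seek(index[fp])
    (length,) = struct.unpack(">I", data_file.read(4))
    return data_file.read(length)


def rebuild_index(data_file):
    """Scan the whole data file and recompute fingerprint -> offset."""
    index = {}
    data_file.seek(0)
    while True:
        offset = data_file.tell()
        header = data_file.read(4)
        if len(header) < 4:
            return index
        (length,) = struct.unpack(">I", header)
        payload = data_file.read(length)
        index[hashlib.sha1(payload).digest()] = offset


data_file = io.BytesIO()  # stands in for the on-disk data "file"
index = {}
fingerprints = [append_block(data_file, index, p) for p in (b"one", b"two", b"one")]
```

The rebuild is a full sequential scan, so it costs time proportional to the size of the store; that is the price of treating the index as optional.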

Right now the big hit for performance is in what we call the chainer. This part of the system takes a contiguous stream of bytes and breaks it up into a tree of blocks. It would be really nice to figure out a way to derive the "chained fingerprint" (i.e. the root fingerprint for the processed file) from the initial data in order to be able to skip the chaining phase when it is not required. Already the system discards write hits for blocks that are pre-extant in the data "file"; it would be nice to bring that up a level so that entire files could be discarded if their contents were known (this could be trivially done by adding the original hash to the root block since the original hash of all files with the same root block would be identical).
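One way to picture that shortcut, as a sketch only: instead of putting the original hash inside the root block, the variation below keeps a side table from whole-file hash to root fingerprint and consults it before chaining. `store_file`, `known_roots` and the `chain` callable are hypothetical names, with `chain` standing in for the expensive chainer.

```python
import hashlib


def store_file(data, known_roots, chain):
    """Return the root fingerprint for data, chaining only when needed.

    known_roots: {SHA-1 of the whole file: root fingerprint}
    chain:       callable that breaks data into blocks and returns the root
    """
    content_hash = hashlib.sha1(data).digest()
    root = known_roots.get(content_hash)
    if root is None:
        root = chain(data)  # the expensive part, done once per content
        known_roots[content_hash] = root
    return root
```

This still reads the data once to hash it, so the saving is the block splitting and the per-block writes, not the I/O of looking at the file.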

Moral objections to medical procedures

Medical decisions should not be coloured by a doctor's personal beliefs

David Toub blogs about "compulsory abortion training" for OB/GYNs. He mentions that at Yale it is not compulsory for students to learn abortion procedures although they did have to learn how to handle post-abortion patients.

It sickens me to think that there are doctors in the world that may be (by hook or by crook) imposing their own moral views on their patients.

The idea that I would have to change doctors because mine refuses to perform a certain procedure on moral grounds seems to me like a direct violation of the Hippocratic Oath which in theory involves providing the best care possible for a sick patient. It must be said that the original oath contained a clause that specifically forbade abortion; but that clause is absent from most modern versions of the oath.

Should a Jewish doctor be allowed to refuse to perform a porcine heart implant on the grounds that they do not believe that a pig's heart should be in a human body? Should any doctor refuse to perform a blood transfusion because they believe that their blood is their own and that to accept another's blood is to import impurities?

Science and scientists have always been on the razor edge of this knife: can I refuse to employ my knowledge because of a known outcome that I personally find reprehensible?

In many cases the answer is "yes". Engineers can decide to not build bridges that they believe will cause environmental damage, atomic physicists can refuse to build a better bomb, etc. But doctors, even more so than other scientists, are in the tricky business of directly saving lives.

To say that a doctor should be able to pick and choose which conditions he treats would be like saying that a policeman could pick and choose whom he protects; or like saying that a fireman could decide to not put out a fire based on what business took place in the building; or like a teacher being allowed to pick and choose which of his students he teaches.

When your profession places you in a position of responsibility for the well-being of somebody else I think that your decisions for the treatment of that person (or that person's property in the case of the fireman) should be made based on the person's system of beliefs and not on your own. It's one thing for a doctor to advise against a certain operation (although the morality of the thing becomes very grey indeed); it's another entirely to relegate the patient to a lesser (or even just another) doctor just to satisfy one's own sense of moral values.

When somebody depends on you, putting them at risk to satisfy your own subjective reality can hardly be said to be the moral high ground.

Tuesday, May 17, 2005

Serendipity in RSS-land

In Jon Aquino's post Tip: Random-number generator for Firefox, he lists some of the things that he uses the random number for. He makes a choice between emailing a random contact or reading a page out of any one of a number of different sources. It would be kind of cool to have an RSS feed with a random page from a number of sources that would provide a little serendipity.
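The picking part is tiny. A sketch, assuming the sources have already been gathered into a dictionary of page lists (the names and data shape are hypothetical; turning the result into an actual RSS feed is left out):

```python
import random


def serendipity_item(sources, rng=random):
    """Pick a random source, then a random page from that source.

    sources: {source name: list of page URLs}
    rng:     anything with a choice() method, e.g. random.Random(seed)
    """
    candidates = [name for name in sorted(sources) if sources[name]]
    name = rng.choice(candidates)
    return name, rng.choice(sources[name])
```

Picking the source first means a small source gets the same airtime as a huge one, which seems more in the spirit of serendipity than picking uniformly over all pages.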

On a slightly related note, sometimes it's nice to let a couple of days go by without reading a given feed. It lets the juicy bits pile up so you can read them all at once.

Wednesday, May 11, 2005

Bitten by indentation

Python should raise a more appropriate exception when the indentation in a class definition is not consistent.

Python allows you to use either spaces or tabs to indent your code; but you can't use both. You can't mix and match. Now this is not a big deal (although it does sort of violate the "there is one way to do things" rule that Python seems to adhere to fairly consistently) but it does mean that if you cut and paste some code from another file you could be lining yourself up for problems.

The thing is that Python, for some reason, doesn't complain when your indentation is not consistent within the definition of a class; instead, it gives an error that has nothing to do with the actual problem. For example, the following code:


1 class t:
2 [tab]def f():
3 [tab][tab]x = 5
4 [space][space]print x
5
6 [tab]def x():
7 [tab][tab]pass
8
9 t()

will raise a NameError and say that 'x' is not defined on line 4. I don't understand why this doesn't raise an IndentationError the same way it would outside the class definition.
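In the meantime, a quick scan of the source text can point at the suspicious line before the interpreter gets confused. The sketch below is a hypothetical helper, not part of Python: it treats the first indented line as setting the style (tabs or spaces) and flags every line that deviates. It only looks at leading whitespace, so continuation lines inside brackets or multi-line strings may be flagged by mistake; the standard library's `tabnanny` module is the more thorough tool.

```python
def find_mixed_indentation(source):
    """Return 1-based numbers of lines whose indentation characters
    differ from those of the first cleanly indented line."""
    style = None
    suspects = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if not indent or not line.strip():
            continue  # unindented or blank lines carry no information
        kinds = set(indent)
        if style is None and len(kinds) == 1:
            style = kinds  # first clean line sets the expected style
        if kinds != style:
            suspects.append(number)
    return suspects


# The example from this post, with real tabs and spaces.
example = (
    "class t:\n"
    "\tdef f():\n"
    "\t\tx = 5\n"
    "  print x\n"
    "\n"
    "\tdef x():\n"
    "\t\tpass\n"
    "\n"
    "t()\n"
)
suspect_lines = find_mixed_indentation(example)
```

Run on the example, it singles out line 4, which is where the spaces crept in.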

Tuesday, May 10, 2005

RSS Reading in Firefox

I just installed the Firefox Bookmark Synchronizer. I have had Sage installed for a while but now I can actually have my feeds synchronized between home and the office. Next on the list is getting my browser running off my USB stick so I can literally take my browser anywhere I go.

Monday, May 09, 2005

The Whisper Campaign

Dave writes that he has had a change in philosophy. I'm not sure this is really a change so much as a refinement of previous ideas: there are certain types of praise that are more suited to being given in private.

I think the key factor may be intimacy; if I have intimate praise for somebody, sharing it publicly may work against me more often than not. Public praise should focus on shared goals whereas private praise should focus on personal and/or intimate aspects of the relationship. When you tell somebody that they are "amazing" you are referring to some aspect of your relationship with that person that causes you to be amazed. This is not something that needs to be bandied about or else it loses its subtlety.

In social gatherings my girlfriend and I often spend a lot of time apart, flitting from one group of people to the next. We always manage to catch the other's eye however and deliver a small, subtle gesture of caring even across a crowded room.

I think private praise gives a feeling of belonging and closeness that is not achieved via public praise.
