One Who Rides a Tiger Will Find It Hard To Dismount


Or, how Apple will revolutionize blogging (maybe).

Warning: Heavier doses than normal of geekery ahead.

When Safari was first introduced waaaaaaaaaaaay back in January of 2003, one of the biggest questions that was asked (other than “why do we need another browser?”) was “Why is Safari built on top of KHTML and why isn’t it based on Mozilla?”

The answer given at the time was that the KHTML engine, while perhaps somewhat immature compared to the Mozilla engine, was just as fast, but more importantly, much smaller and easier to work with.

This was true as far as it went, but it left out why being smaller and relatively cruft-free was really important: Safari’s rendering engine, called WebCore, was eventually going to be rolled into Mac OS X itself. In fact, you can actually consider Safari, the application, to simply be a wrapper around WebCore, an OS-level HTML rendering engine (this is an oversimplification, but it works for our purposes).

Obviously, if you’re going to be rolling code into the OS proper, it helps if the code is as small and as cruft-free as possible. But why does Mac OS X really need an OS-level HTML renderer (other than the fact that Windows has one)? I mean, it makes some things, like improving the HTML rendering in HelpViewer and Mail.app, much simpler.

But that’s fairly trivial.

What if HTML was used in a really out-of-the-box way? Say, what if HTML (with a few minor extensions) was used as a lightweight way to build front ends for programs? That’s definitely an application that would require an OS-level HTML rendering engine.

Smells like Dashboard, doesn’t it?

Since Dashboard’s HTML extensions are part of WebCore, incorporating them (and thus virtually all of Dashboard’s functionality) into Safari (or any other application that uses WebCore) is already a fait accompli; all you have to do is make the specs for the Dashboard extensions public and all of a sudden, you’ve got Browser Wars II: Electric Boogaloo!

And who woulda thunk that all this would have happened all the way back in January of 2003, when all anyone wanted to know was “Why didn’t they use Mozilla?”

Now the past is prologue.

To anyone paying attention, it’s no secret that Apple is (perhaps belatedly) embracing blogging in a big way in OS X 10.4, aka Tiger. RSS support is being built into the next version of Safari, and Tiger Server will feature a turnkey blogging server. The blogging server is built around blojsom, a blogging system that was inspired by blosxom.

Blosxom is famous for being a full-featured yet extremely lightweight blogging system: it consists of about 150 lines of Perl code. One of the ways that it achieves such simplicity is that it uses the file system as the database. In other words, each entry is saved as a separate text file.
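To make "the file system is the database" a little more concrete, here is a minimal sketch in Python (not blosxom's actual Perl). It assumes a blosxom-like layout that I am choosing for illustration: every `.txt` file is one entry, the first line is the title, the folder name doubles as the category, and the file's modification time is the post date. The `load_entries` name and the dictionary keys are hypothetical.

```python
from pathlib import Path


def load_entries(root):
    """Return the blog entries found under `root`, newest first.

    Assumed layout (for illustration only): one .txt file per entry,
    first line = title, remaining lines = body, folder = category.
    """
    entries = []
    for path in Path(root).rglob("*.txt"):
        lines = path.read_text(encoding="utf-8").splitlines()
        entries.append({
            "title": lines[0] if lines else "",
            "body": "\n".join(lines[1:]).strip(),
            # Folder relative to the blog root; "." means top level.
            "category": path.parent.relative_to(root).as_posix(),
            # No separate date column: the file's own timestamp is used.
            "modified": path.stat().st_mtime,
        })
    return sorted(entries, key=lambda e: e["modified"], reverse=True)
```

Notice that there is no schema, no connection string and no import step: publishing a post is saving a file, and deleting a post is deleting a file.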

Blojsom picks up where Blosxom leaves off: it’s written in Java, not Perl, and it adds a number of extra (and useful) features. But, crucially, it still uses the file system as the database.

And now I want to talk about yet another feature of Tiger: Spotlight (don’t worry, this is actually going somewhere).

Spotlight is really two things: the first is a very powerful tool in the Finder for finding files, using a combination of metadata and full-content searches to return its results, and the second is a set of programming interfaces to the searching technology behind Spotlight. It’s similar to the way that Safari is both a web browser (“Safari”) and the system-level HTML rendering code (“WebCore”).

Before I lose you with too many buzzwords, let’s look at what metadata is. As the name implies, it’s data about data. It’s really simpler than it sounds. It’s information about an object, but not the object itself. Metadata associated with a generic file might include the date it was created, the kind of file it is (application or data), what applications can open it, the icon that should be used to represent the file, and so on. But metadata can also apply to things other than files: for example, the metadata for an email would include things like the date, the sender, the recipient, the subject, if it’s been read, if it’s been replied to, and so on.
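If "data about data" still sounds abstract, here is a small Python sketch that collects a few pieces of metadata for a file without ever opening it. It only uses what the operating system already tracks for every file; the `file_metadata` name and the choice of fields are mine, and this is not how Spotlight itself gathers metadata.

```python
import os
from datetime import datetime, timezone


def file_metadata(path):
    """Describe a file without reading its contents."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        # Crude stand-in for "kind of file": just the extension.
        "kind": os.path.splitext(path)[1].lstrip(".") or "unknown",
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }
```

Everything in that dictionary is about the file; none of it is the file.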

It’s worth noting here that much of the metadata for an email is the same as the metadata for a blog post.

One of the interesting things about Spotlight is that it’s extremely focused on searching and finding files. In fact, in order to take full advantage of the new technology, the next version of Mail.app will change the way it stores mail messages.

At the moment, Mail.app saves email using mbox format: each folder is saved as a file; individual messages are aggregated together inside each file. The version of Mail.app that will ship with 10.4 changes that: now, each email will be saved as a separate file. In other words, they’re letting the file system be the database.
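For the curious, the difference between the two layouts can be sketched with Python's standard `mailbox` module: read an mbox file, where many messages are aggregated into one file, and write each message out as its own file. The numbered `.eml` file names and the `split_mbox` function are assumptions for illustration; I have no idea yet what naming or on-disk format Mail.app will actually use.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write every message in an mbox file to its own numbered file.

    Returns the list of files written, in mailbox order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    written = []
    try:
        for number, message in enumerate(box, start=1):
            target = out / f"{number}.eml"
            target.write_bytes(message.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written
```

Once the messages are separate files, anything that indexes files, such as a system-wide search tool, can treat each email as its own searchable item.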

Hm. Sounds familiar, doesn’t it?

I think that one of the key reasons Apple picked blojsom as the turnkey blogging server for Tiger Server is precisely that the file-oriented nature of its storage mechanism fits perfectly with the way that Spotlight is designed.

So now the question is how exactly will Spotlight be integrated into the blogging server? The obvious answer is that it can be used as an ultra-fast site search function. But it could also easily automate finding similarly related posts—kind of an automated category-building feature. Something like that might go a long way towards solving some of the problems I talk about here.
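To give a feel for what "finding similarly related posts" could mean when every post is a file, here is a deliberately crude Python sketch. It does not use Spotlight or any Apple API; it swaps in plain word overlap (Jaccard similarity) between text files as a stand-in, and the `related_posts` and `words` names, the `.txt` layout and the four-letter cutoff are all assumptions of mine.

```python
import re
from pathlib import Path


def words(text):
    """Lowercased set of words of four or more letters in `text`."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def related_posts(post, root, limit=3):
    """Return names of the posts under `root` that share the most words with `post`."""
    post = Path(post)
    target = words(post.read_text(encoding="utf-8"))
    scored = []
    for other in Path(root).rglob("*.txt"):
        if other.resolve() == post.resolve():
            continue  # do not compare a post with itself
        candidate = words(other.read_text(encoding="utf-8"))
        union = target | candidate
        # Jaccard similarity: shared words divided by all words used.
        score = len(target & candidate) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, other.name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

A real index would be far faster and smarter than rereading every file on each request, which is exactly why having that machinery built into the OS is interesting.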

Other uses? I’m not entirely sure, to be honest. But the combination of such a powerful search tool and blogging is sure to generate new and innovative applications—the sort of thing that’ll be blindingly obvious, but only after someone invents it. Anyone have any ideas?

4 Comments

Hmmm. Interesting read, although the only thing that concerns me with concepts like "each mail is its own file" is that if not managed correctly, that sort of thing gets extremely slow. I don't know how the Mac OS X filesystem holds up, but most filesystems start to suck once you get lots of files in every directory. Just last week I clocked in with 12000 emails in my inbox.

Now, there are obvious technical solutions that allow you to get past some of that, but sometimes those simple things just slip by the best of us.

(I'm not denying my 12000 emails was unusual, but actually not /as/ unusual as it sounds - I've talked to a lot of people with thousands of emails in their inbox, mostly work inboxes. I'd imagine it wouldn't be an issue for most personal inboxes.)

I actually played with blosxom, and I like it - very simple to use. And there are a huge number of plugins that add all sorts of functionality. Good stuff.

I think fast, reliable searching would definitely be a big plus, but I'm unconvinced about the idea of automatic keywords or categories - mainly because unless you put a lot of effort into maintaining it, it probably wouldn't end up all that useful, as it wouldn't be intelligent enough to make meaningful categories. There might be some interesting and amusing ones, but I'd imagine there wouldn't be a lot that would necessarily make navigation easier. Maybe, if there was some /really/ impressive Bayesian learning going on, and if it was heavily maintained by a human (i.e. building dictionaries of words to ignore and/or emphasize - some of which would be supplemented by the Bayesian stuff), it could work. But at some point I'd imagine it'd be just as easy to manually categorize things.

But hey, I'm a natural skeptic. Maybe I'm wrong. :)

(One thing I will say - sometimes the most dramatic results come from relatively subtle changes in interfaces that lead to a different way of approaching the problem rather than just giving you slightly different tools to do the same old same old with in a slightly more efficient way. And this could be an example of it. But I'm skeptical. ;) )

Actually it sounds like they are going to use the mh_mailbox format. But they will have to do some major adjustments to mail.app to make it work, because right now when I try to access my mh_mailbox archives via imap, mail.app makes a mess out of them. Thankfully Apple has upped the processing on the imap side going from 10.2 to 10.3, otherwise I'd stick to Mulberry, except for handling hotmail stuff.

Very interesting. Of course my brain stopped processing anything after the third word, "Apple."

Believe it or not:

NOT EVERYONE ON THE PLANET HAS A SCREEN RESOLUTION OF 1024 x 768 OR BIGGER

Suggest you at least add a horizontal scrollbar for us neanderthals.
