Wednesday, January 31, 2007

Ninjawords -- because 18 DEX > *

Phil Crosby said that online dictionaries made him feel like shouting, in his words, "get out of my face and show me a definition!" We've all experienced this problem first-hand many times over the years, especially as dictionary.com, Merriam-Webster Online, and even Wiktionary bloated their pages more and more (though, in their defense, Wiktionary's "bloated" pages consist mostly of value-added content as opposed to graphics, ads, and inefficient layout coding). That's why he created Ninjawords.

In line with his original goal, Phil's solution is fast -- for me, on a mediocre broadband connection, a single lookup takes less than half a second from the time I press enter to the time I start reading the definition(s). Additionally, it is primarily populated with definitions from Wiktionary, which I'm all for. But to top it all off, Ninjawords has additional features that set it far ahead of the pack. For instance, you can enter multiple search terms by separating them with commas, and the results are all returned instantly on the same page. Also, when you have a typo, Ninjawords not only offers guesses as to what you meant to type -- it also automatically returns the definition of the most relevant suggestion.
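
For the curious, here is a rough sketch of how those two features (comma-separated lookups and automatic fallback to the closest suggestion) could fit together. This is not Ninjawords' actual code, which I have not seen. The tiny `DEFINITIONS` table, its placeholder definitions, and the `lookup` function are all made up for illustration, and the fuzzy matching simply leans on `difflib` from Python's standard library.

```python
import difflib

# Hypothetical stand-in for a real dictionary backend (placeholder definitions).
DEFINITIONS = {
    "ninja": "a person trained in ninjutsu",
    "podcast": "a program made available as a downloadable digital file",
    "laptop": "a portable computer small enough to use on one's lap",
}


def lookup(query):
    """Look up each comma-separated term in the query.

    Returns a list of (entered_term, matched_term, definition) tuples.
    matched_term and definition are None when nothing close enough exists.
    """
    results = []
    for raw in query.split(","):
        term = raw.strip().lower()
        if not term:
            continue  # ignore empty pieces such as a trailing comma
        if term in DEFINITIONS:
            results.append((term, term, DEFINITIONS[term]))
            continue
        # Likely typo: rank the known words by similarity, best guess first.
        guesses = difflib.get_close_matches(term, list(DEFINITIONS), n=3, cutoff=0.6)
        if guesses:
            best = guesses[0]
            results.append((term, best, DEFINITIONS[best]))
        else:
            results.append((term, None, None))
    return results


if __name__ == "__main__":
    for entered, matched, definition in lookup("ninja, podcsat"):
        print(entered, "->", matched, ":", definition)
```

A real service would presumably rank suggestions with something smarter than raw string similarity, but the shape of the idea is the same: split the input, answer what you can, and guess at the rest.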

The minimalist interface takes a cue from Google, and is a welcome respite from the terrors of Merriam-Webster:


Check out the site for yourself!


Tuesday, December 12, 2006

Wiktionary: Replacing them all

Only an evolving medium can describe an evolving entity like the English language, especially if the entity evolves as quickly as the English language. Today, Wiktionary is four years old, and it makes the competition look incredibly slow -- slow to keep up with the language, and slow to load in my web browser.

If you were to look up the word podcast on Merriam-Webster Online, you would be greeted with the following:

As you can see, it's not in there, but I doubt that Merriam-Webster questions the legitimacy of the word: they use the word quite often. As I write this, the root word podcast appears on Merriam-Webster’s main page four times -- once as a common noun ("this free podcast delivers a daily dose of word power and fun"). Their podcast section is even listed as a free daily feature:

Isn’t it ironic? A dictionary website, whose main draw is ostensibly to find definitions of words, uses a word that’s missing from their own dictionary multiple times on the main page! A further irony is the fact that podcasts are a free service of Merriam-Webster, yet to find out what the word means, you’d be forced to search for it at another site.

While it’s possible that the word’s definition would be available from Merriam-WebsterUnabridged.com, the by-subscription site whose link is visible in the above screenshot, this gives rise to a whole new set of questions. Google has demonstrated that low-fi content (such as dictionary data) can be offered for free in a profitable business model based on targeted advertising, even when the backend is as comparatively complex as that of Google Search or Gmail. So in the case of Merriam-Webster, this seems like a no-brainer: If you really do have good dictionary content stored somewhere, make it free! While you’re at it, cut down on the bloatiness of your site.

I know that the word podcast does indeed appear in Webster’s New Millennium Dictionary of English, as can be found through a dictionary.com query, which searches through the "Preview Edition" of that text. But why doesn’t it appear on Merriam-Webster’s main page? Years ago, I relied on their site for definitions. (They won me over because they had a clever little JavaScript prompt that allowed me, even back in my Internet Explorer days of 2000, to search their site for a definition without first loading the main page.*) They lost my business soon thereafter when I started to notice that even quite pedestrian words were occasionally missing from their data. I shudder at the thought of paying a subscription fee to access word definitions, so around five years ago I stopped using Merriam-Webster’s site altogether.

What’s more, I noticed that the New Millennium Dictionary, which does contain the word podcast, is missing from the list of versions that are included in the Merriam-WebsterUnabridged.com package. This suggests that Merriam-Webster is withholding content even from paying users. But it’s not just Merriam-Webster with problems…

The above screenshot is from the subscriber view (it would appear that my university is a paying customer) of the Oxford English Dictionary (OED). As you can see, the word podcast is absent from this site as well, despite the fact that the related New Oxford American Dictionary (NOAD) named podcast their word of the year for 2005. Meanwhile, on AskOxford, they proudly announce the addition of two seemingly useless words, even while "podcast" remains absent:

I applaud Oxford for leveraging the publishing power of the web to update their online version of OED four times per year, which is a bit more often than a print version could feasibly be updated. But the first question I have to ask is, “why only four?” It’s not like web publishing is all that difficult — I even manage to update my silly blog more often than four times per year, and when I do, there are quite a few more words. Further, I can’t for the life of me think of a single technical, financial, or user-oriented reason that OUP stands to benefit from the practice of buffering their updates until a predetermined day each quarter.

The OED bills itself as “The definitive record of the English language”, yet it fails to define words like podcast and laptop, each of which has over 100 million Google hits (whereas liposculpture has about 300 thousand) and is in daily use all over the developed world. Omissions like these are rampant in MSN Encarta Dictionary, American Heritage Dictionary (which even lacks “blog” as a term), and every other print-and-web dictionary I could find. Clearly, these companies are behind the times.

Now, I’m not the kind of tech-head that demands webization or “web 2.0”-ization of all things, but nearly everyone I know already relies on the web for definitions. And right now, they are users that are up for grabs, searching for a product that doesn’t suck; a free product that won’t intentionally omit words to solicit paying subscriptions; a product that loads quickly; a product that is up-to-date.

The product they’re looking for is Wiktionary. Wiki technology is exactly what a dictionary needs. It is a platform that thrives on changes, unlike change-averse language pundits (see my previous entry). Because language changes far faster than the publication cycle of top-down dictionaries, a by-the-users and highly iterative approach is needed. Wiktionary uses that approach. I encourage all of you to compare your current web-based dictionary of choice with Wiktionary and see how you like it. In instances where you find it failing, you can just edit the entry and justify your changes on its accompanying discussion page. I believe that the more active users a wiki has, the more accurate and useful it becomes -- especially if Wikipedia is any evidence. Because a definition can be quickly added, edited, or deleted by any reader, Wiktionary has the potential to be everyone’s go-to resource for word definitions -- I know it’s mine.

*Note: Nowadays, this is possible for most of the sites I use — it’s a built-in feature in Firefox to create similar shortcuts. For instance, if I type “wp ninja” in my browser’s address bar, I get Wikipedia’s search results for the query “ninja”. I have set up similar shortcuts for about 20 sites, including Facebook search, Urban Dictionary, my university’s online student directory, and a registrar WHOIS search.
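
If you are wondering what such a shortcut boils down to, here is a minimal sketch of the idea: a keyword maps to a URL template, and whatever follows the keyword is substituted for `%s`. The `SHORTCUTS` table, the `expand` function, and the specific URL templates are my own illustrative assumptions, not a dump of any browser's internals.

```python
from urllib.parse import quote_plus

# Hypothetical keyword table; the URL templates are assumptions for illustration.
SHORTCUTS = {
    "wp": "https://en.wikipedia.org/w/index.php?search=%s",
    "wikt": "https://en.wiktionary.org/w/index.php?search=%s",
}


def expand(address_bar_text):
    """Turn text like 'wp ninja' into a search URL, or None if no keyword matches."""
    keyword, _, query = address_bar_text.strip().partition(" ")
    template = SHORTCUTS.get(keyword)
    query = query.strip()
    if template is None or not query:
        return None
    # URL-encode the query so spaces and punctuation survive the trip.
    return template.replace("%s", quote_plus(query))


if __name__ == "__main__":
    print(expand("wp ninja"))
```

Adding a new shortcut is then just a matter of adding one more keyword and template pair to the table.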


Thursday, October 26, 2006

Language as Transient: When Syntax Matters and When it Doesn't

When I was younger, I was absolutely obsessed with learning the “correct” syntax for the English language, and would make constant mental judgements of the grammar and usage of those around me, sometimes even using it as a major factor in my estimations of their mental fitness. I even wore this as a badge of honor for a while.

I’d say around seven years ago -- when I was fifteen, for those of you keeping score at home -- I began to re-evaluate my position, as I found reasons to constantly challenge* the rigid set of rules that determine what is “correct”.** As more evidence surfaced, and I challenged the status quo, I started to think about why the status quo wasn’t effective enough. In the end, I decided that the ruling bodies that determine what is “correct” are too sluggish in doing so, and that language is evolving faster than they can make sense of it.

Most people become more stubborn as they age. So, let’s suppose that a given person’s opinions about language were pretty much set in stone at around 30 years old. That means this person has around 50 more years to cling to those rigid beliefs -- but over the course of 50 years, the vernacular changes drastically. That’s why there are so many people who are disgruntled about the way the language is, in their eyes, being abused. That which was once pure to them (i.e. when our example was 30) is being tainted, non-words are becoming words, and old words are being redefined. The only way to relieve such a person of their*** stubbornness tends to be an appeal to authority, such as pointing out the new entry of a word in the dictionary the person most respects. I call these people late adopters -- they actually refuse to reconsider their stance until the ruling bodies have already reconsidered, revised, and re-published their stance.

I do not believe that language should be free of rules. Quite the contrary: I believe that, in the interest of fostering clear communication, it is important to maintain syntactical standards. But I feel that these standards should be adaptable enough to accommodate the transient nature of the language they govern. I think less weight should be placed on minor infractions of these standards, especially when the infractions in no way cloud the meaning of the speaker or writer. For instance, understanding the distinction between “your” and “you’re” is still important, because using the wrong one can cloud meaning. But the rule stating that writers must never start a sentence with a conjunction, like I’ve done twice in this paragraph, is obsolete.

Rules are a means to an end. Rules exist to govern behavior so that a certain goal, for instance peace or communication, can be achieved. In all areas where there are rules, a problem I’ve seen in a lot of conformists (and especially detail-oriented conformists) is that they allow their means to become their end: they decide that the rules must be followed at all times, even silly ones that are no longer necessary or sufficient to achieving that end, and then they defend those rules to the bitter end (ten bucks for my crappy pun, thanks). Stubborn so-called grammarians are among these sheep-like rule followers. But let me reiterate: I am not a language anarchist! My big picture view is that language syntax exists to make sure that communication is clear, and as such I respect the need for language syntax. But many people seem to think it exists for some higher purpose, because they follow it at all costs -- even when the rules actually cloud communication! This was lampooned in a brilliant scene in one Mike Judge film, wherein an officer describes the victim as the guy “off in whose trailer they were whacking”, referencing blind obedience to the rule that no sentence should ever end in a preposition. Obscene or not, if this example doesn’t make it very clear how blind adherence to language rules can actually cloud communication, then what will?

*I believe that split infinitives should be allowed because it feels so awkward to avoid them. Also, because Picard uses one at the beginning of every single episode, and I have intensely blind faith in Picard’s decisions.

**In that last sentence, I made what some would consider to be a syntactical error by failing to enclose the trailing period in my quotation marks (while it would be silly to enclose the period within parenthetical asides such as this one). Further, the quotation marks syntax already makes such allowances for question marks and semicolons, but not periods. I think that it’s about time this awkward and inconsistent rule be destroyed. Most computer programmers with language aptitude share my opinion that, if we’re going to enforce a syntax, we should at least be consistent.

***I am a supporter of the singular they — not that it needs my support, as its heavy usage indicates that its eventual widespread acceptance is inevitable, just like that of “snuck” or “apron” was. It may not happen in my lifetime or yours, but it will happen.
