Just a bag of words

September 29, 2009

As mentioned earlier, it was Wittgenstein who pointed out that words are a vehicle for meaning. I’ve begun to think about it as a channel, where language is one medium for our thoughts and knowledge. The medium and the message become inseparable up to a point, yet they remain different things. We use language to describe meaning. Words are the reflections of concepts.

Thinking about this in relation to information retrieval, it’s pretty obvious that our personal descriptions of what we mean (communicating our thoughts, our needs, and our knowledge) can run us into walls:

If meaning is so wrapped up in context, and understanding is so dependent on this very human context, how does a computer search engine find answers to our questions?

Information retrieval systems, like Google, are based on an assumption about natural language and how we use it. The assumption is that documents that share vocabulary are also related semantically (meaningfully). This is because when we write words (and sentences and documents) we write them down with meaning. We aren’t writing random words for the fun of it (although I can think of some exceptions in the land of poetry).
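To make that assumption concrete, here is a rough sketch of how shared vocabulary can be turned into a number. It counts the words in each document and compares the counts with cosine similarity, which is one common choice and not necessarily what any real search engine does. The function names and the three toy sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into words, and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors.

    Returns 0.0 when the documents share no words at all.
    """
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents, invented for this example.
doc_cats = bag_of_words("the cat sat on the mat")
doc_pets = bag_of_words("the cat chased the dog")
doc_tax = bag_of_words("file your income tax return")

sim_related = cosine_similarity(doc_cats, doc_pets)
sim_unrelated = cosine_similarity(doc_cats, doc_tax)
```

The two sentences about cats share "the" and "cat", so `sim_related` comes out higher than `sim_unrelated`, which is zero because the tax sentence shares no words with the first one. Nothing here knows what a cat is; the score comes purely from overlapping vocabulary.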

So when you ask Google to search for something, the search engine looks through its database, finds the matching terms in its index and returns documents containing those terms. But the index is just a list of words and the documents they appear in, without any context whatsoever. To the computer the words might as well be random, an unordered bag of words, and it just plays a matching game. And although there are sometimes results that make no sense to us, this automatic, systematic and statistical game actually works a whole lot of the time. Weird, eh?
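The matching game can be sketched in a few lines. This is a minimal illustration, assuming a tiny in-memory index and a rule that a document must contain every query word. The names `build_index` and `search` and the sample documents are hypothetical, not how Google actually does it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into a list of words."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start with the documents for the first word, then narrow down.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Toy documents, invented for this example.
docs = {
    "d1": "Wittgenstein wrote about language and meaning",
    "d2": "Search engines match words in an index",
    "d3": "Poetry plays with words and meaning",
}
index = build_index(docs)
```

With this index, `search(index, "words")` returns `{"d2", "d3"}` and `search(index, "words meaning")` narrows that to `{"d3"}`. The index holds only words and document ids, with no word order and no context, which is the bag of words in miniature.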

*This is obviously a simplified version of how an online information retrieval system works, but this is the basic idea. If you consistently get crappy results when searching the web, try putting in more words. The more words the computer has to play the matching game with, the better the results (usually).*
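Here is a small sketch of why extra words tend to help, assuming a made-up scoring rule: each document is scored by how many distinct query words it contains. The `rank` function and the two toy documents are hypothetical and only meant to show the effect.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def rank(docs, query):
    """Score each document by how many distinct query words it contains.

    Returns (doc_id, score) pairs, best score first; documents that match
    no query word are left out. Ties are broken alphabetically by id.
    """
    query_terms = tokenize(query)
    scores = {
        doc_id: len(query_terms & tokenize(text))
        for doc_id, text in docs.items()
    }
    hits = [(doc_id, score) for doc_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Toy documents, invented for this example.
docs = {
    "jaguar_car": "jaguar car dealer with new and used models",
    "jaguar_cat": "the jaguar is a big cat native to the rainforest habitat",
}

short_query = rank(docs, "jaguar")
longer_query = rank(docs, "jaguar big cat habitat")
```

For the one-word query both documents tie with a score of 1, so the computer has no basis for preferring either. The longer query matches four words in the cat document and still only one in the car document, so the cat document moves clearly to the top.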
