November 26, 2009

A while ago I was searching the web for some history of the field of information retrieval and came upon an essay by Vannevar Bush titled “As We May Think”, written in 1945. The article is fascinating. Bush writes about the state of science in 1945, addressing the problems of his day and future directions. What is most interesting about his analysis is that he puts his finger on the problem of the unmanageable amount of information, and his solution is something close to what we have today: computers, digitization and the internet. Looking back at his original cry for help, though, it doesn’t seem that these fantastic machines and technologies have actually saved us from drowning in information. Maybe we’re just more used to swimming through these deep information waters.

I’ve included a lot of quotes from the text here because I think his writing is really captivating. You might actually want to just read the original essay instead (link above) – definitely worth the time, and not much longer than this blog post.

Bush begins by talking about what he sees as one of the main problems facing science in 1945:

There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear.

The difficulty seems to be…that publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.

He knows the change is coming, that something big is on its way. How could it not? After all,

Note the automatic telephone exchange, which has hundreds of thousands of such contacts, and yet is reliable. A spider web of metal, sealed in a thin glass container, a wire heated to brilliant glow, in short, the thermionic tube of radio sets, is made by the hundred million, tossed about in packages, plugged into sockets—and it works! Its gossamer parts, the precise location and alignment involved in its construction, would have occupied a master craftsman of the guild for months; now it is built for thirty cents. The world has arrived at an age of cheap complex devices of great reliability; and something is bound to come of it.

But, he argues, there has yet to be such revolutionary technology in the realm of information storage and retrieval. When it comes to “the manipulation of ideas and their insertion into the record,”

…we seem to be worse off than before—for we can enormously extend the record; yet even in its present bulk we can hardly consult it….There may be millions of fine thoughts, and the account of the experience on which they are based, all encased within stone walls of acceptable architectural form; but if the scholar can get at only one a week by diligent search, his syntheses are not likely to keep up with the current scene.

The biggest obstacle separating us from information (“the record”) is caused by “the artificiality of systems of indexing”:

When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory. Yet the speed of action, the intricacy of trails, the detail of mental pictures, is awe-inspiring beyond all else in nature.

Man cannot hope fully to duplicate this mental process artificially, but he certainly ought to be able to learn from it.

What he is describing is the difference between indexing practices that use a controlled vocabulary and hierarchical relationships and those that use natural language and flatter, more associative relationships. The former is exemplified by classical library systems such as the Library of Congress Classification or the Dewey Decimal System, systems that Bush was familiar with. The latter, the system he imagined, we have today thanks to the digitization of information and the internet (think of programs that allow for user-generated tagging and bookmarking, etc.; this is indexing in natural language with associative relationships).
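To make the contrast a little more concrete, here is a rough sketch in Python. It is only an illustration: the categories, titles and tags are made up for the example, and none of it comes from Bush’s essay or from any real library system. The first half files each item in exactly one place and finds it by walking down from subclass to subclass; the second half attaches free-form tags to items and jumps from one item to any other that shares a tag.

```python
from collections import defaultdict

# Hierarchical filing: each item sits at exactly one path of subclasses.
# (Categories and titles are invented for illustration.)
hierarchy = {
    "Science": {
        "Physics": ["Thermionic tubes"],
        "Biology": ["Memory and the brain"],
    },
    "Technology": {
        "Communication": ["Automatic telephone exchange"],
    },
}


def find_in_hierarchy(tree, title, path=()):
    """Return the single path of subclasses that leads to title, or None."""
    for name, child in tree.items():
        if isinstance(child, dict):
            found = find_in_hierarchy(child, title, path + (name,))
            if found is not None:
                return found
        elif title in child:
            return path + (name,)
    return None


# Associative indexing: an item can carry any number of free-form tags.
# (Tags are invented for illustration.)
tags = {
    "Thermionic tubes": {"electronics", "radio", "reliability"},
    "Automatic telephone exchange": {"electronics", "reliability", "communication"},
    "Memory and the brain": {"association", "memory"},
}


def build_tag_index(item_tags):
    """Invert the item -> tags mapping into a tag -> items mapping."""
    index = defaultdict(set)
    for item, item_tag_set in item_tags.items():
        for tag in item_tag_set:
            index[tag].add(item)
    return index


def related(item, item_tags, index):
    """Return every other item that shares at least one tag with item."""
    neighbours = set()
    for tag in item_tags[item]:
        neighbours |= index[tag]
    neighbours.discard(item)
    return neighbours


if __name__ == "__main__":
    print(find_in_hierarchy(hierarchy, "Thermionic tubes"))
    print(related("Thermionic tubes", tags, build_tag_index(tags)))
```

In the hierarchy, the two electronics items live under different top-level classes, so getting from one to the other means climbing back out and re-entering on a new path, which is the frustration Bush describes. In the tag index they are one step apart because they share tags.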

Bush imagines this machine of the future that will help us organize and regain control over the endless waves of information:

Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, “memex” will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

He goes on to describe all the ways the memex could be used to link together documents and items and to share those strings of links with other people. Of course, from today’s perspective the memex seems quite fanciful and more than a little amateurish. But from a 1945 perspective, it must have been a striking and exciting vision. He concludes his essay by commenting on what this kind of information revolution might mean to people:

Presumably man’s spirit should be elevated if he can better review his shady past and analyze more completely and objectively his present problems. He has built a civilization so complex that he needs to mechanize his records more fully if he is to push his experiment to its logical conclusion and not merely become bogged down part way there by overtaxing his limited memory. His excursions may be more enjoyable if he can reacquire the privilege of forgetting the manifold things he does not need to have immediately at hand, with some assurance that he can find them again if they prove important.

This is the part I find most thought-provoking. Has our ability to sort and organize a seemingly infinite amount of information actually “elevated” our spirits? Have our excursions into the world of science and information seeking become more enjoyable because of the personal computer and the internet? And have we really solved the problem Bush began his essay with, that of navigating the maze of “the record”? Have digital information and the internet helped you find what is most important now without losing things that might be important later?

I don’t know. I am in the middle of research-writing madness for school. Sometimes I can’t imagine how people ever wrote a research paper without the internet and digital files of nearly every article ever published. Other times, of course, I realize that I have once again lost myself online, scanning and filtering through hundreds of possibly relevant records, wandering miles and minutes off track into the netherworld of hyperlinks. This usually ends with me feeling like I’ve wasted heaps of time, a feeling that can only be eased by shutting off my dear old computer and leaving the work unfinished until the next day.

  1. taylor
    December 21, 2009 7:54 pm

    I agree that the food web of the cod’s habitat is much too cluttered. The main fact expressed is that all the species are interconnected. That is clear. And that’s important. But I think the food web could be more interesting and informative if the lines connecting animals were not uniform. For instance, thicker lines (or different shades) could be used to indicate the level of dependency between species. For example, in the food web of my own life, the line between me and coffee would be drawn with a thick black permanent marker to indicate extreme urgency and daily consumption, but the line between me and tofu would be thinner or paler to indicate less dependency. I have never seen a food web like this but I suspect they exist. Do they?

    • December 21, 2009 10:46 pm

      That’s a great suggestion! I’ve seen (and created) graphs like you’re describing, but not in a food chain context. The ones I’m familiar with have to do with link rates (how often one website links to another). But I don’t see why it couldn’t be applied to this – it would certainly give us more information, allowing us to see which food sources a species depends on most. I also like the idea of drawing a personal food web. If you end up making one, you know I’d love to see it!

