Friday, September 22, 2006

So what is at the Bottom?

The readings this week on databases were indeed deep and wide, reminiscent of swimming in a lake, realizing you cannot touch the bottom, and wondering what really is down there in the dark green. Scum, sunken ships, debtors in cement overshoes? Exabytes indeed give me a cold chill. But I couldn't help wondering: yes, it is a lot of information, but how much of it is worth saving? Sure, the IRS needs some of that info, and court cases hinge on scraps of evidence. But what about that text message I sent to my husband on August 9? Ironically, as historians, we wish we had access to all those private conversations between Napoleon and Josephine, or just a few more tavern bills from the early fifteenth century. My mentor's research for her doctoral program included searching Tsar Peter Alexeyevich's account books from his grand embassy through Europe. She loved finding out how much the Russian treasury had to pay for damages done to castles after particularly riotous parties.

Clearly, we must be deeply concerned with databases. Our very livelihoods depend on the condition of the primary sources currently locked away in archives. One point these articles did not bring up is the rush by certain archives to label important works as "National Treasures" and remove them completely from the hands of researchers. This is happening at an alarming rate, occasionally before top-quality scans of the works are made, whether for lack of funds or for fear of harming the work. I was able to view an incunabulum from 1483 at the Library of Congress last year, but they would not allow me to photograph the book, nor would they make digital images of it for fear of harming it, even though I was prepared to pay their posted rate. Ironically, I was able to find a second copy of this work on the West Coast and had total access to it because someone had put the wrong publication date in the card catalog. Their records indicated the book was published in 1960! I wasn't about to dissuade them.
Crane's article "What Do You Do with a Million Books?" briefly touched on the problem of saving the data in formats that give the researcher the most suitable access to it. Then there is the problem of getting archives in developing regions, central Europe for example, on board with this movement; they have more pressing issues to spend their limited funds on. The ideas of Crane's I want to discuss are the most effective levels of granularity and of noise at which to convert these files. The sheer amount of material in each book makes some sort of mechanical process, much like Dan Cohen's search tools, necessary to determine the key words for each document. Because of the nature of the English language, with its lack of standardized spelling, any book published before the nineteenth century will demand customized solutions for the varieties of spelling. This is a problem, as I understand these terms, of both granularity (how many individual bits of information to use to search the book) and of noise (the unique peculiarities of the characters and the computer's ability to read them). Both issues affect the researcher's ability to browse through and actually use older books, to say nothing of the complexities of images and marginalia. Will we see a day when we can access handwritten works via computer-based databases? For now, save up your frequent flier miles so you can go gaze upon the primary sources, and don't forget your white gloves.
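To make the spelling problem concrete: here is a minimal sketch, in Python, of the kind of normalization a search tool might apply so a modern query can match early-modern spellings. The substitution rules (u/v and i/j as interchangeable letters, long s printed as "ſ") are real conventions of early printing, but treating them as simple character swaps is my own simplifying assumption; a serious tool would need far more elaborate rules.

```python
def normalize(word: str) -> str:
    """Reduce a word to a crude canonical form for matching
    early-modern English spellings against modern queries."""
    w = word.lower()
    w = w.replace("ſ", "s")  # the long s of old typefaces prints as "ſ"
    # In early printing, u/v and i/j were positional variants of one letter,
    # so collapse each pair to a single character before comparing.
    w = w.replace("v", "u").replace("j", "i")
    return w

def matches(query: str, token: str) -> bool:
    """True if a modern query word and an old-spelling token
    agree after normalization."""
    return normalize(query) == normalize(token)

print(matches("universal", "vniuersal"))  # True
print(matches("justice", "iustice"))      # True
```

Even this toy version shows why granularity and noise are linked: the coarser the normalization, the more variant spellings a search catches, but also the more false matches the noise lets through.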

