Monday, December 31, 2007

DIY Databases

Just before Christmas, while I was tidying up my study (read: moving piles from one place to another), I came upon a small box containing a stack of slips of paper. Each slip holds a single quotation, with the appropriate citation and, in some cases, additional notes. Typewritten. This is a very old box, and the contents date to the long-ago days when I still worked as a librarian, and there was no Internet. Computers were very large, and programming was done on punch cards: end users who wanted to compile collections of quotations either wrote them out by hand or typed them. So this little box has moved house with me several times, and for some years now I have intended to convert the contents to some convenient electronic format.

So I looked at the box, sifted through the bits of paper and thought "surely I have enough software to deal with this now, and then I can chuck out the paper, which would be a fine thing given the amount of 'stuff' packed into my rather small study".

My first thought was Bento, and I quickly constructed a suitable database: no problem. Database design is very easy in Bento, and I input half a dozen records to try it out. Data entry was easy, and I adjusted the input form to suit my requirements. Then I started looking at output, and there I hit a problem. Bento's facilities for outputting a formatted report are pretty primitive: I can't find a way to get it to fit multiple records on a page.

So I fossicked around, and turned up iList Data, which looked promising. Database setup was straightforward, though I suspect that iList Data is a whole lot more database than I really need. But the Report Design features are pretty limited: a bit better than Bento, in that I can get multiple records on a page, but nothing like the control I'm looking for in a report generation tool.

It has taken me a while to work out what it is that I am looking for, but I think I have figured it out. The whole problem that I am having with database applications goes back to the period between 1988 and 1990, when I was working with a product called BRS/Search. BRS was a remarkable piece of software: it was a full text indexing database, but it was not relational. It had straightforward design tools (well, I thought they were easy to use) that would let you define the fields that you wanted in your database and design an input form. Then you loaded your data into the database, and it indexed the text using a reverse index method (what is usually called an inverted index): it built a table (I'm working from memory here) that listed each word precisely once, and then recorded the "coordinates" for each instance of that word in the data. So if the word "car" appeared, it would be referenced as appearing in Record N, Field F, Line L and Word W or whatever. If the word appeared again, another coordinate record would be added to the "car" entry. This gave it a very fast search engine, and made the construction of complex Boolean searches very easy.
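For anyone who wants to see the idea in concrete form, here is a minimal Python sketch of that kind of index. It illustrates the general technique only and is not how BRS/Search was actually implemented; the function names (`build_index`, `records_with`) and the sample records are invented for the example.

```python
import re
from collections import defaultdict

def build_index(records):
    """Map each word to a list of (record, field, line, word) coordinates."""
    index = defaultdict(list)
    for rec_no, record in enumerate(records, start=1):
        for field, text in record.items():
            for line_no, line in enumerate(text.splitlines(), start=1):
                # Lower-case the line and split it into simple word tokens.
                words = re.findall(r"[a-z0-9']+", line.lower())
                for word_no, word in enumerate(words, start=1):
                    index[word].append((rec_no, field, line_no, word_no))
    return index

def records_with(index, word):
    """Return the set of record numbers in which the word occurs."""
    return {coord[0] for coord in index.get(word.lower(), [])}

# Hypothetical sample data: each record is a dict of field name -> text.
quotes = [
    {"quote": "The car is a luxury.\nA car is freedom.", "source": "sample A"},
    {"quote": "Books are portable.", "source": "sample B"},
    {"quote": "My other car is a book.", "source": "sample C"},
]

index = build_index(quotes)

# Each word is listed once, with one coordinate per occurrence.
print(index["car"])

# Boolean searches become set operations on record numbers.
car_and_book = records_with(index, "car") & records_with(index, "book")
car_not_book = records_with(index, "car") - records_with(index, "book")
car_or_books = records_with(index, "car") | records_with(index, "books")
print(car_and_book, car_not_book, car_or_books)
```

Because each word's entry already lists every place it occurs, a search never has to scan the text itself, and AND, OR and NOT reduce to intersecting, merging or subtracting sets of record numbers.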

And BRS had a report generation language which was, as I recall, effectively a 4GL scripting language. You could control the layout of reports quite closely, displaying part or all of fields, and positioning them where you wanted on screen or paper.

The company I worked for at that time sold BRS to all sorts of customers. Lawyers used it for litigation support. Advertising companies used it to handle campaign details. It was used to catalogue music and index contracts. It ran on various flavours of Unix. It appears to have been swallowed up by some larger company, and I don't believe that it exists as a separate product anymore, certainly not in the way it did 20 years ago.

I think that is what I am looking for now: I don't need a relational database, I just need something to index chunks of text, with flexible input and output. Any suggestions?

