Sunday, April 08, 2007

The Fine Art of Storing Things So That They Can Be Found

Part 1

One of the advantages that Web 2.0 is supposed to bring us is the ability to organize information to suit ourselves, using mechanisms such as tagging. This is all well and good in theory, because it empowers the user, theoretically freeing them from the artificial constraints of formal classification theory. In reality, what appears to happen is that the user applies some combination of folders, tags, flags and labels to their documents/URLs/emails/whatever, and then discovers when they attempt to retrieve something that they have used "Spade" as a tag on one day and "Shovel" on another, and that they have created their own personal implementation of chaos.

I am not suggesting for one moment that the standards used to organize books (for which also read journals, newspapers, film and anything else that a library may hold) should be applied to personal electronic document storage. Having spent many years working with both the Dewey Decimal System and the Library of Congress Subject Headings (and similar tools), I have no desire to introduce either into my own or anyone else's daily work flow. There is a place for such formal tools and methods, and my hard disk is not it. However, there are some concepts that can be drawn from library cataloging practice, and these can be usefully applied without serious pain. That is what I want to explore here.

The process known as "cataloguing" in most libraries is actually three different activities, with different outputs. The first is what is called descriptive or physical cataloguing. The output of that process is a physical description of the item catalogued. For example, a book will be described as hard cover or paperback, the number of pages will be noted, and so on. If the item is a DVD or a microfiche, that will be noted. And in most libraries, physical format has implications for storage location, and we'll come back to that in a moment.

Then there is classification. This is the activity that determines the Dewey Decimal number (or whatever call number system is used) that is put onto the item. Classification attempts to say "the primary subject of this book is, for example, 'Russian History', therefore we will stamp 947 on the spine, and it will be shelved with all the other books about Russian History, which also have 947 on the spine".

And then there is subject cataloguing, which puts an entry into the subject catalogue under Russian History, but adds some other subject entries for other topics that our book covers, perhaps "Russian Art", or "Climatology, Russian", or whatever. The book can only be in one place, physically, so the subject entries attempt to draw the potential reader to the work from other subjects that they might be interested in exploring.

So our theoretical book is placed on the shelf. If you look up the author or title (these will be in the descriptive cataloguing record), you will be directed to the 947 area of the library. But if you look up the subject catalogue under "Climatology, Russian", there will be an entry for this book which is mainly about Russian History, so you might want to check it out.

And the subject catalogue contains two other types of entry. One is the "SEE" entry. A SEE entry says "the term you looked up is not the one we are using, use this one instead". So if you look up "Russian Climatology", you might find an entry that says "Russian Climatology SEE Climatology, Russian". The second is the SEE ALSO entry, which attempts to suggest related terms that you might not have thought of yourself. So you look up "Russian History" and find an entry that says "Russian History SEE ALSO Russian Mythology".
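
For those who like to think in code, the two kinds of entry can be sketched as a pair of lookup tables. This is only a toy illustration in Python: the names SEE, SEE_ALSO and lookup are made up for the example and do not describe any real library system.

```python
# Hypothetical subject catalogue: these tables and lookup() are illustrative only.
SEE = {"Russian Climatology": "Climatology, Russian"}
SEE_ALSO = {"Russian History": ["Russian Mythology"]}

def lookup(term):
    """Return (preferred heading, related headings) for a search term."""
    preferred = SEE.get(term, term)        # follow a SEE entry if one exists
    related = SEE_ALSO.get(preferred, [])  # then collect any SEE ALSO suggestions
    return preferred, related

print(lookup("Russian Climatology"))
print(lookup("Russian History"))
```

The point of the sketch is that a SEE entry replaces the term you typed, while a SEE ALSO entry only adds suggestions alongside it.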

Now let's go back to storage locations. A library has specific areas for different types of media, partly because some things need special handling, and partly to make the most efficient use of shelf space. For example, very large format books are normally stored on a different, taller, shelf to standard paperbacks, even though they may be about the same subject. Film will be stored in a different area to magazines, and so on. There is a well understood (well, at least by the librarians) system for determining what goes where, so that items returning to the shelves always go back to exactly where they came from, and it is vital that the system is followed consistently: in a large library, a book shelved in the wrong place might as well be lost.

Now you and I, dear reader, need to work out a similarly predictable system for managing the stuff we are storing. Our system only has to suit us, and we don't have to justify it to anyone else, but if we don't enforce a few basic rules on how we do things we shall have a mess. A moment of distraction, being too tired or under the weather, and we won't be able to find what we are looking for next week.

I am working with EagleFiler: if you are working with a different application, you will need to draw some parallels here. If you can't make sense of this, mail me and I'll try to clarify things. Let's start with a walk through the levels of organization that we have to play with.

In EagleFiler, at the highest level we have the Library (database). Below that we have folders and subfolders. Then there are tags, labels and flags. The different EagleFiler Libraries roughly correspond to different branches of a real library. I used to work for the Public Health Department of Western Australia, which had a central library in the head office, that kept a wide range of medical books; then there was a branch library for Community and Child Health Services, which stored material on early childhood, childbirth and such; another branch was associated with the State X-ray Laboratory, and so on. The materials were stored as close as was practical to the people likely to need them, according to speciality.

I have set up one main library database, which is the dumping ground for all sorts of ephemera, trivia, web pages, notes, receipts and so on. It is my in tray: stuff with no better home gets parked here. I have set up a second library to store recipes. If I decide to use EagleFiler to manage documents related to client projects, I shall set up one library per customer. So the Library division I am using is partly about subject matter (food or non-food), but also about function: working projects separated from the random litter.

Then we have folders. Folders in my mind roughly equate to the classification of a piece of material, the main topic. Unless I want to keep two copies of an item, each item can only exist in one folder. So a folder will normally relate to subject, except when it relates to format. If I am creating folders for project documentation, I keep diagrams in one folder and reports in another, and the reason for this is simple: I name my reports and diagrams according to a scheme which includes the date, in YYYYMMDD format, followed by a revision number. I want to be able to see them all together, so I can spot the latest, and that will be harder to do if diagrams are mixed with reports. But note that this only works if I am completely consistent about naming files.
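
To show why the date-first naming matters, here is a small Python sketch. The exact layout of the names (a prefix, then the date, then "r" and a two-digit revision) is an assumption made for the example, as is the make_name helper; adapt it to whatever scheme you actually use.

```python
from datetime import date

def make_name(kind, day, revision, ext):
    # Assumed layout: <kind>-<YYYYMMDD>-r<two-digit revision>.<ext>
    return f"{kind}-{day:%Y%m%d}-r{revision:02d}.{ext}"

names = [
    make_name("report", date(2007, 4, 8), 1, "txt"),
    make_name("report", date(2007, 3, 30), 2, "txt"),
    make_name("report", date(2007, 4, 8), 2, "txt"),
]

# Date and revision are zero-padded, so a plain alphabetical sort
# is also a chronological sort, and the latest file sorts last.
latest = sorted(names)[-1]
print(latest)
```

The zero padding is what makes this work: without it, revision 10 would sort before revision 2, and the "latest" file would no longer be at the end of the list.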

Then we come to tags, flags and labels. In EagleFiler, there is only one flag. This is of limited informational utility, and I can only imagine using it for functional purposes, as a visual reminder to deal with something promptly. EagleFiler has access to 7 labels, but they are not implemented in the application itself: they are the same labels that you use in Finder. Label something in Finder, and the label will appear in EagleFiler. Seven labels is not enough to be informationally useful. Functionally, you might choose to use them for the days of the week, if your workflow is very time oriented. Or you might use them to mark items for particular purposes, for example: mark all this year's receipts green, so you know that they are for this tax year. Or mark particular content with a red label, so you know it is not suitable for work, and should not be displayed in the office. I do have another suggestion for the use of labels, but I want to cover tags first.

Tags are your subject headings, your pointers to one document from many different places. For tagging to be effective, it is important that it be consistent. Let's say that you are collecting material on vehicles and transport in general. You have a folder called "vehicles", containing subfolders called "cars", "trains", "bikes" and "boats". You acquire a nice picture of a Porsche Boxster, but in a moment of vagueness (we all have them), you tag the graphic "boxster" instead of "porsche". You file the image in the "cars" folder. Next week you search the "cars" folder for all documents tagged "porsche" and wonder why that nice picture you know you saved recently does not appear. Because the thing is a graphic, full text searching won't help you, unless the file name contains the word "porsche". You will have to sift through the whole "cars" folder to find the picture. A forced example, but you get the idea: if you tag using the first word that pops into your head, you are unlikely to get a consistent result.

To make the best of tagging, you need to train yourself to be consistent, and this will be easiest if you can establish some personal "rules" before you start applying tags. But even if you already have a few hundred (thousand?) tagged documents, you can still achieve order and control. To do this you need two things. The first is the hard one: you need to make a personal "definition of terms". You need to decide what particular words mean to you. We'll come back to that one, because I suspect that it warrants a whole separate post. Believe me that this is do-able, and if you can get that particular skill, you will have learned something that you can apply in many different aspects of your life. The second is a little tricky, but equally achievable (at least in EagleFiler, I can't warrant your app.): you need to establish some pointers inside the application, to help you choose the right tag consistently. To do that, we will go back to the library's idea of SEE and SEE ALSO references. You can set up a similar mechanism using a combination of tags, labels and special purpose documents.

First, if you already have a repository of tagged material somewhere (EagleFiler, Furl or whatever), have a look at the tags you have already used. See any that conflict or overlap? Pick the term that you prefer, the one you want to use consistently in future. Let's say that you have a tag "managers" and a tag "management", and that you prefer to use "management" in future. To get things consistent you need to start by assigning the tag "management" to all the items that have the tag "managers", and getting rid of the "managers" tag. In EagleFiler this is easy: find all the items tagged "managers", and drag them to the tag "management" in the Source pane on the left of the screen. That assigns the tag. Now delete the tag "managers". That will unassign the tag from all the items in your library. Now, to remind yourself not to use the "managers" tag again, recreate it, but this time put an underscore at the end of the word: managers_ . The next time you go to assign a tag, check that the one you are choosing does not end in an underscore.
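
If your tags live somewhere you can script, the same clean-up can be modelled in a few lines of Python. This is a sketch of the idea only: it does not use or describe EagleFiler's own interface, and the items dictionary and the retire_tag function are hypothetical.

```python
def retire_tag(items, old, new):
    """Replace tag `old` with `new` on every item; return the marker tag."""
    for tags in items.values():
        if old in tags:
            tags.discard(old)  # unassign the tag we no longer want
            tags.add(new)      # assign the preferred tag in its place
    return old + "_"           # trailing underscore means "do not use"

# Hypothetical library: file name -> set of tags
items = {
    "memo.txt": {"managers", "staff development"},
    "review.pdf": {"management"},
    "photo.jpg": {"porsche"},
}

marker = retire_tag(items, "managers", "management")
print(marker, items["memo.txt"])
```

Items that never carried the old tag are left alone, and the returned marker ("managers_") plays the same role as the underscored reminder tag described above.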

For some people this will be enough, but you can take it further if you like. Add a document to your library, and call it "managers". The content of the document should say:

SEE: management

You may want to tag the document with your "managers_" tag. Do what makes sense to you, personally. Then change the label on the document, and set it to a colour, perhaps orange. Every time you create one of these SEE documents, to point yourself to a different tag, colour it orange. You can also set up a SEE ALSO structure in the same way: create a document called "management" and put in it a list of all the other tags that are related, for example "staff development"; you might want to add a definition of terms, for your own future reference. Change the label and colour the document to suit your tastes, and tag it appropriately. In future, these special purpose documents will turn up when you search, and their colouration will remind you of what they are, and prompt you to check other tags.

You may notice that I'm specifying all tags in lower case letters. This is because I've noticed that EagleFiler is sorting tags according to the ASCII character set, so it is presenting tags beginning with capital letters before tags beginning with lower case letters, for example "Librarianship" before "lace". I want tags to present in strictly alphabetical order, so I'm keeping everything in lower case.
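
You can see the same effect in most programming languages, which compare strings by character code unless told otherwise. A short Python illustration (the tag list is just an example, and this says nothing about how EagleFiler itself is implemented):

```python
tags = ["lace", "Librarianship", "management"]

# Default comparison uses character codes, so capitals come first.
ascii_order = sorted(tags)

# Comparing lower-cased copies gives ordinary alphabetical order.
alpha_order = sorted(tags, key=str.lower)

print(ascii_order)
print(alpha_order)
```

Keeping every tag in lower case sidesteps the problem entirely, because the two orderings then agree.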

You will need to experiment with these suggestions, and decide what works for you. If the labour of setting this up outweighs the benefits you get from it, don't bother. Do as much or as little as you need to make things work efficiently for you.

Next post: identifying the appropriate tags for a document.

