Monday, May 14, 2007

The Fine Art of Storing Things So That They Can Be Found

Part 2: Tagging Effectively


"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean - neither more nor less."
from Lewis Carroll's "Through the Looking-Glass", Chapter 6

I speak English. I have a little Latin, which comes from spending too much time around doctors and lawyers early in my career, and the trivial amount of French you can get from unenthusiastic attendance at three years of high school French classes. But I'm interested in language generally, how it works and where it comes from, and I have numerous books on the subject. Modern languages are constantly evolving, with new words being added, old words falling into disuse, and some words acquiring new meanings. English, as a language, is a particular minefield, because it attaches more than one meaning to many words. Take, for example, the word "heat": are we talking about temperature, or a single bout in a sporting event? There are thousands of examples, and the situation becomes more complicated year by year as various industries spew clouds of new acronyms into the public vocabulary, and as their marketing departments attempt to redefine indifferent design and sloppy production as "innovation", black as white and rampant greed as "customer service". And don't start me on the state of modern education, and how English is taught!

However, most people are a bit like Humpty Dumpty: they know what they mean when they use a particular word or acronym (the dictionary may not agree with them), and they will use words pretty consistently. If you have ever had to proofread another person's writing, you rapidly spot their little verbal tics: the words they can't spell, the turns of phrase, the grammatical abominations and the critical failures of punctuation that they repeat. We do seem to have reached the point where most of the allegedly literate population cannot tell the difference between a colon and a semi-colon, and avoids using both in consequence.

What has this to do with tagging, you ask (as I turn down the rant dial)? Well, habits are good. If you can be consistent about how you use words, you can tag effectively. The goal of effective tagging is really the creation of a personal language, to allow you, the user, to attach meaning to items that you will want to retrieve later. The tag words will mean what you want them to mean, and they do not have to mean anything to anyone else.

Now, before we go any further, did you read the post before this one, Part 1? If you didn't, some of this post may not make sense. Sorry if you found the last one rather heavy going: few people are excited by cataloguing, but I needed to get the background established.

Part 1 was about the use of folders, labels and tags to create structure. This post will attempt to address the question of how to choose good tags.

In the last post, I suggested that you round up all the tags that you are using at the moment, and have a look at them as a set. I did this myself, collating folder and tag names from del.icio.us, Furl and my EagleFiler databases. I found I had been pretty consistent, but there were a couple of annoying discrepancies (which I promptly fixed). I was interested to note that the problems were all in del.icio.us, which probably reflects the fact that I have used it longest, and refined my tagging techniques unconsciously while using it. So let's have a look at the lists (no need to read these in detail):

The folder structure for EagleFiler is as follows:

Admin
  Licenses
  Passwords
  Receipts
Bookmarks
CISSP
Consulting
Craft
  Beading
  Crochet
  Knitting
Exercise
Graphics
Humour
Ideas
ITIL
Learning
Organisation
Personal
Personal Development
Project Management
Security
Six Sigma
Society
Technology
  Mac OS
  Microsoft
  Networking
  Performance
  Shells
  Software
  Solaris
  SQL
  Tech Support
  Unix
Travel
Work
Xref


(Note that I spell in English English most of the time.) Because EagleFiler supports a nested folder structure, I am using it to apply two levels of classification to the stuff I store. Tagging then gives me a third (I'll come back to this in a moment).

Tags from EagleFiler

blogging
blogs
bluetooth
books
cats
CISSP
classics
consulting
encryption
environment
gadgets
how_to
lace
librarianship
libraries_
licence
mac
math
motivation
mp3
museum
networking
news
ntp
palm
political_correctness
politics
privacy
programming
puzzles
receipts
sales
security
solaris
spam
sql
tcpip
technology
tools
travel
xref

Note the tag "libraries_" with the trailing underscore, which tells me not to use this tag (see the previous post for an explanation). You can see that there is some doubling up between folder names and tags; I will explain in a while.
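The trailing-underscore convention is easy to check mechanically. Here is a minimal Python sketch, assuming the tags are available as a plain list of strings. The function name `usable_tags` is made up for illustration, and none of this is built-in EagleFiler behaviour:

```python
def usable_tags(tags):
    """Return the tags still in use, dropping any marked as retired
    with a trailing underscore (for example "libraries_")."""
    return [tag for tag in tags if not tag.endswith("_")]


# A short slice of the EagleFiler tag list above.
eaglefiler_tags = ["lace", "librarianship", "libraries_", "licence", "mac"]
print(usable_tags(eaglefiler_tags))
```

The retired tag stays in the master list as a reminder, but it never shows up among the tags offered for new items.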

Tags from del.icio.us

blackberry
blogs
books
changemgt
comics
community
cooking
craft
culture
daily
datacentre
design
dr/bcp
economics
education
employment
firefox
fixes
food
gadgets
galley
gaming
hardware
howto
humanity
humour
informix
it_industry
language
ldap
learning
legislation
lifehacks
linux
lists
logins
mac/win
macos
magazines
management
manuals
math
museums
music
networking
news
ntp
osource
palm
performance
perl
philosophy
photography
podcasts
presentations
productivity
programming
projmgt
puzzles
radio
rants
recycling
reference
science
search
security
services
sf
shopping
skepticism
soa
sql
stupidity
sun
sunint
support
technology
testing
thinking
tips
tools
training
travel
trivia
unix
usability
webdesign
wiki
world
writing
xml


I rely utterly on del.icio.us when I am away from my own systems, and I rarely bookmark anything locally. There will be a couple of mysteries in that list; an explanation follows.

And finally, tags from Furl

Advice
age
Craft
Customer satisfaction
Development
economics
Education
Finance
Food
Heinlein
History
humour
Informix
Learning
Linux
Management
marriage
Microsoft
Misinformation
Network research
Networking
People
Perception
Performance
photoshop
Projects
psychiatry
Reference
Self improvement
SQL
Startups
Stress management
stupidity
Sun
Tips
Traffic
unix
Writing

Now here you will see a bit of inconsistency, because Furl supports proper alphabetical sorting, rather than ASCII sorting, so I haven't capitalised consistently. I use Furl to store pages that I shall want again, actual content rather than bookmarks. Stuff from Furl generally gets transferred to EagleFiler at a later date, but sometimes I need Furl to find things when I don't have my Mac handy.

You can see some variations. EagleFiler and del.icio.us don't support tags with spaces, while Furl does, and I am using the programs in slightly different ways. Notice the tags osource and sunint in the del.icio.us list: these are examples of personal tags. "osource" to me means "open source". del.icio.us doesn't support spaces in tags, so I made a contraction. Same with sunint, which is used to store URLs that will only work if I am logged into a particular vendor's partner network. This is what I mean by personal language: you can make it up, as long as you can be consistent.
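If you would rather not compare the lists by eye, the round-up can be roughed out in a few lines of Python. This is only a sketch: it assumes you have already exported each service's tags into a plain list (I don't show how to export them), and the `canonical` rule is a deliberately crude guess at what counts as "the same tag". It ignores case, spaces and underscores, and strips one trailing "s", so it will occasionally pair up words that are not really variants.

```python
from collections import defaultdict


def canonical(tag):
    """Reduce a tag to a rough comparison key: lower case, no spaces or
    underscores, and a naive strip of a single trailing "s"."""
    key = tag.lower().replace("_", "").replace(" ", "")
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def find_discrepancies(tags_by_source):
    """Group the tags from every source by canonical key and return only
    the groups in which more than one spelling is in use."""
    spellings = defaultdict(set)
    for tags in tags_by_source.values():
        for tag in tags:
            spellings[canonical(tag)].add(tag)
    return {key: sorted(forms) for key, forms in spellings.items() if len(forms) > 1}


# A few tags taken from each of the lists above.
tags_by_source = {
    "EagleFiler": ["how_to", "museum", "networking", "sql"],
    "del.icio.us": ["howto", "museums", "networking", "humour", "sql"],
    "Furl": ["humour", "Networking", "SQL", "Tips"],
}

for key, forms in sorted(find_discrepancies(tags_by_source).items()):
    print(key, "->", forms)
```

Run against that sample, it flags how_to/howto, museum/museums and the capitalisation differences, and leaves "humour" alone because it is spelled the same way everywhere. Whether a flagged pair is a real problem is still a judgement call for the owner of the tags.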

But how to actually choose names and tags?

One of the greatest contributors to the field of modern librarianship and classification theory was a man called S.R. Ranganathan. He propounded the theory of colon classification, and illuminated it with his "Wall-Picture" principle (sometimes called the cow-calf principle). Be warned before you go off and Google those names and terms: absolute reams have been written about this subject, and much of it is of a highly soporific nature. I'll try to keep this short.

The theory of Colon Classification states that for every object (for object in this context read "book"), there are five aspects or facets that can be used to describe the subject:

  • Personality—what the object is primarily “about.” This is considered the “main facet.”
  • Matter—the material of the object, what it is made from
  • Energy—the processes or activities that take place in relation to the object, how it is used etc.
  • Space—where the object happens or exists, typically geographic location
  • Time—when the object occurs, typically a time period rather than a specific date

Colon classification (the aspects are separated by colons) was supposed to allow you to use as many facets as you needed in the appropriate order to describe the object you were dealing with. It was flexible. It's not what we need for tagging, but it embodies a useful idea: every subject has different facets, which are simpler than the overall subject, and therefore easier to encapsulate in a tag.
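The five facets map naturally onto a small record with optional fields. The Python sketch below is an illustration of the idea only: the class and method names are my own, the example book is made up, and joining everything with colons follows the simplified description above rather than the full notation a cataloguer would use.

```python
from dataclasses import dataclass, astuple
from typing import Optional


@dataclass
class Facets:
    """The five facets; any facet that does not apply stays None."""
    personality: Optional[str] = None
    matter: Optional[str] = None
    energy: Optional[str] = None
    space: Optional[str] = None
    time: Optional[str] = None

    def tags(self):
        """Return the facets that were filled in, in the order listed above."""
        return [value for value in astuple(self) if value is not None]

    def colon_string(self):
        """Join the filled-in facets with colons."""
        return ":".join(self.tags())


# A made-up book about knitting patterns from Scotland.
book = Facets(personality="Knitting", energy="Patterns", space="Scotland")
print(book.tags())
print(book.colon_string())
```

Leaving unused facets empty, rather than forcing a value into every slot, is what keeps the scheme flexible.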

The Wall-Picture principle is also helpful in tagging. It states that if concept A makes concept B understandable, then A must be listed first. As I recall this being explained to me in class (many moons ago), if you look at a picture of a boat hanging on a wall, the image of the boat depends on the picture, which depends on the wall. So Wall precedes Picture which precedes Boat in order of importance.
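In programming terms, "A must be listed before anything that depends on A" is a dependency ordering. A small Python sketch, using the standard library's `graphlib` module (Python 3.9 or later) purely as an analogy; Ranganathan was obviously not describing a sorting algorithm:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each concept maps to the set of concepts it depends on.
depends_on = {
    "Boat": {"Picture"},
    "Picture": {"Wall"},
}

# static_order() yields every concept after the concepts it depends on.
order = list(TopologicalSorter(depends_on).static_order())
print(order)  # ['Wall', 'Picture', 'Boat']
```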

Now, hoping fervently that you are with me so far, let's try to apply that to tagging. Select a document/article/web page, and consider it: what are its five facets? Personality, Matter, Energy, Space and Time (librarians remember this as PMEST). You don't have to use all five facets, and I'm finding that for most of my material, three is enough. Your mileage may vary.

Let's try some examples. First, Slate's review of "Pirates of the Caribbean III". Your first instinct might be to use "pirates" as a tag, but stop and think about the facets, and the Wall-Picture method. The primary tag should be "Film", because without the film there would be no pirates, in this case. Matter isn't much use to us here - by the time I get around to this, it will be a DVD. Energy is "Review" - remember, this is a review of a film about pirates. The subject of the film is pirates, the "energy" of the article is a review of the film. I would skip Space in this case, and for Time I would put 2007.

Sticking to a pirate theme, consider this review of a production of the Pirates of Penzance. Here, I would make the primary tag "Operetta", the next tag "Review" and the third tag 2004. If I were collecting information about a particular actor, I might add a fourth tag using his/her surname.

Another example: this page is about the painting on the ceiling of the Sistine Chapel. For the Personality tag I would probably choose "Art". I would skip Matter (too complex to be useful in this case). Energy is "Painting". Space would be "Vatican" - "Sistine Chapel" is too specific for me, though it might suit some dedicated scholars better; apply your "personal language" appropriately here. Time is 16th Century. If I were collecting a lot of material about art, I would probably add a tag for the artist's name, as well.

Try this one, an article from the Slow Leadership blog about delegation. What is the "Personality" facet here? For me, it is "Management". The Matter facet isn't useful in this case, so skip it. The Energy facet is "delegation". Space and Time don't mean much in the context of this article. Why did I pick "Management" as the Personality facet? Well, I could have used "Leadership", but that is a little too generic for the content of the article: leadership could refer to military leadership. I track articles about good management (having been exposed to so much bad management over the years, I'm making a study of what to avoid in future), so "Management" to me has meaning.

Some articles will give you a clue: this one has a nice hint right there at the top: "Human Behavior". However, that is a bit long, so I use "People" instead. I mean "Human Behavior", but People is sufficient for me. I wouldn't apply any other tag to that page.
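To show what the facets buy you at retrieval time, here are the worked examples written down as data, with a tiny lookup function. It is a Python sketch under obvious assumptions: the titles are my own shorthand for the pages discussed, the function name `find` is made up, and a real tagging tool stores this sort of thing for you.

```python
# Each item records only the facets that were actually used.
items = [
    {"title": "Pirates of the Caribbean III review",
     "personality": "Film", "energy": "Review", "time": "2007"},
    {"title": "Pirates of Penzance review",
     "personality": "Operetta", "energy": "Review", "time": "2004"},
    {"title": "Sistine Chapel ceiling",
     "personality": "Art", "energy": "Painting",
     "space": "Vatican", "time": "16th Century"},
    {"title": "Slow Leadership on delegation",
     "personality": "Management", "energy": "delegation"},
    {"title": "Human Behavior article", "personality": "People"},
]


def find(items, **wanted):
    """Return the titles of items whose facets match every keyword given."""
    return [item["title"] for item in items
            if all(item.get(facet) == value for facet, value in wanted.items())]


print(find(items, energy="Review"))               # both pirate reviews
print(find(items, energy="Review", time="2004"))  # only the operetta
```

Asking for "Review" finds both pirate pieces even though neither is tagged "pirates", which is the point of tagging by facet rather than by first instinct.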

Going back to the folder and tag lists above, you may now be able to see what I am doing. In some cases, the same word appears as both a Folder name and a tag (example: networking). If the main focus of an article is networking (for me that means something involving computers, not people), then it goes in the Networking folder. But an article about troubleshooting computers might have some information about networking, which could be considered as the "Energy" facet: so I use the word twice, in two slightly different ways. Also, in EagleFiler, I am using the nested folders to describe some subjects in more detail, using the Wall-Picture principle. For example, Craft has subfolders for Beading, Crochet and Knitting.
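The "same word, used twice" idea can be sketched as a search that looks in both places. This is a hypothetical Python illustration: the three records and the function name `about` are invented, and it says nothing about how EagleFiler itself stores or searches folders and tags.

```python
# Invented records: one folder path and a set of tags per item.
library = [
    {"name": "Subnetting notes", "folder": "Networking", "tags": {"tcpip"}},
    {"name": "Troubleshooting a slow PC", "folder": "Tech Support",
     "tags": {"networking", "how_to"}},
    {"name": "Sock pattern", "folder": "Craft/Knitting", "tags": {"how_to"}},
]


def about(library, word):
    """Return the names of items filed in a folder called `word` (at any
    level of nesting) or tagged with `word`, ignoring case."""
    word = word.lower()
    hits = []
    for item in library:
        folders = [part.lower() for part in item["folder"].split("/")]
        tags = {tag.lower() for tag in item["tags"]}
        if word in folders or word in tags:
            hits.append(item["name"])
    return hits


print(about(library, "networking"))
print(about(library, "craft"))
```

The first search returns both the item filed under Networking and the troubleshooting article that merely touches on it; the second finds the knitting pattern through its parent folder.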

I do hope that this makes some sort of sense, to someone other than me. Look for the five facets (Personality, Matter, Energy, Space, Time). Apply the Wall-Picture principle, so you don't pick too narrow a topic. Develop your own personal tagging language. I'd suggest reviewing your tags regularly, to help reinforce consistency of use.

I'm hoping for some feedback on this one - did it help? Did it give you a headache? Any suggestions for a better approach? Drop me a line or leave a comment.

For those who have enquired, yes the jigsaw is now finished. Slightly more regular blogging will now resume - thank you for your patience!

11 comments:

Forrest said...

Thanks for the great post! Really appreciate it.

Will need some time to digest but it is interesting enough.

PS - I read the post twice :)

Dana said...

It's great to see someone else's take on organization. I love it and I look forward to future posts.

I've been using EagleFiler for almost a month now and the trial is just about up. I was really excited about it when I first started using it, for most of the same reasons that you chose it. However, I've found it rather unstable. I'm running it on a MacBook Pro with 3GB of RAM running OS X 10.4.9 and I experience at least one crash a day. In one case, the metadata actually became corrupt and I spent an hour retagging and pointing the app to files it thought were missing. Thankfully, nothing was lost because everything is kept in the file system -- BIG bonus points there.

But I was wondering how you were finding working with EagleFiler on a daily basis. Have you experienced any instability?

Melodie Neal said...

I'm glad you folks are finding my efforts useful! On the subject of EagleFiler stability, I've had no problems at all. I'm running a 15" PowerBook with 1GB of RAM, running Mac OS X 10.4.9. I have a lot of applications open simultaneously (I really need more RAM, but I'm holding out until later in the year to update my kit). I have EagleFiler open all the time, and drop things into it as I'm working, and it hasn't crashed on me once.

If you are having stability problems, the first thing I'd suggest is going to Disk Utility and running Repair Disk Permissions.

Stan said...

Thanks for the posts on storing and tagging documents; I'm finding them very helpful. These concepts (colon classification, wall-picture) aren't easy for a layman to ferret out of the literature. I have questions on tagging and on Eaglefiler:

Tagging

I can see that it makes sense to tag documents when you're dealing with hundreds of documents that have to be cross-referenced and coordinated for a large research or other project. I am, however, wondering whether it's really necessary for general use. I seem to be able to find most things with Spotlight and/or other search tools. As we can reasonably expect these tools to continue to improve, is it really worth the effort to manually tag all documents? Particularly if one takes a little care in deciding where to file it so that browsing your file structure alone can be productive. I suppose that we’re looking at a continuum: on one end is a more-or-less well-thought-out file structure with all documents (and aliases in those cases where you really do need to put a document in more than one place) filed appropriately; at the other would be keeping everything in a single undifferentiated heap but tagging every item meticulously and using some combination of searching on tags and smart folders to sort and find things. I’d be very interested in how you and others think about this question.

Eaglefiler

Aside from sluggish performance and occasional awkwardness in the interface, I haven’t had any problem with Eaglefiler. It seems to me that it’s primarily a substitute for the Finder. As with all software, I have to ask, “Do I need this?” For example, I can’t live without a high-end word processor or a high-end spreadsheet. I can’t imagine how I ever lived without Curio. But I’m on the fence about Eaglefiler (or similar products). Given that the Finder will most certainly continue to improve, what are the features in Eaglefiler you (anyone reading this) can’t live without that the Finder either doesn’t do or does only with great effort?

Thanks again for your efforts—they have been extremely helpful.

Forrest said...

@stan

I can see your point. Tell you what, I bought PathFinder ages ago and have re-installed it twice, but I'm still using Finder. I am observing EagleFiler closely, though.

Melodie Neal said...

Sorry for the delayed response to comments, still a bit backlogged....

I agree with Stan's comments, that Finder can be used to locate most things, and can be expected to improve. There are, however, a few things that I don't think it will ever deal with perfectly: diagrams, pictures and other graphics. And I'm sure it will never learn to read my mind. Sometimes the meaning of a file is only going to be obvious to me. Let me try to provide some examples.

I work in the IT industry, and I have files on my Mac that relate to 2 decades of projects. I specialize in Unix-based systems and security, and almost every document that I write contains a common set of words (Solaris, AIX, Sun, IBM, Checkpoint, Juniper, vulnerability, etc, etc). If I search on those words in Finder, I get thousands of hits. I then have to look at the folder structure, to get a feel for what document relates to what project. Furthermore, some things just don't search well: I have pictures of pieces of equipment and racks of kit, diagrams of networks and packet flows, and a large collection of cartoons (Viva! Non sequitur, Dilbert, et al.) Other than the file title, Finder may not have a lot to go on. I also store Visio diagrams on my Mac: until the day that there is a native version of Visio for Mac, Finder can't read those files, so it can't index them.

This has happened to me before: I remember that I have drawn a diagram, or written a document that explains something in tedious detail, and I cannot recall which customer I did it for, or in what year it was done. In fact, this happened last week. Somebody asked me for a template for a system build document (basically, what questions should we ask the customer so that Solaris gets installed to their liking on their new machine, and we don't have to reconfigure the disk layout several times before everyone is happy). I knew I had drawn up such a document for a customer in the past, on several occasions. I knew I would need the most recent, because this is for Solaris 10. I file documents in my file system in a folder structure that goes Customer Name/Project Name/files. Date order filing is meaningless, because big projects span years, particularly if you get lumbered with a real death march job. I could not recall the name of the customer I needed. Much brain racking ensued, but still it would not come (I could remember that they were in Queensland, but that was no help). Finder could not help me, because there were thousands of matching documents for anything I could think of searching.

Finally, I resorted to sorting my top level (Customer Names) directory in date order, and scanned the most recent customer names - I knew I'd done the job in mid 2005, because I only just managed to close it before I quit Sun. Even then, I had trouble spotting it - the customer was a government department with a long and inconvenient name, and I had filed them under their preferred three letter acronym, which didn't leap out at me. However, I did find the file I wanted. It took me about 25 minutes. Tagging could have saved me a lot of time.

Finder can only deal with the content of a file: it can't tell what it means to you.

stan said...

Your issues/frustrations are similar to mine, Melodie. And EagleFiler certainly provides a solution. I'm still not clear on what EagleFiler's tags do that putting tags (with a prefix character to distinguish them as tags) in Spotlight comments doesn't.

Thanks for the pointer, Forrest. I'm downloading PathFinder now to see how it compares to EagleFiler.

michaelbywater said...

Your posts on organization have that "Aha!" quality which 99% of the things people blog about simply don't trigger in me. I mean particularly the enormous amount of "personal productivity" and "organisation" stuff currently floating about, for example.

The Wall:Painting example is superb. Soporific? No. Galvanizing. This is the sort of thing they ought to teach at school, or at least at University. I went to higher degree level and nobody -- not even at one of the world's "top three" universities -- ever took me aside and said, "See, this is how you can organise things."

I agree that taxonomy/folksonomy is changing; I agree that Spotlight (especially now Leopard has got it right, or at least Much Better) changes the way we do things. But we still need to organise our stuff -- I use DevonThink -- and we still therefore need some rational system for organising it.

The one thing I've never had problems using is the Cambridge University Library, a monster of a thing but easy to navigate because it has a system. I should have guessed that a trained librarian would come up with a system which makes sense for me...

As for finding images; might I suggest you try LEAP (odd name but a project with superb potential) from

http://www.ironicsoftware.com/

It's at odds with logical cataloging, but one thing it does beautifully is locate things like the JPG you describe. I think you may like it.

Thank you for a genuinely useful post. Now, if you'll excuse me, I have some re-filing to do...

(Which reminds me. Someone once said the big breakthrough in his own filing system was when he realised it wasn't about filing; it was about *retrieval*. Good line, I think.)

Melodie Neal said...

Hi Michael

Response to your comments in post here

John (with an h) said...

I just started using EagleFiler and read this amazing post on good tagging.

So I started organizing folders and sub-folders.

And started wondering what the advantage of doing this in EagleFiler over Finder was.

Further reading, and your post response to Michael seemed to answer that question with:
"Ease of information retrieval."

Am I understanding this correctly? Tagging in EagleFiler is easier than adding tags to a file's Spotlight info field. And doing tag searches (and pre-configured Smart Folders) makes finding the information one is looking for ... easier.

Would appreciate a clarification of anything I'm missing.

Melodie Neal said...

Hello John

I personally find updating the Spotlight comments field rather clunky. It doesn't fit my workflow, so it's likely not to get done. For me, it is all about ease of use, ease of retrieval. EagleFiler gives me that: it works the way that I work.

Thanks for commenting - I had forgotten this sequence of posts, good to see that someone is still finding them useful. I wrote them during a period in my life where I had spare time, which is now a very scarce resource. I will try to get back to the subject in the not too distant future.
