Monday, May 14, 2007

The Fine Art of Storing Things So That They Can Be Found

Part 2: Tagging Effectively


"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean - neither more nor less."
from Lewis Carroll's "Through the Looking-Glass", Chapter 6

I speak English. I have a little Latin, which comes from spending too much time around doctors and lawyers early in my career, and the trivial amount of French you can get from unenthusiastic attendance at three years of high school French classes. But I'm interested in language generally, how it works and where it comes from, and I have numerous books on the subject. Modern languages are constantly evolving, with new words being added, old words falling into disuse, and some words acquiring new meanings. English, as a language, is a particular minefield, because it attaches more than one meaning to many words. Take for example the word "heat": are we talking about temperature, or a single bout in a sporting event? There are thousands of examples, and the situation becomes more complicated year by year as various industries spew clouds of new acronyms into the public vocabulary, and as their marketing departments attempt to redefine indifferent design and sloppy production as "innovation", black as white and rampant greed as "customer service". And don't start me on the state of modern education, and how English is taught!

However, most people are a bit like Humpty Dumpty: they know what they mean when they use a particular word or acronym (the dictionary may not agree with them), and they will use words pretty consistently. If you have ever had to proofread another person's writing, you rapidly spot their little verbal tics: the words they can't spell, the turns of phrase, the grammatical abominations and the critical failures of punctuation that they repeat. We do seem to have reached the point where most of the allegedly literate population cannot tell the difference between a colon and a semi-colon, and avoids using both in consequence.

What has this to do with tagging, you ask (as I turn down the rant dial)? Well, habits are good. If you can be consistent about how you use words, you can tag effectively. The goal of effective tagging is really the creation of a personal language, to allow you, the user, to attach meaning to items that you will want to retrieve later. The tag words will mean what you want them to mean, and they do not have to mean anything to anyone else.

Now, before we go any further, did you read the post before this one, Part 1? If you didn't, some of this post may not make sense. Sorry if you found the last one rather heavy going: few people are excited by cataloguing, but I needed to get the background established.

Part 1 was about the use of folders, labels and tags to create structure. This post will attempt to address the question of how to choose good tags.

In the last post, I suggested that you round up all the tags that you are using at the moment, and have a look at them as a set. I did this myself, collating folder and tag names from del.icio.us, Furl and my EagleFiler databases. I found I had been pretty consistent, but there were a couple of annoying discrepancies (which I promptly fixed). I was interested to note that the problems were all in del.icio.us, which probably reflects the fact that I have used it longest, and refined my tagging techniques unconsciously while using it. So let's have a look at the lists (no need to read these in detail):

The folder structure for EagleFiler is as follows:

Admin
    Licenses
    Passwords
    Receipts
Bookmarks
CISSP
Consulting
Craft
    Beading
    Crochet
    Knitting
Exercise
Graphics
Humour
Ideas
ITIL
Learning
Organisation
Personal
Personal Development
Project Management
Security
Six Sigma
Society
Technology
    Mac OS
    Microsoft
    Networking
    Performance
    Shells
    Software
    Solaris
    SQL
    Tech Support
    Unix
Travel
Work
Xref


(Note that I spell in English English most of the time.) Because EagleFiler supports a nested folder structure, I am using it to apply two levels of classification to the stuff I store. Tagging then gives me a third (coming back to this in a moment).

Tags from EagleFiler

blogging
blogs
bluetooth
books
cats
CISSP
classics
consulting
encryption
environment
gadgets
how_to
lace
librarianship
libraries_
licence
mac
math
motivation
mp3
museum
networking
news
ntp
palm
political_correctness
politics
privacy
programming
puzzles
receipts
sales
security
solaris
spam
sql
tcpip
technology
tools
travel
xref

Note the tag "libraries_" with the trailing underscore, to tell me not to use this tag (see previous post for explanation). You can see that there is some doubling up between folder names and tags, which I will explain in a while.

Tags from del.icio.us

blackberry
blogs
books
changemgt
comics
community
cooking
craft
culture
daily
datacentre
design
dr/bcp
economics
education
employment
firefox
fixes
food
gadgets
galley
gaming
hardware
howto
humanity
humour
informix
it_industry
language
ldap
learning
legislation
lifehacks
linux
lists
logins
mac/win
macos
magazines
management
manuals
math
museums
music
networking
news
ntp
osource
palm
performance
perl
philosophy
photography
podcasts
presentations
productivity
programming
projmgt
puzzles
radio
rants
recycling
reference
science
search
security
services
sf
shopping
skepticism
soa
sql
stupidity
sun
sunint
support
technology
testing
thinking
tips
tools
training
travel
trivia
unix
usability
webdesign
wiki
world
writing
xml


I rely utterly on del.icio.us when I am away from my own systems, and I rarely bookmark anything locally. There will be a couple of mysteries in that list; an explanation follows.

And finally, tags from Furl

Advice
age
Craft
Customer satisfaction
Development
economics
Education
Finance
Food
Heinlein
History
humour
Informix
Learning
Linux
Management
marriage
Microsoft
Misinformation
Network research
Networking
People
Perception
Performance
photoshop
Projects
psychiatry
Reference
Self improvement
SQL
Startups
Stress management
stupidity
Sun
Tips
Traffic
unix
Writing

Now here you will see a bit of inconsistency, because Furl supports proper alphabetical sorting, rather than ASCII sorting, so I haven't capitalised consistently. I use Furl to store pages that I shall want again, actual content rather than bookmarks. Stuff from Furl generally gets transferred to EagleFiler at a later date, but sometimes I need Furl to find things when I don't have my Mac handy.

You can see some variations. EagleFiler and del.icio.us don't support tags with spaces, while Furl does, and I am using the programs in slightly different ways. Notice the tags osource and sunint in the del.icio.us list: these are examples of personal tags. "osource" to me means "open source". del.icio.us doesn't support spaces in tags, so I made a contraction. Same with sunint, which is used to store URLs that will only work if I am logged into a particular vendor's partner network. This is what I mean by personal language: you can make it up, as long as you can be consistent.
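For anyone who would rather not compare the lists by eye, here is a minimal sketch of the round-up in Python. It assumes that two tags are "the same" if they differ only in capitalisation, spaces, underscores or hyphens; the function names `normalise` and `find_variants` are made up for this illustration, and the sample uses only a handful of tags from the lists above.

```python
from collections import defaultdict


def normalise(tag: str) -> str:
    """Lower-case a tag and drop spaces, underscores and hyphens."""
    return "".join(ch for ch in tag.lower() if ch not in " _-")


def find_variants(tags_by_service: dict[str, list[str]]) -> dict[str, set[str]]:
    """Group tags that differ only in case or separators."""
    groups: dict[str, set[str]] = defaultdict(set)
    for tags in tags_by_service.values():
        for tag in tags:
            groups[normalise(tag)].add(tag)
    # Keep only the keys that were spelled more than one way.
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


# A few tags taken from the lists above.
sample = {
    "EagleFiler": ["how_to", "math", "sql", "travel"],
    "del.icio.us": ["howto", "math", "sql", "craft"],
    "Furl": ["Craft", "SQL", "Self improvement"],
}

variants = find_variants(sample)
for key in sorted(variants):
    print(key, "->", sorted(variants[key]))
```

This only flags spelling variants such as "how_to" and "howto". It will not notice "museum" versus "museums", and it cannot know that "osource" means "open source"; that part of a personal language still has to live in your head.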

But how to actually choose names and tags?

One of the greatest contributors to the field of modern librarianship and classification theory was a man called S.R. Ranganathan. He propounded the theory of colon classification, and illuminated it with his "Wall-Picture" principle (sometimes called the cow-calf principle). Be warned before you go off and Google those names and terms: absolute reams have been written about this subject, and much of it is of a highly soporific nature. I'll try to keep this short.

The theory of Colon Classification states that for every object (for object in this context read "book"), there are five aspects or facets that can be used to describe the subject:

  • Personality—what the object is primarily “about.” This is considered the “main facet.”
  • Matter—the material of the object, what it is made from
  • Energy—the processes or activities that take place in relation to the object, how it is used etc.
  • Space—where the object happens or exists, typically geographic location
  • Time—when the object occurs, typically a time period rather than a specific date

Colon classification (the aspects are separated by colons) was supposed to allow you to use as many facets as you needed in the appropriate order to describe the object you were dealing with. It was flexible. It's not what we need for tagging, but it embodies a useful idea: every subject has different facets, which are simpler than the overall subject, and therefore easier to encapsulate in a tag.

The Wall-Picture principle is also helpful in tagging. It states that if concept A makes concept B understandable, then A must be listed first. As I recall this being explained to me in class (many moons ago), if you look at a picture of a boat hanging on a wall, the image of the boat depends on the picture, which depends on the wall. So Wall precedes Picture which precedes Boat in order of importance.
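The ordering in the boat example can be sketched in a few lines of Python. This is only an illustration of the idea as described above, not anything from Ranganathan's own work: the `depends_on` table and the function name `broadest_first` are assumptions made for the example.

```python
def broadest_first(concept: str, depends_on: dict[str, str]) -> list[str]:
    """Return the chain of concepts from the broadest down to `concept`."""
    chain = [concept]
    while chain[-1] in depends_on:
        parent = depends_on[chain[-1]]
        if parent in chain:  # guard against an accidental loop in the table
            raise ValueError(f"circular dependency at {parent!r}")
        chain.append(parent)
    # The walk went from narrow to broad, so reverse it.
    return list(reversed(chain))


# Each concept maps to the concept that makes it understandable.
depends_on = {"Boat": "Picture", "Picture": "Wall"}

print(broadest_first("Boat", depends_on))
```

The first item in the returned list is the concept everything else hangs from, which is the one to list first.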

Now, hoping fervently that you are with me so far, let's try and apply that to tagging. Select a document/article/web page, and consider it: what are its five facets? Personality, Matter, Energy, Space and Time (librarians remember this as PMEST). You don't have to use all five facets, and I'm finding that for most of my material, three is enough. Your mileage may vary.

Let's try some examples. First, Slate's review of "Pirates of the Caribbean III". Your first instinct might be to use "pirates" as a tag, but stop and think about the facets, and the Wall-Picture method. The primary tag should be "Film", because without the film there would be no pirates, in this case. Matter in this case isn't much use to us - by the time I get around to this, it will be a DVD. Energy is "Review" - remember, this is a review of a film about pirates. The subject of the film is pirates, the "energy" of the article is a review of the film. I would skip Space in this case, and for Time I would put 2007.

Sticking to a pirate theme, consider this review of a production of the Pirates of Penzance. Here, I would make the primary tag "Operetta", the next tag "Review" and the third tag 2004. If I were collecting information about a particular actor, I might add a fourth tag using his/her surname.

Another example: this page is about the painting on the ceiling of the Sistine Chapel. For the Personality tag I would probably choose "Art". I would skip Matter (too complex to be useful in this case). Energy is "Painting". Space would be "Vatican" - "Sistine Chapel" is too specific for me, though it might suit some dedicated scholars better; apply your "personal language" appropriately here. Time is 16th Century. If I were collecting a lot of material about art, I would probably add a tag for the artist's name, as well.

Try this one, an article from the Slow Leadership blog about delegation. What is the "Personality" facet here? For me, it is "Management". The Matter facet isn't useful in this case, skip it. The Energy facet is "delegation". Space and Time don't mean much in the context of this article. Why did I pick "Management" as the Personality facet? Well, I could have used "Leadership", but that is a little too generic for the content of the article: leadership could refer to military leadership. I track articles about good management (having been exposed to so much bad management over the years, I'm making a study of what to avoid in future), so "Management" to me has meaning.

Some articles will give you a clue: this one has a nice hint right there at the top: "Human Behavior". However, that is a bit long, so I use "People" instead. I mean "Human Behavior", but People is sufficient for me. I wouldn't apply any other tag to that page.
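The worked examples above can be written down as a small data structure. This is a sketch in Python, under the assumption that each facet is a single optional word or phrase; the class name `Facets` and its methods are invented for this illustration, and the colon-joined string is a simplification of the idea rather than real Colon Classification notation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Facets:
    """The five PMEST facets; any facet that is not useful stays None."""

    personality: str
    matter: Optional[str] = None
    energy: Optional[str] = None
    space: Optional[str] = None
    time: Optional[str] = None

    def tags(self) -> list[str]:
        """Return the facets that were filled in, in PMEST order."""
        ordered = [self.personality, self.matter, self.energy, self.space, self.time]
        return [facet for facet in ordered if facet is not None]

    def colon_string(self) -> str:
        """Join the used facets with colons (a simplification, see above)."""
        return ":".join(self.tags())


# The three examples from the text, with the skipped facets left out.
film_review = Facets(personality="Film", energy="Review", time="2007")
sistine = Facets(personality="Art", energy="Painting", space="Vatican", time="16th Century")
delegation = Facets(personality="Management", energy="delegation")

print(film_review.tags())
print(sistine.colon_string())
print(delegation.tags())
```

Making Personality the only required field mirrors the advice above: you always need the main facet, and the rest are there only when they earn their keep.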

Going back to the folder and tag lists above, you may now be able to see what I am doing. In some cases, the same word appears as both a Folder name and a tag (example: networking). If the main focus of an article is networking (for me that means something involving computers, not people), then it goes in the Networking folder. But an article about troubleshooting computers might have some information about networking, which could be considered as the "Energy" facet: so I use the word twice, in two slightly different ways. Also, in EagleFiler, I am using the nested folders to describe some subjects in more detail, using the Wall-Picture principle. For example, Craft has subfolders for Beading, Crochet and Knitting.
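To make the "same word, two uses" point concrete, here is a minimal Python sketch. It is not how EagleFiler works internally; the `Item` class, the `matches` function, the "/" notation for nested folders and the item titles are all hypothetical, chosen only to show a search that looks at both the folder and the tags.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    folder: str                       # nested folders written as "Parent/Child"
    tags: frozenset[str] = frozenset()


def matches(item: Item, word: str) -> bool:
    """True if `word` names one of the item's folders or one of its tags."""
    word = word.lower()
    in_folder = word in (part.lower() for part in item.folder.split("/"))
    in_tags = word in (tag.lower() for tag in item.tags)
    return in_folder or in_tags


# Hypothetical items: one filed under Networking, one merely tagged with it.
items = [
    Item("Subnetting notes", "Networking"),
    Item("Troubleshooting a slow desktop", "Tech Support", frozenset({"networking"})),
    Item("Lace shawl pattern", "Craft/Knitting", frozenset({"lace"})),
]

found = [item.title for item in items if matches(item, "networking")]
print(found)
```

The first item is about networking, so the word is its folder; the second only touches on networking, so the word is a tag. A search for the one word finds both.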

I do hope that this makes some sort of sense, to someone other than me. Look for the five facets (Personality, Matter, Energy, Space, Time). Apply the Wall-Picture principle, so you don't pick too narrow a topic. Develop your own personal tagging language. I'd suggest reviewing your tags regularly, to help reinforce consistency of use.

I'm hoping for some feedback on this one - did it help? Did it give you a headache? Any suggestions for a better approach? Drop me a line or leave a comment.

For those who have enquired, yes the jigsaw is now finished. Slightly more regular blogging will now resume - thank you for your patience!
