Paul’s latest blog post (I’m sure his stop at the Semtech2009 conference will have given him lots of fresh ammunition) contains some interesting citations that provide ample food for thought. What struck me most is the following passage, in which Paul cites Thomson Reuters: “(…) we’re introducing OpenCalais Social Tags. Social Tags is our attempt to emulate how a human might tag the document. Social Tags does some fairly sophisticated analysis of your entire document and maps it to a knowledgebase based on Wikipedia and other assets. From that process we generate Social Tags.”
I hear my own thoughts in my head in the high-pitched voice of an amazed Stewie Griffin (Family Guy): hmm, well … that’s cute. So we have a system emulating humans emulating a formal system of which they don’t know the rules. Formalised social tagging? Tagging on the basis of a set of rules? Hmm… a thesaurus, perhaps? Or a taxonomy? Or a classification? And to give it a certain degree of human sloppiness, it is mapped against Wikipedia.
There’s a strange elliptical movement here, but it is very interesting nevertheless. Reuters was one of the pioneers of social tagging well before it became the megahype it is now, so there is some logic in them pushing the boundaries of a methodology that they know to the bone.
I would like to see their semantic data extraction engine, complete with Social Tags, ploughing through a large field of cultural heritage documents. It might be a great application in conjunction with Europeana.
EDIT: speaking of ‘atomizing’ ‘our once centralized content’, Paul has atomized himself: his Semantic Web blog gives additional aspects of the story.