One of the websites I’ve worked on recently has a backend database of over 20,000 images with accompanying metadata in English and Welsh. Every image has been categorised with subject terms culled from five standard thesauruses, and each has been put into a three-tiered subject tree using terms that are deliberately more colloquial than the controlled vocabulary of the thesauruses. The subject tree is expressed bilingually, too.
Deriving relationships
This got me thinking I could use the subject tree and subject categories to create a bilingual subject ontology and to offer subject mappings and subject searches to other websites and organisations associated with this one.
It should be possible to derive nested classes of our controlled-vocabulary subject terms automatically. The subject terms themselves are entered in a flat, comma-delimited list against each item, with no significance in their order. A careful trawl through the database, though, should be able to spot terms that commonly occur together for items at the same point in the subject tree, and from those we will be able to derive a bilingual, controlled subject-term hierarchy. Once we have that, we will be able to offer advanced searching possibilities like ‘broadening’ or ‘narrowing’ the search: essentially, moving up and down the subject tree.
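To make the trawl a little more concrete, here is a minimal sketch in Python of one way it might work. It is not a description of the real database. The sample records, the function names (common_terms, derive_broader, narrower_of) and the 80% threshold are all assumptions invented for illustration. The idea is to count how often each term appears on the items under each node of the subject tree, treat a term as belonging to the shallowest node where it is common, and link it to the terms belonging to the node above. Terms are treated as opaque identifiers, so each one could carry both an English and a Welsh label.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (three-tier subject-tree path, flat set of terms).
ITEMS = [
    (("Transport", "Trains", "Stations"), {"Transport", "Railways", "Railway stations"}),
    (("Transport", "Trains", "Engines"), {"Transport", "Railways", "Locomotives"}),
    (("Transport", "Trains", "Engines"), {"Transport", "Railways", "Locomotives"}),
    (("Transport", "Ships", "Docks"), {"Transport", "Shipping", "Docks"}),
    (("Transport", "Ships", "Docks"), {"Transport", "Shipping", "Docks"}),
    (("Transport", "Ships", "Ferries"), {"Transport", "Shipping", "Ferries"}),
]


def common_terms(items, min_share=0.8):
    """Map each tree node (a path prefix) to the terms found on at least
    min_share of the items filed at or below that node."""
    totals = Counter()
    counts = defaultdict(Counter)
    for path, terms in items:
        for depth in range(1, len(path) + 1):
            node = path[:depth]
            totals[node] += 1
            counts[node].update(terms)
    return {
        node: {t for t, n in counts[node].items() if n / totals[node] >= min_share}
        for node in totals
    }


def derive_broader(items, min_share=0.8):
    """Return {term: set of broader terms} inferred from the subject tree."""
    common = common_terms(items, min_share)

    def introduced(node):
        # Terms common at this node that were not already common at its parent.
        inherited = common[node[:-1]] if len(node) > 1 else set()
        return common[node] - inherited

    broader = defaultdict(set)
    for node in common:
        ancestor = node[:-1]
        # Walk up until an ancestor contributes terms of its own.
        while ancestor and not introduced(ancestor):
            ancestor = ancestor[:-1]
        if ancestor:
            for term in introduced(node):
                broader[term] |= introduced(ancestor) - {term}
    return dict(broader)


def narrower_of(term, broader):
    """Invert the mapping: which terms sit directly beneath this one?"""
    return {t for t, parents in broader.items() if term in parents}


hierarchy = derive_broader(ITEMS)
for term in sorted(hierarchy):
    print(term, "->", sorted(hierarchy[term]))
```

Broadening a search would then mean following hierarchy[term] upwards, and narrowing would mean calling narrower_of(term, hierarchy). On real data the threshold would need tuning, and terms that are common in several unrelated branches would need extra handling.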
The biggest problem in getting this done, though, is describing what I’m proposing to the people who control project funds. I don’t expect professional administrators or accountants to understand technical jargon like ‘ontology’, but I’m at a bit of a loss to know how to talk about it without using jargon. In the past I’ve resorted to doing stuff anyway, under the guise of producing a pilot, because it’s the only common language: we can all then see what it does.