Shirky, C. (2005). Ontology is overrated: Categories, links, and tags. Blog post.

Shirky states that many of our strategies for categorizing resources in web
environments are holdovers from a time when different categorization strategies
made sense, and that our assumptions are outdated. He argues that hierarchical
classification is extremely useful when there is a small number of things to
categorize, when those things have definitive markers that make them difficult
to misclassify, and when both the creators of the hierarchy and its users are
subject experts. He cites the periodic table and psychology's DSM as cases where
hierarchical structure works well, but posits that as human knowledge continues
to grow, and especially given the extreme growth of web-based knowledge, these
hierarchical structures become less useful.

For one thing, the “aboutness” of a work, which he refers to as its “isness” or
essence, isn’t a fixed concept but varies with context. Different people may
approach the same concept from a multitude of viewpoints and thus use a
multitude of terms to refer to it. Additionally, unless all users of the system
are experts in both the subject and the hierarchical scheme involved, they will
find it difficult to locate information in a large system. The burden of not
only reading the minds of all potential searchers, but also predicting how they
will search in the future, is too much for catalogers to carry in a large
system.

Because web information is so broad, none of our current, limited
classification schemes is universal enough for the task. The author points to
biases in several well-known classification schemes, from Soviet
over-classification of Communist literature, to the preferential treatment of
Christianity in Dewey’s scheme, to the geographical preference given to Western
thought in Library of Congress (LC) classification. These biases arise because
we are not truly attempting to classify all knowledge, but rather to solve a
concrete problem. These classification schemes are all designed to classify the
book in hand and to organize the items in a collection. If the items in the
collection lean toward Western thought, as they would in a library in an
English-speaking country, then the classification system designed around them
will necessarily develop the same bias. Bias in hierarchy is unavoidable.

Shirky argues that we have forgotten that there is no shelf for online
resources. This is why Yahoo, when it first began compiling internet pages,
created a hierarchical system and, in an antiquated fashion, assigned a “shelf”
to each group of links. Pages need not be limited to a single category of
knowledge the way physical items are, and may be linked to from anywhere. When
Google came along, it took a different approach, using post-coordinated
collocation at the time of the user’s search rather than a hierarchical model.
The author argues that this leads to greater success in a web-based
environment.

The potential of non-hierarchical systems of organization, such as folksonomic
user tagging, lies in a lessening of binary thinking. A resource is not simply
one thing or another, nor is it an aspect of a thing within a broader category;
it can be multiple, equally represented things at the same time. This
crowd-sourced form of information management is often effective, if at times
inelegant. It allows the user to decide what is important or relevant, and
applies filtering only after publication, a complete reversal of the print
publishing model, in which filtering happens before publication. The lack of
a controlled vocabulary allows users to keep the nuances inherent in their own
terminology, rather than squeezing their concepts into over-arching categories
that include tangential, or even unrelated, subjects.

I personally find folksonomies and user-generated classification fascinating
because of the mathematics involved. A majority of users will tag something as
what it is, using various terms to do so. With a great enough volume of user
tags, irrelevant subjects are edited out, or their relevance decreases to the
point that they no longer influence the user’s perception of the subject.
However, I would caution that ‘rule of the mob’ is not always fair or just, and
it is possible to mobilize a large number of users to the detriment of a given
link or subject. Online harassment makes this possibility quite clear.
Additionally, when new knowledge is presented, it needs initial tags in order to
gain legitimacy and categorization. New knowledge is a problem when the premise
is that the greater the number of taggers, the greater the accuracy: a new
subject has few tags, which means decreased accuracy.
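
As a rough illustration of the arithmetic I have in mind, the following is a minimal Python sketch, not anything taken from Shirky's post. The function name tag_shares and the sample tag lists are hypothetical, and "relevancy" is assumed here to mean nothing more than a tag's share of all tags applied to a resource. Real tagging systems likely weight tags in more sophisticated ways.

```python
from collections import Counter


def tag_shares(tags):
    """Return each tag's share of all tags applied to a single resource."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


# Hypothetical data: a resource tagged by many users, where one off-topic
# tag ("snake") is outnumbered by the tags most users agree on.
established = ["python"] * 60 + ["programming"] * 35 + ["snake"] * 5

# Hypothetical data: a new resource with only two tags so far, so a single
# off-topic tag carries as much weight as the accurate one.
new_resource = ["python", "snake"]

print(tag_shares(established))
print(tag_shares(new_resource))
```

Under these assumptions, the off-topic tag makes up a small fraction of the heavily tagged resource's tags but half of the new resource's tags, which is the cold-start problem described above. The same arithmetic also shows why a coordinated group of taggers could shift the shares in their favor.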