Monday, October 19, 2015

Article Response for Lecture 9 - Rotenberg & Kushmerick

Rotenberg, E. & Kushmerick, A. (2011). The Author Challenge: Identification of the Self in the Scholarly Literature. Cataloging & Classification Quarterly, 49(6), 503-520.
                This article began as an effective examination of the problems with attribution of scholarly scientific publications, and then proposed a solution. The first half established attribution as a necessity for the allocation of government and grant funding, as well as for tenure decisions for individuals. However, names can be common or similar across individuals, making attribution tricky. Additionally, scientific scholarly output is increasing at a rapid pace, adding more common names to the jumble. Non-traditional forms of publication, such as web-published pieces and three-dimensional models instead of writing, proliferate across the scientific landscape.
                Several international organizations are currently working on name disambiguation, in which authors themselves claim their work. The authors suppose that no single company can cover all disambiguation in the world, so disambiguation must necessarily be a collaborative effort. International entities linked to one another, with authors supporting each entity in a pseudo-folksonomic fashion, can create a web of disambiguation. The system particularly discussed was Web of Science and its disambiguation community, ResearcherID.
                Web of Science uses an algorithm to collocate works by a single author. The difficulty with algorithmic disambiguation within the Web of Science search engine is that it collocates incorrectly whenever authors don’t stick to a strict subject matter, or when authors change names. ResearcherID is an attempt to fix this difficulty. The feedback system was mentioned as a critical component of disambiguating correctly, because human users can disambiguate such cases better than the algorithms can, without the added cost of employee search time.
                ResearcherID offers identification numbers to each individual author, as well as citation metrics, to allow authors to disambiguate themselves. It allows interactive maps of collaborators and citations to analyze an author’s geographic spread of knowledge. In many programs and communities, it has been implemented to help disambiguate authors from inventors and principal investigators. Instead of relying solely on the metadata attached to the article itself, it pulls author data from grant databases and other sources, and allows self-disambiguation.

I find it significant that the authors do not see fit to mention NACO, or indeed any LC disambiguation, but only lend credence to disambiguation systems run by authors themselves rather than catalogers. While I agree with their assessment of the value of folksonomy-type disambiguation, I find it disingenuous not to at least mention a divergent way of doing things, along with possible criticisms. The second half of the article seemed more and more like an advertisement for Thomson Reuters projects and products as I continued reading. While the authors seem to believe that further interoperability is the sole goal of future projects, I find it significant that no mention is made of author fraud. I would think that with a folksonomy-type system this would become an issue; and even if it is not, it is at least worth a mention.

Sunday, October 11, 2015

Article response for lecture 8 - Knowlton

Knowlton, S.A. (2005). Three decades since Prejudices and Antipathies: A study of changes in the Library of Congress Subject Headings. Cataloging & Classification Quarterly, 40(2), 123-145.

            This article addresses biases inherent in subject cataloging, assesses modern improvements, and asks how well the previous objections have been satisfied. It points out a philosophical balancing act between the stated goals of search optimization and universal bibliographic control. Subject categories are designed to enable ease of searching, allowing users to find resources by the most common term, or the term they are most likely to search by. However, whenever one presumes to imagine what a user will search under, or what the most common term might be, personal biases and prejudices can play a role in cataloging. There is a danger of normalizing a single experience, and overwhelmingly the average viewpoint is white, male, heterosexual, and Christian. This bias runs the risk of stigmatizing any group that does not fall within those specific norms, and of creating subject headings which make resources harder to find for certain users.
            Specifically, Sanford Berman published one of the first widely regarded critiques of bias in Library of Congress subject headings. Since its publication, many of the modifications suggested by Berman have been at least partially implemented. In the past several decades, terminology has changed, which to some degree necessitated different changes from those Berman suggested, accounting for some of the disparity between his recommendations and actual changes. Additionally, the vast majority of his recommendations for subject changes related to African-Americans and women have been implemented, perhaps indicative of the social climate and movements of the times between then and now.
            One subject area which has remained stubborn is religion. Religious subject categories without qualification are assumed to be Christian; thus, religious subheadings which relate to Christianity are not qualified as such. While some cases of this could be considered exclusionary toward other religions, I would argue that most of those listed are subjects particular to Christianity, and not subject to confusion. Obviously the term ‘God’ could be construed in many different religions, but a subject such as ‘Virgin Birth’ is mythologically associated with Christianity, and not necessary to disambiguate. Such unnecessary disambiguations may account for some of those not addressed.
            Other than religious subjects, I did notice that two other types of subjects were not addressed. Subjects under ‘poor’, and many of those regarding poverty and economic disparity, were not disambiguated or made into less offensive categories. Perhaps this is because of the lack of emphasis on socio-economic disparity until very recently in the history of social justice. Likewise, several aberrant subject headings involving indigenous populations were not altered. Social justice issues in US culture have historically not emphasized international themes until very recently, and so many of these headings are likely still catching up to the culture.

            Ultimately, I think the alterations to LC subject headings since Berman’s original study have been fairly adequate, and have stayed abreast of modern social attitudes as well as can be expected for a complex cataloging structure. However, the age of the original study makes me wonder if there have been more modern studies reassessing LC subject headings to see what a more current take on biases would reveal. If we are still using a decades-old study as a litmus test for innovation, it’s unsurprising that LC subject headings pass the test. I think we could use a more modern litmus test.

Sunday, October 4, 2015

Article Response for Lecture 7 - Naun

Naun, C. C. (2008). Objectivity and Subject Access in the Print Library. Cataloging & Classification Quarterly, 43(2), 83-95.
                This article is extremely dense with practical and philosophical points about the nature of objectivity, the advantages and disadvantages of subject access, and print as compared to electronic resources. It has been my favorite article so far for this class. The author starts with the ideology behind libraries. The reason libraries are a public good is that books and journals are expensive, and only become more so. This idea of giving every person access to information is a noble ideology, but “it is underwritten by logistics,” specifically those of the economic market. Objectivity is one of these core ideologies underwritten by logistics.
                Objective subject representation depends on our social values, and hopefully our social values place the library as an open realm of discourse where all subjects are equal. Often, however, these ideals are not enough to obtain objectivity. “An attempt to capture what a document is about requires a frame of reference that may encompass a host of interests, assumptions, and values.” Ideally, subject categories should always reflect the most commonly used term. Oftentimes, however, subject categories are changed to less offensive terms, even if these are not the most used term.
                This exception creates an environment of backhanded censorship, where controversial subjects, which have a likelihood of causing offense, are often placed under hidden vocabulary which most users don’t know the words to find, because they are not arriving at the subject with all the biases of the cataloger. Highly regulated subjects can also be highly normalized. Things are shoved into preconceived boxes until the boxes overflow to create new subject categories. After all, controlled vocabularies are exclusionary in nature, choosing to use certain words over others. How can such a choice not contain bias?
                Full text searching of electronic resources can remove subject and description interpretation and thus remove bias. However, natural language contains its own biases. In defense of print resources, subject classification can also remove bias: competing views are normally shelved together, so the user has many viewpoints to look at. Correct indexing is by usage, not by preconception. However, librarians are also free to consider literary warrant in indexing, which is objectivity “in relationship to human discourse.” This seems rather flexible, and could be prone to misuse as well.

                If indexing is done in the most objective way by how users search, which users are considered? This is another potential level of bias. Users must be visualized as a “potentially diverse community of users” in order to avoid bias, and it is logistically impossible to poll or visualize every type of possible user. Finally, the author gives us a single common-sense solution to these difficult questions of impartiality and objectivity. “Impartiality does not demand infallibility so much as vigilance.” In other words, it’s impossible to be completely objective every time, but watching our own biases and checking them as much as possible, while being prepared to correct mistakes, can get us much further than just strict implementation of established rules.

Thursday, September 24, 2015

Article Response for Lecture 6 - Emanuel

Emanuel, M. (2011). A fistful of headings: Name authority control for video recordings. Cataloging & Classification Quarterly, 49(6), 484-499.

This article enumerates some of the many difficulties in authorship, using examples from the seemingly incongruous MARC designations for spirit communications. Spirit mediums, around the mid-1800s, began purporting contact with potential authors, and indeed publishing books under their own names and the names of the spirits they claimed to have contacted. Such works led to difficulties in attribution of responsibility. Is the medium responsible for the work, or is it the supposed author? If it is the purported author, how much of the work is collaborative, and thus complex in attribution?
            Without initial guidance for these specific cases in the rulebook, many catalogers used the rules for interviews to categorize attribution of medium-interpreted works.  The medium functioned as the interviewer, and the spirit as an interviewee, and attribution was based on the amount of intellectual content contributed by either party. This led to works being attributed to Samuel Clemens (Mark Twain) after his death, with only a side note for the contribution of the medium. Copyright protections, then, became a major issue. Some effort was made to classify spirits as supposed or presumed authors, rather than official authors, but such rules didn’t apply to interviews, so inconsistencies abounded.
            Revisions of the cataloging rules to incorporate medium-written works cleared up some of the copyright issues by only allowing the medium to be considered the main author, with the deceased as a side note. However, difficulties still arose in authority control, since the name-data depicting Mark Twain’s ghost was the same as the name-data depicting Mark Twain, and the changes in rules mid-stream led to some conflicting entries, as older entries were written with different attributions. Is the supposed spirit of Mark Twain really Mark Twain?
            Eventually three distinct systems emerged for categorizing authorship of spirit communications.
Spirit communications in which the spirits were established historical figures–like Parker and Hale–entailed main entry under the medium and added entry under the historical figure. Spirit communications in which the spirits were prolific and well-known, but not of proven historical existence–like Worth–entailed main entry under the spirit, with a qualifier added to the heading. Spirit communications in which the spirit was either not prolific–like Grayland and Pheneas–of unknown origin–like Ka-Ra-Om–or in debate–like Twain–entailed entry only for the medium.
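The three systems amount to a simple decision procedure, which can be sketched as follows. This is my own illustration, not code from the article or any actual cataloging system; the function name, parameters, and return format are hypothetical, and the logic is only a simplification of the entry rules summarized above.

```python
def spirit_entry(spirit, medium, historically_proven, prolific):
    """Sketch of the three pre-Lubetzky entry systems for spirit communications.

    Returns a (main_entry, added_entry) pair; added_entry is None when no
    added entry was made. A simplification for illustration only.
    """
    if historically_proven:
        # Established historical figures (e.g. Parker, Hale):
        # main entry under the medium, added entry under the figure.
        return (medium, spirit)
    if prolific:
        # Prolific, well-known spirits of unproven historical existence
        # (e.g. Worth): main entry under the spirit, with a qualifier.
        return (spirit + " (Spirit)", None)
    # Obscure, unknown, or disputed spirits (e.g. Grayland, Ka-Ra-Om, Twain):
    # entry only for the medium.
    return (medium, None)
```

Lubetzky's later rule, by contrast, collapses all three cases into the first branch: main entry under the preparer, added entry under the attributed person.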
            When Lubetzky stepped in to simplify and codify the rules, he standardized and enumerated the concept of spirit attribution. A work “attributed to the spirit of another person, is entered under the person who prepared it, with an added entry under the person to whom it is attributed.” This speaks directly to social construction: the era when attribution to the spirit itself was prevalent was one in which spiritualism was popular and accepted as truth, whereas in Lubetzky’s time it had fallen out of favor.
            However, the definition of spirit communication as an interviewer-interviewee relationship persisted, with the caveat that the spirit was a supposed author rather than simply attributed as author, until AACR2 gave spirit communication its own subheading, with primacy given to the spirit itself. This reflects a shift in philosophy toward neutrality in cataloging. Catalogers were instructed not to judge topical issues, like whether the ghost of Mark Twain really is Mark Twain, but simply to perform attribution and leave the scholars to decide. The catalog is, after all, a search tool, and if one needs the spirit writings of Mark Twain, one is likely to search under Mark Twain.

            There is a distinct tug of war here between the objective of collocating everything by a given author, in which case spirit communication can be misattribution, and the defining of authorship at face value, without bias. To avoid misattribution, the qualifier (Spirit) is currently added to main entry names of spirits to distinguish them from their living counterparts, distinguishing between biographical personhood and bibliographical personhood, like a surrogate record for authorship.

Monday, September 21, 2015

Article Response for Lecture #5 – Schottlaender

Schottlaender, B.E.C. (2003). Why metadata? Why me? Why now? Cataloging & Classification Quarterly, 36(3/4), 19-29.
This article reviewed the different types of metadata and the standards of the various divergent communities of data compilers, which contribute to a confusing multiplicity of metadata standards. This complexity is demonstrated by the three different types of metadata standards that can often be interrelated on a single document, for nearly unlimited combinations of forms. In addition to the ‘normal’ schema of metadata – that is, how the data is stored – there is an additional schema for how that metadata is encoded and displayed. Beyond those two, there are other standards for the architectural schema of metadata, which seems to deal more with collocation and organization. He emphasized the mutability of e-documents: their natural flexibility requires more stringent metadata standards to tie them down so they aren’t lost to the ether.
He also identified two distinct types of organization schemes for metadata: syntactical and semantic. Most organizational schemes in the metadata communities are primarily focused on syntax, which is how the information is laid out, which information is included, and how those things are tagged and coded. Semantic considerations involve how the actual words and data are spelled out in terms of shared vocabulary, spelling, and punctuation. Most schemes rely on librarians to create semantic standards, or simply don’t have any, utilizing AACR standards and the Library of Congress subject classification to fill in gaps. 
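The syntactic/semantic split can be illustrated with a toy record. Everything below is my own hypothetical example; the field names only loosely echo Dublin Core element labels, and the value conventions only loosely echo AACR-style name inversion and LCSH-style subject strings.

```python
# Hypothetical bibliographic record. The *syntax* is which fields exist and
# how they are tagged; the *semantics* is how the values are spelled out.
record = {
    "title": "Why Metadata? Why Me? Why Now?",
    "creator": "Schottlaender, Brian E. C.",  # semantic convention: inverted name
    "subject": "Metadata--Standards",         # semantic convention: LCSH-like string
    "date": "2003",
}

def follows_inverted_form(name):
    """Toy semantic check: is the creator given as 'Surname, Forename'?

    A purely syntactic validator would only check that a 'creator' field
    exists; a semantic one also cares about the shape of the value.
    """
    return "," in name

print(follows_inverted_form(record["creator"]))  # True for the record above
```

The point of the sketch is that two records can share identical syntax (the same tagged fields) while one follows library semantic conventions and the other does not, which is exactly the gap the author says metadata communities leave to librarians.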
However, many in the metadata community are starting to come around to the idea that semantics really are important in terms of searchability, and to acknowledge the work and expertise of libraries. One prominent member of the metadata community stated that the Dublin Core schema needed to align more with the logic and underlying structure of FRBR, and needed to look more like library cataloging. My one note of concern, however, is that sometimes fresh eyes see a problem more clearly than those of experience, and librarians should be careful not to stifle innovative philosophy under the weight of decades of experience.
            I was particularly interested in the author’s description of architectural schema. Of the many articles we’ve read for class, I’ve seen many competing types of metadata standards, all of which are mutually exclusive in terms of collocation. The complexities of this system demonstrate to me how difficult it can be to find data and resources, and how this affects the dispersal of information. I can’t search every database in the world for the correct information when their metadata schemes don’t work together. Interoperability seems like it should be a priority for both metadata and library communities. Throughout the readings, I’ve been thinking, ‘someone needs to come up with a system to read and interpret all the different types of information storage, so that it can all be made jointly searchable.’

            Architectural schema seem to be the solution to this. The author mentioned the “Storage Resources Broker which is predicated on the Warwick Framework,” a type of architectural schema. “It is a software suite that allows one to pull a variety of digital objects into a container architecture that can handle basically any kind of metadata” (24). That sounds particularly exciting for the possibilities of multiple system collocation and information retrieval.

Monday, September 14, 2015

Article Response for Lecture #4 - Creider

Creider, L.S. (2006). Cataloging, reception, and the boundaries of a "work." Cataloging & Classification Quarterly, 42(2), 3-19.

In Creider’s “Cataloging, reception, and the boundaries of a ‘work,’” the author gives several examples of complexities arising from the definition and distinction of one work from another.  Complications get in the way of defining any given work by a single criterion.  For example, editors and publishers often make alterations to an author’s work before it is even published, so authorial intention isn’t a basis on which to define a work.
Translations are often considered different expressions of the same work. However, many translations, and translations of translations, wind up altering the text, whether through sloppy translation or through the difficulty of rendering concepts between languages.  The concept of the “work” in the translator’s mind is what ultimately determines what is translated, which may be entirely different from the “work” in the author’s mind, or for that matter in any reader’s mind.
Every reader has a different conception of the work, even the cataloger.  This conception can depend on how thoroughly one reads the work, what surrounding knowledge and scholarship the reader knows about the work, and preconceptions formed by the access point from which the reader arrived at the work.  Hearing about the work already predisposes a reader to a certain viewpoint.  This is significant for the cataloger because one can’t expect a cataloger to do a full scholarly study of the work in front of them in order to categorize it, so social factors play an even bigger role in how to define a work.
When a single “work” has several sections, all written at different times by different authors, with different, sometimes contradicting knowledge and viewpoints, it seems to be a collection of several works.  But often, especially in terms of ecclesiastical literature, they are considered the same work because of the same point of origin.  Writings which are nothing alike can be considered sections of a single “work.”
There seems to be no single defining feature that distinguishes one work from another.  It reminds me of Wittgenstein’s notion of family resemblance, wherein we don’t define a game, or any word really, by a single definition, but rather by association and social construct.  Works are similar in this respect: they can’t be defined by a single factor, or even a set of factors, and are defined more by the associations, or feeling, of the person doing the categorization.
The solution posited by the author is that catalogers must be willing to change their definitions later on as research and scholarship continue to evolve and to study the connections between similar works, or similar expressions and manifestations of those works.  I approve of that sentiment, but would add another caveat in terms of usage.

I would add that the subtle differences that determine whether a writing is a distinct work or a different expression of the same work require in-depth scholarly analysis.  A person who needs to find and study disparate versions of a work, for which the line between expression and separate work is blurry and uncertain, is likely a scholar, and will certainly search for different expressions of a work, as well as for different works of scholarship associated with it, and thus is likely to find another version of the work.  If we tag keywords and subjects that associate two distinct works for which the line is blurry, a scholar is likely to find those works.  So I would say that it is safer to err on the side of considering a writing a new work rather than an expression of the same work: the majority of people searching for works with such subtle distinctions will be able to find what they need, and the casual reader will not be inundated with various versions presented as different expressions.  That approach would serve more users more effectively.

Monday, September 7, 2015

Article Summary for Lecture #3 - Galeffi

Biographical and Cataloging Common Ground: Panizzi and Lubetzky, kindred spirits separated by a century.

Panizzi and Lubetzky are two librarians who were both extremely influential in the creation of modern models for effective cataloging. Many of Panizzi's original precepts carried over into Lubetzky's work, a full century later.  However, Lubetzky applied that work in new and innovative ways to modern systems and trends Panizzi could never have foreseen, and his work was not purely derivative.  Additionally, both men had similarly rocky backgrounds as refugees and poor immigrants, who attained their positions almost by accident, through brilliance and recommendations to positions of importance.
The majority of Panizzi's work involved standardizing cataloging processes for a single large library, so many of his rules focus on that precept.  He had to create very utilitarian systems.  Lubetzky, on the other hand, was more focused on the underlying framework of cataloging, and in making the systems universally applicable to an array of libraries.  He was focused more on the goals for cataloging that mostly remained implicit within Panizzi's rules.
Lubetzky devoted far more attention to breaking bibliographic descriptions down into atoms, so that the bibliography could be more easily searchable, and did far more work with access points.  Panizzi, by contrast, had a mess to clean up, so he concentrated more on standardizing general cataloging practices, and on placing items in a given order and structure so that multiple catalogers were each following the same system.  The goals he followed, making catalogs easier to use and items easier to locate, were implicit in this standardization, whereas for Lubetzky the rules needed to be explicit and foundational, because the goals, and the ways of reaching them, became more and more complex.
One main difference between the two was that Panizzi almost universally respected the authority of the title page.  Lubetzky complicated matters in that regard, by pointing out how printing errors, pseudonyms, changes in naming, and other factors can make the title page out of date, and make searching more difficult for the user.  He was an advocate of updating and modernizing beyond the simple title page.  Partly this is because of all the work Lubetzky did with defining authorship, and making explicit the implicit rules of Panizzi’s work.
            Lubetzky was a big fan of Panizzi, and drew on his work to conceptualize the idea of corporate authorship, as well as making the basic FRBR principles found in Panizzi’s work explicit. Both hated the idea of guesswork on the part of the user, and wanted to make as much available, as easily as possible.  Panizzi sometimes used the standard convention of separating dictionaries and encyclopedias into separate entries for easier searching and access.  Lubetzky acknowledged the occasional convenience of this, but ultimately argued that the standard scheme should apply to all books, so that users aren’t left guessing where to look up a certain book.
In my opinion, the modern reference section could be an exact corollary: it offers the convenience of separate shelving without the impracticality of creating a different cataloging system for a different type of resource.  Items in reference, such as dictionaries and encyclopedias, would have been in Panizzi’s separate catalog; today they are not cataloged separately, as Lubetzky advocated, but simply located separately for easier reference.

Overall, I empathize strongly with Panizzi’s practical, utilitarian systems and the need for them.  Lubetzky’s criticisms, in terms of respecting more modern name changes in authorship and the like, are valid, especially for card catalogs, but the universality and ease of Panizzi’s systems seem to outweigh the concerns about authorship confusion.  I think in a modern setting, where author pseudonyms can be linked via search engines and made just as searchable as an author’s real name, Panizzi’s system seems more favorable.  I agree with the article’s sentiment that Panizzi and Lubetzky were kindred spirits separated by a century, mostly because I found Panizzi’s work to be so far ahead of his time.