Tuesday, December 1, 2015

Article Response for Lecture 14 - Shirky

                Shirky states that many of our strategies for categorizing resources in a web environment are holdovers from a time when different categorization strategies made sense, and that our assumptions are now outdated. He argues that hierarchical classification is extremely useful when there are small numbers of things to categorize, when those things have definitive markers that make them difficult to misclassify, and when both the creators of the hierarchy and its users are subject experts. He cites the periodic table and the DSM as examples of hierarchical structures that work well, but posits that as human knowledge continues to grow, especially given the extreme growth of web-based knowledge, these hierarchical structures become less useful.
                For one thing, the “aboutness” of a work, which he refers to as its essence or “isness,” is not a concrete concept but varies with context. A number of people may think about the same concept from a multitude of viewpoints and thus use a multitude of terminologies to refer to it. Additionally, if users of the system are not experts in both the subject and the hierarchical scheme involved, it will prove difficult for them to find information in a large system. The burden of not only reading the minds of all potential searchers, but also predicting how they will search in the future, is too much for catalogers to bear in a large system.
                Because of the broadness of web information, none of our current, limited classification schemes is universal enough for the task. The author demonstrates several biases inherent in existing classification schemes, from Soviet over-classification of Communist literature, to the preferential treatment of Christianity in Dewey’s scheme, to the geographical preference given to Western thought in LC classification. These biases arise because we are not truly attempting to classify all knowledge, but rather to solve a concrete problem: these schemes are designed to classify the book in hand and to organize the items in a collection. If the items in the collection skew toward Western thought, because we reside in an English-speaking country, then the classification system designed around them will necessarily develop the same skew. Bias in hierarchy is unavoidable.
                Shirky argues that we have forgotten that there is no shelf for online resources, which is why, when Yahoo initially began compiling internet pages, it created a hierarchical system and assigned a “shelf” to each group of links in an antiquated fashion. Pages need not be limited to a single category of knowledge the way physical items are, and may be linked to from anywhere. When Google came along, it took a different approach, using post-coordinated collocation at the moment the user searches rather than a hierarchical model. The author argues that this leads to greater success in a web-based environment.
                The potential of non-hierarchical systems of organization, such as folksonomic user tagging, is a lessening of binary thinking. A resource is not simply either one thing or another, nor merely an aspect of a thing within a broader category; it can be multiple, equally represented things at the same time. This crowd-sourced form of information management is often effective, if at times inelegant. It allows the user to decide what is important or relevant, and offers filtration only after publication, a complete reversal of the print publishing model. The lack of controlled vocabulary lets users maintain the nuances inherent in their terminology, rather than squeezing their concepts into over-arching categories that include tangential, or even unrelated, subjects.

                I personally find folksonomies and user-generated classification fascinating because of the mathematics involved. A majority of users will tag something as what it is, using various terms to do so. With a great enough volume of user tags, irrelevant subjects are edited out, or decreased in relevancy to the point that they no longer influence user perception of the subject. However, I would caution that ‘rule of the mob’ is not always fair or just, and it is possible to mobilize a large number of users to the detriment of a given link or subject; online harassment makes this possibility quite clear. Additionally, when new knowledge is presented, it needs initial tags in order to gain legitimacy and categorization. New knowledge is a problem for a model in which accuracy improves with the number of taggers: a new subject has few tags, which means decreased accuracy.
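As a rough illustration of the frequency math I have in mind, here is a minimal sketch of consensus tagging; the tags, the 20% cutoff, and the resource are all invented for the example, not drawn from Shirky's article.

```python
from collections import Counter

# Hypothetical tags applied by many users to a single web resource.
user_tags = [
    "cataloging", "cataloguing", "libraries", "cataloging",
    "classification", "cataloging", "spam", "libraries",
]

counts = Counter(user_tags)
total = sum(counts.values())

# Keep only tags applied by at least 20% of taggers; rare or malicious
# tags fall below the cutoff and stop influencing the description.
threshold = 0.2
consensus = {tag: n / total for tag, n in counts.items() if n / total >= threshold}
print(consensus)  # {'cataloging': 0.375, 'libraries': 0.25}
```

With only a handful of tags, as for a brand-new subject, a single stray tag clears the cutoff easily, which is exactly the accuracy problem described above.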

Tuesday, November 3, 2015

Article response for lecture 11 - Rafferty

Rafferty, P. (2001). The representation of knowledge in library classification schemes. Knowledge Organization, 28, 180-91

The main argument of this article is that no system of classification is without some level of bias. This matters because social power and structure are conveyed through classification schemes. The ways in which libraries categorize and classify knowledge mirror the way society views that knowledge. Library classification schemes are rooted in the practical applications of how users search for and use knowledge. Controlled vocabularies and hierarchical structures attempt to optimize usability. However, the question remains: for whom are such systems optimized?
When a subject is defined as a main class, it becomes of primary importance, with its subclasses secondary, further divisions tertiary, and so on. So the structure itself is necessarily fraught with bias. Classification schemes simultaneously dominate a given piece of information, forcing it to conform to a structure and organization into which it may not easily fit, and enable easier searchability and greater open access to varied ideas. Organizational schemes can therefore both maintain and subvert a given paradigm, often at the same time.
Because classification schemes are built on other existing classification schemes, and prior knowledge, they are necessarily biased by what came before. For example, notational language and controlled vocabulary must, by necessity, be exclusionary. Those who don’t use the right terminology or correct notation are considered inferior. How we place and organize records is influenced and controlled by how we feel and think about them and their subjects. There is a necessary subjective bias.
Many organizational schemes were originally defined by religiosity, with God at the top of the hierarchy of classification, and the dominant religion of the culture treated as the default in discussions of religion. Likewise, a schema which places academia at the center, and utilizes the organizational schemes derived from the various academic disciplines, has the advantage of reflecting the information organization that a majority of users will utilize. However, this presents the bias of delegitimizing sources external to academia, particularly Western academia, and of promulgating a specific view of what is important and what is not, based on a particular world-view.

What is classified and what is not marks a boundary between what counts as important knowledge and what does not. This defines the self versus the other, and demarcates the boundary of what matters to society. Libraries, as social institutions, are deeply involved in deciding which information, and which sources, are legitimized. For example, fiction is often devalued until it happens to sell enough copies, at which point it is accepted by academia as a social signifier. So classification makes its way from libraries to realms of social thought and vice versa.

Friday, October 23, 2015

Article Response for Lecture 10 - Underwood

Underwood, T. (2014). Theorizing research practices we forgot to theorize twenty years ago. Representations 127, 64-72.
                When we search large sets of data, we run into the problem of confirmation bias. That is, if we search for a question we already believe we know the answer to, we are likely to find that answer, whether or not it is correct. Given a large enough dataset, it is easy to find at least one example of any concept, no matter how fallacious. The number of examples we need in order to prove a point depends on the size of the dataset, yet few researchers know the size or scope of the datasets they search. Researchers come to the search process with preconceived notions of what is true and what they need to find. They come with very specific questions, and want to find a particular answer that they intuitively believe is correct. Data mining of large datasets is often a fishing expedition to find the select few examples that confirm our preset notions of what is true, particularly in the humanities and in the study of linguistics.
                Additionally, ranking by relevancy can filter out information which disproves our preconceived notion. When we use search engines, especially full-text ones, algorithms show us immediately what we searched for, regardless of whether the result is correct. More difficult than the issue of synonym exclusion is the fact that every facet of the data that does not conform to the language of our search, and thus our bias, is filtered down, so we are less likely to see any contradictory information.
                Scholars of the humanities often search for certain keywords, relying on the distributional hypothesis, which holds that the “meaning of a word is related to its distribution across contexts.” This resembles Wittgenstein’s notion of language games, wherein meaning is determined by usage. While this approach has merit, seeing that a given word is associated X number of times with another given word does nothing to inform the searcher about its usefulness without also knowing what other words may be associated with it more frequently, what context the associations occur in, and how large the dataset is. These considerations are often omitted from scholarly research because of an over-reliance on search algorithms to ascertain value and truth.
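To make that concrete, here is a minimal sketch of counting raw co-occurrences and then normalizing by overall word frequency; the toy corpus and the document-level co-occurrence window are my own assumptions, not anything prescribed by Underwood.

```python
from collections import Counter
from itertools import combinations

# Toy corpus; a real humanities dataset would be far larger and messier.
documents = [
    "the whale hunt began at dawn",
    "the captain spoke of the white whale",
    "dawn broke over the quiet harbor",
]

pair_counts = Counter()
word_counts = Counter()
for doc in documents:
    words = doc.split()
    word_counts.update(words)
    # Count every unordered pair of distinct words co-occurring in a document.
    pair_counts.update(combinations(sorted(set(words)), 2))

target = "whale"
for pair, n in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
    if target not in pair:
        continue
    other = pair[0] if pair[1] == target else pair[1]
    # Dividing by the other word's overall frequency hints at whether the
    # association is distinctive or just an artifact of a very common word.
    print(other, n, round(n / word_counts[other], 2))
```

The raw count alone ("whale" co-occurs with "the" more than with anything else) is exactly the kind of misleading association the paragraph above warns about; the corpus size and the normalization are what give the number meaning.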

                An algorithm should not be trusted to automatically confer authority and relevancy, because algorithms are not simply blunt instruments, tools that hammer datasets into shape; they come with their own inherent biases and limitations. Most algorithms, however, are proprietary, and thus not subject to public scrutiny of their mechanisms, so we have no way to contextualize the search process and provide meaning to datasets of associated search terms. Computer scientists are working to address this by using topic modeling to more clearly define associations of terms into clusters, so that words can be associated with other words in given contexts. This process can reveal subjects and ideas we didn’t know to look for in our initial search, and contribute more effectively to scholarship, rather than simply confirming what we already thought was true.
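For a rough sense of what such clustering looks like in practice, here is a minimal sketch using scikit-learn's off-the-shelf latent Dirichlet allocation; the four-document corpus and the choice of two topics are invented for illustration, and this is not the specific tooling Underwood discusses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "the whale hunt began at dawn on the open sea",
    "the captain charted a course across the open sea",
    "library catalogs organize subject headings for readers",
    "subject headings help readers find library resources",
]

# Turn raw text into word counts, then fit a two-topic model.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the most heavily weighted words in each discovered topic cluster.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-4:][::-1]]
    print(f"topic {i}: {top}")
```

The point is that the clusters emerge from the whole corpus rather than from a single keyword query, which is what lets them surface associations the searcher did not think to ask for.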

Monday, October 19, 2015

Article Response for Lecture 9 - Rotenberg & Kushmerick

Rotenberg, E. & Kushmerick, A. (2011). The Author Challenge: Identification of the Self in the Scholarly Literature. Cataloging & Classification Quarterly 49(6), 503-520
                This article begins as an effective examination of the problems with attribution of scholarly scientific publications, and then proposes a particular solution. The first half establishes attribution as a necessity for the allocation of government and grant funding, as well as for tenure decisions for individuals. However, names can be common, and different individuals can have similar names, making attribution tricky. Additionally, scientific scholarly output is increasing at a rapid pace, adding more common names to the jumble. Non-traditional forms of publication, such as web-published pieces and work expressed in three-dimensional models rather than writing, are proliferating across the scientific landscape.
                Several international organizations are currently working on name disambiguation in which authors themselves claim their work. The authors suppose that no single company can cover all disambiguation in the world, so disambiguation must necessarily be a collaborative effort: international entities linked to one another, with authors supporting each entity in a pseudo-folksonomic fashion, can create a web of disambiguation. The system discussed in particular is Web of Science and its disambiguation community, ResearcherID.
                Web of Science uses an algorithm to collocate works by a single author. The difficulty mentioned with algorithmic disambiguation within the Web of Science search engine is that it collocates incorrectly whenever authors don’t stick to a strict subject matter, or when authors change names. ResearcherID is an attempt to fix this difficulty. The feedback system is mentioned as a critical component of disambiguating correctly, because human users can disambiguate such cases better than the algorithms can, without the added cost of employee search time.
                ResearcherID offers an identification number to each individual author, as well as citation metrics, to allow authors to disambiguate themselves. It offers interactive maps of collaborators and citations to analyze the geographic spread of an author’s knowledge. In many programs and communities, it has been implemented to help disambiguate authors, as well as inventors and principal investigators. Instead of relying solely on the metadata attached to the article itself, it pulls author data from grant databases and other sources, and allows self-disambiguation.
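As a toy illustration of why a self-claimed identifier simplifies collocation, here is a minimal sketch; the heuristic, field names, and records are my own invention and are not the proprietary Web of Science algorithm or the actual ResearcherID data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    author_name: str                     # name string as printed on the article
    coauthors: set                       # co-author surnames on the article
    researcher_id: Optional[str] = None  # hypothetical registry ID, if claimed

def same_person(a: Record, b: Record) -> bool:
    """Crude collocation heuristic: an explicit ID wins; otherwise fall back to
    matching name strings plus overlapping co-authors, which is roughly where
    purely algorithmic approaches start to break down."""
    if a.researcher_id and b.researcher_id:
        return a.researcher_id == b.researcher_id
    return (a.author_name.lower() == b.author_name.lower()
            and bool(a.coauthors & b.coauthors))

r1 = Record("J. Smith", {"Lee", "Gupta"}, "A-1234-2010")
r2 = Record("J. Smith", {"Gupta", "Chen"}, "A-1234-2010")
r3 = Record("J. Smith", {"Brown"})  # same name, no shared evidence
print(same_person(r1, r2), same_person(r1, r3))  # True False
```

The fallback branch is where name changes and shifts in subject matter cause false splits or false merges, which is the gap the human feedback system is meant to close.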

I find it significant that the authors do not see fit to mention NACO, or indeed any LC disambiguation effort, but only lend credence to disambiguation systems maintained by authors themselves rather than by catalogers. While I agree with their assessment of the value of folksonomy-type disambiguation, I find it disingenuous not to at least mention a divergent way of doing things and possible criticisms. The second half of the article read more and more like an advertisement for Thomson Reuters projects and products as I continued reading. While the authors seem to believe that further interoperability is the sole goal of future projects, I find it significant that no mention is made of author fraud. I would think that with a folksonomy-type system this would become an issue, or, if it is not, that it would at least be worth a mention.

Sunday, October 11, 2015

Article response for lecture 8 - Knowlton

Knowlton, S.A. (2005). Three decades since prejudices and antipathies: A study of changes in the Library of Congress Subject Headings. Cataloging & Classification Quarterly 40(2):123-45.

            This article addresses biases inherent in subject cataloging, assesses modern improvements, and considers how well previous objections have been satisfied. It points out a philosophical balancing act between the stated goals of search optimization and universal bibliographic control. Subject categories are designed to enable ease of searching, allowing users to find resources by the most common term, or the term they are most likely to search by. However, whenever one presumes to imagine what a user will search under, or what the most common term might be, personal biases and prejudices are allowed to play a role in cataloging. There is a danger of normalizing a single experience, and overwhelmingly the assumed average viewpoint is white, male, heterosexual, and Christian. This bias runs the risk of stigmatizing any group that does not fall within those specific norms, and of creating subject headings which make resources harder to find for certain users.
            Specifically, Sanford Berman published one of the first widely regarded critiques of bias in Library of Congress subject headings. Since its publication, many of the modifications suggested by Berman have been at least partially implemented. In the past several decades, terminology has changed, which to some degree necessitated different changes from those Berman suggested, accounting for some of the disparity between his recommendations and actual changes. Additionally, the vast majority of his recommendations for subject changes related to African-Americans and women have been implemented, perhaps indicative of the social climate and movements of the times between then and now.
            One subject area which has remained stubborn is religion. Religious subject categories without qualification are assumed to be Christian; thus, religious subheadings which relate to Christianity are not qualified as such. While some cases of this could be considered exclusionary toward other religions, I would argue that most of those listed are subjects particular to Christianity and not subject to confusion. Obviously the term ‘God’ could be construed within many different religions, but other subjects, such as ‘Virgin Birth,’ are mythologically associated with Christianity and do not need disambiguation. Such unnecessary disambiguations may account for some of the recommendations never addressed.
            Beyond religious subjects, I noticed that two other types of subjects were not addressed. Subjects under ‘poor,’ and many of those regarding poverty and economic disparity, were not disambiguated or revised into less offensive categories. Perhaps this is because of the lack of emphasis on socio-economic disparity until very recently in the history of social justice. Likewise, several aberrant subject headings involving indigenous populations were not altered. Social justice movements in US culture historically did not emphasize international themes until very recently, and so many of these headings are likely still catching up to the culture.

            Ultimately, I think the alterations to LC subject headings since Berman’s original study have been fairly adequate, and have stayed abreast of modern social attitudes as well as can be expected for a complex cataloging structure. However, the age of the original study makes me wonder whether more recent studies have reassessed LC subject headings to see what a more modern take on bias would reveal. If we are still using a decades-old study as a litmus test for innovation, it’s unsurprising that LC subject headings pass the test. I think we could use a more modern litmus test.

Sunday, October 4, 2015

Article Response for Lecture 7 - Naun

Naun, C. C. (2008). Objectivity and Subject Access in the Print Library. Cataloging & Classification Quarterly, 43(2), 83-95.
                This article is extremely dense with practical and philosophical points about the nature of objectivity, the advantages and disadvantages of subject access, and print as compared to electronic resources. It has been my favorite article so far for this class. The author starts with the ideology behind libraries. The reason libraries are a public good is that books and journals are expensive, and only become more so. The idea of giving every person access to information is a noble ideology, but “it is underwritten by logistics,” specifically those of the economic market. Objectivity is one of these core ideologies underwritten by logistics.
                Objective subject representation depends on our social values, and hopefully our social values place the library as an open realm of discourse where all subjects are equal. Often, however, these ideals are not enough to obtain objectivity. “An attempt to capture what a document is about requires a frame of reference that may encompass a host of interests, assumptions, and values.” Ideally, subject categories should always reflect the most commonly used term. Oftentimes, however, subject categories are changed to less offensive terms, even if these are not the most used term.
                This exception creates an environment of backhanded censorship, in which controversial subjects, which have a likelihood of causing offense, are often placed under hidden vocabulary that most users don’t know the words to find, because they do not arrive at the subject with all the biases of the cataloger. Highly regulated subjects can also be highly normalized: things are shoved into preconceived boxes until the boxes overflow and create new subject categories. After all, controlled vocabularies are exclusionary in nature, choosing to use certain words over others. How can such a choice not contain bias?
                Full text searching of electronic resources can remove subject and description interpretation and thus remove bias. However, natural language contains its own biases. In defense of print resources, subject classification can also remove bias. Competing views are normally shelved together, so the user has many options of viewpoints to look at. Correct indexing is by usage, not by preconception. However, librarians also are free to consider literary warrant in indexing, which is objectivity “in relationship to human discourse.” This seems rather flexible, and could be prone to misuse as well.

                If indexing is done in the most objective way by how users search, which users are considered? This is another potential level of bias. Users must be visualized as a “potentially diverse community of users” in order to avoid bias, and it is logistically impossible to poll or visualize every type of possible user. Finally, the author gives us a single common-sense solution to these difficult questions of impartiality and objectivity. “Impartiality does not demand infallibility so much as vigilance.” In other words, it’s impossible to be completely objective every time, but watching our own biases and checking them as much as possible, while being prepared to correct mistakes, can get us much further than just strict implementation of established rules.

Thursday, September 24, 2015

Article Response for Lecture 6 - Emanuel

Emanuel, M. (2011). A fistful of headings: Name authority control for video recordings. Cataloging & Classification Quarterly, 49(6), 484-99.

This article enumerates some of the many difficulties in authorship, using examples from the seemingly incongruous MARC designations for spirit communications. Spirit mediums, around the mid-1800s, began claiming contact with potential authors, and indeed publishing books under their own names and under the names of the spirits they claimed to have contacted. Such works led to difficulties in attribution of responsibility. Is the medium responsible for the work, or is it the supposed author? If it is the purported author, how much of the work is collaborative, and thus complex in attribution?
            Without initial guidance for these specific cases in the rulebook, many catalogers used the rules for interviews to categorize attribution of medium-interpreted works.  The medium functioned as the interviewer, and the spirit as an interviewee, and attribution was based on the amount of intellectual content contributed by either party. This led to works being attributed to Samuel Clemens (Mark Twain) after his death, with only a side note for the contribution of the medium. Copyright protections, then, became a major issue. Some effort was made to classify spirits as supposed or presumed authors, rather than official authors, but such rules didn’t apply to interviews, so inconsistencies abounded.
            Revisions of the cataloging rules to incorporate medium-written works cleared up some of the copyright issues by allowing only the medium to be considered the main author, with the deceased relegated to a side note. However, difficulties still arose in authority control, since the name data depicting Mark Twain’s ghost was the same as the name data depicting Mark Twain, and the mid-stream changes in rules led to conflicting entries, as older entries had been written with different attributions. Is the supposed spirit of Mark Twain really Mark Twain?
            Eventually three distinct systems emerged for categorizing authorship of spirit communications.
Spirit communications in which the spirits were established historical figures–like Parker and Hale–entailed main entry under the medium and added entry under the historical figure. Spirit communications in which the spirits were prolific, well-known, but not of proven historical existence–like Worth–entailed main entry under the spirit, with qualifier added to the heading. Spirit communications in which the spirit was either not prolific–like Grayland and Pheneas–of unknown origin–like Ka-Ra-Om–or in debate–like Twain–entailed entry only for the medium.
            When Lubetzky stepped in to simplify and codify the rules, he standardized and enumerated the concept of spirit attribution. A work “attributed to the spirit of another person, is entered under the person who prepared it, with an added entry under the person to whom it is attributed.” This speaks directly to social construction: the period when attribution to the spirit itself was prevalent was also a period in which spiritualism was popular and accepted as truth, whereas in Lubetzky’s time it had fallen out of favor.
            However, the definition of spirit communication as an interviewer-interviewee relationship persisted, with the caveat that the spirit was a supposed author rather than simply attributed as author, until AACR2, which gave spirit communication its own subheading, with primacy given to the spirit itself. This reflects a shift in philosophy toward neutrality in cataloging. Catalogers were instructed not to judge topical issues, like whether the ghost of Mark Twain really is Mark Twain, but simply to perform attribution and leave the scholars to decide. The catalog is, after all, a search tool, and if one needs the spirit writings of Mark Twain, one is likely to search under Mark Twain.

            There is a distinct tug of war here between the objective of collocating everything by a given author, in which case spirit communication constitutes misattribution, and the objective of defining authorship at face value, without bias. To avoid misattribution, the qualifier (Spirit) is currently added to the main entry names of spirits to distinguish them from their living counterparts, separating biographical personhood from bibliographic personhood, rather like a surrogate record for authorship.

Monday, September 21, 2015

Article Response for Lecture #5 – Schottlaender

Schottlaender, B.E.C. (2003). Why metadata? Why me? Why now? Cataloging & Classification Quarterly 36(3/4):19-29.
This article reviewed the different types of metadata and the divergent communities of data creators and compilers whose separate standards contribute to a confusing multiplicity of metadata schemes. This complexity is demonstrated by the three different types of metadata standards that can be interrelated on a single document, in nearly unlimited combinations. In addition to the ‘normal’ schema of metadata, that is, how the data is stored, there is an additional schema governing how that metadata is encoded and displayed. Beyond those two, there are further standards for the architectural schema of metadata, which seems to deal more with collocation and organization. He emphasized the mutability of e-documents: their natural flexibility requires more stringent metadata standards to tie them down so they aren’t lost to the ether.
He also identified two distinct types of organization schemes for metadata: syntactical and semantic. Most organizational schemes in the metadata communities are primarily focused on syntax, which is how the information is laid out, which information is included, and how those things are tagged and coded. Semantic considerations involve how the actual words and data are spelled out in terms of shared vocabulary, spelling, and punctuation. Most schemes rely on librarians to create semantic standards, or simply don’t have any, utilizing AACR standards and the Library of Congress subject classification to fill in gaps. 
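A small sketch of the syntax/semantics distinction as I understand it: both records below follow the same syntactic layout (the same Dublin Core-style element names), but only the first applies semantic standards to its values. The records themselves are invented.

```python
# Same syntactic scheme: identical element names, so a machine can parse both.
record_a = {
    "title": "Moby-Dick; or, The Whale",
    "creator": "Melville, Herman, 1819-1891",  # authorized, library-style name form
    "subject": "Whaling--Fiction",             # controlled vocabulary heading
    "date": "1851",
}
record_b = {
    "title": "Moby Dick",
    "creator": "H. Melville",                  # uncontrolled name form
    "subject": "whales, sea stories",          # free-text keywords
    "date": "1851",
}

# Without shared semantics, a search on the authorized heading misses record_b.
print(record_a["creator"] == record_b["creator"])  # False
```

The syntactic layer makes the records machine-readable; it is the semantic layer, shared vocabulary and name forms, that makes them collocate in a search.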
However, many in the metadata community are starting to come around to the idea that semantics really are important for searchability, and to acknowledge the work and expertise of libraries. One prominent member of the metadata community stated that the Dublin Core schema needed to align more with the logic and underlying structure of FRBR, and needed to look more like library cataloging. My one note of concern, however, is that sometimes fresh eyes see a problem more clearly than experienced ones, and librarians should be careful not to stifle innovative philosophy under the weight of decades of experience.
            I was particularly interested in the author’s description of architectural schema. Across the many articles we’ve read for class, I’ve seen many competing metadata standards, all of which are mutually exclusive in terms of collocation. The complexity of this landscape demonstrates to me how difficult it can be to find data and resources, and how this affects the dispersal of information. I can’t search every database in the world for the correct information when their metadata schemes don’t work together. Interoperability seems like it should be a priority for both the metadata and library communities. Throughout the readings, I’ve been thinking, ‘someone needs to come up with a system to read and interpret all the different types of information storage, so that it can all be made jointly searchable.’

            Architectural schema seem to be the solution to this. The author mentioned the “Storage Resources Broker which is predicated on the Warwick Framework,” a type of architectural schema. “It is a software suite that allows one to pull a variety of digital objects into a container architecture that can handle basically any kind of metadata” (24). That sounds particularly exciting for the possibilities of multiple system collocation and information retrieval.
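To picture what pulling heterogeneous metadata into one searchable container might involve, here is a minimal crosswalk sketch; the field labels are simplified stand-ins of my own devising, not real MARC tags, Dublin Core elements, or anything from the Warwick Framework.

```python
# Two invented source records in different layouts.
marc_like = {"245": "Moby-Dick", "100": "Melville, Herman", "260c": "1851"}
dc_like = {"title": "Leaves of Grass", "creator": "Whitman, Walt", "date": "1855"}

# A crosswalk maps each scheme's field names onto one shared set of names.
CROSSWALKS = {
    "marc_like": {"245": "title", "100": "creator", "260c": "date"},
    "dc_like": {"title": "title", "creator": "creator", "date": "date"},
}

def to_common(record: dict, scheme: str) -> dict:
    """Translate one record's fields into the shared field names."""
    mapping = CROSSWALKS[scheme]
    return {mapping[field]: value for field, value in record.items()}

# Once translated, records from both sources sit in a single searchable index.
index = [to_common(marc_like, "marc_like"), to_common(dc_like, "dc_like")]
print([r["title"] for r in index])  # ['Moby-Dick', 'Leaves of Grass']
```

This is, of course, only the field-mapping half of the problem; the semantic differences discussed above would still have to be reconciled for the merged index to collocate reliably.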

Monday, September 14, 2015

Article Response for Lecture #4 - Creider

 Creider, L.S. (2006). Cataloging, reception, and the boundaries of a "work." Cataloging & Classification Quarterly 42(2):3-19.

In Creider’s “Cataloging, reception, and the boundaries of a ‘work,’” the author gives several examples of complexities arising from the definition and distinction of one work from another. Complications get in the way of defining any given work by a single criterion. For example, editors and publishers often make alterations to an author’s work before it is even published, so authorial intention is not a sufficient basis on which to define a work.
Translations are often considered different expressions of the same work. However, many translations, and translations of translations, wind up altering the text, whether through sloppy translation or simply through the difficulty of carrying concepts across languages. The concept of the “work” in the translator’s mind is what ultimately determines what is translated, and that may be entirely different from the “work” in the author’s mind, or for that matter in any reader’s mind.
Every reader has a different conception of the work, even the cataloger. This conception can depend on how thoroughly one reads the work, what surrounding knowledge and scholarship the reader brings to it, and preconceptions formed by the access point from which the reader arrived at the work. Hearing about the work already predisposes a reader to a certain viewpoint. This is significant for the cataloger because one can’t expect a cataloger to conduct a full scholarly study of the work in front of them in order to categorize it, so social factors play an even bigger role in how a work is defined.
When a single “work” has several sections, all written at different times by different authors, with different, sometimes contradicting knowledge and viewpoints, it seems to be a collection of several works.  But often, especially in terms of ecclesiastical literature, they are considered the same work because of the same point of origin.  Writings which are nothing alike can be considered sections of a single “work.”
There seems to be no single defining feature that distinguishes one work from another. It reminds me of Wittgenstein’s discussion of games, wherein we don’t define a game, or any word really, by a single definition, but rather by family resemblance and social construct. Works are similar in this respect: they can’t be defined by a single factor, or even a set of factors, and are defined more by the associations, or the feeling, of the person doing the categorizing.
The solution posited by the author is that catalogers must be willing to change their definitions later on, as research and scholarship continue to evolve and study the connections between similar works, or similar expressions and manifestations of those works. I approve of that sentiment, but would add another caveat in terms of usage.

I would add that the subtle differences that determine whether a writing is a distinct work or a different expression of the same work require in-depth scholarly analysis. A person who needs to find and study disparate versions of a work, where the line between expression and separate work is blurry and uncertain, is likely a scholar, will certainly search for different expressions of the work as well as for scholarship associated with it, and is thus likely to find another version of the work. If we tag keywords and subjects that associate two distinct works for which the line is blurry, a scholar is likely to find both. So I would say it is safer to err on the side of considering a writing a new work rather than an expression of the same work: the majority of people searching for works with such subtle distinctions will still be able to find what they need, while the casual reader will not be inundated with various versions listed as different expressions. That approach would serve more users more effectively.

Monday, September 7, 2015

Article Summary for Lecture #3 - Galeffi

Biographical and Cataloging Common Ground: Panizzi and Lubetzky, kindred spirits separated by a century.

Panizzi and Lubetzky were two librarians, both extremely influential in the creation of modern models for effective cataloging. Many of Panizzi's original precepts carried over into Lubetzky's work a full century later.  However, Lubetzky applied that work in new and innovative ways to modern systems and trends Panizzi could never have foreseen, so his work was not purely derivative.  Additionally, both men had similarly rocky backgrounds as refugees and poor immigrants who attained their positions almost by accident, through brilliance and recommendations to posts of importance.
The majority of Panizzi's work involved standardizing cataloging processes for a single large library, so many of his rules focus on that precept.  He had to create very utilitarian systems.  Lubetzky, on the other hand, was more focused on the underlying framework of cataloging, and in making the systems universally applicable to an array of libraries.  He was focused more on the goals for cataloging that mostly remained implicit within Panizzi's rules.
Lubetzky devoted far more attention to breaking down bibliographic descriptions into atoms, so that bibliographic records could be more easily searched, and did far more work with access points.  Panizzi, by contrast, had a mess to clean up, and so concentrated on standardizing general cataloging practices and placing items in a given order and structure so that multiple catalogers would each follow the same system.  The goals he pursued, making catalogs easier to use and items easier to locate, were implicit in this standardization, whereas for Lubetzky the rules needed to be explicit and foundational, because the goals, and the ways of reaching them, had become more and more complex.
One main difference between the two was that Panizzi almost universally respected the authority of the title page.  Lubetzky complicated matters in that regard, by pointing out how printing errors, pseudonyms, changes in naming, and other factors can make the title page out of date, and make searching more difficult for the user.  He was an advocate of updating and modernizing beyond the simple title page.  Partly this is because of all the work Lubetzky did with defining authorship, and making explicit the implicit rules of Panizzi’s work.
            Lubetzky was a big fan of Panizzi, and drew on his work to conceptualize the idea of corporate authorship, as well as to make explicit the basic FRBR-like principles found in Panizzi’s work. They both hated the idea of guesswork on the part of the user, and wanted to make as much available as easily as possible.  Panizzi sometimes used the standard convention of separating dictionaries and encyclopedias into separate entries for easier searching and access.  Lubetzky acknowledged the occasional convenience of this, but ultimately argued that the standard scheme should apply to all books, so that users aren’t left guessing where to look up a certain book.
In my opinion, the modern convention of the reference section is an exact corollary: it offers the convenience of separate treatment without the impracticality of creating a different cataloging system for a different type of resource.  Items in reference, such as dictionaries and encyclopedias, would have been in Panizzi’s separate catalog; today they are not cataloged separately, as Lubetzky advocated, but simply shelved separately for easier reference.

Overall, I empathize strongly with Panizzi’s practical, utilitarian systems and the need for them.  Lubetzky’s criticisms, in terms of respecting more modern name changes in authorship and so on, are valid, especially for card catalogs, but the universality and ease of Panizzi’s systems seem to outweigh the concerns about authorship confusion.  In a modern setting, where author pseudonyms can be linked via search engines and made just as searchable as an author’s real name, Panizzi’s system seems more favorable.  I agree with the article’s sentiment that Panizzi and Lubetzky were kindred spirits separated by a century, mostly because I found Panizzi’s work to be so far ahead of his time.

Saturday, August 29, 2015

Reflection on the Principle of Least Effort

The reading on the Principle of Least Effort listed evidence that people set only moderate goals for information retrieval and then "satisfice": they are satisfied even without reaching those moderate goals, considering lesser amounts of information "good enough."  It argues effectively that, as librarians, we have a responsibility to create an environment that allows the greatest quantity and quality of information retrieval for the least amount of work.  The conclusion of the Bibliographic Objectives chapter restates this point.  It emphasizes the need for a full-featured bibliographic system as a necessary adaptation to modern needs of information retrieval, just as card catalogues originally made libraries searchable.
It's interesting to me that even art historians, a scholarly discipline which I would expect to be more prone to in-depth searching and to prize accuracy and depth of information, still fall victim to the principle of least effort.  They still satisfice with less information.  If even art historians don't research up to our high standards, how can we expect any non-librarian to?
"The cost to the user of going beyond his immediate environment may outweigh the cost of using sources that are judged inferior by other people."  I think this speaks not necessarily just to physical environment, but even more to comfort level, as the studies cited make repeated reference to "perceived" ease of use rather than just ease of use.  Some sources are difficult to judge.  If we google information, and read only the first article, our information may be a joke, an anomaly, a loud-spoken but ultimately incorrect or unvetted opinion.  It could be anything.  Similarly, people don't necessarily know which library resources and tools are liable to return the most relevant and most vetted results on a given topic.  And, having found one that works once, it's hard to convince someone to roll the dice again.  Sometimes we have a "system that makes some channels easy and others difficult to use - or difficult to perceive at all."  Some research methods are difficult to find and utilize, and so, aren't considered as options.
The Google search results display shows one trick that makes searching easier.  When I search for a business, it doesn't just take me to the business's website directly; it accumulates and presents relevant information in a predictable way.  I get the business's address, phone number, hours of operation, a map to the business, in many cases a description and reviews, and a link to the website for more information, all organized so that the information I'm most likely to need is near the top.  That's why people often use Google instead of library resources.  It's not just portability or usability; it's collocation and presentation.  Obviously that doesn't work with all information, but if we could script our search engines to refine relevancy in a similar way, people would have to put in less effort to find the depth of research necessary, and we'd have more users and more in-depth research in general.
In Bibliographic Objectives, one paragraph still has me completely befuddled:
Also in breaking with tradition, the first IFLA objective does not specify the sets of entities to be found but relegates this task to an accompanying entity-attribute-relationship model.  This is problematic from a database design point of view.  In the design of a database objectives should determine ontology and not vice versa, since for any given set of objectives, alternative models can be developed for alternative purposes.  Moreover, a statement of objectives should embody a hypostatization of user needs.  It should state just what it is that users need to find.
After some research I understand the concept of entity-attribute-relationship models.  This refers to a type of database model wherein an entity, say a book, is linked to an attribute, say its title, by a relationship, i.e., "this is its title."  The article claims that the lack of specificity in the IFLA objective is a bad thing and goes on to list specific values (title, author, etc.).  I don't see why this is necessary in a definition of objectives, or how it (according to the passage above) contradicts alternative models and purposes and the idea of objectives determining ontology.  I would think that the lack of specificity does the opposite and places more value and primacy on the objective.  We know that searching by title and author are common objectives, but I don't think limiting search methods and parameters by over-specifying presumed user needs does us any good, especially when defining principles and general objectives.
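To keep the model straight in my own head, here is a minimal sketch of the entity-attribute-relationship idea as I've described it; the book, the relationship names, and the values are invented examples rather than anything from the IFLA model itself.

```python
# Each statement links an entity to a value through a named relationship.
statements = [
    ("book-001", "has_title", "Moby-Dick"),
    ("book-001", "has_author", "Melville, Herman"),
    ("book-001", "has_subject", "Whaling--Fiction"),
]

def values_of(entity: str, relationship: str) -> list:
    """Return every value linked to the entity by the given relationship."""
    return [value for (e, rel, value) in statements
            if e == entity and rel == relationship]

print(values_of("book-001", "has_title"))  # ['Moby-Dick']
```

Notice that nothing in the structure itself dictates which attributes exist; that choice sits in the model, which is exactly why the objectives, rather than the model, seem like the right place to say what users need to find.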
Also, the word "hypostatization" is used in a way that confuses me.  As far as I'm aware, it refers to the fallacy of treating a vague concept as concrete, leading to erroneous or false assumptions.  While I believe the over-specification may be committing that particular error, it's presented here as a positive outcome for users, so perhaps I am misunderstanding the term in this context.
The Invisible Substrate of Information Science ties all these concepts together.  It is our underlying objectives and principles: our library culture.  I would argue that it is also our misconceptions and biases, such as the notion of teaching people to search more in depth and do the difficult research, versus presenting information in an easily retrievable format.  We like to research and organize information, and perceive our own value and that of others by the standard of how well they do it and how much work they're willing to put in to gather the right data.  However, that bias can lead us to labeling scholarship as lazy, or even people as lazy, because their particular passion or field does not necessitate or lend itself to the kind of in-depth research we're judging them on the basis of.  
Thus, I think the Principle of Least Effort is absolutely a necessary tool to curb the impulse to assign value to people's research methods.  As Bibliographic Objectives states, our job is to make scholarship easier, to encourage more effective and efficient scholarship, rather than requiring a specialized scholar to learn an entirely new field: that of information retrieval.