Monday, October 21, 2013

Controlled Vocabularies vs. User Added Description

Ever since my first encounters with word clouds and LibraryThing, I have been skeptical of the value of tagging. The tags in the Beatley OPAC record for a book I know very well miss the mark so badly that they immediately showed me how little the commenters understood the book. Nor have I been able to forget the useless but enthusiastic “woo hoo” tag we found on the web page for a painting on a museum website during the summer 2012 Art Documentation course. Today’s tagging assignment made me think more critically about tagging. It combined examining the Library of Congress Flickr records, writing my own tags for three images and then searching for matching controlled vocabulary terms, and working through the issues raised in the readings. While I do not embrace tagging wholeheartedly, I have begun to see its value. I think the best solution is the balance Diane Neal calls for between expert subject knowledge and more natural terminology.

The readings:
With regard to this week’s readings, Diane Neal’s “News Photographers, Librarians, Tags, and Controlled Vocabularies: Balancing the Forces” (2008) and Barbara Orbach Natanson’s “Worth a Billion Words? Library of Congress Pictures Online” (2007) both address large-volume collections; Neal’s article is a formal case study, while Natanson’s treatment is less formal. Each article cites the number of images that librarians in news organizations and at the Library of Congress, respectively, are dealing with, and for me those numbers alone made an extremely persuasive case for tagging. Given that the goal of these repositories is to make their holdings available (discoverable and retrievable) to their publics, the organizations simply do not have the staff to catalog such huge volumes of works without tagging or some other form of user-added description. As Elaine Ménard and Margaret Smithglass state in “Digital Image Description: A Review of Best Practices in Cultural Institutions” (2012), “Without metadata, the digital document has no real existence since it remains inaccessible” (292). Neal also makes an important point about the need for specialized knowledge, even in the face of the staggering quantity of images news librarians must manage while meeting editors’ demands for fast retrieval to make publication deadlines. Her article begins, in its very title, and ends with a call for balance between expert subject input from the news photographers to create metadata and the use of more natural yet still structured terminology from the librarians. As I have been learning in my Metadata course this term, the importance of structure is undeniable.

Interestingly, Neal’s proposal reminded me of Tom Blake’s term “expert sourcing,” which he prefers to “crowdsourcing.” Blake, the Digital Projects Manager at the Boston Public Library (BPL), gives the example of a collection of digitized baseball cards that the BPL has on its Flickr site. No one at the Library knew enough about baseball to create a good set of search terms for the collection, so Blake asked for help from an author of several books on baseball who has done a great deal of research at the BPL. The author, in turn, called upon an organization of amateur yet highly knowledgeable baseball historians to help create tags for the cards, to the metadata librarian’s satisfaction.

The Library of Congress Flickr site:
After looking at the records for several images, I was not overly impressed by the general quality of the tags. For example, for a photograph of the bandleader Les Brown and Doris Day (http://www.flickr.com/photos/library_of_congress/4843125023/in/set-72157624588645784), the tags are “Library of Congress,” “Doris Day,” “Les Brown,” “woman,” “man,” “smile,” “1940s,” “forties,” and “blonde.” Apart from the proper names, which already appear in the title of the photo, and the tag “1940s,” the tags are so broad as to be virtually useless. Neither “bandleader” nor “big bands” appears, and the comments blather on about how young Doris Day is. That said, I tended to find the comments on the photo pages much more interesting and helpful than the tags. In several cases the commenters provided solid information that the Library of Congress could not have supplied without devoting expensive staff time to research. For example, on a photo of the Australian pianist and composer Percy Grainger and his mother (http://www.flickr.com/photos/library_of_congress/9954775204/), a member of the public pointed out that information on Mrs. Grainger could be found on the Grainger Museum website and supplied the URL. Kristi, a Prints and Photographs cataloger at the Library of Congress who monitors the comments and flags those that will be used to improve the LOC records, replied that the Library would add information from the Grainger Museum website based on that comment. Another solid example of the usefulness of public contributions on the LOC Flickr site is a photo set entitled “Mystery Pictures–Solved!”, the title of which says it all.

As to whether the Flickr Commons is an option for collections of all sizes, I would say that for a smaller, more manageable collection, where the librarian or curator has subject expertise and the number of objects is not too onerous to catalog, Flickr is probably less necessary.

The tagging worksheets:
I had tagged very few objects on the Web before this assignment. Inspired by the inanity of many tags I have read, yet still trying to create useful search terms, I wrote my tags without stressing too much over their precision (that is, I tried not to think like a subject cataloger). Then I used my tags to look up terms in the Getty Art & Architecture Thesaurus (AAT), the Library of Congress Thesaurus for Graphic Materials (TGM), and the Library of Congress Subject Headings (LCSH). The photo of the Chicago artist Ron Blackburn painting a mural was the easiest to tag and to find controlled terms for, because the thesauri for artworks tend to be among the best conceived and most precisely populated. The AAT is the most precise, the LCSH is too general, and the TGM sits somewhere in between: it supplies more subject language than the AAT does but is looser when it comes to naming techniques and materials.

What became clear to me over the course of the exercise is how stilted the language of the LCSH is, and also that despite the number of terms it provides, it is not exhaustive. For example, I wanted a controlled vocabulary term to call attention to the booths in the photo of the county fair. “Booths” on its own does not exist in the LCSH, and none of the narrower terms seemed quite right. “Exhibition booths” in the TGM does not fit either, because some of the booths could have housed games rather than displays of products. To convey the idea of game booths, the closest term was “carnival games,” but a carnival is not the same as a county fair. I realized that in cases such as this, one must either go without an LCSH subject term or choose one that is not quite accurate, and either way discovery suffers. Another inconsistency among the authority files is the use of singular versus plural terms. I finally began to understand the use of plurals in the case of the word “painting.” In the singular, it more clearly refers to the act of painting, whereas in the plural it refers to finished works.

"For the Common Good": the LOC final report:
The suggestions for future tag analysis on pages 24–25 of the LOC final report seem to me both sound and highly useful. Bullet point #2, incorporating popular terms into the LCSH, such as "Rosie the Riveter" for the more cumbersome combination "Women—employment" and "World War, 1939–1945," makes a lot of sense. As for point #3, importing tags into the LC search environment, I would like to see quality control, not mass importation. Point #4, curating the tags on Flickr, particularly by dropping the inaccurate or useless ones, would, I think, improve the integrity of the site. Moreover, such curation could take place before any mass importation of tags into the LC search environment. Finally, it might be interesting to somehow tag the tags so that users would know which terms were generated by catalogers and which by the public. In any case, these two pages of the report point to two important areas: (1) terminology quality control and (2) revisiting the definition of the authority file to make it a more accommodating resource.
