Tuesday, October 22, 2013

Some cool photographs from New York City in the 1970s







I added these images because I've always been fascinated by 1970s New York City. I think that I just love the grime and graffiti. I also love the people, the way they are dressed, and the huge 1960s and 70s cars. Some of them are actually quite funny... At any rate, I hope you enjoy them.

Nicholas Long

Monday, October 21, 2013

The future of authority



While preservation and authority are important ideals for the library/archives/museum environment to uphold, the provision of access cannot be forgotten.  Flickr Commons offers institutions an avenue to increase their collections’ accessibility as well as accessibility’s sibling, discoverability.  While some catalogers still cringe at the term “uncontrolled vocabulary,” the benefits of non-traditional methods of description like “tagging” are becoming hard to ignore.  This post will examine how “tagging” can benefit both users and librarians.
Who stands to gain more with Flickr Commons?  The clear winner is the user.  Flickr’s user interface offers the ability to both tag and comment.  Tags allow greater searchability, while discussions in the comment section can lead to better understanding of the photographs, from both a content-based and a context-based perspective.  The thing that gives “tagging” superiority over LCSH, AAT, or other controlled vocabularies is the fact that it is both for users and by users.  User-generated description means user-based access; because the terms used in “tagging” have come from those interested in the materials, there is a greater chance that a user similar to the tagger will find the image based on the tag, even if the tag is inaccurate.  The issue of “non-specialist” researchers was addressed in the Library of Congress’ report, “For the Common Good: The Library of Congress Flickr Pilot Project”:

We might also consider doing something with tags that other Flickr members have asked to have removed as inaccurate, such as “Dirigibles” and “Zeppelins” on a photo of a barrage balloon.  Because these terms could still be useful as entry vocabulary for non-specialists, it might be useful to qualify misused terms, e.g., “Dirigible (similar)” or “Dirigible (related to).” (25)

Though the terms “zeppelin” and “dirigible” may be inaccurate, users without that knowledge might never have found the proper picture simply because they did not know the proper term to search for.  Why should the users be punished for not having a vocabulary keyed to the intricacies of aeronautical terminology?
Let us continue with this last example, but consider it from the cataloger’s end.  “Controlled vocabularies” do not save their punishment for users alone; they also punish the catalogers who prepare controlled descriptions.  Imagine the cataloger who sees a picture of what he or she believes to be a dirigible, makes a note, and continues.  While on lunch break, the cataloger feels a slight pain emanating from the abdomen.  No, it is not heartburn, but worse: cataloger’s guilt.  After a quick Google search, the cataloger discovers that it wasn’t a dirigible.  The cataloger rushes back and changes the record, removing the inaccuracy.  The new record, though accurate, has cost both the cataloger’s time and that of some users.  In the world of “tagging,” this issue would have cleaned itself up, as someone would have eventually corrected the inaccuracy.  The original entry, though inaccurate, was closer to the cataloger’s instinctive vocabulary and therefore might have been closer to some users’ instinctive vocabulary as well.
Using controlled vocabularies can be very helpful to catalogers, preventing them from having to reinvent the wheel.  It has been said that the greatest thing about standards is that there are so many to choose from.  There are certainly enough controlled vocabularies to handle almost any subject, genre-form, geographical location or whatever else needs to be described by metadata, but holding any of these various cataloging languages up as “the standard” ignores the fact that all of them are a derivative of some common language.  Theoretically, language itself is the original “controlled vocabulary,” and although language has historically been more organic, it still is something that all taggers will have in common.  Will all taggers have the same level of vocabulary? No, but then again, neither will the catalogers.
Although there are plenty of good arguments for using standards, when it comes to description and user access, it is time for us to admit that these standards may be losing some of their effectiveness.  While it is important for catalogers to be aware of the many controlled vocabularies at their disposal, it is illogical to expect that our users take the time to learn these lexicons when Web 2.0 tools are making them increasingly irrelevant.  Perhaps the key word in the last sentence is “disposal.”  The future authority of catalogers may mean handing over the controls to the people who benefit from description the most: the users.

The Medium is the Message


NASA on Flickr Commons
"...Dryden pilot Neil Armstrong is seen here next to the X-15 ship #1 (56-6670) after a research flight. The X-15 was a rocket-powered aircraft 50 feet long with a wingspan of 22 feet..."

I found this image on Flickr Commons of legendary astronaut Neil Armstrong. Armstrong was the first person to set foot on the moon and took part in United States missions into space during the 1960s. Here, he is pictured quite differently from how I have generally thought of him: young and handsome. "Young" and "handsome" could even be some of the terms I would use if I were to share this picture on a social media site such as the blogging platform Tumblr or the "microblogging" site Twitter.

This week's assignment, along with the concept of sharing on social media sites, made me consider the ways in which we interact with archival material. The way we access a photograph, for example, be it physically in an archive or digitally on Flickr Commons, has tangible consequences for how we describe it.

A phrase from mass media writer Marshall McLuhan comes to mind: the medium is the message. Tagging and controlled vocabulary are inherently different because they were created when materials were accessed in two different ways. Tagging came about as a product of social media websites. Library of Congress subject headings and the system of using subdivisions came about as a product of the card catalog. The systems we use to access archival materials differ as a result of the evolution in access.

According to our readings, the solution for improving accessibility has been to implement both tagging and controlled vocabularies, because the faults of one method are countered by the other. Among the cons of tagging are that tags are inconsistent, overused, and highly subjective. The pros of tagging are that tags do not require knowledge of grammatical standards, are low cost, and allow the knowledge base to be improved by the contributions of many instead of one. Controlled vocabularies, however, provide the consistency that tagging lacks. They also adhere to grammatical standards to make searching easier and are not as subject to being overused.

The cons of controlled vocabularies have become more pronounced with the advent of social media. As some authors have admitted, tagging is simply easier than using a controlled vocabulary. Some would argue, from a social media standpoint, why use a controlled term for a photo when you can tag it? Controlled vocabularies are a proprietary enterprise that reflects the library profession of the card catalog era. Additionally, from an intellectual standpoint, controlled vocabularies throw common consensus out the window and give total control to a classification scheme which could be riddled with biases and awkward subdivisions. Basically, employing a controlled vocabulary is equivalent to asking users to learn a language before giving them access to the material, which ultimately impedes access.

In addition to this problem, another point to consider is the economic hardship produced by using a controlled vocabulary. Elaine Ménard and Margaret Smithglass put it succinctly in their 2012 article, "Digital Image Description: A Review of Best Practices in Cultural Institutions." According to Ménard and Smithglass, "a controlled vocabulary term requires not only specialized knowledge and education, but also access to the pertinent resources, which are usually subscription or fee-based and often specific to an organization" (297). Controlled vocabularies are costly on two levels. Not only do they take the labor of a highly educated employee, but they also come at a cost when it is necessary to access materials and resources related to indexing activities. Conversely, tagging is free, and it requires no maintenance.

The success of Flickr Commons speaks to the validity of tagging. In their 2008 report on the Flickr Commons Pilot Project, the Library of Congress found that "there were exceptionally few tags that fell below a level of civil discourse appropriate to such an online forum" (18). Many of the problems that the Library of Congress had anticipated might occur were not an issue. They were, in fact, surprised and impressed with the level of participation in the way of tagging. As the project has been successful for the Library of Congress so far, collections of all sizes would likewise benefit from joining the Flickr Commons.

As to what extent controlled vocabularies will be used in the future, I am not sure that we have much say in the matter. The nature of describing relies on the systems we use. Our language changes rapidly, especially as new systems for holding information evolve. Just as the card catalog has given way to the internet as a system for accessing archival materials, our descriptions will adapt. As long as the media on which we rely to access archival material evolve, so too will our regulations on description. Case in point: if someone wants to share the above photo and tag Neil Armstrong as "hot astronaut," no social media site will stop them!

Controlled Vocabularies vs. User Added Description

Ever since my first encounters with word clouds and Library Thing, I have been skeptical of the value of tagging. The tags in the Beatley OPAC record for a book I know very well really miss the mark, which immediately demonstrated to me how little the taggers understood the book. Moreover, I have never been able to forget the useless but enthusiastic “woo hoo” tag we found on the webpage for a painting on a museum website in the summer 2012 Art Documentation course. Today’s tagging assignment, with its combination of examining the Library of Congress Flickr records, having to write my own tags for the three images and then search for controlled vocabulary, and issues brought up in the readings, caused me to think more critically about tagging. While I do not embrace it wholeheartedly, I have begun to see its value. I think the best solution is the balance that author Diane Neal calls for between expert subject knowledge and more natural terminology.

The readings:
With regard to this week’s readings, Diane Neal’s “News Photographers, Librarians, Tags, and Controlled Vocabularies: Balancing the Forces” (2008) and Barbara Orbach Natanson’s “Worth a Billion Words? Library of Congress Pictures Online” (2007) are case studies (Neal’s formally so, Natanson’s less so) that both address large-volume collections. Each article cites the number of images that librarians in news organizations and the Library of Congress are dealing with, respectively, and for me those numbers alone presented an extremely persuasive case for tagging. Given that the goal of these repositories is to make their holdings available (discoverable and retrievable) to their publics, the organizations simply do not have the manpower to catalog such huge volumes of works without tagging or some form of user added description. As Elaine Ménard and Margaret Smithglass state in “Digital Image Description: A Review of Best Practices in Cultural Institutions” (2012), “Without metadata, the digital document has no real existence since it remains inaccessible” (292). Neal, furthermore, makes an important point about the need for specialized knowledge, even in the face of the staggering quantity of images news librarians must manage while meeting news editors’ demands for fast retrieval to make publication deadlines. Her article begins (in its title) and ends with a call for balance between expert subject input (from the news photographers) to create metadata and the use of more natural yet still structured terminology from the librarians. As I have been learning in Metadata this term, the importance of structure is undeniable.

Interestingly, Neal’s proposal reminded me of Tom Blake’s term “expert sourcing,” which he prefers to “crowdsourcing.” Blake, the Digital Projects Manager at the Boston Public Library (BPL), gives the example of a collection of digitized baseball cards that the BPL has on its Flickr site. No one in the Library knew enough about baseball to create a good set of search terms for the collection, so Tom asked for help from an author of several books on baseball who has done quite a lot of research at the BPL. The author, in turn, called upon an organization of amateur, yet highly knowledgeable baseball historians to help create tags for the cards, to the metadata librarian’s satisfaction.

The Library of Congress Flickr site:
After looking at the records for several images, I was not overly impressed by the general quality of the tags. For example, for a photograph of the bandleader Les Brown and Doris Day (http://www.flickr.com/photos/library_of_congress/4843125023/in/set-72157624588645784) the tags are:  “Library of Congress,” “Doris Day,” “Les Brown,” “woman,” “man,” “smile,” “1940s,” “forties,” and “blonde.” With the exception of the proper names, which are already in the title of the photo, and the tag “1940s,” the others are so broad as to be virtually useless. The terms “bandleader” or “big bands” do not even appear, and the comments blather on about how young Doris Day is. That said, I tended to find the comments on the photo pages much more interesting and helpful than the tags. In several cases the commenters provided solid information that the Library of Congress could not have supplied without the expense of staff time devoted to research. For example, for a photo of the Australian pianist and composer Percy Grainger and his mother (http://www.flickr.com/photos/library_of_congress/9954775204/ ), a member of the public signaled that information on Mrs. Grainger could be found on the Grainger Museum website, and supplied the URL. Kristi, a Prints and Photos cataloger at the Library of Congress who checks in on the comments and points out those that will be used to improve the LOC records, wrote to say that the Library will be adding information from the Grainger Museum website, based on the help provided in the comment. Another solid example of the usefulness of tagging on the LOC Flickr site is a photo set entitled “Mystery Pictures–Solved!”, the title of which says it all.

To answer the question of whether the Flickr Commons is an option for collections of all sizes, I would say that for a smaller, more manageable collection, where the librarian or curator has subject expertise and the quantity of objects is not that onerous to catalog, Flickr is probably less necessary.

The tagging worksheets:
I had tagged very few objects on the Web prior to this assignment, so, inspired by the inanity of tags I have read yet trying nonetheless to create useful search terms, I created tags without stressing too much over their precision (meaning I tried not to think like a subject cataloger). Then I used my tags to look up terms in the AAT, the TGM, and the LCSH. The photo of the Chicago artist Ron Blackburn painting a mural was the easiest to create and to find terms for, because the thesauri for artworks tend to be some of the best conceived and most precisely populated. The AAT is the most precise, the LCSH too general, and the TGM sits somewhere in between in that it supplies more subject language than does the AAT but is looser when it comes to naming techniques and materials.

What became clear to me over the course of the exercise is how stilted the language of the LCSH is, and also that despite the quantity of terms it provides, it is not exhaustive. For example, I wanted a term from a controlled vocabulary to call attention to the booths in the photo of the county fair. “Booths” on its own does not exist in the LCSH, and I did not feel that any of the narrower terms were quite right. “Exhibition booths” in the TGM is not quite right because some of the booths could have been housing games, not displays of products. And to get across the idea of game booths, the closest was “carnival games,” but a carnival is not the same as a county fair. I realized that in cases such as this, one either has to go without an LCSH subject term or select one that is not quite accurate, which makes discovery difficult. The other inconsistency among the authority files is the use of singular versus plural terms. I began to understand the use of plurals, finally, in the case of the word “painting.” In the singular, it more apparently refers to the act of painting, whereas in the plural it refers to the finished product.

"For the Common Good":  the LOC final report:
The suggestions for future tag analysis on pages 24–25 of the LOC final report seem to me both sound and highly useful. Bullet point #2, incorporating popular terms into the LCSH, such as "Rosie the Riveter" for the more cumbersome combination "Women--employment" and "World War, 1939–1945," makes a lot of sense. As for point #3, importing tags into the LC search environment, I would like to see quality control, not mass importation. Point #4, curating the tags on Flickr, particularly dropping the inaccurate or useless ones, would improve the integrity of the site, I think. Moreover, such curation could take place before a mass importation of tags into the LC search environment. Finally, it might be interesting to somehow tag the tags so that users are aware of which terms were generated by catalogers and which by the public. In any case, these two pages of the report are getting at two important areas:  (1) terminology quality control and (2) revisiting the definition of the authority file to make it a more accommodating resource.

Sunday, October 20, 2013

Describing images

From our readings and a little searching on the web, it seems that there is a general consensus in the library world that there should be a mix of controlled vocabulary terms and user-generated tags. These complement each other and help to create a more complete description of images.

Controlled vocabularies are useful because they give researchers structure that helps guide their research. This can be helpful to researchers but it is also limited by the knowledge of the person who is adding search terms. As Joan M. Schwartz explained, “The words we choose to describe what we do reflect our view of the world, the values we hold, the things of this world that we value. Yet archivists continue to employ language, sometimes based on erroneous assumptions about the nature of photographs, other times derived from concepts borrowed from other professions, which privilege some archival materials and marginalize others.” Controlled vocabularies are generally built with more academic purposes in mind.

There are a lot of benefits to using user-generated tags. User-generated tags include a much wider range of terms that allow images to be accessed by a much wider audience. This makes it so that images can have a more diverse and richer description of what is going on in pictures. Plus, user-generated tags come at no costs to libraries and archives. Further, allowing users to create tags makes working with images more interactive and gets users involved with library materials. On the other hand, there are a lot of problems that come with letting users create tags. One of the biggest problems with using user-generated tags is that people will use random search words to tag images that will lead other people to search results that have nothing to do with the search terms that they used.

I think that Flickr Commons can benefit collections of all sizes. Flickr Commons reaches a large audience. It is a socially oriented photo archive; it is interactive and fun for users. If any organization wants its collection to reach more people, using a platform that has a completely new audience will get its photos viewed by more people.

Resource list:

Schwartz, J. M. (2002). Coming to terms with photographs: Descriptive standards, linguistic “othering,” and the margins of archivy. Archivaria 54, 147.

Collaborative Access: The use of controlled and uncontrolled vocabularies

I think that the key issue is access.  From personal experience and as a student in GSLIS, I find the LCSH vocabulary counter-intuitive.  If we consider that the creation and past practice of that system predate the digital age, then we can understand and accept its usage within information science. Yet the simplicity of Google searches or Amazon purchases sets an expectation of access that is ever re-defining itself.

Within the last twenty years we have moved into what could be considered the digital age.  I see a parallel between the debate over controlled and un-controlled vocabularies and the difficulty of getting everyone on board with abandoning dedicated word processors and typewriters in favor of PCs and Macs.  Of course, as technology progressed, usage became more seamless and accessible.  The idea of “user friendliness” can apply in both instances.

The very idea of “controlled” vocabulary conjures an elitism that is no longer relevant.  I find the hierarchical structure of LCSH rooted in a system that is not collaborative or encouraging inclusion of all those that make up our population. 

“Controlled vocabulary also aids the institution in keeping their subjects politically neutral and objective.” (Andra – “Increasing Access Through Tagging”)

I really appreciate this comment because not only is it true but the LOC has been slow to recognize and change racial and gender based terminology under the guise that terms are interlinked and this would disrupt the current system. 

Yes, there are many problems with uncontrolled vocabularies: syntax, homonyms, synonyms.  I do not foresee the complete elimination of controlled vocabularies.  However, the users’ access should be the focal point of taxonomy, not the creation of complex and insular discipline-driven language.

Folksonomy is well suited for the photographic archive.  No one’s perception is the same, and the comments of multiple users about a photograph or collection would, I feel, be extremely helpful to researchers, librarians, archivists, artists and users across multiple disciplines.

Browsing Flickr Commons – I chose to look for photos on African American life (especially early 19th century) and was surprised to find a member (since 2007) who had compiled a Black History Album that was very extensive and crossed many chronologies.  This is an excellent example of creative and collaborative information sharing.  


Controlled Vocabularies vs Social Tagging

I am a fan of controlled vocabularies. The precise use of specific terms to describe an item has allowed me to find exactly what I wanted when I wanted it. My initial browsing of the Smithsonian Institution’s collection on Flickr did not sway me from the opinion that social tagging leads to chaos. I found photo after photo with numerous dubious, if not completely erroneous, tags. As an example, the photo of Marie Curie below was tagged as both “black and white” and “sepia”. Without even looking at the photo, we know one of these tags is incorrect.

Marie Curie
Creator: Transocean
Collection: Smithsonian Institution
URL: http://www.flickr.com/photos/smithsonian/2583275677/

The photo contains many other useless tags. A single user added the tags “person”, “wearing”, “parka”,  and “outside”. In isolation, the tags do not mean much, and combining them into the phrase “person wearing parka outside” is simply not true. The Nobel Laureate’s photograph was also labeled as “intense”, “old woman”, “sad”, and “upset”, all subjective labels which may not be useful for someone looking for a picture of Marie Curie.

This single photo is illustrative of all the arguments against uncontrolled vocabularies. Users do not combine similar ideas under a single term, instead using a multitude of synonyms. The terms “woman” and “scientist” are applied to this photo, as is “women in science”, but other similar, applicable terms are missing. For example, “woman in science” or “women scientists” are equally accurate, but without a controlled vocabulary, we do not have a preferred term to collect all of the images related to the idea of women in science and therefore may not discover certain photos as part of a search.

Another common problem was that some taggers did not combine words into a single term. The tag “Marie Curie” was accurately applied, but the photo was also tagged as “Marie” and “Curie” separately. Do users want this photo when they search for “Marie”?

When we combine the erroneous terms, broken compound terms, and the lack of preferred terms, I wonder how users can effectively use the system. There is no way to know if you should have searched for a synonymous term and search results are flooded with images tagged in error, leaving the user to browse through more results than necessary. All of my initial biases toward controlled vocabularies were confirmed by the tags applied to this image of Marie Curie.

But, as I explored the topic more, I began to reconsider my rigid rejection of user-supplied tags. Following the Library of Congress pilot project on Flickr, Michelle Springer, et al. reported cases where comments and tags from the user community helped catalogers improve the Library of Congress records for specific photos (25-31). In my own browsing, I found an example of users adding information to enrich the Library of Congress catalog record. The Lewis Wickes Hine collection contains an image of Nan de Gallant, a 9-year-old who worked in a sardine cannery. A user pointed out that Nan’s full name was Anna J. Gallant and provided a link to other photos of Anna. Without input from the community, Anna J. Gallant would have remained the anonymous “Nan.” As this example shows, soliciting contributions from a large, diverse community can greatly enhance our ability to provide correct, rich metadata records.


Nan de Gallant
Creator: Lewis Wickes Hine
Collection: Library of Congress
URL: http://www.flickr.com/photos/library_of_congress/7985823070

Our class tagging exercise further exposed the limits of controlled vocabularies. A photo of Ron Blackburn painting a mural seems like a simple idea to express until you attempt to apply a controlled vocabulary to it. Neither the Library of Congress Name Authority nor the Union List of Artist Names contained an authority record for Ron Blackburn. While both vocabularies have mechanisms for adding names, it was disappointing to not find the artist already listed. A cataloger is left to create an entry whether or not the entry is later added to the larger vocabulary. It was also surprisingly difficult to use the Thesaurus for Graphic Materials to express mural painting as a subject of a piece rather than a mural as a medium. I found myself sympathizing with users who find controlled vocabularies to be slow to be updated, blind in certain cultural areas, and difficult to use.

Now that I have argued for and against both sides, where does that leave us? We must find a way to balance the pros and cons of each system. First, we should use the community’s collective wisdom to improve our controlled vocabularies. As Springer et al. state in the LoC report, one suggestion is to “compare tags used by Flickr members against terms/references found in vocabulary lists used primarily to describe photos at LC like Thesaurus for Graphic Materials (TGM) or Library of Congress Subject Headings (LCSH)” (24). The specific example they give is potentially adding the term “Rosie the Riveter” to the vocabulary, which is ironic given that the image below does not use the tag, but I recognized it from an article attempting to identify the original “Rosie”.

Woman working on a "Vengeance" dive bomber
Creator: Alfred T. Palmer
Collection: Library of Congress
URL: http://www.flickr.com/photos/library_of_congress/2179038448
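
The comparison that Springer et al. suggest, checking user tags against a vocabulary list, could be roughed out as follows. This is only a sketch under my own assumptions: the function names and the sample lists are hypothetical, the "vocabulary" is a tiny stand-in for a real LCSH or TGM export, and matching is done by simple lowercasing and whitespace cleanup.

```python
def normalize(term):
    """Lowercase a term and collapse runs of whitespace so near-identical spellings match."""
    return " ".join(term.lower().split())


def compare_tags(user_tags, vocabulary_terms):
    """Split user tags into those already in the vocabulary and those that are not."""
    tags = {normalize(t) for t in user_tags}
    vocab = {normalize(t) for t in vocabulary_terms}
    return {
        "already_in_vocabulary": sorted(tags & vocab),
        "candidate_additions": sorted(tags - vocab),
    }


# Hypothetical sample data, not taken from any real catalog record.
flickr_tags = ["Rosie the Riveter", "women", "World War, 1939-1945", "bomber"]
vocabulary = ["Women", "World War, 1939-1945", "Bombers"]

report = compare_tags(flickr_tags, vocabulary)
print(report["candidate_additions"])
```

Note that "bomber" is reported as a candidate even though "Bombers" is in the sample vocabulary. Exact matching cannot see singular and plural forms as the same term, so a cataloger would still need to review the candidate list.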

Second, we should consider the voice of the community at large instead of allowing all tags to be equal. In Introduction to Metadata, Tony Gill advocates for a system where “each time an individual user labels a Web resource with a specific descriptive tag, it counts as a ‘vote’ for the appropriateness of that term for describing the resource. In this way, Web resources are effectively cataloged by individuals for their own benefit, but the community also benefits from the additional metadata that is statistically weighted to minimize the effects of either dishonesty or stupidity.” In other words, one user may erroneously state that Marie Curie is wearing a parka, but it is unlikely most people will make that mistake. We could alleviate the negative effects of these bad tags if we discount the unpopular ones.
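
The "vote" idea Gill describes might look something like the sketch below. The threshold of two votes, the function name, and the sample data are my own assumptions for illustration; neither Gill nor Flickr specifies them.

```python
from collections import Counter


def weighted_tags(tag_events, min_votes=2):
    """Count one vote per user per tag and keep tags that reach min_votes.

    tag_events is an iterable of (user, tag) pairs for a single photo.
    """
    # A set removes repeat votes, so one user cannot inflate a tag's count.
    unique_votes = {(user, tag.strip().lower()) for user, tag in tag_events}
    counts = Counter(tag for _, tag in unique_votes)
    return {tag: votes for tag, votes in counts.items() if votes >= min_votes}


# Hypothetical tagging activity on the Marie Curie photo.
events = [
    ("user_a", "Marie Curie"),
    ("user_b", "marie curie"),
    ("user_c", "Marie Curie"),
    ("user_a", "scientist"),
    ("user_c", "scientist"),
    ("user_d", "parka"),
    ("user_d", "parka"),
]

kept = weighted_tags(events)
print(kept)
```

Here "parka" is dropped because only one user applied it, even though that user applied it twice. The drawback is that a rare but accurate tag from a single expert would be discounted in the same way.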

Finally, better technology interfaces may help alleviate some of the problems. The LoC report recognizes that some of the bad tags are due to “intended word mergers to overcome system syntax requirements (real or perceived) [and] unintended de-linking of multi-word phrases and terms” (Springer et al. 23). Clearer instructions on the allowed syntax and how to enter multi-word phrases may alleviate some of the problems I found while browsing the collection.
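
To see how a multi-word phrase gets de-linked, consider a tag entry box in which spaces separate tags and double quotes hold a phrase together. That syntax is an assumption on my part for the sake of illustration, and the parser below is a hypothetical sketch, not Flickr's actual code.

```python
import re

# Match either a double-quoted phrase or a run of non-space characters.
TAG_PATTERN = re.compile(r'"([^"]+)"|(\S+)')


def parse_tags(raw):
    """Split raw tag input into tags, keeping quoted phrases whole."""
    return [quoted or bare for quoted, bare in TAG_PATTERN.findall(raw)]


# The same three ideas, entered with and without quotes.
with_quotes = parse_tags('"Marie Curie" scientist "women in science"')
without_quotes = parse_tags("Marie Curie scientist")

print(with_quotes)
print(without_quotes)
```

Without the quotes, "Marie" and "Curie" become separate tags, which is the problem I saw on the Curie photo. An interface that explained the quoting rule next to the entry box, or previewed the resulting tags before saving, would likely prevent many of these splits.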

I keep coming back to the question of what are we really trying to accomplish here. We all want to be able to find the resources that satisfy our particular needs. We should not be getting caught up in defending one side or another in the war of controlled versus uncontrolled vocabularies. Instead, we should be taking the best of both systems to satisfy the users’ needs.

Springer, Michelle, et al. “For the Common Good: The Library of Congress Flickr Pilot Project”. October 30, 2008. <http://www.loc.gov/rr/print/flickr_report_final.pdf>

Gill, Tony, et al. Introduction to Metadata. 3rd ed. Los Angeles: Getty Publications, 2008. <http://www.getty.edu/research/publications/electronic_publications/intrometadata/index.html>







Tagging Photographs

Tagging Photographs: The pros and cons of uncontrolled vs. controlled vocabularies

The purpose of a tag in archival photographs is to bring together a topic or subject of interest across a variety of images. In the Web 2.0 world a tag is like a metaphorical thread, stitching together things that may have been otherwise unconnected. I do not personally believe there is such a thing as "too much tagging," and if a picture is truly worth a thousand words, there is an incredible amount of possibility for linked associations among images. However, opening up to these seemingly endless possibilities also leaves room for completely irrelevant tags. One should not tag, for example, a picture of fruit with the word "socks". Tagging is a wonderful thing, and like most wonderful things, it can be easily abused.

Using controlled vocabularies is not necessarily the answer to avoiding irrelevance, but it is more likely those who use them are professionals, and/or knowledgeable about the subject matter. A pre-established, controlled vocabulary will also allow for greater uniformity in categorizing and searching.
The biggest problem with controlled vocabularies is the average user, who is not familiar with the terms information professionals use and would likely not think to search with them, let alone use them as tags. If users do not know the proper term to search for what they want, they may miss something that could have been particularly valuable to their research.
Comparatively, if tagging is open to users and the vocabulary is uncontrolled, as seen in the LOC Commons, the odds of people finding what they need can greatly increase.
I believe the professionals should always begin by providing their own tags to images, but allow users to add to them in order to diversify the search process. Great minds do not always think alike and some people may be looking at a picture from a completely different perspective.
If socks are tagged in a picture of fruit, does it really impede access? I believe not. The original posters of the image should have supplied relevant tags, and users would likely have followed suit. If some users add inappropriate tags, whether intentionally or not, administrators can remove them. I found a picture of a cattle ranch on the Commons that carried the tag "dive". Someone had already commented that this seemed an odd tag, and I wondered if it was meant to say "drive," as in "cattle drive." However, this instance would do nothing to impede my search for ranch photographs. I imagine someone searching for a "dive" might find the inclusion of this photo puzzling, but again, this is not enough to outweigh the positives of uncontrolled vocabularies.

People make mistakes, and others are intentional troublemakers. But I believe we, as archivists and historians, should have a little more faith in the general public when it comes to accepting help with our collections. We never know what kind of information we may be given and how many new doors may be opened for us. The Commons project has shown that people are eager and willing to participate.

Saturday, October 19, 2013

Increasing access through tagging

When trying to assign subject terms to items, the librarian has to find a balance between control and access, between specificity and broadness, between imposing meaning and allowing users to find their own meaning, between institutional viewpoints and user interpretation, between objectivity and subjectivity. It is a hard balance to achieve, and a blended approach between controlled terms imposed by the library and uncontrolled tags submitted by viewers may be best. Controlled and uncontrolled vocabularies each have pros and cons, and ideally, they could both be used to balance the faults of the other.

Controlled vocabularies, such as Library of Congress Subject Headings and terms from the Getty Art and Architecture Thesaurus, aim to limit subject access to specific and uniform terms. They eliminate most redundancy, and they limit syntax and vocabulary so that all users are using the same terms. If the subject term is known by the user, then finding items that fit that topic is easy, because all items will be tagged with the same exact term, with no variations in spelling or punctuation or phrasing. Controlled vocabulary also aids the institution in keeping its subjects politically neutral and objective. The terms were chosen by another organization, and any potential choices about political correctness of terminology fall on the organization that developed the vocabulary.

On the other hand, controlled vocabularies often deviate from natural language, as in the order of names when searching. Users who search using natural language will search for Firstname Lastname, while many controlled vocabularies list the name as Lastname, Firstname. Controlled vocabularies also eliminate synonyms and redundancy, both of which can aid access. Users may not know the exact term and may search for a similar but different word (say, they search for “bubbler” instead of “water fountain”, or for “dress” instead of “dresses”). Users might then be frustrated that their search returns no results. Lastly, controlled vocabularies limit input and access for many users who do not fit the “default” demographic. The controlled vocabularies are often created by white institutions of power, and may not reflect the voice of everyone in society. The Library of Congress’s headings for social issues like feminism, racism, and the struggles of LGBTQ people may not use the terms that the respective people would want used. They represent the viewpoint of only one social group. Controlled vocabularies also limit access by international users. The lack of redundancy and similar terms means that international users who are searching for pictures of trucks will get no hits on their search term “lorry”. The internet is an international forum, and it should be accessible by a varied and international community. Controlled vocabularies alone do not allow this.
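One way to picture the gap between natural language and controlled terms is as a small "entry vocabulary" lookup that sits in front of the catalog. The sketch below is only an illustration under my own assumptions: the dictionary entries, the preferred headings, and the function names `to_controlled_term` and `invert_name` are hypothetical and are not taken from LCSH, AAT, or TGM.

```python
# Hypothetical entry vocabulary: what users type -> the catalog's preferred term.
ENTRY_VOCABULARY = {
    "lorry": "trucks",
    "lorries": "trucks",
    "truck": "trucks",
    "bubbler": "water fountains",
    "water fountain": "water fountains",
    "dress": "dresses",
}


def to_controlled_term(query: str) -> str:
    """Map a natural-language query to a preferred term when one is known.

    Unknown queries are returned unchanged (lowercased, whitespace collapsed).
    """
    key = " ".join(query.lower().split())
    return ENTRY_VOCABULARY.get(key, key)


def invert_name(name: str) -> str:
    """Turn 'Firstname Lastname' into 'Lastname, Firstname'.

    Names that already contain a comma, or that have only one part, are
    returned as they are. Real personal names are far messier than this.
    """
    if "," in name:
        return name.strip()
    parts = name.split()
    if len(parts) < 2:
        return name.strip()
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(to_controlled_term("lorry"))
print(invert_name("Jane Doe"))
```

The weakness is plain from the code: someone has to anticipate every variant and type it into the table in advance. Any term nobody thought of, in any dialect or language, still returns nothing.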

Uncontrolled vocabularies offset many of the downsides of controlled vocabularies. Uncontrolled vocabularies greatly improve access by allowing users to tag images with colloquial terms. Uncontrolled vocabularies have redundancy, so different but similar searches will still return results. Natural language searches become much easier, as do searches with various spellings and syntax. For instance, while the “official” spelling of the term for the Korean dancing girls is “kisaeng,” many people spell it “gisaeng,” and if the image were tagged with both, then both groups of people would be able to find it. While controlled vocabularies often try to remain detached and objective, assigning subjects only to what the cataloger considers the most important parts of the image, uncontrolled vocabularies and social tagging allow people to apply their subjective views to the image. Different social and political groups can use their own terms for issues. Small side objects in pictures can be highlighted by a tag, such as the musical instrument in the corner of the weaving picture that was mentioned in the LoC report. While it may not be the “point” of the image, uncontrolled vocabularies and user tagging would allow people looking for pictures of that particular instrument to still find the image. Uncontrolled vocabularies greatly increase the accessibility of an item by giving equal voice to all interpretations and objects in the image, in whatever terms the users find best to describe them.

This is not to say that uncontrolled vocabularies have no downsides. They have no rules about spelling or syntax, or about whether to use a singular or plural. They allow almost infinite combinations of terms and infinite broadness and specificity. No image can truly be tagged with every single variation of a term, so inevitably some tags will be missing from some images. Non-thorough taggers may tag an image “dress” but forget “dresses,” and the searcher would then have the same problem as with controlled vocabularies. Another problem is controversy. Users at either end of a political spectrum may disagree with the other’s terminology, and may complain about the other viewpoint. This places a burden on the librarian to decide whether to give both views equal space, or to try to judge the political correctness of each point of view and thereby make a moral, social, and political statement on behalf of the institution.

Ultimately, I think the best solution would be to use both sets of terms. Let controlled vocabulary terms be added to the tags list, and then allow users to fill in the rest on their own. The user tags will allow better access and variety, and nothing on the internet is worth anything if it is not accessible by a wide audience. Also, by having both sets of terms, the cataloger can make sure that the areas the institution wants highlighted are definitely available as access terms, while keeping the flexibility to allow more terms. There are always things in pictures that the cataloger won’t see or won’t think are important that others would. These items that the users see can then be new access points for others. The wonderful thing about tagging on platforms like Flickr is that the number of access points is greatly increased. Controlled vocabularies are rooted in the days of card catalogs, when access was limited by the space that the cards took up. With more space, we can create more access points, and we can better describe the object by allowing users to help us create the access points that they want. It may not be as consistent as just using controlled vocabularies, and there may be items that don’t get tagged effectively for access. But the increase in access from uncontrolled vocabularies more than offsets the few items that fall through the cracks.
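As a rough sketch of what "both sets of terms" could look like in practice, the toy index below stores cataloger headings and user tags side by side and searches them together. Everything here is an assumption for illustration: the class `PhotoIndex`, its methods, and the photo identifier are hypothetical, and the sample terms are borrowed from the "Rosie the Riveter" example in the Library of Congress report.

```python
from collections import defaultdict


class PhotoIndex:
    """Toy access-point index that keeps controlled terms and user tags together."""

    def __init__(self) -> None:
        # normalized term -> photo id -> set of sources ("cataloger", "user")
        self._index: dict[str, dict[str, set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    @staticmethod
    def _normalize(term: str) -> str:
        return " ".join(term.lower().split())

    def add_terms(self, photo_id: str, terms: list[str], source: str) -> None:
        """Record each term as an access point, remembering who supplied it."""
        for term in terms:
            key = self._normalize(term)
            if key:
                self._index[key][photo_id].add(source)

    def search(self, query: str) -> list[str]:
        """Return ids of photos carrying the term, whoever supplied it."""
        key = self._normalize(query)
        if key not in self._index:
            return []
        return sorted(self._index[key])


index = PhotoIndex()
index.add_terms(
    "photo-1", ["women--employment", "World War, 1939-1945"], source="cataloger"
)
index.add_terms("photo-1", ["Rosie the Riveter"], source="user")

print(index.search("rosie the riveter"))
print(index.search("Women--Employment"))
```

Because the source of each term is recorded, the institution's headings stay identifiable and intact while the user tags simply widen the set of terms that lead to the same photograph.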

As for whether the Flickr Commons could benefit institutions of all sizes, I say yes. It increases the visibility of the collections and therefore of the institutions. It allows increased access to valuable material that can then be used by the internet community at large. Smaller institutions can better lobby for local support, as well as attract the attention of historians and researchers from the wider audience who can help identify items and add meaning and context to them. Institutions of all sizes can benefit from the increased name awareness and traffic.

Uncontrolled vs. Controlled Vocabularies

The Web 2.0 movement has had a profound impact on digital libraries and archives. Now it is possible for users to assist archivists and librarians with the daunting task of imposing order on digital collections via the creation of "tags". This phenomenon has become especially prevalent on photo-sharing websites such as Flickr and Picasa. Currently, there is an ongoing debate over whether or not user tagging actually improves access to collections.

The pilot project of the Library of Congress sought to address this issue by opening up a digital collection on Flickr. The results of the pilot suggest that the LOC's adoption of Web 2.0 technology has been a success. However, there are still concerns regarding the efficacy of "tagging" which will continue to be discussed and explored. Here, I will discuss my own views on this matter through an analysis of the pros and cons of uncontrolled vocabularies such as user-generated tags, and controlled vocabularies such as the Library of Congress Subject Headings.

Above all, I think that uncontrolled vocabularies are great because they allow a myriad of different terms to be applied in the representation of a single object. Ultimately, this leads to a more holistic and realistic description, as no object can be sufficiently described through the eyes of a single person or even a group of so-called experts. Furthermore, an element of dynamism is added by allowing users to continue to generate new tags into the future. In addition, there is the practical aspect of uncontrolled vocabularies, which take the huge burden of meticulously assigning metadata and shift it into the hands of the public, thus freeing up a lot of time for archivists and librarians.

That being said, there are certainly some negative aspects of using uncontrolled vocabularies. First and foremost, limitless freedom is not as good as it may seem. If, for example, an image can be described in an infinite number of ways, and thus tagged in an infinite number of ways, then ultimately you will have an infinitely broad collection of terms that do not refer to any unique objects. Therefore, some degree of order must be imposed so that the objects in a collection retain their individual meaning. This may not reflect reality, but for the sake of living in a rational world, some degree of order is necessary.

I have just described the primary argument for using controlled vocabularies, but I will add to it by emphasizing the importance of assigning very few but precise terms to objects. This allows for greater control of the collection, making it easier to manage and present to the user in a meaningful way that reflects an institution's unique values.

As for the cons of using controlled vocabularies, the first thing that comes to mind is the fact that they require users to be familiar with the accepted search terms. In a digital environment run by Google, this is difficult for contemporary users to accept. As a user myself, I can relate to this. We have the technology to make our collections more transparent and accessible, so to continue using only traditional controlled vocabularies seems like a stubborn refusal to accept change. However, I do not think that we should abandon controlled vocabularies completely. Certainly, in conducting this analysis I have realized the importance of controlled vocabularies in terms of collection management.

Therefore, what I think is needed if more institutions decide to use Web 2.0 software to share their digitized collections is a balance between controlled and uncontrolled vocabularies. In practice this might mean that users can still create tags freely, but that there will be a stricter review process to filter out submissions that would impede access. Of course, more research and experimentation are required in order to see if this is actually a practical and/or beneficial solution.

Thursday, October 17, 2013

Tagging reflections

I feel that the tagging exercise gave me a new perspective on the merits of controlled vs. uncontrolled vocabularies. I decided to assign all of my tags before looking at what AAT and TGM had to offer, trying to select words that were relevant to the subject of the photograph – words that I would use when trying to locate not only these specific photographs, but when trying to find a larger group of images into which these photographs would fall. Browsing the Flickr Commons collections, I had searched for a number of place names, so these figured heavily into the tags I assigned. I was surprised, therefore, to find that AAT and TGM do not include geographic terms in their controlled vocabularies. I feel that many users of archival photo collections want to locate photos of a specific area, and the lack of those terms in controlled vocabularies would be frustrating to them. I found the controlled vocabularies more useful when it came to concepts, rather than proper names and locations – I was unsure of how best to represent the ride seen in the Delta County Fair photo or the charity in Distributing Surplus Commodities, and was pleased that AAT and TGM had the concrete term “amusement rides,” and that AAT’s “charities (nonprofit organizations)” and TGM’s “charitable organizations” were more specific than what I had come up with.

Because of this experience, I believe that controlled vocabularies and user-submitted tags should work hand-in-hand, rather than selecting one option at the expense of the other. I felt that the suggestion in the Library of Congress report of incorporating popular concepts into controlled vocabularies was an excellent one. If our main goal is – and it ought to be – facilitating user access to photographic collections, we should be describing our holdings with terms that users will be searching with. The fact that users of the Flickr Commons collections have added useful information to a staggering number of photos shows that user input is beneficial for collection description and access, even if that input is not strictly controlled. I think that the Commons option could benefit collections of all sizes, and might be especially helpful for small repositories both to gain visibility for their collections and receive user input for collections that a smaller staff may not have time to research and describe. Although user tags are not always accurate, and may be redundant or misspelled, I think that they ultimately aid access much more than they impede it. The example from the Library of Congress report, of 73 users tagging a photo “Rosie the Riveter” when the LC headings were  “women—employment” and “World War, 1939-1945,” makes this particularly clear. While a serious historian may use such controlled terms, the more casual user of a photo collection would certainly search “Rosie the Riveter” before either of those options and be disappointed when LCSH returned no results. On the other hand, controlled terms for more abstract concepts, or ones that could be represented by a wide variety of terms, make much more sense than allowing a photo to be tagged with multiple redundant terms. 
Creating a generalized guide to tagging for Flickr users (on topics such as variant terms and plural forms) could help users (especially the “power taggers” described in the LOC report) assign the terms that will be the most helpful for future searchers. If we are to truly make user access to our photographic holdings a top priority, we should pay attention to how users search and modify our terms to fit their needs.