Comparing Full-Text Searching with Traditional Library Tools

Throughout these discussions of full-text searching and controlled vocabulary, it is important to understand that neither is better than the other. Each has its own strengths and weaknesses. It is up to the user to realize that the two work together, and that each can offset many of the other's weaknesses, provided the user knows how to search both correctly.

  • Remember, you can search most library databases (including AUR's) with Boolean operators, but by doing so you more or less bypass the controlled vocabulary. You are also searching only the bibliographic records and not the full text.

Strengths of Controlled Vocabulary
As we noted before, searching controlled vocabulary allows people to search for entire concepts. When those concepts are listed in a hierarchical or other conceptual arrangement that a user can browse, retrieval improves further. For example, a person who searches Tyrants in the LC Authority File will find that this is not the authorized term: they must search under Dictators. Looking at Dictators and its cross-references, the user finds Despotism as a See also term. In this way, someone who thinks of tyrants can get to the term despotism, which may be the concept they really want.

Even when an individual item has not been assigned the precise term, the authority files lead the user to explore related materials in an organized manner.
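The Tyrants example above can be sketched in a few lines of Python. This is only a toy model, assuming an authority file can be reduced to two hypothetical tables (`USE` for non-preferred terms and `HEADINGS` for See also references); real authority records carry much more information than this.

```python
# Toy authority file built from the example in the text.
# USE maps a non-preferred term to the authorized heading;
# HEADINGS maps each authorized heading to its "See also" terms.
# Both tables and their names are hypothetical.
USE = {"tyrants": "Dictators"}
HEADINGS = {
    "Dictators": ["Despotism"],
    "Despotism": [],
}

def resolve(term):
    """Return (authorized heading, see-also list) for a user's term."""
    key = term.strip().lower()
    # Follow the "use" reference when the term is not itself authorized.
    heading = USE.get(key)
    if heading is None:
        # Otherwise accept the term only if it is a known heading.
        matches = [h for h in HEADINGS if h.lower() == key]
        if not matches:
            return None, []
        heading = matches[0]
    return heading, HEADINGS[heading]

heading, related = resolve("Tyrants")
print(heading, related)
```

A user who types Tyrants is sent to Dictators and is shown Despotism as a related concept, without the word "tyrant" having to appear in any item.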

Weaknesses of Controlled Vocabulary
Assigning controlled vocabulary is a labor-intensive task. It takes experts cooperating around the world to create these terms and assign them to individual materials. Because it is so labor-intensive, controlled vocabulary is reserved for the most general concepts of an item, and many "bits of information" inside a text cannot be included. For example, a book may be on Italian sculpture, but there may be a nice couple of pages about life in Florence during the time of the Medici. Those pages will not appear in the controlled vocabulary and are effectively lost to the searcher.

Another consequence of the labor-intensive task is that it takes time for the records to be made and added to a catalog, so materials may be unavailable to users for a time. But perhaps the major weakness of controlled vocabulary is that many users find it more difficult to use. Searching the authority files is an extra task right at the beginning. The time saving for the user appears at the end, after the concept has been found and all the materials on the topic are grouped together. (See the example of searching Dogs in the section Traditional Bibliographic Tools.) There are also not nearly enough cross-references.

The authority files and bibliographic records do not work together very well yet. It needs to be easier to search both at once.

Strengths of Full-Text Searching
Many of the strengths of full-text searching compensate for the weaknesses of controlled vocabulary. (See Full-Text and Search Engines.) First, it is immediate: the moment an item is online, it can be searched. Another strength is that the little "bits of information" that otherwise went missing can be searched and found. With the power of the Boolean operators, highly sophisticated and subtle searches can be made. Since the entire texts of items are online, many more terms are available for searching, and as a result far more materials come up immediately with a simple search.

Weaknesses of Full-Text Searching

There are just as many weaknesses in full-text searching as in controlled vocabulary. One is that full-text searches match text instead of concepts, so if a word is missing from the text, the item cannot be found. It can be very difficult to realize that something is missing from the results of a full-text search. For example, a user who wants materials on "World War I" might search "WWI" in Google and get many results (as of this writing, 5,890,000). Users may be happy with this result until they realize that such a search will not retrieve primary sources written at the time, since no one during the war called it World War I; that name came into use only after World War II began in 1939. If no one has included WWI in the text, the item is effectively lost to the user. There are many, many examples of these sorts of extremely subtle problems.
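A minimal sketch of the WWI problem follows. The two documents are invented for illustration, and the phrase "the Great War" stands in for the kind of wording a writer of the period would have used; `full_text_search` is a hypothetical helper, not the method of any real search engine.

```python
import re

# Hypothetical two-document collection, invented for illustration.
DOCS = {
    "modern_history": "A survey of WWI and its causes.",
    "letter_1916": "News from the front: the Great War drags on.",
}

def full_text_search(word):
    """Return ids of documents whose text contains the word or phrase."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sorted(doc_id for doc_id, text in DOCS.items() if pattern.search(text))

# The search matches strings, not concepts, so the 1916 letter
# is not returned for "WWI" even though it is about the same war.
print(full_text_search("WWI"))
print(full_text_search("Great War"))
```

The searcher who sees only the first result list has no sign that the letter exists, which is exactly why this kind of gap is so hard to notice.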

Another problem is that the more powerful searches require a good knowledge of Boolean logic, which can be very difficult to master. For example, using the NOT operator without due consideration can eliminate many items you really want.
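The danger of NOT can be shown with ordinary set operations. The records below are hypothetical: a searcher interested in Rome, Italy tries to exclude a different Rome by searching "rome NOT georgia".

```python
# Hypothetical records: id -> set of words appearing in the text.
RECORDS = {
    1: {"rome", "italy", "history"},
    2: {"rome", "georgia", "history"},
    3: {"rome", "italy", "georgia", "comparison"},
}

def having(word):
    """Return the ids of all records that contain the word."""
    return {rid for rid, words in RECORDS.items() if word in words}

# rome NOT georgia: set difference removes every record that
# mentions "georgia" anywhere, whatever else it is about.
result = having("rome") - having("georgia")
print(sorted(result))
```

Record 3 discusses Rome, Italy as well, but because it also mentions the excluded word it disappears from the results along with record 2.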

Other, technical questions exist as well. The search language and methods are not standardized. Although most search boxes today do not require you to put the Boolean operator between terms, it is important to know what the "default" operator is. Normally it is AND, but sometimes it is OR. For example, typing "rome italy" is handled in Google as "rome AND italy", but in another database it may be "rome OR italy". Truncation symbols also vary a lot: *, ?, #, $. In some databases these symbols each mean something different; in others they do not.
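The effect of the default operator and of truncation can be sketched as follows. The records are hypothetical, and the choice of `*` as the truncation symbol is an assumption made for this example; as noted above, a real database may use a different symbol.

```python
from fnmatch import fnmatchcase

# Hypothetical records: id -> words appearing in the text.
RECORDS = {
    "a": ["rome", "italy"],
    "b": ["rome", "georgia"],
    "c": ["italy", "romans"],
}

def search(query, default_op="AND"):
    """Match space-separated terms; '*' in a term acts as truncation."""
    terms = query.lower().split()
    # AND requires every term to match; OR requires at least one.
    combine = all if default_op == "AND" else any
    return sorted(
        rid
        for rid, words in RECORDS.items()
        if combine(any(fnmatchcase(w, t) for w in words) for t in terms)
    )

print(search("rome italy"))                   # default AND
print(search("rome italy", default_op="OR"))  # default OR
print(search("rom* italy"))                   # truncated first term
```

The same two words return one record under AND and all three under OR, so a searcher who does not know the default cannot tell whether a long result list reflects a rich collection or simply a looser operator.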

Advantages of All
All of this may seem unduly complicated, but before computers, people had no option other than to use the traditional bibliographic tools and the methods described there. Boolean logic had no role in library searching, because there was no place to use Boolean operators. When full-text searching was introduced, it was seen as a tremendous advantage to be able to go beyond the old ways.

The old ways have not been made obsolete, however. What's nice is that many of the strengths and weaknesses of one type of searching are balanced by the other. It is important for searchers to understand that both types of searching are necessary today.

For a good overview of how to search full-text databases, see: Power Searching For Anyone



Latest page update: made by j.weinheimer, Aug 21 2007, 7:38 AM EDT