From this result, the computer can relate the true sets in various ways, through the operators:
AND, OR and NOT
Trajan AND Portraits: AND reduces the number of records to only those that contain both terms.
Trajan OR Portraits: OR increases the number of records to include those that contain either word.
Trajan NOT Portraits: NOT reduces the number of records by excluding those that contain the second word.
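The three operators correspond to ordinary set operations on the TRUE sets. The following is a minimal sketch in Python, assuming each TRUE set is simply a set of record numbers; the record numbers themselves are invented for illustration.

```python
# Hypothetical TRUE sets: the record numbers that contain each word.
trajan = {1, 2, 3, 4}
portraits = {3, 4, 5, 6}

both = trajan & portraits      # AND: records that contain both words
either = trajan | portraits    # OR: records that contain either word
without = trajan - portraits   # NOT: records with trajan but without portraits

print(sorted(both))     # [3, 4]
print(sorted(either))   # [1, 2, 3, 4, 5, 6]
print(sorted(without))  # [1, 2]
```

AND can only shrink the set, OR can only grow it, and NOT removes records from the first set.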
In this way, any number of
TRUE sets can be combined, e.g.
trajan and portraits and rome. Since this is based on mathematics, it can get rather tricky if you mix the operators together. For example, if you want to add
"rome or roman" to the search, then
trajan and portraits and rome or roman
will give you different results depending on how the terms are grouped. Do you want:
trajan and portraits and (rome or roman)
which limits the previous search to all records with either of the words Rome or Roman

or
(trajan and portraits and rome) or roman
which opens the search to include anything with the word Roman in it

As we see, the results are quite different. This is similar to 1 + 2 x 3, which can give two answers depending on how the terms are grouped:
(1 + 2) x 3 = 3 x 3 = 9
1 + (2 x 3) = 1 + 6 = 7
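The same grouping problem can be reproduced with sets. This is only a sketch with invented record numbers. Python happens to evaluate `&` (AND) before `|` (OR); a particular catalog or database may group ungrouped operators differently, which is why explicit parentheses are safer.

```python
# Hypothetical TRUE sets of record numbers for each search word.
trajan = {1, 2, 4}
portraits = {1, 2}
rome = {1, 4}
roman = {2, 3}

# trajan and portraits and (rome or roman)
narrow = trajan & portraits & (rome | roman)

# (trajan and portraits and rome) or roman
broad = (trajan & portraits & rome) | roman

# No parentheses: Python applies & first, so this equals the broad search.
ungrouped = trajan & portraits & rome | roman

print(sorted(narrow))      # [1, 2]
print(sorted(broad))       # [1, 2, 3]
print(ungrouped == broad)  # True
```

Record 3 contains only the word roman, so it appears in the broad search but not in the narrow one.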
If you want to do these kinds of searches, you should either run
separate searches, e.g.
"trajan AND portraits AND rome" and then another search,
"trajan AND portraits AND roman", or ask for help.
Some extensions have been added to these operators, such as
exact phrase, e.g.
"white house" does not retrieve
"the house is white," and
near, e.g.
"dante NEAR3 comedy" would limit the search to the words
dante and
comedy within three words of one another.
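Exact-phrase and proximity searching can be sketched as follows. This is an illustration, not how any particular catalog implements them: the function names are hypothetical, and "within three words" is assumed here to mean that the word positions differ by at most three. Real systems may count differently.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    """True if the words of `phrase` appear in `text` consecutively and in order."""
    words, target = tokenize(text), tokenize(phrase)
    last_start = len(words) - len(target)
    return any(words[i:i + len(target)] == target for i in range(last_start + 1))

def near(text, first, second, distance):
    """True if `first` and `second` occur at most `distance` positions apart."""
    words = tokenize(text)
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= distance
               for i in first_positions for j in second_positions)

print(contains_phrase("Tours of the White House", "white house"))  # True
print(contains_phrase("The house is white", "white house"))        # False
print(near("The Divine Comedy by Dante", "dante", "comedy", 3))    # True
```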
There are various types of
truncation, that is, a way of searching for
multiple word endings at once (for example,
fascis* retrieves
fascist, fascists, fascism, etc.), and
fuzzy searching, i.e. inexact searches that retrieve information that
comes close to the desired search (this allows for
spelling errors).
- Not all computer systems offer all of these searches.
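Truncation and fuzzy searching can both be tried out with Python's standard library. The sketch below assumes a tiny invented word list; `fnmatchcase` stands in for truncation and `difflib.get_close_matches` for fuzzy matching. Real systems use their own, usually more elaborate, methods.

```python
import difflib
from fnmatch import fnmatchcase

# Hypothetical index vocabulary.
vocabulary = ["fascism", "fascist", "fascists", "fashion", "portrait"]

def truncation_search(pattern, words):
    """Return the words matching a pattern such as 'fascis*'."""
    return [w for w in words if fnmatchcase(w, pattern)]

def fuzzy_search(term, words, cutoff=0.8):
    """Return words that are close to `term`, best match first."""
    return difflib.get_close_matches(term, words, n=5, cutoff=cutoff)

print(truncation_search("fascis*", vocabulary))
# ['fascism', 'fascist', 'fascists']

# A misspelled search term still finds the intended word.
print(fuzzy_search("facsism", vocabulary))
```

The `cutoff` value controls how inexact a match may be: lowering it retrieves more words, at the price of more unwanted ones.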
Arrangement of Boolean Results
Something rather surprising happened when computer scientists began to use
Boolean algebra: it turned out that search and retrieval through the
Boolean operators was quite
simple for a computer to do, but the results were unsatisfactory. Although a full-text search could be done very quickly, the searches would routinely retrieve
thousands of results, which were more of a
hindrance than a help to the searcher. The real problem, it turned out, was to make the results
useful, and so efforts turned to how best to
arrange the search results.
There have been many attempts to do this and many failures. Lately, there have been some successes with modern search engines, such as
Yahoo and
Google. What are the differences between the results from a search engine and the results from traditional bibliographic tools?
As we saw in the previous section, the
purpose of a traditional library catalog is to enable people to find everything in a collection in certain ways. Therefore, in a catalog, search results allow people to find materials by their
authors, titles, and
subjects. This method allows for
concept searching by using
special forms found through
authority files.
The modern search engines have completely different goals from traditional library tools. One of the main differences is that there is
no concept searching available: search engines can only search text. Therefore, you cannot search the
concept World War I; you can only search the words that may be about World War I; so you search
"World War I," "wwi," "ww1," "World War One," "First World War," "1st World War," and so on.
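The difference can be sketched in code. In the example below, a hypothetical, greatly simplified "authority file" maps one authorized heading to the variant wordings; a text-only search has to try every variant itself. The data structure, the function name and the sample documents are all invented for illustration.

```python
import re

# Hypothetical authority file: one authorized heading, many variant wordings.
AUTHORITY_FILE = {
    "World War, 1914-1918": [
        "world war i", "wwi", "ww1", "world war one",
        "first world war", "1st world war",
    ],
}

def text_search(variants, documents):
    """Return the documents that contain any variant as a whole word or phrase."""
    hits = []
    for doc in documents:
        lowered = doc.lower()
        if any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants):
            hits.append(doc)
    return hits

documents = [
    "Trench warfare in the First World War",
    "Aviation during WWI",
    "Rationing in World War II",
]

print(text_search(AUTHORITY_FILE["World War, 1914-1918"], documents))
# ['Trench warfare in the First World War', 'Aviation during WWI']
```

The whole-word check matters: without it, "world war i" would also match "World War II".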
How does a
search engine work? What is the purpose of a search engine? How is this purpose different from a traditional bibliographic tool?
To discuss this, we will focus on two of the most popular search engines:
Google and
Yahoo.
Google adds information to its database using
web crawlers that automatically go out, scan the web, and bring back links and text for people to search. As everyone knows, searching
Google (using Boolean operators) can be very fast, and the results can make the users very happy. But a search routinely returns
100,000 or more results. Not too many people will look at all 100,000. How are these results arranged?
Google uses a method called
"Relevance Ranking," which is determined by a
mathematical algorithm that Google keeps secret. Why is it secret?
It turns out that for many businesses, selling their wares on the web is a matter of
life and death. Many institutions also want users to find the information that they have gone to great expense to place on the web. In any case, research has shown that few people look beyond the
"top 10" hits, and even fewer people go beyond the
first page of results. Therefore, there is tremendous pressure to get into the top
10 results of a search.
If Google were to reveal precisely how their algorithm worked, people would be able to
manipulate the results to get to be #1, regardless of how
"reliable" the search result turned out to be for a user. This has happened, and we will take a look at an example later.
Google's
relevance ranking is explained in the following way:
PageRank Explained
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important." Using these and other factors, Google provides its views on pages' relative importance.
In essence, Google uses the power of
"citations" to help arrange the results. A page that has more links to it, i.e. is
cited more often, appears higher in the results.
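The "voting" idea can be illustrated with a small calculation. The sketch below follows the commonly published textbook form of PageRank, with a damping factor of 0.85; it is not Google's actual ranking, which, as noted above, is secret and uses other factors as well. The four-page link graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web of four pages: A, C and D all link to B.
links = {"A": ["B"], "B": ["A"], "C": ["B"], "D": ["A", "B"]}
ranks = pagerank(links)

for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page B, which receives the most links, comes out on top. Page A comes second even though it has fewer links than B, because one of its links comes from the "important" page B.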
Like Google, Yahoo has an index that is automatically created by web crawlers, with results arranged according to its own algorithm, but Yahoo is unique in that it has a
Directory that is made by
human editors. People submit a web site that they would like to be included in the directory. When they do this, they can suggest categories that the Yahoo editors may change. Yahoo editors may also decide
not to add a site to the Directory.
Users can search
Yahoo similarly to searching
Google, or they can opt to browse and search the
Directory.
For more information, see the Yahoo Help pages for
Suggest a site.
Money
Money is an unavoidable topic whenever someone discusses the internet.
Google and
Yahoo make a lot of money. How do they do that? A lot of it has to do with
making it easier for users to find specific materials in their databases.
Yahoo has two programs: the first is
Directory Submit, which allows people to pay a fee (currently $299 each year) to have their sites included more quickly. There is also
Search Submit of various types, in which people can pay a fee and/or a "pay per click" fee.
Google has a similar program, called
AdWords, in which people can pay to make their sites more visible. When people search a word that is also used as an
adword, additional links appear.

There is also a program called
AdSense, where a person can be paid to have Google advertisements appear automatically on their own websites. For an example, see
ApartmentRatings.com.
Can the Search Results be Manipulated?
It should be very clear by now that many people would like to
manipulate the search results. People are doing this, and this is how it is done. Since we have seen that the algorithms work by
citations, it only makes sense that if someone put up enough webpages that link to a
specific webpage, then the latter webpage would come up
much higher in the results. This has happened many times, and can lead to some
very strange results.
Political examples of this are known as
"Google-bombing," and the most famous example is the result for
"miserable failure"
(this link searches Yahoo). The result is very strange: the
#1 link goes to the official White House page of
President George W. Bush. Yet, if you search the page at the White House, you will
not find either word in the page. Why does this happen?
As we saw above, the Google search engine gathers millions and millions of web pages on the World Wide Web, and treats every link as a
"vote" for a specific page. In doing so, it uses the
text that links to a page to describe the
page. If many pages use the
same text to describe a single page, the search engine begins to add everything up.
The following illustration shows that the text on 5 pages equals the page that they link to. Therefore, the text
"miserable failure" equals the page.
It doesn't matter what other pages say.
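The mechanism can be sketched in a few lines. The example below assumes a drastically simplified search engine that indexes a page only by the text of the links pointing to it; the page addresses are placeholders, and real search engines combine link text with many other signals.

```python
from collections import Counter

def build_link_text_index(links):
    """Map each link text to a count of the pages it points to.

    `links` is a list of (source page, link text, target page) tuples.
    """
    index = {}
    for _source, link_text, target in links:
        index.setdefault(link_text.lower(), Counter())[target] += 1
    return index

def search(index, phrase):
    """Return the target pages for a phrase, most-linked first."""
    return [page for page, _count in index.get(phrase.lower(), Counter()).most_common()]

# Hypothetical links: five pages use the same text for the same target.
links = [
    (f"blog-{i}.example", "miserable failure", "biography.example")
    for i in range(5)
]
links.append(("blog-9.example", "miserable failure", "other.example"))

index = build_link_text_index(links)
print(search(index, "miserable failure"))
# ['biography.example', 'other.example']
```

The words on the target page itself are never consulted, which is how a page can rank first for a phrase it does not contain.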
If we return to the
miserable failure result in Yahoo and scroll down the page, we will see links to other pages as well. As of this writing, #5 is
President Jimmy Carter, #10 is
Michael Moore and #17 is
Senator Hillary Clinton. Obviously, we are seeing a battle taking place in the search engines among backers of different political opponents to see who can be the
#1 "miserable failure."