Five College Archives Digital Access Project

Search Engine Help

....................................................................................................

How the search engine works

The search engine consists of two parts: a robot, and the search program that is activated by one of these pages. The robot starts late at night, when usage is low, and follows the links in each page on the system. Eventually it covers every one of the several thousand HTML files that are linked, directly or indirectly, to the main index.html page. It intentionally excludes pages that are part of a user's personal Web space rather than the main system. This method is very similar to that used by Web-wide search engines such as Alta Vista and Web Crawler.
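The link-following step can be pictured as a breadth-first walk from the main page. The sketch below is only an illustration, not the robot's actual code: the links mapping stands in for fetching each HTML file and extracting its links, and treating paths that begin with /~ as personal Web space is an assumption made for the example.

```python
from collections import deque


def crawl(links, start="/index.html", is_personal=lambda p: p.startswith("/~")):
    """Breadth-first walk of a link graph, skipping personal pages.

    `links` is a hypothetical {page: [linked pages]} mapping. The default
    `is_personal` test (paths starting with "/~") is an assumption.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen and not is_personal(target):
                seen.add(target)
                queue.append(target)
    return seen


links = {
    "/index.html": ["/colls/mhc/survey.htm", "/~jdoe/home.html"],
    "/colls/mhc/survey.htm": ["/index.html", "/colls/umass/index.htm"],
    "/~jdoe/home.html": ["/hidden.htm"],
}
print(sorted(crawl(links)))
```

Because the personal page is never visited, a page that is linked only from it (here /hidden.htm) is not covered either.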

From this process the robot generates a database of all the words contained in all of the pages. The search engine script then looks in this database and finds all the pages containing the words or phrases you asked for.
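A word database of this kind is commonly built as an inverted index, which maps each word to the pages that contain it. The following is a minimal sketch under that assumption; the pages dictionary and the build_index name are invented for the example, and the real robot reads HTML rather than plain text.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page paths containing it.

    `pages` is a hypothetical {path: plain text} mapping.
    """
    index = defaultdict(set)
    for path, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


pages = {
    "/a.htm": "Food and potato recipes",
    "/b.htm": "Foodstuff storage",
}
index = build_index(pages)
print(sorted(index["food"]))  # ['/a.htm']
```

Looking up a word is then a single dictionary access, which is why the search script can consult the database instead of rereading every page.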

Because the robot runs only once per day, any pages added to the system since it last ran won't appear in the results of a search.

....................................................................................................

Simple searches

The most basic search is done by entering one or more words, separated by spaces. The search engine will give you a list of pages containing words which begin with any of the words you entered. For instance, if you enter food potato, the request matches pages having the words food, potato, foodstuff, or potatoes somewhere in them.

In order to prevent the search engine from considering words that only start with a word you are looking for, you can enclose the word in double-quotes. If you enter "food", only pages containing that exact word will match; pages with foodstuff will not.
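The difference between the two kinds of match can be sketched in a few lines. This is an illustration only; the term_matches function is hypothetical, and ignoring upper and lower case is an assumption.

```python
def term_matches(term, word):
    """Return True if a lowercase page `word` satisfies the search `term`.

    A term in double-quotes must equal the word exactly; an unquoted term
    matches any word that begins with it.
    """
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        return word == term[1:-1].lower()
    return word.startswith(term.lower())


print(term_matches("food", "foodstuff"))    # True
print(term_matches('"food"', "foodstuff"))  # False
print(term_matches('"food"', "food"))       # True
```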

If you want to limit the search to an exact phrase, you can enter it in double-quotes as well. But beware that this type of search can often take over a minute to complete.

A couple of notes about what you enter:

....................................................................................................

Search results

The results of a search are organized into three columns:

   Match Gauge   Page Title                           Link
   100%          Selections: MOUNT HOLYOKE COLLEGE    /colls/mhc/survey.htm
   42%           Selections: UMASS-AMHERST            /colls/umass/index.htm

The first column is a gauge which indicates how well the particular page matches your query. Since the results are sorted with the best matches first, the first entry will always be a completely red bar (text-based browsers show this as 100%). Other matches are expressed as a percentage of the best match.

The search engine calculates the worth of a match based on the number of times the term appears on the page and where the term is located. For instance, a term contained in a page's title or in a large text header is considered to be more important than one in the body of a page.
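One way to picture this calculation is a weighted count that is then expressed as a percentage of the best match. The weights below are invented for illustration; the actual values used by the search engine are not documented here.

```python
# Hypothetical weights: the real values are not documented.
WEIGHTS = {"title": 5, "header": 3, "body": 1}


def score(occurrences):
    """Weighted count of a term, given {location: number of occurrences}."""
    return sum(WEIGHTS[location] * count for location, count in occurrences.items())


def gauge(scores):
    """Sort pages best-first and express each score as a percent of the best."""
    if not scores:
        return []
    best = max(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(path, round(100 * value / best)) for path, value in ranked]


scores = {
    "/a.htm": score({"title": 1, "body": 5}),  # 5 + 5 = 10
    "/b.htm": score({"body": 4}),              # 4
}
print(gauge(scores))  # [('/a.htm', 100), ('/b.htm', 40)]
```

Dividing by the best score is what guarantees that the first entry always reads 100%.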

The page's title, if any, is taken from the HTML <TITLE> tag. The link gives the full path of the matching page, and provides you with a link you can click on to go there.

Below the matches, a number of statistics are given, for example:

FOOD=193 FIGHT=224
377 matches total, first 100 available, 1-20 shown.

[ Next 20 matches ]

This display means that the word FOOD was found on 193 pages, and the word FIGHT was found on 224. The total number of pages containing either word is 377; a page containing both words is counted only once, which is why the total is less than 193 + 224. In order to conserve system resources, the search engine makes only the first 100 matches available, and of those, matches 1 through 20 are now being shown. To go to the next 20 best matches, click on the link provided.

By default the search shows 20 matches at a time, but you can change this by selecting a different value from the popup list provided.
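The status line and the paging can be sketched as follows. The results_page function and its parameter names are hypothetical; the limit of 100 and the default of 20 come from the description above.

```python
def results_page(matches, start=0, per_page=20, cap=100):
    """Return one page of matches and a status line like the one shown above.

    `matches` is a list already sorted best-first; only the first `cap`
    entries are made available.
    """
    available = matches[:cap]
    shown = available[start:start + per_page]
    status = (
        f"{len(matches)} matches total, first {len(available)} available, "
        f"{start + 1}-{start + len(shown)} shown."
    )
    return shown, status


matches = [f"/page{n}.htm" for n in range(377)]
shown, status = results_page(matches)
print(status)  # 377 matches total, first 100 available, 1-20 shown.
```

Calling results_page(matches, start=20) would give the next 20 matches, which corresponds to clicking the link.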

....................................................................................................

Advanced searches

There may be times when you want a search to match only pages that contain both one word and a second. For this case, use the format word1 +word2. For example, entering food +potato will only match pages which contain both of these words.

By using a - you can achieve the opposite effect, matching one word and not the other. Entering food -potato will match pages which contain food, but not those which contain both food and potato.

Using the modifier url: you can force the search to only match those pages whose location (URL) begins with a certain path. url: should be followed by the absolute path of interest, without http://www.mtholyoke.edu at the beginning. A url: is always treated as an "and" operation, meaning that a search for food url:/offices/comm/csj is the same as one for food +url:/offices/comm/csj. Both searches look for pages with the word food in issues of the College Street Journal.
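The three modifiers can be combined in a small filter. The sketch below is one reading of the rules above, not the search engine's actual code: it assumes that a page must contain at least one of the unmarked words, all of the + words, and none of the - words. Quoted words and phrases are left out to keep it short, and the pages mapping is hypothetical.

```python
def parse_query(query):
    """Split a query into plain, required (+), excluded (-) terms and a url: prefix."""
    plain, required, excluded, url_prefix = [], [], [], None
    for token in query.split():
        if token.startswith("+"):
            token = token[1:]
            if not token.startswith("url:"):
                required.append(token)
                continue
            # "+url:" falls through: url: is always an "and" operation.
        if token.startswith("url:"):
            url_prefix = token[4:]
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            plain.append(token)
    return plain, required, excluded, url_prefix


def has_term(term, words):
    """True if any word on the page begins with `term`."""
    return any(word.startswith(term.lower()) for word in words)


def search(query, pages):
    """`pages` is a hypothetical {path: set of lowercase words} mapping."""
    plain, required, excluded, url_prefix = parse_query(query)
    hits = []
    for path, words in pages.items():
        if url_prefix is not None and not path.startswith(url_prefix):
            continue
        if plain and not any(has_term(t, words) for t in plain):
            continue
        if not all(has_term(t, words) for t in required):
            continue
        if any(has_term(t, words) for t in excluded):
            continue
        hits.append(path)
    return sorted(hits)


pages = {
    "/offices/comm/csj/1.htm": {"food", "potato"},
    "/offices/comm/csj/2.htm": {"food"},
    "/colls/mhc/a.htm": {"potatoes"},
}
print(search("food +potato", pages))  # ['/offices/comm/csj/1.htm']
print(search("food -potato", pages))  # ['/offices/comm/csj/2.htm']
```

With this reading, food url:/offices/comm/csj and food +url:/offices/comm/csj return the same pages, as described above.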

....................................................................................................

Summary

Examples:
   mary                  Matches Maryland, Mary Lyon, or Mary Smith
   "mary lyon"           Matches Mary Lyon
   "mary" -smith         Matches Mary Lyon, but not Mary Smith or smithie
   "mary" +lyon          Matches Mary Lyon or Lyon, Mary
   mary +lyon            Matches Mary Lyon; Lyon in Maryland; or Lyon, Mary
   lyon url:/colls/mhc/  Matches pages about Mount Holyoke which mention Lyon

....................................................................................................

Perform a search

....................................................................................................
© 1997 Five Colleges, Inc.
Send comments and questions to Peter Nelson (pnelson@mtholyoke.edu)