The search engine consists of two parts: a robot, and the search program
that is activated by one of these pages. The robot starts late at night, when
usage is low, and looks for links in each page on the system. Eventually,
it covers every one of the several thousand HTML files linked, directly or
indirectly, to the main index.html page. It intentionally excludes pages
that are part of a user's personal Web space rather than the main system.
This method is very similar to the one used by Web-wide search engines such as
Alta Vista and Web Crawler.
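The crawl described above can be sketched as a breadth-first walk over links. This is only an illustration, not the robot's actual code: the `links` mapping, the `crawl` function, and the `/~` prefix used to recognise personal Web space are all assumptions made for the example.

```python
from collections import deque

def crawl(links, start="/index.html", excluded_prefix="/~"):
    """Return every page reachable from `start`, skipping excluded pages.

    `links` maps a page path to the paths it links to (a stand-in for
    fetching and parsing real HTML). `excluded_prefix` is a hypothetical
    marker for personal Web space.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target.startswith(excluded_prefix) or target in seen:
                continue
            seen.add(target)
            queue.append(target)
    return seen

# A tiny made-up site: one personal page and one page nothing links to.
site = {
    "/index.html": ["/colls/mhc/index.htm", "/~jdoe/home.html"],
    "/colls/mhc/index.htm": ["/colls/mhc/survey.htm", "/index.html"],
    "/orphan.htm": ["/index.html"],
}
reachable = crawl(site)
```

In this sketch `/orphan.htm` is never visited because no page links to it, and `/~jdoe/home.html` is skipped because it falls under the excluded prefix.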
From this process the robot generates a database of all the words contained in all of the pages. The search engine script then looks in this database and finds all the pages containing the words or phrases you asked for.
Because the robot runs only once per day, any pages added to the system since its last run will not appear in search results.
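A word database of this kind is commonly built as an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure; the function name `build_index`, the sample pages, and the rule for splitting text into words are illustrative choices, not details of the real robot.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page paths containing it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index

# Made-up pages standing in for the files the robot visited.
pages = {
    "/dining/menu.htm": "Food served daily: potato soup",
    "/news/fight.htm": "Food fight in the dining hall",
}
index = build_index(pages)
food_pages = sorted(index["food"])
```

Because the index only changes when `build_index` runs, a page added afterwards stays invisible to searches until the next run.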
The most basic search is done by entering one or more words, separated by spaces. The search engine will give you a list of pages containing words which begin with these letters. For instance, if you enter food potato, the request matches pages having the words food, potato, foodstuff, or potatoes somewhere in them.
In order to prevent the search engine from considering words that only start with a word you are looking for, you can enclose the word in double-quotes. If you enter "food", only pages containing that exact word will match; pages with foodstuff will not.
If you want to limit the search to an exact phrase, you can enter it in double-quotes as well. But beware that this type of search can often take over a minute to complete.
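The three kinds of term described above (bare word, quoted word, quoted phrase) can be illustrated with a small matching function. This is a sketch under assumptions: the names `words_of` and `matches_term` are hypothetical, and the real engine consults its database rather than scanning page text.

```python
import re

def words_of(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_term(term, text):
    """Return True if `text` satisfies a single search term.

    A bare term matches any word that starts with it. A term in double
    quotes must match a whole word, or a whole run of words for a phrase.
    """
    words = words_of(text)
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        wanted = words_of(term[1:-1])
        n = len(wanted)
        if n == 0:
            return False
        return any(words[i:i + n] == wanted
                   for i in range(len(words) - n + 1))
    prefix = term.lower()
    return any(word.startswith(prefix) for word in words)
```

With this sketch, `matches_term('food', 'Foodstuff prices')` is true because foodstuff begins with food, while `matches_term('"food"', 'Foodstuff prices')` is false. A phrase has to be checked position by position, which hints at why phrase searches are slower.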
A couple of notes about what you enter:
The results of a search are organized into three columns:
| Match | Title | Link |
| --- | --- | --- |
| (match gauge) | Selections: MOUNT HOLYOKE COLLEGE | /colls/mhc/survey.htm |
| (match gauge) | Selections: UMASS-AMHERST | /colls/umass/index.htm |
The first column is a gauge which indicates how well the particular page matches your query. Since the results are sorted with the best matches first, the first entry will always be a completely red bar (text-based browsers show this as 100%). Other matches are expressed as a percentage of the best match.
The search engine calculates the worth of a match based on the number of times the term appears on the page and where the term is located. For instance, a term contained in a page's title or in a large text header is considered to be more important than one in the body of a page.
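The ranking idea can be sketched as a weighted count that is then scaled so the best match reads 100%. The weights below are invented for illustration, since the text says only that titles and large headers count for more than body text; `raw_score` and `as_percentages` are hypothetical names.

```python
# Hypothetical weights: the real values are not documented here.
WEIGHTS = {"title": 3.0, "header": 2.0, "body": 1.0}

def raw_score(hits):
    """`hits` maps a location name to how often the term occurs there."""
    return sum(WEIGHTS[where] * count for where, count in hits.items())

def as_percentages(scores):
    """Sort pages best-first and express each score relative to the best.

    Assumes `scores` is non-empty and every score is positive, which holds
    for pages that matched at least once.
    """
    best = max(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(path, round(100 * score / best)) for path, score in ranked]

scores = {
    "/colls/umass/index.htm": raw_score({"body": 3}),
    "/colls/mhc/survey.htm": raw_score({"title": 1, "body": 3}),
}
ranking = as_percentages(scores)
```

Under these made-up weights the page with the term in its title scores 6.0 and the other 3.0, so the gauges would read 100% and 50%.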
The page's title, if any, is taken from the HTML
<TITLE> tag. The link gives the full path of the matching
page, and provides you with a link you can click on to go there.
Below the matches, a number of statistics are given, for example:
FOOD=193 FIGHT=224
377 matches total, first 100 available, 1-20 shown. [ Next 20 matches ]
This display means that the word FOOD was found on 193 pages, and the word FIGHT was found on 224. The total number of pages containing either word is 377; this is less than 193 + 224 = 417 because a page containing both words is counted only once. In order to conserve system resources, the search engine makes only the first 100 matches available, and this line tells you that matches 1 through 20 are now being shown. To go to the next 20 best matches, click on the link provided.
By default the search shows 20 matches at a time, but you can change this by selecting a different value from the popup list provided.
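The numbers in the statistics line follow from simple arithmetic, sketched below. The function `summarize` and its parameter names are hypothetical; the limit of 100 and the page size of 20 come from the text above.

```python
def summarize(total, page_size=20, start=1, cap=100):
    """Return (available, first_shown, last_shown, has_next) for one page."""
    available = min(total, cap)          # only the first `cap` can be viewed
    last = min(start + page_size - 1, available)
    has_next = last < available          # is a "Next" link needed?
    return available, start, last, has_next

# Counts from the example display.
food, fight, either = 193, 224, 377
both = food + fight - either             # pages containing both words

first_page = summarize(either)
last_page = summarize(either, start=81)
```

Here `both` works out to 40, `first_page` is `(100, 1, 20, True)`, and `last_page` is `(100, 81, 100, False)`, so the fifth page of results would have no link to further matches.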
There may be times when you want a search to match only pages that contain both one word and a second. For this case, use the format word1 +word2. For example, entering food +potato will only match pages which contain both of these words.
By using a - you can achieve the opposite effect, matching one word and not the other. Entering food -potato will match pages which contain food, but not those which contain both food and potato.
Using the modifier url: you can force the search to only match those pages whose location (URL) begins with a certain path. url: should be followed by the absolute path of interest, without http://www.mtholyoke.edu at the beginning. A url: is always treated as an "and" operation, meaning that a search for food url:/offices/comm/csj is the same as one for food +url:/offices/comm/csj. Both searches look for pages with the word food in issues of the College Street Journal.
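The operators above can be combined in a small evaluator. This is a sketch, not the engine's actual parser: it assumes terms are applied left to right, that url: filters are applied last as an "and", and that pages are a simple mapping from path to text. The names `tokenize`, `term_matches`, and `search` are hypothetical, and a - placed in front of url: is not described in the text and gets no special handling here.

```python
import re

def tokenize(query):
    """Split a query into terms, keeping quoted phrases together."""
    return re.findall(r'[+-]?"[^"]*"|\S+', query)

def term_matches(term, text):
    """Bare terms match word prefixes; quoted terms match whole words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if term.startswith('"'):
        wanted = re.findall(r"[a-z0-9]+", term.lower())
        n = len(wanted)
        return n > 0 and any(words[i:i + n] == wanted
                             for i in range(len(words) - n + 1))
    return any(word.startswith(term.lower()) for word in words)

def search(query, pages):
    """Evaluate `query` over `pages`, a mapping of path -> page text."""
    result = set()
    url_prefixes = []
    for token in tokenize(query):
        op, term = "or", token
        if token[0] in "+-" and len(token) > 1:
            op = "and" if token[0] == "+" else "not"
            term = token[1:]
        if term.startswith("url:"):
            url_prefixes.append(term[len("url:"):])
            continue
        hits = {p for p, text in pages.items() if term_matches(term, text)}
        if op == "or":
            result |= hits
        elif op == "and":
            result &= hits
        else:
            result -= hits
    # url: is always an "and", so it narrows whatever matched.
    for prefix in url_prefixes:
        result = {p for p in result if p.startswith(prefix)}
    return result

# Made-up pages for the examples.
pages = {
    "/dining/menu.htm": "Food served daily: potato soup",
    "/news/fight.htm": "Food fight in the dining hall",
    "/offices/comm/csj/issue1.htm": "Local food drive announced",
}
both_words = search("food +potato", pages)
csj_only = search("food url:/offices/comm/csj", pages)
```

In this sketch `both_words` contains only `/dining/menu.htm`, and `csj_only` contains only the page under `/offices/comm/csj`, whether or not the url: term is written with a leading +.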
| Query | What it matches |
| --- | --- |
| word1 word2 | Matches words beginning with either word1 or word2 |
| "word" | Matches the word exactly |
| "word1 word2" | Matches the exact phrase (can be very slow) |
| word1 +word2 | Matches words beginning with word1 and word2 |
| word1 -word2 | Matches words beginning with word1 and not word2 |
| word1 url:word2 | Matches words beginning with word1, but only on pages whose URLs start with word2 |
| Example | What it matches |
| --- | --- |
| mary | Matches Maryland, Mary Lyon, or Mary Smith |
| "mary lyon" | Matches Mary Lyon |
| "mary" -smith | Matches Mary Lyon, but not Mary Smith or smithie |
| "mary" +lyon | Matches Mary Lyon or Lyon, Mary |
| mary +lyon | Matches Mary Lyon; Lyon in Maryland; or Lyon, Mary |
| lyon url:/colls/mhc/ | Matches pages about Mount Holyoke which mention Lyon |
© 1997 Five Colleges, Inc.
Send comments and questions to Peter Nelson (pnelson@mtholyoke.edu)