True or false: (1) The Internet is an incredible research resource overflowing with valuable information. (2) The Internet has become a labyrinth of hard-to-find data and dead-end home pages replete with redundant, often useless information.
If you answered "true" to both, you're right, even though the two answers seem to contradict each other. While the Net is a labyrinth with dead ends and lots of junk, software tools are available to bypass the useless and deliver the essential information you're looking for. All this can be done with a few mouse clicks; the trick is to know what to click on.
Search tools (also known as search engines) are available free on the Net, and this article helps you pick the right one for your particular research. All search tools are linked to databases that essentially are giant indexes of much of the information available on the Net. Accessing such indexes is easy: Just click on the Web site of the search tool; there is no software to download or configure.
While most Internet providers default to one search tool when you invoke a search, you can always change that default to any other search engine. And if you want to locate other search tools, just type "search tool" as your key words in whatever engine your software defaults to, and it will find the others.
Two ingredients go into making a good search engine: a large, up-to-date and comprehensive index and a database search method that is fast, customizable and designed so it doesn't bring up too many sites not relevant to your search. Be aware, however, that no search engine is perfect: Nearly every search will generate some irrelevant data, but the better engines are more discriminating.
As you can imagine, indexing is a herculean task: thousands upon thousands of addresses (also called URLs, pronounced either U-R-Ls or earls) plus millions of pages of data must be scanned, formatted and incorporated into the database index. With rare exceptions, the job is done exclusively by computer; as a result, sometimes mistakes are made and words are indexed incorrectly. But considering the millions of words involved, all the search tools generally do the job relatively well.
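To make the idea of an index concrete, here is a minimal sketch of the kind of word-to-page lookup table a search tool might build. It is an illustration only, not how any engine named in this article actually works; the page addresses and text are invented, and the function names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical pages, invented for illustration.
pages = {
    "http://example.com/a": "Great Plains Software releases accounting tools",
    "http://example.com/b": "Hiking the great plains of North Dakota",
    "http://example.com/c": "Accounting standards explained",
}
index = build_index(pages)
print(search(index, "great plains"))
```

Because the lookup goes from word to page rather than page to word, a query touches only the entries for its own words, which is one reason a search over millions of pages can still come back quickly.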
The most popular search tools and their Web addresses are:
AltaVista | http://altavista.digital.com
Cyber411 | http://www.cyber411.com
Excite | http://www.excite.com
Infoseek | http://info.infoseek.com
Infoseek Ultra | http://ultra.infoseek.com
Lycos | http://www.lycos.com
Magellan | http://magellan.mckinley.com
WebCrawler | http://www.webcrawler.com
Yahoo | http://www.yahoo.com
And now for a closer look at these popular search tools.
ALTAVISTA
This service claims its index has over 50 million pages on
476,000 servers and 4 million articles from 14,000 Usenet newsgroups
(Internet discussion forums). AltaVista says users access the service more
than 28 million times every weekday. Be aware that some hyperbole is
common among search engine vendors, and many of their claims are hard,
if not impossible, to substantiate because the Net is free form and
uncontrolled. But it's clear that AltaVista does have a large index
and it's a popular service.
Unlike Yahoo (the oldest and probably the best known search engine; see review), which indexes the key descriptive words of home pages, AltaVista indexes the complete text within pages. This is both an advantage and disadvantage.
Advantage: You can complete an exhaustive search, including words in a certain order. For example, if you want items mentioning Great Plains Software, most search tools will return finds with both great and plains. AltaVista allows you to specify the order of the words, which in effect acts as a filter.
Disadvantages: Unless you are very specific, you probably will be flooded with data. And because AltaVista's indexing is so comprehensive, it needs more time to index new home pages on the Net; as a result, some very new ones don't get into the database immediately.
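The word-order filter described above can be sketched with an index that records where each word sits on a page. This is an assumption about how such a feature could be built, not a description of AltaVista's own software; the sample pages and the names `build_positional_index` and `phrase_search` are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(pages):
    """Map word -> {url: [positions of that word on the page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][url].append(pos)
    return index


def phrase_search(index, phrase):
    """Return URLs where the words of the phrase appear consecutively, in order."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return []
    # Start from pages that contain every word, in any order.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for url in candidates:
        # Keep the page if some occurrence of the first word is followed
        # immediately by the remaining words.
        for start in index[words[0]][url]:
            if all(start + i in index[w][url] for i, w in enumerate(words)):
                hits.append(url)
                break
    return sorted(hits)


# Hypothetical pages, invented for illustration.
pages = {
    "http://example.com/a": "Great Plains Software ships a new release",
    "http://example.com/b": "Software for farms on the great plains",
}
index = build_positional_index(pages)
print(phrase_search(index, "Great Plains Software"))
```

Both sample pages contain all three words, but only the first has them in the requested order, so only that page survives the filter.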
CYBER411
With so many search engines available, it was bound to happen: a
search engine that searches the other search engines. Cyber411 is
what's called a "parallel" search engine, which means it
contacts 15 other search engines to do the actual work and then
collates the results. Cyber411 also eliminates duplicate entries from
the various search engines.
Each of its finds is hyperlinked to related sites and is earmarked by the engine that generated the location. However, unlike most of the other engines, it does not provide additional summary information about the found sites.
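The collate-and-deduplicate step can be pictured with a short sketch. It assumes each engine has already returned a list of addresses; the engine names and URLs are placeholders, the function `merge_results` is hypothetical, and the real Cyber411 logic is not documented here. Contacting the engines at the same time is left out to keep the example small.

```python
def merge_results(results_by_engine):
    """Combine per-engine URL lists, dropping duplicates but keeping the source engines."""
    merged = {}  # url -> list of engines that returned it, in first-seen order
    for engine, urls in results_by_engine.items():
        for url in urls:
            # Simplifying assumption: two addresses that differ only by a
            # trailing slash are treated as the same page.
            key = url.rstrip("/")
            sources = merged.setdefault(key, [])
            if engine not in sources:
                sources.append(engine)
    return merged


# Placeholder results, invented for illustration.
results = {
    "EngineA": ["http://example.com/", "http://example.org/x"],
    "EngineB": ["http://example.com", "http://example.net/y"],
}
merged = merge_results(results)
for url, engines in merged.items():
    print(url, "found by", ", ".join(engines))
```

Keeping the list of source engines alongside each address mirrors the way each find is earmarked by the engine that generated it.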
When we tested the site with a word string query, we found that some
of the sites Cyber411 found didn't contain that word string; in other
words, it located information we did not want. While such errors are
annoying, they do not take away from the comprehensive search power of
Cyber411.
EXCITE
Excite also says it has 50 million Web pages indexed in its
database. When you type a description of what you're looking for in an
Excite search form, it scrutinizes the full text of 50 million Web
pages. In addition to listing finds, Excite is able to estimate how
relevant each one is to your initial request. And then it goes a step
further: If you click on a button that says "More like
this," you can track down more sites that are similar to any of
the sites just found. That's particularly handy because a first-time
search usually discloses a smattering of information a bit off the
mark. But with this feature, if one of the search results looks like
the closest match, then the "More like this" option delivers
more results similar to that near hit.
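One common way to find pages "like" a chosen page is to compare word counts. The sketch below uses cosine similarity for that purpose; this is a stand-in technique chosen for illustration, since the article does not say how Excite's feature is implemented. The page names, text and the function `more_like_this` are all invented.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Count how often each lowercase word appears."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def more_like_this(seed_url, pages, top_n=2):
    """Rank the other pages by similarity to the seed page."""
    seed = vectorize(pages[seed_url])
    scored = [
        (cosine(seed, vectorize(text)), url)
        for url, text in pages.items()
        if url != seed_url
    ]
    # Highest score first; ties broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:top_n]]


# Hypothetical pages, invented for illustration.
pages = {
    "dog1": "dog grooming tips and dog care",
    "dog2": "grooming your dog at home",
    "tax": "tax planning tips for small business",
}
print(more_like_this("dog1", pages, top_n=1))
```

The second dog page shares two words with the seed, while the tax page shares only "tips", so the dog page ranks first.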
Excite claims to work at twice the speed of its competition, but, while it is fast, our tests couldn't confirm the specific speed claim. Like most other search engines, Excite lists about 10 search results at a time, in decreasing order of confidence that it has matched your request. Each result consists of a title, a URL and a brief summary of the found page. By adding a ^ symbol and a number value to the end of a word, a user can command the engine to focus its search more intensely on that word in a string of search words; that usually results in a higher success rate. Here's how that works: Say you want to search the following: dog care grooming. By adding ^3 after the word grooming, you will order the search engine to give three times more emphasis to it than to the other words in the string. So while the words dog and care remain search targets, they do not get as much attention from the search engine as grooming.
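The effect of the ^ emphasis can be sketched as a weighted word count. The scoring rule below is an assumption made for illustration (Excite's actual ranking formula is not given in this article), and `parse_query` and `score` are hypothetical names.

```python
import re


def parse_query(query):
    """Turn 'dog care grooming^3' into {'dog': 1.0, 'care': 1.0, 'grooming': 3.0}."""
    weights = {}
    for token in query.lower().split():
        word, _, boost = token.partition("^")
        # A word with no ^number gets the default weight of 1.
        weights[word] = float(boost) if boost else 1.0
    return weights


def score(text, weights):
    """Add up the weight of every query word each time it appears in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(weights.get(w, 0.0) for w in words)


weights = parse_query("dog care grooming^3")
print(score("Grooming tips", weights))    # one hit on the boosted word
print(score("dog care basics", weights))  # two hits on unboosted words
```

Under this rule a page that mentions only grooming outscores a page that mentions both dog and care, which is the kind of shift in emphasis the ^3 is meant to produce.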
Something new is Excite Live (live.excite.com). Open that page and Excite will pop up a series of questions about what news you want to track. Once you profile your interests, Excite Live creates a summary of just that news. Each summary is hyperlinked, so if you want to dig deeper, one click will take you to the full story.
You can make Excite the default search tool with a simple click on a button on its home page.
INFOSEEK
In its May 1996 issue, the magazine Internet World
tested many search engines and praised Infoseek as delivering the
most relevant results. Fortune also rated Infoseek highly.
Infoseek has a unique indexing system: It keeps track of the order in which words appear on pages and in articles, so you can specify phrases or groups of words in exact order or just in proximity. You also can interact with the search engine to improve your results. When you find a search result that's interesting to you, just click the Similar Pages link. Infoseek will scour the Internet for pages with similar contents.
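A proximity search of the kind described above can be sketched by comparing word positions. This is only an illustration under the assumption that "proximity" means "within a given number of words"; Infoseek's own rule is not spelled out here, and the function `within` is hypothetical.

```python
import re


def within(text, first, second, max_gap):
    """Return True if the two words appear no more than max_gap positions apart."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)


# Invented sample sentence.
sample = "accounting rules and new standards"
print(within(sample, "accounting", "standards", 4))
print(within(sample, "accounting", "standards", 3))
```

In the sample sentence the two words sit four positions apart, so the first check succeeds and the stricter second check does not.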
INFOSEEK ULTRA
One of the newer search engines, Infoseek Ultra claims to be the
next generation of search technology. It can perform 1,000 queries per
second on a database of tens of millions of documents. The engine uses
a proprietary search formula that can merge results from multiple
searches but with more accuracy than Cyber411.
Infoseek Ultra has an enormous index. It lists over 80 million URLs and claims to have indexed the full texts of over 50 million. More important, Infoseek Ultra indexes only what its programmers consider to be meaningful URLs. With today's search technology, you often have to deal with dead hyperlinks (pages that contain no data) or duplicate pages, both of which make the search difficult and frustrating. Infoseek Ultra is relatively effective in eliminating these pages.
The engine also uses automatic name recognition, which means it can recognize that a word in a query is a person's name and finds only documents containing the name. Many other engines require that you use special typographic symbols, such as quotation marks or capitalization, to filter for name recognition.
LYCOS
This search engine is similar to Yahoo (listed below) but with a
few extras. For example, if the user initiates a search for an
accounting standard, say, Lycos will not only find the specific site
in which the standard is listed but it also will search the text for
any mention of the standard in other sites, in effect doing a global
search for the standard.
Lycos publishes a list of what its managers consider the best sites on the Net, a handy reference for new surfers. It also maintains a multimedia catalogue of the Web for sounds and pictures (all of which can be embedded in files for illustration or emphasis), hyperlinks to information by city and even the addresses of domain servers (the specific computers that serve Internet addresses, such as the American Institute of CPAs home page: www.aicpa.org).
When a user enters key words for searches, Lycos not only displays the addresses located but also provides a summary of what's available in each site. In addition, the engine estimates how relevant a located site is to your search; however, sometimes its estimates are not quite on the mark. Further, if in reading the summary it's clear that the listed results are nearly, but not quite, on target, you can use Lycos to search for related sites by effectively narrowing the search field; in that case, Lycos will not repeat URLs derived from the first search, a time-saving bonus.
Lycos has a handy feature called Remote Control. When you
select it, a small window appears on the screen and remains active
during your search. This feature allows you to work within the search
engine while viewing Web sites.
Another nice feature is Road Maps. Not only can the feature deliver driving directions and a road map between any two U.S. locations, but if you provide it with the address of a domain server (say, www.aicpa.org), the following appears on your screen:
MAGELLAN
This online guide includes original editorial content, a
directory of Internet sites that Magellan's staff has reviewed and
rated, a vast database of yet-to-be-reviewed sites and a powerful
search engine. If you browse the Magellan topics listed on its home
page or perform a search, a list of sites that matches your area of
interest pops up.
Magellan does a good job of keeping up with the expanding Web. It reviews and rates thousands of new sites each week. Magellan has been translated into French and German and soon will become available in other languages as well.
WEBCRAWLER
This search engine is owned by America Online, which uses it as
its default search tool. WebCrawler has some unique features. For
example, it can use Boolean search syntax, a query notation built on
operators such as AND, OR and NOT and favored by advanced database
users. If you're familiar with the syntax, you get a
bit more control over the search operation. It's possible, for
example, to limit search results to just home page sites or just sites
with summaries. Also, the number of search results can be customized
to any number you want; that's important, because some searches may
overwhelm you by targeting hundreds of sites.
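For readers new to Boolean searching, here is a small sketch of how AND, OR and NOT narrow or widen a result list. It handles only flat queries read left to right, with no parentheses, and assumes the query is not empty; it is not WebCrawler's own syntax or code. The page names and the function `boolean_search` are invented.

```python
import re


def boolean_search(pages, query):
    """Evaluate a flat query such as 'tax AND audit NOT payroll' from left to right."""

    def matching(word):
        # Pages whose text contains the word as a whole word, ignoring case.
        return {
            url
            for url, text in pages.items()
            if word.lower() in re.findall(r"[a-z0-9]+", text.lower())
        }

    tokens = query.split()
    result = matching(tokens[0])
    # After the first word, tokens alternate: operator, word, operator, word...
    for op, word in zip(tokens[1::2], tokens[2::2]):
        hits = matching(word)
        if op == "AND":
            result &= hits
        elif op == "OR":
            result |= hits
        elif op == "NOT":
            result -= hits
        else:
            raise ValueError(f"unknown operator: {op}")
    return sorted(result)


# Hypothetical pages, invented for illustration.
pages = {
    "p1": "tax audit checklist",
    "p2": "tax payroll audit",
    "p3": "payroll software",
}
print(boolean_search(pages, "tax AND audit NOT payroll"))
```

AND shrinks the list to pages containing both words, OR grows it to pages containing either, and NOT removes pages containing the unwanted word.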
WebCrawler has another feature that is useful to users with their own home pages: a backward search function, which reports which sites have linked to your home page, so you can see where visitors may be coming from and use that information as a marketing tool.
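A backward search amounts to turning a table of "page A links to page B" around so it can be read from B's side. The sketch below shows that inversion on invented data; it assumes the engine already knows each page's outgoing links, and the names `build_backlinks` and `who_links_to` are hypothetical rather than anything WebCrawler exposes.

```python
from collections import defaultdict


def build_backlinks(outlinks):
    """Invert {source: [targets]} into {target: {sources}}."""
    backlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            backlinks[target].add(source)
    return backlinks


def who_links_to(backlinks, url):
    """Return the pages that link to the given URL, sorted."""
    return sorted(backlinks.get(url, set()))


# Invented link data for illustration.
outlinks = {
    "http://example.com/news": ["http://example.org/home", "http://example.net/"],
    "http://example.net/": ["http://example.org/home"],
    "http://example.org/home": [],
}
backlinks = build_backlinks(outlinks)
print(who_links_to(backlinks, "http://example.org/home"))
```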
WAYNE E. HARDING, CPA, is a vice-president of Great Plains Software, Fargo, North Dakota. A member of the American Institute of CPAs information technology research subcommittee, he is a former vice-president of the Colorado Society of CPAs.