True or false: (1) The Internet is an incredible research resource overflowing with valuable information. (2) The Internet has become a labyrinth of hard-to-find data and dead-end home pages replete with redundant, often useless information.
If you answered "true" to both, you're right, even though the two answers seem to contradict each other. While the Net is a labyrinth with dead ends and lots of junk, software tools are available to bypass the useless and deliver the essential information you're looking for. All this can be done with a few mouse clicks; the trick is to know what to click on.
Search tools (also known as search engines) are available free on the Net, and this article helps you pick the right one for your particular research. All search tools are linked to databases that essentially are giant indexes of much of the information available on the Net. Accessing such indexes is easy: Just click on the Web site of the search tool; there is no software to download or configure.
While most Internet providers default to one search tool when you invoke a search, you can always change that default to any other search engine. And if you want to locate other search tools, just type "search tool" as your key words in whatever engine your software defaults to and it will find the others.
Two ingredients go into making a good search engine: a large, up-to-date and comprehensive index and a database search method that is fast, customizable and designed so it doesn't bring up too many sites not relevant to your search. Be aware, however, that no search engine is perfect: Nearly every search will generate some irrelevant data, but the better engines are more discriminating.
As you can imagine, indexing is a herculean task: thousands upon thousands of addresses (also called URLs, pronounced either U-R-Ls or earls) plus millions of pages of data must be scanned, formatted and incorporated into the database index. With rare exceptions, the job is done exclusively by computer; as a result, sometimes mistakes are made and words are indexed incorrectly. But considering the millions of words involved, all the search tools generally do the job relatively well.
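To make the idea of an index concrete, here is a minimal sketch in Python of how a program might turn pages into a word-to-address lookup. The page texts and addresses are invented for illustration, and the engines reviewed here almost certainly use far more elaborate methods than this.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

# Hypothetical pages, invented for this example.
pages = {
    "www.example.com/a": "Accounting software for small firms",
    "www.example.com/b": "Tax software and accounting news",
    "www.example.com/c": "Gardening tips",
}

index = build_index(pages)
print(sorted(index["accounting"]))  # the two pages that mention accounting
```

Once such an index exists, answering a one-word query is a single lookup rather than a scan of every page, which is why a large, current index matters so much.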
The most popular search tools are AltaVista, Cyber411, Excite, Infoseek, Infoseek Ultra, Lycos, Magellan, WebCrawler and Yahoo.
And now for a closer look at these popular search tools.
AltaVista claims its index has over 50 million pages on 476,000 servers and 4 million articles from 14,000 Usenet newsgroups (Usenet is the Internet's network of discussion groups). AltaVista says users access the service more than 28 million times every weekday. Be aware that some hyperbole is common among search engine vendors, and many of their claims are hard, if not impossible, to substantiate because the Net is free form and uncontrolled. But it's clear that AltaVista does have a large index and it's a popular service.
Unlike Yahoo (the oldest and probably the best known search engine; see review), which indexes the key descriptive words of home pages, AltaVista indexes the complete text within pages. This is both an advantage and disadvantage.
Advantage: You can complete an exhaustive search, including words in a certain order. For example, if you want items mentioning Great Plains Software, most search tools will return finds with both great and plains. AltaVista allows you to specify the order of the words, which in effect acts as a filter.
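The difference between matching words in any order and matching them as a phrase can be sketched in a few lines of Python. This is only an illustration under simple assumptions (the two sample pages are made up, and it does not reflect how AltaVista is actually built).

```python
import re

def words(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_all_words(text, query):
    """True if every query word appears somewhere in the text, in any order."""
    return set(words(query)) <= set(words(text))

def contains_phrase(text, query):
    """True if the query words appear next to one another, in the given order."""
    doc, q = words(text), words(query)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))

# Hypothetical page texts, invented for this example.
page_1 = "Great Plains Software is based in Fargo."
page_2 = "The plains offer great views in summer."

for page in (page_1, page_2):
    print(contains_all_words(page, "great plains"),
          contains_phrase(page, "great plains"))
```

Both pages pass the any-order test, but only the first passes the phrase test, which is the filtering effect described above.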
Disadvantages: Unless you are very specific, you probably will be flooded with data. And because AltaVista's indexing is so comprehensive, it needs more time to index new home pages on the Net; as a result, some very new ones don't get into the database immediately.
With so many search engines available, it was bound to happen: a search engine that searches the other search engines. Cyber411 is what's called a "parallel" search engine, which means it contacts 15 other search engines to do the actual work and then collates the results. Cyber411 also eliminates duplicate entries from the various search engines.
Each of its finds is hyperlinked to related sites and is labeled with the engine that generated the location. However, unlike most of the other engines, it does not provide additional summary information about the found sites.
When we tested the site with a word string query, we found that some of the sites Cyber411 found didn't contain that word string; in other words, it located information we did not want. While such errors are annoying, they do not take away from the comprehensive search power of Cyber411.
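The collating step of a parallel search engine can be sketched as follows. This is a simplified illustration, not Cyber411's actual method: the engine names and addresses are placeholders, and treating two addresses as duplicates when they differ only by a trailing slash is an assumption made for the example.

```python
def merge_results(results_by_engine):
    """Combine result lists, dropping duplicate addresses and
    recording which engines reported each one."""
    merged = {}  # address -> list of engine names, in first-seen order
    for engine, urls in results_by_engine.items():
        for url in urls:
            key = url.rstrip("/")  # assumption: a trailing slash is not significant
            sources = merged.setdefault(key, [])
            if engine not in sources:
                sources.append(engine)
    return merged

# Placeholder engines and addresses, invented for this example.
results_by_engine = {
    "Engine A": ["www.example.com/tax", "www.example.com/audit"],
    "Engine B": ["www.example.com/audit/", "www.example.com/payroll"],
}

merged = merge_results(results_by_engine)
for url, engines in merged.items():
    print(url, "found by", ", ".join(engines))
```

Four raw results collapse to three addresses, and the duplicate is credited to both engines that found it.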
Excite also says it has 50 million Web pages indexed in its database. When you type a description of what you're looking for in an Excite search form, it scrutinizes the full text of 50 million Web pages. In addition to listing finds, Excite is able to estimate how relevant each one is to your initial request. And then it goes a step further: If you click on a button that says "More like this," you can track down more sites that are similar to any of the sites just found. That's particularly handy because a first-time search usually discloses a smattering of information a bit off the mark. But with this feature, if one of the search results looks like the closest match, then the "More like this" option delivers more results similar to that near hit.
Excite claims to work at twice the speed of its competition, but, while it is fast, our tests couldn't confirm the specific speed claim. Like most other search engines, Excite lists about 10 search results at a time, in decreasing order of confidence that it has matched your request. Each result consists of a title, a URL and a brief summary of the found page.
By adding a ^ symbol and a number value to the end of a word, a user can command the engine to focus its search more intensely on that word in a string of search words; that usually results in a higher success rate. Here's how that works: Say you want to search the following: dog care grooming. By adding ^3 after the word grooming, you will order the search engine to give three times more emphasis to it than to the other words in the string. So while the words dog and care remain search targets, they do not get as much attention from the search engine as grooming.
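One plausible way to picture this weighting is sketched below in Python. Excite does not publish its formula, so the scoring rule here (each occurrence of a word counts once, multiplied by its weight) is purely an assumption for illustration, as are the sample pages.

```python
import re

def parse_query(query):
    """Split a query into (word, weight) pairs; 'grooming^3' gets weight 3,
    and plain words get weight 1."""
    terms = []
    for token in query.lower().split():
        word, _, weight = token.partition("^")
        terms.append((word, float(weight) if weight else 1.0))
    return terms

def score(text, terms):
    """Add up each term's weight times the number of times it appears."""
    found = re.findall(r"[a-z0-9]+", text.lower())
    return sum(weight * found.count(word) for word, weight in terms)

# Hypothetical page texts, invented for this example.
page_1 = "Dog care basics: feeding and exercise for your dog"
page_2 = "Grooming tips: dog grooming at home"

plain = parse_query("dog care grooming")
weighted = parse_query("dog care grooming^3")
print(score(page_1, plain), score(page_2, plain))
print(score(page_1, weighted), score(page_2, weighted))
```

With the plain query the two pages tie; with grooming^3 the grooming page pulls clearly ahead, while dog and care still contribute to the score.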
Something new is Excite Live (live.excite.com). Open that page and Excite will pop up a series of questions about what news you want to track. Once you profile your interests, Excite Live creates a summary of just that news. Each summary is hyperlinked, so if you want to dig deeper, one click will take you to the full story.
You can make Excite the default search tool with a simple click on a button on its home page.
In its May 1996 issue, the magazine Internet World tested many search engines and praised Infoseek as delivering the most relevant results. Fortune also rated Infoseek highly.
Infoseek has a unique indexing system: It keeps track of the order in which words appear on pages and in articles, so you can specify phrases or groups of words in exact order or just in proximity. You also can interact with the search engine to improve your results. When you find a search result that's interesting to you, just click the Similar Pages link. Infoseek will scour the Internet for pages with similar contents.
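Keeping track of word positions is what makes proximity searches possible. The Python sketch below shows the general idea under simple assumptions; the sample sentence is made up, and the function names are hypothetical rather than anything Infoseek exposes.

```python
import re

def positions(text):
    """Record where each word occurs: word -> list of positions in the text."""
    index = {}
    for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        index.setdefault(word, []).append(pos)
    return index

def within(text, first, second, max_gap):
    """True if the two words occur with positions no more than
    max_gap apart, in either order."""
    index = positions(text)
    return any(
        abs(a - b) <= max_gap
        for a in index.get(first, [])
        for b in index.get(second, [])
    )

# Hypothetical page text, invented for this example.
page = "The audit committee reviewed the annual report before the audit began"

print(within(page, "audit", "report", 3))
print(within(page, "committee", "report", 3))
```

An engine that stored only which words appear on a page, and not where, could not tell these two cases apart.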
One of the newer search engines, Infoseek Ultra claims to be the next generation of search technology. It can perform 1,000 queries per second on a database of tens of millions of documents. The engine uses a proprietary search formula that can merge results from multiple searches but with more accuracy than Cyber411.
Infoseek Ultra has an enormous index. It lists over 80 million URLs and claims to have indexed the full texts of over 50 million. More important, Infoseek Ultra indexes only what its programmers consider to be meaningful URLs. With today's search technology, you often have to deal with dead hyperlinks (links to pages that contain no data or no longer exist) or duplicate pages, which makes the search difficult and frustrating. Infoseek Ultra is relatively effective in eliminating these pages.
The engine also uses automatic name recognition, which means it can recognize that a word in a query is a person's name and find only documents containing the name. Many other engines require that you use special typographic symbols, such as quotation marks or capitalization, to filter for name recognition.
Lycos is similar to Yahoo (listed below) but with a few extras. For example, if the user initiates a search for an accounting standard, say, Lycos will not only find the specific site in which the standard is listed but it also will search the text for any mention of the standard in other sites, in effect doing a global search for the standard.
Lycos publishes a list of what its managers consider the best sites on the Net, a handy reference for new surfers. It also maintains a multimedia catalogue of the Web for sounds and pictures (all of which can be embedded in files for illustration or emphasis), hyperlinks to information by city and even the addresses of domain servers (the specific computers that serve Internet addresses, such as the American Institute of CPAs home page: www.aicpa.org).
When a user enters key words for searches, Lycos not only displays the addresses located but also provides a summary of what's available in each site. In addition, the engine estimates how relevant a located site is to your search; however, sometimes its estimates are not quite on the mark. Further, if in reading the summary it's clear that the listed results are nearly, but not quite, on target, you can use Lycos to search for related sites by effectively narrowing the search field; in that case, Lycos will not repeat URLs derived from the first search, a time-saving bonus.
Lycos has a handy feature called Remote Control. When you select it, a small window appears on the screen and remains active during your search. This feature allows you to work within the search engine while viewing Web sites.
Another nice feature is Road Maps. Not only can the feature deliver driving directions and a road map between any two U.S. locations, but if you provide it with the address of a domain server (say, www.aicpa.org), the following appears on your screen:
Magellan is an online guide that includes original editorial content, a directory of Internet sites that Magellan's staff has reviewed and rated, a vast database of yet-to-be-reviewed sites and a powerful search engine. If you browse the Magellan topics listed on its home page or perform a search, a list of sites that match your area of interest pops up.
Magellan does a good job of keeping up with the expanding Web. It reviews and rates thousands of new sites each week. Magellan has been translated into French and German and soon will become available in other languages as well.
WebCrawler is owned by America Online, which uses it as its default search tool. It has some unique features. For example, it can use Boolean search syntax, a way of combining key words with operators such as AND, OR and NOT that is familiar to advanced database users. If you know the syntax, you get a bit more control over the search operation. It's possible, for example, to limit search results to just home page sites or just sites with summaries. Also, the number of search results can be customized to any number you want; that's important, because some searches may overwhelm you by targeting hundreds of sites.
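For readers new to Boolean searching, the short Python sketch below shows what the three common operators do to a set of results. It illustrates the logic only; it is not WebCrawler's own query syntax, and the pages are invented for the example.

```python
import re

# Hypothetical pages, invented for this example.
pages = {
    "www.example.com/1": "Tax planning for small business",
    "www.example.com/2": "Audit planning checklist",
    "www.example.com/3": "Small business audit guide",
}

def pages_with(word):
    """Return the set of addresses whose text contains the word."""
    return {url for url, text in pages.items()
            if word in re.findall(r"[a-z0-9]+", text.lower())}

# AND narrows the search: both words must be present.
both = pages_with("audit") & pages_with("planning")
# OR widens it: either word is enough.
either = pages_with("tax") | pages_with("audit")
# NOT excludes pages that contain a word.
without = pages_with("business") - pages_with("tax")

print(sorted(both))
print(sorted(either))
print(sorted(without))
```

In practice, AND and NOT are the operators that cut an overwhelming list of finds down to a manageable one.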
WebCrawler has another feature that is useful to users with their own home pages: a backward search function, which reports on who linked to your home page so you can keep track of visitors as a marketing tool.
The oldest and best known Internet search engine is Yahoo. It presents a catalog of sites on the Web. Its listings are much like the yellow pages or an encyclopedia, showing many sites under subject categories or key words. It excels when searching for all sites by a key word or company name. For example, if the key word is Widget Manufacturing Co., Yahoo can be set to retrieve only the home page for Widget, not all other references to the word widget.
Yahoo contains buttons on its home page that, when clicked, produce some very useful information. For example, there's Yellow Pages (to find a business), People Search (to find phone numbers and e-mail addresses of individuals), City Maps (in which you type in a street address, city, state and ZIP code to generate a street map of that area), Today's News, Stock Quotes and Sports Scores.
Another handy feature is Get Local. Upon typing in a ZIP code, Yahoo retrieves information specifically related to that locality. It typically includes weather, yellow page data, local news headlines and even local sports scores. In addition, Get Local provides hyperlinks to other pages that carry information specific to that area.
THE BOTTOM LINE
Many search tools are available on the Net. All of them are good; none is perfect. Try them out and see which works best for you. It may turn out that, depending on the type of search you're conducting, some work better than others. It's best to pick two or three and get to know them very well. Take the time to read about the tools' advanced search techniques; that information is available on the home pages. This will allow you to speed your searches and make them more effective. The time you spend getting to know the tools will pay handsome dividends.
WAYNE E. HARDING, CPA, is a vice-president of Great Plains Software, Fargo, North Dakota. A member of the American Institute of CPAs information technology research subcommittee, he is a former vice-president of the Colorado Society of CPAs.