Building, Configuring, and Using Isearch-cgi

Erik Scott, Scott Technologies, Inc.

Isearch-cgi is an add-on for Isearch. Isearch-cgi lets you do all the cool stuff that Isearch does, only you can do it from a page on the World Wide Web. This means your textbase can be accessed by anyone in the world who has a web browser (and these days, that's pretty much everyone).

It's important to understand that Isearch-cgi is almost useless by itself. You have to have already installed Isearch. This document will describe Isearch-cgi version 1.04. To use this version, you'll have to have Isearch version 1.13 or later installed.

INSTALLATION:

To install Isearch-cgi requires that you have already gotten and built Isearch. If you haven't, get it now, build it, and learn to use it. The Isearch package is available for anonymous ftp from ftp.cnidr.org in the directory /pub/NIDR.tools/Isearch.

From this point on, we're going to make some assumptions:

  1. You have Isearch, and it's installed as /local/project/Isearch-1.13
  2. You know how to use (at least) Isearch and Iindex
  3. You already have NCSA's httpd installed (a "web server")
  4. You pretty much just took the defaults when you installed httpd, like placing your .html files in /usr/local/etc/httpd/htdocs.

Note that if you're using one of Netscape's web servers, these instructions will probably be close, but not quite precisely correct. You were warned.

So, let's get Isearch-cgi:


sti-gw% ftp ftp.cnidr.org

Connected to kudzu.cnidr.org.

220 kudzu FTP server (Version wu-2.4(1) Sun Jan 1 17:43:49 EST 1995) ready.

Name (ftp.cnidr.org:escott): anonymous

331 Guest login ok, send your complete e-mail address as password.

Password:

230- Welcome to kudzu.cnidr.org

230- 

230- ftp logged in from sti-gw.sti-ext.com at Thu May  2 22:57:30 1996

230- 

230- There are currently 20 users of the maximum 25 logged on.

230- 

230- If you have trouble using this experimental FTP client, try including

230- a dash in front of your username (as your login password).  This will

230- disable informational messages such as these.

230- 

230- Comments???  Send mail to `bmv@k12.cnidr.org`

230-

230-

230 Guest login ok, access restrictions apply.

ftp> cd /pub/NIDR.tools/Isearch-cgi

250-Please read the file README

250-  it was last modified on Thu Feb  1 15:44:18 1996 - 91 days ago

250 CWD command successful.

ftp> binary

200 Type set to I.

ftp> get Isearch-cgi-1.03.tar.Z

200 PORT command successful.

150 Opening BINARY mode data connection for Isearch-cgi-1.03.tar.Z (22896 

bytes).

226 Transfer complete.

local: Isearch-cgi-1.03.tar.Z remote: Isearch-cgi-1.03.tar.Z

22896 bytes received in 9 seconds (2.5 Kbytes/s)

ftp> quit

221 Goodbye.

What we just did was this: used ftp to connect to ftp.cnidr.org, logged in as "anonymous", typed our email address for a password (which of course didn't show up here), and changed directories to where Isearch-cgi is kept. Then we made sure we were using "binary" mode, and we got the distribution.

Now we need to uncompress the distribution:


sti-gw% uncompress Isearch-cgi-1.03.tar.Z

This creates a file named "Isearch-cgi-1.03.tar". Now, we'll move that tar file to a good place to work from:


sti-gw% mv Isearch-cgi-1.03.tar /local/project

sti-gw% cd /local/project

Finally (for now) we'll unpack that tar file:


sti-gw% tar xf Isearch-cgi-1.03.tar

sti-gw% cd Isearch-cgi-1.03

We're now ready to start the main part of building Isearch-cgi.

The first thing to do is to edit the file "Makefile". This file contains configuration information that Isearch-cgi needs in order to find Isearch and the cgi-bin directory that your web server is using. Basically, load the Makefile in your favorite editor and follow the directions. At one point, you'll see a line that says "That's all! Type 'make'". Don't edit below that line unless you really know what you're doing.

Now it's time to type "make". Don't be surprised if the compiler prints some warnings: no one is perfect. If all goes well, the Makefile will print:


Welcome to CNIDR Isearch-cgi!

Read the README file for configuration and installation instructions

Which is pretty sound advice, even if you're armed with this guide, since small details change from time to time.

At this point, you have built "isrch_fetch", "isrch_srch", and "search_form". You'll need to build two shell scripts now, and you'll do it with the "Configure" script provided. Configure will need one argument, the directory where your Isearch indexes are stored. The Isearch indexes are the files created when you run Iindex, and the name was set by the "-d" option to Iindex. We're going to assume you have an index named "tester" in the directory "/local/project/Isearch-1.13/":


sti-gw% Configure /local/project/Isearch-1.13

That created the scripts "isearch" and "ifetch". Copy these two scripts to wherever you put your cgi-bin applications. This will quite likely be /usr/local/etc/httpd/htdocs/cgi-bin.

Now, following the outline of the README, we'll take a moment to make an index to some intersting files:


sti-gw% cd /local/project/Isearch-1.13

sti-gw% cd bin

sti-gw% Iindex -d /local/project/Isearch-1.13/tester -t sgmltag /etc/motd

That made a searchable index of the login banner. Boring example, yes, but one that anyone can handle.

The one remaining step is to make a web page that contains the buttons and text fields and so forth we need to actually do the searching. The program "search_form" will do that for us. Here's a simple example:


sti-gw% search_form /local/project/Isearch-1.13 tester > form.html

This creates a form named "form.html" that knows to use the "tester" textbase in the "/local/project/Isearch-1.13" directory. Copy form.html to your httpd document directory (probably /usr/local/etc/httpd/htdocs).

Now point your favorite web browser at that page (probably you'll use a URL like http://myhost.mydomain.com/form.html). You should see a form with blanks to fill in for search terms and a button that says "Submit Query". When you click on that button, the query will run and you'll get your search results. By default, the page you get will have brief "headlines" for the matching documents, and you'll be given links to click on to see the full document if you want to.

There is a chance that you will see a message in your web browser after searching that says something to the effect that it can't read "/cgi-bin/isearch". If this happens, check the entry for "ScriptAlias" in srm.conf, probably as /usr/local/etc/httpd/conf/srm.conf. Make sure it contains the fully qualified path name to your cgi-bin directory.

That's pretty much the whole game, the rest is just details.

DETAILS

As promised, here's the rest of the story.

More Advanced Search Forms:

First, the simple search page, form.html, that we created above is really simple. It allows you to enter three search terms, and they'll all be "or"-ed together at search time. Not all that wonderful. No wonder it's called a "simple" search form. Fortunately, there are two other kinds of search forms, "boolean" and "advanced".

A Boolean search form lets you specify two search terms and whether they are "and"-ed, "or"-ed, or "andnot"-ed. To generate that kind of form, use the "-boolean" option to search_form:


sti-gw% search_form -boolean /local/project/Isearch-1.13 tester >form2.html

You can also create an advanced search form. This allows you to type free-form, infix boolean queries, like "((cheese and wine) or caviar) andnot sherry". To generate this kind of page, use:


sti-gw% search_form -advanced /local/project/Isearch-1.13 tester >form3.html

Better Looking Forms:

The search forms that are generated are pretty plain. You'll probably want to add button bars and pictures of your company headquarters and so forth. Feel free. Go nuts.

The search results page is generated by the code in isrch_srch.cxx. Look at the "cout" statements and get a feel for what is going on, and then hack away. You can make the output as simple or as fancy as you'd like. You can have a little Java animation of yourself pointing at your favorite document when it gets found, for instance. Or color code the hits, red for the highest score through purple to blue for the lowest. Again, go nuts.

Fast, Clean Links:

You can also modify the search form so that fetches aren't done through ifetch, but go straight to the html files that were indexed. This assumes, of course, that the files you indexed were part of your normal htdocs tree. If not, you're out of luck. But if you just indexed your web site, add the line:


<input name="HTTP_PATH" type=hidden value="/path/to/http/docs">

to your search form (like form3.html, above, for instance). Make sure you edit the pathname, though. This technique will make the Isearch-cgi results point to the real files instead of always going through ifetch. This is good for two reasons:

  1. It protects the integrity of the link. This is nice from a philosophical standpoint.
  2. It is much faster and places much less load on your server. This is nice from a job security standpoint.

That concludes the Isearch-cgi Users' Guide. Make sure you subscribe to the Isite mailing list for the most current information. The directions for doing this are in the file "README". I won't repeat them here because they might very well change in the near future.