The Complete User's Guide for the WN server

Version 2.5.0

This virtual document contains all the sections of the WN User's Guide. It is provided to allow downloading and printing of the guide. The latest online version of this guide can be found at http://hopf.math.northwestern.edu/docs/manual.html.

User's Guide for the WN server

Chapter 1: An Overview of the WN Server

Chapter 2: Installation and Setup of WN

Chapter 3: Creating Your WN Data Directory

Chapter 4: Security on the WN Server

Chapter 5: Setting Up Searches on the WN Server

Chapter 6: Parsed Text and Server Side Includes on the WN Server

Chapter 7: User Defined Fields on the WN Server

Chapter 8: Filters on the WN Server

Chapter 9: Ranges on the WN Server

Chapter 10: Limiting Access to Your WN Hierarchy

Chapter 11: Users' Hierarchies and Personal Home Pages on the WN Server

Chapter 12: Multi-homed or Virtual Servers on the WN Server

Chapter 13: WN Utility Programs

Chapter 14: WN Auxiliary Modules

Chapter 15: Clickable Images and Imagemap Files on the WN Server

Chapter 16: Using CGI Programs on the WN Server

Chapter 17: Support for the WN Server and Thanks

Appendix A.1: Command Line Options for the WN Server

Appendix A.2: Command Line Options for WN Server `wndex` Utility

Appendix B: Index File Directives for the WN Server

Appendix C: Parsed Document Syntax for the WN Server

Appendix D: CGI and Other Environment Variables for the WN Server

Appendix E: Creating User Home Pages for the WN Server

Appendix F: MD5 Digests of Recent WN releases

Appendix G: Configuration Macros for WN

Appendix H: Default Index Cache Files

Appendix I: How the Index to WN User's Guide Server Works

Appendix J: Colophon for the WN User's Guide

Index of the WN User's Guide

Chapter 1

An Overview of the WN Server

An HTTP server should do more than just serve files. It should play an active role in both navigation and presentation issues. It is my hope that this server provides better tools for the creative webmaster.
- John Franks

WN is a server for the Hypertext Transfer Protocol HTTP/1.1. Its primary design goals are security, robustness, and flexibility, in that order. One of its objectives is to provide functionality usually available only with complex CGI programs, without the necessity of writing or using these programs. (Of course, CGI/1.1 is fully supported for those who want it.) Despite this extensive functionality, the WN executable is substantially smaller than the CERN httpd, NCSA httpd or Apache servers.

WN was planned with a focus on serving HTML documents. This means such things as enabling full text searching of a single logical HTML document which may consist of many files on the server, or allowing users to search all titles on the server and obtain a menu of matching items, or allowing users to download a total logical document for printing which, in fact, consists of many linked files on the server. All of these are done in a way which is transparent to the user (and largely transparent to the maintainer)! The "User's Guide for the WN Server", which this chapter is part of, provides a good example of many of these features.

Another feature not found in many other servers is conditionally served text. Often a server maintainer may wish to serve different versions of a document to different clients. By adding simple HTML comments to documents and marking those documents to be "parsed" by the server, the maintainer can arrange that different sections or entirely different documents are sent to clients, based on such things as the client's domain name, IP address, browser type, browser "Accept" header, "Cookie" header, etc. This feature is described in more detail in the section "Conditional Text: If, Else, and Endif" in this guide.

But these are only examples of many new tools WN makes available to webmasters.

The design and security mechanisms of WN differ substantially from those of the httpd servers available from CERN and NCSA so a brief description of how they work is useful.

1.1 How WN Works

Files served by an HTTP server may have many attributes relevant to their serving. These attributes include content-type, optional title, optional expiration date, optional keywords, whether the file should be parsed for server-side includes, access restrictions, etc. Some servers try to encode this information in ad hoc ways, in a file name suffix, or in a global configuration file. The approach of WN is to keep this information in small databases, one for each directory in the document hierarchy.

The WN maintainer never needs to understand the format of these database files (named index.cache by default), but this format is very simple and a brief description will indicate how WN works. When the server receives a request, say for /dir/foo.html, it looks in the file /dir/index.cache which contains lines like:

file=foo.html&content=text/html&title=whatever...

If the server finds a line starting with "file=foo.html" then the file will be served. If such a line does not exist the file will not be served (unless special permission to serve all files in the directory has been granted). This is the basis of WN security. Unlike other servers, the default action for WN is to deny access to a file. A file can only be served if explicit permission to do so has been granted by entering it in the index.cache database or if explicit permission to serve all files in /dir has been given in the index.cache file in /dir. This database also provides other security functions. For example, restricting the execution of CGI/1.1 programs can be done on the basis of the ownership (or group ownership) of their index.cache files. There is no need to limit execution to programs located in particular designated directories. The location of a file in the data hierarchy should be orthogonal to security restrictions on it and this is the case with the WN server.
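The default-deny lookup described above can be sketched in a few lines of Python. This is only an illustration, not WN's actual code: the function names are hypothetical, the record layout is taken from the single sample line shown (whose remaining fields are elided), and the sketch does not model the "serve all files in the directory" permission or any escaping rules the real format may have.

```python
from typing import Dict, Optional


def parse_cache_line(line: str) -> Dict[str, str]:
    """Split one 'key=value&key=value' record into a dict (assumed layout)."""
    fields = {}
    for part in line.strip().split("&"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that contain no '='
            fields[key] = value
    return fields


def lookup(cache_text: str, filename: str) -> Optional[Dict[str, str]]:
    """Return the record for filename, or None when it is not listed."""
    for line in cache_text.splitlines():
        record = parse_cache_line(line)
        if record.get("file") == filename:
            return record
    return None  # not listed, so the file is not served


# Hypothetical index.cache content modelled on the sample line above.
SAMPLE_CACHE = "file=foo.html&content=text/html&title=whatever\n"

allowed = lookup(SAMPLE_CACHE, "foo.html")
denied = lookup(SAMPLE_CACHE, "secret.txt")
print(allowed)
print("not served" if denied is None else denied)
```

The point of the sketch is the last line of `lookup`: a file with no record yields nothing to serve, so access is denied unless the maintainer has listed it.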

The index.cache database file has a number of other functions beyond its security role. Attributes of foo.html which can be computed before it is served and which don't often change are stored in the fields of the line starting file=foo.html. For example, the MIME content type "text/html" must be deduced from the filename suffix ".html". This is done once at the time index.cache is created and need not be done every time the file is served.

The title of a file is another example. With the WN server every file served has a title (even binaries) and optionally has a list of keywords, an expiration date, and other fields associated with it. For an HTML document the title and the keywords are automatically extracted from the header of the document and stored in fields of that file's line in its index.cache file. These are used for the built-in keyword and title searches which the server supports. The maintainer also has the option of adding his own fields to this database file. They could contain such things as document author, document id number, etc. These user defined fields can be searched with the built-in WN searches or their contents can be inserted into the document, on the fly, as it is served.

So how are the index.cache databases created? Their format is quite simple and a maintainer is free to create them any way she chooses, but normally they are created by the utility wndex (pronounced "windex"). This program, which is part of the WN distribution, is designed to produce the index.cache file from a file with a friendlier format with the default name "index.wn". A very simple index.wn file might look like:

File=foo.html

File=clap.au
Title=Sound of one hand clapping

File=hand
Title=Picture of one hand clapping
Content-type=image/png

Of course if the file hand were named hand.png the content-type line would not be necessary as wndex could deduce the type from the .png suffix. Likewise it is not necessary to give a title for foo.html because wndex will read the HTML header from that file and extract the title and perhaps other things like keywords and expiration date.
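To make the role of wndex more concrete, here is a rough Python sketch of the two steps just described: reading File= records and deducing a content type from the file name suffix when none is given. It is not the wndex implementation. The function names are hypothetical, the sketch assumes each File= line starts a new record, it uses Python's standard mimetypes table rather than WN's own, and it does not extract titles from HTML headers.

```python
import mimetypes
from typing import Dict, List


def parse_index_wn(text: str) -> List[Dict[str, str]]:
    """Collect 'Key=value' lines into records; each File= starts a new one."""
    records: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip().lower()
        if key == "file":
            records.append({"file": value.strip()})
        elif records:
            records[-1][key] = value.strip()
    return records


def fill_content_type(record: Dict[str, str]) -> Dict[str, str]:
    """Guess a content type from the suffix unless one was given explicitly."""
    if "content-type" not in record:
        guessed, _ = mimetypes.guess_type(record["file"])
        if guessed:
            record["content-type"] = guessed
    return record


# Sample input modelled on the index.wn example above.
SAMPLE = """\
File=foo.html

File=clap.au
Title=Sound of one hand clapping

File=hand
Title=Picture of one hand clapping
Content-type=image/png
"""

records = [fill_content_type(r) for r in parse_index_wn(SAMPLE)]
for record in records:
    print(record)
```

The file named hand has no suffix, so its type comes only from the explicit Content-type line, while foo.html gets its type from the suffix.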

1.2 Features of WN

The WN server has several features which are not available with other servers or only available through the use of CGI/1.1 programs.

1.2.1 Searching

One of the design goals of WN is to provide the maintainer with tools to create extensive navigational aids for the server. A variety of search mechanisms are available.

Title searches: In response to the URL <http://host/dir/search=title> the server will provide an HTML form (automatically generated or prepared by the maintainer) asking for a regular expression search term. When supplied the server will search the index.cache files in /dir and designated subdirectories for items whose titles contain a match for the search term. An HTML document with a menu of these items is returned.
Keyword searches: Like title searches except matches are sought in keywords instead of titles. Keywords for HTML documents are automatically obtained from <META> headers. For other documents (or HTML documents) they can be manually supplied in the index.wn file.
Title/Keyword search: Like the above except the match can be either in the keyword or the title.
User supplied field searches: Like keyword searches except matches are sought in user supplied fields. The user supplied fields can contain any text and are attached to a document by entering them in that document's record in the index.wn file. Their purpose is to include items like a document id number, or document author in the index.cache database. A field search could then produce all documents by a given author, for example. Or, using regular expressions in the search term, it could produce a list of all documents whose id numbers satisfy certain criteria.
Context searches: Unlike the title and keyword searches this is a full text search of all text/* documents in one directory (not subdirectories). The returned HTML document contains a list of all the titles of documents containing a match together with a sublist of the lines from those documents containing the match. This provides one line of context for the match. For HTML documents the matched expression in each of these lines will be a highlighted anchor. Selecting one takes you to the document with your viewer focused on the matching location. The primary intent of this feature is to provide full text searching for an HTML "document" which might consist of a substantial number of files.
File context and grep searches: A file context search is just like a context search, except limited to a single file. The file grep search returns a text/html document containing the lines in the file matching the regular expression.
List searches: The server will search an HTML document looking for an unordered list of anchors linking to Web objects. The contents of each anchor will be searched for a match to the supplied regular expression. The search returns an HTML document containing an unordered list of those anchors with a match. This is quite useful with the wn_mkdigest utility which creates HTML documents to be searched in this way from files with internal structure like mail or news digests, mailing lists, etc.
Index searches: This is a mechanism by which arbitrary search engines can be linked to WN through a search-module. The server will provide the search term to the search-module and expects an HTML list of links to matching items to be returned.

All of the searching methods listed above except the index searches are built into the server and require no additional effort for the maintainer. They are simply referenced with URLs like <http://host/dir/search=context> where /dir is any directory containing files to be served and an index.cache listing them. Of course search permission can be denied for any directory or any file contained in that directory.
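As an illustration of what a title search does conceptually, the following Python sketch matches a regular expression against a list of titles and returns an HTML menu of the hits. The record list, function names and HTML layout are made up for the example; whether WN matches case-sensitively, and what markup it actually emits, are not stated here and are assumptions of the sketch.

```python
import html
import re
from typing import Dict, List


def title_search(records: List[Dict[str, str]], pattern: str) -> List[Dict[str, str]]:
    """Return the records whose title contains a match for the pattern."""
    rx = re.compile(pattern)
    return [r for r in records if rx.search(r.get("title", ""))]


def as_html_menu(matches: List[Dict[str, str]]) -> str:
    """Render the matches as an unordered list of links."""
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(
            html.escape(r["file"], quote=True), html.escape(r["title"])
        )
        for r in matches
    )
    return "<ul>\n" + items + "\n</ul>"


# Hypothetical records, as they might be read from an index.cache file.
RECORDS = [
    {"file": "foo.html", "title": "An Overview"},
    {"file": "clap.au", "title": "Sound of one hand clapping"},
    {"file": "hand.png", "title": "Picture of one hand clapping"},
]

hits = title_search(RECORDS, r"hand.*clap")
menu = as_html_menu(hits)
print(menu)
```

A keyword or user supplied field search would follow the same pattern, with a different field of the record being matched.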

1.2.2 Parsed Text, Server-Side Includes and Wrappers

The WN server has extensive capabilities for automatically including files in one which is being served or "wrapping" a served file with another, i.e. pre-pending and post-pending information to a file being served. This latter is useful if you wish to place a standard message at the beginning or end (or both) of a large collection of files. For security all files included in a file or used as a wrapper for it are listed in that file's index.cache file. This combined with various available security options, like requiring that a served file and all its includes and wrappers have the same owner (or group owner) as the index.cache file listing them, provide a safe and productive Web environment.

One important application of wrappers is to customize the HTML documents returned listing the successful search matches. If a search item is given a wrapper the server assumes that it contains text describing the search and it merely inserts an unordered list of links to the matching items.

In addition to including files the output of programs may be inserted and the value of any user defined field in the index.cache database entry for a file may be inserted.

Also parsed text may conditionally insert items with a simple if - else - endif construct, based on Accept headers, User-Agent headers, Referer headers, etc.

1.2.3 Filters

An arbitrary filter can be assigned to any file to be served. A filter is a program which reads the file and has the program output served rather than the content of the file. The name of the filter is another field in the file's line in its index.cache file. One common use of this feature is for on-the-fly decompression. For example, a file can be stored in its compressed form and assigned a filter like the UNIX zcat(1) utility which uncompresses it. Then the client is served the uncompressed file but only the compressed version is stored on disk. As another example, you might use the UNIX nroff(1) utility, "nroff -man", as a filter to process UNIX man files before serving. There are many other interesting uses of filters. Be creative!
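The decompression case can be mimicked in Python to show the idea of a filter: the stored bytes go in, and what the client would receive comes out. This is a sketch only. It uses Python's gzip module in place of zcat(1), the function name is hypothetical, and it says nothing about how WN actually hands the file to the filter program.

```python
import gzip


def gunzip_filter(stored: bytes) -> bytes:
    """Turn the compressed bytes kept on disk into the bytes to be served."""
    return gzip.decompress(stored)


# Only the compressed form is "stored"; the client sees the original text.
stored = gzip.compress(b"Hello from a filtered file\n")
served = gunzip_filter(stored)
print(served.decode())
```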

1.2.4 Ranges

An arbitrary range of a file can be served. If the server is accessed via a URL like <http://host/dir/foo;lines=20-30> and foo is any text/* document, it will return a text/plain document consisting of lines 20 through 30 of the file foo. This is very useful for structured text files like address lists or digests of mail and news. A WN utility called wn_mkdigest will produce an HTML document with a list of links to separate sections (line ranges) of the structured file. The wn_mkdigest utility is executed with two regular expressions as arguments: one to match the section separator and the other to match the section title. For a mail digest, for example, these could be "^From" and "^Subject:" respectively. Then the sections of the virtual document would be delimited by a line starting with "From" and would have the message subject as their title. A similar mechanism provides byte ranges from files.
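The two ideas in this section, serving a line range and splitting a digest into titled sections with a pair of regular expressions, can be sketched in Python as follows. This is an illustration under stated assumptions, not the wn_mkdigest algorithm: the function names are hypothetical, line numbers are taken to be 1-based and inclusive, and each section is assumed to start at its separator line and run to the line before the next separator.

```python
import re
from typing import List, Tuple


def line_range(text: str, first: int, last: int) -> str:
    """Return lines first..last of text (1-based, inclusive)."""
    lines = text.splitlines(keepends=True)
    return "".join(lines[first - 1:last])


def digest_sections(text: str, separator: str, title: str) -> List[Tuple[int, int, str]]:
    """Return (first_line, last_line, title) for each section of a digest."""
    sep_rx, title_rx = re.compile(separator), re.compile(title)
    lines = text.splitlines()
    sections: List[list] = []
    for number, line in enumerate(lines, start=1):
        if sep_rx.search(line):
            if sections:
                sections[-1][1] = number - 1  # close the previous section
            sections.append([number, len(lines), "(untitled)"])
        elif sections and sections[-1][2] == "(untitled)" and title_rx.search(line):
            sections[-1][2] = title_rx.sub("", line, count=1).strip()
    return [(s[0], s[1], s[2]) for s in sections]


# A made-up two-message mail digest.
DIGEST = """\
From alice@example.org
Subject: First message
Body line one

From bob@example.org
Subject: Second message
Body line two
"""

sections = digest_sections(DIGEST, r"^From", r"^Subject:")
second = line_range(DIGEST, sections[1][0], sections[1][1])
print(sections)
print(second)
```

Each tuple returned by `digest_sections` corresponds to one link in the generated document, pointing at a line range of the structured file.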

Chapter 2

Installation and Setup of WN

2.1 Installing the Software

Get the source file from http://hopf.math.northwestern.edu/ (U.S.) or http://www.wnserver.org/ (Europe) then uncompress it and untar it to make the WN source directory hierarchy. The file must be uncompressed with the GNU compression utility gzip(1) (or gunzip(1)). The resulting file wn.tar should be unpacked with the UNIX tar(1) utility using "tar -xvf wn.tar". The top level of the directory created by untarring this file contains several directories, including: wn, wndex, wnauth and docs.

2.1.1 Configuring WN

If your system supports perl, the quickest way to get your server configured is to run the perl program "configure" which is in the main source directory. Do this with the command:

perl configure

This program will ask you various questions, like what version of UNIX you are using and the path to the directory you want to be the wnroot of your data hierarchy.

Default answers are printed in square brackets [ ] so you can simply press return to enter that value. You can quit at any time by pressing Ctrl-C and nothing should be changed. If you want to try it once to see what the questions are, that is fine.

This program creates two files: config.h and Makefile which are customized based on the answers you gave. You may rerun this program as many times as you like. The first time you run the program the default values are those in the file config.h.dist. Subsequent times the default values for all question answers are taken from the most recent config.h you have produced (if it still exists in the top level directory).

An alternative to running this program is to copy the files Makefile.dist and config.h.dist to Makefile and config.h respectively and edit them manually. If you want to use some of the features which are not turned on by default, like multiple IP interfaces, you will have to edit at least config.h. I recommend starting with the perl program and getting your server up and running. Then you can go back and browse through config.h to see if there are things you want to change. If there are, you will have to recompile, but that takes only a few minutes.

Here are some of the questions you will be asked when you run configure. You will be given a list of supported operating systems and asked to pick the one you are using, e.g. SUNOS, SOLARIS2, AIX, LINUX, etc. You will be asked the complete path name of your data directory. You will also have to enter the names of the access and error log files you wish to use (they can be the same file). If you don't want logging or you want to use the UNIX syslogd(8) system utility (i.e. the -S option) then these should both be defined to be the empty string (i.e. a pair of double quotes with nothing between them like ""). If you specify the names of these log files then you must make sure that either these two files exist and are writable by the server or that they are files in a directory where the server has permission to create them.

Additional customizations in config.h are possible but should not be needed. These customizations require that you manually edit the config.h file. For example, there is a #define DEFAULT_URI line in the config.h file.

You may also customize the file Makefile in the top level directory. In particular you should do this if you wish to specify that a C compiler other than the UNIX cc(1) utility (e.g. gcc) should be used for compiling. Also some systems require that special libraries for sockets, or whatever, be mentioned in the compile command. The configuration program attempts to do this, but I am working from user reports since I do not have access to most of the UNIX variations. If they are incorrect please let me know.

2.1.2 Building WN

In the top level directory do a make(1) to produce the server wnd, the stand-alone version wnsd and the utility wndex. This utility is used to produce index.cache files for use by the server. If the make proceeds without problem you should next do a "make install". This will strip the binaries and place them in the top level bin directory or whatever directory you specified when you ran the configure program.

If you specified a log file name or error log file name when you ran the configuration program or edited config.h you will need to make sure that these files exist and that they are writable by the user id under which the server will run. The best way to do this is create the files as root ("touch wn.log"), then change their ownership to the appropriate user ("chown nobody wn.log") and finally set the permissions appropriately ("chmod 644 wn.log"). An alternative is to create a directory in which these files will reside and make sure that the user nobody has permission to create files in this directory. Then the server will create the files with proper ownership and permissions.

2.2 Running the Server as a Stand-alone Daemon

You can now either run the server as a stand-alone daemon, the wnsd executable, or run under the UNIX inetd(8) system utility, the wnd executable. We first describe the stand-alone version. Run this with the command:

wnsd -p port [other options] wnroot

where "port" is the number of the port on which you wish the server to run. If this is a non-privileged port (i.e. 1024 or above) then wnsd can be run as an ordinary user. However, for privileged ports like 80 you must run the command above as root. If wnsd is run without the -p option it will use port 80 by default. If wnsd is run by root then when it starts up it will change its user id to the one set when running the configuration program or by editing the config.h file line containing #define USERID. Otherwise it will have all the permissions of the user who runs it.

The safest practice is to use the numeric UID of nobody for the USERID set in config.h (this is the default) and then start the server as root.

Note: on HPUX and perhaps other systems user nobody cannot be used. In this case just create a new user, say "www", with the fewest possible privileges and no shell.

The server needs to have root permissions to connect to a socket on a privileged port and listen for requests. But immediately after doing so it will change its user id to that of nobody and have minimal access permissions. In this situation the user nobody needs to have only read permission to your server data and should not own or have write permission. In particular nobody should not have ownership or write access of the index.cache database file described in the chapter "Creating Your WN Data Directory" of this guide.

2.3 Running the Server Under `inetd(8)`

The other way to run the server is to use it under the UNIX inetd(8) system utility. This is an efficient way to run the server if the load on it is relatively light (a few thousand hits per day) and the host on which it runs is used for other purposes. There are variations on how inetd(8) works from system to system so you may need to look at the man page for the UNIX inetd.conf(5) configuration file. Here's how it works under many systems (e.g. SunOS 4.1.3): Edit the UNIX services(5) configuration file and create the line:

wnd 80/tcp

or replacing 80 by the port you wish to use. Then edit the file inetd.conf(5) and insert the line:

wnd stream tcp nowait nobody /path/wnd wnd

After the last wnd you can have optional arguments to turn on logging or use a different data directory. Some versions of inetd(8) limit the number of arguments you may use, so I prefer to use a small program in place of wnd here. My inetd.conf(5) line looks like:

wnd stream tcp nowait nobody /path/wn.rc wn.rc

and wn.rc contains only the two lines:

#!/bin/sh
exec /path/wnd -t 202 -L /path2/logfile wnroot

It is important to run wnd as nobody (the fifth field in the inetd.conf(5) line above) or some other user with no special access privileges. If you are using an inetd(8) without the capability to set the UID on startup (e.g., Ultrix), you should define the group ID and user ID in config.h so that the program is not running as root (look for the #define USERID and #define GROUPID and set the values appropriately). It should never be necessary to run wnd under inetd(8) as root and to do so would be a serious mistake for maintaining security. Every attempt has been made to make wnd as secure as possible, even if it is run as root. However, no program accessible to remote users on the Internet can be assumed perfectly secure. See the chapter "Security on the WN Server" in this guide.

After editing the inetd.conf(5) and services(5) files you should find the process id number of the inetd(8) process and use the UNIX kill(1) utility to reload the configuration using "kill -HUP <process_id>". This must be done as root. You find the number "<process_id>" by using the UNIX ps(1) utility or by looking at the contents of the file (which you specified when you ran the configure program) in which the server stores this number.

2.4 Your Hostname: What's in a Name?

If the fully qualified domain name of your server is abc.com you might like to have your server known as www.abc.com or some other "vanity" name. For most purposes this is simply a matter of properly setting up Domain Name Service (DNS) on your system so that the system responds to the desired name.

Note: To use multiple vanity names for different IP addresses on a single server see the chapter "Multi-homed or Virtual Servers on the WN Server" in this guide.

There are a few instances, however, where the WN server does use its own hostname. Ideally, in my opinion, the server should do nothing with its hostname and not even need to know it. This is not possible for two reasons.

First, the CGI/1.1 protocol requires the server to pass its hostname to CGI programs in an environment variable whenever those programs are run. Second, clients often implement redirection so that they cannot handle relative URLs, only complete ones. (This is a mistake in my view, but one we have to live with.) Thus when a server redirects to another local document it must supply its own hostname. These are the only places WN uses its hostname.

For most cases then, WN only uses its hostname when a redirection is done. This happens in several circumstances. The most common is when a request is made for a directory and the trailing '/' is left off of the URL.
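The trailing-slash case shows why the hostname is needed: the redirect target has to be a complete URL. A minimal Python sketch of building such a target follows. The function name, the http scheme and the handling of the port number are assumptions made for the example, not a description of WN's internals.

```python
def directory_redirect(hostname: str, path: str, port: int = 80) -> str:
    """Build the complete URL for a directory requested without its final '/'."""
    # The default port is conventionally left out of the URL.
    authority = hostname if port == 80 else "{}:{}".format(hostname, port)
    return "http://{}{}/".format(authority, path.rstrip("/"))


location = directory_redirect("www.abc.com", "/docs")
print(location)
```

With a relative target the server would not need to know its own name at all, which is the point made above about clients that accept only complete URLs.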

So how does WN know its hostname? When you run the configure program you are queried for the value you want or you have the option of using a system call at the time the server is run. This value is placed into the config.h header file and compiled into your server. In the file config.h the #define WN_HOSTNAME macro is set by default to the empty string. If this is not changed the server will get its name from the UNIX gethostbyaddr(3) system call. If this is set to another string that string will be used. If you are using WN as a multi-homed server then you need to set different names for the different IP addresses. This is done in the file wn/vhost.h which you edit to set up the correspondence between IP addresses and wnroot directories.

2.5 Testing Your Setup

After compiling and setting up the software you can test it on a sample directory provided with the distribution. To do this first make a symbolic link in your wnroot data directory to the docs directory in the source distribution. The command "ln -s /your/src/dir/docs docs" executed in the wnroot directory should do this. If your system does not support symbolic links you can copy this directory and its subdirectories to your data directory temporarily.

Now you are ready to test your server installation on this directory. Try it with your favorite HTTP client. The URL should be:

http://localhost/docs/index.html

2.6 Shutting Down Your Server

If you are running under the UNIX inetd(8) system utility as described above, then to shut down the server first remove or comment out the line you created in the UNIX inetd.conf(5) configuration file. Then you should again find the process id number of the inetd(8) process and run the UNIX kill(1) utility using "kill -HUP <process_id>", where "<process_id>" is the process id number of inetd(8), just as you did to start WN.

If you are running wnsd, the stand-alone version of WN, you should find the process id number of the running wnsd by using the UNIX ps(1) utility or by looking at the contents of the file (which you specified when you ran the configure program or by using the -q option) in which the server stores this number. Then you run the UNIX kill(1) utility using "kill <process_id>" where "<process_id>" is the process id number of wnsd. If you started the server as root you should be root to kill it.

2.7 Managing Log Files

There are two ways to log WN transactions: dedicated log files or using the UNIX syslogd(8) system utility. We first describe dedicated log files.

Normally when you use WN you will keep two log files. The first is a log of all "normal" transactions and the second records error conditions or items which might require your attention. For example, if the server cannot find a file which your index.wn file indicates should be served it will log an error. The error log file can be the same file used for the normal transaction log. In general the difference between the two is that the error log gets information about anything which might require attention of the maintainer while routine transactions and errors which are simply user errors tend to go to the regular log. The intent is that a conscientious maintainer should keep an eye on the error log but need not read the (much larger) log of regular transactions.

There are two ways to tell the server the names of these files. The first is by supplying the file names when you run the configure program and then compiling these into your server. And the second is by supplying the file names on the command line when you execute the server. This is done with the -L option and the -l option to specify the transaction and error log files respectively.

For example, executing the command:

wnsd -L /path2/logfile -l /path3/error.log wnroot

will cause the server to use "logfile" and "error.log" as the log file and error log respectively. Of course, it is necessary for the server to have write permission to these files and execute permission on the directory containing them.

A good way to achieve this if the server is running as nobody is to create the files yourself and change their ownership to the user nobody. This can be done, for example, with the commands:

touch logfile
/usr/etc/chown nobody logfile
chmod 600 logfile

executed as root in the directory where the log file is to reside. The first of these commands creates the file "logfile". The second makes nobody the owner and the third gives nobody (and no one else except root) permission to read and write this file. You might want to allow others to read, but not write to the log file, or security of the log file might not be a concern.

Thus a program executed by the UNIX crond(8) system utility to rotate log files might look like:

cd /path2
mv logfile logfile.old
touch logfile
chown nobody logfile
chmod 600 logfile
kill -HUP `/bin/cat /path/to/wn.pid`
chown maintainer logfile.old
chmod 600 logfile.old

where /path/to/wn.pid is a file containing the process id of the server, created by using the -q option or by specifying this filename when the configure program is run. If neither of these has been done the stand-alone server, wnsd, will print its process id on the UNIX stdout(3) stream when it is run. If you are using wnd under inetd(8) there is no need to send the -HUP signal as the server must close this file after each transaction.

If you are using a system that supports the logrotate program, it provides a good way to handle logs. Here is an excerpt from a typical logrotate.conf file.

/var/log/wnsd/error_log {
    daily
}
/var/log/wnsd/access_log {
    daily
    postrotate
        /usr/bin/killall -HUP wnsd
    endscript
}

This causes the logs to be rotated daily. The command /usr/bin/killall -HUP wnsd assumes that your system supports the killall command, in which case it finds the process id of the wnsd process and sends it the -HUP signal, as was done by the kill command in the script above.

There are three formats which the server can use in writing its log files, each with three options concerning how much DNS lookup to do for the remote host. The two most common formats are "verbose" and "common log format". The verbose mode is essentially the common log format but with the user-agent, referrer, HTTP cookie, and virtual server nickname appended to the line for that transaction as well as better transaction error messages if necessary.

You can choose between verbose and common log formats by answering the relevant question when running the configure program before compilation (or by editing config.h).

To use the third format you need to use the -v command line option. When the server is invoked with the -v option it will write a log file in the format specified by the value of this option. The legal values for this option are "common", "verbose", and "ncsa". They cause the log file to be written in the so-called common log format, or WN's verbose format including user agent, referrer, X-Forwarded-For header, virtual server nickname, and cookies, or in the NCSA extended format which includes just referrer and user agent. When using verbose logging the nickname (enclosed in angle brackets) will be the last field of each log entry. If virtual hosting is not being used this field will be omitted.

More precisely a verbose log line begins with a normal "common" log format line and then adds the following:

<(pid/count) msg1: msg2> <user_agent> <referrer> <cookie> <X-forwarded-for> <nickname>

The punctuation characters, i.e., < > ( ) and :, will always be present in this order. The fields "msg1" and "msg2" may contain additional parentheses or colons. Hopefully none of the fields will contain the character '<' or the character '>', but "referrer", "cookie", "X-forwarded-for" and "nickname" are provided by the browser or the server maintainer so WN has no control over them.

The fields are as follows:

field	description
pid	Process id of the process serving the transaction.
count	n if this is the nth transaction of this (keepalive) connection.
msg1	Description of transaction. May be sent to user.
msg2	Description of transaction. Information NOT sent to user.
user_agent	From the HTTP user agent header.
referrer	From the HTTP referer (sic) header.
cookie	From the client's cookie header.
X-forwarded-for	From the client's X-Forwarded-For header.
nickname	Value assigned this virtual host by the maintainer (omitted if virtual hosting is not used).
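The layout above can be picked apart with a short script. This is only a sketch under stated assumptions: no field contains '<' or '>', msg1 contains no colon (the guide warns that it may, in which case the split between msg1 and msg2 is ambiguous), and the sample line, including its message text, is invented for illustration rather than taken from a real WN log.

```python
import re

# Shape of the first bracketed field: "(pid/count) msg1: msg2".
FIRST_FIELD = re.compile(r"^\((\d+)/(\d+)\)\s*(.*?):\s*(.*)$")


def parse_verbose_tail(line):
    """Return the verbose fields of a log line as a dict, or None if absent."""
    start = line.find("<(")            # the verbose part begins with "<(pid/..."
    if start < 0:
        return None
    fields = re.findall(r"<([^<>]*)>", line[start:])
    if len(fields) < 5:
        return None
    first = FIRST_FIELD.match(fields[0])
    if first is None:
        return None
    return {
        "pid": int(first.group(1)),
        "count": int(first.group(2)),
        "msg1": first.group(3),
        "msg2": first.group(4),
        "user_agent": fields[1],
        "referrer": fields[2],
        "cookie": fields[3],
        "x_forwarded_for": fields[4],
        # The nickname field is omitted when virtual hosting is not used.
        "nickname": fields[5] if len(fields) > 5 else None,
    }


# Invented sample line; real messages and hosts will differ.
SAMPLE = ('client.example.edu - - [01/Jan/1999:12:00:00 -0600] '
          '"GET /index.html HTTP/1.0" 200 1234 '
          '<(4242/1) Sent file: /index.html> <ExampleBrowser/1.0> '
          '<http://www.example.edu/> <> <> <www>')
```

Because the referrer, cookie and X-Forwarded-For values come from the client, a robust log processor should treat a line that fails to parse as data to be skipped, not as an error.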

The NCSA format will likely only be of interest if you want to use log processing tools which expect this format. If the -v option is not specified the server will default to either the common log format or the WN verbose format depending on which was selected when the configure program was run. The utility wnv2c can convert verbose log files to log files in the shorter common log format.

The WN server does not send the UNIX stderr(3) stream output to the error log file, but leaves it directed to its default, the terminal from which the server is invoked. This allows the maintainer to set it to a file of her choice, for example the error log, or leave it directed to the console window in which wnsd was invoked. To redirect it to a file called my.errs simply run wnsd with a command like "wnsd <options> 2>my.errs" if you are using a Bourne-like shell like sh(1). The server itself sends very few things to stderr(3) -- only errors which it is impossible to put in the error log (like "Can't open error log file"). The real usefulness of redirecting stderr(3) comes when you are creating CGI/1.1 programs because their errors are typically sent to stderr(3) so you can easily view them rather than have them buried in a log file.

2.8 Trouble Shooting

If things are not working as they should here are some tips to help you isolate the problems.

If the compilation was successful you can check the server itself by executing it from the command line. If you use the command:

wnd wnroot

it should run and pause for input. Type the line:

GET /<ret>

and in response wnd should print the raw HTML of the index.html file in your top level directory (perhaps along with a message about not being able to open a log file). If instead you type:

GET /docs/overview.html<ret>

(and you still have the /docs subdirectory in your top level directory) the overview document should be sent to your screen. If this doesn't happen there should be an error message which may be helpful. Better error messages are placed in the log file so you may want to run wnd again with the additional arguments "-L logfile" and then examine the contents of the log file. Or if you run "wnd -L /dev/tty" the log entries will be printed to your screen instead of being put in a file. If the server can't open a file, for example, the name of that file will be recorded in the log file. Check its permissions. Remember that all files that wnd serves must be world readable. More serious errors are put in a separate error log. So you might want to try the command "wnd -L file -l file2" and then type the GET requests described above.

If this succeeds you should run the server for real, either under inetd(8) or stand-alone. In order to use port 80 the server must be started by root. It will then switch to user nobody. It does this immediately after connecting to port 80, before it does anything else including opening its log file. If you get a message that the server cannot open its log file then either you have specified putting the log file in a directory where user nobody does not have permission to create files or you have specified an existing file which the server does not have permission to write.

After starting the server a useful test is to use the UNIX telnet(1) utility to connect to your server at the port on which it is running. You should get a connection message and a pause for input. If you get a "Connection refused" message and you are running under inetd(8), it is likely there is a problem with your inetd(8) setup or for some reason your system can't find or can't execute the wnd binary. If you are using wnsd this message means that wnsd is not in fact running.
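If telnet(1) is not at hand, the same check can be scripted. The sketch below sends the bare "GET /" request used earlier in this section and returns whatever the server sends back; the function names, the default timeout and any host or port you pass in are assumptions for illustration.

```python
import socket


def build_request(path="/"):
    """Build the simple request line typed by hand in the telnet test."""
    return ("GET %s\r\n" % path).encode("ascii")


def probe(host, port, path="/", timeout=5.0):
    """Connect, send the request, and return the raw bytes of the reply.

    A ConnectionRefusedError here corresponds to telnet's
    "Connection refused" message.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:               # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, probe("localhost", 80) should return the HTML of your top level index.html if the server is running on port 80 of the local machine.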

If you still have problems feel free to ask questions on the WN list server. There are many helpful people there. But it is a good idea to try the steps above first and to include the relevant log file messages with your request.

Chapter 3

Creating Your WN Data Directory

3.1 The `index.wn` file

In each directory of your data hierarchy you create a file called index.wn with information about each file you want to serve. The simplest index.wn file might contain the single line:

Attributes=serveall

which when properly processed will grant the server permission to serve any file in the directory (but not in subdirectories). For more information about this directive see the section on the serveall attribute below. A more elaborate index.wn file might look like the following:

Owner=mailto:webmistress@host.edu

File=file.txt
Title=This is a descriptive title for file.txt

# This is a comment
File=file2.html

File=soundfile
Title=This plays some sounds
Content-type=audio/basic

The file contains four groups of lines called records. The first record (the single line starting Owner= in this example) describes properties of the directory and is called the directory record. It can be empty, but in general it is a good idea for the directory record to contain an owner line, like the one above, referring to the maintainer of the directory.

The remainder of this index.wn file has three file records describing three files, file.txt, file2.html and soundfile, in the directory which we wish to serve. The line starting with '#' is a comment. Wherever a '#' occurs the remainder of that line is treated as a comment (i.e. ignored).
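To make the record structure concrete, here is a minimal sketch of a reader for such a file. It is not the wndex parser. It assumes that records are separated by blank lines, that a line consisting only of a comment does not end a record, and that each directive has the form Name=value; the function name parse_index_wn is hypothetical.

```python
def parse_index_wn(text):
    """Split index.wn text into records, each a dict of directive -> value."""
    records = []
    current = {}
    for raw in text.splitlines():
        if not raw.strip():                  # blank line closes the record
            if current:
                records.append(current)
                current = {}
            continue
        line = raw.split("#", 1)[0].strip()  # drop everything after '#'
        if not line:                         # comment-only line
            continue
        key, _, value = line.partition("=")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


SAMPLE = """\
Owner=mailto:webmistress@host.edu

File=file.txt
Title=This is a descriptive title for file.txt

# This is a comment
File=file2.html

File=soundfile
Title=This plays some sounds
Content-type=audio/basic
"""
```

Applied to SAMPLE, the function returns four records: the directory record holding the Owner line, followed by one record for each of the three files.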

The index.wn file is processed with the utility wndex (pronounced "windex") to produce a small database called index.cache containing information about this directory and its contents. Detailed information on the wndex utility is given below, but simply running it with no arguments in a directory containing an index.wn file will produce the index.cache file for that directory. This file contains all the information in the index.wn file plus additional information gathered automatically about the files to be served. In particular the index.cache file will list the names of the files given in the File= lines of the index.wn file. Any file on the server whose name is not listed in an index.cache file will not be served. This is the basis of WN security. For security reasons the server will refuse to use any index.cache file which is in reality a symbolic link to another file.

The index.cache database has a number of other functions beyond its security role. Attributes of the files listed in the index.wn file which can be computed before they are served and which don't often change are stored in the index.cache file. For example, the MIME content type of soundfile is read from the Content-type= line. The other files do not need such a line since wndex can deduce from the file name extensions that file.txt has type text/plain and file2.html has type text/html. This is done once at the time index.cache is created and need not be done every time the file is served. By the way, if the sound file were named soundfile.au it wouldn't need a Content-type line either.

The title of a file is another example of information stored in the index.cache file. With the WN server every file served has a title (even binaries) and optionally has a list of keywords associated with it. For an HTML document the title and the keywords are automatically extracted by wndex from the header of the document and stored in fields of that file's line in index.cache. These are used for the built-in keyword and title searches which the server supports.

3.2 File Ownership and Permissions

The files which you wish to serve should be owned by you, or by their creator, or by whoever is in charge of maintaining them. They should not be owned by nobody or whatever user id the server runs under as set in config.h. This is because the nobody id should have the minimum permissions possible. It needs to have read access to the files to be served, but it has no need to be able to write to those files or alter them in any way.

Thus normally the files served might be owned by the maintainer and have their permissions set to be world readable but writable only by the maintainer (or by no one).
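A quick way to audit a directory against this advice is to inspect each file's mode bits. The following is a sketch only; the function names are illustrative, and treating group write permission as a problem is my reading of "writable only by the maintainer".

```python
import os
import stat


def mode_is_safe_to_serve(mode):
    """True if mode is world readable and writable by the owner at most."""
    world_readable = bool(mode & stat.S_IROTH)
    others_can_write = bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
    return world_readable and not others_can_write


def check_file(path):
    """Apply mode_is_safe_to_serve to an existing file."""
    return mode_is_safe_to_serve(os.stat(path).st_mode)
```

Modes such as 644 or 444 pass this check, while 666 (world writable) and 600 (not world readable) do not.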

Likewise the index.cache file which controls access to everything in a directory should be owned by the maintainer of that directory and the only permission nobody should have for this file is read permission. In fact, for security reasons, if the server was started as root (and then switched to a safer user like nobody) wnd or wnsd will refuse to use any index.cache file which is owned by the user id (e.g. nobody) under which the server is running. This restriction does not apply if wnsd is run on an unprivileged port by an ordinary user, because such a user might not be able to make index.cache files owned by someone else.

There is one exception to the rule of having nothing owned by nobody (and that's not a double negative). The exception is the log files. These files must be writable by the server and it generally seems sensible to have them owned by the user nobody under whose identity the server runs. The log file and the error log file can be specified on the command line when the server is run or can be set in the config.h with the #define WN_LOGFILE and #define WN_ERRLOGFILE macros.

3.3 Using the `wndex` Utility

Before describing the index.wn file in greater detail we briefly explain the use of the program which reads this file and produces the index.cache database file. Simply running wndex with no arguments in a directory containing a file named index.wn causes that file to be read and a file called index.cache to be created in that directory.

There are several command line arguments for wndex. The -r option causes wndex to recursively descend your data hierarchy using all subdirectories listed in the Subdirs= line of the directory record in the index.wn file (see below).

The -i and -c options specify an alternate name for the index.wn file and the index.cache file respectively. For example the command:

wndex -i foo -c bar

will attempt to use foo as the index.wn file and produce the file bar instead of index.cache.

The -d option specifies a directory other than the current directory in which to find the index.wn file and in which to create the index.cache and index.html files.

Finally the -q option (for quiet) suppresses the printing of any warning or informational messages by wndex.

3.4 The Directory Record

The first group of lines in an index.wn file provides information about the directory itself and the collection of files it contains rather than about any single file in the directory. It is called the directory record. This beginning collection of lines might look like:

Owner=mailto:you@host.edu
SearchWrapper=dir_search_wrap
Accessfile=/dir/access
Subdirs=dir1,dir2,directory3

The Owner= line specifies the owner of items in the directory (which is used in the HTTP/1.1 headers sent by the server).

The SearchWrapper= line specifies a "wrapper" for the various searches of the directory. That is an HTML document which provides a customized response listing the matching items in one of the various searches of the directory. For more details see the chapter "Parsed Text and Server Side Includes on the WN Server" in this guide.

The Accessfile= line specifies the name of the file which controls access (by IP address) to this directory. If this item is omitted then items in the directory may be served to anyone. For more information on using the access mechanism see the chapter "Limiting Access to Your WN Hierarchy" in this guide.

Finally the line starting with Subdirs= specifies the subdirectories of this directory which you wish to have recursively searched when a title or keyword search is done on this directory. More information about searching can be found in the chapter "Setting Up Searches on the WN Server" in this guide.

For a complete list of the possible lines (called "directives") which a directory can have see the section "Directory Directives" in this guide.

3.5 File Records

After the directory record line group an index.wn file will typically have groups of lines called file records describing a particular file. A file record can be as simple as a single line like the line "File=file2.html" in the example above or it can contain several lines describing the file. For a complete list of the possible lines (called "directives") which a file can have see the section "File Directives" in this guide.

3.6 Your Default Page

When someone sends a request to your server with only the server name and no file name like:

http://hopf.math.northwestern.edu/

the WN server automatically translates this to:

http://hopf.math.northwestern.edu/index.html

adding the file name "index.html". More generally if a request is made for a directory, say with the URL http://host/dir1/dir2/, this will be translated to a request for http://host/dir1/dir2/index.html.

If you wish the default file name in a particular directory to be something other than "index.html" you can use the Default-Document= directive in the directory record of your index.wn file to change it. If you wish to change the default file name for all directories on the server you can change the #define INDEXFILE_NAME line in the config.h file and recompile.
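The translation amounts to appending the default file name to any request path that ends in a slash. A minimal sketch, with a hypothetical function name and with the default_document parameter standing in for the value of Default-Document= or INDEXFILE_NAME:

```python
def translate_path(path, default_document="index.html"):
    """Append the default document name to a request for a directory."""
    if path.endswith("/"):
        return path + default_document
    return path
```

So translate_path("/dir1/dir2/") gives "/dir1/dir2/index.html", while a request that already names a file is left unchanged.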

3.7 Serving Files not Listed in an `index.wn` File

WN is also able to serve files without explicitly listing them in an index.wn or index.cache file. This is done by putting the line:

Attributes=serveall

in the directory record of the index.wn file for a directory or by running wndex with the -a option. Either of these specifies that any file in this directory which does not start with the character '.' or contain a '~' may be served, not just those listed in the index.wn file. The files index.wn and index.cache will also not be served. (Indeed if the -a option is used with wndex there need not even be an index.wn file, because an index.cache file will be created just as if the Attributes=serveall directive had been used.)

Note: When this directive is used in a directory protected by an accessfile or a password file be sure that these files have names that start with '.', or contain a '~'. Or better, put these files in a different directory from which nothing is served.

When the Attributes=serveall directive is used the server will attempt to set the content type correctly based on the file name suffix using the same default correspondences between type and suffix that wndex uses. Indeed when wndex is run on a directory with the Attributes=serveall directive, it behaves as if all files in the directory (except those starting with '.' or containing a '~') were listed with a File= directive. If the Attributes=serveall line (and the corresponding entry it creates in the index.cache file) are not present then only the files explicitly listed with a File= directive will be served.
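The naming rule for a serveall directory can be stated as a small predicate. This is a sketch of the rule as described above, not code from the server; the function name is hypothetical and the default names index.wn and index.cache are assumed.

```python
def servable_under_serveall(name):
    """True if a file name is eligible to be served from a serveall directory."""
    if name in ("index.wn", "index.cache"):   # never served
        return False
    if name.startswith("."):                  # hidden files are excluded
        return False
    if "~" in name:                           # e.g. editor backup files
        return False
    return True
```

This is why an access file named .access is safe in such a directory, while one named access.txt would be served to anyone who asked for it.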

The default correspondences between file name suffixes and MIME types are specified in the "mime.types" file. A default version of the file is in /lib/mime.types. The mime.types file should be installed in a known location. The default location is in the WN src hierarchy, but this can be changed by specifying a different value when the configure program is run or by editing the value of #define MIME_TYPE_FILE in config.h. The mime.types file exists so that you can add to it if you wish to add new kinds of documents to your server. The format of the file is explained in the file. If this file cannot be opened then wndex will use compiled in defaults which are the same as what is currently in the default version of this file. The mime.types file is read whenever wndex is run so wndex always knows the latest additions. This file is also read by wnsd (but not wnd) on startup for use with directories with the Attributes=serveall directive. The wnsd stand-alone server reads this file when it is started or restarted, but only takes note of new suffixes and their MIME types. You cannot change the MIME type corresponding to one of the standard suffixes (as listed in the default mime.types file). To do that you need to change the server source and recompile.

It is fine to have file records in an index.wn file which also has the Attributes=serveall directive. In this case the file directives take precedence. Thus if you had an index.wn file consisting of:

Attributes=serveall

File=foo.html
Content-type=application/postscript

the server would consult the file record for "foo.html" first and see that it is of type application/postscript (it would be silly to actually do this, of course) and use that type. But another file "bar.html" in the directory would also be served with the type indicated by its suffix. Files with no file record in the index.wn file and no recognized suffix will be given the default content type which can be set with the Default-Content= directive.
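The order of precedence can be sketched as a three step lookup. The suffix table below is a tiny stand-in for mime.types containing only the types mentioned in this chapter, and the fallback value passed as default_type is a placeholder for whatever Default-Content= specifies; both, like the function name, are assumptions for illustration.

```python
import os

# Tiny stand-in for the mime.types table.
SUFFIX_TYPES = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".au": "audio/basic",
}


def content_type(name, file_records, default_type="application/octet-stream"):
    """Resolve a content type: file record first, then suffix, then default."""
    explicit = file_records.get(name, {}).get("Content-type")
    if explicit:
        return explicit
    suffix = os.path.splitext(name)[1]
    return SUFFIX_TYPES.get(suffix, default_type)
```

With file_records = {"foo.html": {"Content-type": "application/postscript"}}, the file foo.html resolves to application/postscript, bar.html to text/html, and a file with no record and no recognized suffix to the default.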

When wndex is run on an index.wn file with the Attributes=serveall directive all the files currently in that directory which can be served are given entries in the index.cache file. Title and keyword searches only see files listed in an index.cache file. Likewise context and grep searches only seek matches in files listed in the index.cache file. Thus if a file is added to a directory with the Attributes=serveall directive it will not be visible to searches unless wndex is re-run in that directory. If it has not been re-run the file will still be served, however. Still, it is good practice to re-run wndex every time you add or delete a file in a directory with the Attributes=serveall directive. (Of course, it is required to do this for a directory without the Attributes=serveall directive.) There is no need to re-run wndex if you only change an existing file, unless you change its title or keywords.

There is no way to use wrappers or includes for files not listed in the index.wn file. So generally, the few seconds it takes to add a document's name and a descriptive title to your index.wn file and then to run wndex will pay off.

If you do not wish the Attributes=serveall directive to be allowed on your server you can disable it by uncommenting the "#define NO_SERVEALL" line in the config.h file. This does not affect the ability of wndex to write index.cache entries for all files in a directory with the Attributes=serveall directive. But it means the server will only serve files listed in an index.cache file.

3.8 Customized Error Messages

There are three situations when the client request will be denied but for which you can supply customized error messages. These are requests for non-existent files, requests for files which require a password but for which no valid password was given, and requests from an invalid host for files limited to certain hosts. The lines:

No-Such-File-URL=http://host/dir/nosuch.html
Access-denied-URL=http://host/dir/noaccess.html
Auth-denied-file=~/dir/nopassword.html

in a directory record of an index.wn file specify URL's to which clients are redirected when a non-existent file is requested and when a document protected by an access control file is requested from an invalid host. The last line specifies a file to be sent when a password protected file is requested without a password or with an invalid password. For technical reasons it wouldn't work to have this be a redirection.

In the first two lines above (specifying redirection) the URL's given can be relative URL's, so the lines:

No-Such-File-URL=/dir/nosuch.html
Access-denied-URL=noaccess.html

are valid. Default values for these three directives may be specified by editing the config.h file and recompiling the server. More information on customized error messages can be found in section "Directory Directives" in this guide.
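To see where a relative value would lead, it can be resolved against the URL of the failed request. The sketch below assumes the usual rules for resolving relative URLs against the requested URL; this guide does not spell out which base WN uses, so treat the results as illustrative. The function name is hypothetical.

```python
from urllib.parse import urljoin


def redirect_target(requested_url, directive_value):
    """Resolve a possibly relative redirect value against the requested URL."""
    return urljoin(requested_url, directive_value)
```

Under that assumption a failed request for http://host/dir/missing.html with No-Such-File-URL=/dir/nosuch.html leads to http://host/dir/nosuch.html, and Access-denied-URL=noaccess.html leads to noaccess.html in the same directory as the requested document.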

Chapter 4

Security on the WN Server

A great deal of effort has gone into attempting to make WN as secure as possible. Security has received the highest priority in all design decisions. This is not grounds for WN maintainers to feel they can lessen their vigilance, however. The first thing you should be aware of is that there is a trade-off between security and functionality. You can have high security and restricted functionality or lower security with greater functionality, or something in between. WN is designed to let the maintainer choose the point on this continuum he or she is comfortable with. This document tries to discuss the various options you as a maintainer will have and what the implications of your choices are.

First, it is important to understand possible threats to the integrity of a system running the WN server. There are two types of threat which this document addresses separately: (1) external, from a client or purported client on a remote host, and (2) local, from a user with an account on the server host.

After reading this section you may wish to look at the section "File Ownership and Permissions" in this guide.

4.1 External Threats

The maintainer's objective is to prevent any unauthorized access to (or alteration of) files on the host system. Programs run on the server with the CGI/1.1 protocols cause special problems and are discussed separately below. If you do not need to use any executable programs you should run the server with the -e option. This option disallows any attempt to execute a command on your server and does not allow any data sent by a client even to be written to a temporary disk file. In this situation the key to WN security is twofold: no document is served without explicit permission from the maintainer; and nothing is written to disk on the server except the log file.

The basic philosophy of WN security is that by default no client requests are granted. Permission to serve a document must be explicitly granted by the maintainer. The WN server keeps a small database in each directory of its data hierarchy which contains information about files to be served from that directory. In particular no document can be served unless explicit permission to serve it is given in such a database.

Note: For more information on these database files the chapter "An Overview of the WN Server" in this guide is a good place to start. These files are very easy to create and maintain. See the chapter "Creating Your WN Data Directory" in this guide.

Despite this strong security foundation several additional steps are prudent. The most important is that the maintainer must assure that no untrusted person has write access to any part of the WN hierarchy. For example an incoming anonymous ftp directory should never be part of a WN hierarchy (better yet don't have one at all), because an attacker might be able to put a database there granting illicit access to some documents on the server system for which the user id running the server has read permission. There are several defenses against such a counterfeit database and we discuss them next.

4.1.1 Protecting Your `index.cache` Files

All security control for the WN server resides in the per directory database files (these files have the default name index.cache). Consequently it is extremely important to guarantee their integrity. There are several command line options for the server which help protect against counterfeit index.cache files.

The -t and -T options to wnd and wnsd allow you to specify a trusted owner or group owner (not both) for index.cache files. When invoked with only the -t argument (or the -T argument) wnd or wnsd will not serve a document unless the index.cache file listing it has the prescribed uid or gid. This uid or gid should be that of the maintainer, not the user id under which wnd or wnsd runs. Indeed, for security reasons if the server has been started as root and changed to another uid it will refuse to use an index.cache file whose owner is the uid under which it is running. If on your server all index.cache files are created by a single user or a single group I strongly recommend using the -t or -T option.

This added security is weakened somewhat if you use the -u option which allows index.cache files owned by untrusted users, but only permits them to grant access to files owned by the same user as the index.cache file. This option might be appropriate if you permit users to have their own home page on your server. It would allow users to serve documents which they own but no others. If both the -u and the -t argument are used the -u takes effect except the trusted user specified with the -t option is exempt from its restrictions. Notice that if neither the -t nor the -u argument is used then a user with his own home page can make a symbolic link to any file readable by the server and that document will be served! This is true even if the linked to document is in a directory with limited access or is outside the server data hierarchy.
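The interaction of these options is easier to follow as a decision function. The following sketch encodes the rules exactly as described in this section for the -t and -u options; it ignores the group variant -T, and the function and parameter names are hypothetical and do not come from the WN source.

```python
from typing import Optional


def may_serve(cache_owner: int, file_owner: int, server_uid: int,
              started_as_root: bool,
              trusted_uid: Optional[int] = None,      # value given with -t
              restrict_to_owner: bool = False) -> bool:  # True if -u is used
    """Decide whether an index.cache file may grant access to a file."""
    # A server started as root refuses caches owned by its own run-time uid.
    if started_as_root and cache_owner == server_uid:
        return False
    if restrict_to_owner:
        # With -u, the trusted user (if any) is exempt from the restriction.
        if trusted_uid is not None and cache_owner == trusted_uid:
            return True
        return file_owner == cache_owner
    if trusted_uid is not None:
        # With -t alone, only caches owned by the trusted user count.
        return cache_owner == trusted_uid
    # Neither option: any other cache is accepted, hence the symbolic
    # link hazard noted above.
    return True
```

With neither option set, a cache owned by uid 1000 can grant access to a file owned by root; adding -u turns that same request into a refusal.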

When the server is run it must assume the permissions of some user on the host. Which user is determined when you run the configure program or by defining "#define USER_ID" in config.h. It is important that USER_ID have as few permissions as possible. On many systems there is a user called nobody with minimal permissions. The numeric user_id of nobody is a good choice and is the default choice of the WN configure program. Of course the server must have read permission on all the files served but it should not have write permission for any directory or file other than its log files. If the UNIX syslogd(8) system utility for logging is enabled there is not even any need for write permission on a log file. A good practice is to have all the files in your hierarchy which you intend to serve be owned by the maintainer or their creator. They should be world readable (assuming they are for general consumption) but with restricted write permission. The files in your hierarchy should not be owned by the user id under which WN will run.

WN does not by default use the UNIX chroot(8) system utility to further restrict the files which the server can access. Doing so would enhance security at the expense of extra work for the maintainer. The effect of chroot(8) is to prevent the server from even internally accessing any file which is not in your data directory. If you are especially concerned about security you may wish to run one of the public domain TCP wrappers, such as Wietse Venema's tcp_wrappers (source code available at ftp://ftp.win.tue.nl/pub/security/tcp_wrappers_7.6.tar.gz), in conjunction with WN which will allow you to use the UNIX chroot(8) system utility. This can simultaneously enhance security for other TCP services like the UNIX ftpd(8) system utility.

4.1.2 CGI Programs

Enabling the use of programs run on the server greatly enhances its functionality but also increases the potential risk of an attack. Many things which on other servers can only be done with CGI/1.1 programs are built-in features of WN and hence entail much less risk than they would as CGI/1.1 programs. These include imagemaps, a variety of document searches, and serving conditional text based on information in the client supplied headers. If your needs can be met with these features then you can disable CGI/1.1 with the -e option and greatly improve your security.

However, there are many needs which can only be met by programs. The greatest danger in their use is that even though the program is under the control of the maintainer, the arguments passed to it can be set by a potential attacker. WN supports the CGI/1.1 or "Common Gateway Interface" protocol (see the chapter "Using CGI Programs on the WN Server" in this guide) for executing programs. Under this protocol there are three ways by which arguments are passed to programs. The first of these is used when processing HTML forms which use the GET method. Under this method all arguments are put in environment variables and the program must extract them from the environment. Moreover, they have been placed in a URL encoded format by the browser and must be decoded by the program. Thus if the request is of type GET, the arguments are examined to see if they contain an '='. If they do, it is assumed that this is a CGI/1.1 form response (something like "name=John&toppings=pepperoni"). In this case the program is executed with no arguments and the argument string is placed in an environment variable where the program can read it. This is fairly safe from the server point of view but the program writer must exercise great care.

The second method is for HTML forms using the POST method. In this case everything posted by the client (in URL-encoded form) must be sent to the UNIX stdin(3) stream of the CGI/1.1 program. Thus if the request is of type POST, information is read from the client and put in a temporary file on disk. Then the program is executed with no arguments and its stdin(3) comes from this file. Security is the responsibility of the program writer. It is not so dangerous to have arguments come from stdin(3) but the program writer must still exercise care.

Finally if the GET request has arguments but no '=' it is assumed to be an ISINDEX type request and the program should be executed with the given arguments. While the CGI/1.1 specification does not permit the altering of arguments, it does say that if the arguments pose any security problems it is permissible to put the string in an environment variable and execute the program with no arguments, just as in the CGI/1.1 forms case described above. WN takes a very strict view on this subject and considers any characters other than space and alphanumeric characters as a security problem. Accordingly, if it finds any other character in an argument it will put all arguments in the appropriate environment variable and run the program with no command line arguments.
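The three cases can be summarized in a few lines. This is a sketch of the decision as described above, not the server's code: it assumes the argument string has already been URL-decoded, treats "alphanumeric" as ASCII letters and digits, and uses a hypothetical function name and return convention.

```python
def plan_cgi_invocation(method, query):
    """Return (channel, argv) describing how the arguments reach the program.

    channel is "stdin", "environment", "argv", or "none".
    """
    if method == "POST":
        return ("stdin", [])            # posted data is fed to stdin
    if not query:
        return ("none", [])             # nothing to pass
    if "=" in query:
        return ("environment", [])      # form response via GET
    # ISINDEX style request: only spaces and alphanumerics are trusted.
    if all(ch == " " or (ch.isascii() and ch.isalnum()) for ch in query):
        return ("argv", query.split())
    return ("environment", [])          # anything else is never put in argv
```

For instance "name=John&toppings=pepperoni" goes to the environment, "red wine" becomes two command line arguments, and "foo;rm" is kept out of the command line entirely.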

Again let me say the program writer must exercise great care. I can't emphasize this too strongly. When you run a CGI/1.1 program the server almost completely absolves itself of security responsibility and dumps that responsibility on the program writer. Most authors of freely distributed CGI/1.1 programs are not fully cognizant of potential security holes they may open up. Running insecure programs created locally or obtained from Usenet postings is almost certainly the single greatest risk to a WN server site. To find out more about writing secure CGI/1.1 programs I strongly recommend that you read the relevant sections of the "WWW Security FAQ" maintained by Lincoln Stein and the "Safe CGI Programming" maintained by Paul Phillips.

4.2 Internal Threats

Whenever untrusted users have accounts on a system there is risk involved. The objective of WN is to ensure that running the server does not increase this risk. If the server is wisely managed, I believe this goal can be achieved. Here are some guidelines.

If possible, make sure that no untrusted user has write access to any part of your WN hierarchy. As mentioned above an attacker with write access to your hierarchy can create an index.cache file which will give access to anything on your server which is readable by the user id under which WN runs. Even worse, she can create a shell program and an index.cache file permitting it to be executed, so it can be executed with all the permissions of that user id. A good rule of thumb is:

Note: Always assume that everyone with write access to any part of your data hierarchy has all the permissions of the user id under which your server runs!

This should not be true if you are using some of the command line options described above, but it is good practice to behave as if it were true.

Sometimes it is not possible or desirable to deny write access to your WN hierarchy. For example, you may need to allow all users to have a home page in their home directory or in some other designated place. There are two important things to do in this case.

The first of these is to run the server with the -u option. This has the effect of requiring that every file served (including wrappers and includes) have the same owner as the index.cache file which grants it permission to be served. This means that untrusted users can only serve files which they own. This will prevent a user from serving the UNIX passwd(5) configuration file typically in /etc, but will not prevent him from making his own copy of passwd(5) and serving that.

If the -t or -T option is also used then index.cache files owned by the trusted user or trusted group are exempt from this requirement and they may grant permission to serve any file the server can read. For security reasons the server will refuse to use an index.cache file which is a symbolic link to another file.
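As a rough illustration of the ownership rule, the following Python sketch compares numeric owners the way the -u and -t options are described above. It is not WN's implementation; the function names and the trusted_uid parameter are hypothetical.

```python
import os


def owner_check_passes(file_uid, cache_uid, trusted_uid=None):
    """Return True if a file may be served under the -u rule as described.

    file_uid    -- numeric owner of the file to be served
    cache_uid   -- numeric owner of the index.cache file that lists it
    trusted_uid -- owner exempted by the -t option (None if not in use)
    """
    if trusted_uid is not None and cache_uid == trusted_uid:
        return True  # trusted index.cache files are exempt from the rule
    return file_uid == cache_uid


def may_serve(file_path, cache_path, trusted_uid=None):
    """Apply the check to real files; a symlinked index.cache is refused."""
    if os.path.islink(cache_path):
        return False
    return owner_check_passes(
        os.stat(file_path).st_uid, os.stat(cache_path).st_uid, trusted_uid
    )
```

Because os.stat follows symbolic links, a user's link to a file owned by root is compared using root's ownership and so fails the check, which matches the behavior the text describes for passwd(5).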

The -e or -E options mentioned above are also a good idea in this case, to prevent any execution of programs or at least restrict their execution to trusted index.cache files.

You should note that when run in its default configuration there is no way to use access files or password authentication to prevent users on your system, who can create index.cache files, from gaining access to files you are serving. They can simply make a symbolic link in their part of the hierarchy to the file you want to restrict and an index.cache file permitting it to be served. Since the server has access to the restricted file it will serve it if it is listed in an index.cache file. This simple threat can be avoided by using the -u option described above, but the number of potential threats is quite large. For example, if the -e or -E option is not used a hostile user could write a CGI/1.1 program which reads the sensitive files and mails them to himself. In general I would strongly advise against trying to have sensitive documents (protected by password or .access files) and potentially hostile users on the same server. I would also strongly advise against allowing potentially hostile CGI/1.1 programs, executed includes or external modules. They can be disallowed through the use of the -e or -E options. If they are not disallowed a CGI/1.1 program can alter or destroy log files. A hostile authorization module could collect user passwords.

The -u and -E options greatly enhance security, but it is important to keep the following principle in mind. You should assume that any permissions you grant to the user id under which WN runs are also granted to every user who can create an index.cache file in your data hierarchy.

4.3 Password Authentication and Restriction by IP Address

WN offers two methods of limiting access to your hierarchy or parts of it. See the chapter "Limiting Access to Your WN Hierarchy" in this guide for information on how to use these features.

These are useful for many purposes but I would not advise using them to protect extremely sensitive information. The first of these methods is restriction by hostname or IP address. It is not impossible to spoof a server with a fake IP address, but I think it is fairly difficult. It is easier to use a counterfeit hostname. For this reason I would suggest using IP addresses rather than host names in access control files.

The other method of limiting access is by password with the HTTP/1.1 Basic Authentication scheme. This is about as secure as using passwords with the UNIX ftpd(8) system utility to protect information. This scheme is flawed in that it involves the transmission of essentially unencoded passwords over the network. It is relatively easy for unscrupulous people to obtain "sniffer" software which allows eavesdropping on all local network traffic. This means, in particular, that it is possible to intercept passwords of other users.
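To see why the passwords are "essentially unencoded", note that Basic authentication only base64-encodes the string "username:password". The short Python sketch below, using made-up credentials, shows that anyone who captures the header can reverse the encoding without any key.

```python
import base64

# Hypothetical credentials, for illustration only.
username, password = "alice", "secret"

# What a client sends in the Authorization header under Basic authentication.
token = base64.b64encode(f"{username}:{password}".encode("ascii")).decode("ascii")
header_value = "Basic " + token

# What an eavesdropper can do with a captured header: no key is needed.
captured = header_value.split(" ", 1)[1]
recovered_user, recovered_password = (
    base64.b64decode(captured).decode("ascii").split(":", 1)
)
```

After this runs, recovered_user and recovered_password hold the original credentials.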

For security reasons when you use wnauth or any "Authorization-Module=" you are required to use either the -t or -T option or the -a or -A option when the server is run and to have the index.cache file in the protected directory owned by the trusted user or group. This is to guard against counterfeit authentication modules.

The password-sniffing problem described above is remedied by the "Digest" authentication scheme. Digest authentication is supported experimentally by WN but has the rather severe drawback that no publicly available clients currently support it. It is experimental, because I have no client to test it and hence it has barely been tested. I believe it will be a standard part of HTTP/1.1 and at that time will significantly improve security of password protected directories.

The directive "Authorization-Realm=", used whenever an authentication module is used, is to notify the client that for any document on this server with the same realm as this one, the same password/username combination will be valid, so the client need not ask the user for a username and password, but can reuse the one supplied for the first document with this realm. For security reasons you should always put your host and domain name in the realm. This may at least discourage attempts at other sites to forge your realm in order to collect user passwords. Your users should also be warned never to enter their password if the realm displayed when they are prompted for a password contains a different hostname than the one in the URL they are trying to access.

Both Basic authentication and access control by IP address become much more vulnerable if the potential attack comes from users who can create index.cache files for another part of your server's data hierarchy. I would recommend against trying to use either to protect information from users with home pages on your server.

If no potentially hostile users can create documents which can be served on your system the mechanisms described above provide protection adequate for many purposes. If I were an information provider selling access to a collection of information on my server, I would be comfortable using the numeric IP address to limit access to my paying customers. On the other hand I would not want any of these mechanisms used to protect my bank records.

4.4 Some Recommended Security Configurations

This is a list of possible ways you might configure your server by setting values in config.h and using command line arguments. It assumes that you are running either wnsd or wnd on the privileged port 80 and that the default value of "#define USERID" and "#define GROUPID" defined in config.h have not been changed. This will mean that wnsd will be started as root, but will almost immediately switch its privileges to those of the unprivileged user nobody. Likewise if wnd is running under the UNIX inetd(8) system utility we assume that it is set to run with the privileges of nobody.

The following list of configurations is in decreasing order of security.

4.4.1 Forbid CGI and Only Maintainer Trusted

This strongest level of security is achieved by running wnsd (or wnd under the UNIX inetd(8) system utility) with the -t or -T option and with the -e option and with no other options. For the really paranoid, uncommenting the "#define FORBID_CGI" line in the file config.h and recompiling removes the CGI/1.1 code from the binary.

With these options no CGI/1.1 programs or filters or program output includes are permitted. Also the POST method is not accepted (an error is returned for a POST request). Furthermore only index.cache files owned by the user specified in the -t option are used. The server should be run as nobody (the default) and the numeric user id specified with -t option should be the maintainer's.

4.4.2 Only Maintainer or Maintainer Group Trusted

This is the strongest level of security if you need the functionality of CGI/1.1 programs or filters or program output as server includes. This security configuration does not allow any user home pages (unless the maintainer produces the index.cache file for them). To use this level run wnsd (or wnd under inetd(8)) with the -t or -T option and no other options. This places all control in the hands of a single maintainer or a "maintainer group". No document or program output may be served unless the maintainer has authorized it by explicit mention in one of the index.cache database files. The server will not recognize any index.cache file unless it is owned by the maintainer specified with the -t option or the group specified with the -T option. Only one of the -t or -T options can be used.

4.4.3 Restricted User Serving Privileges

This permits users on the server host to have and control their own home pages and documents, but with a number of limitations. They will not be permitted to run CGI/1.1 programs, filters or include programs. Also the server will require that every file served (including wrappers and includes) have the same owner as the index.cache file which grants it permission to be served. This means that users can only serve files which they own.

This configuration is obtained by running with the -E option and the -u option. The -E option is similar to the -e option except that index.cache files owned by a trusted user id or trusted group id (set with the -t or -T option) are exempt from the restrictions. The -u option requires that in order to be served a file must be owned by the owner of the index.cache file which lists it. Trusted users as specified with -t or -T options are exempt from this restriction also.

4.5 Other WN Security Measures

One of the security problems encountered with another HTTP server involved an attack by overflowing an internal buffer with data provided by the client in such a way that the (attacking) client could supply code that the server executed. I have, to the best of my ability, defended against this in WN code. All copying of data supplied by the client and most copying of data read from the index.cache file is done by a function which I wrote and which was designed precisely to deal with this threat. Excess data which would overflow is discarded so buffers may contain truncated data, but will not be overwritten.

Probably the most controversial security "feature" of WN is that it greatly restricts the set of characters which can be used in file or path names. Instead of trying to decide which characters are dangerous and disallow them, WN has a list of characters presumed safe and only allows them. The currently allowed characters are alphanumeric characters and '_', '-', '.', '+', '/' and '%'. The same restrictions are applied to the PATH_INFO part of URLs for CGI/1.1 programs, except that the character '=' is also allowed. These restrictions sometimes cause problems with CGI/1.1 programs that like to include unusual characters in file names or PATH_INFO.
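The allow-list approach can be illustrated with a short Python sketch. The character sets are taken from the paragraph above; the function name and the is_path_info flag are hypothetical, and this is not the server's own code.

```python
import re

# Allowed in file and path names, per the list above:
# alphanumerics plus _ - . + / %
SAFE_PATH = re.compile(r"[A-Za-z0-9_\-.+/%]*")
# PATH_INFO for CGI/1.1 programs additionally allows '='.
SAFE_PATH_INFO = re.compile(r"[A-Za-z0-9_\-.+/%=]*")


def path_is_allowed(path, is_path_info=False):
    """Return True only if every character of the path is on the safe list."""
    pattern = SAFE_PATH_INFO if is_path_info else SAFE_PATH
    return pattern.fullmatch(path) is not None
```

Note that the check never tries to enumerate dangerous characters; a space, a semicolon or a quote is rejected simply because it is not listed.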

Also the server will attempt to resolve all "../" references while staying in the server data hierarchy. If these references would result in a request for a document outside the server data hierarchy the request is treated like a request containing illegal path characters. In particular with verbose logging turned on, a message like "SECURITY Found bad character (%X hex) in path" is logged.
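A minimal Python sketch of this kind of "../" resolution follows. It assumes the path is expressed relative to the root of the data hierarchy and signals an escape by returning None; the function name is hypothetical and the server's real logic may differ in detail.

```python
def resolve_in_hierarchy(url_path):
    """Resolve '.' and '..' segments of a URL path.

    Returns the normalized path, or None if a '..' would climb above the
    root of the data hierarchy.
    """
    resolved = []
    for segment in url_path.split("/"):
        if segment in ("", "."):
            continue  # empty and '.' segments do not change the location
        if segment == "..":
            if not resolved:
                return None  # would leave the data hierarchy
            resolved.pop()
        else:
            resolved.append(segment)
    return "/" + "/".join(resolved)
```

Here "/a/b/../c.html" resolves to "/a/c.html", while "/a/../../etc/passwd" is rejected.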

To defend against a "denial of service" attack the server will refuse a POST request with post data in excess of 10 megabytes. This does not defend against multiple requests with large POST data. The maximum allowed size of POST data can be altered by changing the value of MAX_POST_LEN in the file config.h.

Chapter 5

Setting Up Searches on the WN Server

One of the design goals of WN is to provide the maintainer with tools to create extensive navigational aids for the server. A variety of search mechanisms are available that provide this capability.

5.1 Title Searches

In response to the URL:

<http://host/dir/search=title>

the server will provide an HTML form (automatically generated or prepared by the maintainer) asking for a regular expression search term. When supplied the server will search the index.cache files in /dir and designated subdirectories for items whose titles contain a match for the search term. An HTML document with a menu of these items is returned. Subdirectories are designated for recursive searching by an entry in the directory record of the index.wn file like:

Subdirs=dir1,dir2,dir3

You can customize the message offered requesting a search term by creating an HTML form whose ACTION is the URL "http://host/dir/search=title" and which uses the GET method to return the search term with "NAME=query".
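When such a form is submitted with the GET method, the browser appends the field to the ACTION URL as a query string. The Python sketch below builds the resulting URL for a given term; the helper name is hypothetical and "http://host/dir" is the placeholder host used throughout this chapter.

```python
from urllib.parse import urlencode


def title_search_url(action_url, term):
    """Return the URL a browser would request for a GET form with NAME=query."""
    return action_url + "?" + urlencode({"query": term})


# Example: a two-word search term; the space is encoded as '+'.
url = title_search_url("http://host/dir/search=title", "foo bar")
```

The same construction applies to the other search URLs in this chapter, with only the part after "search=" changed.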

The regular expressions recognized by the WN server are the same as those of the UNIX grep(1) utility (though this utility is not used as the server has its own regular expression functions). The more general regular expressions used for example in the UNIX egrep(1) utility are not supported by WN.

5.2 Keyword Searches

Like title searches except matches are sought in keywords instead of titles. Keywords for HTML documents are automatically obtained from <META> headers. For other documents (or HTML documents) they can be manually supplied in the index.wn file. This is done by including a line like:

Keywords=keyword1, keyword2, etc.

in the relevant document's record in the index.wn file. The URL to use to cause this search to be done is:

<http://host/dir/search=keyword>

5.3 Title/Keyword Searches

Like the above keyword and title searches except the match can be either in the keyword or the title. The URL to use as the ACTION in a form or simply to invoke the search is:

<http://host/dir/search=synopsis>

If a recursive title, keyword or fielded search is requested and some directories have restricted access, only those directories which have the same access file or the same password realm as the directory where the search started will be searched. In fact, if an "Accessfile=" directive is used the path must be the same for both directories (and must necessarily be of the form "Accessfile=~/dir/.access" or "Accessfile=/dir/.access" rather than "Accessfile=.access").

5.4 Fielded Searches for User Supplied Fields

The maintainer can supply up to 20 additional field values associated with a document. These are used for searching purposes in the same way that "Keywords=" are. This is intended to give some additional "keyword like" fields, for example, document author or document id number. It works exactly like keywords except these values are not extracted from HTML files, but must be created with a line like:

Field3=any text here

in the index.wn file. The '3' in this example can be replaced with any number from 0 to 19. The URL to use as the ACTION in a form or simply to invoke the search in the example above is:

<http://host/dir/search=field3>

Like keyword and title searches, the search term for a fielded search can be any regular expression of the kind accepted by the UNIX grep(1) utility.

5.5 Context Searches

Unlike keyword, title and fielded searches this is a full text search of all "text/*" documents in one directory (not subdirectories). These searches are also limited to the main files -- they will not find matches in wrappers and included files. The returned HTML document contains a list of titles of documents containing a match each with a sublist of the lines from those documents containing the match. This provides one line of context for the match. For HTML documents selecting the matched expression in one of these takes you to the document with your viewer focused on the matching location. The primary intent of this feature is to provide full text searching for an HTML "document" which might consist of a substantial number of files. It is possible to customize the text of the HTML response containing the matches. This is done with a Searchwrapper directive.

The URL to use as the ACTION in a form or simply to invoke the search is:

<http://host/dir/search=context>

It is possible to mark HTML documents with comments so that only part of them is searched. This is done with lines consisting of the comment "<!-- #search_off -->" which turns off searching until the line consisting of "<!-- #search_on -->" is encountered.

5.6 Grep Searches

A grep search is just like a context search, except that only a list of anchors pointing to files containing a match is returned. There are no lines of context showing the match. To do a grep search on the files in directory dir use:

<http://host/dir/search=grep>

5.7 Line Searches

A line search is just like a context search, except that only one list of all matching lines is returned, instead of the matching lines being sublists of a list of files containing a match. That is, all the items in sublists of a context search are concatenated in one large list of lines containing matches. The matching items are still anchors pointing to items in their respective files. To do a line search on the files in directory dir use:

<http://host/dir/search=line>

5.8 File Context and Grep Searches

A file context search is just like a context search, except limited to a single file. The file grep search returns a text/html document containing the lines in the file matching the regular expression. These lines will be converted to plain text and surrounded by <pre> and </pre> tags. This is done because isolated tags or partial tags taken from an HTML document would be unlikely to function properly. It is likely that you will want to use a Searchwrapper directive with a file grep search.

The URLs to invoke these searches on file foo are:

<http://host/dir/foo;search=context>

<http://host/dir/foo;search=grep>

5.9 Search a Directory or Search a Hierarchy?

The different searches described above are of two types: those that search the index.cache files, like field, keyword and title searches, and those that do full text searches of multiple files, like context, grep, file grep and line searches. The first type recursively descends all subdirectories listed in a "Subdirs=" directive and searches all the index.cache files. The second only searches files in the one directory specified in the search.

The reason for this is efficiency. Context, grep, file grep and line searches are not a replacement for WAIS, glimpse, webglimpse or some other indexed search engine. The intent of these searches is to allow a full text search of a single conceptual HTML document that is made up of a number of files with links. These searches work great with something like the HTML 4.0 specification (see http://hopf.math.northwestern.edu/html4/) which consists of a number of files of moderate size, but they would be very slow with 1000 files spread out in a data hierarchy. For that you really need an indexing search engine.

Another limitation of these searches is that they will not find matches in wrappers and included files.

On the other hand, all of the searching methods listed above are built into the server and require no additional effort for the maintainer. You don't need to produce or maintain an index. They are simply referenced with URLs like "<http://host/dir/search=context>" where /dir is any directory containing files to be served and an index.cache listing them. Of course search permission can be denied for any directory or any file contained in that directory.

5.10 List Searches

The server will search an HTML document looking for an unordered list of anchors linking to WWW objects. The contents of each anchor will be searched for a match to the supplied regular expression. The search returns an HTML document containing an unordered list of those anchors with a match. This is quite useful when combined with the wn_mkdigest utility which creates HTML documents to be searched in this way from files with internal structure like mail or news digests, mailing lists, etc.

The URL to invoke this search on file foo is:

<http://host/dir/foo;search=list>

5.11 Index Searches

Indexed searches can be supported in WN by auxiliary modules. Two such modules, wnseven_m and wnsectsearch, are provided as examples and maintainers may wish to create others. To use such a module you should have a form action be something like http://host/dir/search=index.

Then in the index.wn file in the directory dir you should have a line like:

Search-Module=/full/path/to/searchmod

The program searchmod should read the environment variable QUERY_STRING and return a partial HTML document. In the typical case the program returns an unordered list of anchors to documents containing a match to the query string. This list can be wrapped by including a "Searchwrapper=" directive in the directory record. If it is not, then a default wrapper with text like "Here are the matches for your search" is supplied.
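As a rough sketch of what such a module might look like, here is a small Python program that reads QUERY_STRING and prints an unordered list of anchors. Everything specific in it is an assumption made for illustration: the DOCUMENTS table, the plain substring matching, and the guess that QUERY_STRING holds either "query=term" or a bare term. It is not one of the modules distributed with WN.

```python
import html
import os
from urllib.parse import parse_qs

# Hypothetical document table: (URL, title). A real module would consult
# whatever index it maintains.
DOCUMENTS = [
    ("/docs/overview.html", "An Overview of the WN Server"),
    ("/docs/security.html", "Security on the WN Server"),
]


def search_term(query_string):
    """Extract the term, assuming either 'query=term' or a bare term."""
    fields = parse_qs(query_string)
    if "query" in fields:
        return fields["query"][0]
    return query_string


def render_matches(term):
    """Return a partial HTML document: an unordered list of anchors."""
    items = [
        '<li><a href="%s">%s</a></li>' % (html.escape(url), html.escape(title))
        for url, title in DOCUMENTS
        if term.lower() in title.lower()
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_matches(search_term(os.environ.get("QUERY_STRING", ""))))
```

The output is deliberately only a fragment, since the surrounding page is expected to come from the search wrapper or the default wrapper.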

5.12 Search Modes

The different types of searches (e.g. keyword, context, etc.) are called the modes used by the search. Normally the mode is set by adding, for example, "search=context" to the end of a URL. However, if an HTML form is used to initiate the search, it may be desirable to allow the mode to be selected by a form variable. Thus an HTML form like:

<form action="search=mode" method="GET">
Enter your search term <input name="query" size=15>
<input type="submit" value="Search"> by
<input type="radio" name="mode" value="title" checked> title or
<input type="radio" name="mode" value="keyword"> keywords
</form>

will execute either a title or a keyword search depending on whether the user checks the radio button for "title" or "keyword". The URL requested will end with "search=mode", but actually it could be anything since the "mode=title" (if that is what is checked) which will be in the query part of the URL will override whatever follows "search=" in the base URL.
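The precedence rule can be sketched in Python as follows. The function name is hypothetical and the parsing is only an approximation of what the server does; the point is simply that a "mode" variable in the query, when present, wins over whatever follows "search=" in the base URL.

```python
from urllib.parse import parse_qs, urlsplit


def effective_search_mode(url):
    """Return the search mode that applies to a search URL."""
    parts = urlsplit(url)
    # Mode named in the path, e.g. ".../search=mode" or ".../foo;search=grep".
    last = parts.path.rsplit("/", 1)[-1].rsplit(";", 1)[-1]
    path_mode = last[len("search="):] if last.startswith("search=") else None
    # A "mode" form variable in the query, if present, takes precedence.
    form_mode = parse_qs(parts.query).get("mode", [None])[0]
    return form_mode or path_mode
```

For the form above, a request for "http://host/dir/search=mode?query=foo&mode=title" is treated as a title search.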

5.13 Searchwrappers

By default when a search is done an HTML file is created with text like "Here are the matches for ...". You may wish to customize this response. This is done with the "Searchwrapper=" directive in either a file record or directory record of the index.wn file.

The line:

Searchwrapper=swrap.html

specifies that the HTML file swrap.html in the current directory should be used as a wrapper for the output of all searches on this directory (if it is a directory record entry) or file (if it is a file directive). This wrapper differs from other wrappers in that it can have only a single "<!-- #include -->" line. An unordered list of anchors to the matching items will be inserted at the location of this line. You can, of course, insert the client supplied search term by use of the line "<!-- #query -->" in this file.

The remainder of this file can be anything you wish and is often an HTML form allowing subsequent searches.

If a search fails to find any matches then a default HTML response indicating this is sent. This response can also be customized but only if a "Searchwrapper=" is also used. The line:

Nomatchsub=foo.html

specifies that the HTML file foo.html in the current directory should be used for the output of all searches (title, keyword, etc) on this directory (or file if it is a file directive) which return no matches. If "Nomatchsub=" is used and a "Searchwrapper=" has not been defined an error is logged and the nomatchsub file is ignored. The nomatchsub file must be in the directory being searched and its name must not contain a '/'.

Chapter 6

Parsed Text and Server Side Includes on the WN Server

The WN server has powerful capabilities for modifying an HTML file as it is served. One such feature, called "server-side includes", is the ability to automatically insert one file in another. But the usefulness of this capability is greatly enhanced by another feature, conditional text. A simple "if - else - endif" construct allows sections of a document to be served only to certain hosts, or to clients providing an appropriate Accept header or a desired Referer or User-Agent header. Using the "else" construct allows alternate text segments for clients not meeting the desired criteria. For these features to work it is necessary for the server to parse the file looking for the location of includes or for conditionally served text. The fact that a file should be parsed in this way is indicated in its entry in its index.wn file. This is done with an entry "Attributes=parse" in the file record. This line is not necessary if the file record lists wrappers or includes since it is assumed in that case. Only documents with MIME type text/html can be parsed in this way.

6.1 Wrappers, Includes, and List-Includes

The simplest and most common use of parsed text is including additional files in one which is being served or "wrapping" a served file with a second file, i.e. inserting the requested file inside the second at a desired location. The latter is useful, for example, if you wish to place a standard message at the beginning or end (or both) of a large collection of files. All files included or used as wrappers must be listed in the index.cache file. They are not listed in the file in which they are to be included; only the location of the insertion is marked.

To ensure security various options are available, including the requirement that a served file and all its includes and wrappers have the same owner as the index.cache file listing them. This is done with the -u option.

Another important application of wrappers is to customize the HTML documents returned listing the successful search matches. If, for example, a directory is assigned a wrapper the server assumes that it contains all text describing the search and it merely supplies an unordered list of links to the matching items.

6.2 Simple Including

Suppose your server offers the file foo.html and you wish to have the file bar.html automatically inserted at some point into foo.html. You achieve this by use of the "Includes=" directive in the index.wn file in the directory containing foo.html. That index.wn file should have an entry like:

File=foo.html
Includes=bar.html

and the file foo.html should contain either of the lines:

<!-- #include -->

or:

<?WN #include >

These two forms are equivalent as far as the server is concerned. The second form is considered more SGML friendly by many as <?WN something> indicates a processing instruction specific to WN rather than a comment. For historical reasons this manual describes the other form, but either may be used. With the first form the '#' is required but with the other you may use either:

<?WN #include>

or:

<?WN include>

Any of the "<!-- #something -->" lines described in this section may be replaced using the "<?WN something>" syntax. Also with this syntax the case of the WN is not significant.

The marker "<!-- #include -->" should be the only thing on its line but may have white space before it. Which file is inserted is controlled by the "Includes=" line in the index.wn file. All including of files by WN is done only for text files and only in units of lines.

If the include marker is never found in foo.html, then this file is served and the file bar.html, is appended at the end. Thus the "Includes=" directive can be used to append a file without the need of any marker in the main file.

If you wanted to include two files in foo.html, say bar.html and bah.html, you would place the marker "<!-- #include -->" at two places in foo.html and have an entry in your index.wn file like:

File=foo.html
Includes=bar.html,bah.html

The two files will then be included at the marked spots in the order that they are listed in the "Includes=" directive, i.e. bar.html will be inserted at the first marker and bah.html at the second. You can have as many included files as you wish. They should all be listed on the "Includes=" line separated by commas.
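The in-order insertion can be pictured with the following Python sketch. It assumes the marker is the comment line "<!-- #include -->" and works on lists of lines, since including is done in units of lines. The function name is hypothetical, and appending leftover files at the end extends the single-file rule stated above; that extension is an assumption, not documented behavior.

```python
INCLUDE_MARKER = "<!-- #include -->"  # assumed marker syntax


def expand_includes(main_lines, included_files):
    """Insert each included file's lines at successive marker lines.

    main_lines     -- lines of the main document
    included_files -- list of line lists, in "Includes=" order
    """
    pending = list(included_files)
    output = []
    for line in main_lines:
        # The marker must be alone on its line, apart from white space.
        if line.strip() == INCLUDE_MARKER and pending:
            output.extend(pending.pop(0))
        else:
            output.append(line)
    # Files with no marker left are appended at the end (assumption for
    # the multi-file case; the text states this for a single include).
    for leftover in pending:
        output.extend(leftover)
    return output
```

With two markers in the main file and two entries in the list, the first file lands at the first marker and the second at the second; with no marker at all, the included lines follow the main document.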

Sometimes it is easier to specify the order in which multiple files are to be inserted by adding the file name in the HTML document. This is also possible with WN. However, since good security requires that only files listed in the index.wn file may be served, it is still necessary to list files you want to include there.

So, another way to include two files in foo.html, say, bar and bah, is to place the markers '<!-- #include bar -->' and '<!-- #include bah -->' at the appropriate places in foo.html and have an entry in your index.wn file like:

File=foo.html
List-Includes=bar,bah

The order of the files listed in the directive is not significant. The example above grants permission for the inclusion of the two files listed. It does not require their insertion. Note the difference between the two mechanisms: with the "Includes=" directive only "<!-- #include -->" is needed in the HTML and the file inserted is taken (in order) from the list in the directive, while with the "List-Includes=" directive the HTML must contain '<!-- #include bah -->' so the server knows to insert file "bah" and the file list in the directive is checked to see that permission to serve this file has been granted. If you use the "List-Includes=" directive you cannot also use either the "Includes=" or "Wrappers=" directives -- the "List-Includes=" and "Includes="/"Wrappers=" mechanisms are mutually exclusive. It is fine to use the line '<!-- #include bar.html -->' to insert a file with the "Includes=" directive, but with that directive the "bar.html" is only a comment and has no effect on which file is actually inserted.

There is a "Default-List-Includes=" directory directive which can be used to grant permission for any HTML file in the directory to include one or more of the listed files. This also causes all HTML files in the directory to acquire the "Attributes=parse".

If a file has a filter only that file will be filtered, not any wrappers or includes.

Instead of a file it is possible to include the output of a program. To do this the program is listed in the "Includes=" directive but its name is preceded with the '!' character. For example:

File=foo.html
Includes=!/usr/bin/date,bar.html

will insert the time and date at the first "<!-- #include -->" and the contents of bar.html at the second. "Includes=" and "List-Includes=" should not list arguments to programs. Thus if you wished to use the output of the UNIX date utility to get UTC instead of local time you should create a program named "mydate" containing, for example:

#!/bin/sh
exec /usr/bin/date -u

and use the directive "Includes=!mydate".

This ability to serve the output of programs can be restricted in several ways. If WN is invoked with the -e option then no includes, filters, or CGI programs will be executed.

The -E option in conjunction with the -t or -T options restricts the execution of programs to those listed in index.cache files owned by trusted users or groups. The -u option allows the execution of programs or inclusion of files owned by the owner of the index.cache file which lists them. If the -E and -u options are used together the -E option takes precedence.

If you wish to have all the standard CGI/1.1 environment variables made available to the executed include program you can do so by adding the line:

Attributes=cgi

to the file record. A list of these environment variables can be found in the appendix "CGI and other Environment Variables for the WN Server" in this guide. Also see the sample CGI program which is located in the file /docs/examples/sample.cgi which accompanies the WN distribution.

6.3 Conditional Text: If, Else, and Endif

Often a server maintainer may wish to serve different versions of a document to different clients. Here is a simple example of how this can be handled with WN.

<!-- #if accept =~ "image/jpeg" -->
<a href="picture.jpg"> Here is the jpeg version of the picture: </a>
<!-- #else -->
<a href="picture.png"> Here is the png version of the picture: </a>
<!-- #endif -->

This tells the server to look at the Accept: headers provided by the client and if "image/jpeg" is among them then use the jpeg image and otherwise use the png image. More precisely, the "image/jpeg" part of the "if" line is a regular expression of the kind accepted by the UNIX grep(1) utility, and if there is any match for it among the Accept headers the jpeg image will be used. Of course these "if - else - endif" constructs can be nested. A similar construct allows you to make the text served depend on matching a regular expression with the contents of the User-Agent header, the Referer header, the Cookie header, the client hostname, or the client IP address. A complete list of possible test clauses for the #if statement is contained in appendix "Parsed Document Syntax for the WN Server" in this guide.

Note: For this to work the file containing the conditional text needs to be parsed by the server. The server only knows to do this if the file record in the index.wn file contains a line "Attributes=parse".

If, in an "if clause", instead of the equal-tilde string '=~' (to indicate a regular expression match) the character string '!~' is used then the truth value of the match is reversed. For example the lines:

 Here is some text for newcomers to my site. 

would display the "Here is some text ..." to those clients accessing this document via any link which is not on the host my.host.edu.

There is also a redirect command which can conditionally cause the server to send an HTTP/1.1 redirect to a new URL. For example if the text:

  

is included at the beginning of an HTML document then any request from a UK host will automatically be redirected to the specified URL, the UK_mirror_url in this case. This mechanism could also be used to redirect text-only browsers to a text-only alternative page, etc. There must be no text sent before the redirect line is encountered (not even blank lines) since the server cannot send an HTTP/1.1 redirect while in the middle of transmitting a document.

Normally the URL in the redirect line is fully qualified, like "http://host/path/foo". However, it can also be simply "foo", referring to a file in the same directory as the file being parsed. In this case an HTTP/1.1 redirection is not sent, and instead the file "foo" is returned immediately to the client.

Often a single regular expression is not adequate to distinguish whether or not to serve some text. For that reason WN allows you to use a file containing any number of regular expressions and serve a document based on whether any of these expressions match accept headers, referer header, user-agent, hostname, etc. For example, if the file "acceptfile" contains the lines:

image/png
image/jpeg
image/x-xbitmap

then the following conditional text might be appropriate:

 I see you aren't using a text only browser... 

The format of the file "acceptfile" is one regular expression per line, in the style of the UNIX grep(1) utility. Lines beginning with '#' are taken to be comments. If a regular expression is preceded by the character '!' then that character is skipped but the truth value of the match is reversed. More information about files of regular expressions for conditional text can be found in the appendix "Parsed Document Syntax for the WN Server" in this guide.
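The rules for a file of regular expressions can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not WN's own code: the function name matches_regex_file is hypothetical, Python's re module stands in for grep style expressions (the two dialects differ in places), and skipping blank lines is an assumption.

```python
import re

def matches_regex_file(lines, header_value):
    """Return True if any usable line of the regex file accepts header_value."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # '#' lines are comments; skipping blank lines is an assumption
        negate = line.startswith("!")
        pattern = line[1:] if negate else line
        found = re.search(pattern, header_value) is not None
        if found != negate:  # a leading '!' reverses the truth value of the match
            return True
    return False

acceptfile = ["# image types", "image/png", "image/jpeg", "image/x-xbitmap"]
print(matches_regex_file(acceptfile, "text/html, image/jpeg"))
print(matches_regex_file(acceptfile, "text/plain"))
```

The first call reports a match because one of the expressions is found in the header value; the second does not because none of them is.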

6.4 Conditional Text and Access Control Files

Another method of using conditional text is with a normal WN access control file. For example:

 Here is a link to a restricted directory. <a href="/secret/stuff.html">Restricted Local Stuff</a> Hosts not listed in the file "secret/access" can't look at it, so why show them a link to it? 

will display the included text and the link to "Restricted Local Stuff" only to clients on hosts permitted by the access control file "secret/access".

6.5 Logical Operations in If Conditions

The condition in the "if" or "elif" tags can be made more complex than those described above by combining simple conditions using the logical operations '&&' for 'and', '||' for 'or' and '!' for 'not'. Parentheses may be used for grouping. For example:

 Something for those sites in accessfile but not in the UK. 

This is described in greater detail in the section "Matching Regular Expressions: #if and #elif" in this guide.

6.6 More on Including: the `section` Marker

Often you may want to include HTML files in a larger document. For example, a single logical HTML document often consists of a number of separate files. This can make it very difficult for a client to download and print the entire document. To alleviate this problem you could form a single document concatenating all the pieces, but that can be problematic when the document is altered or updated as it is necessary to change both the piece and the concatenation.

WN offers a solution to this by allowing you to write a small skeleton document which includes all the pieces, forming a "virtual" document which is the concatenation. But if we just used the mechanism above, the concatenation document would include the <head> elements of all the pieces. To remedy this you use the section marker instead of the include marker, and the server will include only the portion of the HTML document between the special starting and ending comments inserted in that document. This requires that these starting and ending comments occur in the HTML document on lines by themselves.

6.7 Wrapping Files

Suppose you have a large number of files and want a standard header or footer placed on all of them (perhaps a standard disclaimer). You could, of course, make many copies of the standard header with a different "Includes=" directive for each, but this is cumbersome. It has other drawbacks as well, such as the fact that grep and context searches do not search associated include files, only the main file.

To deal with this case WN uses wrappers. Wrapping a file is the inverse of including it. If you have an index.wn file entry like:

File=foo.html
Wrappers=bar.html

then the server will send the file bar.html, looking for the include marker and inserting foo.html at the line where it is found. So this is just like the "Includes=" directive except that the roles of the including file and the included file are reversed. If the include marker is never found the entire wrapper, bar.html, is sent first and the main file, foo.html, is appended at the end. Thus the "Wrappers=" directive can be used to prepend a file without the need of any marker in the main file.

6.8 Search Wrappers

A search wrapper is a special kind of wrapper. Its function is to allow you to customize the results of user searches. It is normally associated with an entire directory in which case it applies to all title, keyword, field, grep and context searches of that directory. See the chapter "Setting Up Searches on the WN Server" in this guide.

A line like:

SearchWrapper=foo.html

in the directory record of an index.wn file will cause any search of that directory to return an unordered list of matches wrapped with the file foo.html. The list of matches will be inserted into foo.html at the point where the include marker is found. You can also insert the user supplied search term by using the query marker. Both of these markers must occur on a line by themselves.

6.9 Nested Including and Wrapping

Wrapping and including can be arbitrarily nested (though this is not true of search wrappers). It can be a little complicated to get the effect you want.

Here is how to do it. Think of all your files (wrappers, includes and the main file) arranged as you wish them to be combined for the final served document. Now imagine inserting an opening (or left) parenthesis at the beginning of each file and a closing (or right) parenthesis at the end of each file. You should have a legally nested and balanced collection of parentheses. To each of the opening parentheses attach the name of the file which begins at that point. Then write down the list of all the file names in the order their corresponding opening parentheses occur. All the files which come before the main file should be wrappers and should be listed in the "Wrappers=" line in the order in which they occur in this list. All the files after the main file should be in the "Includes=" line and should occur in the order they occur in this list.

Here's a simple example. Suppose we have a main file M and other files A, B, C, and D which we want to have nested like:

(D...(B...B)...(M...(A...(C...C)...A)...M)...D)

Then the entry in the index.wn file should look like:

File=M
Wrappers=D,B
Includes=A,C
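The parenthesis rule can be checked mechanically. The following Python sketch is an illustration only (the function name wrappers_and_includes is hypothetical and not part of WN): it reads the file names in the order their opening parentheses occur and splits that list at the main file.

```python
import re

def wrappers_and_includes(nesting, main):
    """Split the opening-parenthesis order of file names around the main file."""
    # Each "(NAME" marks the point where the file NAME begins.
    order = re.findall(r"\((\w+)", nesting)
    pos = order.index(main)
    # Files opened before the main file are wrappers, those after are includes.
    return order[:pos], order[pos + 1:]

nesting = "(D...(B...B)...(M...(A...(C...C)...A)...M)...D)"
wrappers, includes = wrappers_and_includes(nesting, "M")
print("File=M")
print("Wrappers=" + ",".join(wrappers))
print("Includes=" + ",".join(includes))
```

For the nesting shown above this reproduces the three index.wn lines of the example.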

6.10 Including Title, Query, Fields and Environment Variables

Often it is useful to insert items like the main document's title into a wrapper, or the search term provided by the client into a search wrapper for the responses. This is possible with WN parsed documents by placing a title marker or a query marker in the file at the point where the title or search term should be inserted. Both of these markers must occur on a line by themselves.

In addition to the title and query, user supplied fields can be inserted in documents. This is done by including a field marker in the file at the point where the value should be inserted.

You can also insert the value of any environment variable into your text with a marker line naming that variable. For example, a marker naming HTTP_REFERER will be replaced by the contents of the environment variable HTTP_REFERER.

Note: If there is no "Wrappers=" or "Includes=" line in the index.wn file for this entry then there must be a line like:

Attributes=parse

so the server knows it is to parse the file to look for the marker. See the section "Set File Attributes" in this guide for more details. Also if you wish to insert the value of a CGI/1.1 environment variable (as in the example above) you must have an "Attributes=parse" line.

6.11 Meta Information

Strictly speaking this feature has nothing to do with parsed documents, but it fits here as well as anywhere. If a URL has the parameter meta appended (with a semi-colon separator), as in

http://host/path/foo;meta

for example, then it will return meta data about the resource rather than the resource itself. Here is the meta data returned by this document when the meta parameter is used.

Chapter 7

User Defined Fields on the WN Server

The maintainer may associate an arbitrary field value to any document in the WN hierarchy. It is then possible to do fielded searches.

A line like:

Field3=any text here

in the index.wn file associates the text "any text here" with user defined Field3. The '3' in this example can be replaced with any number from 0 to 19. A fielded search is invoked by a URL like:

http://host/dir/search=field3

In addition to fielded searches, the user supplied fields can be inserted in documents. This is done by including a field marker in the file at the point where the value should be inserted. If there is no Wrappers= or Includes= line in the index.wn file for this entry then there must be a line like:

Attributes=parse

so the server knows it is to parse the file to look for the marker. See the section "Set File Attributes" in this guide for more details.

Chapter 8

Filters on the WN Server

8.1 Processing a File as it is Served

An arbitrary "filter" can be assigned to any file to be served. A filter is a program which reads the file; the output of the program is served rather than the content of the file. One common use of this feature is on-the-fly decompression. For example, a file can be stored in its compressed form and assigned a filter like the UNIX zcat(1) utility which uncompresses it. Then the client is served the uncompressed file but only the compressed version is stored on disk. As another example, you might use "nroff -man" as a filter to format UNIX man(1) pages before serving them. There are many other interesting uses of filters. Be creative!

A filter is assigned to a file by use of the "Filter=" directive in the file's record in its index.cache file. For example, the lines:

File=foo.gz
Content-type=text/plain
Filter=/usr/local/bin/zcat
Content-encoding=none

cause the compressed file foo.gz to be uncompressed on the fly and served to the client as a text/plain document. Notice that it is necessary to have the content-encoding line to override the default action of wndex which is to infer from the ".gz" suffix that the content-encoding is x-gzip. If the compressed file were named simply "foo" then the content-encoding line would be unnecessary.

The "Filter=" directive takes the value of a path to a file in three different forms. If the path begins with a '/' then it is relative to the system root. If it begins with '~/' then it is relative to the WN hierarchy root, and otherwise it is relative to the directory containing the index.wn file.
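The three path forms can be summarized in a short Python sketch. It is only an illustration of the rule just stated, not the server's code; the function name resolve_filter_path and the example directories are hypothetical.

```python
import posixpath

def resolve_filter_path(value, wn_root, index_dir):
    """Resolve a Filter= value according to its leading characters."""
    if value.startswith("~/"):
        # '~/' means relative to the WN hierarchy root.
        return posixpath.join(wn_root, value[2:])
    if value.startswith("/"):
        # A leading '/' means relative to the system root.
        return value
    # Anything else is relative to the directory containing index.wn.
    return posixpath.join(index_dir, value)

root = "/usr/local/wn"          # hypothetical WN hierarchy root
here = "/usr/local/wn/docs"     # hypothetical directory holding index.wn
for value in ("/usr/local/bin/zcat", "~/bin/myfilter", "myfilter"):
    print(value, "->", resolve_filter_path(value, root, here))
```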

If a file has a filter only that file will be filtered, not any wrappers or includes.

The ability to filter files can be restricted in several ways. If WN is invoked with the -e option then no includes, filters, or CGI/1.1 programs will be executed.

The -E option in conjunction with the -t or -T option restricts the use of filters to those listed in index.cache files owned by trusted users or groups. The -u option allows only the use of filters owned by the owner of the index.cache file which lists them.

The directory directive:

Default-Filter=/path2/filter

specifies that files in this directory should all be treated as if the "Filter=" file directive had been set to /path2/filter. To override this setting and specify no filter use the "Filter=<none>" directive.

8.2 Filters as a Substitute for CGI Programs

The filter mechanism built into WN has most of the functionality of the Common Gateway Interface (CGI) standard and has some features CGI/1.1 does not. Filters are also in some ways more secure. This makes them a good choice for applications where compatibility with the CGI/1.1 standard is not required.

The first thing to note is that there is no requirement that the filter program actually make use of the file being filtered. This file must exist though.

Thus if an empty file "foo" is created and has an index.wn file entry like:

File=foo
Content-type=text/html
Filter=someprogram
Attributes=parse,cgi

then the output of the program "someprogram" will be served. A program used in this way differs somewhat from a CGI/1.1 program in that no headers should be supplied by the program as WN will automatically provide them. For example, while a CGI/1.1 program typically starts with printing "Content-type: text/html" followed by a blank line, this should not be done for "someprogram" in the index.wn entry above, because WN will automatically provide the appropriate HTTP/1.1 headers based on the "Content-type=text/html" line in the index.wn file.

An important difference between filters and CGI/1.1 programs is that the output of filters can be parsed while CGI/1.1 output cannot. The fact that you want the output parsed must be signalled by the use of an "Attributes=parse" line in the directory's index.wn file.

If you wish to have all the standard CGI/1.1 environment variables made available to the filter program you can do so by adding the line:

Attributes=cgi

to the file record. A list of these environment variables can be found in appendix "CGI and other Environment Variables for the WN Server" in this guide. Also see the sample CGI program which is located in the file /docs/examples/sample.cgi which accompanies the WN distribution.

One difference between CGI/1.1 programs and filters is that with filters there is no way to have a non-empty PATH_INFO environment variable since anything appended to the path part of the URL will be interpreted as a path to an actual file. Of course the "query" part of a URL (everything after a '?') will work for filters as well as CGI/1.1 programs and its contents will be put in the QUERY_STRING environment variable.

Another difference between CGI/1.1 and filter programs is in the handling of POST or PUT data. A CGI/1.1 program reads the data provided by the client on its UNIX stdin(3) stream. This is not possible for a filter since its stdin(3) is attached to the file it is supposedly filtering. To use the PUT or POST method with a filtered file, the "Attributes=post" directive must be used since otherwise the server will not permit a POST or PUT. It is then possible to read the POSTed data by opening and reading the temporary file containing this data. The name of this file changes with each request, but if "Attributes=cgi" is used then the name is given in the WN specific CGI environment variable HTTP_POST_FILE or HTTP_PUT_FILE depending on the method used to submit the data.
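As a rough sketch of how a filter program might pick up submitted data, the Python function below looks for the two environment variables named above and reads the temporary file they point to. It assumes "Attributes=cgi" and "Attributes=post" are in effect; the function name read_submitted_data is hypothetical and the handling of a missing variable is an assumption.

```python
import os

def read_submitted_data(environ=os.environ):
    """Return the POSTed or PUT data as bytes, or b"" if none was supplied."""
    for name in ("HTTP_POST_FILE", "HTTP_PUT_FILE"):
        path = environ.get(name)
        if path:
            # The server stores the request body in a temporary file;
            # the filter's own stdin is attached to the file being filtered.
            with open(path, "rb") as handle:
                return handle.read()
    return b""
```

A filter would call read_submitted_data() with no argument so that the real process environment is consulted.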

One advantage of using a filter instead of a CGI/1.1 program is that it may have slightly better security. With a filter the name of the executed program is never visible outside the server. It is not in any URL and it is not in any served file. Perhaps a more important feature is that no arguments can be supplied to a filter except those listed in the index.wn file filter entry. Unlike CGI/1.1 programs, it is not possible for a remote user to supply any arguments whatsoever to the program.

Chapter 9

Ranges on the WN Server

9.1 URL Range Parameters

If the server receives a request like:

http://host/dir/foo;lines=20-30

and foo is any document with content-type text/* it will return a text/plain document consisting of lines 20 through 30 of file foo. This is very useful for structured text files like address lists or digests of mail and news. A WN utility called wn_mkdigest will produce an HTML file with a list of links to separate sections (line ranges) of the structured file. It is described below.

The option ";lines=[first]-[last]" can be appended to the URL of any text file served by the WN server. It has the effect of causing the server to return line [first] through line [last] of the document to which it is attached. Line 1 is the first line of a file. This parameter is ignored and an error is logged if it is used with a binary file. When used with a text file the content type of the returned document is set to text/plain since, for example, a range of lines from an HTML document is unlikely to be a valid HTML document.
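The line numbering convention can be illustrated with a small Python sketch. It is not the server's implementation; the function name line_range and the sample text are made up for the example.

```python
def line_range(text, first, last):
    """Return lines first..last of text, counting the first line as line 1."""
    lines = text.splitlines(keepends=True)
    # Line 1 is index 0, and the range is inclusive at both ends.
    return "".join(lines[first - 1:last])

sample = "".join(f"line {n}\n" for n in range(1, 41))
excerpt = line_range(sample, 20, 30)
print(excerpt, end="")
```

A request for ";lines=20-30" therefore covers eleven lines, not ten.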

For any file, including binaries, but not CGI/1.1 output or parsed or filtered documents, the server will honor a byte range. A request for:

http://host/dir/foo;bytes=256-1024

will cause the server to return bytes 256 to 1024 inclusive. The first byte is numbered 0 so a range of 0-100 represents the first 101 bytes. Note that while file lines are numbered starting with 1, bytes are numbered starting with 0. The content type of the returned document will be text/plain if the document foo is a text document and application/octet-stream otherwise.

A request for:

http://host/dir/foo;bytes=256-

will cause the server to return bytes 256 through the end of the file, while:

http://host/dir/foo;bytes=-1024

requests the last 1024 bytes of the file.
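The three byte range forms can be sketched as follows. This Python function only illustrates the numbering rules described above (bytes numbered from 0, both ends inclusive); the name byte_range is hypothetical and no error handling for malformed ranges is attempted.

```python
def byte_range(data, spec):
    """Apply a 'first-last', 'first-' or '-count' byte range to data."""
    first, _, last = spec.partition("-")
    if first == "":
        # "-N": the last N bytes of the file.
        count = int(last)
        return data[-count:] if count else b""
    start = int(first)
    if last == "":
        # "N-": from byte N through the end of the file.
        return data[start:]
    # "N-M": bytes N through M inclusive, so M - N + 1 bytes.
    return data[start:int(last) + 1]

data = bytes(range(256)) * 8   # 2048 bytes of sample content
for spec in ("0-100", "256-1024", "256-", "-1024"):
    print(spec, len(byte_range(data, spec)))
```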

9.2 The "`Range:`" Header

Requests for a specific range can also be made with the proposed HTTP/1.1 header "Range:". For example, if the server receives a request with the header:

Range: bytes=100-200

it will send the 101 bytes starting with byte number 100 and ending with byte number 200. The header "Range: bytes=300-" requests all bytes starting with number 300 and continuing until the end of the file. And "Range: bytes=-300" requests the last 300 bytes of the file. This is normally done automatically by the client in appropriate circumstances and is not under the control of the server administrator.

9.3 The `wn_mkdigest` Utility

The wn_mkdigest utility is executed with two regular expressions: one to match the section separator and the other to match the section title. For example, for a mail digest named foo, the command:

wn_mkdigest foo "^From" "^Subject:" foo

will produce an HTML file called foo.index.html which contains an unordered list of anchors. Each has as a title the content of the Subject line of a mail message in foo and href the range of lines in foo containing that particular message. More detailed information is available in the section "wn_mkdigest" in this guide.

Chapter 10

Limiting Access to Your WN Hierarchy

There are two ways to limit access to your hierarchy. You can restrict access by hostname or IP address and you can restrict access to users whose name and password are in a file on your server (authentication). You can, of course, do both. To restrict access to an entire hierarchy you must restrict access to each of its subdirectories.

Warning: If access to a directory is restricted by either of the ways described here the restrictions affect only that one directory and not its subdirectories.

10.1 Access Control Files: Limiting Access by Hostname or IP Address

If you have opted to limit access to your server in this way you do so by setting the value of the Accessfile= directive in the index.wn file for a directory. In the directory directive part of an index.wn file, a line like:

Accessfile=~/dir/.access

specifies that the access control file wnroot/dir/.access contains restrictions on what sites are allowed to access this directory. The Accessfile= directive takes the value of a path to a file in different forms. If the path begins with a '/' or with '~/' then it is relative to the WN hierarchy root, and otherwise it is relative to the directory containing the index.wn file in which the directive occurs. In particular the access file must be located within your WN hierarchy.

Warning: If the Attributes=serveall directive is used in a directory with restricted access be sure the access file is not serveable. You can do this by giving it a name starting with '.' or ending with '~', or better, put it in a directory from which nothing is served.

Also note that limiting access to this directory does not limit access to subdirectories. The Accessfile= line must occur in the index.wn file of each directory you want restricted. Of course, they can all refer to the same file. To use the same file for several directories be sure to use the "Accessfile=~/dir/.access" form of the directive so the line can be the same for every index.wn file.

This will limit access to the server to those clients with an IP address or subnet address listed (and not excluded) in the file .access listed in the Accessfile= directive.

If a recursive title search or keyword search is requested and some directories have restricted access only those directories which have the same access file as the directory where the search started will be searched. In fact the path must be the same in the Accessfile= directive for both directories (and must necessarily be of the form "Accessfile=~/dir/.access" or "Accessfile=/dir/.access" rather than "Accessfile=.access").

There are three possible formats for lines in the access file. First, you may list the domain names of the machines, using wild cards if you wish, provided the machines all have proper PTR DNS resource records. For example the line:

dogbert.widget.com

allows access to one host. To allow access to all machines in the widget.com domain, use the line:

*.widget.com

Note that this will not allow access to a machine called widget.com if it exists. One would need to add in the line widget.com to allow it access.

You can also allow access by IP address and, in general, this is somewhat more secure than using hostnames. There are two line formats for IP addresses. The first is to explicitly list an IP address like "129.111.222.123" or a subnet address like "129.111.222." or "129.111.". A subnet address must end with a period, like:

129.111.222.

132.123.

but complete IP addresses like 129.111.222.123 should not end with a period. If a subnet address is listed any client with an IP address beginning with that subnet address will be allowed access.

The second format for IP address restriction uses a net address, net mask pair with the two parts separated by a '/'. For example:

129.111.222.0/255.255.255.0

The presence of the '/' indicates to the server that this format is being used. The part before the '/' is the "net address" and the part after is the "net mask". The server will then take the IP address of the remote client, do a logical "and" of each of its four parts with the corresponding four parts of the net mask (255.255.255.0 in this example) and check that the four results agree with the four parts of the net address (129.111.222.0). So the access file line above will match (and allow access to) precisely those machines with an IP address of the form 129.111.222.x, because the 'x' part is "anded" with 0 and hence becomes 0, while the first three parts are "anded" with 255 and hence unchanged, so they must equal 129, 111, and 222 respectively.
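The "and" test just described can be written out in a few lines of Python. This is a sketch of the arithmetic only, not the server's code, and the function name matches_net is hypothetical.

```python
def matches_net(client_ip, entry):
    """Test a dotted client IP against a 'net address/net mask' entry."""
    net, mask = entry.split("/")
    client_parts = [int(p) for p in client_ip.split(".")]
    net_parts = [int(p) for p in net.split(".")]
    mask_parts = [int(p) for p in mask.split(".")]
    # Each part of the client address is "anded" with the mask part and
    # must then equal the corresponding part of the net address.
    return all((c & m) == n
               for c, m, n in zip(client_parts, mask_parts, net_parts))

entry = "129.111.222.0/255.255.255.0"
print(matches_net("129.111.222.77", entry))
print(matches_net("129.111.223.77", entry))
```

The first address matches because 77 "anded" with 0 is 0; the second fails because its third part is 223 rather than 222.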

Note that if you have #define NO_DNS_HOSTNAMES in the config.h file you must use one of the IP address formats above and not the format using a domain name. This is because #define NO_DNS_HOSTNAMES causes WN never to convert IP addresses to hostnames.

You can also exclude IP addresses or domain names by prefixing them with an '!', so if the access file contained only the lines:

!speedy.acns.nwu.edu
*

access would be permitted to every machine except speedy (the * matches, and allows access to, anything). Likewise:

!129.111.
!129.222.0.0/255.255.0.0
*

would allow access to everyone except those on subnet 129.111 or on subnet 129.222. In general prefixing a line (in any of the three formats) with '!' causes immediate denial of access to any matching host. The first matching line (with or without leading '!') for a host is the one which takes effect. Once a match is found access will be granted (or denied if a '!' is present) and no subsequent lines in the access file will be considered.
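The first-match rule can be illustrated with a Python sketch. It is a simplified model under several assumptions, not WN's implementation: the names line_matches and access_allowed are hypothetical, wild cards are handled with Python's fnmatch, an entry is treated as an IP form whenever it starts with a digit, and '+' lines and Access-denied-URL= lines are not handled.

```python
from fnmatch import fnmatchcase

def line_matches(pattern, hostname, ip):
    """Test one access file entry (without any '!' prefix) against a client."""
    if "/" in pattern:
        # Net address / net mask pair.
        net, mask = pattern.split("/")
        triples = zip(ip.split("."), mask.split("."), net.split("."))
        return all((int(c) & int(m)) == int(n) for c, m, n in triples)
    if pattern[0].isdigit():
        if pattern.endswith("."):
            return ip.startswith(pattern)   # subnet address such as "129.111."
        return ip == pattern                # complete IP address
    # Domain name, possibly with a '*' wild card.
    return fnmatchcase(hostname.lower(), pattern.lower())

def access_allowed(access_lines, hostname, ip):
    """Return True if the first matching line grants access."""
    for raw in access_lines:
        line = raw.strip()
        deny = line.startswith("!")
        pattern = line[1:] if deny else line
        if not pattern:
            continue
        if line_matches(pattern, hostname, ip):
            return not deny   # the first match decides; later lines are ignored
    return False              # hosts matching no line are not granted access

rules = ["!129.111.", "!129.222.0.0/255.255.0.0", "*"]
print(access_allowed(rules, "dogbert.widget.com", "129.111.5.5"))
print(access_allowed(rules, "dogbert.widget.com", "10.1.1.1"))
```

Because the first matching line decides, the order of the lines matters: placing the "*" line first would grant access to everyone.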

A line in an access file cannot exceed 255 characters in length and every line must end with a newline (some editors don't guarantee this and the last line of a file may not have a newline). A blank line at the end is fine. If these conditions are not met an error of type "Access file line overflow" will be generated.

10.1.1 Privileged Sites

You may also designate "privileged sites" in your access files. If you list a site in an access file with a '+' prefix like:

+hopf.math.northwestern.edu
+123.123.123.1
+111.111.111.0/255.255.255.0

then requests from that site will be exempt from any password requirements (as described below). In other words, no username/password pair will be required for requests from these sites, even if they are required from other sites.

Obviously the '+' and '!' prefixes for access file lines are mutually exclusive.

10.1.2 Customized Error Messages

It is possible to specify a URL referring to a customized document intended as an error message when access is denied. The easiest way to do this is to place the line:

Access-denied-URL=http://host/dir/foo.html

or the line:

Access-denied-URL=/dir/foo.html

at the beginning of the access file. When this is done and a request is denied because of failure to meet the restrictions in that access file, the browser will be redirected to the URL "http://host/dir/foo.html" or "/dir/foo.html". Access-denied-URL= is also a legal directory directive which may be placed in an index.wn file.

10.2 Limiting Access by Password Authentication

You can also maintain a password file (or files) on your system and restrict access to those users who can supply a valid user name and password. This is the so-called "Basic" authentication described in the HTTP/1.1 protocol.

Warning: I would strongly advise against using the basic authentication described here to protect sensitive information on a server which runs on a system on which untrusted users have accounts.

Notice that if none of the options -t, -T and -u are used then a user with his own home page can make a symbolic link to any file readable by the server and that document will be served. This is true even if the linked to document is in a password protected directory with limited access or is outside the server data hierarchy.

The use of basic authentication with WN involves three additional programs which can be found in the /bin directory of the distribution. The first of these is wn_mkpasswd which is a perl utility for creating and altering password files. It should be run the first time with the command:

wn_mkpasswd -n filename

This prompts you for a realm, a username and password and then creates a password file called "filename" with that entry. On subsequent uses the -n argument should be omitted so that entries will be added to the existing file instead of starting a new one (the -n is for "new"). If a subsequent entry is made with the same user name the entry for that user will be replaced. If the "filename" argument is omitted then the default name of wnpasswd is used. There is another optional argument which may be used with this program. The command:

wn_mkpasswd -D filename

causes a UNIX NDBM database to be created or used instead of a simple flat file. This may be useful if you have a very large number of password entries. Depending on your system, the database may reside in the two files filename.dir and filename.pag, or in a single file filename.db. The -n option has no effect when combined with the -D option. To create a new database you must remove or rename the .pag and .dir or .db files. To remove a single entry from a password file use the command "wn_mkpasswd -d filename" or "wn_mkpasswd -D filename" for an NDBM database.

The second is the perl script wn_md5passwd which is functionally the equivalent of wn_mkpasswd but which uses an MD5 hash of the password, the realm, and random data to encode the password. The realm and random data are created when wn_md5passwd is run with the -n option. This is generally preferable as it is more secure; in particular, characters beyond the eighth in a password are significant. The options are the same as those of wn_mkpasswd.

Note: To enable the NDBM features of wnauth you will have to uncomment the lines in wnauth/Makefile starting with #DBMFLAG and #DBMLIB and recompile the wnauth program by running the UNIX make(1) utility in the /wnauth directory.

Once you have created your password file and made sure that it is readable by the user id under which the server will run, you are ready to set up the WN authentication module, called wnauth. This is done on a per directory basis by three entries in the directory record of the index.wn file. Entries like:

Authorization-realm=myrealm@host.domain
Authorization-module=~/cgi-bin/wnauth "~/dir/wnpasswd"
Authorization-type=basic

in the directory record specify that the authentication module wnauth is being used to check users' passwords and that it should consult the password file "wnpasswd" in wnroot/dir/. If instead of the password file "wnpasswd" you are using an NDBM database "wnpasswd.dir" and "wnpasswd.pag" created with "wn_mkpasswd -D" as described above (or created some other way), then you should use the line:

Authorization-module=~/cgi-bin/wnauth -D "~/dir/wnpasswd"

The password file can also be specified with the -P option as in:

Authorization-module=~/cgi-bin/wnauth -P wnpasswd

The name of the password file can be given in three different formats: beginning with a '/', meaning it is relative to the system root; beginning with '~/', indicating it is relative to the WN hierarchy root; or something else, indicating it is relative to the directory containing this index.wn file. If you use the '~/...' form it is a good idea to put the file name in double quotes as shown above to prevent the shell from trying to interpret the '~'.

A final option for wnauth is the -l filename option as in:

Authorization-module=~/cgi-bin/wnauth -P wnpasswd -l /path2/logfile

With this option the wnauth program will do its own logging in the file logfile, listing all transactions and any error conditions. This is especially valuable for diagnosing problems in authentication.

Warning: If the Attributes=serveall directory directive is used in a directory with access restricted by password, be sure the password file is not serveable. You can do this by giving it a name starting with '.' or ending with '~', or better, put it in a directory from which nothing is served.

Note that if you designate a privileged site in your access control file then any users from that site will not be requested to supply a user name and password.

For security reasons when you use wnauth or any Authorization-Module= you are required to use either the -t or -T option or the -a or -A option when the server is run and to have the index.cache file in the protected directory owned by the trusted user or group. This is to guard against counterfeit authentication modules. Note that the four command line arguments -a, -A, -t and -T all take a numeric argument. Thus the command should be "./wnsd -t 203" and not "./wnsd -t joe" if user joe has user id 203.

The Authorization-Realm= line is to notify the client that for any document on this server with the same realm as this one, the same username/password combination will be valid, so the client need not ask the user for a username and password, but can reuse the one supplied for the first document with this realm. For security reasons it is a good idea to put your host and domain name in the realm. This may at least discourage attempts at other sites to forge your realm in order to collect user passwords. Your users should also be warned never to enter their password if the realm displayed when they are prompted for a password contains a different hostname than the one in the URL they are trying to access.

If you use different realms on the same server you should be aware that popular browsers are somewhat cavalier in their treatment of realms. In particular once a username/password pair has been accepted a browser might well continue to use it on the same site without checking the realm until authentication fails. This practice of trying to guess the username/password is more efficient if the guess is correct and most of the time it is.

Also note that password protecting a directory does not protect its subdirectories. The three "Authorization" lines must occur in the index.wn file of each directory you want to protect. Of course, these lines can all be identical for different directories if you use the:

Authorization-module=~/cgi-bin/wnauth ~/dir/wnpasswd

form to specify locations relative to your WN root.

There is also support for a "group" file with authentication. This feature is invoked by using the -g and -G options with the wnauth authentication module. The line:

Authorization-module=wnauth -g grpname -G foo -P wnpasswd

means to use the group name "grpname" and the group file "foo". The group file is a file in the format of a UNIX group(5) configuration file. That is, it has lines of the form:

grpname:*:99:user1,user3,user5

where the fields are separated by colons, the first field is a group name, and the fourth field is a comma separated list of user names. wnauth will ignore the second and third fields. If the line above is in the file foo and wnauth is invoked as above then a user will be granted access provided the supplied password matches that in the wnpasswd file and the user's username is in the list in the fourth field of the line starting with the group name. Thus, in this example users user1, user3, and user5 will be given access if they provide valid passwords and other users will not.

The format of a group file used by Apache is also supported. This format has lines of the form:

grpname: user1 user3 user5

which is the group name, a single colon and a space separated list of user names.
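As a rough illustration of the two group file formats, here is a Python sketch that extracts the members of a group from either kind of line. It is not the wnauth implementation, and the function name is hypothetical. It covers only the group membership test; the password check described above is a separate step.

```python
def group_members(lines, group):
    """Return the set of user names listed for `group`, or an empty set."""
    for raw in lines:
        line = raw.strip()
        if not line or ":" not in line:
            continue
        fields = line.split(":")
        if fields[0] != group:
            continue
        if len(fields) >= 4:
            # UNIX group(5) style: name:passwd:gid:user1,user2,...
            # The second and third fields are ignored.
            return {u.strip() for u in fields[3].split(",") if u.strip()}
        # Apache style: name: user1 user2 ...
        return set(fields[1].split())
    return set()

unix_style = ["grpname:*:99:user1,user3,user5"]
apache_style = ["grpname: user1 user3 user5"]
print(sorted(group_members(unix_style, "grpname")))
print(sorted(group_members(apache_style, "grpname")))
```

Both calls list the same three users, so a user such as user2 would be refused even with a valid password.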

It is possible to specify a custom error message to be sent when password authentication fails because of an incorrect password or username as in:

Auth-denied-file=~/dir/foo.html

This specifies that any request for a document in this directory which is denied because of an authorization module restriction results in the file ~/dir/foo.html being sent instead. A default value for all directories can be set by uncommenting the "#define AUTH_DENIED_FILE" line in config.h and recompiling. Note that this is not a URL but the name of a file whose content is to be sent as error text when authentication is denied. If the file name starts with '~/' as above it is assumed to be relative to the WN root directory. Otherwise it is assumed to be a path relative to the directory containing the index.wn file.

The "Basic" authentication scheme is flawed in that it involves the transmission of essentially unencoded passwords over the network. It is relatively easy for unscrupulous people to obtain "sniffer" software which allows eavesdropping on all local network traffic. This means, in particular, that it is possible to intercept passwords.

This particular problem is remedied by the HTTP/1.1 Digest Authentication scheme. Digest authentication is supported experimentally by WN, but has the rather severe drawback that no publicly available clients currently support it. It is experimental, because I have no client to test it and hence it has barely been tested.

Chapter 11

Users' Hierarchies and Personal Home Pages on the WN Server

The WN server allows you to give users on the server their own private hierarchy for their home page and accompanying documents. The URL for a document in such a hierarchy contains a special string (which the maintainer may choose) to indicate that an alternate hierarchy is being used. There are three more or less mutually exclusive methods in WN to do this. One uses user names and consults the system password file to calculate a "home page" directory in the appropriate user's home directory. The second uses a table lookup to change the server root directory to one specified by the maintainer. Both of these set the data hierarchy based on information in the URL. A third method can be used on systems which support multiple IP interfaces on a single host and will use a different data root for different IP addresses by which the server is accessed.

11.1 Hierarchies in User Home Directories

A common idiom on the Web is to use a URL like "http://hostname/~john/foo.html" to request a document in a subdirectory of the home directory of user "john". WN implements this using the first of the mechanisms above.

To enable this the maintainer defines the macro #define TILDE_USER_PWFILE in the configuration file config.h. With '/~' as the special string, the server consults the system password file, changes the root data directory to "/home/john/public_html" (assuming "/home/john" is john's home directory) and simultaneously deletes the "/~john" from the URL. Thus, the URL becomes "http://hostname/foo.html" with root directory "/home/john/public_html" so the file accessed is "/home/john/public_html/foo.html". Both the string "/~" and the subdirectory "public_html" are configurable by changing the values of #define TILDE_USER_STRING and #define PUB_HTML respectively in config.h.

Thus, to avoid the problematic character '~' a maintainer might set #define TILDE_USER_STRING to "/people/" and might set #define PUB_HTML to "wwwstuff". Then the URL "http://hostname/people/john/foo.html" would result in the equivalent of a URL request for "/foo.html" with a data root of "/home/john/wwwstuff". Note that #define TILDE_USER_STRING must be everything before the user name at the beginning of the URL. Thus it needs to be "/people/" and not "/people".

As a security measure the maintainer can set a variable #define LEAST_UID (with default value 100) in config.h and any attempt to use "/~user/" will fail if the user id of "user" is less than this value.
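The translation described in this section can be sketched in Python as follows. This is only an illustration of the rules above, not the server's code. The three constants mirror the config.h macros, while the PASSWD dictionary is a hypothetical stand-in for the system password file.

```python
TILDE_USER_STRING = "/~"     # mirrors the config.h macro of the same name
PUB_HTML = "public_html"     # mirrors #define PUB_HTML
LEAST_UID = 100              # mirrors #define LEAST_UID (default 100)

# Hypothetical stand-in for the system password file: name -> (uid, home directory)
PASSWD = {"john": (203, "/home/john"), "root": (0, "/root")}

def translate(url_path):
    """Return (data_root, document_path), or None if the request is not allowed."""
    if not url_path.startswith(TILDE_USER_STRING):
        return None
    rest = url_path[len(TILDE_USER_STRING):]
    # Everything up to the next '/' is the user name; the remainder is the document.
    user, _, remainder = rest.partition("/")
    entry = PASSWD.get(user)
    if entry is None:
        return None
    uid, home = entry
    if uid < LEAST_UID:
        # User ids below LEAST_UID are refused as a security measure.
        return None
    return (home + "/" + PUB_HTML, "/" + remainder)

print(translate("/~john/foo.html"))
```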

11.2 Alternate Hierarchies From a Table Lookup

A second method of establishing an alternate data hierarchy does not use the password file but a table supplied by the maintainer. This method is enabled by defining the #define TILDE_TABLE variable in config.h. This should be the complete system pathname of a file containing names and their corresponding data hierarchy directories.

For example, if config.h contains the line:

#define TILDE_TABLE "/usr/local/etc/wn.dir.table"

and the file "wn.dir.table" contains lines like:

john:/home/john/public_html
bob:/home/bob/public_html

then the URL "http://hostname/~bob/foo.html" will be translated to a request for "/foo.html" with the data root directory set to "/home/bob/public_html". The file consulted should have lines consisting of a name followed by a ':' followed by the full system path of the desired root data directory. Of course, the name need not be the name of a user. One difference with this method is that the #define PUB_HTML value from config.h is not used and if you want it to be part of the path it should be in the path part of each entry in your table.

This method of changing hierarchies uses the #define TILDE_USER_STRING in precisely the same way that the password file method does. In other words, if #define TILDE_USER_STRING is defined to be "/people/" instead of '/~' then "http://hostname/people/john/foo.html" will be translated to a request for "/foo.html" with root data directory "/home/john/public_html".

More precisely when the #define TILDE_USER_STRING is at the start of the URI it is deleted and everything after it up to the next '/' is taken to be the name to be looked up in the table or password file. This name is also discarded and the remainder of the URI is taken to be the path of the desired document relative to the new data hierarchy root.

When you use the table lookup method to retarget data root directories of user hierarchies, you can specify a directory that is not even on the local server host. To do this, specify a lookup table entry consisting of a name, a double colon (rather than a single colon), and the URL to the user's data root directory. For example, if user "jim" has a data root that is accessed as "~jim" on "otherhost", you can specify the lookup table entry like this:

jim::http://otherhost/~jim

Then when a client sends a request like "http://hostname/people/jim/path/file.html" to your server, the server will return a redirect to the client for "http://otherhost/~jim/path/file.html".

The ability to specify non-local user hierarchies is useful when you run a primary web server (e.g. "www.your.site.edu") and want all your users to be able to advertise home page URLs that begin with "http://www.your.site.edu/" whether or not their accounts are actually on that host.
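The table lookup, including the double colon form for non-local hierarchies, might be sketched like this in Python. The function names and return values are assumptions chosen for illustration and do not reflect how the server is actually written.

```python
def parse_table(lines):
    """Parse 'name:path' and 'name::URL' entries into a dictionary."""
    table = {}
    for raw in lines:
        line = raw.strip()
        name, sep, target = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        if target.startswith(":"):
            # Double colon: the target is a URL on another host.
            table[name] = ("redirect", target[1:])
        else:
            # Single colon: the target is a local data root directory.
            table[name] = ("local", target)
    return table

def lookup(table, url_path, tilde_string="/~"):
    """Return ('local', root, path), ('redirect', url), or None."""
    if not url_path.startswith(tilde_string):
        return None
    name, _, remainder = url_path[len(tilde_string):].partition("/")
    entry = table.get(name)
    if entry is None:
        return None
    kind, target = entry
    if kind == "redirect":
        return ("redirect", target + "/" + remainder)
    return ("local", target, "/" + remainder)

table = parse_table([
    "john:/home/john/public_html",
    "bob:/home/bob/public_html",
    "jim::http://otherhost/~jim",
])
print(lookup(table, "/~bob/foo.html"))
print(lookup(table, "/people/jim/path/file.html", "/people/"))
```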

Chapter 12

Multi-homed or Virtual Servers on the WN Server

Many systems allow multiple IP addresses for a single host. For systems with this capability it is possible to use a different data hierarchy for each IP address to which your host responds. Sometimes this is referred to as "virtual hosts" or a "multi-homed" server since when combined with DNS aliasing it permits a single server to appear to be multiple servers on different hosts. Unfortunately, I cannot help you in configuring your system to respond to multiple IP addresses. (Some information about this can be found at http://www.thesphere.com/%7Edlp/TwoServers/) However, if your system supports this capability you can have different data hierarchies for each IP address if you run the stand-alone version of the server (wnsd).

12.1 One Server per Virtual Host

There are two ways to have multiple virtual hosts. Assuming that you actually have more than one network interface (i.e. more than one NIC), the easiest is to run multiple instances of the server each with a different IP address indicated on the command line via the -i option.

For example running the commands:

wnsd -i 192.168.1.1 -L abc.log /abc/root
wnsd -i 192.168.1.2 -L def.log /def/root

will start two instances of the server, both on the same port (80) but responding to different IP addresses.

Alternatively,

wnsd -h www.abc.com -L abc.log /abc/root
wnsd -h www.def.com -L def.log /def/root

will start two instances of the server, both on the same port (80) but responding to different IP addresses. The server will ascertain the appropriate IP address by calling the UNIX gethostbyname(3) library function for the name supplied on the command line.

Note: To use this method make sure the "#define USE_VIRTUAL_HOSTS" line in config.h is commented out.

This method has the disadvantage of using somewhat more resources than the alternate method (described below) because at least one instance of the server for each host name is running at all times and thus using memory. On the other hand with this method it is possible to specify a different log file for each host, while the method below uses only one log file and a utility to separate it into the contributions of each host.

12.2 A Single Server with Multiple Virtual Hosts

The second method to implement a multi-homed server involves only a single initial instance of the server which responds to all requests and changes the document root as appropriate based on the IP address or host name to which the request was sent. To use this method you need to edit the config.h file and uncomment the line containing "#define USE_VIRTUAL_HOSTS".

Then you have two choices. You can either create a file containing a list of the virtual hosts or you can build this list into the compiled version of the server by editing one of the source files and recompiling. If you have only a few virtual hosts and you don't anticipate adding new ones, I recommend compiling in the list. This is easy to do and much more efficient if you use wnd with the UNIX inetd(8) system utility. If you use wnsd there is little difference in efficiency.

12.2.1 Using a Virtual Hosts List File

You can either compile in the list of virtual hosts or keep this list in a file which will be read each time the server starts. The second option is described here and the first is described in the next section. To use a virtual host file you must uncomment the line containing "#define VIRTUAL_HOSTS_FILE" in the file config.h and change the quoted path to the full path of a file containing virtual host information in a format described below. Then you must recompile the server. Once this has been done you may change the file you use with the -V option to the server. But to turn this feature on, it is necessary that "#define VIRTUAL_HOSTS_FILE" be uncommented when the server is compiled. Each time this file is changed, in order for the change to take effect, you will need to restart the server or send it the SIGHUP signal with the UNIX kill(1) utility.

The format of this file is one line per virtual host. Each such line should have the form:

hostname IP_address root_path vhost_nickname user group flag

with the seven fields separated by white space. For example entries might be:

myhost.com 111.222.222.111 /usr/wn/myroot myhost bob mygroup 0
client.com 111.222.222.111 /usr/wn/client client george clientgroup 1

In particular the hostname should be a fully qualified domain name. The user and group fields are the name of the user and group to be passed to suexec if it is used. If it is not, these can be set to the string NULL. However, if suexec is used, both the user and group must be defined for each virtual host -- they could have the value of the user and group under which your server runs. The flag field currently has only two allowable values, 0 and 1. The value 1 indicates that user home directories are not permitted for this virtual host. Lines in this file which are empty or start with '#' are ignored. When using verbose logging the nickname will be enclosed in angle brackets and used as the last field of each log entry. If the nickname is omitted a number will be used representing the position of the virtual host in the virtual host file.
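To make the file format concrete, here is a small Python sketch that reads such a file. It is not taken from the server source. The function name and the dictionary keys are hypothetical, and the sketch assumes that all seven fields are present on every entry line.

```python
def parse_vhosts(text):
    """Parse virtual host lines of the form:
    hostname IP_address root_path vhost_nickname user group flag
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and lines starting with '#' are ignored
        hostname, ip, root, nickname, user, group, flag = line.split()
        hosts.append({
            "hostname": hostname,
            "ip": ip,
            "root": root,
            "nickname": nickname,
            # The string NULL means no suexec user or group is set.
            "user": None if user == "NULL" else user,
            "group": None if group == "NULL" else group,
            # Flag 1 forbids user home directories for this virtual host.
            "no_user_dirs": flag == "1",
        })
    return hosts

sample = """# virtual hosts
myhost.com 111.222.222.111 /usr/wn/myroot myhost bob mygroup 0
client.com 111.222.222.111 /usr/wn/client client george clientgroup 1
"""
for host in parse_vhosts(sample):
    print(host["hostname"], host["root"], host["no_user_dirs"])
```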

By default there is a maximum of 64 virtual hosts allowed in a virtual host file. But this can be increased by editing the file wn/vhost.c and changing the line "#define MAXVHOSTS (64)". Just replace the 64 with the value you wish.

WN supports the "Host: " header implemented by some browsers (e.g. Netscape Communicator) and so-called "full URL requests". For browsers that support either of these features it is now possible to have multiple virtual hosts with a single IP address. The HTTP/1.1 protocol requires browsers to support the "Host: " header.

Using this feature requires nothing beyond setting up the server exactly as described above for virtual hosts. Of course, all your virtual hosts will have the same IP number if your system only has one. Then if a browser provides the "Host: " header (which should contain the hostname and port it is trying to access) the WN server will use the root data directory you specified for that host name. Similarly if a full URL request like "GET http://host.abc.com/dir/foo.html" is used the server will use the root data directory corresponding to "host.abc.com". If the browser provides neither of these the server will use the first root data directory whose IP number matches (which will be the first in your list if you have only one IP address).
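The selection rule in the preceding paragraph can be sketched in Python. This is an illustration under several assumptions and not the server's own logic: the host list, DEFAULT_ROOT and the function name are hypothetical, a full URL request is given precedence over the "Host: " header, and a name that matches no entry falls back to matching on the IP address.

```python
# Hypothetical list of (hostname, IP address, root data directory)
VHOSTS = [
    ("host.abc.com", "192.168.1.1", "/abc/root"),
    ("host.def.com", "192.168.1.1", "/def/root"),
]
DEFAULT_ROOT = "/usr/local/wn"  # hypothetical default data root

def choose_root(local_ip, host_header=None, request_target=None):
    """Pick a data root from a full URL request, a Host header, or the IP address."""
    name = None
    if request_target and request_target.lower().startswith("http://"):
        # Full URL request: the host name ends at the next '/'.
        name = request_target[len("http://"):].split("/", 1)[0]
    elif host_header:
        name = host_header.strip()
    if name:
        name = name.split(":", 1)[0].lower()  # drop an optional port number
        for hostname, _ip, root in VHOSTS:
            if hostname.lower() == name:
                return root
    # No usable host name: use the first entry whose IP address matches.
    for _hostname, ip, root in VHOSTS:
        if ip == local_ip:
            return root
    return DEFAULT_ROOT

print(choose_root("192.168.1.1", host_header="host.def.com:80"))
print(choose_root("192.168.1.1"))
```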

12.2.2 Compiling in the List of Virtual Hosts

To compile in the list of virtual hosts you need to edit the file wn/vhost.h. This file is quite short and contains something like:

#ifdef USE_VIRTUAL_HOSTS
WN_CONST
char *
WN_CONST
vhostlist[][7] =
{
    { "realname.com" , "123.123.121.1", ROOT_DIR, "nickname0", "usr0", "grp0", "0" },
    { "virtual1.com" , "123.123.121.1", "/var/data1", "nickname1", NULL, NULL, NULL },
    { "virtual2.com" , "123.123.121.1", "/var/data2", "nickname2", "usr2", "grp2", "1" },
    { "another.ip.com", "123.123.123.2", "/var/data3", "nickname3", "usr3", "grp3", "1" },
    { NULL, NULL, NULL, NULL, NULL, NULL, NULL }
};
#endif

The line containing { "virtual1.com", "123.123.121.1", "/var/data1", "nickname1", NULL, NULL, NULL } should be replaced by a line containing one of the hostnames of your host instead of "virtual1.com", the IP address that corresponds to this hostname instead of "123.123.121.1", the correct system path to the corresponding data root instead of "/var/data1", and a nickname to use in logging instead of "nickname1". The user and group entries (such as "usr2" and "grp2") are only used with suexec and can have the value NULL if you are not using suexec. The final entry in this line is a string representing a flag for use with virtual hosting. For the moment it should only be NULL or one of the quoted strings "0" or "1". The quoted string "1" disallows the use of any user home directories for URLs referring to this virtual host. The values NULL or "0" are equivalent and give the default behavior which allows serving from user home directories only if the server was compiled with this feature enabled. The hostname in these lines is used by the server in only two ways: when a redirect header is sent, and to pass the correct server host name to CGI/1.1 programs in the SERVER_NAME environment variable. The nicknames are used as a field of a verbose log entry. When using verbose logging the nickname (enclosed in angle brackets) will be the last field of each log entry. If the nickname is NULL then a number will be used for its value representing the position of the virtual host in the list.

All the lines of this file should be changed in a similar fashion. You may have as many lines as you wish and you should remove any of the lines you don't need. Don't change anything else. In particular make sure that the { NULL, NULL, NULL, NULL, NULL, NULL, NULL } line is unchanged and that you change only the parts inside quotation marks. If the IP address by which the server is accessed does not match any of the addresses listed in this file then the server will use the default data root (as specified when you ran the configure program or edited the value of "#define ROOT_DIR" in config.h).

The server will not produce separate log files for each IP address. However, if the server is configured to produce verbose logs then each entry is tagged at the end with the nickname of the virtual host which received the request. The wnv2c utility can then be used to produce separate log files for each IP address.

Chapter 13

WN Utility Programs

The main utility program used by WN is wndex which is used to produce the index.cache files from index.wn files. Its use is described in detail in the section on "Using the wndex Utility" in this guide. In this chapter we consider some other utilities, mostly perl programs, which are useful in maintaining your server.

13.1 `wn_mkdigest`

wn_mkdigest is a perl program which can be found in the bin directory of the distribution. This program is designed to work with the range feature of the WN server and with list searches. It produces a list of anchors or links to sections of a structured plain text document like an address list or a mail file.

Here is how it works. The wn_mkdigest utility is executed with three (or more) arguments. The first two arguments are regular expressions. The first regular expression should match the section separator of the structured file and the second should match the beginning of the line to be used as the section title. (More about this below.) The next argument is the name of a structured file, like a mail file, news digest or address list. Instead of a single structured file several files can be listed and wn_mkdigest will process their concatenation.

Now more about the regular expressions: Suppose our structured file is a mail file in its usual format with a number of messages. The first regular expression should match just the lines which are the beginning of each section (in this case each message). For a mail file a good choice would be "^From " which matches the word "From" followed by a space at the beginning of a line.

The second regular expression matches the start of the line which you would like to be the title of the section. It is convenient to have the link text be everything after the occurrence of the matching pattern for this regular expression. So for the mail file we would choose "^Subject:" for this regular expression. Then the program will produce a list of links, one for each message, with the text in the anchor being the contents of the message Subject line (minus the word "Subject:"). Each link when accessed will produce a plain text document containing just that mail message.

So if our mail file is named foo we should execute the command:

wn_mkdigest "^From " "^Subject:" foo

Note the quotation marks which are needed to get the space after "From". It produces a file named foo.index.html which consists primarily of an unordered list. Each item in the list is an anchor referring to a line range in foo -- the ranges being delimited by lines which match the first regular expression argument. In this case that means each range will start with a line beginning with "From " which is the marker in a mail file designating the start of a new message. The anchor label for each range is taken from the first line in the range which contains a match for the second regular expression and, in fact, as mentioned above, it will consist of everything on that line after the matched regular expression.

The first line of each range or section is a line which matches the first regular expression and the next matching line will begin the next section. Normally the search for the match for the anchor title regular expression begins with this first line. However, it is sometimes useful to skip this first line in the search for a title match. This can be done by starting the second regular expression with the character '$'. For example the command:

wn_mkdigest ^$ $^ foo

is a common one. It says to divide foo into sections (line ranges) which are separated by blank lines (the regular expression '^$' matches a blank line). To obtain an anchor title for each section the blank line is skipped (since the second regular expression starts with '$') and then everything on the next line is taken as the title (since '^' matches the beginning of the next line). The regular expressions of this example would be useful, for example, for an address list foo which consisted of multi-line records separated by blank lines with an individual's name on the first line of each record. The wn_mkdigest utility would then produce a foo.index.html file with an unordered list of anchors, one for each individual in the list. Selecting an anchor would present the record for that individual. Using a list search for this file would allow a form user to enter a name or regular expression and obtain a list of anchors for matching items.

The wn_mkdigest command can have any number of files listed after the regular expressions and it will produce a single file whose name is the name of the first file with ".index.html" appended. This file will contain a list of links to all the sections of all the files given on the command line.
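The way the two regular expressions divide a file into titled line ranges can be sketched in Python. This is only a model of the behavior described above and not the perl code of wn_mkdigest. The function name is hypothetical, and the sketch simply ignores any lines before the first separator, which the real utility may treat differently.

```python
import re

def digest_sections(lines, sep_pattern, title_pattern):
    """Return a list of (first_line, last_line, title) with 1-based line numbers."""
    # A leading '$' on the title pattern means: skip the separator line
    # itself when searching for the title.
    skip_first = title_pattern.startswith("$")
    if skip_first:
        title_pattern = title_pattern[1:]
    sep_re = re.compile(sep_pattern)
    title_re = re.compile(title_pattern)

    starts = [i for i, line in enumerate(lines) if sep_re.search(line)]
    sections = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        first = start + 1 if skip_first else start
        title = None
        for line in lines[first:end]:
            match = title_re.search(line)
            if match:
                # The title is everything on the line after the match.
                title = line[match.end():].strip()
                break
        sections.append((start + 1, end, title))
    return sections

mail = ["From a@x Mon", "Subject: Hello", "", "body text",
        "From b@y Tue", "Subject: Bye", "more text"]
print(digest_sections(mail, "^From ", "^Subject:"))

addresses = ["", "Alice Smith", "1 Main St", "", "Bob Jones", "2 Oak Ave"]
print(digest_sections(addresses, "^$", "$^"))
```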

When wn_mkdigest writes the index file (e.g., foo.index.html), it adds two HTML comments to mark the start and end of the lines containing links to the records in your structured document. The markers look like this, where VERSION is the current version of wn_mkdigest:

 

The first time wn_mkdigest writes an index file, it writes a default leader and trailer before and after the link lines. If wn_mkdigest finds an existing index file when it runs, it uses the information preceding the first marker and following the second marker as the leader and trailer for the new index file. This means you can run wn_mkdigest to create the initial index file, then edit the beginning and/or end of the file to modify the leader and trailer. Subsequent invocations of wn_mkdigest will retain your modifications each time the index file is recreated.

If you add the -b argument when you use wn_mkdigest (i.e. run the command "wn_mkdigest -b regexp1 regexp2 foo") then it will produce a file foo.index.html which uses byte ranges rather than the default line ranges. This functions the same except the server will log the number of bytes actually sent when a request is served (the server won't bother to count the bytes in a line range request).

There are fancier tools than wn_mkdigest for displaying mail archives, but this utility has great flexibility for dealing with a wide variety of structured files.

13.2 `wnpnuts`

PNUTS (pronounced "peanuts") is an acronym for previous, next, up, top, search. wnpnuts is a perl program which takes as argument the name of a file describing the hierarchical structure of a group of HTML files constituting a single virtual document. The wnpnuts program then searches these files for lines which begin with optional whitespace followed by a special marker string, which it replaces with the same marker string followed by a sequence of anchors like:

[previous] [next] [up] [top] [search] [index]

with links to the relevant files in the virtual document. Actually it replaces this line with a single line starting with the marker string, followed by the anchors. That way the next time it is run, say after inserting a new chapter in your document, the marker line will be replaced by a new one with the appropriate links.

The wnpnuts program is run with a command like:

wnpnuts -s dosearch.html -i docindex.html foo.pnuts

The argument "-s dosearch.html" is optional and supplies a URL for the "[search]" anchor to be substituted. Thus if just "dosearch.html" is used this will be an anchor linking to a relative URL. Instead you could use a full URL like "http://hostname/dir/file". If there is no '-s' argument then there will be no search item in the list of items inserted by wnpnuts. The optional argument "-i docindex.html" is similar to the '-s' option except it provides the URL (relative or absolute) which should be anchored to "[index]". This URL typically points to an HTML document created with wnindexmaker.

The file foo.pnuts contains the information by which wnpnuts knows which files to process and what the order of those files should be. It consists of a list of files relative to the current directory, one per line, in the order which should be reflected in the "[next] [previous]" links. If a file is hierarchically one level lower than the previous file this should be indicated by preceding its name with one more "<tab>" character than the preceding file. Here is an example:

top.html
second.html
<tab>firstsub.html
<tab><tab>subsub.html
<tab>secondsub.html
third.html

If this list is supplied to wnpnuts it will insert anchors into all these files wherever the marker string occurs. All those named [top] will point to the file top.html. In firstsub.html and secondsub.html the [up] link will point to second.html. The [previous] and [next] links will reflect the order top.html, second.html, firstsub.html, subsub.html, secondsub.html, third.html.
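How the link targets follow from the tab-indented list can be modeled with a short Python sketch. It is not the wnpnuts program itself. The function name is hypothetical, and the sketch assumes that a file at the top level has no [up] target.

```python
def pnuts_links(spec_lines):
    """Compute previous, next, up and top targets from a tab-indented file list."""
    entries = []
    for raw in spec_lines:
        if not raw.strip():
            continue
        name = raw.lstrip("\t")
        depth = len(raw) - len(name)  # number of leading tab characters
        entries.append((depth, name.strip()))

    names = [name for _depth, name in entries]
    latest_at_depth = {}  # depth -> most recent file seen at that depth
    links = {}
    for i, (depth, name) in enumerate(entries):
        latest_at_depth[depth] = name
        links[name] = {
            "previous": names[i - 1] if i > 0 else None,
            "next": names[i + 1] if i + 1 < len(names) else None,
            # The parent is the most recent file one level higher.
            "up": latest_at_depth.get(depth - 1),
            "top": names[0],
        }
    return links

spec = ["top.html", "second.html", "\tfirstsub.html",
        "\t\tsubsub.html", "\tsecondsub.html", "third.html"]
links = pnuts_links(spec)
print(links["secondsub.html"])
```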

13.3 `wnindexmaker`

This is a perl program whose function is to produce an index (in the usual sense not the WN sense) for a virtual document consisting of a number of HTML files in a single directory. The index to this guide is a good example of how an index produced by wnindexmaker works.

The wnindexmaker program is run with a command like:

wnindexmaker -d path -t "Index Title" -o outputfile words

Here the -d, -t and -o arguments are optional. The -t option supplies the title for the HTML document produced. If no -t argument is given then "Index" is used as the title. The -o option provides a name for the output HTML file -- the default being docindex.html.

The -d option should be the directory containing the files being indexed. It should either begin with a '/' and be relative to the WN root directory or not begin with a '/' and be relative to the directory which will contain the docindex.html file. If there is no -d option then the docindex.html file must reside in the same directory as the files being indexed. If this is done then it is a good idea to add an Attributes=nosearch to the docindex.html record in the index.wn file for the directory. Otherwise docindex.html will index itself in addition to the other files in the directory.

The final argument to wnindexmaker is the file words. It is a list of words or phrases, in alphabetical order, one per line, which you wish to appear in the index. One way to produce it is to use UNIX utilities to produce a list of all words in the files, then run UNIX sort(1) utility with the options -dfu on it and remove unsuitable words from the list.
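A first draft of the words file could also be produced with a few lines of Python instead of UNIX utilities. The sketch below only approximates what "sort -dfu" would give. The function name is hypothetical and the input is assumed to be plain text, so any HTML markup should be removed beforehand. Unsuitable words still have to be removed from the result by hand.

```python
import re

def candidate_words(texts):
    """Return the distinct words of the texts, sorted without regard to case."""
    seen = {}
    for text in texts:
        for word in re.findall(r"[A-Za-z][A-Za-z'-]*", text):
            # Keep the first spelling seen for each case-folded word.
            seen.setdefault(word.lower(), word)
    return [seen[key] for key in sorted(seen)]

for word in candidate_words(["The server serves files", "the Server"]):
    print(word)
```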

What the wnindexmaker program does is produce a long list of anchors, one for each word in the words file. Each word is linked to a context search for itself.

13.4 `wn_uncache`

wn_uncache is a perl program which reverses the action of wndex. It will convert an index.cache file to an index.wn file. It reads from its UNIX stdin(3) stream and writes to its UNIX stdout(3) stream.

Thus when invoked with:

wn_uncache <index.cache >index.wn

it will create a file named "index.wn" (overwriting any other file of that name). This file may not be identical to the original index.wn file used to create index.cache, but when wndex is run on this new index.wn file it should produce an index.cache identical to the one used as input for wn_uncache.

13.5 `wnv2c`

The perl program wnv2c converts log files produced by the server in the verbose format to files in the common log format handled by most server statistics utilities. It also can extract the entries for each virtual host of a multi-homed server which uses different data roots for different IP addresses or different DNS names:

wnv2c [-v] [-i nickname] <verboselog >commonlog

By default this program reads from the UNIX stdin(3) stream a WN log file produced in the verbose format and writes a non-verbose one in the "common log format" to the UNIX stdout(3) stream. With the "-i nickname" option it writes only those entries from the virtual host with the specified nickname. For example, if you have edited the file wn/vhost.h to contain:

#ifdef USE_VIRTUAL_HOSTS
WN_CONST
char *
WN_CONST
vhostlist[][4] =
{
    { "realname.com" , "123.123.121.1", ROOT_DIR, "nickname0" },
    { "virtual1.com" , "123.123.121.1", "/var/data1", "nickname1" },
    { "virtual2.com" , "123.123.121.1", "/var/data2", "nickname2" },
    { "another.ip.com", "123.123.123.2", "/var/data3", "nickname3" },
    { NULL, NULL, NULL, NULL }
};
#endif

then "wnv2c -i nickname2 <logfile >log2" will create log2, the file of log entries for the virtual host with DNS name virtual2.com. If you have used the empty string in place of nicknames in the file wn/vhost.h the virtual hosts are numbered consecutively in the order they are listed, starting with 0. So you would use "wnv2c -i 2 <logfile >log2" to get the log entries for virtual2.com, which is the third entry in the list and so has number 2.

Using the "-v" option along with the "-i" option gives the verbose form of log entries for specified virtual host.
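The selection done by the "-i" option can be pictured with a short Python sketch. It relies only on the fact stated earlier that the nickname, enclosed in angle brackets, is the last field of each verbose log entry. The function name is hypothetical, the sample log lines are placeholders rather than real WN log output, and the conversion to the common log format is not shown.

```python
def entries_for_host(log_lines, nickname):
    """Keep only the verbose log entries tagged with the given nickname."""
    tag = "<" + nickname + ">"
    return [line for line in log_lines if line.rstrip().endswith(tag)]

# Placeholder entries: only the trailing <nickname> field matters here.
log = [
    "first entry <nickname1>",
    "second entry <nickname2>",
    "third entry <nickname2>",
]
for line in entries_for_host(log, "nickname2"):
    print(line)
```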

13.6 `wnredir`

The perl program wnredir is a very simple Cache-Module for use with WN. Its function is to automatically redirect requests for documents in one directory to requests for other URLs. If you put:

Cache-module=redir http://host/dir/foo.html

in the index.wn file of a directory then every request for something in that directory will be redirected to a request for http://host/dir/foo.html.

If you put:

Cache-module=redir http://host/dir/

then a request for http://this_host/this_dir/whatever.html will be redirected to http://host/dir/whatever.html. These can be useful if you move the contents of an entire directory.
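
As an illustration only, the two cases above can be written as a small Python function. This is a sketch of the URL mapping described in the text, not the Perl source of wnredir; the function name redirect_url and its parameters are hypothetical.

```python
def redirect_url(target, requested_name):
    """Return the URL a request should be redirected to.

    target         -- the URL written after "redir" in the directive
    requested_name -- the part of the requested URL after the final '/'
    """
    if target.endswith("/"):
        # Directory form: keep the requested file name.
        return target + requested_name
    # Document form: every request goes to the same URL.
    return target


print(redirect_url("http://host/dir/", "whatever.html"))
```

With a target ending in '/' the requested name is carried over; otherwise every request in the directory lands on the same document.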

Chapter 14

WN Auxiliary Modules

WN provides some features by auxiliary modules. These are separate C programs called by the server to provide some function like authorization (authentication) or indexed searches or database access. The current release of WN has a module for authorization using the HTTP/1.1 Basic Authentication scheme supported by most clients and future releases will have a module to do index searches.

The current version also has the support infrastructure or "hooks" for several services which maintainers may wish to create.

14.1 Database Modules

The index.cache in each directory is in fact a small flat database containing characteristics of that directory and the files contained in it. This can be replaced on a per directory basis by a user supplied module to provide the same information. In the index.wn file for such a directory you should create a directory directive entry like:

Cache-Module=/full/path/to/foo

The program "foo" should be an interface to your database. It is permissible to follow "foo" in this line with any desired arguments to be passed to it. The program "foo" should read the environment variable WN_KEY which will contain everything after the final '/' in the URL being served. Based on this key it should return a single line in the format of an index.cache file. Any "File=" directives in an index.wn file which contains a "Cache-Module=" directive will be ignored.

You may wish the cache module function to have access to more information about the request. If you put the directive "Default-attributes=cgi" before the "Cache-Module=" directive then all the CGI environment variables will be set.

If the cache module encounters an error, such as no entry corresponding to the supplied key, it should return nothing and exit with a negative status. This signals the server to return an appropriate error message.
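
The behaviour described in this section might be sketched in Python as follows. Everything here is an assumption made for illustration: the RECORDS table stands in for a real database, its values are placeholders rather than real index.cache syntax, and the names respond and main are hypothetical.

```python
import os
import sys

# Hypothetical stand-in for a real database. Keys are the part of the URL
# after the final '/'. Values are placeholders, NOT real index.cache syntax.
RECORDS = {
    "foo.html": "<index.cache line for foo.html>",
}


def respond(key, records=RECORDS):
    """Return (exit_status, output_text) for the given key."""
    line = records.get(key)
    if line is None:
        # No entry: print nothing and report failure.
        return -1, ""
    return 0, line + "\n"


def main():
    """Entry point for use as a module: call sys.exit(main())."""
    status, text = respond(os.environ.get("WN_KEY", ""))
    sys.stdout.write(text)
    # On UNIX an exit status of -1 is seen by the parent as 255.
    return status
```

To install a program like this as a cache module it would end with a call such as sys.exit(main()); that call is left out here so the functions can be tried out without terminating the interpreter.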

14.2 File Modules

It is perhaps more useful to have the files being returned reside in a database. To use such a database interface called "bar", in the index.wn file for its directory you should create a directory directive entry like:

File-Module=/full/path/to/bar

The program "bar" should be an interface to your database. It should read the environment variable WN_KEY which will contain everything after the final '/' in the URL being served. Based on this key it should return a document to be served (perhaps after processing with wrappers or includes). When a file module is used it is also necessary to do one of three things: use a "Cache-Module=" directive, or have an actual index.cache file with the names of all documents which might be returned by the database, or have an index.cache file created from an index.wn file containing the directory directive "Attributes=serveall". If the file module encounters an error, such as no document corresponding to the supplied key, it should return an appropriate error message of the same content-type as the requested document.

As with cache modules, you may wish the file module function to have access to more information about the request. If you put the directive "Default-attributes=cgi" before the "File-Module=" directive in your index.wn file then all the CGI environment variables will be set.
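
A file module along these lines could look like the following Python sketch. The DOCUMENTS table, the function name fetch, and the assumption that every stored document is HTML (so that the error message is HTML too) are all hypothetical choices made for the example.

```python
import html
import os
import sys

# Hypothetical document store keyed by the part of the URL after the final '/'.
DOCUMENTS = {
    "report.html": "<html><body><h1>Report</h1></body></html>\n",
}


def fetch(key, documents=DOCUMENTS):
    """Return the document for key, or an error page of the same type."""
    if key in documents:
        return documents[key]
    # This sketch assumes all documents are HTML, so the error is HTML too.
    # The key comes from the request, so it is escaped before being echoed.
    return ("<html><body><p>No document named %s.</p></body></html>\n"
            % html.escape(key))


def main():
    """Entry point for use as a module: write the document to stdout."""
    sys.stdout.write(fetch(os.environ.get("WN_KEY", "")))
    return 0
```

Unlike a cache module, the error case here still produces output, because the text above asks for an error message in the requested content type.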

14.3 Authorization Modules

Authorization requests in WN are normally handled by an authorization module. The current release contains a Basic Authentication module and a prototype implementation of Digest Authentication, but users are invited to provide their own.

Warning: I would strongly advise against using the basic authentication described here to protect sensitive information on a server which runs on a system on which untrusted users have accounts.

The contents of the Authorization: request header are passed to the module on the UNIX stdin(3) stream (as of version 1.10).

The server expects this module to exit with a status indicating that access is granted, denied, or that an error occurred. Anyone writing an authorization module should consult the file /wnauth/wnauth.h. The "#defines" listed there specify the error status to use for granting, denying or indicating certain errors.

To use a user supplied authorization module named authmod the index.wn file of each directory requiring authorization should contain lines like:

Authorization-realm=myrealm
Authorization-module=/full/path/to/authmod
Authorization-type=basic

The program authmod should indicate the granting or denial of authorization by its exit status as described above. The "Authorization-realm=" is a designation which is sent to the client with the authorization request allowing the client to decide whether the user has already supplied a password which can be reused (if the realm is the same as when the password was supplied) or if a new password needs to be entered.
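
The following Python sketch shows the general shape of such a module for the Basic scheme. It rests on several assumptions that should be checked against the distribution: that the text arriving on stdin has the form "Basic" followed by the base64 encoding of user:password, that the three status constants match the #defines in wnauth.h (the values below are invented placeholders), and that a password table is available (PASSWORDS is hypothetical).

```python
import base64

# Hypothetical exit statuses. The real values are the #defines in
# wnauth.h and must be taken from there.
AUTH_GRANTED = 0
AUTH_DENIED = 1
AUTH_ERROR = 2

# Hypothetical password table; a real module would consult a protected file.
PASSWORDS = {"joe": "secret"}


def parse_basic(header_value):
    """Split 'Basic <base64 of user:password>' into (user, password).

    Returns None if the value is not well-formed Basic credentials.
    """
    parts = header_value.strip().split(None, 1)
    if len(parts) != 2 or parts[0].lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(parts[1], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 as well as undecodable bytes.
        return None
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    return user, password


def decide(header_value, passwords=PASSWORDS):
    """Return the exit status the module should finish with."""
    creds = parse_basic(header_value)
    if creds is None:
        return AUTH_ERROR
    user, password = creds
    if passwords.get(user) == password:
        return AUTH_GRANTED
    return AUTH_DENIED
```

A complete module would read the header contents with sys.stdin.read() and finish with sys.exit(decide(...)).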

14.4 Index Search Modules

Indexed searches can be supported in WN by auxiliary modules. Two such modules are provided as examples and maintainers may wish to create others. To use such a module you should have an HTML <form> action be something like:

http://host/dir/search=index

Then in the index.wn file in the directory record you should have a line like:

Search-Module=/full/path/to/searchmod

The program searchmod should read the CGI environment variable QUERY_STRING and return a partial HTML document. The typical case would be the program returns an unordered list of anchors to documents containing a match to the query string. This list can be wrapped by including a "Searchwrapper=" directory directive. If it is not, a default wrapper with text like "Here are the matches for your search." is supplied.

Two simple examples of a search-module (written in perl) are included in the distribution in the files bin/wnseven_m and bin/wnsectsearch.
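
For readers who prefer Python, here is a minimal sketch of the same idea. It is not a translation of the two distributed modules. The INDEX table is a hypothetical stand-in for a real search index, and the sketch assumes QUERY_STRING holds only the URL-encoded search term.

```python
import html
import os
from urllib.parse import unquote_plus

# Hypothetical index: document URL -> (title, searchable text).
INDEX = {
    "intro.html": ("Introduction", "an overview of the server"),
    "setup.html": ("Setup", "installing and configuring the server"),
}


def search(query_string, index=INDEX):
    """Return a partial HTML document listing the matching documents."""
    term = unquote_plus(query_string).strip().lower()
    items = []
    for url, (title, text) in sorted(index.items()):
        if term and term in text.lower():
            items.append('<li><a href="%s">%s</a></li>'
                         % (html.escape(url), html.escape(title)))
    if not items:
        return "<p>No matches.</p>\n"
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


# When run as a search module the term comes from the CGI environment.
print(search(os.environ.get("QUERY_STRING", "")), end="")
```

The output is only a fragment, an unordered list of anchors, since the server supplies the surrounding wrapper.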

Chapter 15

Clickable Images and Imagemap Files on the WN Server

As of version 1.07 the WN server does not require the NCSA C imagemap.c program to support clickable images. This functionality is now built into the server.

Clickable images like this make use of a "map file" telling the server which coordinates in the image correspond to which document. The format of this file (described below) is the same as that used with the NCSA imagemap program (with a few minor enhancements).

15.1 The Steps in Setting up a Clickable Image

15.1.1 Create an Image

There are a number of image creation and editing programs that can be used. One (which I have not tried) is xpaint. The image file must be in png, jpeg or xbm format.

15.1.2 Create a Map File

This is a file containing lines specifying regions in the image and the URL to which the client should be directed after clicking in one of these regions. The format of this file is described below in detail, but roughly a line contains the name of a shape (like a rectangle), the URL to go to, and the pixel coordinates necessary to specify the shape.

There is a commercial program called mapedit and a free tcl/tk program mapmarker (neither of which I have used) that are designed to automate this task.

15.1.3 Install the Image and Map File on Your Server

In the index file for the directory containing the image (let's call it image.png) put the lines:

File=image.png

File=image.map
Attributes=imagemap

If your imagemap file redirects the client to other documents on your server make sure they are also installed. Run wndex where necessary to update your index.cache.

Note: A file named foo.map (i.e. with suffix ".map") will automatically be considered to have "Attributes=imagemap" set when wndex is run unless a "Content-type=something" line is supplied for it.

This will also work if the "Attributes=serveall" directive is specified for this directory and there is no entry in the index.wn file for foo.map. But it is necessary to run wndex since the server will not recognize foo.map as an imagemap file if wndex has not been run on the directory containing it.

15.1.4 Put a Link to Your Image Map in Your HTML Document

In the HTML file which you want to contain the clickable image you must put a link to it which looks like:

<a href="image.map"> <img src="image.png" ismap> </a>

assuming that "image.map", "image.png" and the file containing this link are all in the same directory.

15.2 The Format of an Imagemap File

An imagemap file is a plain text file consisting of lines, each of which tells the server what to do if a user clicks in a certain region. You will need to know the dimensions of your image and you will have to have a way of reading the coordinates of points on your image when you click on it. I use the program xv to determine the coordinates.

A complete functioning example of imagemap use, illustrating all the methods is included with the WN distribution.

An imagemap can contain comments which are lines starting with the character '#'. These lines are ignored by the server, as are lines which are blank.

All other lines must start with one of the imagemap "method" keywords: default, rect, circle, poly, or point. Here is a description of each type of method line.

15.2.1 `default` keyword

This line contains only the keyword "default" followed by white space and the URL to be referenced if none of the other regions is selected. An example is:

default http://webhost/dir/default.html

15.2.2 `rect` keyword

This line contains the keyword "rect" followed by white space and the URL to be referenced and more white space followed by the coordinates of the upper left and lower right corners of the rectangle. When the coordinates of the clicked on point lie within the rectangle specified by these coordinates the given URL will be referenced. An example is:

rect http://webhost/dir/rect.html 20,20 70,100

Note that each corner is written as its X coordinate and its Y coordinate separated by a comma, and the two corners are separated by whitespace. This URL will be returned if the clicked on point lies in the rectangle with opposite corners (20,20) and (70,100). In screen coordinates (as opposed to the Cartesian plane) these are the upper left and lower right corners respectively.
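
The rectangle test can be sketched in a few lines of Python. This is an illustration of the geometry only, not the server's C code; the function name in_rect is hypothetical, and treating points on the border as inside is an assumption.

```python
def in_rect(x, y, corner1, corner2):
    """True if (x, y) lies in the rectangle with the given opposite corners."""
    (x1, y1), (x2, y2) = corner1, corner2
    # min/max make the test independent of the order of the corners.
    return (min(x1, x2) <= x <= max(x1, x2)
            and min(y1, y2) <= y <= max(y1, y2))


# The example line: rect http://webhost/dir/rect.html 20,20 70,100
print(in_rect(45, 60, (20, 20), (70, 100)))
```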

15.2.3 `circle` keyword

This line contains the keyword "circle" followed by white space and the URL to be referenced and more white space followed by the coordinates of the center of the circle and the coordinates of a point on the circle. When the coordinates of the clicked on point lie within the circle specified by these coordinates the given URL will be referenced. An example is:

circle http://webhost/dir/circle.html 70,70 70,100

This URL will be returned if the clicked on point lies in the circle with center at (70,70) and radius 30 (i.e. the circle with center (70,70) containing the point (70,100)).
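
Because the map file gives a point on the circle rather than a radius, the test compares squared distances. The Python below is a sketch of that calculation with a hypothetical function name; counting points exactly on the circle as inside is an assumption.

```python
def in_circle(x, y, center, edge_point):
    """True if (x, y) is no farther from center than edge_point is."""
    cx, cy = center
    ex, ey = edge_point
    # Comparing squared distances avoids taking a square root.
    radius_sq = (ex - cx) ** 2 + (ey - cy) ** 2
    return (x - cx) ** 2 + (y - cy) ** 2 <= radius_sq


# The example line: circle http://webhost/dir/circle.html 70,70 70,100
print(in_circle(91, 91, (70, 70), (70, 100)))
```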

15.2.4 `poly` keyword

This line contains the keyword "poly" followed by white space and the URL to be referenced and more white space followed by a sequence of coordinates of the vertices of a polygon, in order. The polygon so determined is the one you would obtain by connecting each of the vertices in order with a straight line and then connecting the last vertex to the first. The polygon need not be convex.

Technical note: The curve formed in this way may cross itself -- a point will be considered inside if the winding number of the curve around the point is non-zero.

An example is:

poly http://webhost/dir/triangle.html 10,20 70,100 5,150

This URL will be returned if the clicked on point lies in the triangle with vertices (10,20), (70,100) and (5,150).
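
The winding number rule mentioned in the technical note can be sketched as below. This uses a standard crossing-based winding number computation, which may differ in detail from what the server does; the function names are hypothetical.

```python
def is_left(p0, p1, p2):
    """Positive, zero or negative according to which side of p0->p1 p2 is on."""
    return ((p1[0] - p0[0]) * (p2[1] - p0[1])
            - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def winding_number(point, vertices):
    """Count how many times the closed curve winds around point."""
    wn = 0
    n = len(vertices)
    for i in range(n):
        a = vertices[i]
        b = vertices[(i + 1) % n]      # the last vertex connects to the first
        if a[1] <= point[1]:
            if b[1] > point[1] and is_left(a, b, point) > 0:
                wn += 1                # edge crosses the point's row one way
        elif b[1] <= point[1] and is_left(a, b, point) < 0:
            wn -= 1                    # edge crosses it the other way
    return wn


def in_poly(x, y, vertices):
    """A point is inside when the winding number is non-zero."""
    return winding_number((x, y), vertices) != 0


# The example line: poly http://webhost/dir/triangle.html 10,20 70,100 5,150
print(in_poly(28, 90, [(10, 20), (70, 100), (5, 150)]))
```

A curve that goes around a point twice gives a winding number of 2, which still counts as inside under this rule.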

15.2.5 `point` keyword

This line contains the keyword "point" followed by white space and the URL to be referenced and more white space followed by a sequence of coordinates of one or more points in the image. This method replaces the default. The default URL referenced is the one associated with the point in a "point method" line which is closest to the clicked location. An example is:

point http://webhost/dir/point1.html 10,20 70,100
point http://webhost/dir/point2.html 100,110 80,5

If no other method has returned a URL then the returned URL will be to point1.html or point2.html depending on whether the clicked on location is closer to one of the points (10,20) or (70,100) or to one of the points (100,110) or (80,5).
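
Choosing the closest point can be sketched as a simple search over all points on all "point" lines. The function name and the data layout are hypothetical, and the behaviour on an exact tie (the earlier point wins here) is an assumption.

```python
def nearest_point_url(x, y, point_lines):
    """Return the URL whose point is closest to (x, y).

    point_lines -- list of (url, [(px, py), ...]), one entry per "point" line
    """
    best_url, best_dist = None, None
    for url, points in point_lines:
        for px, py in points:
            dist = (x - px) ** 2 + (y - py) ** 2   # squared distance
            if best_dist is None or dist < best_dist:
                best_url, best_dist = url, dist
    return best_url


LINES = [
    ("http://webhost/dir/point1.html", [(10, 20), (70, 100)]),
    ("http://webhost/dir/point2.html", [(100, 110), (80, 5)]),
]
print(nearest_point_url(90, 10, LINES))
```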

Note: Allowing multiple points on a single point method line is a WN enhancement and will not be portable to other imagemap programs.

15.2.6 `nocoords` keyword

This line contains the keyword "nocoords" followed by white space and the URL to be referenced when a user with a non-graphical browser selects the substitute text for the image. More precisely, if any browser makes a request for the document and does not include the X and Y coordinates of the point clicked on in the image then this URL will be referenced. An example is:

nocoords http://webhost/dir/message.txt

Note: The nocoords keyword and functionality is a WN enhancement and will not be portable to other imagemap programs.
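
The line formats described in this section are simple enough to read with a short parser. The Python sketch below is one way to do it under the rules stated above (comment lines start with '#', blank lines are skipped, every other line is a keyword, a URL, and zero or more x,y pairs). The function name parse_map and the tuple layout are hypothetical.

```python
METHODS = {"default", "rect", "circle", "poly", "point", "nocoords"}


def parse_map(text):
    """Return a list of (method, url, [(x, y), ...]) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # blank line or comment
        fields = line.split()
        method = fields[0]
        if method not in METHODS or len(fields) < 2:
            raise ValueError("bad imagemap line: %r" % line)
        coords = []
        for pair in fields[2:]:
            xs, ys = pair.split(",")      # each pair is written as x,y
            coords.append((int(xs), int(ys)))
        entries.append((method, fields[1], coords))
    return entries


SAMPLE = """# sample map file

default http://webhost/dir/default.html
rect http://webhost/dir/rect.html 20,20 70,100
"""
print(parse_map(SAMPLE))
```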

15.3 The URL Format in an Imagemap File

The URL in the lines listed above can be in one of the following three forms:

http://remote_host/dir/rect.html
/dir/rect.html
rect.html

Here are the differences. The first, "http://remote_host/dir/rect.html" is primarily used to link to a document on a remote host. On seeing this the server sends an HTTP/1.1 redirect to the client causing the client to automatically access this URL.

The use of "/dir/rect.html" is equivalent to "http://local_host/dir/rect.html". Any URL specified in a method line which begins with a '/' is assumed to be on the local host and the path is assumed relative to the WN root directory. An HTTP/1.1 redirect is sent because simply sending the file would confuse the client (it would think that the document rect.html is in the same directory as the imagemap file, which might not be true, in which case relative URLs would fail to work).

Warning: The WN root directory is changed by use of http://host/~user/ type URLs.

The use of "rect.html" or anything which does not contain ':/' (i.e. which does not start with http://... or ftp://...) and does not start with '/' causes the server to assume that the document is relative to the directory containing the imagemap. In this case the server will immediately serve that document with no redirection. This is the most efficient way to set up an imagemap, but it requires that the response files like "rect.html" be in the same directory as the imagemap file. There is one exception to this. If the relative URL contains a '#' the server prepends the appropriate path to the URL and sends an HTTP/1.1 redirect to the client. This is necessary for the client to be able to handle the "name" anchor indicated by the '#'.

Note: Allowing relative URLs of this type is a WN enhancement and will not be portable to imagemap programs run on other servers. Also note that using a relative URL like "dir/rect.html" referring to a document in a subdirectory can confuse a browser about the location of documents referred to in "rect.html". For this reason it should only be used when the document is not an HTML document or contains a <base> tag specifying its location.

There is one special URL which may be used in map files, namely <null>. It has the effect of not sending a redirect or a document to the client but instead sending a header with status 204. This instructs the client to do nothing and in particular not to reload or alter the document it is displaying. For example, the line:

default <null>

in a map file will have the effect of making clicks in the default area behave as if the click had been somewhere outside the image, i.e. nothing will happen.

Note: the <null> URL is a WN enhancement and will not be portable to imagemap programs run on other servers.
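
The rules in this section amount to a small decision procedure, sketched here in Python. The function name classify and the labels it returns are hypothetical; they only name the cases described above.

```python
def classify(url):
    """Name the action described in this section for a map file URL."""
    if url == "<null>":
        return "no-content"          # status 204, client does nothing
    if ":/" in url:
        return "redirect-absolute"   # e.g. http://... or ftp://...
    if url.startswith("/"):
        return "redirect-local"      # path relative to the WN root directory
    if "#" in url:
        return "redirect-relative"   # path is prepended, then a redirect
    return "serve-directly"          # relative to the imagemap's directory


for example in ("http://remote_host/dir/rect.html", "/dir/rect.html",
                "rect.html", "rect.html#top", "<null>"):
    print(example, "->", classify(example))
```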

Chapter 16

Using CGI Programs on the WN Server

CGI stands for Common Gateway Interface. It provides a standard for Web servers to interact with programs which are not part of the server but may produce output which you wish to serve.

16.1 Do You Need a CGI Program?

Many functions which are done by CGI programs on other servers are built in features of WN. If your needs can be met by these features then not only will you save yourself considerable effort in creating, setting up, and maintaining programs, but the built in feature will perform much more efficiently and much more securely than a CGI program.

These features include the ability to respond with different text or entirely different documents based on the client request, the client's hostname, IP address, user-agent, or the "referer", the document containing the link. For information about this see the chapter "Parsed Text and Server Side Includes on the WN Server" in this guide. Also support for "imagemaps" or clickable images is built in so there is no need to use CGI for this. See the chapter "Clickable Images and Imagemap Files on the WN Server" in this guide. Finally WN supports a variety of methods of searching your data including by title, keyword, or full text. See the chapter "Setting Up Searches on the WN Server" in this guide.

If these features do not meet your needs and something like a CGI program will, then you may wish to consider using a WN filter. These have most of the functionality of CGI programs, but are somewhat more secure and have one advantage: the output of filters can be parsed while CGI output cannot.

16.2 How Does the Server Recognize a CGI Program?

It would be nice if one could simply indicate in the appropriate index.wn file that a particular file is a CGI program which should be executed rather than served. Unfortunately, the CGI protocol makes it impossible to implement this in an efficient way.

There are two mechanisms in fairly common use with other servers for indicating that a file is a CGI program and WN supports them both. The first is to give the file name a special extension (by default it is ".cgi") which indicates that it is a CGI program. Thus any file you serve with the name "something.cgi" will be treated as a CGI program. The special extension ".cgi" can be changed by redefining the macro "#define CGI_EXT" by editing the file config.h and recompiling servers.

The second mechanism is to have specially named directories with the property that any file in that directory will be assumed to be a CGI program. The default for this special name is "cgi-bin". Thus, if you have a directory /cgi-bin in your hierarchy the server will assume that any file served from that directory is a CGI program. Of course, as always, only files listed in that directory's index.wn file will be servable. No files in subdirectories of /cgi-bin can be served. This is because the server will always interpret a request for "/cgi-bin/foo/bar" as meaning run the program "/cgi-bin/foo" with the PATH_INFO CGI environment variable set to "bar". Thus if "foo" is actually a directory and "bar" a file in it, the request will fail.

There is no need for /cgi-bin to be at the top of your hierarchy. It could be anywhere in the hierarchy. And, in fact, you can have as many directories named "cgi-bin" as you like. They will all be treated the same. The special name "cgi-bin" can be changed by redefining the macro "#define CGI_BIN" by editing the file config.h and recompiling servers.
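
The two recognition rules can be sketched in Python as follows. This only illustrates the rules as worded in this section; it is not the server's code. The function name is hypothetical, the two constants mirror the default values of the macros named above, and the remainder of the path is returned exactly as the text words it ("bar"), which may differ in detail from what the server places in PATH_INFO.

```python
CGI_EXT = ".cgi"      # default value of the CGI_EXT macro
CGI_BIN = "cgi-bin"   # default value of the CGI_BIN macro


def split_cgi_request(path):
    """Return (program, remainder) if path names a CGI program, else None."""
    parts = path.split("/")
    for i, part in enumerate(parts[:-1]):
        if part == CGI_BIN and parts[i + 1]:
            # The component right after cgi-bin is always the program;
            # anything after it is extra path information.
            program = "/".join(parts[:i + 2])
            return program, "/".join(parts[i + 2:])
    if parts[-1].endswith(CGI_EXT):
        return path, ""
    return None


print(split_cgi_request("/cgi-bin/foo/bar"))
```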

16.3 How Does a CGI Program Work?

It is beyond the scope of this document to provide an extensive tutorial in writing CGI programs. There is an online tutorial at WDVL.internet.com and another available from NCSA. A collection of links to CGI information is available at www.stars.com.

We will provide only a simple example of a CGI program written in perl. More examples can be found in the /docs/examples directory of the WN distribution.

#!/usr/local/bin/perl

# Simple example of CGI program.

print "Content-type: text/html\r\n";
# The first line must specify content type. Other
# optional headers might go here.

print "\r\n";
# A blank line ends the headers. All header lines should
# end with CRLF ("\r\n"), but other lines don't need to.

# From now on everything goes to the client
print "<body>\n";
print "<h2>A few CGI environment variables:</h2>\n\n";
print "REMOTE_HOST = $ENV{REMOTE_HOST}<br>\n";
print "HTTP_REFERER = $ENV{HTTP_REFERER}<br>\n";
print "HTTP_USER_AGENT = $ENV{HTTP_USER_AGENT}<br>\n";
print "QUERY_STRING = $ENV{QUERY_STRING}<br>\n";
print "<p>\n";
print "</body>\n";

Notice that the first thing the program does is provide the HTTP/1.1 "Content-type:" header line. It may be followed by other optional headers you want the server to send. The end of these headers is indicated by a blank line. Of course the server will add additional headers.

By default the WN server assumes that the output of any CGI program is "dynamic" or different each time the program is run and is also "non-cachable". Hence the server behaves as if the "Attributes=dynamic,non-cachable" directive had been used. The "Attributes=dynamic" causes the server not to send a last modified date or a content length since they might be constantly changing. The "Attributes=non-cachable" attempts to dissuade clients and proxies from caching the output by sending an appropriate HTTP header.

If, in fact, the output of your program is always the same, you can use the "Attributes=nondynamic" directive. Also if you wish it to be cached you must use the "Attributes=cachable" directive. In particular, if you want the browser "back" button to return users to a CGI generated page after they have followed a link you may need "Attributes=cachable" (especially with an HTML "<form method="post">") since otherwise the browser may not even cache the page in memory.

The program above is a good example of one which should not be cached as it prints out the client's hostname, user agent and the URL of the document which contains the link to this CGI program. The CGI program gets this information about the client from environment variables set by the server. A complete list of the standard CGI environment variables and a description of what they contain, plus a description of some additional non-standard ones supplied by the WN server, can be found in the appendix "CGI and Other Environment Variables for the WN Server" in this guide.

In addition to setting these environment variables appropriately the server will change the current working directory of the CGI process to the directory in which the CGI program is located.

Note: In general a CGI program has complete control over its output, so it is responsible for doing things which the server might do for a static document. This means that you cannot use many of the WN features with CGI output. In particular the server will not use a filter or parse it for server side includes, etc. The CGI program must do these things for itself. Also the server will not provide ranges specified in the "Range:" header. Instead the contents of this header are passed to the program in the environment variable HTTP_RANGE, so the program can do the range processing.
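
As a rough illustration of what "doing the range processing" might involve, the Python sketch below interprets a single byte range of the kind HTTP defines (for example "bytes=0-99"). It assumes HTTP_RANGE holds the header value unchanged, handles only one range, and uses a hypothetical function name; a real program would also have to send the matching status and headers.

```python
def parse_byte_range(header_value, length):
    """Return inclusive (start, end) offsets for one byte range, or None."""
    unit, sep, spec = header_value.strip().partition("=")
    if unit.strip().lower() != "bytes" or not sep or "," in spec:
        return None                      # not a single byte range
    first, dash, last = spec.strip().partition("-")
    if not dash:
        return None
    try:
        if first == "":
            count = int(last)            # suffix form "-N": the final N bytes
            if count <= 0:
                return None
            start, end = max(length - count, 0), length - 1
        else:
            start = int(first)
            end = int(last) if last else length - 1
    except ValueError:
        return None
    end = min(end, length - 1)           # never run past the document
    if start < 0 or start > end:
        return None
    return start, end


print(parse_byte_range("bytes=500-", 1000))
```

A program would read the value with os.environ.get("HTTP_RANGE") and, when a range is returned, send only body[start:end + 1].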

One thing you should be aware of in writing programs is that the WN server does not send the UNIX stderr(3) stream to the error log file, but leaves it at its default, the terminal from which the server is invoked. This allows the maintainer to set it to a file of her choice or leave it directed to the console window in which wnsd was invoked. To redirect it to a file called "my.errs" simply run wnsd with a command like:

wnsd <options> 2>my.errs

if you are using a UNIX sh(1) Bourne-like shell. This can be useful when debugging CGI programs because their errors are typically sent to the UNIX stderr(3) stream so you can easily view them with the UNIX tail(1) utility like:

tail -f my.errs

rather than have them buried in a log file.

16.4 CGI Handlers

Sometimes you may have a number of files which are to be processed by the same CGI program. In that case you might consider designating a "handler" for these files instead of putting the name of the CGI program in the URL for each of them.

The file directive:

CGI-Handler=bar.cgi

causes the program "bar.cgi" to be run and its output to be served in place of the document requested. This is a way to designate a CGI program to handle a file somewhat like a filter. The name of the program need not be in the URL since it is in the index.wn file. So when http://host/foo.html is requested this will cause the handler, bar.cgi, to be run with the CGI environment variable PATH_INFO set to /path2/foo.html. In normal use the program bar.cgi will do something to the file foo.html and serve the output. It is useful if you want a number of files in a directory to be handled by the same CGI program. Note the file foo.html need not be used in any way by the program, but it must exist or else the server will treat it as a non-existent file.

The directory directive "Default-CGI-Handler=handler.cgi" specifies that all files in the directory should be treated as if the "CGI-Handler=" file directive had been set to handler.cgi. To override this setting and specify no CGI handler use the "CGI-Handler=<none>" directive.

16.5 How Can CGI Programs be Made Safe?

This is an extremely important issue, but one which is beyond the scope of this document. I highly recommend Safe CGI Programming, maintained by Paul Phillips, and the WWW Security FAQ, maintained by Lincoln Stein.

Chapter 17

Support for the WN Server and Thanks

17.1 Support

Unfortunately, I do not have time to answer all the questions sent to me by WN users. I recommend that you subscribe to the mailing list of WN maintainers and ask your questions there. It is read by many knowledgeable WN users (including me).

To subscribe with the address you're sending mail from, send a message to majordomo@cyberwerks.com containing the line:

subscribe wn-maint your.email.address

The people reading this list seem to be very helpful.

17.2 Thanks

I would like to thank the many people who have aided in the creation of the WN package. These people include Earle Ake, David Capshaw, Henry Cejtin, Mike Crowley, Chris Davis, Paul DuBois, Mike Gallaher, Doug Hardie, Stephen Hebditch, Hamish MacEwan, Jishnu Mukerji, Marko Nordberg, Jeff Norden, Willem Pinckaers, Ron Pool, Kenji Rikitake, Jim Rees, Craig Milo Rogers, Chip Rosenthal, Sander Schippers, Kurt Siegl, Doug Simpkinson, Stephen Trier, Rico Tudor, Peter van Dijk, Pim van Riezen, and William Wells.

Special thanks are due to Jean Pierre LeJacq, who has worked extensively on the documentation and Paul Hoffman who maintains the mailing list.

Appendix A.1

Command Line Options for the WN Server

These are the command line options for both the UNIX inetd(8) system utility based daemon, wnd, and the stand-alone daemon, wnsd.

-a numeric_uid -- Set trusted numeric uid for authentication

The server will only allow password authentication if one of the options -a, -A, -t, or -T is used. Then the index.cache file for a password protected directory must be owned by the user id specified with the -a (or -t) option. The -a option only affects authentication while -t has other effects. The four command line arguments -a, -A, -t, -T all take a numeric argument. Thus the command should be "./wnsd -a 203" and not "./wnsd -a joe" if user "joe" has user id "203".

-A numeric_gid -- Set trusted numeric gid for authentication

This is similar to the -a option except the group owner (gid) of the index.cache file is used instead of the uid.

-d -- Really verbose log

When the server is run with this option it will include copies of all the client headers plus some additional information in the log file. This is only useful for debugging or if you are curious about what clients actually send.

-e -- Forbid CGI and execs

When the server is run with this option it will not execute a CGI program, a filter or any executable in an include. In addition the server will reject any request using the HTML <form method="post"> method.

-E -- Restrict CGI and execs to trusted users

This is similar to the -e option except that index.cache files owned by a trusted user id or trusted group id (set with the -t or -T option) are exempt from the restrictions. That is, only programs listed in index.cache files owned by the trusted user or group will be executed. If neither -t nor -T is used then no programs, filters or includes will be executed. This option takes precedence over the -u option, i.e. if both are used, execution of programs behaves as if the -u were not present (although the -u option still grants permission for the serving of ordinary files).

-F -- Run server in foreground

The server (wnsd only) will run in the foreground rather than the background which is the default. This is useful for things like daemontools.

-h hostname -- Set hostname

Tell the server that its name is hostname. This is only used for a few things: to pass to CGI programs and for redirects generated when a trailing '/' is omitted in a URL path to a directory. Also if the -i option is not used then the server will ascertain the IP address on which to listen for connections by using the first IP address obtained from the UNIX gethostbyname(3) system call with this name.

-i Listen addresses -- Set the IP addresses and ports on which the server listens (wnsd only).

Tell the server to listen on a set of address/port combinations. Since IPv6 has multiple IP addresses for each interface, multiple address/port combinations can be specified. The format for each entry is 'address/port' where address can be an IPv4 or IPv6 address, or a name resolvable via DNS. Port can either be a port number or the port name from the /etc/services file. Entries are separated by commas and generally should be enclosed in quotes. An example:

-i '2001:a:b:c::5/80, server.org.com/http-alt, 10.0.2.3/http'

In addition, "all" may be used for the address, in which case the server will listen on all addresses for IPv4 and all non-local IPv6 addresses. For example: 'all/80'.
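
To make the entry format concrete, here is a Python sketch that splits a -i value into address and port pairs. It illustrates the format only and is not how the server parses the option; the function name parse_listen is hypothetical. Splitting at the last '/' keeps the colons of an IPv6 address intact.

```python
def parse_listen(value):
    """Split a -i argument into a list of (address, port) pairs."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # The port follows the last '/', so IPv6 colons cause no trouble.
        address, sep, port = item.rpartition("/")
        if not sep or not address or not port:
            raise ValueError("expected address/port, got %r" % item)
        entries.append((address, port))
    return entries


print(parse_listen("2001:a:b:c::5/80, server.org.com/http-alt, 10.0.2.3/http"))
```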

-L logfile -- Set log file name

Write log information to the file logfile. This will override the value specified when you ran the configure program or by setting the macro #define WN_LOGFILE in config.h. If the empty string "" is used as the value of this option then no transaction logging will be done.

-l error_log -- Set error log file name

Write information about errors to the file error_log. This will override the value specified when you ran the configure program or by setting the macro #define WN_ERRLOGFILE in config.h. If the empty string "" is used as the value of this option then the log file given with -L is used for errors as well as regular transactions.

-n effective_uid -- Set effective user numeric id (wnsd only)

This option only has an effect when the server is run by root in the stand-alone version, i.e. wnsd. When invoked with the -n option and a numeric user id (not a user name) the server will change the user id under which it runs to the specified numeric value. If you do not use this option the user id is changed from root to the value you set when you ran the configure program (this is usually the numeric id of the user "nobody"; the 'n' in -n is a mnemonic for "nobody") or edited #define USERID in config.h. The -n option allows you to override that default when you run the server. In normal use this option is not necessary and should not be used unless you have a specific need.

-N effective_gid -- Set effective group numeric id (wnsd only)

This option only has an effect when the server is run by root in the stand-alone version, i.e. wnsd. This option is identical to the -n option except it sets the numeric group id under which the server runs rather than the user id.

-p Removed -- Replaced by the -i option above.

-P -- Enable the PUT, MOVE and DELETE methods on the server.

If this option is not used the server will reject any requests for the HTTP methods PUT, MOVE and DELETE.

In the directory where objects are to be PUT, the "Put-Authorization-Module", "Put-Authorization-Realm", and "Put-Authorization-Type" directives must be used. Finally, in this same directory, the "Default-Attributes=put" directive must be used (or in the case of a single file the "Attributes=put" directive).

-q pid_file -- Set file name which contains the (wnsd) server process id.

Write the process id or "pid" of the main server process to the file "pid_file". This file name should be the full path relative to the system root. If this is not set either with the -q option, via the configure program, or by editing the macro #define SWN_PID_FILE in the config.h file then the pid will be written to the UNIX stdout(3) stream when the server is started.

-S -- Use the UNIX syslogd(8) system utility for logging (deprecated).

This option is deprecated; use the -v option instead.

Instead of writing log messages to a file specified with -L or in config.h, use the UNIX syslog(3) facility. If #define WN_ERRLOGFILE is set to "" and the server is run with the -S option then error logging will be handled by the syslog(3) facility. If #define WN_ERRLOGFILE is given a value or the -l option is used then errors will be logged there rather than using the UNIX syslogd(8) system utility.

-t trusted_uid -- Set trusted numeric uid

When invoked with the -t option alone, wnd or wnsd will not serve a document unless the index.cache file listing it has the prescribed owner (uid). This numeric uid should be that of the maintainer, not the one under which wnd or wnsd runs if started by root. Indeed, for security reasons the server will refuse to use an index.cache file whose owner is the uid under which the server is running in this case. If all index.cache files on your server are created by a single user or a single group, this option or the -T option is highly recommended. This added security is weakened somewhat if you also use the -u option, which allows index.cache files owned by untrusted users to permit the serving of files owned by the same user. If both -u and -t are used, the trusted user specified by -t is exempt from the restrictions imposed by the -u argument. The four command line arguments -a, -A, -t, -T all take a numeric argument. Thus the command should be "./wnsd -t 203" and not "./wnsd -t joe" if user "joe" has user id "203".

-T trusted_gid -- Set trusted numeric gid

This is similar to the -t option except the numeric group owner (gid) of the index.cache file is used instead of the numeric uid.

-u -- Restrict untrusted users

When this option is invoked the server requires that every file served (including wrappers and includes) have the same owner as the index.cache file which grants it permission to be served. This means that untrusted users can only serve files which they own. If the -t or -T option is used with -u then index.cache files owned by the trusted user or trusted group are exempt from this requirement and they may grant permission to serve any file the server can read. If the -u and -E options are used together then -E takes precedence for execution of CGI programs, filters and executable includes, but -u still has effect for ordinary files being served.

Notice that if none of -t, -T, and -u are used then a user with his own home page can make a symbolic link to any file readable by the server and that document will be served. This is true even if the linked to document is in a directory with limited access or is outside the server data hierarchy.
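The interaction of -t, -T and -u described above can be summarized as a small decision function. This is only a reading of the prose, written as a Python sketch; the function name and parameters are hypothetical, and it leaves out the -E option and the rule that the server refuses an index.cache file owned by the uid it runs under.

```python
def may_serve(cache_uid, cache_gid, file_uid,
              trusted_uid=None, trusted_gid=None, restrict_untrusted=False):
    """Decide whether an index.cache file may grant access to a file.

    cache_uid, cache_gid -- numeric owner and group of the index.cache file
    file_uid             -- numeric owner of the file to be served
    trusted_uid          -- value given with -t, or None if -t is not used
    trusted_gid          -- value given with -T, or None if -T is not used
    restrict_untrusted   -- True if -u is used
    """
    trusted = ((trusted_uid is not None and cache_uid == trusted_uid) or
               (trusted_gid is not None and cache_gid == trusted_gid))
    if restrict_untrusted:
        # With -u, untrusted owners may only serve files they own;
        # the trusted user or group is exempt.
        return trusted or file_uid == cache_uid
    if trusted_uid is not None or trusted_gid is not None:
        # With -t or -T alone, only trusted index.cache files count.
        return trusted
    # With none of -t, -T, -u there is no ownership check at all.
    return True
```

The last branch corresponds to the warning above: with none of these options, any file the server can read may be served through a symbolic link.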

-v log_format -- Set log format

The legal values for this option are "nolog", "common", "verbose", "ncsa", "syslog", and "vsyslog". The first causes no logging to take place. The next three cause the log file to be written in the so-called common log format, or WN's verbose format including user agent, referrer, virtual server nickname, and cookies, or in the NCSA extended format which includes referrer and user agent. The last two cause logging to be done by the syslogd(8) daemon and in the case of vsyslog to include the information in the verbose format.

Each base log type may be optionally followed by a colon and either nodns or revdns. If neither of these is present then the default server action is to do a DNS lookup on the client's IP address to obtain the hostname of the client for logging purposes. If the :revdns extension is present the server will additionally do a reverse DNS lookup on this name as a check against name spoofing. If the :nodns extension is present the server will do no DNS lookup and will use the IP address in the log instead of the host name. For example the option "-v verbose:nodns" indicates that the server should use the verbose log format, but should use IP addresses rather than host names in log entries. Obviously, use of :nodns is more efficient and :revdns is less efficient than the default.

The default can be set with the #define VERBOSELOG macro in config.h.
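As an illustration of the value syntax for -v, here is a Python sketch that splits a log format into its base type and DNS mode. The function name is hypothetical, the label "dns" for the default behavior is my own, and whether the server accepts other letter cases is not addressed here.

```python
BASE_FORMATS = {"nolog", "common", "verbose", "ncsa", "syslog", "vsyslog"}
DNS_MODES = {"nodns", "revdns"}


def parse_log_format(value):
    """Return (base, dns_mode) for a value such as "verbose:nodns".

    dns_mode is "dns" (an illustrative label for the default lookup)
    when no extension is present.
    """
    base, sep, extension = value.partition(":")
    if base not in BASE_FORMATS:
        raise ValueError(f"unknown log format {base!r}")
    if not sep:
        return base, "dns"
    if extension not in DNS_MODES:
        raise ValueError(f"unknown extension {extension!r}")
    return base, extension


print(parse_log_format("verbose:nodns"))
```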

-V virtual_host_file -- Set file name which contains the list of virtual hosts

The file "virtual_host_file" should be the name of the file containing the list of "virtual hosts" and their corresponding IP numbers and root directories. The format of this file is one line per virtual host. Each such line should have the form:

hostname IP_address root_path

with the three parts separated by white space. For example an entry might be:

myhost.school.edu 111.222.333.444 /var/wn

In particular the hostname should be the fully qualified domain name. Lines in this file which are empty or start with '#' are ignored.

If the virtual host file is changed you will need to restart the server for the change to take effect.
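The file format above is simple enough to check with a few lines of Python before restarting the server. This is a sketch only; the function name is hypothetical, and tolerating leading white space before '#' is an assumption rather than documented behavior.

```python
def parse_virtual_hosts(text):
    """Return a list of (hostname, ip_address, root_path) tuples."""
    hosts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Empty lines and lines starting with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        hosts.append(tuple(fields))
    return hosts


sample = """# virtual hosts
myhost.school.edu 111.222.333.444 /var/wn
"""
print(parse_virtual_hosts(sample))
```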

Appendix A.2

Command Line Options for WN Server `wndex` Utility

These are the command line options for the wndex utility used to process index.wn files.

-a -- Serveall (mnemonic "all")

Using the -a option is equivalent to putting the directive Attributes=serveall at the beginning of the index.wn file (or files) being processed, with one important exception. If wndex is used with this option in a directory which contains no index.wn file, it will behave as if there were an index.wn file containing only the line Attributes=serveall. This allows the creation of an index.cache file permitting access to all files in a directory without first creating an index.wn file. This can be used with the -r option to grant access to a directory hierarchy without the need to create an index.wn file in each directory.

-c file_name -- Use "file_name" instead of "index.cache"

Causes wndex to write its output to the file "file_name" rather than the default "index.cache".

-d dir_name -- Process directory "dir_name"

When wndex is run with this option it will process the directory "dir_name" rather than the current directory. It will use the index.wn file in "dir_name" and write the file "dir_name/index.cache".

-i file_name -- Use "file_name" instead of "index.wn"

Causes wndex to read its input from the file "file_name" rather than the default "index.wn".

-q -- Use quiet mode

Causes wndex to suppress any warning messages (except fatal errors) sent to the UNIX stderr(3) stream as well as other informational messages.

-r -- Recursively process subdirectories

Causes wndex to recursively descend through some subdirectories and process them as well. By default it processes only those subdirectories listed in the Subdirs= directory directive. However, this can be changed to be either all subdirectories or all subdirectories containing an index.wn file by use of the -s option. If -r and -a are used together then the default is to use all subdirectories.

-s all|index -- Set the value of the Subdirs= directory directive automatically

Using the "wndex -s index" command with an index.wn file containing no Subdirs= directory directive is equivalent to having such a directive whose value is a list of all subdirectories which contain an index.wn file. Using the "wndex -s all" command with an index.wn file containing no Subdirs= directory directive is equivalent to having such a directive whose value is a list of all subdirectories of the directory being processed. If -r and -a are used together then "wndex -s all" is automatically assumed. This can be overridden by using a real Subdirs= directory directive or using "wndex -s index".

-v -- Use verbose mode

Causes wndex to be extra verbose.

-V -- Print version

Causes wndex to print its version and exit.

-x -- Use UNIX stdout(3) stream

Causes wndex to read from the UNIX stdin(3) stream instead of the file index.wn and write to the UNIX stdout(3) stream rather than the file index.cache.
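The difference between "-s all" and "-s index" can be pictured with the following Python sketch, which computes the list of subdirectories each mode would stand for. It is an illustration of the description above, not the wndex implementation; the function name and the sorted ordering are my own choices.

```python
import os


def select_subdirs(directory, mode, index_name="index.wn"):
    """Return the subdirectory names implied by "-s all" or "-s index"."""
    with os.scandir(directory) as entries:
        subdirs = sorted(entry.name for entry in entries if entry.is_dir())
    if mode == "all":
        # Every subdirectory of the directory being processed.
        return subdirs
    if mode == "index":
        # Only subdirectories that contain an index file.
        return [name for name in subdirs
                if os.path.isfile(os.path.join(directory, name, index_name))]
    raise ValueError(f"mode must be 'all' or 'index', got {mode!r}")
```

The index_name parameter mirrors the -i option, which changes the name of the file wndex looks for.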

Appendix B

Index File Directives for the WN Server

This is a list of the items which may be placed in an index.wn file to be processed by wndex. This file consists of a collection of records, each of which consists of a group of lines pertaining to a single file. Each line of a record begins with a directive like "Title=" which indicates that the remainder of that line is to be taken as the title of the document whose record contains this line. The "File=" directive is special in that it indicates the beginning of a new record. The value of the "File=" directive is the name of the file whose record will follow. Letter case is not significant in directive keywords.

When the character '#' is encountered in an index.wn file it is assumed to be the start of a comment and everything after it on that line is ignored. To include the '#' character in, for example, a document title, it must be escaped with the '\' character. That is, when "\#" is encountered it does not signify a comment and the character '#' (without the backslash) is treated as a normal character. In fact, since all directives contain the character '=', all lines which do not contain this character are silently ignored. Also a single conceptual line of an index.wn file can be spread over several actual lines by ending all but the last line with the '\' character. That is, if a line ends with '\' that character is removed and the contents of the next line are considered a continuation of the current line. The maximum allowed length of a line (including continuation) is 1024 characters. The maximum allowed length of all the records corresponding to one document is 8192 characters.
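The comment, escape and continuation rules above can be sketched in Python as follows. This is not the wndex parser; the function names are hypothetical, and the sketch assumes that continuation lines are joined before comments are removed. It does not enforce the 1024 and 8192 character limits.

```python
def strip_comment(line):
    """Remove an unescaped '#' comment and turn '\\#' into '#'."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == "#":
            out.append("#")          # escaped '#': keep it, drop the backslash
            i += 2
        elif ch == "#":
            break                    # start of a comment
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def logical_lines(text):
    """Return the directive lines of an index.wn style text."""
    result = []
    pending = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]      # continuation: drop the trailing backslash
            continue
        line = strip_comment(pending + raw)
        pending = ""
        if "=" in line:              # lines without '=' are silently ignored
            result.append(line.strip())
    if pending and "=" in strip_comment(pending):
        result.append(strip_comment(pending).strip())
    return result
```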

The first record in an index.wn file is special and is intended to describe attributes of the entire directory rather than individual files. It contains lines with directives specifying attributes of the directory as a whole or all the files in it. The next section is a complete list of these directory directives.

B.1 Directory Directives

Accessfile -- Specify directory access control file.

The line:

Accessfile=/dir/accessfile

specifies that the file /dir/accessfile is to be used to determine access privileges (by hostname or IP address) for this directory. If this line is omitted access is allowed for everyone. Both the path /dir/accessfile and the path ~/dir/accessfile are taken relative to the WN root directory. In particular the accessfile must be in the WN hierarchy (unlike includes or filters, for example.) If the path does not begin with a '/' or a '~' then it is relative to the directory containing the index.wn file. See the chapter "Limiting Access to Your WN Hierarchy" in this guide.

Access-denied-URL -- Set URL for requests for which access is denied.

The line:

Access-denied-URL=http://host/dir/foo.html

or the line:

Access-denied-URL=/dir/foo.html

specifies that any request for a document in this directory which is denied because of an "Accessfile=" restriction should be redirected to the given URL. A default value for all directories can be set by uncommenting the "#define ACCESS_DENIED_URL" line in config.h and recompiling. If you use this directive be sure that the file foo.html does not have restricted access or you can create an infinite loop. This line has the special feature that it can also be placed as the first line of the "Accessfile=" controlling the directory. A line in the accessfile will override any value set in the index.wn file.

Attributes -- Set directory attributes.

Currently there are only two directory attributes, viz. "nosearch" and "serveall". Letter case is not significant in the attribute value.

Attributes=serveall: Specifies that any file in this directory, with a few exceptions, may be served, not just those listed in the index.wn file. The server will attempt to set the content type correctly based on the file name suffix using the same default correspondences between type and suffix that wndex uses. The exceptions are that files whose name starts with '.' or ends with '~' as well as the files "index.wn" and "index.cache" will not be served.

Note: When this directive is used in a directory protected by an "Accessfile=" or a password file be sure that these files have names that start with '.', or contain a '~'. Or better, put these files in a different directory from which nothing is served.

Attributes=nosearch: Specifies that the index.cache databases in the current directory and its subdirectories should not be searched when the server does a title, keyword or user supplied field search. Likewise context and grep searches will not be allowed in this directory. In this case when an attempt is made to do so an error message is returned to the client. It is also possible to exclude only some files from searching with the "Attributes=" file directive.

Authorization-Module, Authorization-Realm, Authorization-Type -- Specify authorization module.

Currently WN includes a "basic" authorization module called wnauth. Its use is described in the chapter "Limiting Access to Your WN Hierarchy". Alternatively you can make your own module to handle authorization. Data is passed via standard input to this module. More specifically such a module should expect to read Basic user:password on standard input for Basic authentication, where "user" is the client supplied user name and "password" the client supplied password. The WN server expects this module to exit with status 0 if authorization is granted and with status 1 if access is denied. See the source of wnauth in /wnauth/wnauth.c for a detailed example.

For security reasons when you use an "Authorization-Module=" you are required to use either the -t or -T options or the -a or -A options and to have the index.cache file in the protected directory owned by the trusted user or group. This is to guard against counterfeit authorization modules.
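To make the module interface concrete, here is a minimal Python sketch of a custom authorization module. It assumes, following the description above literally, that the module receives a line of the form "Basic user:password" on standard input; the USERS table and function names are hypothetical, and the wnauth source remains the authoritative example.

```python
import sys

# Hypothetical credential table, for illustration only. A real module
# would consult a password file or database and never store plain text.
USERS = {"alice": "wonderland"}


def authorize(line, users):
    """Return the exit status: 0 to grant access, 1 to deny it."""
    scheme, _, credentials = line.strip().partition(" ")
    if scheme != "Basic":
        return 1
    user, sep, password = credentials.partition(":")
    if not sep or not user:
        return 1
    return 0 if users.get(user) == password else 1


def main():
    # Read the single line of authorization data from standard input.
    return authorize(sys.stdin.readline(), USERS)
```

An installed script would finish with "sys.exit(main())" so that the exit status reaches the server.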

Auth-denied-file -- Specify the name of an HTML file to be used as the error message when an authentication attempt for a password protected directory fails.

The line:

Auth-denied-file=~/dir/foo.html

specifies that any request for a document in this directory which is denied because of an authorization module restriction results in the file ~/dir/foo.html being sent instead. A default value for all directories can be set by uncommenting the "#define AUTH_DENIED_FILE" line in config.h and recompiling. Note that this is not a URL but the name of a file whose content is to be sent as error text when authentication is denied. If the file name starts with '~/' as above it is assumed to be relative to the WN root directory. Otherwise it is assumed to be a path relative to the directory containing the index.wn file.

Cache-Module -- Specify program to be used as interface to database for index.cache entries.

If this line specifies a program then instead of looking for file entries in the index.cache file this program is executed after putting the base name of the URL in the environment variable WN_KEY. This provides a mechanism to use a real database rather than the file index.cache. Note that the directory directives are still obtained from index.cache. The output of this module must be in the format of an index.cache line. Title, keyword and grep searches are not supported since that would require reading the entire database.

Default-Attributes -- Specify the default value of file attributes directive for every file served from this directory. This directive should not be confused with the directory attributes directive.

The line:

Default-Attributes=parse,dynamic

specifies that files in this directory should be parsed and marked as dynamic documents unless they have an attributes directive specifying the contrary.

Default-CGI-Handler -- Specify a default value for the "CGI-Handler=" file directive.

The line:

Default-CGI-Handler=/dir/handler

specifies that files in this directory should all be treated as if the "CGI-Handler=" file directive had been set to /dir/handler. To override this setting and specify no CGI handler use the "CGI-Handler=<none>" directive.

Default-Charset -- Specify the default character set to be added to the MIME type for documents with content type text/*.

The line:

Default-Charset=iso-8859-1

specifies that documents in this directory which have a MIME type of text/* should have a character set parameter with this value appended. E.g. a document of type text/html will be sent with a content type header

Content-type: text/html; charset=iso-8859-1

The default value for this is iso-8859-1.

Default-Content -- Specify the default MIME content type for items in this directory.

The line:

Default-content=text/html

specifies that files in this directory which do not end in a suffix recognizable to wndex should be given the type "text/html". Any legitimate MIME type may be used as the value.

Default-Cookie -- Specify the default cookie or cookie generating script for items in this directory.

The line

Default-Cookie=name=value

or the line

Default-Cookie=!my_cookie_script

specify that files in this directory with no Set-Cookie directive should be treated as if they had such a directive with the value name=value or !my_cookie_script.

Default-Document -- Specify the default document for this directory.

The line:

Default-Document=foo.html

specifies that a URL pointing to this directory like http://host/dir/ will result in serving the document wnroot/dir/foo.html instead of wnroot/dir/index.html. Uses of this include making the default document a CGI program with "Default-Document=foo.cgi" or having a directory with HTML files all ending with the suffix ".htm" and using the directive "Default-Document=foo.htm". This directive applies only to the directory containing the index.wn file, not to any subdirectories.

Default-Filter -- Specify a default value for the "Filter=" file directive.

The line:

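Default-Filter=/path2/filter

specifies that files in this directory should all be treated as if the "Filter=" file directive had been set to /path2/filter.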

Default-Includes -- Specify a default value for the "Includes=" file directive.

The line:

Default-Includes=footer.html

specifies that this line should be used as the "Includes=" directive for any document in this directory which does not have an "Includes=" directive explicitly set. To override this default value simply specify an explicit "Includes=" directive or use "Includes=<none>" to have none.

Default-List-Includes -- Specify a default value for the "List-Includes=" file directive.

The line:

Default-List-Includes=header.html,footer.html,disclaimer.html

specifies that this line should be used as the "List-Includes=" directive for any document in this directory which does not have an "Includes=", "Wrappers=", or "List-Includes=" directive explicitly set. To override this default value simply specify an explicit "List-Includes=" directive or use "List-Includes=<none>" to have none. Note that the example above grants permission for the inclusion of the three files listed. It does not require their insertion. However, it does cause all files in the current directory to be parsed for includes unless this "Attributes=" is overridden.

Default-Max-Age -- Specify the default value for the "Max-Age=" file directive.

The line:

Default-Max-Age=2 weeks

specifies that the Cache-Control and Expires headers of all documents served from this directory should be set to expire the document 2 weeks after it is served.

The line:

Default-Max-Age=2 weeks after last-mod

specifies that the Cache-Control and Expires headers of all documents served from this directory should be set to expire the document 2 weeks after the last-modified date of the document. For more details see the "Max-Age=" file directive.
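The two forms differ only in the moment from which the interval is counted, as the following Python sketch shows. It is an illustration, not the server's parser: the function name is hypothetical, and the units "hour" and "day" are assumptions, since only "weeks" appears in the examples above.

```python
from datetime import datetime, timedelta, timezone

# Only "weeks" appears in this guide; "hour" and "day" are assumed here.
UNIT_SECONDS = {"hour": 3600, "day": 86400, "week": 604800}


def expiry_time(value, served_at, last_modified):
    """Return the expiry time implied by a Max-Age style value."""
    words = value.split()
    if len(words) < 2:
        raise ValueError(f"expected '<count> <unit>', got {value!r}")
    unit = words[1].rstrip("s")
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit {words[1]!r}")
    delta = timedelta(seconds=int(words[0]) * UNIT_SECONDS[unit])
    rest = words[2:]
    if rest == ["after", "last-mod"]:
        return last_modified + delta   # counted from the last-modified date
    if rest:
        raise ValueError(f"unexpected text {' '.join(rest)!r}")
    return served_at + delta           # counted from the time of serving


served = datetime(2024, 1, 1, tzinfo=timezone.utc)
modified = datetime(2023, 12, 1, tzinfo=timezone.utc)
print(expiry_time("2 weeks", served, modified))
print(expiry_time("2 weeks after last-mod", served, modified))
```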

Default-Wrappers -- Specify a default value for the "Wrappers=" file directive.

The line:

Default-Wrappers=wrapper.html

specifies that this line should be used as the "Wrappers=" file directive for any document in this directory which does not have a Wrappers= directive explicitly set. To override this default value simply specify an explicit "Wrappers=" directive or use "Wrappers=<none>" to have none.

File-Module -- Specify program to be used as interface to database for obtaining files.

If this line specifies a program then instead of looking for a file in the current directory this program is executed after putting the base name of the URL in the environment variable WN_KEY. The output of this program is served as if it were a file. This provides a mechanism to use a real database rather than the file index.cache.

If you wish the file module to have access to all the standard CGI environment variables then use the directive "Default-Attributes=cgi" with the File-Module= directive.
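A file module can be very small. The Python sketch below reads WN_KEY from the environment and writes the matching content to standard output. The DOCUMENTS dictionary stands in for a real database and is hypothetical, and how the server reacts to a non-zero exit status is not described in this guide, so that part is an assumption.

```python
import os
import sys

# Hypothetical stand-in for a real database lookup.
DOCUMENTS = {"hello.txt": "Hello from the database.\n"}


def main(environ, out):
    """Write the document named by WN_KEY to out; return an exit status."""
    key = environ.get("WN_KEY", "")
    body = DOCUMENTS.get(key)
    if body is None:
        return 1                 # assumed way to signal "no such document"
    out.write(body)
    return 0
```

An installed script would finish with "sys.exit(main(os.environ, sys.stdout))".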

Logtype -- Set the type of log entries to be used for this directory.

As an example, the line:

Logtype=verbose

will cause logging to be done with WN's verbose log format. The base values possible for this directive are nolog, common, verbose, ncsa, syslog and vsyslog, which respectively have the effect of turning off logging, or using the common log format, WN's verbose logging, or the NCSA log format in the log file, or using the syslogd(8) daemon with a standard or verbose format. See the "Managing Log Files" section of this manual. The default value, used when this directive is not present, is set in the source file config.h when the configure script is run.

Each base value may optionally be followed by :nodns or :revdns. For example, the line:

Logtype=verbose:nodns

indicates that the server should use verbose log format, but should use IP addresses rather than host names in log entries. Obviously, use of :nodns is more efficient and :revdns is less efficient than the default.

Nomatchsub -- Set substitute file for searches on this directory which result in no matches.

The line:

Nomatchsub=foo.html

specifies that the HTML file foo.html in the current directory should be used for the output of all searches (title, keyword, context, grep, etc.) on this directory which return no matches. It can only be used in conjunction with the "Searchwrapper=" file directive. If Nomatchsub= is used and a "Searchwrapper=" has not been defined an error is logged and the nomatchsub file is ignored. The file foo.html must be in the directory being searched and its name must not contain a '/'. See also "Nomatchsub=" for files.

No-Such-File-URL -- Set substitute URL for requests for non-existent or unservable files.

The line:

No-Such-File-URL=http://host/dir/foo.html

or the line:

No-Such-File-URL=/dir/foo.html

specifies that any request in this directory for a non-existent file or a file not listed in the index.wn file of this directory should be redirected to the given URL. A default value for all directories and non-existent directories can be set by uncommenting the "#define NO_SUCH_FILE_URL" line in config.h and recompiling. The value set here will also be used if an index.cache file does not exist. If you use this directive be sure that the file foo.html does exist or you can create an infinite loop.

Owner -- Specify owner of directory items.

This should be a line like:

Owner=mailto:maintainer@host

The "mailto:maintainer@host" may be replaced with a URL referring to the individual who is responsible for the documents in this directory. This information is used in an HTTP header. It is not possible to designate the owner of a single file in a file directive. However, if the file is an HTML file this can be done with a <link> tag in the header of that file.

Put-Authorization-Module, Put-Authorization-Realm, Put-Authorization-Type -- Specify authorization module for PUT, MOVE and DELETE methods.

In order for the server to honor the PUT, MOVE, and DELETE methods of HTTP, the server must be run with the -P option and in addition an authorization mechanism is required. This can be the same mechanism used for access authorization, or different. Currently WN includes a "basic" authorization module called wnauth. Its use is described in the chapter "Limiting Access to Your WN Hierarchy". Alternatively you can make your own module to handle PUT authorization. Data is placed in CGI environment variables. WN expects this module to exit with status 0 if authorization is granted and with status 1 if access is denied.

For security reasons when you use a "Put-Authorization-Module" you are required to use either the -t or -T options or the -a or -A options and to have the index.cache file in the protected directory owned by the trusted user or group. This is to guard against counterfeit authorization modules.

Finally in the directory where the Put-Authorization-Module directive is used the "Default-Attributes=put" directive must be used (or in the case of a single file the "Attributes=put" directive.)

Put-handler -- Specify a module to handle the PUT, MOVE and DELETE methods for directory items.

The line

Put-handler=~/dir/puth

tells the server to use the module ROOTDIR/dir/puth to handle the HTTP methods PUT, MOVE and DELETE. A sample put-handler called puth is provided with this distribution. The precise specification of what a put-handler should do is still evolving. The examples/roam directory shows an example of its use for support of the Netscape browser roaming feature.

Search-Module -- Specify program to be used as a search engine.

This directive allows you to create your own search engine. It is invoked with a line like:

Search-Module=/full/path/to/searchmod

The program searchmod should read the environment variable QUERY_STRING and return an HTML fragment. In the typical case the program returns an unordered list of links to documents containing a match to the query string. This list can be wrapped by including a "Searchwrapper=" in the directory record. If it is not, a default wrapper with text like "Here are the matches for your search." is supplied.

To use this module you should have a form action which is something like http://host/dir/search=index. Two simple examples of a search-module (written in perl) are included in the distribution in the files bin/wnseven_m and bin/wnsectsearch.
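The examples shipped with the distribution are written in perl; the same idea in Python might look like the sketch below. It assumes the whole QUERY_STRING is the search term, and the PAGES dictionary is a hypothetical stand-in for whatever data the module actually searches.

```python
import html
import os
from urllib.parse import unquote_plus

# Hypothetical corpus: URL -> (title, text).
PAGES = {
    "/docs/a.html": ("Apples", "All about apples and orchards."),
    "/docs/b.html": ("Bees", "Bees pollinate apple trees."),
    "/docs/c.html": ("Cars", "Nothing about fruit here."),
}


def search_fragment(query_string, pages):
    """Return an HTML unordered list of links to matching pages."""
    term = unquote_plus(query_string).strip().lower()
    items = []
    if term:
        for url, (title, text) in sorted(pages.items()):
            if term in title.lower() or term in text.lower():
                items.append('<li><a href="%s">%s</a></li>'
                             % (html.escape(url, quote=True), html.escape(title)))
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


# The server supplies the query in the QUERY_STRING environment variable.
print(search_fragment(os.environ.get("QUERY_STRING", ""), PAGES), end="")
```

Escaping the title and URL matters because the fragment is inserted into the search wrapper as HTML.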

Searchwrapper -- Set wrapper file for searches on this directory.

The line:

Searchwrapper=swrap.html

specifies that the HTML file swrap.html in the current directory should be used as a wrapper for the output of all searches on this directory.

To specify a wrapper for searches on an individual file use the file directive "Searchwrapper=".

Subdirs -- Specify subdirectories for searching and recursive use of wndex.

When you run the wndex utility with the -r option (for recursive), it must know in which subdirectories it should descend to create a new index.cache database file. Likewise when the server does a title, keyword or user defined field search it recursively descends the data hierarchy and must know for each directory which subdirectories are part of the hierarchy.

The maintainer provides this information in a line like:

Subdirs=subdir1,subdir2,subdir3

in the directory directives giving a comma separated list of subdirectories of the directory containing the current index.wn file.

There are two special forms of the "Subdirs=" directive. Using:

Subdirs=<index>

is equivalent to having a "Subdirs=" directive whose value is a list of all subdirectories which contain a file named "index.wn" (or the name specified with the -i option to wndex).

Using:

Subdirs=<all>

is equivalent to having a "Subdirs=" directive whose value is a list of all subdirectories.

B.2 File Directives

A collection of lines in the index.wn file containing information about a single file in the directory of the index.wn file is called a file record. A new file record begins with a line starting with "File=" and ends with the start of a new file record. Each line in a record begins with a file directive. Here is the complete list:

Attributes -- Set file attributes.

Currently several attributes are possible, including imagemap, nosearch, parse, noparse, post_only, nopost, dynamic, nondynamic, cachable, no-keepalive, non-cachable, put, and cgi. Multiple values, separated by commas, can be put on a single "Attributes=" line, as in "Attributes=parse,dynamic,nosearch". Letter case is not significant in the attribute value. Also "Attribute=" (without the 's') is synonymous with "Attributes=".

See also the directory "Attributes=" directive.

Attributes=cachable

causes the server not to send the "Pragma: no-cache" and "Cache-control: no-cache" headers when it otherwise might. For example these headers are sent by default for CGI output. If you want the browser "back" button to return users to a CGI generated page after they have followed a link you may need "Attributes=cachable" since otherwise the browser may not even cache the page in memory. (See also "Attributes=non-cachable".)

Attributes=cgi

indicates that the standard CGI environment variables should be set up before processing this request. This may be useful if there is a "Filter=" directive for this document or if the document has an "Include=" which is the output of a program. In these cases the filter program or include program can access the CGI environment variables. This line is not necessary if the document it refers to is actually a CGI program since in that case this attribute is automatically set. If the document is not actually a CGI program then the environment variable PATH_INFO will always be empty. This is because the server always interprets a request without a ".cgi" suffix or a "cgi-bin" directory in it as the longest possible sequence of directories and a terminating file, i.e. a request without PATH_INFO.

Attributes=dynamic

indicates that the document may change each time it is sent. This causes the server not to send headers with a content length or a last modified date. It also will cause the server to ignore any "If-Modified-Since" date sent by the client and always resend the document. It is not necessary to set Attributes=dynamic for CGI programs as it is set by default for them. If you do not wish this done for a CGI program then use the directive "Attributes=nondynamic".

Attributes=imagemap

Indicates that the file is an imagemap used to support clickable images.

Attributes=MD5

Indicates that wndex should calculate an MD5 digest or checksum for this file and store it in the index.cache file for use in a "Content-MD5" header for this document. If the document is subsequently modified you must re-run wndex to recalculate this digest value. If this is not done and the document is newer than the calculated MD5 digest, the server will omit the "Content-MD5" header and log an error.
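For reference, the value of a Content-MD5 header as defined by RFC 1864 is the base64 encoding of the 128-bit MD5 digest of the content. The Python sketch below computes such a value and mirrors the "newer than the digest" check described above. The function names are hypothetical, and how wndex stores the digest inside index.cache is not shown here.

```python
import base64
import hashlib


def content_md5(data):
    """Return the Content-MD5 header value for the given bytes."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def digest_is_stale(document_mtime, digest_mtime):
    """True if the document was modified after the digest was computed."""
    return document_mtime > digest_mtime


print("Content-MD5:", content_md5(b"Hello, WN!\n"))
```

A client can recompute the same value over the received body to detect corruption in transit.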

Attributes=no-keepalive

indicates that the server should immediately close the connection after fulfilling a request for this document and not honor a request from the client to use a persistent connection.

If this is used as a default attribute and you wish to allow a persistent connection for a specific document then use the line:

Attributes=keepalive

which will override the default.

Attributes=non-cachable

indicates that the server should send the "Pragma: no-cache" and "Cache-control: no-cache" headers attempting to encourage clients and proxies not to cache this document. It is not necessary to set this for CGI programs or any document requiring authentication as it is set by default for them. If you wish to allow the output of a CGI program or authenticated document to be cached then use the line:

Attributes=cachable

which will override this default. This may be necessary if you want the browser "back" button to return users to this document after they have followed a link, since otherwise the browser may not even cache the page in memory.

Attributes=noget

indicates that the file referenced by this directive may not be accessed with the GET method. This might be used with a CGI program designed only to be accessed via the POST method.

Attributes=nondynamic

overrides the default CGI setting of "dynamic". If this is done the "Last-Modified" date header of the document will be that of the program.

Attributes=noparse

indicates that the file referenced by this directive should not be parsed for server includes. This is used to override a default attributes setting to parse all documents. It might also be done to improve efficiency when, for example, a document has a wrapper but nothing is included in it. Since it has a wrapper, parsing will be turned on by default, but it is not necessary since nothing is actually included.

Attributes=nopost

indicates that the file referenced by this directive may not be accessed with the POST method. If the item referenced is an ordinary file this directive is assumed and need not be set. For CGI programs, if this is set and an attempt to POST to the object is made by a client an error will be returned.

Attributes=nosearch

indicates that the file referenced by this directive should not be searched when the server does a context or grep search of the current directory.

Attributes=parse

indicates that the file referenced by this directive should be parsed for conditional text or server-side includes. This line is not necessary if there is also a "Wrappers=" line or an "Includes=" line since in that case the parse attribute is assumed. If you do not wish a document to be parsed when it otherwise would be, "Attributes=noparse" can be used.

Attributes=post_only

indicates that the file referenced by this directive may only be accessed with the POST method. If the item referenced is a CGI program and an attempt is made to access it with the GET method an error will be returned. This directive may be useful for files which are filtered or "include" an executed program. In that case the POSTed data will be placed in a temporary file. The name of the temporary file can be found by using "Attributes=cgi" which will cause the name to be placed in the environment variable WN_POST_FILE.
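As a rough illustration, a filter or include program attached to such a file might pick up the POSTed data as sketched below in Python. Only the variable name WN_POST_FILE comes from this guide. The function name read_post_data and the assumption that the temporary file holds URL-encoded form data are hypothetical choices made for the sketch.

```python
import os
import tempfile
from urllib.parse import parse_qs


def read_post_data(environ=os.environ):
    """Return POSTed form fields as a dict, or {} if no POST file is set.

    Assumes the temporary file named by WN_POST_FILE holds URL-encoded
    form data (an assumption of this sketch, not a statement of the guide).
    """
    path = environ.get("WN_POST_FILE")
    if not path:
        return {}
    with open(path, "r", encoding="utf-8") as f:
        body = f.read()
    return parse_qs(body)


if __name__ == "__main__":
    # Simulate the server: write a POST body to a temporary file and
    # pass its name the way WN_POST_FILE would carry it.
    with tempfile.NamedTemporaryFile("w", suffix=".post", delete=False) as tmp:
        tmp.write("item=book&qty=2")
    print(read_post_data({"WN_POST_FILE": tmp.name}))
    os.unlink(tmp.name)
```

The same pattern should apply to PUT data by reading WN_PUT_FILE instead.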

Attributes=put

indicates that the file referenced by this directive may be accessed with the PUT method. It must be handled by your program. The PUT data will be placed in a temporary file. The name of the temporary file can be found by using "Attributes=cgi" which will cause the name to be placed in the environment variable WN_PUT_FILE.

CGI-Handler -- Specify the CGI program with which a file is to be processed.

The line:

CGI-Handler=/path/to/bar

causes the program "/path/to/bar" to be run and its output to be served in place of the document requested. This is a way to designate a CGI program to handle a file somewhat like a filter. The name of the program need not be in the URL since it is in the index.wn file. So when http://host/path2/foo.html is requested this will cause the handler, say /path/to/bar, to be run with the CGI environment variable PATH_INFO set to /path2/foo.html. In normal use the program /path/to/bar will do something to the file foo.html and serve the output. It is useful if you want a number of files in a directory to be handled by the same CGI program. Note the file foo.html need not be used in any way by the program, but it must exist or else the server will treat it as a non-existent file.

There is no reason for the CGI-handler program bar to be in the served data hierarchy, and it is good security practice to keep it elsewhere, in a place where it cannot itself be served. This prevents its execution with the CGI environment variable PATH_INFO being set by the client. It should never be located in a cgi-bin directory. If the handler name begins with a '/' the name is considered as a path relative to the system root directory. If it begins with '~/' as in ~/dir/foo it is assumed to be relative to the WN root directory. Otherwise it is assumed to be a path relative to the directory containing the index.wn file.
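To make the role of PATH_INFO concrete, here is a minimal sketch of what a handler such as /path/to/bar might look like if written in Python. It is not part of WN. The function name handle, the data_root parameter (the filesystem directory assumed to correspond to the top of the served hierarchy) and the choice of upper-casing the file are all hypothetical. The "Status:" line follows the usual CGI convention.

```python
import os
import tempfile


def handle(environ, data_root):
    """Build a CGI response for the file named by PATH_INFO.

    data_root is a hypothetical parameter: the directory assumed to hold
    the served hierarchy, so that /path2/foo.html maps to a real file.
    """
    path_info = environ.get("PATH_INFO", "")
    # Resolve the requested file under data_root and refuse to leave it.
    root = os.path.realpath(data_root)
    target = os.path.realpath(os.path.join(root, path_info.lstrip("/")))
    if os.path.commonpath([root, target]) != root or not os.path.isfile(target):
        return "Status: 404 Not Found\r\nContent-type: text/plain\r\n\r\nNot found\n"
    with open(target, "r", encoding="utf-8") as f:
        text = f.read()
    # "Do something" to the file: here it is simply upper-cased.
    return "Content-type: text/plain\r\n\r\n" + text.upper()


if __name__ == "__main__":
    # Demonstration only: build a tiny hierarchy and fake the environment.
    # Run by the server, the call would be handle(os.environ, <data root>).
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "path2"))
        with open(os.path.join(root, "path2", "foo.html"), "w", encoding="utf-8") as f:
            f.write("hello\n")
        print(handle({"PATH_INFO": "/path2/foo.html"}, root), end="")
```

Because PATH_INFO decides which file is processed, the sketch checks that the resolved path stays inside data_root, which is the same concern that motivates keeping the handler out of the served hierarchy.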

Charset -- Specify the character set to be added to the MIME type for this document (which should have content type text/*).

The line:

Charset=iso-8859-1

specifies that this document, which should have a MIME type of text/*, should have a character set parameter with this value appended to its content type. E.g. a document of type text/html will be sent with a content type header

Content-type: text/html; charset=iso-8859-1

If no value is specified for this and the Default-Charset directive is not used then the value iso-8859-1 is used. Note that if a content-type file directive is also used, it must occur AFTER the charset directive.

Content-encoding -- Specify the content encoding for a file.

The line:

Content-encoding=x-gzip

specifies "x-gzip" as the content encoding for the file described by this record. Only two types of content encoding are supported by common browsers. They are "x-gzip" and "x-compress". They indicate that the file has been compressed with the GNU gzip(1) utility or the UNIX compress(1) utility. The file is then sent by the server in the compressed format and will be decompressed automatically by the browser, if it supports this functionality.

In many cases it is unnecessary to specify this explicitly, as the wndex program will automatically assign the content-encoding x-gzip to a file whose name ends with ".gz" and the content-encoding x-compress to a file whose name ends in ".Z". Supplying the value "none" for the "Content-encoding=" directive will prevent the server from making this automatic assignment.

Content-type -- Specify the MIME content type for a file.

The line:

Content-type=audio/basic

specifies "audio/basic" as the MIME type for the file described by this record. In many cases this is unnecessary as the wndex program will automatically assign the MIME type if the file name ends in a suffix listed in the file lib/mime.types with a corresponding type. If this line is supplied it will override the default value of the content type determined by the suffix.

The mime.types file should be installed in a known location. The default location is in the WN src hierarchy, but this can be changed by specifying a different value when the configure program is run or by editing the value of "#define MIME_TYPES_FILE" in config.h. The mime.types file exists so that you can add to it if you wish to add new kinds of documents to your server. The format of the file is explained in the file. A default version of the file is in lib/mime.types. The internal defaults are the same as what is currently in this file. The mime.types file is read whenever wndex is run so wndex always knows the latest additions. This file is also read by wnsd (but not wnd) on startup for use with directories with the "Attributes=serveall". The wnsd stand-alone server reads this file when it is started or restarted, but only takes note of new suffixes and their mime types. You cannot change the mime type corresponding to one of the standard suffixes (as listed in the default mime.types file). To do that you need to change the server source and recompile.

Expires -- Specify the expiration date of a document or file.

The line:

Expires=Mon, 01 Sep 1997 14:11:01 GMT

specifies the date and time at which a document expires. Current practice is to use the format illustrated above, which is the one defined by RFC 1123 (the older RFC 850 format uses a full weekday name and a two-digit year). In particular, GMT should be used. More information about this date format can be found in RFC 1123. For HTML documents this information is automatically extracted from the document by wndex. This requires a "<meta>" line in the head of the HTML document like:

<meta http-equiv="Expires" content="Mon, 10 Oct 1994 14:11:01 GMT">

If the "Expires=" directive is also supplied in the index.wn file it will override the expiration date in the document. See also the "Max-age=" file directive.

Field#n -- Specify a user supplied field associated with a file.

The line:

Field3=string

specifies "string" as user supplied field 3 associated with the current document. These fields are used for field searches. The digit 3 can be replaced with any other single digit, allowing a total of 10 user supplied fields.

File -- File name.

The line:

File=foo

begins a new file record for the file foo. It indicates that permission is granted for this file to be served. Other file directive lines will apply to this file until a new file record or text segment is started or the end of the index.wn file is reached. The presence of this line causes an entry for this file to be written in the index.cache file created by wndex.

Filter -- Specify the filter with which a file is to be postprocessed.

The line:

Filter=/dir/foo

causes the contents of the file whose record contains this line to be used as the UNIX stdin(3) stream of the program foo and the UNIX stdout(3) stream of that program to be sent to the client instead of the file itself. A common use of this is to specify a decompressing program like the UNIX zcat(1) utility as the filter so that a compressed version of a file can be stored on disk and then be decompressed on the fly before being sent to the client. Another example would be "Filter=/usr/bin/nroff -man" which would use the UNIX nroff(1) utility to convert a man(1) page to an ASCII text document on the fly.

If a listed file name begins with a '/' the name is considered as a path relative to the system root directory. If it begins with '~/' as in '~/dir/foo' it is assumed to be relative to the WN root directory. Otherwise it is assumed to be a path relative to the directory containing the index.wn file.
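A filter is just a program that reads the document on its standard input and writes the transformed result to its standard output. The following Python sketch shows that shape with a hypothetical filter, number_lines, which prefixes each line with its line number. Neither the function nor the transformation comes from WN.

```python
import io
import sys


def number_lines(src, dst):
    """Copy src to dst, prefixing each line with its line number."""
    for n, line in enumerate(src, start=1):
        dst.write(f"{n:4d}  {line}")


if __name__ == "__main__":
    # Demonstration with in-memory streams. Installed as a filter, the
    # call would instead be number_lines(sys.stdin, sys.stdout).
    out = io.StringIO()
    number_lines(io.StringIO("alpha\nbeta\n"), out)
    sys.stdout.write(out.getvalue())
```

If the filter needs request details such as the query string, the record would also need "Attributes=cgi" so that the CGI environment variables are set.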

Header -- Add a line to the HTTP/1.1 header for this document.

The line:

Header=[some legal HTTP header]

causes the line "[some legal HTTP header]" to be added to the HTTP/1.1 header for this item. This directive can be used multiple times to add multiple lines to the header.

Note: Don't do this unless you know what you are doing!

HTTP-Status -- Return a given HTTP/1.1 status value.

The line:

HTTP-Status=404 Not Found

causes the response line of the HTTP/1.1 header to be "HTTP/1.1 404 Not Found". This is primarily of use when redirecting requests for non-existent files to an error message which should be returned with status 404 so robots understand.

Note: Don't do this unless you know what you are doing!

Includes -- Specify the files to be included in a text document.

The line:

Includes=file1,file2,file3

causes the file whose record contains this line to be parsed for lines like "<!-- #include -->". When such a marker is found one of the files listed with the "Includes=" file directive is inserted. Subsequent occurrences of the marker cause the inclusion of subsequent files in the order in which they occur in this directive.

If a listed file name begins with a '/' the name is considered as a path relative to the system root directory. If it begins with '~/' as in "~/dir/foo" it is assumed to be relative to the WN root directory. Otherwise it is assumed to be a path relative to the directory containing the index.wn file. See the section of the user guide on includes and wrappers for more information.

Keywords -- Specify the keywords associated with a document or file.

The line:

Keywords=pink, elephant, HTTP

specifies a list of keywords associated with the current document. These are used for keyword searches. For HTML documents the keywords are automatically extracted from the document by wndex. This requires a <meta> line in the head of the HTML document like:

<meta http-equiv="Keywords" content="pink, elephant, HTTP">

If the "Keywords=" file directive is also supplied in the index.wn file it will override the keywords in the document.

List-Includes -- Specify files which may be included in a text document.

The line:

List-Includes=file1,file2,file3

causes the file whose record contains this line to be parsed for lines like "<!-- #include file2 -->". When such a marker is found the contents of file2 are inserted. The order of the files listed in the directive is not significant. Note that the example above grants permission for the inclusion of the three files listed. It does not require their insertion.

Max-Age -- Specify the HTTP/1.1 Cache-Control and Expires headers for an entry.

The line:

Max-Age=10 days

specifies that a HTTP/1.1 Cache-Control header should be sent to expire the document in the specified time. If no "Expires=" file directive has been set elsewhere in the index.wn file or in the file itself, if it is an HTML file, then the HTTP/1.1 Expires header will also be sent with a value equal to the current time plus the time period of the HTTP/1.1 Max-Age header. The time period in the "Max-Age=" file directive can be specified in units of seconds, minutes, hours, days or weeks, but more than one unit (as in 2 weeks and 3 days) is not allowed.

The line:

Max-Age=10 days after last-mod

specifies that a HTTP/1.1 Cache-Control header and the Expires header (if none is set elsewhere) should be set to expire the document in the specified amount of time after the last-modified date of the document. Negative time values for the Cache-Control header will be ignored, but Expires headers with dates in the past will be used.
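The arithmetic behind the two forms can be sketched as follows. This is an illustration in Python of the rules stated above, not the server's own code. The function name max_age_headers and the table UNIT_SECONDS are hypothetical, and returning None for a negative value is one reading of "will be ignored".

```python
from email.utils import formatdate

# Seconds per unit, for the units the guide lists.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}


def max_age_headers(spec, now, last_mod):
    """Return (cache_control, expires) for a Max-Age value such as
    "10 days" or "10 days after last-mod".

    now and last_mod are POSIX timestamps. cache_control is None when the
    remaining time is negative, while the Expires date is still produced.
    """
    words = spec.split()
    count, unit = int(words[0]), words[1].lower().rstrip("s")
    period = count * UNIT_SECONDS[unit]
    relative_to_last_mod = words[2:] == ["after", "last-mod"]
    expires_at = (last_mod if relative_to_last_mod else now) + period
    remaining = expires_at - now
    cache_control = f"max-age={remaining}" if remaining >= 0 else None
    return cache_control, formatdate(expires_at, usegmt=True)


if __name__ == "__main__":
    now = 948271777  # Wed, 19 Jan 2000 08:49:37 GMT
    print(max_age_headers("10 days", now, now))
    print(max_age_headers("10 days after last-mod", now, now - 20 * 86400))
```

In the second call the document was last modified 20 days ago, so the expiration date already lies 10 days in the past: no max-age value is produced but the Expires date is.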

Nomatchsub -- Set substitute file for searches on this file which result in no matches.

The line:

Nomatchsub=foo.html

specifies that the HTML file foo.html in the current directory should be used for the output of all searches on this file which return no matches. It can only be used in conjunction with the "Searchwrapper=" file directive. See also "Nomatchsub=" for directories.

Redirect -- Send an HTTP/1.1 redirect to a new URL.

The lines:

File=foo
Redirect=http://host/path/bar

cause a request for foo to be answered with an HTTP/1.1 redirect response. The client will then automatically request the new URL. The file foo need not exist.

The redirection always sends a HTTP/1.1 "301 Moved Permanently" status header followed by a "Location:" header whose value is "http://host/path/bar". This means that the value of a "Redirect=" file directive should always be a complete URL, starting with "http://" or "ftp://" etc. The one exception is that you may use "Redirect=<null>". This causes the server to send a status 204 "no response" which tells the client to do nothing and leave the display alone. The page won't be reloaded and won't change.

Refresh -- Set a "Refresh" header for use with "client-pull".

The line:

Refresh=60

adds an HTTP/1.1 header at the beginning of the transmission of this document. If the client receiving this header supports "client-pull" (currently only Netscape browsers support this) then it will automatically reload the document after 60 seconds. This is useful for documents that are updated very frequently, a stock ticker, for example. If the directive:

Refresh=30; URL=http://host/path/foo

is used then after 30 seconds the URL http://host/path/foo is loaded. This can be used to create an automatic slide show. The Refresh header is not part of an HTTP/1.1 standard and hence may evolve. If it does this directive will be subject to change!

Searchwrapper -- Set wrapper file for searches on this file.

The line:

Searchwrapper=swrap.html

specifies that the HTML file swrap.html in the current directory should be used as a search wrapper for the output of all searches on this file.

To specify a wrapper for all searches on a directory use the directory directive "Searchwrapper=".

Set-Cookie -- Set a "Cookie" header value.

The lines:

Set-Cookie=name1=opaque1
Set-Cookie=name=xxx; Expires=Wed, 19 Jan 2000 08:49:37 GMT

add an HTTP/1.1 header at the beginning of the transmission of this document. If the client receiving this header supports cookie caching (currently only Netscape browsers support this) then it will save the name=value pairs and include them in the request headers when documents in the same directory or sub-directories are accessed. The server will put the name=value pairs in the CGI environment variable HTTP_COOKIE for access by CGI programs. This is useful for "shopping basket" type applications.

If the value of this directive begins with an '!', as in

Set-Cookie=!my_cookie_script

then it will be interpreted as the name of a program to be run to generate the value of the cookie. The program should not generate the "Set-Cookie:" part of this header, just the value. As elsewhere if the program name starts with '/' it will be taken relative to system root; if it begins with '~' it will be taken relative to the document root; and otherwise it will be assumed to be in the directory where this directive is found. The standard CGI environment variables will be available to the program.

Normally the client will discard the cookie at the end of a session. However, if an Expires parameter like the one above is provided the cookie will be saved between sessions and only discarded when it expires.
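A program named with the '!' form only has to print the cookie value. The Python sketch below shows one way such a program might build a value with an Expires parameter in the date format used in the example above. The function name cookie_value and the cookie name and value are hypothetical, and whether the server expects a trailing newline is an assumption.

```python
import sys
import time
from email.utils import formatdate


def cookie_value(name, value, lifetime_seconds, now=None):
    """Return only the cookie value, without the "Set-Cookie:" prefix."""
    if now is None:
        now = time.time()
    expires = formatdate(now + lifetime_seconds, usegmt=True)
    return f"{name}={value}; Expires={expires}"


if __name__ == "__main__":
    # Hypothetical cookie: a basket identifier kept for one day.
    sys.stdout.write(cookie_value("basket", "xyz123", 86400) + "\n")
```

Since the standard CGI environment variables are available to the program, it could also derive the value from the request, for example from an existing HTTP_COOKIE.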

More information about the HTTP/1.1 Set-Cookie header is available at http://home.netscape.com/newsref/std/cookie_spec.html.

Title -- Specify the title of a document or file.

The line:

Title=This is the title

specifies the text "This is the title" as the title of the file. If the file is an HTML document this is not necessary as wndex will attempt to read the title from the document itself. If this line is supplied anyway it will override the title in the document. If this line is not supplied and the file is not an HTML document the default title "File <filename>" is used.

Wrappers -- Specify the files to be included in a text document.

The line:

Wrappers=file1

causes "file1" to be parsed for lines like "<!-- #include -->". When such a marker is found the file whose record contains this line is inserted and the combined document is sent to the client. It is possible to list multiple files in this directive. The semantics of this are explained in the section of the user guide on server-side includes and wrappers.

Appendix C

Parsed Document Syntax for the WN Server

This is a list of all parsing instructions recognized by WN while parsing an HTML document. Note that only documents with MIME type text/html can be parsed for the purposes described here. All parsing instructions use one of the two equivalent forms:

<!-- #something -->

or:

<?WN something>

There is a maximum allowed size of 2K bytes for the entire "<!-- #something -->" expression. Current versions of WN no longer require this expression to be on a line by itself.

The second form is considered more SGML/XML friendly by many as "<?WN something >" indicates a processing instruction specific to WN rather than a comment. For historical reasons this manual describes the other form, but either may be used. With the first form the '#' is required but with the other you may use either:

<?WN #something>

or:

<?WN something>

Also "<?wn #something>" is fine. The case of the WN is not significant.

C.1 Matching Regular Expressions: `#if` and `#elif`

This section describes the use of conditionally included text of the form:

<!-- #if some_condition -->
Some conditional text goes here.
<!-- #elif some_other_condition -->
Some other conditional text goes here.
<!-- #else -->
Alternate text.
<!-- #endif -->

This inserts the first conditional text only if some_condition is satisfied. The "<!-- #elif ... -->" and "<!-- #else -->" parts are optional. There may be multiple "#elif" lines.

In all the examples below the use of the equal-tilde string '=~' to indicate a matching regular expression can be replaced with the two characters '!~' in which case the if clause will be true when the regular expression fails to match.

Also in the examples of the form "<!-- #if accept file="foo" -->" the file foo is assumed to be relative to the current directory unless it begins with a '/', in which case it is taken relative to the WN data hierarchy root. The format of these files is a list of grep(1)-like regular expressions, one per line, with any white space being taken as part of the expression. Lines beginning with '#' are taken to be comments. If a regular expression is preceded by the character '!' then that character is skipped but the truth value of any matches with the expression is reversed.

The condition in the "#if" or "#elif" tags can be made more complex than those described above by combining simple conditions using the logical operations '&&' for 'and', '||' for 'or' and '!' for 'not'. Parentheses may be used for grouping. For example:

<!-- #if cond_1 && cond_2 -->
Text to show if cond_1 and cond_2 are satisfied.
<!-- #endif -->

Other examples are:

  

The '&&' and '||' operations have equal precedence and associate from right to left.
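Equal precedence with right-to-left grouping differs from the convention of C and most other languages, where '&&' binds more tightly than '||'. The small Python evaluator below is only a sketch of the grouping rule just stated, not WN's parser. The function name evaluate and the token-list representation are hypothetical.

```python
def evaluate(tokens, values):
    """Evaluate a token list such as ["a", "&&", "b", "||", "c"].

    '&&' and '||' have equal precedence and group from right to left;
    '!' applies to the operand that follows it; parentheses group.
    """
    pos = 0

    def operand():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "!":
            return not operand()
        if tok == "(":
            result = expression()
            pos += 1  # skip the closing ")"
            return result
        return values[tok]

    def expression():
        nonlocal pos
        left = operand()
        if pos < len(tokens) and tokens[pos] in ("&&", "||"):
            op = tokens[pos]
            pos += 1
            right = expression()  # everything to the right is grouped first
            return (left and right) if op == "&&" else (left or right)
        return left

    return expression()


if __name__ == "__main__":
    values = {"a": False, "b": True, "c": True}
    # Grouped as a && (b || c), which is false here.
    print(evaluate(["a", "&&", "b", "||", "c"], values))
    # Parentheses force (a && b) || c, which is true here.
    print(evaluate(["(", "a", "&&", "b", ")", "||", "c"], values))
```

When a condition mixes '&&' and '||', explicit parentheses avoid any dependence on this rule.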

C.1.1 `#if` and `#elif` Conditions

#if accept -- Match client's Accept headers

The lines:

<!-- #if accept =~ "regexp" -->

or:

<!-- #if accept file="foo" -->

specify that this text segment should be served if (in the first case) the grep(1)-like regular expression "regexp" matches any of the Accept headers supplied by the client, or (in the second case) if the file "foo" contains a regular expression matching any of the Accept headers.

#if accept_charset -- Match client's Accept-Charset headers

The lines:

<!-- #if accept_charset =~ "regexp" -->

or:

<!-- #if accept_charset file="foo" -->

specify that this text segment should be served if (in the first case) the grep(1)-like regular expression "regexp" matches any of the Accept-Charset headers supplied by the client, or (in the second case) if the file "foo" contains a regular expression matching any of the Accept-Charset headers.

#if accept_encoding -- Match client's Accept-Encoding headers

The lines:

<!-- #if accept_encoding =~ "regexp" -->

or:

<!-- #if accept_encoding file="foo" -->

specify that this text segment should be served if (in the first case) the grep(1)-like regular expression "regexp" matches any of the Accept-Encoding headers supplied by the client, or (in the second case) if the file "foo" contains a regular expression matching any of the Accept-Encoding headers.

#if accept_language -- Match client's Accept-Language headers

The lines:

<!-- #if accept_language =~ "regexp" -->

or:

<!-- #if accept_language file="foo" -->

specify that this text segment should be served if (in the first case) the grep(1)-like regular expression "regexp" matches any of the Accept-Language headers supplied by the client, or (in the second case) if the file "foo" contains a regular expression matching any of the Accept-Language headers.

#if after and #if before -- Select text based on date

The lines:

<!-- #if after "22 Oct 1996 17:41:26" -->

or:

<!-- #if before "22 Oct 1996 17:41:26" -->

specify that this text segment should be served if the current time is after (or before) the specified date. That is, the line:

<!-- #if after "22 Oct 1996 17:41:26" -->

will cause the text segment to be served only after "22 Oct 1996 17:41:26" local time. The date format is rather rigid. It must be in precisely the format shown above (specified by RFC 1123) and with a single space between each field. Only the local time of the server is supported.
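When preparing such a date it can help to check it against the expected layout. The Python sketch below parses the "22 Oct 1996 17:41:26" layout and compares it with a given local time. The names WN_DATE_FORMAT and is_after are hypothetical, and Python's strptime is somewhat more forgiving about white space than the rigid format described above, so this is only a rough check.

```python
from datetime import datetime

# Day, abbreviated month, four-digit year, then HH:MM:SS.
WN_DATE_FORMAT = "%d %b %Y %H:%M:%S"  # e.g. "22 Oct 1996 17:41:26"


def is_after(date_text, now=None):
    """True if the local time 'now' is later than date_text."""
    cutoff = datetime.strptime(date_text, WN_DATE_FORMAT)
    return (now or datetime.now()) > cutoff


if __name__ == "__main__":
    print(is_after("22 Oct 1996 17:41:26"))
    print(is_after("22 Oct 1996 17:41:26", datetime(1996, 10, 22, 17, 41, 25)))
```

A date that does not fit the layout raises ValueError, which makes malformed dates easy to spot before they go into a document.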

#if cookie -- Match client's Cookie headers

The lines:

or:

specify that this text segment should be served if the grep(1)-like regular expression "regexp" matches any of the Cookie headers supplied by the client.

More information about the proposed HTTP Set-Cookie header is available at http://home.netscape.com/newsref/std/cookie_spec.html.

#if environ VAR -- Match client's environment variable VAR

The lines:

or:

specify that this text segment should be served if the grep(1)-like regular expression "regexp" matches the contents of the server's environment variable VAR.

#if field -- Match document's user defined field

The lines:

<!-- #if field3 =~ "regexp" -->

or:

<!-- #if field3 file="foo" -->

specify that this text segment should be served if the grep(1)-like regular expression "regexp" matches the contents of the user defined field number 3 (in the first case) or if the file "foo" contains a matching regular expression (in the second case). Any valid field number may be used in place of 3.

#if hostname -- Match client's hostname

The lines:

<!-- #if hostname =~ "regexp" -->

or:

<!-- #if hostname file="foo" -->

specify that this text segment should be served if the grep(1)-like regular expression "regexp" matches the hostname of the client (in the first case) or if the file "foo" contains a matching regular expression (in the second case). For an alternate method of doing this see the "#if accessfile" syntax described below.

Be aware that the character '.' (dot) has a special meaning in regular expressions and must be escaped with a '\' to have its usual meaning.
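The effect of an unescaped dot is easy to demonstrate. The sketch below uses Python's re module, which is not the grep(1)-style matcher used by WN but treats '.' and '\.' in the same way. The hostnames are made up for the example.

```python
import re

# Unescaped: each '.' matches any single character.
unescaped = re.compile(r"www.example.com")
# Escaped: each '\.' matches only a literal dot.
escaped = re.compile(r"www\.example\.com")

if __name__ == "__main__":
    for host in ("www.example.com", "wwwXexample.com"):
        print(host, bool(unescaped.search(host)), bool(escaped.search(host)))
```

The unescaped pattern accepts both hostnames, while the escaped one accepts only the first, which matters when the expression is used to decide who sees a text segment.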

#if host_header -- Match server's virtual hostname from client's HTTP Host header

The line:

<!-- #if host_header =~ "regexp" -->

specifies that this text segment should be served if the grep(1)-like regular expression "regexp" matches the contents of the HTTP "Host:" header supplied by the client in its request.

Be aware that the character '.' (dot) has a special meaning in regular expressions and must be escaped with a '\' to have its usual meaning.

#if IP -- Match client's IP address

The lines:

<!-- #if IP =~ "regexp" -->

or:

<!-- #if IP file="foo" -->

specify that this text segment should be served if the grep(1)-like regular expression "regexp" matches the IP address of the client (in the first case) or if the file "foo" contains a matching regular expression (in the second case). For an alternate method of doing this see the "#if accessfile" syntax described below.

Be aware that the character '.' (dot) has a special meaning in regular expressions and must be escaped with a '\' to have its usual meaning.

#if language -- Match client's Accept-Language headers

The lines:

or:

#if query -- Match query string supplied in request URL

The lines:

<!-- #if query =~ "regexp" -->

or:

<!-- #if query file="foo" -->

specify that this text segment should be served if the grep(1)-like regular expression "regexp" matches the query string supplied by the client in the URL (in the first case) or if the file "foo" contains a matching regular expression (in the second case).

#if referer -- Match client supplied Referer: header

The lines:

<!-- #if referer =~ "regexp" -->

or:

<!-- #if referer file="foo" -->

specify that this text segment should be served if the grep(1)-like regular expression "regexp" matches the contents of the Referer: header supplied by the client (in the first case) or if the file "foo" contains a matching regular expression (in the second case). The Referer: header contains the URL of the document containing the link accessed to obtain the current document.

#if request -- Match client's request

The lines:

<!-- #if request =~ "regexp" -->

or:

<!-- #if request file="foo" -->

specify that this text segment should be served if the grep(1)-like regular expression "regexp" matches the full text of the request supplied by the client (in the first case) or if the file "foo" contains a matching regular expression (in the second case). The full request contains the "method" (GET or POST) followed by the URL requested with the "http://host" part having been removed (by the client).

#if TE -- Match client's TE header

The lines:

<!-- #if TE =~ "regexp" -->

or:

<!-- #if TE file="foo" -->

specify that this text segment should be served if (in the first case) the grep(1)-like regular expression "regexp" matches the TE header supplied by the client, or (in the second case) if the file "foo" contains a regular expression matching the TE header. The TE header specifies the transfer encoding.

#if true and #if false -- Include or exclude text segment

The line:

<!-- #if false -->

specifies that the corresponding text segment should not be served. It may be useful for "commenting out" a part of a document which is under construction. The "#if true" construct is present for logical completeness.

#if UA -- Match client's User-Agent: header

The lines:

<!-- #if UA =~ "regexp" -->

or:

<!-- #if UA file="foo" -->

specify that this text segment should be served if the grep(1)-like regular expression "regexp" matches the User-Agent: header supplied by the client (in the first case) or if the file "foo" contains a matching regular expression (in the second case).

C.1.2 Access Control Files

The normal access control files used by WN to limit access to a directory can also be used to conditionally permit or deny access to text segments.

#if accessfile="filename" -- Check access control file

The line:

<!-- #if accessfile="/dir/accessfile" -->

specifies that the file /dir/accessfile is to be used to determine access privileges (by hostname or IP address) for this text segment. The path /dir/accessfile is relative to the server root directory. If this path does not begin with a '/' then the path is relative to the directory containing the file with this text. See the chapter "Limiting Access to Your WN Hierarchy" in this guide.

C.2 Inserting the Contents of a File

#include -- Insert the contents of a file

The line:

<!-- #include -->

specifies that the contents of the next file listed in the "Includes=" or "Wrappers=" directive should be inserted at this point. It is permissible to add the name of this file, as in:

but this acts only as a comment. The actual file inserted depends only on the "Includes=" and "Wrappers=" directives in the index.wn file (or more precisely the index.cache file created from it).

#section -- Insert part of the contents of a file

The line:

<!-- #section -->

specifies that part of the contents of the next file listed in the "Includes=" or "Wrappers=" directive should be inserted at this point. It is permissible to add the name of this file, as in:

The part of the file actually included is that portion of the document between the special comments "<!-- #start -->" and "<!-- #end -->" inserted in that document. This requires that these starting and ending comments occur in the HTML document on lines by themselves. For more information see the section "More on Including: the section Marker" in this guide.

#start and #end -- Mark the beginning and end of text to be included

The lines:

<!-- #start -->

and:

<!-- #end -->

mark the beginning and end of the portion of the text to be inserted from an include or wrapper in response to encountering "<!-- #section -->" in the text of a document being parsed. There can be more than one "#start/#end" pair in a document. For more information see the section "More on Including: the section Marker" in this guide.

#title, #query, and #field -- Insert the title, current search string, or a user defined field

The lines:

<!-- #title -->
<!-- #query -->

or:

<!-- #field3 -->

in a parsed document instruct the server to include the title of the current document, the current search term from the client or the value of user defined "field #3" for the current document. All of these markers must occur on a line by themselves. For more information see the section "Including Title, Query, Fields and Environment Variables" in this guide.

#environ -- Insert the contents of an environment variable

The lines:

in a parsed document instructs the server to include the contents of the environment variable WHATEVER. Remember to use an "Attributes=parse" line when using this construct and to use an "Attributes=cgi" when it is a CGI variable like HTTP_REFERER which is to be included.

C.3 Conditional Redirecting

#redirect -- Redirect to a different URL

The line:

specifies that if no text has yet been sent the server should send an HTTP redirect to the given URL. This might be used as follows. If the text:

  

is included at the beginning of an HTML document then any request from a uk host will automatically be redirected to the specified URL, the UK_mirror_url in this case. This mechanism could also be used to redirect text only browsers to a text only alternative page, etc.

There must be no text sent before the "#redirect" line is encountered (not even blank lines) since the server cannot send an HTTP redirect while in the middle of transmitting a document. Thus the example above would be an error if there are any blank lines before the "#if hostname" line or any blank lines after it before the "#redirect" line. When such an error occurs it is logged in the error file and the "#redirect" line is ignored.

Note however that:

 [Lots of text here]   

is correct since when the #redirect line is encountered no text has been sent.

Normally the URL in the "#redirect" line is fully qualified, like "http://host/path/foo". However, it can also be simply "foo" referring to a file in the same directory as the file being parsed. In this case an HTTP redirection is not sent, and instead the file "foo" is returned immediately to the client.

Appendix D

CGI and other Environment Variables for the WN Server

This is a list of environment variables set by WN. These variables are set automatically when a CGI program is called. They will also be set for use by filters or server-side includes which execute files if an "Attribute=cgi" line is added to the file record of a document in the index.wn file.

D.1 Standard CGI Variables

This is a list of standard CGI/1.1 environment variables set by WN.

AUTH_TYPE: Contains the type of authentication being used to limit access to the current document. For example, "Basic" or "Digest".
CONTENT_LENGTH: Contains the length of information provided by a client POST. The length is an ASCII string representing the number of bytes.
CONTENT_TYPE: Contains the content type supplied by the client with POST data.
GATEWAY_INTERFACE: Contains "CGI/1.1".
HTTP_ACCEPT: Contains the contents of any "Accept:" headers supplied by the client.
HTTP_ACCEPT_CHARSET: Contains the contents of any "Accept-Charset:" headers supplied by the client.
HTTP_ACCEPT_LANGUAGE: Contains the contents of any "Accept-Language:" headers supplied by the client.
HTTP_COOKIE: Contains the contents of any "Cookie:" header supplied by the client.
HTTP_FROM: Contains the contents of the "From:" header supplied by the client. This may contain the e-mail address of the client user. This is generally unreliable, as usually users choose not to supply this information. If they do give it they can choose any e-mail address they want -- there is no guarantee that this is, in fact, the real e-mail address of the client user.
HTTP_HOST: Contains the contents of the "Host:" header supplied by the client. This should contain one of the aliases for the host on which the server is running. It should be the hostname from the URL that the client is requesting. Thus a client seeking "http://www.serverhost.com:8000/foo.html" should supply "www.serverhost.com" in this header. Many browsers do not do this. It is required in HTTP/1.1.
HTTP_RANGE: Contains the contents of any "Range:" header supplied by the client.
HTTP_REFERER: Contains the contents of the "Referer:" header supplied by the client. This is supposed to contain the URL of the document from which a link to this program was activated.
HTTP_USER_AGENT: Contains the contents of the "User-Agent:" header supplied by the client. This is supposed to contain the name of the browser which the client is using to make this request.
PATH_INFO: Extra information in the "path" of a CGI URL. More precisely everything after the actual name of the program. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "/stuff1/stuff2/stuff3" in the PATH_INFO variable.
PATH_TRANSLATED: This assumes that PATH_INFO is the path of a file relative to your WN root and provides the path relative to the system root. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "wnroot/stuff1/stuff2/stuff3" in the PATH_TRANSLATED variable.
QUERY_STRING: The contents of the "query" part of an HTTP/1.1 GET request. This is whatever comes after the '?' in the URL. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3?dingbat" would have "dingbat" in the QUERY_STRING variable. If the request was generated by an HTML "<form>" this contains all the form information encoded as "name1=value1&name2=value2...".
REMOTE_ADDR: Contains the IP address of the client or proxy making the request.
REMOTE_HOST: Contains the hostname of the client or proxy making the request, if the server can determine it. Otherwise it contains the IP address. If "#define NO_DNS_HOSTNAMES" is defined in config.h before compilation then no attempt will be made to determine the hostname and REMOTE_HOST will contain the IP address. (This would reduce the load on the server.)
REMOTE_IDENT: Contains the user name supplied by an RFC 931 identd(8) server on the client (if there is one and you have compiled the server with the "#define RFC931_TIMEOUT" macro enabled in the config.h file).
REQUEST_METHOD: Contains either "GET", "POST" or "PUT", depending on the method of the client request.
REMOTE_USER: If the CGI program is password protected this will contain the user name provided by the client.
SCRIPT_FILENAME: The name of the CGI program being executed and its path relative to the system root. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "wnroot/dir/foo.cgi" in the SCRIPT_FILENAME variable.
SCRIPT_NAME: The name of the CGI program being executed and its path relative to the WN root data directory. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "/dir/foo.cgi" in the SCRIPT_NAME variable.
SERVER_NAME: Contains the name of the "virtual host" currently being accessed, if virtual hosting is in use. Otherwise it contains the name of the host on which the server is running. The value of this variable will be identical to the value of HTTP_HOST if the client supports the "Host:" header. Otherwise it will be determined by the IP address to which the request was made and the corresponding entry in the virtual hosts table.
SERVER_PORT: Contains the port on which the server is running.
SERVER_PROTOCOL: Contains "HTTP/0.9", "HTTP/1.0" or "HTTP/1.1" depending on which protocol the client is using.
SERVER_SOFTWARE: Contains "WN <version>". For example, "WN 2.0.0". Recall that version numbers of the form "a.b.c" with 'b' even are intended to be stable releases while 'b' odd indicates a release with new and less tested features.
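
As an illustration of how a CGI program might use the variables listed above, here is a minimal sketch in Python. It is not part of the WN distribution. The function names summarize_request and render_page are invented for this example, and the standard library function parse_qs is used to decode form data of the "name1=value1&name2=value2" kind.

```python
import os
from urllib.parse import parse_qs


def summarize_request(environ=os.environ):
    """Collect a few CGI/1.1 variables into a dictionary.

    Missing variables fall back to harmless defaults so the
    function can also be called outside of a CGI environment.
    """
    query = environ.get("QUERY_STRING", "")
    return {
        "method": environ.get("REQUEST_METHOD", "GET"),
        "script": environ.get("SCRIPT_NAME", ""),
        "path_info": environ.get("PATH_INFO", ""),
        # parse_qs maps each form field name to a list of values
        "form": parse_qs(query),
    }


def render_page(environ=os.environ):
    """Build a plain text CGI response describing the request."""
    info = summarize_request(environ)
    lines = ["Content-type: text/plain", ""]
    for key in ("method", "script", "path_info", "form"):
        lines.append(f"{key}: {info[key]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_page(), end="")
```

A query string such as "dingbat", which contains no "name=value" pairs, produces an empty "form" dictionary in this sketch. A program that wants the raw text should read QUERY_STRING directly.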

D.2 WN Specific CGI Variables

The following are not standard CGI/1.1 environment variables but are provided by the WN server when CGI variables are set:

DOCUMENT_ROOT: Contains the path to your WN root data directory relative to the system root. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "wnroot" in the DOCUMENT_ROOT variable.
HTTP_POST_FILE: Contains the path to the temporary file containing the data sent by the client via the POST method. This file is removed after the CGI program terminates. A CGI program can move it or copy it.
HTTP_PUT_FILE: Contains the path to the temporary file containing the data sent by the client via the PUT method. This file is removed after the CGI program terminates. A CGI program can move it or copy it.
HTTP_VIA: Contains the contents of any "Via" header supplied by the client.
HTTP_X_FORWARDED_FOR: Contains the contents of any "X-Forwarded-For" header supplied by the client.
URL_SCHEME: Contains "http" normally or "https" in case the server has been modified to use the Secure Sockets Layer (SSL) protocol.
WN_DIR_PATH: Contains the path to the directory containing the program relative to the system root. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "wnroot/dir" in the WN_DIR_PATH variable.
WN_ROOT: Deprecated. See DOCUMENT_ROOT.
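
The following sketch suggests how a CGI program written in Python might pick up POST data through HTTP_POST_FILE. It assumes only what is stated above, namely that the variable holds the path of a temporary file containing the data sent by the client. The function name read_post_data is invented for this example.

```python
import os


def read_post_data(environ=os.environ):
    """Return the bytes the client sent via POST, or b"" if none.

    HTTP_POST_FILE is expected to name a temporary file which the
    server removes after the CGI program terminates, so the data
    must be read (or the file copied) before the program exits.
    """
    path = environ.get("HTTP_POST_FILE")
    if not path:
        return b""
    with open(path, "rb") as handle:
        return handle.read()
```

The same approach should work for HTTP_PUT_FILE by changing the variable name.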

D.3 WN Specific Database Variables

One further environment variable is used by WN in conjunction with external databases.

WN_KEY: WN_KEY is used in conjunction with external databases and either "Cache-Module=" or "File-Module=" directory directives. For more details on these see the chapter "WN Auxiliary Modules" in this guide.

Appendix E

Creating User Home Pages for the WN Server

This document is for the ordinary user, not the system "webmaster". It is intended as a short description of how to use the WN server to set up your home page (and related documents if you wish). It is not a primer on writing HTML. There are many of those -- some are listed below. Instead it deals with what to do with a home page document after it is created to put it "on the Web". We assume that your system administrator has already set up the WN server in such a way that users are permitted to have home pages in their own directories. If that isn't the case you will have to persuade your local administrator to do so or read the full "User's Guide for the WN Server" to learn how to do it yourself.

E.1 Getting Started

But let's be optimistic and suppose everything is set up and working properly. If you haven't created a home page here is a bare bones one you can copy to expand on later:

<!doctype html public "-//W3C//DTD HTML 3.2 Final//EN">
<head>
<title>John Doe's Home Page</title>
</head>
<body>
<h1>John Doe's Home Page</h1>
<p>
This is the story of my life...
</p>
</body>

Your home page and any other files you want to serve must be located in a particular subdirectory of your home directory (your home directory is where you are when you first login) on the UNIX system which is running the WN server. Usually this subdirectory is called public_html, but your system webmaster may have chosen something different. We'll assume the name is public_html.

It probably doesn't yet exist in your home directory. If not create it with the command:

mkdir public_html<Enter>

The public_html directory also must have the correct "permissions" so that the server can look inside it. For security reasons the WN server isn't very powerful and it can only access files and directories which are not heavily protected. Just to make sure the server won't have trouble accessing your home page run the command:

chmod 755 public_html<Enter>

You must place a copy of your home page in this directory and give it the name index.html. One way to do this is enter the public_html directory and use an editor to create the home page file there with the name index.html. Alternatively, you could create it elsewhere and copy it to the public_html directory. If you need to rename it the proper UNIX command is:

mv oldname newname<Enter>

Of course the server must also be able to read your home page in order to serve it. If documents you create are automatically protected from viewing by others you will have to change that for your home page. The command:

chmod 644 index.html<Enter>

gives anyone (including the server) permission to read but not change your home page.
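
If you would like to double check the permissions set by the chmod commands above, a small Python sketch like the following can do it. It is only an illustration: the function names are invented, and it assumes the server reaches your files through the "other" permission bits, which is what modes 755 and 644 grant.

```python
import os
import stat


def world_readable_file(path):
    """True if path is a regular file that anyone may read."""
    mode = os.stat(path).st_mode
    return stat.S_ISREG(mode) and bool(mode & stat.S_IROTH)


def world_searchable_dir(path):
    """True if path is a directory that anyone may list and enter."""
    mode = os.stat(path).st_mode
    needed = stat.S_IROTH | stat.S_IXOTH
    return stat.S_ISDIR(mode) and (mode & needed) == needed
```

For example, world_searchable_dir("public_html") and world_readable_file("public_html/index.html") should both be true after the two chmod commands have been run from your home directory.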

E.2 Granting Access to Your Home Page

There is one more step before your home page is available to the world. The WN server is very security conscious. It takes great care not to serve documents which users did not intend to be served. This means that in addition to serving only documents in designated directories (such as your "public_html") it will only serve documents which have been explicitly designated for serving in some way.

Here are two different ways to grant the server permission to serve your home page. You only need to do one of these.

E.2.1 A Lazy Way

Create a file in the public_html directory named index.wn which contains the line:

Attribute=serveall

Then execute the command:

wndex<Enter>

which will create the file "index.cache". It grants the server permission to serve anything in this directory. If you put documents to be served in subdirectories of public_html, copy the file index.wn to them also and rerun wndex in these subdirectories to grant permission to serve anything from them (actually the file "index.wn" itself is excluded from being served as are files beginning with the '.' character).

E.2.2 A Better Way

Create a file in the public_html directory named index.wn which contains the line:

File=index.html

If you have other files you want to serve add additional lines like "File=foo.html" where foo.html is the name of the file you want to serve. Then execute the command:

wndex<Enter>

Files which are HTML files should have names which end with the suffix ".html". Files which are ASCII text files should end with ".txt" and PNG image files should end with ".png". Executing the "wndex" (pronounced "windex") command will create the file "index.cache" as before, but now the server will only have permission to serve those files whose names are listed explicitly in your index.wn file.

If you put documents to be served in subdirectories of public_html, you must create an index.wn file in each of them also which lists the files which can be served.

The "better way", in addition to providing better security, also enables many of the advanced features of the WN server, such as searching capabilities. In the long run if you serve multiple documents you will probably want to use it. The trade off is that the "lazy" way is easier since it only needs to be done once, while the "better" way requires that you edit the index.wn file and rerun wndex each time you add a new file to public_html to be served.

E.3 What's Your URL?

So once your home page is functional, how do you access it? Well, your home page address (called a URL which stands for "Uniform Resource Locator") is made up from the name of your host and your user name. For example, if your user name is jdoe and the computer on which the server is running is named www.school.edu then the URL for your home page is probably:

http://www.school.edu/~jdoe/

The "probably" refers to the fact that there are many possible variations which may have been chosen by your webmaster. For example, some people don't like the '~' character in URL's so they might use "http://www.school.edu/homepages/jdoe/". In any case there are a couple of things to notice about your URL.

First it has a synonym:

http://www.school.edu/~jdoe/index.html

This is because whenever a URL ends in '/' the server knows it is referring to a directory (in this case your public_html directory) and looks for the default file "index.html" in that directory. Thus if you want to put some files in subdirectories of public_html that is fine. But keep in mind that if you create the directory "mydir" in public_html then "http://www.school.edu/~jdoe/mydir/" will really refer to "http://www.school.edu/~jdoe/mydir/index.html" and to refer to a file "foo.png" in mydir you should use "http://www.school.edu/~jdoe/mydir/foo.png". You should not use the URL "http://www.school.edu/~jdoe/mydir" referring to a directory but with no '/' at the end.
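
The rule just described can be sketched in a few lines of Python. This is only a model of the explanation above, not the server's own code. The function name user_url_to_file is invented, and the home directory "/home/jdoe" used in the usage note is a hypothetical value.

```python
import posixpath


def user_url_to_file(url_path, home_dir, pub_html="public_html",
                     index_name="index.html", tilde="/~"):
    """Map the path part of a user URL to (user name, file path).

    A path that names only the user, or that ends in '/', refers
    to a directory, so the default file name is appended.
    """
    if not url_path.startswith(tilde):
        raise ValueError("not a user URL: " + url_path)
    rest = url_path[len(tilde):]
    user, _, relative = rest.partition("/")
    if relative == "" or relative.endswith("/"):
        relative += index_name
    return user, posixpath.join(home_dir, pub_html, relative)
```

With home_dir set to "/home/jdoe", the path "/~jdoe/mydir/" maps to "/home/jdoe/public_html/mydir/index.html" while "/~jdoe/mydir/foo.png" maps to "/home/jdoe/public_html/mydir/foo.png".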

E.4 Advanced Features

There are many advanced features you can use with the WN server. To learn to use them you will need to read parts of the manual designed for system administrators. A good place to start is the chapter "An Overview of the WN Server" in this guide. It will give you a good idea of some of the things the server can do. Then you will likely also want to consult the chapter "Creating Your WN Data Directory" in this guide. Neither of these chapters is too long.

E.5 HTML References

To find out more about composing HTML documents consult "A Beginner's Guide to HTML" (at http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html) or see "Creating Web Sites" (at http://home.netscape.com/browsers/createsites/index.html). "http://www.htmlhelp.com" is an excellent site on the various web authoring standards.

Appendix F

MD5 Digests of Recent WN Releases

These are the MD5 digests of recent releases of WN. The MD5 digesting program is available from "ftp://ftp.cert.org/pub/tools/md5/". My PGP public key is available from my home page.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

MD5 (wn-2.3.8.tar.gz) = 95c9fc262a11d165c8a682aa0eddc9fd
MD5 (wn-2.3.7.tar.gz) = 011a46ce203c35b57d56078eacb984d4
MD5 (wn-2.3.6.tar.gz) = 8e572e06acfe6a1cdb3e326916b13ed9
MD5 (wn-2.3.5.tar.gz) = 32627fc73fbf2bc86cbed45748c720b2
MD5 (wn-2.3.4.tar.gz) = 5af9cc1497563819c33c095aadc03be2
MD5 (wn-2.3.3.tar.gz) = bf81789ddf326f7df40085961d48fab2
MD5 (wn-2.3.2.tar.gz) = bccbbc6651fb148c536c0184f1735a05
MD5 (wn-2.3.1.tar.gz) = a118459246a479aa9ac2ba3823c811fd
MD5 (wn-2.3.0.tar.gz) = 46a240cb6bca73302fcf16161c17f5c4

MD5 (wn-2.2.9.tar.gz) = 4219ac1c9b6511fd239e2f5b9464c9c6
MD5 (wn-2.2.8.tar.gz) = cfe5d853501b828ab593887a3ec519d8
MD5 (wn-2.2.7.tar.gz) = d864acabb2777b89932824e02f3a733c
MD5 (wn-2.2.6.tar.gz) = 248695156c29f86dcafb4fc9183b605a
MD5 (wn-2.2.5.tar.gz) = 91d9a527256d090578d54f76efaf6cfb
MD5 (wn-2.2.4.tar.gz) = 6c43b9a231bc8d8f9f021e30817f5d26
MD5 (wn-2.2.3.tar.gz) = f6c1e3b6ec4c78c0679c441eb04c68de
MD5 (wn-2.2.2.tar.gz) = 63167df9366de648a633b35f13caf395
MD5 (wn-2.2.1.tar.gz) = a622801c8a0dcce91b91f8bde81b94f5
MD5 (wn-2.2.0.tar.gz) = 8d9081585d9b689d8c394d592daf2698

MD5 (wn-2.1.7.tar.gz) = 9348438468529e38b2f6ea313d89d4eb
MD5 (wn-2.1.6.tar.gz) = 63d9523aee8808b272ef3d49b5bab830
MD5 (wn-2.1.5.tar.gz) = 30821b8e63e053e73c924ab46f082728
MD5 (wn-2.1.4.tar.gz) = 9c3a0c39f188c4b06e1ca41385a28317
MD5 (wn-2.1.3.tar.gz) = 9dd9e00d9e1627f215c3e4b13f3b76af
MD5 (wn-2.1.2.tar.gz) = 5af0446f734dd180e2f2da66125c82fd
MD5 (wn-2.1.1.tar.gz) = 84f34b550d0026c6cfa75ce20f6c4e01
MD5 (wn-2.1.0.tar.gz) = f44adec2a03294775af1597fea0321c6

MD5 (wn-2.0.11.tar.gz) = 4a3c5cafe5b30b284d759d43fcb9c689
MD5 (wn-2.0.10.tar.gz) = 44583c7ecae9486cb180ac4b39254cd7
MD5 (wn-2.0.9.tar.gz) = 5fadecb3d64db2a54e40674c06752a13
MD5 (wn-2.0.8.tar.gz) = d0eeec9de4fdfa479bb35d3172d45c9f
MD5 (wn-2.0.7.tar.gz) = 9b4fc7ccf2139563c0de2b654de54384
MD5 (wn-2.0.6.tar.gz) = b1f9505b95f753671c276e0781f501ab *
MD5 (wn-2.0.6.tar.gz) = 1a9f1e43faaad3b7beba3509c35ed999 *
MD5 (wn-2.0.5.tar.gz) = 6d5fb3d7a04aaed2f80a30f20e71c99b
MD5 (wn-2.0.4.tar.gz) = 58ef80d1d49917e88f145cbb727bc79e
MD5 (wn-2.0.3.tar.gz) = 1d2928d67bc56a58449f8feea33a89e2
MD5 (wn-2.0.2.tar.gz) = e9c4edf7535a9465c1f4fee12fb98c85 
MD5 (wn-2.0.1.tar.gz) = dfa7d8d9bf83006d04901c494e915ed1
MD5 (wn-2.0.0.tar.gz) = 2fe3889850566ea82decc55729cb1ae4

MD5 (wn-1.19.9.tar.gz) = c92e4009f59738ae521c3fc1a63c8f30
MD5 (wn-1.19.8.tar.gz) = 1544eedfbf54121879bdc51ae9b7bd68
MD5 (wn-1.19.7.tar.gz) = c3233797269691b6289eee0598f8217f
MD5 (wn-1.19.6.tar.gz) = 265453d433ca5c65e0c31ff6d9ac0f67
MD5 (wn-1.19.5.tar.gz) = 4b6b87136758624d41c2b5010a50d13e 
MD5 (wn-1.19.4.tar.gz) = 5caedab5efdf43b42745b654f64e83ed
MD5 (wn-1.19.3.tar.gz) = 96c34cad1d8d18b0296dd097740ce2a7
MD5 (wn-1.19.2.tar.gz) = bc052237a751b7225b725f97fa028954
MD5 (wn-1.19.1.tar.gz) = a24d81c0f1e11c5985607c89994da0b0
MD5 (wn-1.19.0.tar.gz) = 0ac7d1bb18086b2348a36180766013c3

MD5 (wn-1.18.7.tar.gz) = 8dfacb70b044e206cebc4271bef01cdc
MD5 (wn-1.18.6.tar.gz) = 90a8352e383462ac5f68852f80fe854d
MD5 (wn-1.18.5.tar.gz) = c8a55826e453eb17b6a72e064dd01987
MD5 (wn-1.18.4.tar.gz) = f7d1807be26031063300c3591281384c
MD5 (wn-1.18.3.tar.gz) = 0f61cf0c7de4bd1d518fe18723dcf2b2
MD5 (wn-1.18.2.tar.gz) = 3e89198e22dcfd0513b26389fc3cb022
MD5 (wn-1.18.1.tar.gz) = 63f5e294cb3c297bcda621585cb0be94
MD5 (wn-1.18.0.tar.gz) = 19aa0a483d4c4d7ead4198d98b8d0e76

(*) There were two versions of 2.0.6 differing in trivial ways.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE5NdM8muNM+ZEudwYRAiCIAJ9XK/sZ7xhs/zBO0t+48sw6v1umYQCdGvfl
ySSq/zYfVcMr3D9XJfz/lqg=
=ni83
-----END PGP SIGNATURE-----
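
If no MD5 program is at hand, a downloaded archive can be compared with a line from the list above using the Python standard library. The sketch below is an illustration only. The function names are invented, and it checks the digest alone; it does not verify the PGP signature.

```python
import hashlib
import re

# Matches lines such as "MD5 (wn-2.3.8.tar.gz) = 95c9fc26..."
DIGEST_LINE = re.compile(
    r"^MD5 \((?P<name>[^)]+)\) = (?P<digest>[0-9a-f]{32})")


def parse_digest_line(line):
    """Return (file name, digest) from a published line, or None."""
    match = DIGEST_LINE.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("digest")


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, line):
    """True if the file at path has the digest given in line."""
    parsed = parse_digest_line(line)
    return parsed is not None and md5_of_file(path) == parsed[1]
```

For example, matches_published("wn-2.3.8.tar.gz", "MD5 (wn-2.3.8.tar.gz) = 95c9fc262a11d165c8a682aa0eddc9fd") is true only when the downloaded file has exactly the published digest.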

Appendix G

Configuration Macros for WN

This section summarizes some of the important UNIX cpp(1) utility macros defined in the file config.h.dist that control the configuration of WN.

`#define ACCESS_DENIED_URL`

Any request for a document in a directory which is denied because of an "Accessfile=" restriction should be redirected to the given URL. This can be overridden with the "Access-denied-URL=" directive.

The default is to give this the value "" which disables the feature.

`#define AUTH_DENIED_FILE`

This is the document returned when password authentication fails because of an incorrect password or username.

The default is to give this the value "" which disables the feature.

`#define CGI_BIN`

Directory name to identify CGI programs.

The default is "cgi-bin".

`#define CGI_EXT`

File extension to identify CGI programs.

The default is ".cgi".

`#define DEFAULT_PORT`

Default port on which to run the stand-alone server wnsd. This can be overridden at runtime by using the -p command line option.

The default is (80).

`#define DEFAULT_CHARSET`

Default character set to use in the Content-type header for text/* MIME types.

The default is iso-8859-1.

`#define BUILTIN_CONTENT_TYPE`

This is the MIME Content-type header value used for documents generated by the server, such as error messages or search results.

The default is "text/html; charset=iso-8859-1".

`#define DEFAULT_URI`

This is the document returned in response to a request with only the hostname, (something like "http://hostname.edu/" with no file name at the end). You would need to change it, for example, if you wanted to have the default server response be to run a CGI program.

The default is "/index.html".

`#define FORBID_CGI`

Define this if you wish to forbid the use of CGI/1.1 programs on your host. This will enhance security but will, of course, decrease functionality. See the chapter "Using CGI Programs on the WN Server" in this guide.

The default is to comment out this macro.

`#define GROUPID`

Same as for "#define USERID" but for the group id.

`#define INDEXFILE_NAME`

Default file name if the requested URL does not include a file name.

The default is "index.html".

`#define LEAST_UID`

Minimum allowable uid for users with personal home directories.

The default is (100).

`#define MIME_TYPE_FILE`

The full path name of the file containing information permitting wndex to translate the MIME "Content-type" from the suffix of a file name. If this file is not present, running wndex with the -v option will issue a warning but use internal default values. The file exists so that you can add to it if you wish to add new kinds of documents to your server. The format of the file is explained in the file. The default version of the file is in /lib/mime.types. The internal defaults are the same as what is currently in this file.

The default is "/usr/local/lib/wn/mime.types".

`#define NO_DNS_HOSTNAMES`

If this is set to TRUE the server will not use hostnames in your log file, just IP addresses. This will reduce the load on your server (but probably not speed up responses since the lookups usually take place after the transaction is complete). Keep in mind that setting this TRUE will mean that none of your CGI programs will get the hostname and also that your access files cannot have hostnames in them, just IP addresses.

The default value is FALSE.

`#define NO_SERVEALL`

Set this to TRUE to disable the server's ability to serve documents not listed in an index.cache file. See the chapter "Creating Your WN Data Directory" in this guide for more details.

The default value is FALSE.

`#define NO_SUCH_FILE_URL`

Specifies that any request in this directory for a non-existent file or a file not listed in the index.wn file of this directory should be redirected to the given URL. Can be overridden with the "No-Such-File-URL=" directory directive in the index.wn file.

The default is to give this the value "" which disables the feature.

`#define PUB_HTML`

Name of public directory for home directories. See the section "Alternate Hierarchies From a Table Lookup" in this guide for more details.

The default is "/public_html".

`#define RFC931_TIMEOUT`

Timeout in seconds for the server when attempting to do RFC 931 lookups on clients using the UNIX identd(8) system utility. This may be good for additional logging information, but should not be trusted for authentication. Comment out if no lookup is desired.

Because of bugs in many PC TCP/IP stacks that may cause identification connections to time out instead of being refused (certain firewall configurations may cause the same problem) this defaults to off, and the timeout should be kept low. Ten seconds is about as high as I recommend. Remember, this timeout will need to be fulfilled on every request from that client, so pages with lots of in-line images will become painful.

The default is to comment out this macro.

`#define ROOT_DIR`

Complete path of your root data directory.

The default is "/usr/local/wn".

`#define SWN_PID_FILE`

Full path of the file in which you wish the stand-alone server wnsd to deposit its process id on startup. Comment this out or set it to the empty string "" if you want the pid printed to the UNIX stdout(3) stream on startup instead of being placed in a file. This can be overridden at runtime by using the -q command line option.

The default is "/usr/local/wnlogs/wn.pid".

`#define TILDE_TABLE`

If defined to the path of a file containing name:directory pairs then URLs starting with "TILDE_USER_STRINGname/foo" will be changed to /foo and the root directory will be the directory after the ':' in the table. See the section "Alternate Hierarchies From a Table Lookup" in this guide for more details.

The default is to comment out this macro.
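
The table lookup described for TILDE_TABLE can be modelled with the following Python sketch. It is an illustration of the description above and not the server's implementation. It assumes one name:directory pair per line, the function names are invented, and the directory "/home/jdoe/web" in the usage note is hypothetical.

```python
def parse_tilde_table(text):
    """Turn lines of name:directory pairs into a dictionary."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        name, directory = line.split(":", 1)
        table[name] = directory
    return table


def resolve_tilde_url(url_path, table, tilde_user_string="/~"):
    """Return (root directory, rewritten path) or None if no match."""
    if not url_path.startswith(tilde_user_string):
        return None
    rest = url_path[len(tilde_user_string):]
    name, _, remainder = rest.partition("/")
    if name not in table:
        return None
    return table[name], "/" + remainder
```

With a table containing the line "jdoe:/home/jdoe/web", the path "/~jdoe/foo" resolves to the root directory "/home/jdoe/web" and the rewritten path "/foo".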

`#define TILDE_USER_PWFILE`

If defined then the user's name will be looked up in the password file and the corresponding home directory with the value of "#define PUB_HTML" appended will be the root directory. This will fail if the user's uid is less than the value of "#define LEAST_UID". See the section "Hierarchies in User Home Directories" in this guide for more details.

The default is to comment out this macro.

`#define TILDE_USER_STRING`

URL indicating a user's home page. You can also use something like "/homepages/" or "/people/" for this but use "/~" to make "http://hostname/~user/" work. This appears to be a de facto standard. See the section "Hierarchies in User Home Directories" in this guide for more details.

The default is "/~".

`#define USE_VIRTUAL_HOSTS`

If defined to the full path to the file of virtual hosts, wnsd will support multi-homed or virtual hosts as described in the chapter "Multi-homed or Virtual Servers on the WN Server" in this guide.

The default is to comment out this macro.

`#define USERID`

The user id you want wnd to switch to when running under the UNIX inetd(8) system utility or when wnsd is run by root. When running under inetd(8) it should agree with the uid you set in the UNIX inetd.conf(5) configuration file.

The default depends on your system type but is generally something like nobody.

`#define VERBOSELOG`

By default WN uses the "Common Log Format" used by CERN httpd and NCSA httpd. Uncommenting this will cause additional information to be added to the end of each log line. In particular the User-Agent and Referrer are logged.

Note: I have chosen to use the correct spelling of Referrer in the log file. To enable interoperability and to comply with the CGI/1.1 specification it is necessary elsewhere to use the incorrect spelling Referer and the CGI environment variable HTTP_REFERER.

This can be overridden at runtime by using the -v command line option.

The default is to comment out this macro.

`#define VIRTUAL_HOSTS_FILE`

Full path to virtual hosts file. See the section "Using a Virtual Hosts List File" in this guide.

The default is to comment out this macro.

`#define WN_ERRLOGFILE`

Same as for "#define WN_LOGFILE" except for error messages. To override use the -l option.

The default is "/usr/local/wnlogs/wn.log.err".

`#define WN_HOSTNAME`

If the default value of "" is used WN will get the default hostname from your system. If you supply a name it should be a fully qualified domain name. WN will attempt to use the UNIX gethostbyname(3) system call to find the IP address of your server.

The default value is "".

`#define WN_LOGFILE`

Full path of the file you want to use for a log file. The value set here is overridden by the -L option. To use the UNIX syslogd(8) system utility run wnsd or wnd with the -S command line option. To do no logging set this macro to "", the empty string.

The default is "/usr/local/wnlogs/wn.log".

`#define WN_PAM_ENABLED`

Uncomment this to enable PAM based authentication if your system supports PAM (Pluggable Authentication Modules). This support exists on Linux and Solaris at least.

Appendix H

Appendix I

How the Index to WN User's Guide Server Works

The file docindex.html was produced by the WN utility wnindexmaker. wnindexmaker produces a collection of HTML unordered lists with entries like:

<li> <a href="search=context?content-type">content-type</a>

This is an anchor to a relative URL. The complete URL would be:

http://host/docs/search=context?content-type

which asks the server to do a context search of the 25 files in the User's Guide with query term "content-type". When this is selected the server returns a list of all files in the directory with matches, with a sublist for each file consisting of the lines that contain a match. Selecting one of the highlighted search terms in a match line causes the server to return the document positioned at the location of the matched item. This is part of all context searches. It is done by inserting an anchor on the fly at the location of the match.
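
To make the relationship between the relative anchor and the complete URL concrete, here is a small Python sketch using the standard urllib.parse module. The function names are invented, the location "http://host/docs/docindex.html" of the index page is an assumption, and the percent-encoding applied to unusual characters in a term is a choice made for this example rather than something wnindexmaker is known to do.

```python
from urllib.parse import quote, urljoin


def context_search_link(term):
    """Return a relative URL requesting a context search for term."""
    return "search=context?" + quote(term)


def absolute_url(page_url, relative):
    """Resolve a relative link against the page that contains it."""
    return urljoin(page_url, relative)
```

Resolving context_search_link("content-type") against "http://host/docs/docindex.html" yields "http://host/docs/search=context?content-type", the complete URL shown above.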

Appendix J

Colophon for the WN User's Guide

colophon n. An inscription placed usually at the end of a book, giving facts about its publication. [< Gk. kolophon, finishing touch.] - The American Heritage dictionary

The User's Guide for the WN Server was written by John Franks who is also the author of the WN software. Editorial changes and cleanup of the HTML source for the User's Guide were contributed by Jean Pierre LeJacq. This colophon describes the approach and tools used to write and validate the HTML source code.

I.1 Goals

We had the following goals for the HTML source code:

I.1.1 Valid HTML

WN emphasizes adherence to Internet standards such as HTML and HTTP. Basically, we want to practice what we preach in the documentation for WN.

The User's Guide was written using HTML 3.2 instead of the more recent HTML 4.0 since the former is currently better supported in both textual and graphical browsers.

I.1.2 HTML That Follows Good Style

As with any language, valid HTML can be written in a way that makes it difficult to maintain, non-portable and against the basic principles HTML embodies.

As an example of the latter, we consider HTML primarily as a language that describes the semantic structure and not the visual representation of documents. Consequently, only minor use of HTML tags was made for formatting (e.g. centering of a document's title). Even these minor formatting decisions could be eliminated through the use of Style Sheets. However, these are currently not well supported by many browsers so they were not employed.

As an example of a more minor portability issue, we check that there is no intervening white space in the start of an anchor, "<a>", tag. Browsers are inconsistent in how they display this extra white space.

I.1.3 Extensive Use of Valid Links

One of the great strengths of HTML is the linking of widely dispersed documents to create a seamless virtual document. We have provided extensive links to relevant standards, manual pages and sites to maximize this advantage. The User's Guide itself is extensively cross-linked. We also wanted to ensure that the links were valid.

I.1.4 Example of Well Written HTML

One of the unfortunate consequences of the popularity of the Internet is the low quality of many HTML documents. Since learning by example is so important, we wanted to provide WN users an example of well written HTML.

I.2 Tools

I.2.1 Operating Systems

I.2.1.1 Linux

All of the editing and validation of the User's Guide was done using the Linux Operating System developed primarily by Linus Torvalds. We used the Debian GNU/Linux distribution developed by many volunteers on the Internet. Much of the editing was done as part of the packaging of WN for Debian.

I.2.2 Editors

I.2.2.1 Xemacs Editor

All of the editing of the User's Guide was done using the Xemacs text editor. XEmacs stems from a collaboration of Lucid, Inc. with Sun Microsystems, Inc. and the University of Illinois with additional support having been provided by Amdahl Corporation, Altrasoft and a huge amount of volunteer effort.

Much of the power of Xemacs is based on the lisp packages that are either part of the core Xemacs distribution or add-ons. We cannot list all of the packages but we describe below several that are particularly important for editing HTML documents.

I.2.2.1.1 Emacs `hm--html-menus` Mode

The hm--html-menus package provides functions and various popup and pulldown menus for an HTML mode called hm--html-mode, a mode for writing HTML pages. The primary author is Heiko Muenkel. It also provides a minor mode, hm--html-minor-mode, which can be used together with another HTML major mode, like the psgml-html mode, which is exactly what we did.

I.2.2.1.2 Emacs `psgml` Mode

psgml is a major mode for editing SGML documents. psgml contains a simple SGML parser and can work with any DTD. Functions provided include menus and commands for inserting tags with only the contextually valid tags, identification of structural errors, editing of attribute values in a separate window with information about types and defaults, and structure based editing. It also provides an interface to an external SGML parser, such as SP, to rigorously validate a file.

psgml is currently maintained by Lennart Staflin.

I.2.2.1.3 Emacs/W3 Browser

Emacs/W3 is a full-featured web browser, written entirely in Emacs-Lisp, that supports all the bells and whistles you will find in use on the web today, including frames, tables, stylesheets, and much more. We used it as a quick validation of the basic presentation of the pages.

I.2.3 Validators

I.2.3.1 SP SGML Parser

SP is a free object-oriented toolkit for SGML parsing and entity management developed by James Clark. It includes the nsgmls program which parses and validates the SGML documents whose document entity is specified by the system identifiers sysid.

The Emacs psgml mode supports invoking nsgmls and parsing its output. We used the following sysid at the beginning of all HTML documents:

<!doctype html public "-//W3C//DTD HTML 3.2 Final//EN">

I.2.3.2 Weblint HTML Style Checker

Weblint is a Perl program for syntax and minimal style checking of HTML developed by Neil Bowers. We use it in conjunction with SP to prevent the use of valid HTML that is either difficult to maintain or poorly supported by current browsers.

I.2.3.3 Linbot HTML Link Checker

linbot is a Python program that allows webmasters to: view the structure of a site; track down broken links; find potentially outdated HTML pages; list links pointing to external sites; view a portfolio of inline images; and do all this periodically and without user intervention. Results are displayed in a set of HTML pages.