This virtual document contains all the sections of the WN User's Guide. It is provided to allow downloading and printing of the guide. The latest online version of this guide can be found at http://hopf.math.northwestern.edu/docs/manual.html.
An HTTP server should do more than just serve files. It should play an active role in both navigation and presentation issues. It is my hope that this server provides better tools for the creative webmaster. - John Franks
WN is a server for the Hypertext Transfer Protocol HTTP/1.1. Its primary design goals are security, robustness, and flexibility, in that order. One of its objectives is to provide functionality usually available only with complex CGI programs without the necessity of writing or using these programs. (Of course CGI/1.1 is fully supported for those who want it). Despite this extensive functionality the WN executable is substantially smaller than the CERN httpd, NCSA httpd or Apache servers.
WN was planned with a focus on serving HTML documents. This means such things as enabling full text searching of a single logical HTML document which may consist of many files on the server, or allowing users to search all titles on the server and obtain a menu of matching items, or allowing users to download a total logical document for printing which, in fact, consists of many linked files on the server. All of these are done in a way which is transparent to the user (and largely transparent to the maintainer)! The "User's Guide for the WN Server", which this chapter is part of, provides a good example of many of these features.
Another feature not found in many other servers is conditionally served text. Often a server maintainer may wish to serve different versions of a document to different clients. By adding simple HTML comments to documents and marking those documents to be "parsed" by the server, the maintainer can arrange that different sections or entirely different documents are sent to clients, based on such things as the client's domain name, IP address, browser type, browser "Accept" header, "Cookie" header, etc. This feature is described in more detail in the section "Conditional Text: If, Else, and Endif" in this guide.
But these are only examples of many new tools WN makes available to webmasters.
The design and security mechanisms of WN differ substantially from those of the httpd servers available from CERN and NCSA so a brief description of how they work is useful.
Files served by an HTTP server may have many attributes relevant to their serving. These attributes include content-type, optional title, optional expiration date, optional keywords, whether the file should be parsed for server-side includes, access restrictions, etc. Some servers try to encode this information in ad hoc ways, in a file name suffix, or in a global configuration file. The approach of WN is to keep this information in small databases, one for each directory in the document hierarchy.
The WN maintainer never needs to understand the format of these database files (named index.cache by default), but this format is very simple and a brief description will indicate how WN works. When the server receives a request, say for /dir/foo.html, it looks in the file /dir/index.cache which contains lines like:
file=foo.html&content=text/html&title=whatever...
If the server finds a line starting with "file=foo.html" then the file will be served. If such a line does not exist the file will not be served (unless special permission to serve all files in the directory has been granted). This is the basis of WN security.
Unlike other servers, the default action for WN is to deny
access to a file. A file can only be served if explicit permission to do
so has been granted by entering it in the index.cache
database or if explicit permission to serve all files in
/dir
has been given in the index.cache
file in
/dir
. This database also provides other security functions.
For example, restricting the execution of CGI/1.1 programs can be done
on the basis of the ownership (or group ownership) of their
index.cache
files. There is no need to limit execution to
programs located in particular designated directories. The location of a
file in the data hierarchy should be orthogonal to security restrictions
on it and this is the case with the WN server.
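The lookup described above can be sketched in a few lines of Python. This is only an illustration of the default-deny idea. It assumes each index.cache line is a flat list of &-separated key=value fields, as in the example line; the real file format and the server's own implementation may differ, and the function names are hypothetical.

```python
def parse_cache_line(line):
    """Split one 'key=value&key=value' line into a dict (assumed format)."""
    fields = {}
    for pair in line.strip().split("&"):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


def find_entry(cache_text, filename):
    """Return the fields recorded for filename, or None if it is not listed."""
    for line in cache_text.splitlines():
        if not line.strip():
            continue
        entry = parse_cache_line(line)
        if entry.get("file") == filename:
            return entry
    return None  # not listed: the default action is to deny access


# Sample data modeled on the example line in the text.
SAMPLE_CACHE = "file=foo.html&content=text/html&title=whatever\n"
```

With this sample, find_entry(SAMPLE_CACHE, "foo.html") yields the stored attributes, while a request for any unlisted name yields None and would be refused.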
The index.cache
database file has a number of other
functions beyond its security role. Attributes of foo.html
which can be computed before it is served and which don't often change
are stored in the fields of the line starting file=foo.html
.
For example, the MIME content type "text/html
" must be
deduced from the filename suffix ".html
". This is done once
at the time index.cache
is created and need not be done
every time the file is served.
The title of a file is another example. With the WN server every file served has a title (even binaries) and optionally has a list of keywords, an expiration date, and other fields associated with it. For an HTML document the title and the keywords are automatically extracted from the header of the document and stored in fields of that file's line in its index.cache file. These are used for the built-in keyword and title searches which the server supports. The maintainer also has the option of adding his own fields to this database file. They could contain such things as document author, document id number, etc. These user defined fields can be searched with the built-in WN searches or their contents can be inserted into the document, on the fly, as it is served.
So how are the index.cache
databases created? Their format
is quite simple and a maintainer is free to create them any way she
chooses, but normally they are created by the utility wndex
(pronounced
"windex"). This program, which is part of the WN distribution,
is designed to produce the index.cache
file from a file with
a friendlier format with the default name "index.wn
". A very simple index.wn
file might look like:
File=foo.html
File=clap.au
Title=Sound of one hand clapping
File=hand
Title=Picture of one hand clapping
Content-type=image/png
Of course if the file hand
were named hand.png
the content-type line would not be necessary as wndex
could deduce the type
from the .png
suffix. Likewise it is not necessary to give
a title for foo.html
because wndex
will read the HTML
header from that file and extract the title and perhaps other things like
keywords and expiration date.
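The suffix-based deduction can be illustrated with Python's standard mimetypes module. This is a stand-in for whatever table wndex actually uses, not a description of wndex itself, and the function name is hypothetical.

```python
import mimetypes


def deduce_content_type(filename, explicit=None):
    """Prefer an explicit Content-type line; otherwise guess from the suffix."""
    if explicit:
        return explicit
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed  # None when the name has no recognizable suffix
```

A file named hand gives no guess, which is why the example above needs its Content-type line, while hand.png would not.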
The WN server has several features which are not available with other servers or only available through the use of CGI/1.1 programs.
One of the design goals of WN is to provide the maintainer with tools to create extensive navigational aids for the server. A variety of search mechanisms are available.
<http://host/dir/search=title>
the server will
provide an HTML form (automatically generated or prepared by the
maintainer) asking for a regular expression search term. When supplied
the server will search the index.cache
files in
/dir
and designated subdirectories for items whose
titles contain a match for the search term. An HTML document with a
menu of these items is returned.
<META>
headers. For other documents (or HTML
documents) they can be manually supplied in the index.wn
file.
index.wn
file. Their purpose is to include items like a
document id number, or document author in the index.cache
database. A field search could then produce all documents by a given
author for example. Or using regular expressions in the search term
produce a list of all documents whose id number satisfy certain
criteria.
text/*
documents in one directory (not subdirectories).
The returned HTML document contains a list of all the titles of
documents containing a match together with a sublist of the lines from
those documents containing the match. This provides one line of
context for the match. For HTML documents the matched expression in
each of these lines will be a highlighted anchor. Selecting one takes
you to the document with your viewer focused on the matching location.
The primary intent of this feature is to provide full text searching
for an HTML "document" which might consist of a substantial number of
files.
grep searches
A grep search returns a text/html document containing the lines in the file matching the regular expression.
wn_mkdigest
utility
which creates HTML documents to be searched in this way from files with
internal structure like mail or news digests, mailing lists, etc.
All of the searching methods listed above except the index searches are
built into the server and require no additional effort for the
maintainer. They are simply referenced with URLs like
<http://host/dir/search=context>
where
/dir
is any directory containing files to be served and an
index.cache
listing them. Of course search permission can
be denied for any directory or any file contained in that directory.
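As a rough illustration of what a context search returns, the following Python sketch collects, for each document, the lines that match a regular expression. It assumes the documents are already available as a mapping from title to text; the server's real search works from the index.cache entries and produces HTML, which is not shown here.

```python
import re


def context_search(documents, pattern):
    """documents maps a title to its text; returns [(title, [matching lines])]."""
    regex = re.compile(pattern)
    results = []
    for title, text in documents.items():
        hits = [line for line in text.splitlines() if regex.search(line)]
        if hits:  # only documents with at least one match are listed
            results.append((title, hits))
    return results
```

Each result pairs a title with its sublist of matching lines, giving the one line of context described above.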
The WN server has extensive capabilities for automatically including files in one which is being served or "wrapping" a served file with another, i.e. pre-pending and post-pending information to a file being served. The latter is useful if you wish to place a standard message at the beginning or end (or both) of a large collection of files. For security all files included in a file or used as a wrapper for it are listed in that file's index.cache file. This, combined with various available security options, like requiring that a served file and all its includes and wrappers have the same owner (or group owner) as the index.cache file listing them, provides a safe and productive Web environment.
One important application of wrappers is to customize the HTML documents returned listing the successful search matches. If a search item is given a wrapper the server assumes that it contains text describing the search and it merely inserts an unordered list of links to the matching items.
In addition to including files the output of programs may be inserted and
the value of any user defined field in the index.cache
database entry for a file may be inserted.
Also parsed text may conditionally insert items with a simple if - else - endif construct based on Accept headers, User-Agent headers, Referer headers, etc.
An arbitrary filter can be assigned to any file to be served. A filter is a program which reads the file and has the program output served rather than the content of the file. The name of the filter is another field in the file's line in its index.cache file. One common use of this feature is for on-the-fly decompression. For example, a file can be stored in its compressed form and assigned a filter like the UNIX zcat(1) utility which uncompresses it. Then the client is served the uncompressed file but only the compressed version is stored on disk. As another example, you might use the UNIX nroff(1) utility, "nroff -man", as a filter to process UNIX man files before serving. There are many other interesting uses of filters. Be creative!
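The filter idea can be sketched in Python as a function applied to the stored bytes before they are sent. Here the standard gzip module stands in for an external program like zcat(1); WN itself runs a separate program named in index.cache, so this only models the data flow, and the function names are hypothetical.

```python
import gzip


def zcat_like(data):
    """Stand-in for zcat(1): decompress gzip data held in memory."""
    return gzip.decompress(data)


def serve_through_filter(stored_bytes, filter_func):
    """Return what the client would receive: the filter's output, not the file."""
    return filter_func(stored_bytes)
```

Only the compressed bytes need to exist on disk; the client sees the output of the filter.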
An arbitrary range of a file can be served. If the server is accessed via a URL like <http://host/dir/foo;lines=20-30> and foo is any text/* document, it will return a text/plain document consisting of lines 20 through 30 of the file foo. This is very useful for structured text files
like address lists or digests of mail and news. A WN utility
called wn_mkdigest
will produce an HTML document with a list of links to separate sections
(line ranges) of the structured file. The wn_mkdigest
utility is
executed with two regular expressions as arguments: one to match the
section separator and the other to match the section title. For a mail
digest, for example, these could be "^From
" and
"^Subject:
" respectively. Then the sections of the virtual
documents would be delimited by a line starting with "From
"
and would have the message subject as their title. A similar mechanism
provides byte ranges from files.
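The following Python sketch shows the two ideas together: extracting a line range, and locating sections with a separator expression and a title expression. It assumes line numbers are 1-based and inclusive, and it is not the wn_mkdigest implementation; the function names and the dictionary layout are hypothetical.

```python
import re


def line_range(text, first, last):
    """Return lines first..last of text (1-based, inclusive), as plain text."""
    lines = text.splitlines()
    return "\n".join(lines[first - 1:last])


def digest_sections(text, separator=r"^From ", title=r"^Subject:"):
    """Find sections that start at a separator line; record title and line range."""
    sep_re, title_re = re.compile(separator), re.compile(title)
    lines = text.splitlines()
    sections = []
    for number, line in enumerate(lines, start=1):
        if sep_re.search(line):
            if sections:
                sections[-1]["last"] = number - 1  # close the previous section
            sections.append({"title": None, "first": number, "last": None})
        elif sections and sections[-1]["title"] is None and title_re.search(line):
            # First title line after the separator names the section.
            sections[-1]["title"] = title_re.sub("", line, count=1).strip()
    if sections:
        sections[-1]["last"] = len(lines)
    return sections
```

Each section's first and last values could then be turned into a link of the ";lines=first-last" form shown above.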
Get the source file from http://hopf.math.northwestern.edu/
(U.S.) or http://www.wnserver.org/
(Europe) then uncompress it and untar it to make the
WN source directory hierarchy. The file must be uncompressed
with the GNU compression utility gzip(1)
(or gunzip(1)
).
The resulting file wn.tar
should be unpacked with the UNIX
tar(1)
utility using "tar -xvf wn.tar
". The top level of
the directory created by untarring this file contains several
directories, including: wn
, wndex
,
wnauth
and docs
.
If your system supports perl, the
quickest way to get your server configured is to run the perl program "configure
"
which is in the main source directory. Do this with the command:
perl configure
This program will ask you various questions, like what version of UNIX
you are using and the path to the directory you want to be the
wnroot
of your data hierarchy.
Default answers are printed in square brackets [ ]
so
you can simply press return to enter that value. You can quit at any
time by pressing Ctrl-C
and nothing should be changed. If
you want to try it once to see what the questions are, that is fine.
This program creates two files: config.h
and
Makefile
which are customized based on the answers you gave.
You may rerun this program as many times as you like. The first time you
run the program the default values are those in the file config.h.dist
. Subsequent
times the default values for all question answers are taken from the most
recent config.h
you have
produced (if it still exists in the top level directory).
An alternative to running this program is to copy the files
Makefile.dist
and config.h.dist
to
Makefile
and config.h
respectively and edit
them manually. If you want to use some of the features which are not turned on by default like multiple IP interfaces, you will have to edit at least config.h. I recommend starting with the perl program and getting your server up and running. Then you can go back and browse through config.h to see if there are things you want to change. If there are you will have to recompile but that takes only a few minutes.
Here are some of the questions you will be asked when you run
configure
. You will be given a list of supported operating
systems and asked to pick the one you are using, e.g. SUNOS
,
SOLARIS2
, AIX
, LINUX
, etc. You
will be asked the complete path name of your data directory. You will
also have to enter the names of the access and error
log files you wish to use (they can be the same file). If you don't
want logging or you want to use the UNIX syslogd(8)
system utility (i.e. the -S
option) then these
should both be defined to be the empty string (i.e. a pair of double
quotes with nothing between them like ""
). If you specify
the names of these log files then you must make sure that either these
two files exist and are writable by the server or that they are files in
a directory where the server has permission to create them.
Additional customizations in config.h
are possible but
should not be needed. These customizations require that you manually
edit the config.h
file. For
example, there is a #define DEFAULT_URI
line in the config.h
file.
You may also customize the file Makefile in the top level directory. In particular you should do this if you wish to specify that a C compiler other than the UNIX cc(1) utility (e.g. gcc) should be used for compiling. Also some systems require that special libraries for sockets, or whatever, be mentioned in the compile command. The configuration program attempts to do this, but I am working from user reports since I do not have access to most of the UNIX variations. If they are incorrect please let me know.
In the top level directory do a make(1)
to produce the server wnd
, the stand-alone version
wnsd
and the utility wndex.
This utility is used to produce index.cache
files for use by
the server. If the make
proceeds without problem you should
next do a "make install
". This will strip the binaries
and place them in the top level bin
directory or whatever
directory you specified when you ran the configure
program.
If you specified a log file name or error log file name when you ran the
configuration program or edited config.h
you will need to make
sure that these files exist and that they are writable by the user id
under which the server will run. The best way to do this is create the
files as root
("touch wn.log
"), then
change their ownership to the appropriate user
("chown nobody wn.log
") and finally set the
permissions appropriately ("chmod 644 wn.log
").
An alternative is to create a directory in which these files will reside
and make sure that the user nobody
has permission to create
files in this directory. Then the server will create the files with
proper ownership and permissions.
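The two alternatives (a pre-created writable file, or a directory in which the file can be created) can be checked with a small Python sketch. Note the assumption: os.access tests the permissions of whoever runs the script, so it would have to be run as the server's user (for example nobody) to be meaningful. The function name is hypothetical.

```python
import os


def log_path_usable(path):
    """True if path is an existing writable file, or could be created."""
    if os.path.exists(path):
        return os.path.isfile(path) and os.access(path, os.W_OK)
    # File does not exist yet: the directory must allow creating it.
    directory = os.path.dirname(path) or "."
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)
```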
You can now either run the server as a stand-alone daemon, the
wnsd
executable, or run under the UNIX inetd(8)
system utility, the wnd
executable. We first describe the
stand-alone version. Run this with the command:
wnsd -p port [other options] wnroot
where "port" is the number of the port on which you wish the server to run. If this is a non-privileged port (i.e. 1024 or higher) then wnsd can be run as an ordinary user. However, for privileged ports like 80 you must run the command above as root. If wnsd is run without the -p option it will use port 80 by default. If wnsd is run by root then when it starts up it will change its user id to the one set when running the configuration program or by editing the config.h file line containing #define USERID. Otherwise it will have all the permissions of the user who runs it.
The safest practice is to use the numeric UID of nobody
for
the USERID
set in config.h
(this is the default)
and then start the server as root
.
Note: on HPUX and perhaps other systems the user nobody cannot be used. In this case just create a new user, say "www", with the fewest possible privileges and no shell.
The server needs to have root permissions to connect to a socket on a privileged port and listen for requests. But immediately after doing so it will change its user id to that of nobody and have minimal access permissions. In this situation the user nobody needs to have only read permission to your server data and should not own or have write permission to it. In particular nobody should not have ownership of or write access to the index.cache database file described in the chapter "Creating Your WN Data Hierarchy" of this guide.
inetd(8)
The other way to run the server is to use it under the UNIX inetd(8) system utility. This is an efficient way to run the server if the load on it is relatively light (a few thousand hits per day) and the host on which it runs is used for other purposes. There are variations on how inetd(8) works from system to system so you may need to look at the man page for the UNIX inetd.conf(5) configuration file. Here's how it works under many systems (e.g. SunOS 4.1.3): Edit the UNIX services(5) configuration file and create the line:
wnd 80/tcp
replacing 80 by the port you wish to use. Then edit the file inetd.conf(5) and insert the line:
wnd stream tcp nowait nobody /path/wnd wnd
After the last wnd you can have optional arguments to turn on logging or use a different data directory. Some versions of inetd(8) limit the number of arguments you may use so I prefer to use a small program in place of wnd here. My inetd.conf(5) line looks like:
wnd stream tcp nowait nobody /path/wn.rc wn.rc
and wn.rc
contains only the two lines:
#!/bin/sh
exec /path/wnd -t 202 -L /path2/logfile wnroot
It is important to run wnd as nobody (the fifth field in the inetd.conf(5) line above) or some other user with no special access privileges. If you are using an inetd(8) without the capability to set UID on startup (e.g., Ultrix), you should define the group ID and user ID in config.h so that the program is not running as root (look for the #define USERID and #define GROUPID and set the values appropriately). It should never be necessary to run wnd under inetd(8) as root and to do so would be a serious mistake for maintaining security. Every attempt has been made to make wnd as secure as possible, even if it is run as root; however, no program accessible to remote users on the Internet can be assumed perfectly secure. See the chapter "Security on the WN Server" in this guide.
After editing the inetd.conf(5)
and services(5)
files you should find the process id number of the inetd(8)
process and use the UNIX kill(1)
utility to reload the configuration using
"kill -HUP <process_id>
". This must be done
as root
. You find the number
"<process_id>
" by using the UNIX ps(1)
utility or by looking at the contents of the file (which you specified
when you ran the configure
program) in which
the server stores this number.
If the fully qualified domain name of your server is abc.com
you might like to have your server known as www.abc.com
or
some other "vanity" name. For most purposes this is simply a matter of
properly setting up Domain Name
Service (DNS) on your system so that the system responds to the
desired name.
Note: To use multiple vanity names for different IP addresses on a single server see the chapter "Multi-homed or Virtual Servers on the WN Server" in this guide.
There are a few instances, however, where the WN server does use its own hostname. Ideally, in my opinion, the server should do nothing with its hostname and not even need to know it. This is not possible for two reasons.
First, the CGI/1.1 protocol requires the server to pass its hostname to CGI programs in an environment variable whenever those programs are run. Second, clients often implement redirection so that they cannot handle relative URLs, only complete ones. (This is a mistake in my view, but one we have to live with.) Thus when a server redirects to another local document it must supply its own hostname. These are the only places WN uses its hostname.
For most cases then, WN only uses its hostname when a redirection is done. This happens in several circumstances. The most common is when a request is made for a directory and the trailing '/' is left off of the URL.
So how does WN know its hostname? When you run the configure
program you are
queried for the value you want or you have the option of using a system
call at the time the server is run. This value is placed into the config.h
header file and
compiled into your server. In the file config.h
the #define WN_HOSTNAME
macro is set by default to the empty string. If this is not changed the
server will get its name from the UNIX gethostbyaddr(3)
system call. If this is set to another string that string will be used.
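The hostname choice and the most common redirection can be sketched as follows. This is an illustration only: socket.getfqdn() stands in for the system lookup the server performs, the function names are hypothetical, and directory_redirect assumes the caller already knows the requested path is a directory.

```python
import socket


def server_hostname(configured=""):
    """Use the compiled-in name if one was set, else ask the system."""
    return configured or socket.getfqdn()


def directory_redirect(path, hostname, port=80):
    """Build the complete URL to redirect to when the trailing '/' is missing."""
    if path.endswith("/"):
        return None  # already in canonical form, no redirect needed
    host = hostname if port == 80 else f"{hostname}:{port}"
    return f"http://{host}{path}/"
```

For example, a request for /docs on www.abc.com would be redirected to the complete URL http://www.abc.com/docs/, which is where the hostname is needed.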
If you are using WN as a multi-homed server then you need to set
different names for the different IP addresses. This is done in the file
wn/vhost.h
which you edit to set up the correspondence
between IP addresses and wnroot
directories.
After compiling and setting up the software you can test it on a sample
directory provided with the distribution. To do this first make a
symbolic link in your wnroot
data directory to the
docs
directory in the source distribution. The command
"ln -s /your/src/dir/docs docs
" executed in
the wnroot
directory should do this. If your system does
not support symbolic links you can copy this directory and its
subdirectories to your data directory temporarily.
Now you are ready to test your server installation on this directory. Try it with your favorite HTTP client. The URL should be:
http://localhost/docs/index.html
If you are running under the UNIX inetd(8) system utility as described above then to shut down the server first remove or comment out the line you created in the UNIX inetd.conf(5) configuration file. Then you should again find the process id number of the inetd(8) process and run the UNIX kill(1) utility using "kill -HUP <process_id>" where "<process_id>" is the process id number of inetd(8) just as you did to start WN.
If you are running wnsd
, the stand-alone version of
WN, you should find the process id number of the running
wnsd
by using the UNIX ps(1)
utility or by looking at the contents of the file (which you specified
when you ran the configure
program or by
using the -q
option) in
which the server stores this number. Then you run the UNIX
kill(1)
utility using
"kill <process_id>
" where
"<process_id>
" is the process id number of
wnsd
. If you started the server as root
you
should be root
to kill it.
There are two ways to log WN transactions: dedicated log files
or using the UNIX syslogd(8)
system utility. We first describe dedicated log files.
Normally when you use WN you will keep two log files. The first
is a log of all "normal" transactions and the second records error
conditions or items which might require your attention. For example, if
the server cannot find a file which your index.wn
file
indicates should be served it will log an error. The error log file can
be the same file used for the normal transaction log. In general the
difference between the two is that the error log gets information about
anything which might require attention of the maintainer while routine
transactions and errors which are simply user errors tend to go to the
regular log. The intent is that a conscientious maintainer should keep
an eye on the error log but need not read the (much larger) log of
regular transactions.
There are two ways to tell the server the names of these files. The
first is by supplying the file names when you run the configure
program and then
compiling these into your server. And the second is by supplying the
file names on the command line when you execute the server. This is done
with the -L
option and
the -l
option to specify
the transaction and error log files respectively.
For example, executing the command:
wnsd -L /path2/logfile -l /path3/error.log wnroot
will cause the server to use "logfile
" and
"error.log
" as the log file and error log respectively. Of
course, it is necessary for the server to have write permission to these
files and execute permission on the directory containing them.
A good way to achieve this if the server is running as
nobody
is to create the files yourself and change their
ownership to the user nobody
. This can be done, for
example, with the commands:
touch logfile
/usr/etc/chown nobody logfile
chmod 600 logfile
executed as root
in the directory where the log file is to
reside. The first of these commands creates the file
"logfile
". The second makes nobody
the owner
and the third gives nobody
(and no one else except
root
) permission to read and write this file. You might
want to allow others to read, but not write to the log file, or security
of the log file might not be a concern.
Thus a program executed by the UNIX crond(8)
system utility to rotate log files might look like:
cd /path2
mv logfile logfile.old
touch logfile
chown nobody logfile
chmod 600 logfile
kill -HUP `/bin/cat /path/to/wn.pid`
chown maintainer logfile.old
chmod 600 logfile.old
where /path/to/wn.pid is a file containing the process id of the server created by using the -q option or by specifying this filename when the configure program is run.
If neither of these has been done the stand-alone server,
wnsd
, will print its process id on the UNIX stdout(3)
stream when it is run. If you are using wnd
under inetd(8)
there is no need to send the -HUP
signal as the server must
close this file after each transaction.
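The same rotation steps can be expressed as a Python sketch, which may be easier to adapt than a shell script. It is a simplified version: the chown steps are omitted because they require root, and the send_signal parameter is a hypothetical hook added here so the function can be exercised without signalling a real server.

```python
import os
import signal


def rotate_log(log_path, pid_path, send_signal=os.kill):
    """Move the log aside, create an empty replacement, then signal the server."""
    os.replace(log_path, log_path + ".old")
    # Recreate the log with owner-only read/write permission (subject to umask).
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT, 0o600)
    os.close(fd)
    with open(pid_path) as handle:
        pid = int(handle.read().strip())
    send_signal(pid, signal.SIGHUP)  # same effect as kill -HUP in the script
    return pid
```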
If you are using a system that supports the logrotate
program, it provides a good way to handle logs. Here is an excerpt
from a typical logrotate.conf file.
/var/log/wnsd/error_log {
daily
}
/var/log/wnsd/access_log {
daily
postrotate
/usr/bin/killall -HUP wnsd
endscript
}
This causes the logs to be rotated daily. The command /usr/bin/killall -HUP wnsd assumes that your system supports the killall command, in which case it finds the process id of the wnsd process and sends it the -HUP signal, as was done by the kill command in the script above.
There are three formats which the server can use in writing its log files, each with three options concerning how much DNS lookup to do for the remote host. The two most common formats are "verbose" and "common log format". The verbose mode is essentially the common log format but with the user-agent, referrer, HTTP cookie, and virtual server nickname appended to the line for that transaction as well as better transaction error messages if necessary.
You can choose between verbose and common log formats by answering the relevant question when running the configure program before compilation (or by editing config.h).
To use the third format you need to use the -v
command line
option. When the server is invoked with the -v
option it will
write a log file in the format specified by the value of this
option. The legal values for this option are
"common
", "verbose
", and
"ncsa
". They cause the log file to be written in
the so-called common log format, or WN's verbose format
including user agent, referrer, X-Forwarded-For header, virtual
server nickname, and cookies, or in the NCSA extended format
which includes just referrer and user agent. When using verbose
logging the nickname (enclosed in angle brackets) will be the
last field of each log entry. If virtual hosting is not being used
this field will be omitted.
More precisely a verbose log line begins with a normal "common" log format line and then adds the following:
<(pid/count) msg1: msg2> <user_agent> <referrer> <cookie> <X-forwarded-for> <nickname>
The punctuation characters, i.e., < > ( )
and
:
, will always be present in this order. The
fields "msg1
" and "msg2
" may contain
additional parentheses or colons. Hopefully none of the fields
will contain the character '<
' or the character
'>
', but "referrer", "cookie", "X-forwarded-for"
and "nickname" are provided by the browser or the server
maintainer so WN has no control over them.
The fields are as follows:
field | description |
---|---|
pid | Process id of the process serving the transaction. |
count | n if this is the nth transaction of this (keepalive) connection. |
msg1 | Description of transaction. May be sent to user. |
msg2 | Description of transaction. Information NOT sent to user. |
user_agent | From the HTTP user agent header. |
referrer | From the HTTP referer (sic) header. |
cookie | From the client's cookie header. |
X-forwarded-for | From the client's X-Forwarded-For header. |
nickname | Value assigned this virtual host by the maintainer (omitted if virtual hosting is not used). |
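A log processing script might pick these fields apart with a regular expression. The Python sketch below is written only from the format line and table above. It assumes single spaces between the bracketed fields, numeric pid and count, and no '<' or '>' inside any field, and it splits msg1 from msg2 at the first colon, which will be wrong when msg1 itself contains a colon. The function name and dictionary keys are hypothetical.

```python
import re

# Matches the fields appended after the common log format part of a line.
VERBOSE_RE = re.compile(
    r"<\((?P<pid>\d+)/(?P<count>\d+)\) (?P<msg>[^<>]*)>"
    r" <(?P<user_agent>[^<>]*)>"
    r" <(?P<referrer>[^<>]*)>"
    r" <(?P<cookie>[^<>]*)>"
    r" <(?P<forwarded>[^<>]*)>"
    r"(?: <(?P<nickname>[^<>]*)>)?\s*$"  # nickname is absent without virtual hosts
)


def parse_verbose_fields(line):
    """Return the verbose fields of one log line as a dict, or None."""
    match = VERBOSE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    msg1, _, msg2 = fields.pop("msg").partition(":")
    fields["msg1"], fields["msg2"] = msg1.strip(), msg2.strip()
    fields["pid"], fields["count"] = int(fields["pid"]), int(fields["count"])
    return fields
```

A line in plain common log format has none of these fields, so the function returns None for it.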
The NCSA format will likely only be of interest if you want to use log processing tools which expect this format. If the -v option is not specified the server will default to either the common log format or the WN verbose format depending on which was selected when the configure program was run.
The utility wnv2c
can
convert verbose log files to log files in the shorter common log format.
The WN server does not send the UNIX stderr(3) stream output to the error log file, but leaves it at its default, the terminal from which the server is invoked. This allows the maintainer to set it to a file of her choice, for example the error log, or leave it directed to the console window in which wnsd was invoked. To redirect it to a file called my.errs simply run wnsd with a command like "wnsd <options> 2>my.errs" if you are using a Bourne-like shell like sh(1). The server itself sends very few things to stderr(3) -- only errors which it is impossible to put in the error log (like "Can't open error log file").
The real usefulness of redirecting stderr(3)
comes when you
are creating CGI/1.1
programs because their errors are typically sent to
stderr(3)
so you can easily view them rather than have them
buried in a log file.
If things are not working as they should here are some tips to help you isolate the problems.
If the compilation was successful you can check the server itself by executing it from the command line. If you use the command:
wnd wnroot
it should run and pause for input. Type the line:
GET /<ret>
and in response wnd
should print the raw HTML of the
index.html
file in your top level directory (perhaps along
with a message about not being able to open a log file). If instead you
type:
GET /docs/overview.html<ret>
(and you still have the /docs
subdirectory in your top level
directory) the overview document should be sent to your screen. If this
doesn't happen there should be an error message which may be helpful.
More detailed error messages are placed in the log file so you may want to run
wnd
again with the additional arguments
"-L logfile
" and then examine the contents of the log
file. Or if you run "wnd -L /dev/tty
" the log
entries will be printed to your screen instead of being put in a file. If
the server can't open a file, for example, the name of that file will be
recorded in the log file. Check its permissions. Remember that all files
that wnd
serves must be world readable. More serious errors
are put in a separate error log. So you might want to try the command
"wnd -L file -l file2
" and then type the
GET
requests described above.
If this succeeds you should run the server for real, either under inetd(8)
or stand-alone. In order to use port 80 the server
must be started by root
. It will then switch to user
nobody
. It does this immediately after connecting to port
80, before it does anything else including opening its log file. If you
get a message that the server cannot open its log file then either you
have specified putting the log file in a directory where user
nobody
does not have permission to create files or you have
specified an existing file which the server does not have permission to
write.
After starting the server a useful test is to use the UNIX telnet(1)
utility to connect to your server at the port on which it is running. You
should get a connection message and a pause for input. If you get a
"Connection refused
" message and you are running under
inetd(8)
, it is likely there is a problem with your
inetd(8)
setup or for some reason your system can't find or
can't execute the wnd
binary. If you are using
wnsd
this message means that wnsd
is not in
fact running.
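The manual telnet(1) test can also be scripted. The following is only a sketch, not part of the WN distribution: the function name probe and its parameters are invented for illustration, and it sends the same bare "GET /" line that you would type by hand.

```python
import socket


def probe(host: str, port: int, path: str = "/", timeout: float = 5.0) -> bytes:
    """Send a bare 'GET <path>' line and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # The same request line one would type by hand in telnet(1).
        conn.sendall(f"GET {path}\r\n".encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling probe with your own host name and port should return the raw HTML of the top level index.html file. A ConnectionRefusedError raised by the call corresponds to the "Connection refused" message discussed above.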
If you still have problems feel free to ask questions on the WN list server. There are many helpful people there. But it is a good idea to try the steps above first and to include the relevant log file messages with your request.
index.wn file
In each directory of your data hierarchy you create a file called
index.wn
with information about each file you want to serve.
The simplest index.wn
file might contain the single line:
Attributes=serveall
which when properly processed will grant the server permission to serve
any file in the directory (but not in subdirectories). For more
information about this directive see the section on the serveall
attribute below. A more
elaborate index.wn
file might look like the following:
Owner=mailto:webmistress@host.edu
File=file.txt
Title=This is a descriptive
title for file.txt
# This is a comment
File=file2.html
File=soundfile
Title=This plays some sounds
Content-type=audio/basic
The file contains four groups of lines called records. The first record
(the single line starting Owner=
in this example)
describes properties of the directory and is called the directory record. It can be empty, but in
general it is a good idea for the directory record to contain an owner
line, like the one above, referring to the maintainer of the directory.
The remainder of this index.wn
file has three file records describing three files,
file.txt
, file2.html
and
soundfile
, in the directory which we wish to serve. The line
starting with '#
' is a comment. Wherever a '#
'
occurs the remainder of that line is treated as a comment (i.e. ignored).
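To make the record structure concrete, here is a minimal sketch that splits the text of an index.wn file into a directory record and file records. It is not the parser used by wndex. The function name parse_index_wn is invented, and the sketch assumes that each File= line opens a new file record, that everything before the first File= line belongs to the directory record, and that lines without an '=' can be skipped.

```python
def parse_index_wn(text: str):
    """Split index.wn text into a directory record and a list of file records."""
    directory = {}
    files = []
    current = directory  # lines before the first File= describe the directory
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # everything after '#' is a comment
        if "=" not in line:
            continue  # blank or unrecognized line; skipped in this sketch
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key.lower() == "file":
            current = {}  # assumption: a File= line starts a new file record
            files.append(current)
        current[key] = value
    return directory, files


sample = """\
Owner=mailto:webmistress@host.edu
File=file.txt
Title=This is a descriptive title for file.txt
# This is a comment
File=file2.html
File=soundfile
Title=This plays some sounds
Content-type=audio/basic
"""

directory, files = parse_index_wn(sample)
print(directory)
print([record["File"] for record in files])
```

Run on the sample, the sketch yields one directory record holding the Owner line and three file records, matching the four records described above.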
The index.wn
file is processed with the utility wndex
(pronounced "windex") to produce a
small database called index.cache
containing information
about this directory and its contents. Detailed information on the wndex
utility is given below, but simply
running it with no arguments in a directory containing an
index.wn
file will produce the index.cache
file
for that directory. This file contains all the information in the
index.wn
file plus additional information gathered
automatically about the files to be served. In particular the
index.cache
file will list the names of the files given in
the File=
lines of the
index.wn
file. Any file on the server whose name is not listed
in an index.cache
file will not be served. This is the
basis of WN security. For security reasons the server will
refuse to use any index.cache
file which is in reality a
symbolic link to another file.
The index.cache
database has a number of other functions
beyond its security role. Attributes of the files listed in the
index.wn
file which can be computed before they are served and
which don't often change are stored in the index.cache
file.
For example, the MIME
content type of soundfile
is read from the Content-type=
line.
The other files do not need such a line since wndex
can deduce from the file name
extensions that file.txt
has type text/plain
and file2.html
has type text/html
. This is
done once at the time index.cache
is created and need not be
done every time the file is served. By the way, if the sound file were
named soundfile.au
it wouldn't need a
Content-type
line either.
The title of a file is another example of information stored in the
index.cache
file. With the WN server every file
served has a title (even binaries) and optionally has a list of keywords
associated with it. For an HTML document the title and the keywords are
automatically extracted by wndex
from
the header of the document and stored in fields of that file's line in
index.cache
. These are used for the built-in keyword and
title searches which the server supports.
The files which you wish to serve should be owned by you, or by their
creator, or by whoever is in charge of maintaining them. They should not
be owned by nobody
or whatever user id the server runs under
as set in config.h
.
This is because the nobody
id should have the minimum
permissions possible. It needs to have read access to the files to be
served, but it has no need to be able to write to those files or alter
them in any way.
Thus normally the files served might be owned by the maintainer and have their permissions set to be world readable but writable only by the maintainer (or by no one).
Likewise the index.cache
file which controls access to
everything in a directory should be owned by the maintainer of that
directory and the only permission nobody
should have for
this file is read permission. In fact, for security reasons, if the
server was started as root
(and then switched to a safer
user like nobody
) wnd
or wnsd
will
refuse to use any index.cache
file which is owned by the
user id (e.g. nobody
) under which the server is running.
This restriction does not apply if wnsd
is run on an
unprivileged port by an ordinary user, because such a user might not be
able to make index.cache
files owned by someone else.
There is one exception to the rule of having nothing owned by
nobody
(and that's not a double negative). The exception is
the log files. These files must be writable by the server and it
generally seems sensible to have them owned by the user
nobody
under whose identity the server runs. The log file
and the error log file can be specified on the command line when the
server is run or can be set in config.h
with the #define WN_LOGFILE
and #define WN_ERRLOGFILE
macros.
wndex Utility
Before describing the index.wn
file in greater detail we
briefly explain the use of the program which reads this file and produces
the index.cache
database file. Simply running
wndex
with no arguments in a directory containing a file
named index.wn
causes that file to be read and a file called
index.cache
to be created in that directory.
There are several command line arguments for wndex
. The -r
option causes
wndex
to recursively descend your data hierarchy using all
subdirectories listed in the Subdirs=
line of the
directory record in the index.wn
file (see below).
The -i
and -c
options specify an
alternate name for the index.wn
file and the
index.cache
file respectively. For example the command:
wndex -i foo -c bar
will attempt to use foo
as the index.wn
file and
produce the file bar
instead of index.cache
.
The -d
option specifies
a directory other than the current directory in which to find the
index.wn
file and in which to create the
index.cache
and index.html
files.
Finally the -q
option
(for quiet) suppresses the printing of any warning or informational
messages by wndex
.
The first group of lines in an index.wn
file provides
information about the directory itself and the collection of files it
contains rather than about any single file in the directory. It is
called the directory record. This beginning collection of lines might
look like:
Owner=mailto:you@host.edu
SearchWrapper=dir_search_wrap
Accessfile=/dir/access
Subdirs=dir1,dir2,directory3
The Owner=
line
specifies the owner of items in the directory (which is used in the HTTP/1.1 headers sent by the
server).
The SearchWrapper=
line
specifies a "wrapper" for the various searches of the directory. That is
an HTML document which provides a customized response listing the
matching items in one of the various searches of the directory. For more
details see the chapter "Parsed Text and Server Side
Includes on the WN Server" in this guide.
The Accessfile=
line specifies the name of the file which controls access (by IP address)
to this directory. If this item is omitted then items in the directory
may be served to anyone. For more information on using the access
mechanism see the chapter "Limiting Access to Your
WN Hierarchy" in this guide.
Finally the line starting with Subdirs=
specifies the
subdirectories of this directory which you wish to have recursively
searched when a title or keyword search is done on this directory. More
information about searching can be found in the chapter "Setting Up Searches on the WN Server" in
this guide.
For a complete list of the possible lines (called "directives") which a directory can have see the section "Directory Directives" in this guide.
After the directory record line group an index.wn
file will
typically have groups of lines called file records describing a
particular file. A file record can be as simple as a single line like
the line "File=file2.html
" in the
example above or it can contain several lines describing the file. For a
complete list of the possible lines (called "directives") which a file
can have see the section "File Directives"
in this guide.
When someone sends a request to your server with only the server name and no file name like:
http://hopf.math.northwestern.edu/
the WN server automatically translates this to:
http://hopf.math.northwestern.edu/index.html
adding the file name "index.html
". More generally if a
request is made for a directory, say with the URL
http://host/dir1/dir2/
, this will be translated to a request
for http://host/dir1/dir2/index.html
.
If you wish the default file name in a particular directory to be
something other than "index.html
" you can use the Default-Document=
directive in the directory record of your index.wn
file to
change it. If you wish to change the default file name for all
directories on the server you can change the #define INDEXFILE_NAME
line in the config.h
file
and recompile.
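As a rough illustration of this translation, the sketch below appends a default document name to any request path that ends in a slash. The function name and the default_document parameter are invented for the example. It does not read index.wn or config.h, and it makes no claim about how the server treats a directory requested without a trailing slash.

```python
def resolve_request_path(path: str, default_document: str = "index.html") -> str:
    """Append the default document name when the request names a directory."""
    if not path:
        path = "/"  # a request with only the server name
    if path.endswith("/"):
        return path + default_document
    return path


print(resolve_request_path("/"))
print(resolve_request_path("/dir1/dir2/"))
print(resolve_request_path("/dir1/dir2/", default_document="welcome.html"))
```

The third call stands in for a directory whose record sets a different default; the name welcome.html is only an example.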
index.wn File
WN is also able to serve files without explicitly listing them
in an index.wn
or index.cache
file. This is done
by putting the line:
Attributes=serveall
in the directory record of the index.wn
file for a directory or
by running wndex
with the -a
option. Either of these
specify that any file in this directory, which does not start with the
character '.
', or contain a '~
', may be served,
not just those listed in the index.wn
file. The files
index.wn
and index.cache
will also not be served.
(Indeed if the -a
option
is used with wndex
there need not even
be an index.wn
file, because an index.cache
file
will be created just as if the Attributes=serveall
directive had been used.)
Note: When this directive is used in a directory protected by an access file
or a password file be sure that these files have names that start with '.
', or contain a '~
'. Or better, put these files in a different directory from which nothing is served.
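The naming rule can be written as a small predicate. This is a sketch of the rule as stated here, not code from the server. The function name is hypothetical, and it assumes the default names index.wn and index.cache.

```python
# Assumption: the default database file names are in use.
EXCLUDED_NAMES = {"index.wn", "index.cache"}


def servable_under_serveall(filename: str) -> bool:
    """Apply the serveall naming rule to a single file name."""
    if filename in EXCLUDED_NAMES:
        return False  # the database files themselves are never served
    if filename.startswith(".") or "~" in filename:
        return False  # dot files and names containing '~' are excluded
    return True


for name in ("notes.html", ".access", "draft.html~", "index.cache"):
    print(name, servable_under_serveall(name))
```

Only the first of the four example names passes, which is why an access or password file named with a leading '.' is safe from being served by accident.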
When the Attributes=serveall
directive is used the server will attempt to set the content type
correctly based on the file name suffix using the same default
correspondences between type and suffix that wndex
uses. Indeed when wndex
is run on a directory with the
Attributes=serveall
directive, it behaves as if all files in the directory (except those
starting with '.
' or containing a '~
') were
listed with a File=
directive. If the Attributes=serveall
line (and the corresponding entry it creates in the
index.cache
file) are not present then only the files
explicitly listed with a File=
directive will be
served.
The default correspondences between file name suffixes and MIME types are
specified in the "mime.types
" file. A default version of
the file is in /lib/mime.types
. The mime.types
file should be installed in a known location. The default location is in
the WN src
hierarchy, but this can be changed by
specifying a different value when the configure
program
is run or by editing the value of #define MIME_TYPE_FILE
in config.h
. The
mime.types
file exists so that you can add to it if you wish
to add new kinds of documents to your server. The format of the file is
explained in the file. If this file cannot be opened then wndex
will use compiled in defaults which
are the same as what is currently in the default version of this file.
The mime.types
file is read whenever wndex
is run so wndex
always knows the latest additions.
This file is also read by wnsd
(but not wnd
) on
startup for use with directories with the Attributes=serveall
directive. The wnsd
stand-alone server reads this file when
it is started or restarted, but only takes note of new suffixes and their
MIME types. You cannot change the MIME type corresponding to one of the
standard suffixes (as listed in the default mime.types
file). To do that you need to change the server source and recompile.
It is fine to have file records in an index.wn
file which also
has the Attributes=serveall
directive. In this case the file directives take precedence. Thus if
you had an index.wn
file consisting of:
Attributes=serveall
File=foo.html
Content-type=application/postscript
the server would consult the file record for "foo.html
"
first and see that it is of type application/postscript
(it
would be silly to actually do this, of course) and use that type. But
another file "bar.html
" in the directory would also be
served with the type indicated by its suffix. Files with no file record
in the index.wn
file and no recognized suffix will be given the
default content type which can be set with the Default-Content=
directive.
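The order of precedence just described (file record first, then suffix, then the default) might be sketched as follows. The suffix table holds only the three correspondences mentioned in this guide, not the contents of mime.types. The function name is invented, and the default type passed in the example is a stand-in for whatever value Default-Content= supplies.

```python
import os

# Only the correspondences mentioned in this guide; the real table is mime.types.
SUFFIX_TYPES = {".txt": "text/plain", ".html": "text/html", ".au": "audio/basic"}


def content_type(filename, file_record=None, *, default_content):
    """Pick a content type: file record first, then suffix, then the default."""
    if file_record and "Content-type" in file_record:
        return file_record["Content-type"]
    suffix = os.path.splitext(filename)[1]
    if suffix in SUFFIX_TYPES:
        return SUFFIX_TYPES[suffix]
    return default_content


fallback = "application/octet-stream"  # assumed value, for illustration only
print(content_type("foo.html", {"Content-type": "application/postscript"},
                   default_content=fallback))
print(content_type("bar.html", default_content=fallback))
print(content_type("README", default_content=fallback))
```

The first call reproduces the (silly) foo.html example: the file record wins over the suffix.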
When wndex
is run on an
index.wn
file with the Attributes=serveall
directive all the files currently in that directory which can be served
are given entries in the index.cache
file. Title and keyword searches only see files listed in
an index.cache
file. Likewise context and grep searches only seek matches in files
listed in the index.cache
file. Thus if a file is added to
a directory with the Attributes=serveall
directive it will not be visible to searches unless wndex
is re-run in that directory. If it
has not been re-run the file will still be served, however. Still, it is
good practice to re-run wndex
every
time you add or delete a file in a directory with the Attributes=serveall
directive. (Of course, it is required to do this for a directory without
the Attributes=serveall
directive.) There is no need to re-run wndex
if you only change an existing file,
unless you change its title or keywords.
There is no way to use wrappers or includes for
files not listed in the index.wn
file. So generally, the few
seconds it takes to add a document's name and a descriptive title to your
index.wn
file and then to run wndex
will pay off.
If you do not wish the Attributes=serveall
directive to be allowed on your server you can disable it by uncommenting
the "#define NO_SERVEALL
"
line in the config.h
file.
This does not affect the ability of wndex
to write index.cache
entries for all files in a directory with the Attributes=serveall
directive. But it means the server will only serve files listed in an
index.cache
file.
There are three situations when the client request will be denied but for which you can supply customized error messages. These are requests for non-existent files, requests for files which require a password but for which no valid password was given, and requests from an invalid host for files limited to certain hosts. The lines:
No-Such-File-URL=http://host/dir/nosuch.html
Access-denied-URL=http://host/dir/noaccess.html
Auth-denied-file=~/dir/nopassword.html
in a directory record of an index.wn
file specify URL's to which clients are redirected when a non-existent
file is requested and when a document protected by an access control file is requested from an
invalid host. The last line specifies a file to be sent when a password
protected file is requested without a password or with an invalid
password. For technical reasons it wouldn't work to have this be a
redirection.
In the first two lines above (specifying redirection) the URL's given can be relative URL's, so the lines:
No-Such-File-URL=/dir/nosuch.html
Access-denied-URL=noaccess.html
are valid. Default values for these three directives may be specified by
editing the config.h
file
and recompiling the server. More information on customized error
messages can be found in section "Directory
Directives" in this guide.
A great deal of effort has gone into attempting to make WN as secure as possible. Security has received the highest priority in all design decisions. This is not grounds for WN maintainers to feel they can lessen their vigilance, however. The first thing you should be aware of is that there is a trade-off between security and functionality. You can have high security and restricted functionality or lower security with greater functionality, or something in between. WN is designed to let the maintainer choose the point on this continuum he or she is comfortable with. This document tries to discuss the various options you as a maintainer will have and what the implications of your choices are.
First, it is important to understand possible threats to the integrity of a system running the WN server. There are two types of threat which this document addresses separately: (1) external, from a client or purported client on a remote host, and (2) local, from a user with an account on the server host.
After reading this section you may wish to look at the section "File Ownership and Permissions" in this guide.
The maintainer's objective is to prevent any unauthorized access to (or
alteration of) files on the host system. Programs run on the server with
the CGI/1.1 protocols
cause special problems and are discussed separately below. If you do not
need to use any executable programs you should run the server with the -e
option. This option
disallows any attempt to execute a command on your server and does not
allow any data sent by a client even to be written to a temporary disk
file. In this situation the key to WN security is twofold: no
document is served without explicit permission from the maintainer; and
nothing is written to disk on the server except the log file.
The basic philosophy of WN security is that by default no client requests are granted. Permission to serve a document must be explicitly granted by the maintainer. The WN server keeps a small database in each directory of its data hierarchy which contains information about files to be served from that directory. In particular no document can be served unless explicit permission to serve it is given in such a database.
Note: For more information on these database files the chapter "An Overview of the WN Server" in this guide is a good place to start. These files are very easy to create and maintain. See the chapter "Creating Your WN Data Directory" in this guide.
Despite this strong security foundation several additional steps are prudent. The most important is that the maintainer must ensure that no untrusted person has write access to any part of the WN hierarchy. For example an incoming anonymous ftp directory should never be part of a WN hierarchy (better yet don't have one at all), because an attacker might be able to put a database there granting illicit access to some documents on the server system for which the user id running the server has read permission. There are several defenses against such a counterfeit database and we discuss them next.
index.cache Files
All security control for the WN server resides in the per
directory database files (these files have the default name
index.cache
). Consequently it is extremely important to
guarantee their integrity. There are several command line options for
the server which help protect against counterfeit
index.cache
files.
The -t
or -T
option to
wnd
and wnsd
allows you to specify a trusted
owner or group owner (not both) for index.cache
files. When
invoked with only the -t
argument (or the -T
argument) wnd
or wnsd
will not serve a document
unless the index.cache
file listing it has the prescribed
uid or gid. This uid or gid should be that of the maintainer,
not the user id under which wnd
or
wnsd
runs. Indeed, for security reasons if the server has
been started as root
and changed to another uid it will
refuse to use an index.cache
file whose owner is the uid
under which it is running. If on your server all
index.cache
files are created by a single user or a single
group I strongly recommend using the -t
or -T
option.
This added security is weakened somewhat if you use the -u
option which allows
index.cache
files owned by untrusted users, but only permits
them to grant access to files owned by the same user as the
index.cache
file. This option might be appropriate if you
permit users to have their own home page on your server. It would allow
users to serve documents which they own but no others. If both the -u
and the -t
argument are used the -u
takes effect, except that the
trusted user specified with the -t
option is exempt from
its restrictions. Notice that if neither the -t
nor -u
argument is used then a
user with his own home page can make a symbolic link to any file readable
by the server and that document will be served! This is true even if the
linked to document is in a directory with limited
access or is outside the server data hierarchy.
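The way the trusted-owner and -u options combine can be summarized in a small decision function. This is a sketch of the rules as described in this section, not the server's code. The function and parameter names are hypothetical, and only the uid form of the trusted-owner option is modeled (a trusted group would work analogously).

```python
from typing import Optional


def cache_may_grant(cache_owner: int, file_owner: int,
                    trusted_uid: Optional[int] = None,
                    restrict_untrusted: bool = False) -> bool:
    """Decide whether an index.cache owned by cache_owner may list a given file.

    trusted_uid models the -t option; restrict_untrusted models the -u option.
    """
    if trusted_uid is not None and cache_owner == trusted_uid:
        return True  # trusted owner: may list any file the server can read
    if restrict_untrusted:
        return cache_owner == file_owner  # -u: only files with the same owner
    if trusted_uid is not None:
        return False  # -t alone: other owners' index.cache files are not used
    return True  # neither option: any index.cache file is honored
```

The last branch is the dangerous case noted above: with neither option, an index.cache file created by any user can grant access to any file the server can read.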
When the server is run it must assume the permissions of some user on the
host. Which user is determined when you run the configure
program
or by defining "#define USER_ID
" in config.h
. It is important that
USER_ID
have as few permissions as possible. On many systems
there is a user called nobody
with minimal permissions. The
numeric user_id of nobody
is a good choice and is the
default choice of the WN configure program. Of course the server
must have read permission on all the files served but it should not have
write permission for any directory or file other than its log files. If
the UNIX syslogd(8)
system
utility for logging is enabled there is not even any need for write
permission on a log file. A good practice is to have all the files in
your hierarchy which you intend to serve be owned by the maintainer or
their creator. They should be world readable (assuming they are for
general consumption) but with restricted write permission. The files in
your hierarchy should not be owned by the user id under which
WN will run.
WN does not by default use the UNIX chroot(8)
system utility to further restrict the files which the server can access.
Doing so would enhance security at the expense of extra work for the
maintainer. The effect of chroot(8) is to prevent the server from even
internally accessing any file which is not in your data directory. If
you are especially concerned about security you may wish to run one of
the public domain TCP wrappers, such as Wietse Venema's tcp_wrappers
(source code available at ftp://ftp.win.tue.nl/pub/security/tcp_wrappers_7.6.tar.gz
),
in conjunction with WN which will allow you to use the UNIX
chroot(8)
system utility. This can simultaneously enhance
security for other TCP services like the UNIX ftpd(8)
system utility.
Enabling the use of programs run on the server greatly enhances its
functionality but also increases the potential risk of an attack. Many
things which on other servers can only be done with CGI/1.1 programs are built-in
features of WN and hence entail much less risk than they would
as CGI/1.1 programs.
These include imagemaps, a variety of document searches, and serving conditional text based on information in the
client supplied headers. If your needs can be met with these features
then you can disable CGI/1.1 with the -e
option and greatly
improve your security.
However, there are many needs which can only be met by programs. The
greatest danger in their use is that even though the program is under the
control of the maintainer, the arguments passed to it can be set by a
potential attacker. WN supports the CGI/1.1 or "Common Gateway
Interface" protocol (see the chapter "Using CGI
Programs on the WN Server" in this guide) for executing
programs. Under this protocol there are three ways by which arguments
are passed to programs. The first of these is used when processing HTML forms which use the
GET
method. Under this method all arguments are put in
environment variables and the program must extract them from the
environment. Moreover, they have been placed in a URL
encoded format by the browser and must be decoded by the program. Thus
if the request is of type GET
, the arguments are examined to
see if they contain an '=
'. If they do, it is assumed that
this is a CGI/1.1 form
response (something like
"name=John&toppings=pepperoni
"). In this case the
program is executed with no arguments and the argument string is placed
in an environment variable where the program can read it. This is fairly
safe from the server point of view but the program writer must exercise
great care.
The second method is for HTML
forms using the POST
method. In this case everything posted
by the client (in URL-encoded
form) must be sent to the UNIX stdin(3)
stream of the CGI/1.1
program. Thus if the request is of type POST
, information
is read from the client and put in a temporary file on disk. Then the
program is executed with no arguments and its stdin(3)
comes
from this file. Security is the responsibility of the program writer. It
is not so dangerous to have arguments come from stdin(3)
but
the program writer must still exercise care.
Finally if the GET
request has arguments but no
'=
' it is assumed to be an ISINDEX
type request
and the program should be executed with the given arguments. While the
CGI/1.1 specification does
not permit the altering of arguments, it does say that if the arguments
pose any security problems it is permissible to put the string in an
environment variable and execute the program with no arguments, just as
in the CGI/1.1 forms case
described above. WN takes a very strict view on this subject
and considers any characters other than space and alphanumeric characters
as a security problem. Accordingly, if it finds any other character in
an argument it will put all arguments in the appropriate environmental
variable and run the program with no command line arguments.
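The three cases can be collected into one decision function. The sketch below follows the description in this section only. The function name and the returned labels are invented, and it assumes the argument string has already been URL-decoded before the character check is made.

```python
def cgi_argument_channel(method: str, query: str = "") -> str:
    """Say how the arguments of a request would reach a CGI program."""
    method = method.upper()
    if method == "POST":
        return "stdin"  # posted data is spooled to a file and fed to stdin
    if method != "GET":
        raise ValueError("only GET and POST are covered by this sketch")
    if not query:
        return "none"  # nothing to pass
    if "=" in query:
        return "environment"  # treated as a form response
    # ISINDEX-style request: allow only spaces and ASCII letters and digits.
    if all(ch == " " or (ch.isascii() and ch.isalnum()) for ch in query):
        return "command line"
    return "environment"  # any other character forces the safe route


print(cgi_argument_channel("GET", "name=John&toppings=pepperoni"))
print(cgi_argument_channel("GET", "red wine"))
print(cgi_argument_channel("GET", "wine;cat /etc/passwd"))
```

The third call shows the strict rule at work: a single ';' or '/' is enough to keep the whole string off the command line.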
Again let me say the program writer must exercise great care. I can't emphasize this too strongly. When you run a CGI/1.1 program the server almost completely absolves itself of security responsibility and dumps that responsibility on the program writer. Most authors of freely distributed CGI/1.1 programs are not fully cognizant of potential security holes they may open up. Running insecure programs created locally or obtained from Usenet postings is almost certainly the single greatest risk to a WN server site. To find out more about writing secure CGI/1.1 programs I strongly recommend that you read the relevant sections of the "WWW Security FAQ" maintained by Lincoln Stein and the "Safe CGI Programming" maintained by Paul Phillips.
Whenever untrusted users have accounts on a system there is risk involved. The objective of WN is to ensure that running the server does not increase this risk. If the server is wisely managed, I believe this goal can be achieved. Here are some guidelines.
If it is possible make sure that no untrusted user has write access to
any part of your WN hierarchy. As mentioned above an attacker
with write access to your hierarchy can create an
index.cache
file which will give access to anything on your
server which is readable by the user id under which WN runs.
Even worse, she can create a shell program and an index.cache
file permitting it to be executed, so it can be executed with all the
permissions of that user id. A good rule of thumb is:
Note: Always assume that everyone with write access to any part of your data hierarchy has all the permissions of the user id under which your server runs!
This should not be true if you are using some of the command line options described above, but it is good practice to behave as if it were true.
Sometimes it is not possible or desirable to deny write access to your WN hierarchy. For example, you may need to allow all users to have a home page in their home directory or in some other designated place. There are two important things to do in this case.
The first of these is to run the server with the -u
option. This has the
effect of requiring that every file served (including wrappers and includes) have the same owner as the
index.cache
file which grants it permission to be served.
This means that untrusted users can only serve files which they own.
This will prevent a user from serving the UNIX passwd(5)
configuration file typically in /etc
, but will not prevent
him from making his own copy of passwd(5)
and serving that.
If the -t
or -T
option is also used then
index.cache
files owned by the trusted user or trusted group
are exempt from this requirement and they may grant permission to serve
any file the server can read. For security reasons the server will
refuse to use an index.cache
file which is a symbolic link
to another file.
The -e
or -E
options mentioned above are also a good idea in
this case, to prevent any execution of programs or at least restrict
their execution to trusted index.cache
files.
You should note that when run in its default configuration there is no
way to use access files or password authentication to prevent
users on your system, who can create index.cache
files, from
gaining access to files you are serving. They can simply make a symbolic
link in their part of the hierarchy to the file you want to restrict and
an index.cache
file permitting it to be served. Since the
server has access to the restricted file it will serve it if it is listed
in an index.cache
file. This simple threat can be avoided by
using the -u
option
described above, but the number of potential threats is quite large. For
example, if the -e
or -E
option is not used a
hostile user could write a CGI/1.1 program which reads
the sensitive files and mails them to himself. In general I would
strongly advise against trying to have sensitive documents (protected by
password or .access
files) and
potentially hostile users on the same server. I would also strongly
advise against allowing potentially hostile CGI/1.1 programs, executed
includes or external modules. They can be disallowed through the use of
the -e
or -E
options. If they are
not disallowed a CGI/1.1
program can alter or destroy log files. A hostile authorization module
could collect user passwords.
The -u
and -E
options greatly enhance
security, but it is important to keep the following principle in mind.
You should assume that any permissions you grant to the user id under
which WN runs are also granted to every user who can create an
index.cache
file in your data hierarchy.
WN offers two methods of limiting access to your hierarchy or parts of it. See the chapter "Limiting Access to Your WN Hierarchy" in this guide for information on how to use these features.
These are useful for many purposes but I would not advise using them to protect extremely sensitive information. The first of these methods is restriction by hostname or IP address. It is not impossible to spoof a server with a fake IP address, but I think it is fairly difficult. It is easier to use a counterfeit hostname. For this reason I would suggest using IP addresses rather than host names in access control files.
The other method of limiting access is by password with the HTTP/1.1 Basic Authentication
scheme. This is about as secure as using passwords with the UNIX ftpd(8)
system utility to protect information. This scheme is flawed in that it
involves the transmission of essentially unencoded passwords over the
network. It is relatively easy for unscrupulous people to obtain
"sniffer" software which allows eavesdropping on all local network
traffic. This means, in particular, that it is possible to intercept
passwords of other users.
For security reasons when you use wnauth
or any "Authorization-Module=
"
you are required to use either the -t
or -T
option or the -a
or -A
option when the
server is run and to have the index.cache
file in the
protected directory owned by the trusted user or group. This is to guard
against counterfeit authentication modules.
The problem of passwords crossing the network essentially unencoded is remedied by the "Digest" authentication scheme. Digest authentication is supported experimentally by WN but has the rather severe drawback that no publicly available clients currently support it. It is experimental because I have no client to test it with, and hence it has barely been tested. I believe it will be a standard part of HTTP/1.1 and at that time will significantly improve the security of password-protected directories.
The directive "Authorization-Realm=
",
used whenever an authentication module is used, is to notify the client
that for any document on this server with the same realm as this one, the
same password/username combination will be valid, so the client need not
ask the user for a username and password, but can reuse the one supplied
for the first document with this realm. For security reasons you should
always put your host and domain name in the realm. This may at least
discourage attempts at other sites to forge your realm in order to
collect user passwords. Your users should also be warned never to enter
their password if the realm displayed when they are prompted for a
password contains a different hostname than the one in the URL they are
trying to access.
Both Basic authentication and access control by IP address become much
more vulnerable if the potential attack comes from users who can create
index.cache
files for another part of your server's data
hierarchy. I would recommend against trying to use either to protect
information from users with home pages on your server.
If no potentially hostile users can create documents which can be served on your system the mechanisms described above provide protection adequate for many purposes. If I were an information provider selling access to a collection of information on my server, I would be comfortable using the numeric IP address to limit access to my paying customers. On the other hand I would not want any of these mechanisms used to protect my bank records.
This is a list of possible ways you might configure your server by setting
values in config.h
and using
command line arguments. It assumes that you are running either
wnsd
or wnd
on the privileged port 80 and that
the default values of "#define USERID
" and
"#define GROUPID
" defined
in config.h
have not been
changed. This will mean that wnsd
will be started as
root
, but will almost immediately switch its privileges to
those of the unprivileged user nobody
. Likewise if
wnd
is running under the UNIX inetd(8)
system utility we assume that it is set to run with the privileges of
nobody
.
The following list of configurations is in decreasing order of security.
The strongest level of security is achieved by running either
wnsd
(or wnd
under the UNIX inetd(8)
system utility) with the -t
or -T
option and with the
-e
option and with
no other options. For the really paranoid, uncommenting the "#define FORBID_CGI
"
line in the file config.h
and recompiling
removes the CGI/1.1
code from the binary.
With these options no CGI/1.1 programs or
filters or program output includes are permitted. Also the
POST
method is not accepted (an error is returned for a
POST
request). Furthermore only
index.cache
files owned by the user specified in the -t
option are used.
The server should be run as nobody
(the default) and the
numeric user id specified with the -t
option should be the
maintainer's.
This is the strongest level of security if you need the
functionality of CGI/1.1 programs or
filters or program output as server includes. This security
configuration does not allow any user home pages (unless the
maintainer produces the index.cache
file for them). To
use this level run wnsd
(or wnd
under
inetd(8)
) with the -t
or -T
option and no other
options. This places all control in the hands of a single maintainer
or a "maintainer group". No document or program output may be served
unless the maintainer has authorized it by explicit mention in one of
the index.cache
database files. The server will not
recognize any index.cache
file unless it is owned by the
maintainer specified with the -t
option or the group
specified with the -T
option. Only one of the
-t
or -T
options can be used.
This permits users on the server host to have and control their own
home pages and documents, but with a number of limitations. They
will not be permitted to run CGI/1.1 programs, filters
or include programs. Also the server will require that every file
served (including wrappers and includes) have the same owner as the
index.cache
file which grants it permission to be
served. This means that users can only serve files which they own.
This configuration is obtained by running with the -E
option and the -u
option. The -E
option is similar to
the -e
option except
that index.cache
files owned by a trusted user id or
trusted group id (set with the -t
or -T
option) are exempt
from the restrictions. The -u
option requires that
in order to be served a file must be owned by the owner of the
index.cache
file which lists it. Trusted users as
specified with -t
or
-T
options are
exempt from this restriction also.
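The ownership rule enforced by the -u option, together with the exemption for trusted owners, can be summarized in a few lines. The following Python sketch is only an illustration of the rule as described above, not WN's own code; the function name may_serve and its parameters are hypothetical, and trust by group (-T) is left out for brevity.

```python
from typing import Optional


def may_serve(file_uid: int, cache_uid: int,
              trusted_uid: Optional[int] = None) -> bool:
    """Illustrative -u check: return True if the file may be served.

    file_uid    -- owner of the file to be served
    cache_uid   -- owner of the index.cache file that lists it
    trusted_uid -- user id given with -t, or None if -t is not in use
    """
    # An index.cache owned by the trusted user is exempt from the rule.
    if trusted_uid is not None and cache_uid == trusted_uid:
        return True
    # Otherwise the file must have the same owner as its index.cache.
    return file_uid == cache_uid
```

For example, a file owned by root and listed in an ordinary user's index.cache would be refused, while the same file listed in the maintainer's index.cache would be allowed.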
One of the security problems encountered with another HTTP server
involved an attack by overflowing an internal buffer with data provided
by the client in such a way that the (attacking) client could supply
code that the server executed. I have, to the best of my ability,
defended against this in WN code. All copying of data supplied
by the client and most copying of data read from the
index.cache
file is done by a function which I wrote and
which was designed precisely to deal with this threat. Excess data which
would overflow is discarded so buffers may contain truncated data, but
will not be overwritten.
Probably the most controversial security "feature" of WN is that
it greatly restricts the set of characters which can be used in file or
path names. Instead of trying to decide which characters are dangerous
and disallow them, WN has a list of characters presumed safe and
only allows them. The currently allowed characters are alphanumeric
characters and '_
', '-
', '.
',
'+
', '/
' and '%
'. The same
restrictions are applied to the PATH_INFO
part of URLs for
CGI/1.1 programs, except
that the character '=
' is also allowed. These restrictions
sometimes cause problems with CGI/1.1 programs that like to
include unusual characters in file names or PATH_INFO
.
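The "allow only what is presumed safe" approach described above can be sketched as follows. This is a Python illustration, not the server's implementation; the names SAFE_PATH_CHARS and has_only_safe_chars are hypothetical, and "alphanumeric" is assumed here to mean the ASCII letters and digits.

```python
import string

# Characters the text lists as allowed in file and path names
# (assumption: "alphanumeric" means ASCII letters and digits).
SAFE_PATH_CHARS = set(string.ascii_letters + string.digits + "_-.+/%")


def has_only_safe_chars(path: str, is_path_info: bool = False) -> bool:
    """Return True if every character of `path` is on the allowed list.

    For the PATH_INFO part of a CGI URL, '=' is allowed as well.
    """
    if is_path_info:
        allowed = SAFE_PATH_CHARS | {"="}
    else:
        allowed = SAFE_PATH_CHARS
    return all(ch in allowed for ch in path)
```

The point of the design is that a character nobody thought about is rejected by default, instead of being accepted until someone discovers that it is dangerous.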
Also the server will attempt to resolve all "../
" references
while staying in the server data hierarchy. If these references would
result in a request for a document outside the server data hierarchy the
request is treated like a request containing illegal path characters. In
particular with verbose logging turned
on, a message like "SECURITY Found bad character (%X hex) in
path
" is logged.
To defend against a "denial of service" attack the server will refuse a
POST
request with post data in excess of 10 megabytes. This
does not defend against multiple requests with large POST
data. The maximum allowed size of POST
data can be altered
by changing the value of MAX_POST_LEN
in the file
config.h.
One of the design goals of WN is to provide the maintainer with tools to create extensive navigational aids for the server. A variety of search mechanisms are available that provide this capability.
In response to the URL:
<http://host/dir/search=title>
the server will provide an HTML
form (automatically generated or prepared by the maintainer) asking for a
regular expression search term. When supplied the server will search the
index.cache
files in /dir
and designated
subdirectories for items whose titles contain a match for the search
term. An HTML document with a
menu of these items is returned. Subdirectories are designated for
recursive searching by an entry in the directory record of the index.wn
file like:
Subdirs=dir1,dir2,dir3
You can customize the message offered requesting a search term by
creating an HTML form whose
ACTION
is the URL
"http://host/dir/search=title
" and which uses the
GET
method to return the search term with
"NAME=query
".
The regular expressions recognized by the WN server are the same
as those of the UNIX grep(1)
utility (though this utility is not used as the server has its own
regular expression functions). The more general regular expressions used
for example in the UNIX egrep(1)
utility are not supported by WN.
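A form that uses the GET method with a field named query produces a request whose query part carries the search term. As an illustration, the Python sketch below builds such a URL for a title search. It relies only on standard HTML form encoding, not on anything specific to WN; the helper name search_url is hypothetical, and host and /dir are the placeholders used above.

```python
from urllib.parse import urlencode


def search_url(base: str, mode: str, term: str) -> str:
    """Build the URL a GET form with a field named "query" would request."""
    # urlencode percent-encodes characters that are special in URLs, so a
    # regular expression used as the search term arrives intact.
    query = urlencode({"query": term})
    return f"{base}/search={mode}?{query}"


# Example: a title search for titles beginning with "Intro".
url = search_url("http://host/dir", "title", "^Intro.*")
```

The same helper works for the other search types by changing the mode argument, for example to "keyword".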
Like title searches except matches are sought in
keywords instead of titles. Keywords for HTML documents are automatically
obtained from <META>
headers. For other documents (or
HTML documents) they can be
manually supplied in the index.wn
file. This is done
by including a line like:
Keywords=keyword1, keyword2, etc.
in the relevant document's record in the index.wn
file. The URL to use
to cause this search to be done is:
<http://host/dir/search=keyword>
Like the above keyword and title searches except the match can be either in the
keyword or the title. The URL to use as the ACTION
in a
form or simply to invoke the search is:
<http://host/dir/search=synopsis>
If a recursive title, keyword or fielded search is requested and some
directories have restricted access, only
those directories which have the same access file or the same password realm as the directory where
the search started will be searched. In fact, if an "Accessfile=
" directive
is used the path must be the same for both directories (and must
necessarily be of the form "Accessfile=~/dir/.access
" or
"Accessfile=/dir/.access
" rather than
"Accessfile=.access
").
The maintainer can supply up to 20 additional field values associated
with a document. These are used for searching purposes in the same way
that "Keywords=
"
are. This is intended to give some additional "keyword like" fields, for
example, document author or document id number. It works exactly like
keywords except these values are not extracted from HTML files, but must be created
with a line like:
Field3=any text here
in the index.wn
file. The
'3
' in this example can be replaced with any number from
0
to 19
. The URL to use as the
ACTION
in a form or simply to invoke the search in the
example above is:
<http://host/dir/search=field3>
Like keyword and title
searches the search term for a fielded search can be any UNIX grep(1)
utility like regular expression.
Unlike keyword, title and fielded searches this is a full text search of all
"text/*
" documents in one directory (not subdirectories).
These searches are also limited to the main files -- they will not find
matches in wrappers and included files. The returned HTML document contains a list of
titles of documents containing a match each with a sublist of the lines
from those documents containing the match. This provides one line of
context for the match. For HTML
documents selecting the matched expression in one of these takes you to
the document with your viewer focused on the matching location. The
primary intent of this feature is to provide full text searching for an
HTML "document" which might
consist of a substantial number of files. It is possible to customize
the text of the HTML response
containing the matches. This is done with a Searchwrapper directive.
The URL to use as the ACTION
in a form or simply to invoke
the search is:
<http://host/dir/search=context>
It is possible to mark HTML
documents with comments so that only part of them is searched. This is
done with lines consisting of the comment
"<!-- #search_off -->
" which turns off
searching until the line consisting of
"<!-- #search_on -->
" is encountered.
A grep search is just like a context search,
except that only a list of anchors pointing to files containing a match
is returned. There are no lines of context showing the match. To do a
grep search on the files in directory dir
use:
<http://host/dir/search=grep>
A line search is just like a context search,
except that only one list of all matching lines is returned, instead of
the matching lines being sublists of a list of files containing a match.
That is, all the items in sublists of a context search are concatenated
in one large list of lines containing matches. The matching items are
still anchors pointing to items in their respective files. To do a line
search on the files in directory dir
use:
<http://host/dir/search=line>
A file context search is just like a context
search, except limited to a single file. The file grep search
returns a text/html
document containing the lines in the
file matching the regular expression. These lines will be
converted to plain text and surrounded by <pre>
and
</pre>
tags. This is done because isolated tags or
partial tags taken from an HTML
document would be unlikely to function properly. It is likely that you
will want to use a Searchwrapper directive
with a file grep search.
The URLs to invoke these searches on file foo
are:
<http://host/dir/foo;search=context>
<http://host/dir/foo;search=grep>
The different searches described above are of two types: those that
search the index.cache
files like field, keyword and title searches and those that do full text searches of
multiple files, like context, grep, field grep and line searches. The first type recursively descends all
subdirectories listed in a "Subdirs=
" directive and
searches all the index.cache
files. The second only
searches files in the one directory specified in the search.
The reason for this is efficiency. Context, grep, field grep and line searches are not a replacement for WAIS, glimpse, webglimpse or some other indexed search engine. The intent of these searches is to allow a full text search of a single conceptual HTML document that is made up of a number of files with links. These searches work great with something like the HTML4.0 specification (see http://hopf.math.northwestern.edu/html4/) which consists of a number of files of moderate size, but they would be very slow with 1000 files spread out in a data hierarchy. For that you really need an indexing search engine.
Another limitation of these searches is that they will not find matches in wrappers and included files.
On the other hand, all of the searching methods listed above are built
into the server and require no additional effort for the maintainer. You
don't need to produce or maintain an index. They are simply referenced
with URLs like "<http://host/dir/search=context>
"
where /dir
is any directory containing files to be served
and an index.cache
listing them. Of course search
permission can be denied for any directory or any file contained in that
directory.
The server will search an HTML
document looking for an unordered list of anchors linking to WWW objects.
The contents of each anchor will be searched for a match to the supplied
regular expression. The search returns an HTML document containing an
unordered list of those anchors with a match. This is quite useful when
combined with the wn_mkdigest
utility which
creates HTML documents to be
searched in this way from files with internal structure like mail or news
digests, mailing lists, etc.
The URL to invoke this search on file foo
:
<http://host/dir/foo;search=list>
Indexed searches can be supported in WN by auxiliary modules.
Two such modules, wnseven_m and
wnsectsearch, are provided as examples and maintainers may wish to
create others. To use such a module you should have a form action be
something like http://host/dir/search=index
.
Then in the index.wn
file
in the directory dir
you should have a line like:
Search-Module=/full/path/to/searchmod
The program searchmod
should read the environment variable
QUERY_STRING
and return a partial HTML document. The typical case
would be the program returns an unordered list of anchors to documents
containing a match to the query string. This list can be wrapped by
including a "Searchwrapper=
" directive in the directory
record. If it is not, then a default wrapper with text like "Here
are the matches for your search
" is supplied.
The different types of searches (e.g. keyword, context, etc.) are called the modes used by
the search. Normally the mode is set by adding, for example
"search=context
" to the end of a URL. However, if an HTML form is used to initiate the
search, it may be desirable to allow the mode to be selected by a form
variable. Thus an HTML form
like:
<form action="search=mode" method="GET">
Enter your search term <input name="query" size=15>
<input type="submit" value="Search">
by <input type="radio" name="mode" value="title" checked> title
or <input type="radio" name="mode" value="keyword"> keywords
</form>
will execute either a title or a keyword search depending on whether the
user checks the radio button for "title
" or
"keyword
". The URL requested will end with
"search=mode
", but actually it could be anything since the
"mode=title
" (if that is what is checked) which will be in
the query part of the URL will override whatever follows
"search=
" in the base URL.
By default when a search is done an HTML file is created with text like
"Here are the matches for ...
". You may wish to customize
this response which is done with the "Searchwrapper=
"
directive in either a file record or directory record of the index.wn
file.
The line:
Searchwrapper=swrap.html
specifies that the HTML file
swrap.html
in the current directory should be used as a
wrapper for the output of all searches on this directory (if it is a
directory record entry) or file (if it is a file directive). This
wrapper differs from other wrappers in that it can have only a single
"<!-- #include -->
" line. An unordered list
of anchors to the matching items will be inserted at the location of this
line. You can, of course, insert the client supplied search term by use
of the line "<!-- #query -->
" in this file.
The remainder of this file can be anything you wish and is often an HTML form allowing subsequent searches.
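The way a search wrapper is filled in can be pictured with a short Python sketch. This is an illustration of the behaviour described above, not the server's code: the function name fill_search_wrapper is hypothetical, the markers are assumed to sit on lines of their own, and escaping the search term before inserting it is a choice made for this sketch.

```python
import html

INCLUDE_MARK = "<!-- #include -->"
QUERY_MARK = "<!-- #query -->"


def fill_search_wrapper(wrapper: str, match_list: str, query: str) -> str:
    """Insert the match list and the search term into a wrapper document."""
    out = []
    for line in wrapper.splitlines():
        marker = line.strip()
        if marker == INCLUDE_MARK:
            out.append(match_list)          # the list of anchors goes here
        elif marker == QUERY_MARK:
            out.append(html.escape(query))  # escaping is this sketch's choice
        else:
            out.append(line)                # everything else is kept as is
    return "\n".join(out)
```

A wrapper consisting of a heading, a "<!-- #query -->" line, a "<!-- #include -->" line and a search form would thus come back with the term and the list of matches in place of the two marker lines.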
If a search fails to find any matches then a default HTML response indicating this is
sent. This response can also be customized but only if a "Searchwrapper=
" is
also used. The line:
Nomatchsub=foo.html
specifies that the HTML file
foo.html
in the current directory should be used for the
output of all searches (title, keyword, etc) on this directory (or file if it is a
file directive) which return no matches. If "Nomatchsub=
" is used and
a "Searchwrapper=
" has
not been defined an error is logged and the nomatchsub
file
is ignored. The nomatchsub
file must be in the directory
being searched and its name must not contain a '/
'.
The WN server has powerful capabilities for modifying an HTML
file as it is served. One such feature, called "server-side includes",
is the ability to automatically insert one file in another. But the
usefulness of this capability is greatly enhanced by another feature, conditional text. A simple "if - else -
endif
" construct allows sections of a document to be served only
to certain hosts, or to clients providing an appropriate
Accept
header or a desired Referer
or
User-Agent
header. Using the "else
" construct
allows alternate text segments for clients not meeting the desired
criteria. For these features to work it is necessary for the server to
parse the file looking for the location of includes or for conditionally
served text. The fact that a file should be parsed in this way is
indicated in its entry in its index.wn
file. This is done with
an entry "Attributes=parse
"
in the file record. This line is not necessary if the file record lists
wrappers or includes
since it is assumed in that case. Only documents with MIME type
text/html
can be parsed in this way.
The simplest and most common use of parsed text is including additional
files in one which is being served or "wrapping" a served file with a
second file, i.e. inserting the requested file inside the second at a
desired location. The latter is useful, for example, if you wish to
place a standard message at the beginning or end (or both) of a large
collection of files. All files included or used as wrappers must be
listed in the index.cache
file. They are not listed in the
file in which they are to be included; only the location of the insertion
is marked.
To ensure security various options are available including the
requirement that a served file and all its includes and wrappers have the
same owner as the index.cache
file listing them. This is
done with the -u
option.
Another important application of wrappers is to customize the HTML documents returned listing the successful search matches. If, for example, a directory is assigned a wrapper the server assumes that it contains all text describing the search and it merely supplies an unordered list of links to the matching items.
Suppose your server offers the file foo.html
and you wish to
have the file bar.html
automatically inserted at some point
into foo.html
. You achieve this by use of the "Includes=
" directive in
the index.wn
file in the
directory containing foo.html
. That index.wn
file should have an entry
like:
File=foo.html
Includes=bar.html
and the file foo.html
should contain either of the lines:
<!-- #include -->
or:
<?WN #include >
These two forms are equivalent as far as the server is concerned. The
second form is considered more SGML friendly by many as
<?WN something>
indicates a processing
instruction specific to WN rather than a comment. For
historical reasons this manual describes the other form, but either may
be used. With the first form the '#
' is required but with
the other you may use either:
<?WN #include>
or:
<?WN include>
Any of the "<!-- #something -->
" lines
described in this section may be replaced using the
"<?WN something>
" syntax. Also with this syntax
the case of the WN is not significant.
The marker "<!-- #include -->
" should be the
only thing on its line but may have white space before it. What is inserted there is
controlled by the "Includes=
" line in the
index.wn
file. All
including of files by WN is done only for text files and only in
units of lines.
If the include marker is never found in foo.html
, then this
file is served and the file bar.html
is appended at the
end. Thus the "Includes=
" directive can
be used to append a file without the need of any marker in the main file.
If you wanted to include two files in foo.html
, say
bar.html
and bah.html
you would place the
marker "<!-- #include -->
" at two places in
foo.html
and have an entry in your index.wn
file like:
File=foo.html
Includes=bar.html,bah.html
The two files will then be included at the marked spots in the order that
they are listed in the "Includes=
" directive,
i.e. bar.html
will be inserted at the first marker and
bah.html at the second. You can have as many included files as you wish.
They should all be listed on the "Includes=
" line separated
by commas.
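The ordered substitution just described, including the rule that a file is appended when no marker is found for it, can be sketched in Python as follows. This is an illustration only; the function name apply_includes is hypothetical, and two details are assumptions of the sketch: leftover files are all appended in order, and a marker with no file left to insert is passed through unchanged.

```python
INCLUDE_MARK = "<!-- #include -->"


def apply_includes(main_lines, included):
    """Insert each entry of `included` at successive include markers.

    main_lines -- lines of the main file (e.g. foo.html)
    included   -- list of line lists, in the order given by "Includes="
    """
    remaining = list(included)
    out = []
    for line in main_lines:
        if line.strip() == INCLUDE_MARK and remaining:
            out.extend(remaining.pop(0))   # next file in "Includes=" order
        else:
            out.append(line)
    # Files whose marker was never found are appended at the end.
    for leftover in remaining:
        out.extend(leftover)
    return out
```

Because the work is done line by line, the sketch also reflects the rule that inclusion happens only in units of lines.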
Sometimes it is easier to specify the order in which multiple files are
to be inserted by adding the file name in the HTML document. This is also
possible with WN. However, since good security requires that
only files listed in the index.wn
file may be served, it is
still necessary to list files you want to include there.
So, another way to include two files in foo.html
, say,
bar
and bah
, is to place the markers
'<!-- #include "bar" -->
' and
'<!-- #include "bah" -->
' at the
appropriate places in foo.html
and have an entry in your
index.wn
file like:
File=foo.html
List-Includes=bar,bah
The order of the files listed in the directive is not significant. The
example above grants permission for the inclusion of the two files
listed. It does not require their insertion. Note the difference
between the two mechanisms: with the "Includes=
" directive only
"<!-- #include -->
" is needed in the HTML and the file inserted is taken
(in order) from the list in the directive, while with the "List-Includes=
"
directive the HTML must contain
'<!-- #include "bah" -->
' so the server
knows to insert file "bah
" and the file list in the
directive is checked to see that permission to serve this file has been
granted. If you use the "List-Includes=
"
directive you cannot also use either the "Includes=
" or "Wrappers=
" directives --
the "List-Includes=
" and
"Includes=
"/"Wrappers=
" mechanisms are
mutually exclusive. It is fine to use the line
'<!-- #include "bar.html" -->
' to
insert a file with the "Includes=
" directive, but
with that directive the "bar.html
" is only a comment and has
no effect on which file is actually inserted.
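With "List-Includes=" the marker names the file and the directive only grants permission. A Python sketch of that check follows. It is an illustration, not the server's code: the names NAMED_INCLUDE, apply_list_includes and read_file are hypothetical, and raising an error for a file that is not on the list is an assumption of this sketch, since the text above does not say how the server reacts.

```python
import re

# Matches a marker such as: <!-- #include "bah" -->
NAMED_INCLUDE = re.compile(r'^\s*<!--\s*#include\s+"([^"]+)"\s*-->\s*$')


def apply_list_includes(main_lines, permitted, read_file):
    """Expand named include markers, honoring the permitted-file list.

    permitted -- names granted by "List-Includes=" (order is irrelevant)
    read_file -- callable returning the lines of a named file
    """
    allowed = set(permitted)
    out = []
    for line in main_lines:
        found = NAMED_INCLUDE.match(line)
        if found is None:
            out.append(line)
            continue
        name = found.group(1)
        if name not in allowed:
            # Assumption: refuse outright when permission was not granted.
            raise PermissionError(f"{name} is not in List-Includes")
        out.extend(read_file(name))
    return out
```

Note that the order of insertion comes from the markers in the document, which is why the order of names in the directive does not matter.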
There is a "Default-List-Includes=
"
directory directive which can be used to grant permission for any HTML file in the directory to
include one or more of the listed files. This also causes all HTML files in the directory to
acquire the "Attributes=parse
".
If a file has a filter only that file will be filtered, not any wrappers or includes.
Instead of a file it is possible to include the output of a program. To
do this the program is listed in the "Includes=
" directive but
its name is preceded with the '!
' character. For example:
File=foo.html
Includes=!/usr/bin/date,bar.html
will insert the time and date at the first
"<!-- #include -->
" and the contents of
bar.html
at the second. "Includes=
" and "List-Includes=
"
should not list arguments to programs. Thus if you wished to use the
output of the UNIX date
utility to get UTC instead of local time you should create a program
named "mydate
" containing, for example:
#!/bin/sh
exec /usr/bin/date -u
and use the directive "Includes=!mydate
".
This ability to serve the output of programs can be restricted in several
ways. If WN is invoked with the -e
option then no includes,
filters, or CGI
programs will be executed.
The -E
option in
conjunction with the -t
or -T
options restricts
the execution of programs to those listed in index.cache
files owned by trusted users or groups. The -u
option allows the
execution of programs or inclusion of files owned by the owner of the
index.cache
file which lists them. If the -E
and -u
options are used
together the -E
option
takes precedence.
If you wish to have all the standard CGI/1.1 environment variables made available to the executed include program you can do so by adding the line:
Attributes=cgi
to the file record. A list of these environment variables can be found in the appendix "CGI and other Environment Variables for the WN Server" in this guide. Also see the sample CGI program which is located in the file /docs/examples/sample.cgi which accompanies the WN distribution.
Often a server maintainer may wish to serve different versions of a document to different clients. Here is a simple example of how this can be handled with WN.
<!-- #if accept =~ "image/jpeg" -->
<a href="picture.jpg">
Here is the jpeg version of the picture:
</a>
<!-- #else -->
<a href="picture.png">
Here is the png version of the picture:
</a>
<!-- #endif -->
This tells the server to look at the Accept: headers provided by the
client and if "image/jpeg
" is among them then use the
jpeg
image and otherwise use the png
image.
More precisely, the "image/jpeg
" part of the
"if
" line is a UNIX grep(1)
utility like regular expression and if there is any match for it among
the Accept
headers the jpeg
image will be used.
Of course these "if - else - endif
" constructs can be
nested. A similar construct allows you to make the text served depend on
matching a regular expression with the contents of the
User-Agent
header, the Referer
header, the
Cookie
header, the client hostname, or the client IP
address. A complete list of possible test clauses for the
#if
statement is contained in appendix "Parsed Document Syntax for the WN
Server" in this guide.
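To make the nesting rules concrete, here is a small Python sketch of how "if - else - endif" lines might be evaluated. It is not WN's parser: it handles only the '=~' test, assumes each marker is on a line of its own and that the document is well formed, represents the request as a plain dictionary (a hypothetical choice), and uses Python's re module as a stand-in for the server's grep-style regular expressions.

```python
import re

IF_RE = re.compile(r'^\s*<!--\s*#if\s+(\w+)\s*=~\s*"([^"]*)"\s*-->\s*$')
ELSE_RE = re.compile(r"^\s*<!--\s*#else\s*-->\s*$")
ENDIF_RE = re.compile(r"^\s*<!--\s*#endif\s*-->\s*$")


def render(lines, request):
    """Return the lines that would be served for `request`.

    request -- dict mapping a test name such as "accept" to the value
               the client sent (hypothetical representation)
    """
    stack = []      # one (parent_active, condition) pair per open #if
    active = True   # are we currently inside text that should be served?
    out = []
    for line in lines:
        m = IF_RE.match(line)
        if m:
            name, pattern = m.groups()
            cond = re.search(pattern, request.get(name, "")) is not None
            stack.append((active, cond))
            active = active and cond
        elif ELSE_RE.match(line):
            parent, cond = stack[-1]
            active = parent and not cond
        elif ENDIF_RE.match(line):
            parent, _ = stack.pop()
            active = parent
        elif active:
            out.append(line)
    return out
```

The stack remembers, for every open "if", whether the enclosing text was being served, so an inner "else" can never switch on text inside an outer branch that is switched off.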
Note: For this to work the file containing the conditional text needs to be parsed by the server. The server only knows to do this if the file record in the index.wn
file contains a line "Attributes=parse
".
If, in an "if clause", instead of the equal-tilde string
'=~
' (to indicate a regular expression match) the character
string '!~
' is used then the truth value of the match is
reversed. For example the lines:
<!-- #if referer !~ "my.host.edu" -->
Here is some text for newcomers to my site.
<!-- #endif -->
would display the "Here is some text ...
" to those clients
accessing this document via any link which is not on the host
my.host.edu
.
There is also a redirect command which can conditionally cause the server to send an HTTP/1.1 redirect to a new URL. For example if the text:
<!-- #if hostname =~ "\.uk$" -->
<!-- #redirect = "UK_mirror_url" -->
<!-- #endif -->
is included at the beginning of an HTML document then any request from
a UK host will automatically be redirected to the specified URL, the
UK_mirror_url
in this case. This mechanism could also be
used to redirect text only browsers to a text only alternative page, etc.
There must be no text sent before the
'<!-- #redirect = "url" -->
' is
encountered (not even blank lines) since the server cannot send an HTTP/1.1 redirect while in the
middle of transmitting a document.
Normally the URL in the
'<!-- #redirect = "URL" -->
' line
is fully qualified, like "http://host/path/foo
". However,
it can also be simply "foo
" referring to a file in the same
directory as the file being parsed. In this case an HTTP/1.1 redirection is not
sent, and instead the file "foo
" is returned immediately to
the client.
Often a single regular expression is not adequate to distinguish whether
or not to serve some text. For that reason WN allows you to use
a file containing any number of regular expressions and serve a document
based on whether any of these expressions match accept headers, referer
header, user-agent, hostname, etc. For example, if the file
"acceptfile
" contains the lines:
image/png
image/jpeg
image/x-xbitmap
then the following conditional text might be appropriate:
<!-- #if accept file = "acceptfile" -->
I see you aren't using a text only browser...
<!-- #endif -->
The format of the file "acceptfile
" is one UNIX grep(1)
utility like regular expression per line. Lines beginning with
'#
' are taken to be comments. If a regular expression is
preceded by the character '!
' then that character is skipped
but the truth value of the match is reversed. More information about
files of regular expressions for conditional text can be found in
appendix "Parsed Document Syntax for the
WN Server" in this guide.
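One possible reading of the file format just described is sketched below in Python. Treat it as an assumption-laden illustration: the function name matches_regex_file is hypothetical, blank lines are skipped, a line counts as satisfied when its (possibly reversed) match is true, and the file as a whole matches if any line is satisfied. The appendix is the authority on how the server actually combines the lines, and Python's re module is only a stand-in for grep-style expressions.

```python
import re


def matches_regex_file(file_text: str, value: str) -> bool:
    """Return True if any line of the regex file accepts `value`."""
    for raw in file_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank lines and comments
        negate = line.startswith("!")
        pattern = line[1:] if negate else line
        hit = re.search(pattern, value) is not None
        if hit != negate:                 # '!' reverses the truth value
            return True
    return False
```

With the "acceptfile" shown above, a client whose Accept header mentions image/png would satisfy the test, while one that lists only text types would not.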
Another method of using conditional text is with a normal WN access control file. For example:
<!-- #if accessfile="secret/access" -->
Here is a link to a restricted directory.
<a href="/secret/stuff.html">Restricted Local Stuff</a>
Hosts not listed in the file "secret/access" can't look at it, so why
show them a link to it?
<!-- #endif -->
will display the included text and the link to
"Restricted Local Stuff
" only to clients on hosts
permitted by the access control file
"secret/access
".
The condition in the "if
" or "elif
" tags can be
made more complex than those described above by combining simple
conditions using the logical operations '&&
' for
'and', '||
' for 'or' and '!
' for 'not'.
Parentheses may be used for grouping. For example:
<!-- #if hostname =~ "\.uk$" && accessfile="secret/access" -->
Something for those UK sites which are also permitted by the access file.
<!-- #endif -->
This is described in greater detail in the section "Matching Regular Expressions:
#if
and #elif
" in this guide.
section
Marker
Often you may want to include HTML files in a larger document. For example, a single logical HTML document often consists of a number of separate files. This can make it very difficult for a client to download and print the entire document. To alleviate this problem you could form a single document concatenating all the pieces, but that can be problematic when the document is altered or updated as it is necessary to change both the piece and the concatenation.
WN offers a solution to this by allowing you to write a small
skeleton document which includes all the pieces, forming a "virtual"
document which is the concatenation. But if we just used the mechanism
above the concatenation document would include the
<head>
elements of all the pieces. To remedy this you
use the marker:
<!-- #section -->
instead of "<!-- #include -->
" and the
server will include only the portion of the HTML document between the special
comments "<!-- #start -->
" and
"<!-- #end -->
" inserted in that document.
This requires that these starting and ending comments occur in the HTML document on lines by
themselves.
Suppose you have a large number of files and want a standard header or
footer placed on all of them (perhaps a standard disclaimer). You could,
of course, make many copies of the standard header with a different
"Includes=
"
directive for each, but this is cumbersome. It has other drawbacks as
well, such as the fact that grep and context searches do not search associated
include files, only the main file.
To deal with this case WN uses wrappers. Wrapping a
file is the inverse of including it. If you have an index.wn
file entry like:
File=foo.html
Wrappers=bar.html
then the server will send the file bar.html
looking for the
marker "<!-- #include -->
" and inserting
foo.html
at the line where it is found. So this is just
like the "Includes=
" directive
except the role of which file is included in the other is reversed. If
the include marker is never found the entire wrapper,
bar.html
, is sent first and the main file,
foo.html
, is appended at the end. Thus the "Wrappers=
" directive can
be used to prepend a file without the need of any marker in the main
file.
A search wrapper is a special kind of wrapper. Its function is to allow you to customize the results of user searches. It is normally associated with an entire directory in which case it applies to all title, keyword, field, grep and context searches of that directory. See the chapter "Setting Up Searches on the WN Server" in this guide.
A line like:
SearchWrapper=foo.html
in the directory record of an index.wn
file will cause any search
of that directory to return an unordered list of matches wrapped with the
file foo.html
. The list of matches will be inserted into
foo.html
at a point where the marker
"<!-- #include -->
" is found. You can also
insert the user supplied search term by using the marker
"<!-- #query -->
". Both of these markers
must occur on a line by themselves.
Wrapping and including can be arbitrarily nested (though this is not true of search wrappers). It can be a little complicated to get the effect you want.
Here is how to do it. Think of all your files, wrappers, includes and
the main file arranged as you wish them to be combined for the final
served document. Now imagine inserting an opening (or left) parenthesis
at the beginning of each file and insert a closing (or right) parenthesis
at the end of each file. You should have a legally nested and balanced
collection of parentheses. To each of the opening parentheses attach the
name of the file which begins at that point. Then write down the list of
all the file names in the order their corresponding opening parentheses
occur. All the files which come before the main file should be wrappers
and should be listed in the "Wrappers=
" line in the
order in which they occur in this list. All the files after the main
file should be in the "Includes=
" line and should
occur in the order they occur in this list.
Here's a simple example. Suppose we have a main file M
and
other files A
, B
, C
, and
D
which we want to have nested like:
(D...(B...B)...(M...(A...(C...C)...A)...M)...D)
Then the entry in the index.wn
file should look like:
File=M
Wrappers=D,B
Includes=A,C
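The parenthesis recipe above is mechanical enough to sketch in a few lines of Python. The function name and the use of a regular expression to read the diagram are my own choices for illustration; the only rule taken from the text is that files opened before the main file are wrappers and files opened after it are includes, each in order of appearance.

```python
import re


def index_entry(nesting, main):
    """Return (wrappers, includes) for a parenthesised nesting diagram."""
    # Names attached to opening parentheses, in the order they occur.
    opened = re.findall(r"\((\w+)", nesting)
    if main not in opened:
        raise ValueError("main file %r not found in nesting" % main)
    pos = opened.index(main)
    return opened[:pos], opened[pos + 1:]


wrappers, includes = index_entry(
    "(D...(B...B)...(M...(A...(C...C)...A)...M)...D)", "M")
print("File=M")
print("Wrappers=" + ",".join(wrappers))
print("Includes=" + ",".join(includes))
```

Run on the diagram above, this prints the same three lines as the index.wn entry shown.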
Often it is useful to insert items like the main document's title into a wrapper, or the search item provided by the client into a searchwrapper for the responses. This is possible with WN parsed documents using the lines:
<!-- #title -->
or:
<!-- #query -->
in the file at the point where the title or search term should be inserted. Both of these markers must occur on a line by themselves.
In addition to the title and query, user supplied fields can be inserted in documents. This is done by including a marker such as:
<!-- #field3 -->
in the file at the point where the value should be inserted.
You can also insert the value of any environment variable into your text with a line like:
<!-- #environ = "HTTP_REFERER" -->
which will be replaced by the contents of the environment variable
HTTP_REFERER
.
Note: If there is no "Wrappers=
" or "Includes=
" line in the index.wn
file for this entry then there must be a line like:
Attributes=parse
so the server knows it is to parse the file to look for the marker. See
the section "Set File
Attributes" in this guide for more details. Also if you wish to
insert the value of a CGI/1.1 environment variable
(as in the example above) you must have an "Attributes=parse
"
line.
Strictly speaking this feature has nothing to do with parsed
documents, but it fits here as well as anywhere. If a URL has
the parameter meta
appended (with a semi-colon
separator), as in
http://host/path/foo;meta
for example, then it will return meta data about the resource
rather than the resource itself.
Here is the meta data returned
by this document when the meta
parameter is used.
The maintainer may associate an arbitrary field value to any document in the WN hierarchy. It is then possible to do fielded searches.
A line like:
Field3=any text here
in the index.wn
file
associates the text "any text here
" with user
defined Field3
. The '3
' in this example can be
replaced with any number from 0 to 19. A fielded search is invoked by a
URL like:
http://host/dir/search=field3
In addition to fielded searches the user supplied fields can be inserted in documents. This is done by including a field marker such as:
<!-- #field3 -->
in the file at the point where the value should be inserted. If there is
no Wrappers=
or
Includes=
line in
the index.wn
file for this
entry then there must be a line like:
Attributes=parse
so the server knows it is to parse the file to look for the marker. See the section "Set File Attributes" in this guide for more details.
An arbitrary "filter" can be assigned to any file to be served. A filter
is a program which reads the file and has the program output served
rather than the content of the file. One common use of this feature is
for on-the-fly decompression. For example, a file can be stored in its
compressed form and assigned a filter like the UNIX zcat(1)
utility which uncompresses it. Then the client is served the
uncompressed file but only the compressed version is stored on disk. As
another example, you might use "nroff -man
" as a filter to
format UNIX man(1)
pages before serving. There are many other
interesting uses of filters. Be creative!
A filter is assigned to a file by use of the "Filter=
" directive in the
file's record in its index.wn
file. For example, the
lines:
File=foo.gz
Content-type=text/plain
Filter=/usr/local/bin/zcat
Content-encoding=none
cause the compressed file foo.gz
to be uncompressed on the
fly and served to the client as a text/plain
document.
Notice that it is necessary to have the content-encoding line to override
the default action of wndex
which is to infer
from the ".gz
" suffix that the content-encoding is
x-gzip
. If the compressed file were named simply
"foo
" then the content-encoding line would be unnecessary.
The "Filter=
"
directive takes the value of a path to a file in three different forms.
If the path begins with a '/
' then it is relative to the
system root. If it begins with '~/
' then it is relative to
the WN hierarchy root, and otherwise it is relative to the
directory containing the index.wn
file.
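The three path forms can be summarized with a small Python sketch. The function and its parameter names are hypothetical, and the directories used in the example are placeholders rather than WN defaults.

```python
import posixpath


def resolve_filter_path(value, wn_root, index_dir):
    """Map a Filter= value to an absolute path using the three forms."""
    if value.startswith("/"):
        return value  # relative to the system root
    if value.startswith("~/"):
        return posixpath.join(wn_root, value[2:])  # relative to the WN hierarchy root
    return posixpath.join(index_dir, value)  # relative to the index.wn directory


# Placeholder locations, chosen only for this example.
WN_ROOT = "/usr/local/wn"
INDEX_DIR = "/usr/local/wn/docs"

for value in ("/usr/local/bin/zcat", "~/filters/zcat", "zcat"):
    print(value, "->", resolve_filter_path(value, WN_ROOT, INDEX_DIR))
```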
If a file has a filter only that file will be filtered, not any wrappers or includes.
The ability to filter files can be restricted in several ways. If
WN is invoked with the -e
option then no includes,
filters, or CGI/1.1
programs will be executed.
The -E
option in
conjunction with the -t
or -T
option restricts
the use of filters to those listed in index.cache
files
owned by trusted users or groups. The -u
option allows only the
use of filters owned by the owner of the index.cache
file
which lists them.
The directory directive:
Default-Filter=/path2/filter
specifies that files in this directory should all be treated as if the
"Filter=
" file
directive had been set to /path2/filter
. To override this
setting and specify no filter use the "Filter=<none>
"
directive.
The filter mechanism built into WN has most of the functionality of the Common Gateway Interface (CGI) standard and has some features CGI/1.1 does not. Filters are also in some ways more secure. This makes them a good choice for applications where compatibility with the CGI/1.1 standard is not required.
The first thing to note is that there is no requirement that the filter program actually make use of the file being filtered. This file must exist though.
Thus if an empty file "foo
" is created and has an index.wn
file entry like:
File=foo
Content-type=text/html
Filter=someprogram
Attributes=parse,cgi
then the output of the program "someprogram
" will be served.
A program used in this way differs somewhat from a CGI/1.1 program in that no
headers should be supplied by the program as WN will
automatically provide them. For example, while a CGI/1.1 program typically
starts with printing "Content-type: text/html
" followed by a
blank line, this should not be done for "someprogram
" in the
index.wn
entry above, because WN will automatically
provide the appropriate HTTP/1.1 headers based on the
"Content-type=text/html
"
line in the index.wn
file.
An important difference between filters and CGI/1.1 programs is that the
output of filters can be parsed while CGI/1.1 output cannot. The
fact that you want the output parsed must be signalled by the use of an
"Attributes=parse
"
line in the directory's index.wn
file.
If you wish to have all the standard CGI/1.1 environment variables made available to the filter program you can do so by adding the line:
Attributes=cgi
to the file record. A list of these environment variables can be found
in appendix "CGI and other Environment Variables
for the WN Server" in this guide. Also see the sample CGI program which is located in the
file /docs/examples/sample.cgi
which accompanies the
WN distribution.
One difference between CGI/1.1 programs and filters
is that with filters there is no way to have a non-empty PATH_INFO
environment variable since anything appended to the path part of the URL
will be interpreted as a path to an actual file. Of course the "query"
part of a URL (everything after a '?
') will work for filters
as well as CGI/1.1
programs and its contents will be put in the QUERY_STRING
environment variable.
Another difference between CGI/1.1 and filter programs
is in the handling of POST
or PUT
data. A CGI/1.1 program reads the
data provided by the client on its UNIX stdin(3)
stream. This is not possible for a filter since its stdin(3)
is attached to the file it is supposedly filtering. To use the
PUT
or POST
method with a filtered file, the
"Attributes=post
"
directive must be used since otherwise the server will not permit a
POST
or PUT
. It is then possible to read the
POST
ed data by opening and reading the temporary file
containing this data. The name of this file changes with each request,
but if "Attributes=cgi
" is
used then the name is given in the WN specific CGI environment
variable HTTP_POST_FILE
or HTTP_PUT_FILE
depending on the method used to submit the data.
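As an illustration, here is a minimal Python sketch of a program that could be used as a filter on a file marked with both the post and cgi attributes. It prints no HTTP headers, reads the query from QUERY_STRING, and finds the submitted data through HTTP_POST_FILE or HTTP_PUT_FILE as described above. The function names and the page it produces are invented for the example.

```python
import html
import os
import sys


def read_posted_data(environ):
    """Return the submitted bytes, or empty bytes when no temporary file is named."""
    # WN names the temporary file in one of these two variables,
    # depending on the request method.
    path = environ.get("HTTP_POST_FILE") or environ.get("HTTP_PUT_FILE")
    if not path:
        return b""
    with open(path, "rb") as handle:
        return handle.read()


def render(environ):
    """Build the document body. No HTTP headers are printed."""
    data = read_posted_data(environ)
    query = environ.get("QUERY_STRING", "")
    return (
        "<html><body>\n"
        "<p>Query string: %s</p>\n"
        "<p>Received %d bytes of submitted data.</p>\n"
        "</body></html>\n" % (html.escape(query), len(data))
    )


if __name__ == "__main__":
    sys.stdout.write(render(os.environ))
```

Because the program's standard input is attached to the filtered file, it never reads the submitted data from there; everything comes from the environment and the temporary file.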
One advantage of using a filter instead of a CGI/1.1 program is that it
may have slightly better security. With a filter the name of the
executed program is never visible outside the server. It is not in any
URL and it is not in any served file. Perhaps a more important feature
is that no arguments can be supplied to a filter except those listed in
the index.wn
file filter
entry. Unlike CGI/1.1
programs, it is not possible for a remote user to supply any arguments
whatsoever to the program.
If the server receives a request like:
http://host/dir/foo;lines=20-30
and foo
is any document with content-type
text/*
it will return a text/plain
document
consisting of lines 20
through 30
of file
foo
. This is very useful for structured text files like
address lists or digests of mail and news. A WN utility called
wn_mkdigest
will
produce an HTML file with a list
of links to separate sections (line ranges) of the structured file. It
is described below.
The option ";lines=[first]-[last]
" can be appended to the
URL of any text file served by the WN server. It has the effect
of causing the server to return line [first]
through line
[last]
of the document to which it is attached. Line 1 is
the first line of a file. This parameter is ignored and an error is
logged if it is used with a binary file. When used with a text file the
content type of the returned document is set to text/plain
since, for example, a range of lines from an HTML document is unlikely to be a
valid HTML document.
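The numbering convention for line ranges (lines counted from 1, both ends included) can be illustrated in Python. This sketch only mimics the selection on a string held in memory; the function names are hypothetical and it does not talk to a server.

```python
def select_lines(text, spec):
    """Return lines first..last (numbered from 1, inclusive) for a 'first-last' spec."""
    first_s, last_s = spec.split("-", 1)
    first, last = int(first_s), int(last_s)
    lines = text.splitlines(keepends=True)
    return "".join(lines[first - 1:last])


def lines_url(base_url, first, last):
    """Append the ;lines= parameter to a document URL."""
    return "%s;lines=%d-%d" % (base_url, first, last)


sample = "".join("line %d\n" % n for n in range(1, 41))
print(lines_url("http://host/dir/foo", 20, 30))
print(select_lines(sample, "20-30"), end="")
```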
For any file, including binaries, but not CGI/1.1 output or parsed or filtered documents, the server will honor a byte range. A request for:
http://host/dir/foo;bytes=256-1024
will cause the server to return bytes 256
to
1024
inclusive. The first byte is numbered 0 so a range of
0-100 represents the first 101 bytes. Note that while file lines are
numbered starting with 1, bytes are numbered starting with 0. The
content type of the returned document will be text/plain
if
the document foo
is a text document and
application/octet-stream
otherwise.
A request for:
http://host/dir/foo;bytes=256-
will cause the server to return bytes 256
through the end of
the file, while:
http://host/dir/foo;bytes=-1024
requests the last 1024
bytes of the file.
"Range:
" Header
Requests for a specific range can also be made with the proposed HTTP/1.1 header
"Range:
". For example, if the server receives a request
with the header:
Range: bytes=100-200
it will send the 101 bytes starting with byte number 100 and ending with
byte number 200. The header "Range: bytes=300-
" requests all
bytes starting with number 300 and continuing until the end of the file.
And "Range: bytes=-300
" requests the last 300 bytes of
the file. This is normally done automatically by the client in
appropriate circumstances and is not under the control of the server
administrator.
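The three byte range forms, whether given in a ";bytes=" parameter or a "Range:" header, follow the same arithmetic. The Python sketch below works it out on a byte string in memory. The function names are hypothetical, and clamping a range that runs past the end of the file is my assumption, not documented server behavior.

```python
def byte_range(spec, size):
    """Translate 'first-last', 'first-' or '-count' into a (start, stop) slice.

    Bytes are numbered from 0 and 'last' is inclusive, so stop is last + 1.
    """
    first_s, last_s = spec.split("-", 1)
    if first_s == "":
        count = int(last_s)  # '-1024' means the last 1024 bytes
        return max(size - count, 0), size
    start = int(first_s)
    if last_s == "":
        return start, size  # '256-' means byte 256 through the end
    return start, min(int(last_s) + 1, size)


def slice_for(spec, data):
    """Apply a spec such as 'bytes=100-200' or '100-200' to a byte string."""
    start, stop = byte_range(spec.split("=", 1)[-1], len(data))
    return data[start:stop]


data = bytes(2048)
for spec in ("bytes=256-1024", "bytes=0-100", "bytes=256-", "bytes=-1024"):
    print(spec, len(slice_for(spec, data)))
```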
wn_mkdigest
Utility
The wn_mkdigest
utility is executed with two regular expressions: one to match the
section separator and the other to match the section title. For example,
for a mail digest named foo
, the command:
wn_mkdigest foo "^From" "^Subject:" foo
will produce an HTML file called
foo.index.html
which contains an unordered list of anchors.
Each has as a title the content of the Subject line of a mail message in
foo
and as its href
the range of lines in
foo
containing that particular message. More detailed
information is available in the section "wn_mkdigest
" in this
guide.
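To make the idea concrete, here is a rough Python sketch of the same technique: find the lines matching the separator expression, take the first line in each section matching the title expression, and emit an unordered list of ";lines=" links. It is not the wn_mkdigest program and its output will not be identical; the function names and the "(no title)" fallback are invented for the example.

```python
import html
import re


def digest_sections(lines, separator, title):
    """Return (title, first_line, last_line) per section, numbered from 1."""
    sep_re, title_re = re.compile(separator), re.compile(title)
    sections = []
    start, found = None, None
    for number, line in enumerate(lines, 1):
        if sep_re.search(line):
            if start is not None:
                sections.append((found or "(no title)", start, number - 1))
            start, found = number, None
        elif start is not None and found is None and title_re.search(line):
            # Keep what follows the matched title prefix, e.g. after "Subject:".
            found = title_re.sub("", line, count=1).strip()
    if start is not None:
        sections.append((found or "(no title)", start, len(lines)))
    return sections


def digest_html(name, sections):
    """Build an unordered list of anchors pointing at line ranges of 'name'."""
    items = ['<li><a href="%s;lines=%d-%d">%s</a></li>'
             % (name, first, last, html.escape(text))
             for text, first, last in sections]
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


mail = [
    "From alice Mon",
    "Subject: Hello",
    "",
    "body text",
    "From bob Tue",
    "Subject: Re: Hello",
    "more text",
]
print(digest_html("foo", digest_sections(mail, "^From", "^Subject:")), end="")
```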
There are two ways to limit access to your hierarchy. You can restrict access by hostname or IP address and you can restrict access to users whose name and password are in a file on your server (authentication). You can, of course, do both. To restrict access to an entire hierarchy you must restrict access to each of its subdirectories.
Warning: If access to a directory is restricted by either of the ways described here the restrictions affect only that one directory and not its subdirectories.
If you have opted to limit access to your server in this way you do so by
setting the value of the Accessfile=
directive in the index.wn
file for a directory.
In the directory directive part of an
index.wn
file, a line like:
Accessfile=~/dir/.access
specifies that the access control file
wnroot/dir/.access
contains restrictions on what sites are
allowed to access this directory. The Accessfile=
directive
takes the value of a path to a file in different forms. If the path
begins with a '/
' or with '~/
' then it is
relative to the WN hierarchy root, and otherwise it is relative
to the directory containing the index.wn
file in which the
directive occurs. In particular the access file must be located within
your WN hierarchy.
Warning: If the Attributes=serveall
directive is used in a directory with restricted access be sure the access file is not serveable. You can do this by giving it a name starting with '.
' or ending with '~
', or better, put it in a directory from which nothing is served.
Also note that limiting access to this directory does not limit access to
subdirectories. The Accessfile=
line must
occur in the index.wn
file
of each directory you want restricted. Of course, they can all refer to
the same file. To use the same file for several directories be sure to
use the "Accessfile=~/dir/.access
"
form of the directive so the line can be the same for every index.wn
file.
This will limit access to the server to those clients with an IP address
or subnet address listed (and not excluded) in the file
.access
listed in the Accessfile=
directive.
If a recursive title search or keyword search is requested and some
directories have restricted access only those directories which have the
same access file as the directory where the search started will be
searched. In fact the path must be the same in the Accessfile=
directive
for both directories (and must necessarily be of the form "Accessfile=~/dir/.access
"
or "Accessfile=/dir/.access
"
rather than "Accessfile=.access
").
There are three possible formats for lines in the access file. First you
may list the domain names of the machines using wild cards provided the
machines all have proper PTR
DNS resource records. For
example the line:
dogbert.widget.com
allows access to one host. To allow access to all machines in the
widget.com
domain, use the line:
*.widget.com
Note that this will not allow access to a machine called
widget.com
if it exists. One would need to add in the line
widget.com
to allow it access.
You can also allow access by IP address and, in general, this is somewhat
more secure than using the hostnames. There are two line formats for IP
addresses. The first is to explicitly list an IP address like
129.111.222.123
or a subnet address like
129.111.222.
or 129.111.
. In case a subnet
address is listed it must end with a period like:
129.111.222.
or
132.123.
but complete IP addresses like 129.111.222.123
should not
end with a period. If a subnet address is listed any client with an IP
address beginning with that subnet address will be allowed access.
The second format for IP address restriction uses a net address, net mask
pair with the two parts separated by a '/
'. For example:
129.111.222.0/255.255.255.0
The presence of the '/
' indicates to the server that this
format is being used. The part before the '/
' is the "net
address" and the part after is the "net mask". The server will then take
the IP address of the remote client, do a logical "and" of each of its
four parts with the corresponding four parts of the net mask
(255.255.255.0
in this example) and check that the four
results agree with the four parts of the net address
(129.111.222.0
). So the access file line above will match
(and allow access to) precisely those machines with IP address of the
form 129.111.222.x
because the 'x
' part is
"anded" with 0
and hence becomes 0
, while the
first three parts are "anded" with 255
and hence unchanged,
so they must equal 129
, 111
, and
222
respectively.
Note that if you have #define NO_DNS_HOSTNAMES
in the config.h
file you
must use one of the IP address formats above and not the format using a
domain name. This is because #define NO_DNS_HOSTNAMES
causes WN never to convert IP addresses to hostnames.
You can also exclude IP addresses or domain names by prefixing them with
an '!
', so if the access file contained only the lines:
!speedy.acns.nwu.edu
*
access would be permitted to every machine except speedy (the
*
matches, and allows access to, anything). Likewise:
!129.111.
!129.222.0.0/255.255.0.0
*
would allow access to everyone except those on subnet
129.111
or on subnet 129.222
. In general
prefixing a line (in any of the three formats) with '!
'
causes immediate denial of access to any matching host. The first
matching line (with or without leading '!
') for a host is
the one which takes effect. Once a match is found access will be granted
(or denied if a '!
' is present) and no subsequent lines in
the access file will be considered.
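The matching rules can be pulled together in a short Python sketch. It covers only the three line formats and the '!' prefix described so far. The function names are hypothetical, shell-style wildcard matching stands in for whatever the server actually does with '*', a pattern starting with a digit is assumed to be an IP address, and denying access when no line matches is my reading of the text rather than something it states outright.

```python
import fnmatch


def line_matches(pattern, hostname, ip):
    """Test one access file pattern against a client's hostname and IP address."""
    if "/" in pattern:  # net address / net mask pair
        net, mask = pattern.split("/", 1)
        net_parts = [int(p) for p in net.split(".")]
        mask_parts = [int(p) for p in mask.split(".")]
        ip_parts = [int(p) for p in ip.split(".")]
        # "and" each part of the client address with the mask, then compare.
        return all((i & m) == n
                   for i, m, n in zip(ip_parts, mask_parts, net_parts))
    if pattern[0].isdigit():
        if pattern.endswith("."):  # subnet address such as 129.111.
            return ip.startswith(pattern)
        return ip == pattern  # complete IP address
    return fnmatch.fnmatchcase(hostname.lower(), pattern.lower())


def access_allowed(access_text, hostname, ip):
    """Apply the first matching line; a leading '!' turns the match into a denial."""
    for line in access_text.splitlines():
        line = line.strip()
        deny = line.startswith("!")
        pattern = line[1:] if deny else line
        if not pattern:
            continue
        if line_matches(pattern, hostname, ip):
            return not deny
    return False  # assumption: no matching line means no access


ACCESS = "!129.111.\n!129.222.0.0/255.255.0.0\n*\n"
print(access_allowed(ACCESS, "a.example.org", "10.1.2.3"))
print(access_allowed(ACCESS, "b.example.org", "129.222.9.9"))
```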
A line in an access file cannot exceed 255 characters in length and every
line must end with a newline (some editors don't guarantee this and the
last line of a file may not have a newline). A blank line at the end is
fine. If these conditions are not met an error of type "Access
file line overflow
" will be generated.
You may also designate "privileged sites" in your access files. If you
list a site in an access file with a '+
' prefix like:
+hopf.math.northwestern.edu
+123.123.123.1
+111.111.111.0/255.255.255.0
then requests from that site will be exempt from any password requirements (as described below). In other words, no username/password pair will be required for requests from these sites, even if they are required from other sites.
Obviously the '+
' and '!
' prefixes for access
file lines are mutually exclusive.
It is possible to specify a URL referring to a customized document intended as an error message when access is denied. The easiest way to do this is to place the line:
Access-denied-URL=http://host/dir/foo.html
or the line:
Access-denied-URL=/dir/foo.html
at the beginning of the access file. When this is done and a request is
denied because of failure to meet the restrictions in that access file,
the browser will be redirected to the URL
"http://host/dir/foo.html
" or "/dir/foo.html
".
Access-denied-URL=
is also a legal directory directive
which may be placed in an index.wn
file.
You can also maintain a password file (or files) on your system and restrict access to those users who can supply a valid user name and password. This is the so-called "Basic" authentication described in the HTTP/1.1 protocol.
Warning: I would strongly advise against using the basic authentication described here to protect sensitive information on a server which runs on a system on which untrusted users have accounts.
Notice that if none of the options -t
, -T
and -u
are used then a user
with his own home page can make a symbolic link to any file readable by
the server and that document will be served. This is true even if the
linked to document is in a password protected directory with limited
access or is outside the server data hierarchy.
The use of basic authentication with WN involves three additional
programs which can be found in the /bin
directory of the
distribution. The first of these is wn_mkpasswd
which is a
perl utility for creating and altering
password files. It should be run the first time with the command:
wn_mkpasswd -n filename
This prompts you for a realm, a username and password and then creates a
password file called "filename
" with that entry. On subsequent
uses the -n
argument should be omitted so that entries will be
added to the existing file instead of starting a new one (the
-n
is for "new"). If a subsequent entry is made with the
same user name the entry for that user will be replaced. If the
"filename
" argument is omitted then the default name of
wnpasswd
is used. There is another optional argument which
may be used with this program. The command:
wn_mkpasswd -D filename
causes a UNIX NDBM
database to be created or used instead of
a simple flat file. This may be useful if you have a very large
number of password entries. Depending on your system, the database may
reside in the two files filename.dir
and
filename.pag
, or in a single file filename.db
.
The -n
option has no effect when combined with the
-D
option. To create a new database you must remove or
rename the .pag
and .dir
or .db
files. To remove a single entry from a password file use the command
"wn_mkpasswd -d filename
" or "wn_mkpasswd -D
filename
" for an NDBM
database.
The second is the perl script wn_md5passwd
which is
functionally the equivalent of wn_mkpasswd
but which
uses an MD5 hash of the password, the realm, and random data to encode
the password. The realm and random data are created when
wn_md5passwd
is run with the -n
option.
This is generally preferable as it is more secure; in particular, characters
beyond the eighth in a password are significant. The options are
the same as those of wn_mkpasswd.
Note: To enable the NDBM
features of wnauth
you will have to uncomment the lines in wnauth/Makefile
starting with #DBMFLAG
and #DBMLIB
and recompile the wnauth
program by running the UNIX make(1)
utility in the /wnauth
directory.
Once you have created your password file and made sure that it is
readable by the user id under which the server will run, you are ready to
set up the WN authentication module, called wnauth
. This is done
on a per directory basis by three entries in the directory record of the index.wn
file. Entries like:
Authorization-realm=myrealm@host.domain
Authorization-module=~/cgi-bin/wnauth "~/dir/wnpasswd"
Authorization-type=basic
in the directory record specify that the authentication module wnauth
is being used to
check users' passwords and that it should consult the password file
"wnpasswd
" in wnroot/dir/
. If instead of the
password file "wnpasswd
" you are using an NDBM
database "wnpasswd.dir
" and "wnpasswd.pag
"
created with "wn_mkpasswd -D
" as described above (or created
some other way), then you should use the line:
Authorization-module=~/cgi-bin/wnauth -D "~/dir/wnpasswd"
The password file can also be specified with the -P
option
as in:
Authorization-module=~/cgi-bin/wnauth -P wnpasswd
The name of the password file can be given in three different formats:
beginning with a '/
' meaning it is relative to the system
root, beginning with '~/
' indicating it is relative to the
WN hierarchy root, or something else indicating it is relative
to the directory containing this index.wn
file. If you use the
'~/...
' form it is a good idea to put the file name in
double quotes as shown above to prevent the shell from trying to
interpret the '~
'.
A final option for wnauth
is the -l filename
option
as in:
Authorization-module=~/cgi-bin/wnauth -P wnpasswd -l /path2/logfile
With this option the wnauth
program will do its own logging
in the file logfile
, listing all transactions and any error
conditions. This is especially valuable for diagnosing problems in
authentication.
Warning: If the Attributes=serveall
directory directive is used in a directory with access restricted by password, be sure the password file is not serveable. You can do this by giving it a name starting with '.
' or ending with '~
', or better, put it in a directory from which nothing is served.
Note that if you designate a privileged site in your access control file then any users from that site will not be requested to supply a user name and password.
For security reasons when you use wnauth
or any Authorization-Module=
you are required to use either the -t
or -T
option or the -a
or -A
option when the
server is run and to have the index.cache
file in the
protected directory owned by the trusted user or group. This is to guard
against counterfeit authentication modules. Note that the four command
line arguments -a
, -A
, -t
and -T
all take a numeric
argument. Thus the command should be
"./wnsd -t 203
" and not
"./wnsd -t joe
" if user joe
has user
id 203
.
The Authorization-Realm=
line is to notify the client that for any document on this server with
the same realm as this one, the same username/password combination will
be valid, so the client need not ask the user for a username and
password, but can reuse the one supplied for the first document with this
realm. For security reasons it is a good idea to put your host and
domain name in the realm. This may at least discourage attempts at other
sites to forge your realm in order to collect user passwords. Your users
should also be warned never to enter their password if the realm
displayed when they are prompted for a password contains a different
hostname than the one in the URL they are trying to access.
If you use different realms on the same server you should be aware that popular browsers are somewhat cavalier in their treatment of realms. In particular once a username/password pair has been accepted a browser might well continue to use it on the same site without checking the realm until authentication fails. This practice of trying to guess the username/password is more efficient if the guess is correct and most of the time it is.
Also note that password protecting a directory does not protect its
subdirectories. The three "Authorization
" lines must
occur in the index.wn
file
of each directory you want to protect. Of course, these lines can all be
identical for different directories if you use the:
Authorization-module=~/cgi-bin/wnauth ~/dir/wnpasswd
form to specify locations relative to your WN root.
There is also support for a "group
" file with
authentication. This feature is invoked by using the -g
and
-G
options with the wnauth
authentication
module. The line:
Authorization-module=wnauth -g grpname -G foo -P wnpasswd
means to use the group name "grpname
" and the group file
"foo
". The group file is a file in the format of a UNIX group(5)
configuration file. That is, it has lines of the form:
grpname:*:99:user1,user3,user5
where the fields are separated by colons, the first field is a group
name, and the fourth field is a comma separated list of user names. wnauth
will ignore the
second and third fields. If the line above is in the file
foo
and wnauth
is invoked as
above then a user will be granted access provided the supplied password
matches that in the wnpasswd
file and the user's username is
in the list after the third ':
' in the line starting with
the group name. Thus, in this example users user1
,
user3
, and user5
will be given access if they
provide valid passwords and other users will not.
The format of a group file used by Apache is also supported. This format has lines of the form:
grpname: user1 user3 user5
which is the group name, a single colon and a space separated list of user names.
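Both group file formats are easy to read with a few lines of Python. The sketch below is only an illustration of the two layouts, not the code in wnauth; the function name is hypothetical, and telling the formats apart by counting colon separated fields is my own shortcut.

```python
def group_members(group_text, group_name):
    """Return the set of user names listed for group_name, or an empty set."""
    for line in group_text.splitlines():
        fields = line.strip().split(":")
        if fields[0] != group_name:
            continue
        if len(fields) >= 4:
            # group(5) style: name:password:gid:user,user,...
            # The second and third fields are ignored.
            return {u.strip() for u in fields[3].split(",") if u.strip()}
        if len(fields) == 2:
            # Apache style: name: user user ...
            return set(fields[1].split())
    return set()


print(sorted(group_members("grpname:*:99:user1,user3,user5\n", "grpname")))
print(sorted(group_members("grpname: user1 user3 user5\n", "grpname")))
```

A user would then be granted access only if the supplied password is valid and the user name is in the returned set.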
It is possible to specify a custom error message to be sent when password authentication fails because of an incorrect password or username as in:
Auth-denied-file=~/dir/foo.html
This specifies that any request for a document in this directory which is
denied because of an authorization module restriction results in the file
~/dir/foo.html
being sent instead. A default value for all
directories can be set by uncommenting the "#define AUTH_DENIED_FILE
"
line in config.h
and
recompiling. Note that this is not a URL but the name of a file whose
content is to be sent as error text when authentication is denied. If
the file name starts with '~/
' as above it is assumed to be
relative to the WN root directory. Otherwise it is assumed to
be a path relative to the directory containing the index.wn
file.
The "Basic" authentication scheme is flawed in that it involves the transmission of essentially unencoded passwords over the network. It is relatively easy for unscrupulous people to obtain "sniffer" software which allows eavesdropping on all local network traffic. This means, in particular, that it is possible to intercept passwords.
This particular problem is remedied by the HTTP/1.1 Digest Authentication scheme. Digest authentication is supported experimentally by WN, but has the rather severe drawback that no publicly available clients currently support it. It is experimental, because I have no client to test it and hence it has barely been tested.
The WN server allows you to give users on the server their own private hierarchy for their home page and accompanying documents. The URL for a document in such a hierarchy contains a special string (which the maintainer may choose) to indicate that an alternate hierarchy is being used. There are three more or less mutually exclusive methods in WN to do this. One uses user names and consults the system password file to calculate a "home page" directory in the appropriate user's home directory. The second uses a table lookup to change the server root directory to one specified by the maintainer. Both of these set the data hierarchy based on information in the URL. A third method can be used on systems which support multiple IP interfaces on a single host and will use a different data root for different IP addresses by which the server is accessed.
A common idiom on the Web is to use a URL like
"http://hostname/~john/foo.html" to request a document in a
subdirectory of the home directory of user "john". WN implements this
using the first of the mechanisms above. The maintainer uses the macro
#define TILDE_USER_PWFILE in the configuration file config.h, then
chooses '/~' to be the special string and the server consults the
system password file and changes the root data directory to
"/home/john/public_html" (assuming "/home/john" is "john's" home
directory) and simultaneously deletes the "/~john". Thus, the URL
becomes "http://hostname/foo.html" with root directory
"/home/john/public_html" so the file accessed is
"/home/john/public_html/foo.html". Both the string "/~" and the
subdirectory "public_html" are configurable by changing values in
config.h. They are called #define TILDE_USER_STRING and #define PUB_HTML
respectively.
Thus, to avoid the problematic character '~' a maintainer might set
#define TILDE_USER_STRING to "/people/" and might set #define PUB_HTML
to "wwwstuff". Then the URL "http://hostname/people/john/foo.html" would
result in the equivalent of a URL request for "/foo.html" with a data
root of "/home/john/wwwstuff". Note that #define TILDE_USER_STRING must
be everything before the user name at the beginning of the URL. Thus it
needs to be "/people/" and not "/people".
As a security measure the maintainer can set a variable
#define LEAST_UID (with default value 100) in config.h and any attempt
to use "/~user/" will fail if the user id of "user" is less than this
value.
A second method of establishing an alternate data hierarchy does not use
the password file but a table supplied by the maintainer. This method is
enabled by defining the #define TILDE_TABLE variable in config.h. This
should be the complete system pathname of a file containing names and
their corresponding data hierarchy directories. For example, if config.h
contains the line:
#define TILDE_TABLE /usr/local/etc/wn.dir.table
and the file "wn.dir.table" contains lines like:
john:/home/john/public_html
bob:/home/bob/public_html
then the URL "http://hostname/~bob/foo.html" will be translated to a
request for "/foo.html" with the data root directory set to
"/home/bob/public_html". The file consulted should have lines consisting
of a name followed by a ':' followed by the full system path of the
desired root data directory. Of course, the name need not be the name of
a user. One difference with this method is that the #define PUB_HTML
value from config.h is not used and if you want it to be part of the
path it should be in the path part of each entry in your table.
This method of changing hierarchies uses the #define TILDE_USER_STRING
in precisely the same way that the password file method does. In other
words, if #define TILDE_USER_STRING is defined to be "/people/" instead
of '/~' then "http://hostname/people/john/foo.html" will be translated
to a request for "/foo.html" with root data directory
"/home/john/public_html".
More precisely when the #define TILDE_USER_STRING is at the start of the
URI it is deleted and everything after it up to the next '/' is taken to
be the name to be looked up in the table or password file. This name is
also discarded and the remainder of the URI is taken to be the path of
the desired document relative to the new data hierarchy root.
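The translation rule just described can be sketched in a few lines of Python. This is an illustration of the rule only, not the server's C code: the names parse_table and translate are invented here, the constant TILDE_USER_STRING merely stands in for the config.h value, and the sketch handles only the table lookup method with single colon entries.

```python
TILDE_USER_STRING = "/~"  # stand-in for the value set in config.h


def parse_table(text):
    """Build {name: data_root} from 'name:/full/path' lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, root = line.partition(":")
        if sep:
            table[name] = root
    return table


def translate(uri, table, prefix=TILDE_USER_STRING):
    """Return (data_root, path), or None if the URI does not apply."""
    if not uri.startswith(prefix):
        return None
    rest = uri[len(prefix):]
    # Everything up to the next '/' is the name to look up.
    name, _slash, tail = rest.partition("/")
    root = table.get(name)
    if root is None:
        return None
    # The name is discarded; the remainder is the document path.
    return root, "/" + tail
```

For the table shown earlier, translate("/~bob/foo.html", table) gives ("/home/bob/public_html", "/foo.html"), and passing prefix="/people/" handles "/people/john/foo.html" in the same way.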
When you use the table lookup method to retarget data root directories of
user hierarchies, you can specify a directory that is not even on the
local server host. To do this, specify a lookup table entry consisting
of a name, a double colon (rather than a single colon), and the URL to
the user's data root directory. For example, if user "jim" has a data
root that is accessed as "~jim" on "otherhost", you can specify the
lookup table entry like this:
jim::http://otherhost/~jim
Then when a client sends a request like
"http://hostname/people/jim/path/file.html" to your server, the server
will return a redirect to the client for
"http://otherhost/~jim/path/file.html".
The ability to specify non-local user hierarchies is useful when you run
a primary web server (e.g. "www.your.site.edu") and want all your users
to be able to advertise home page URLs that begin with
"http://www.your.site.edu/" whether or not their accounts are actually
on that host.
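The difference between single colon (local) and double colon (remote) table entries can be sketched as follows. This Python fragment is only an illustration; parse_entry and redirect_url are hypothetical names, and how the server itself joins the remote URL with the rest of the request path is an assumption based on the example in the text.

```python
def parse_entry(line):
    """Split a table line into (name, kind, target)."""
    name, sep, target = line.strip().partition(":")
    if not sep:
        raise ValueError("no ':' in table entry")
    if target.startswith(":"):
        # Double colon: the target is a URL on another host.
        return name, "remote", target[1:]
    # Single colon: the target is a local data root directory.
    return name, "local", target


def redirect_url(target, remainder):
    """Join a remote data root URL with the rest of the request path."""
    return target.rstrip("/") + "/" + remainder.lstrip("/")
```

Here parse_entry("jim::http://otherhost/~jim") yields ("jim", "remote", "http://otherhost/~jim"), and redirect_url("http://otherhost/~jim", "/path/file.html") yields "http://otherhost/~jim/path/file.html", matching the redirect described in the text.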
Many systems allow multiple IP addresses for a single host. For systems
with this capability it is possible to use a different data hierarchy for
each IP address to which your host responds. Sometimes this is referred
to as "virtual hosts" or a "multi-homed" server since when combined with
DNS aliasing it permits a single server to appear to be multiple servers
on different hosts. Unfortunately, I cannot help you in configuring your
system to respond to multiple IP addresses. (Some information about this
can be found at http://www.thesphere.com/%7Edlp/TwoServers/)
However, if your system supports this capability you can have different
data hierarchies for each IP address if you run the stand-alone version
of the server (wnsd).
There are two ways to have multiple virtual hosts. Assuming that you
actually have more than one network interface (i.e. more than one NIC),
the easiest is to run multiple instances of the server each with a
different IP address indicated on the command line via the -i option.
For example running the commands:
wnsd -i 192.168.1.1 -L abc.log /abc/root
wnsd -i 192.168.1.2 -L def.log /def/root
will start two instances of the server, both on the same port (80) but responding to different IP addresses.
Alternatively,
wnsd -h www.abc.com -L abc.log /abc/root
wnsd -h www.def.com -L def.log /def/root
will start two instances of the server, both on the same port (80) but
responding to different IP addresses. The server will ascertain the
appropriate IP address by using the UNIX gethostbyname(3) library
function for the name supplied on the command line.
Note: To use this method make sure the "#define USE_VIRTUAL_HOSTS" line
in config.h is commented out.
This method has the disadvantage of using somewhat more resources than the alternate method (described below) because at least one instance of the server for each host name is running at all times and thus using memory. On the other hand with this method it is possible to specify a different log file for each host, while the method below uses only one log file and a utility to separate it into the contributions of each host.
The second method to implement a multi-homed server involves only a
single initial instance of the server which responds to all requests and
changes the document root as appropriate based on the IP address or host
name to which the request was sent. To use this method you need to edit
the config.h file and uncomment the line containing
"#define USE_VIRTUAL_HOSTS".
Then you have two choices. You can either create a file containing a
list of the virtual hosts or you can build this list into the compiled
version of the server by editing one of the source files and recompiling.
If you have only a few virtual hosts and you don't anticipate adding new
ones, I recommend compiling in the list. This is easy to do and much
more efficient if you use wnd with the UNIX inetd(8) system utility. If
you use wnsd there is little difference in efficiency.
You can either compile in the list of virtual hosts or keep this list
in a file which will be read each time the server starts. The second
option is described here and the first is described in the next section.
To use a virtual host file you must uncomment the line containing
"#define VIRTUAL_HOSTS_FILE" in the file config.h and change the quoted
path to the full path of a file containing virtual host information in a
format described below. Then you must recompile the server. Once this
has been done you may change the file you use with the -V option to the
server.
But to turn this feature on, it is necessary that
"#define VIRTUAL_HOSTS_FILE" be uncommented when the server is compiled.
Each time this file is changed, in order for the change to take effect,
you will need to restart the server or send it the SIGHUP signal with
the UNIX kill(1) utility.
The format of this file is one line per virtual host. Each such line should have the form:
hostname IP_address root_path vhost_nickname user group flag
with the seven fields separated by white space. For example entries might be:
myhost.com 111.222.222.111 /usr/wn/myroot myhost bob mygroup 0
client.com 111.222.222.111 /usr/wn/client client george clientgroup 1
In particular the hostname should be a fully qualified domain name.
The user and group fields are the name of the user and group to be
passed to suexec if it is used. If it is not these can be set to the
string NULL. However, if suexec is used, both the user and group must be
defined for each virtual host -- they could have the value of the user
and group under which your server runs. The flag field currently has
only two allowable values, 0 and 1. The value 1 indicates that user home
directories are not permitted for this virtual host.
Lines in this file which are empty or start with '#' are ignored. When
using verbose logging the nickname will be enclosed in angle brackets
and used as the last field of each log entry. If the nickname is omitted
a number will be used representing the position of the virtual host in
the virtual host file.
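To make the file format concrete, here is a small Python sketch that reads such a virtual host file. It is not part of WN; the names VHost and parse_vhost_file are invented for this example, and the sketch assumes that all seven fields are present on every line (it does not try to handle an omitted nickname).

```python
from collections import namedtuple

# Field names follow the format line given in the text.
VHost = namedtuple("VHost", "hostname ip root nickname user group flag")


def parse_vhost_file(text):
    """Return a list of VHost records, skipping blank and '#' lines."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 7:
            raise ValueError("expected 7 fields: %r" % line)
        hosts.append(VHost(*fields))
    return hosts
```

Applied to the two example entries above, the result is two records; the second has hostname "client.com", user "george", and flag "1", meaning user home directories are not permitted for that host.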
By default there is a maximum of 64 virtual hosts allowed in a virtual
host file. But this can be increased by editing the file wn/vhost.c and
changing the line "#define MAXVHOSTS (64)". Just replace the 64 with the
value you wish.
WN supports the "Host:" header implemented by some browsers (e.g.
Netscape Communicator) and so-called "full URL requests". For browsers
that support either of these features it is now possible to have
multiple virtual hosts with a single IP address. The HTTP/1.1 protocol
requires browsers to support the "Host:" header.
Using this feature requires nothing beyond setting up the server exactly
as described above for virtual hosts. Of course, all your virtual hosts
will have the same IP number if your system only has one. Then if a
browser provides the "Host:" header (which should contain the hostname
and port it is trying to access) the WN server will use the root data
directory you specified for that host name. Similarly if a full URL
request like "GET http://host.abc.com/dir/foo.html" is used the server
will use the root data directory corresponding to "host.abc.com". If the
browser provides neither of these the server will use the first root
data directory whose IP number matches (which will be the first in your
list if you have only one IP address).
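The selection rule above can be sketched in Python as follows. This is an illustration, not the server's code: select_root is a hypothetical name, the virtual host list is reduced to (hostname, ip, root) triples, and the behavior when a "Host:" header names an unknown host is not stated in the text, so falling back to the IP match in that case is an assumption of this sketch.

```python
def select_root(vhosts, host_header=None, local_ip=None, default_root=None):
    """Pick a data root from a list of (hostname, ip, root) triples."""
    if host_header:
        # The header may carry a port, as in "virtual1.com:80".
        name = host_header.split(":")[0].strip().lower()
        for hostname, _ip, root in vhosts:
            if hostname.lower() == name:
                return root
    # No usable host name: use the first entry whose IP matches.
    for _hostname, ip, root in vhosts:
        if ip == local_ip:
            return root
    return default_root
```

With entries for "realname.com" and "virtual1.com" sharing one IP address, a request carrying "Host: virtual1.com:80" selects the second entry's root, while a request without the header selects the first entry for that address.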
To compile in the list of virtual hosts you need to edit the file
wn/vhost.h. This file is quite short and contains something like:
#ifdef USE_VIRTUAL_HOSTS
WN_CONST char * WN_CONST vhostlist[][7] =
{
    { "realname.com" , "123.123.121.1", ROOT_DIR, "nickname0", "usr0", "grp0", "0" },
    { "virtual1.com" , "123.123.121.1", "/var/data1", "nickname1", NULL, NULL, NULL },
    { "virtual2.com" , "123.123.121.1", "/var/data2", "nickname2", "usr2", "grp2", "1" },
    { "another.ip.com", "123.123.123.2", "/var/data3", "nickname3", "usr3", "grp3", "1" },
    { NULL, NULL, NULL, NULL, NULL, NULL, NULL }
};
#endif
The line containing
{ "virtual2.com", "123.123.121.1", "/var/data2", "nickname2" },
etc., should be replaced by a line containing one of the IP addresses of
your host instead of "123.123.121.1", the correct system path to the
corresponding data root instead of "/var/data2", one of the hostnames
that corresponds to this IP address, and a nickname to use in logging.
The "usr2" and "grp2" are only used with suexec and can have the value
NULL if you are not using suexec. The final entry in this line is a
string representing a flag for use with virtual hosting. For the moment
it should only be NULL or one of the quoted strings "0" or "1". The
quoted string "1" disallows the use of any user home directories for
URLs referring to this virtual host. The values NULL or "0" are
equivalent and give the default behavior which allows serving from user
home directories only if the server was compiled with this feature
enabled.
The hostname in these lines is used by the server in only two ways: when
a redirect header is sent, and to pass the correct server host name to
CGI/1.1 programs in the environment variable. The nicknames are used as
a field of a verbose log entry. When using verbose logging the nickname
(enclosed in angle brackets) will be the last field of each log entry.
If the nickname is NULL then a number will be used for its value
representing the position of the virtual host in the virtual host file.
All the lines of this file should be changed in a similar fashion. You
may have as many lines as you wish and you should remove any of the
lines you don't need. Don't change anything else. In particular make
sure that the
{ NULL, NULL, NULL, NULL, NULL, NULL, NULL }
line is unchanged and that you change only the parts inside quotation
marks. If the IP address by which the server is accessed does not match
any of the addresses listed in this file then the server will use the
default data root (as specified when you ran the configure program or
edited the value of "#define ROOT_DIR" in config.h).
The server will not produce separate log files for each IP address.
However, if the server is configured to produce verbose logs then each
entry is tagged at the end with the nickname of the virtual host which
received the request. The wnv2c utility can then be used to produce
separate log files for each IP address.
The main utility program used by WN is wndex which is used to produce
the index.cache files from index.wn files. Its use is described in
detail in the section on "Using the wndex Utility" in this guide. In
this chapter we consider some other utilities, mostly perl programs,
which are useful in maintaining your server.
wn_mkdigest
wn_mkdigest is a perl program which can be found in the bin directory of
the distribution. This program is designed to work with the range
feature of the WN server and with list searches. It produces a list of
anchors or links to sections of a structured plain text document like an
address list or a mail file.
Here is how it works. The wn_mkdigest utility is executed with three (or
more) arguments. The first two arguments are regular expressions. The
first regular expression should match the section separator of the
structured file and the second should match the beginning of the line to
be used as the section title. (More about this below.) The next argument
is the name of a structured file, like a mail file, news digest or
address list. Instead of a single structured file several files can be
listed and wn_mkdigest will process their concatenation.
Now more about the regular expressions: Suppose our structured file is a
mail file in its usual format with a number of messages. The first
regular expression should match just the lines which are the beginning of
each section (in this case each message). For a mail file a good choice
would be "^From " which matches the word "From" followed by a space at
the beginning of a line. The second regular expression matches the start
of the line which you would like to be the title of the section. It is
convenient to have the link text be everything after the occurrence of
the matching pattern for this regular expression. So for the mail file
we would choose "^Subject:" for this regular expression. Then the
program will produce a list of links, one for each message, with the
text in the anchor being the contents of the message Subject line (minus
the word "Subject:"). Each link when accessed will produce a plain text
document containing just that mail message.
So if our mail file is named foo we should execute the command:
wn_mkdigest "^From " "^Subject:" foo
Note the quotation marks which are needed to get the space after "From".
It produces a file named foo.index.html which consists primarily of an
unordered list. Each item in the list is an anchor referring to a line
range in foo -- the ranges being delimited by lines which match the
first regular expression argument. In this case that means each range
will start with a line beginning with "From " which is the marker in a
mail file designating the start of a new message. The anchor label for
each range is taken from the first line in the range which contains a
match for the second regular expression and, in fact, as mentioned
above, it will consist of everything on that line after the matched
regular expression.
The first line of each range or section is a line which matches the first
regular expression and the next matching line will begin the next
section. Normally the search for the match for the anchor title regular
expression begins with this first line. However, it is sometimes useful
to skip this first line in the search for a title match. This can be
done by starting the second regular expression with the character '$'.
For example the command:
wn_mkdigest ^$ $^ foo
is a common one. It says to divide foo into sections (line ranges) which
are separated by blank lines (the regular expression '^$' matches a
blank line). To obtain an anchor title for each section the blank line
is skipped (since the second regular expression starts with '$') and
then everything on the next line is taken as the title (since '^'
matches the beginning of the next line). The regular expressions of this
example would be useful, for example, for an address list foo which
consisted of multi-line records separated by blank lines with an
individual's name on the first line of each record. The wn_mkdigest
utility would then produce a foo.index.html file with an unordered list
of anchors, one for each individual in the list. Selecting an anchor
would present the record for that individual. Using a list search for
this file would allow a form user to enter a name or regular expression
and obtain a list of anchors for matching items.
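The way the two regular expressions divide a file into titled line ranges can be sketched in Python. This is not the perl code of wn_mkdigest, only an illustration of the rules described above; the function name digest is invented, Python's re module stands in for perl regular expressions, and lines before the first separator are simply ignored, which is an assumption of this sketch.

```python
import re


def digest(lines, sep_re, title_re):
    """Return (first_line, last_line, title) for each section.

    `lines` holds the file's lines without trailing newlines;
    line numbers in the result are 1-based.
    """
    # A leading '$' means: skip the separator line when looking
    # for the title.
    skip_first = title_re.startswith("$")
    if skip_first:
        title_re = title_re[1:]
    sep = re.compile(sep_re)
    title = re.compile(title_re)
    starts = [i for i, line in enumerate(lines, 1) if sep.search(line)]
    sections = []
    for n, start in enumerate(starts):
        end = starts[n + 1] - 1 if n + 1 < len(starts) else len(lines)
        first = start + 1 if skip_first else start
        label = ""
        for i in range(first, end + 1):
            m = title.search(lines[i - 1])
            if m:
                # The title is everything after the matched text.
                label = lines[i - 1][m.end():].strip()
                break
        sections.append((start, end, label))
    return sections
```

For a small address list with blank lines before each record, digest(lines, "^$", "$^") returns one range per record with the first line of the record as its title; for a mail file, digest(lines, "^From ", "^Subject:") returns one range per message titled by its subject.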
The wn_mkdigest command can have any number of files listed after the
regular expressions and it will produce a single file whose name is the
name of the first file with ".index.html" appended. This file will
contain a list of links to all the sections of all the files given on
the command line.
When wn_mkdigest writes the index file (e.g., foo.index.html), it adds
two HTML comments to mark the start and end of the lines containing
links to the records in your structured document. The markers look like
this, where VERSION is the current version of wn_mkdigest:
<!-- Range list generated by wn_mkdigest/VERSION -->
<!-- End of range list generated by wn_mkdigest/VERSION -->
The first time wn_mkdigest writes an index file, it writes a default
leader and trailer before and after the link lines. If wn_mkdigest finds
an existing index file when it runs, it uses the information preceding
the first marker and following the second marker as the leader and
trailer for the new index file. This means you can run wn_mkdigest to
create the initial index file, then edit the beginning and/or end of the
file to modify the leader and trailer. Subsequent invocations of
wn_mkdigest will retain your modifications each time the index file is
recreated.
If you add the -b argument when you use wn_mkdigest (i.e. run the
command "wn_mkdigest -b regexp1 regexp2 foo") then it will produce a
file foo.index.html which uses byte ranges rather than the default line
ranges. This functions the same except the server will log the number of
bytes actually sent when a request is served (the server won't bother to
count the bytes in a line range request).
There are fancier tools than wn_mkdigest for displaying mail archives,
but this utility has great flexibility for dealing with a wide variety
of structured files.
wnpnuts
PNUTS (pronounced "peanuts") is an acronym for previous, next, up, top,
search. wnpnuts is a perl program which takes as argument the name of a
file describing the hierarchical structure of a group of HTML files
constituting a single virtual document. The wnpnuts program then
searches these files for lines which begin with optional whitespace
followed by the string:
<!-- pnuts -->
which it replaces with this string followed by a sequence of anchors like:
[previous] [next] [up] [top] [search] [index]
with links to the relevant files in the virtual document. Actually it
replaces this line with a single line starting with <!-- pnuts -->,
followed by the anchors. That way the next time it is run, say after
inserting a new chapter in your document, the <!-- pnuts --> line will
be replaced by a new one with the appropriate links.
The wnpnuts program is run with a command like:
wnpnuts -s dosearch.html -i docindex.html foo.pnuts
The argument "-s dosearch.html" is optional and supplies a URL for the
"[search]" anchor to be substituted. Thus if just "dosearch.html" is
used this will be an anchor linking to a relative URL. Instead you could
use a full URL like "http://hostname/dir/file". If there is no '-s'
argument then there will be no search item in the list of items inserted
by wnpnuts. The optional argument "-i docindex.html" is similar to the
'-s' option except it provides the URL (relative or absolute) which
should be anchored to "[index]". This URL typically points to an HTML
document created with wnindexmaker.
The file foo.pnuts contains the information by which wnpnuts knows which
files to process and what the order of those files should be. It
consists of a list of files relative to the current directory, one per
line, in the order which should be reflected in the "[next] [previous]"
links. If a file is hierarchically one level lower than the previous
file this should be indicated by preceding its name with one more
"<tab>" character than the preceding file. Here is an example:
top.html
second.html
<tab>firstsub.html
<tab><tab>subsub.html
<tab>secondsub.html
third.html
If this list is supplied to wnpnuts it will insert anchors into all
these files wherever <!-- pnuts --> occurs. All those named [top] will
point to the file top.html. In firstsub.html and secondsub.html the [up]
link will point to second.html. The [previous] and [next] links will
reflect the order top.html, second.html, firstsub.html, subsub.html,
secondsub.html, third.html.
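How the previous, next, up and top targets follow from the tab-indented list can be sketched in Python. This is not the perl code of wnpnuts; pnuts_links is a hypothetical name, and giving top-level files no [up] target is an assumption of this sketch, since the text does not say what wnpnuts does for them.

```python
def pnuts_links(spec_lines):
    """Map each file name to its previous/next/up/top targets.

    `spec_lines` are the lines of the .pnuts file, where leading
    tab characters give the depth of each file in the hierarchy.
    """
    entries = []
    for raw in spec_lines:
        if not raw.strip():
            continue
        depth = len(raw) - len(raw.lstrip("\t"))
        entries.append((depth, raw.strip()))
    links = {}
    top = entries[0][1] if entries else None
    parents = []  # stack of (depth, name) for enclosing levels
    for i, (depth, name) in enumerate(entries):
        # Drop stack entries that are not ancestors of this file.
        while parents and parents[-1][0] >= depth:
            parents.pop()
        links[name] = {
            "previous": entries[i - 1][1] if i > 0 else None,
            "next": entries[i + 1][1] if i + 1 < len(entries) else None,
            "up": parents[-1][1] if parents else None,
            "top": top,
        }
        parents.append((depth, name))
    return links
```

For the example list above, the entry for secondsub.html has previous subsub.html, next third.html, up second.html and top top.html.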
wnindexmaker
This is a perl program whose function is to produce an index (in the
usual sense not the WN sense) for a virtual document consisting of a
number of HTML files in a single directory. The index to this guide is a
good example of how an index produced by wnindexmaker works.
The wnindexmaker program is run with a command like:
wnindexmaker -d path -t "Index Title" -o outputfile words
Here the -d, -t and -o arguments are optional. The -t option supplies
the title for the HTML document produced. If no -t argument is given
then "Index" is used as the title. The -o option provides a name for the
output HTML file -- the default being docindex.html.
The -d option should be the directory containing the files being
indexed. It should either begin with a '/' and be relative to the WN
root directory or not begin with a '/' and be relative to the directory
which will contain the docindex.html file. If there is no -d option then
the docindex.html file must reside in the same directory as the files
being indexed. If this is done then it is a good idea to add an
Attributes=nosearch to the docindex.html record in the index.wn file for
the directory. Otherwise docindex.html will index itself in addition to
the other files in the directory.
The final argument to wnindexmaker is the file words. It is a list of
words or phrases, in alphabetical order, one per line, which you wish to
appear in the index. One way to produce it is to use UNIX utilities to
produce a list of all words in the files, then run the UNIX sort(1)
utility with the options -dfu on it and remove unsuitable words from the
list.
What the wnindexmaker program does is produce a long list of anchors,
one for each word in the words file. Each word is linked to a context
search for itself.
wn_uncache
wn_uncache is a perl program which reverses the action of wndex. It will
convert an index.cache file to an index.wn file. It reads from its UNIX
stdin(3) stream and writes to its UNIX stdout(3) stream. Thus when
invoked with:
wn_uncache <index.cache >index.wn
it will create a file named "index.wn" (overwriting any other file of
that name). This file may not be identical to the original index.wn file
used to create index.cache, but when wndex is run on this new index.wn
file it should produce an index.cache identical to the one used as input
for wn_uncache.
wnv2c
The perl program wnv2c converts log files produced by the server in the
verbose format to files in the common log format handled by most server
statistics utilities. It also can extract the entries for each virtual
host of a multi-homed server which uses different data roots for
different IP addresses or different DNS names:
wnv2c [-v] [-i nickname] <verboselog >commonlog
By default this program reads from the UNIX stdin(3) stream a WN log
file produced in the verbose format and writes a non-verbose one in the
"common log format" to the UNIX stdout(3) stream. With the "-i nickname"
option it writes only those entries from the virtual host with the
specified nickname. For example, if you have edited the file wn/vhost.h
to contain:
#ifdef USE_VIRTUAL_HOSTS
WN_CONST char * WN_CONST vhostlist[][4] =
{
    { "realname.com" , "123.123.121.1", ROOT_DIR, "nickname0" },
    { "virtual1.com" , "123.123.121.1", "/var/data1", "nickname1" },
    { "virtual2.com" , "123.123.121.1", "/var/data2", "nickname2" },
    { "another.ip.com", "123.123.123.2", "/var/data3", "nickname3" },
    { NULL, NULL, NULL, NULL }
};
#endif
then "wnv2c -i nickname2 <logfile >log2" will create log2, the file of
log entries for the virtual host with DNS name virtual2.com. If you have
used the empty string in place of nicknames in the file wn/vhost.h the
virtual hosts are numbered consecutively in the order they are listed,
starting with 0. So you would use "wnv2c -i 2 <logfile >log2" to get
the log entries for the virtual host numbered 2 (virtual2.com in the
list above).
Using the "-v" option along with the "-i" option gives the verbose form
of log entries for the specified virtual host.
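The selection step performed by the "-i" option relies on the fact, stated earlier, that in verbose logs the nickname in angle brackets is the last field of each entry. A minimal Python sketch of that selection follows. It is not the perl code of wnv2c and does not convert anything to the common log format; entries_for is a hypothetical name, and the sample log lines in the usage note are made-up placeholders, not real WN log output.

```python
def entries_for(lines, nickname):
    """Keep only log lines whose last field is <nickname>."""
    tag = "<%s>" % nickname
    return [line for line in lines if line.rstrip().endswith(tag)]
```

Given two placeholder lines ending in "<nickname1>" and "<nickname2>", entries_for(lines, "nickname2") keeps only the second; with numbered hosts the same call with "2" keeps lines ending in "<2>".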
wnredir
The perl program wnredir is a very simple Cache-Module for use with WN.
Its function is to automatically redirect requests for documents in one
directory to requests for other URLs. If you put:
Cache-module=redir http://host/dir/foo.html
in the index.wn file of a directory then every request for something in
that directory will be redirected to a request for
http://host/dir/foo.html. If you put:
Cache-module=redir http://host/dir/
then a request for http://this_host/this_dir/whatever.html will be
redirected to http://host/dir/whatever.html. These can be useful if you
move the contents of an entire directory.
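The two cases above can be summarized in a short Python sketch. It is only an illustration of the mapping, not the perl code of wnredir; redirect_target is a hypothetical name, and deciding between the two cases by whether the target ends in '/' is an assumption drawn from the two examples in the text.

```python
def redirect_target(target, request_path):
    """Return the URL a request would be redirected to."""
    if target.endswith("/"):
        # Directory target: keep the requested file name.
        return target + request_path.rsplit("/", 1)[-1]
    # Document target: every request goes to the same URL.
    return target
```

So redirect_target("http://host/dir/", "/this_dir/whatever.html") gives "http://host/dir/whatever.html", while a target of "http://host/dir/foo.html" is returned unchanged for any request.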
WN provides some features by auxiliary modules. These are separate C programs called by the server to provide some function like authorization (authentication) or indexed searches or database access. The current release of WN has a module for authorization using the HTTP/1.1 Basic Authentication scheme supported by most clients and future releases will have a module to do index searches.
The current version also has the support infrastructure or "hooks" for several services which maintainers may wish to create.
The index.cache in each directory is in fact a small flat database
containing characteristics of that directory and the files contained in
it. This can be replaced on a per directory basis by a user supplied
module to provide the same information. In the index.wn file for such a
directory you should create a directory directive entry like:
Cache-Module=/full/path/to/foo
The program "foo" should be an interface to your database. It is
permissible to follow "foo" in this line with any desired arguments to
be passed to it. The program "foo" should read the environment variable
WN_KEY which will contain everything after the final '/' in the URL
being served. Based on this key it should return a single line in the
format of an index.cache file. Any "File=" directives in an index.wn
file which contains a "Cache-Module=" directive will be ignored.
You may wish the cache module function to have access to more information
about the request. If you put the directive "Default-attributes=cgi"
before the "Cache-Module=" directive then all the CGI environment
variables will be set.
If the file cache module encounters an error, such as no entry corresponding to the supplied key, it should return nothing and exit with a negative status. This signals the server to return an appropriate error message.
It is perhaps more useful to have the files being returned reside in a
database. To use such a database interface called "bar", in the
index.wn file for its directory you should create a directory directive
entry like:
File-Module=/full/path/to/bar
The program "bar" should be an interface to your database. It should
read the environment variable WN_KEY which will contain everything after
the final '/' in the URL being served. Based on this key it should
return a document to be served (perhaps after processing with wrappers
or includes). When a file module is used it is also necessary to either
use a "Cache-Module=" or have an actual index.cache file with the name
of all documents which might be returned by the database or to have an
index.cache file created from an index.wn file containing the directory
directive "Attributes=serveall".
If the file module encounters an error, such as no document corresponding
to the supplied key, it should return an appropriate error message of the
same content-type as the requested document.
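A file module can be any program that follows the contract above. The following Python script is a minimal sketch of one, assuming the document is returned on the standard output stream; the dictionary DOCUMENTS is a hypothetical stand-in for a real database and the function name lookup is invented for this example.

```python
import os
import sys

# Hypothetical stand-in for a real database of plain text documents.
DOCUMENTS = {
    "hello.txt": "Hello from the database.\n",
}


def lookup(key, documents=DOCUMENTS):
    """Return the document for `key`, or an error text of the same type."""
    doc = documents.get(key)
    if doc is None:
        # Error message in the same content-type (plain text here).
        return "No document found for key: %s\n" % key
    return doc


def main():
    # WN_KEY holds everything after the final '/' in the URL.
    sys.stdout.write(lookup(os.environ.get("WN_KEY", "")))


if __name__ == "__main__":
    main()
```

Run by hand with WN_KEY set to "hello.txt" the script prints the stored text, and with any other key it prints the plain text error message.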
As with cache modules, you may wish the file module function to have
access to more information about the request. If you put the directive
"Default-attributes=cgi" before the "File-Module=" directive in your
index.wn file then all the CGI environment variables will be set.
Authorization requests in WN are normally handled by an authorization module. The current release contains a Basic Authentication module and a prototype implementation of Digest Authentication, but users are invited to provide their own.
Warning: I would strongly advise against using the basic authentication described here to protect sensitive information on a server which runs on a system on which untrusted users have accounts.
The contents of the Authorization:
request header is passed to
the module on the UNIX stdin(3)
stream (as of version 1.10).
The server expects this module to exit with a status indicating that
access is granted, denied, or that an error occurred. Anyone writing an
authorization module should consult the file
/wnauth/wnauth.h
. The "#define
s" listed there
specify the error status to use for granting, denying or indicating
certain errors.
To use a user supplied authorization module named authmod
the index.wn
file of each
directory requiring authorization should contain lines like:
Authorization-realm=myrealm
Authorization-module=/full/path/to/authmod
Authorization-type=basic
The program authmod
should indicate the granting or denial
of authorization by its exit status as described above. The "Authorization-realm=
"
is a designation which is sent to the client with the authorization
request allowing the client to decide whether the user has already
supplied a password which can be reused (if the realm is the same as when
the password was supplied) or if a new password needs to be entered.
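The shape of a user supplied authorization module might be sketched in Python as follows. This is only an illustration: the exit status values AUTH_GRANTED and AUTH_DENIED are placeholders (the real values are the #defines in wnauth.h and must be taken from there), the USERS table is hypothetical, and the sketch assumes the data on standard input has the usual Basic form, the word "Basic" followed by the base64 encoding of "user:password".

```python
import base64
import sys

# Placeholder exit statuses. The real values are the #defines in
# wnauth.h from the WN distribution and must be copied from there.
AUTH_GRANTED = 0
AUTH_DENIED = 1

# Hypothetical user table; a real module would not store clear-text passwords.
USERS = {"alice": "wonderland"}


def parse_basic(header_value):
    """Decode 'Basic <base64 of user:password>' into (user, password).

    Returns None when the value is not well-formed Basic credentials.
    """
    parts = header_value.strip().split(None, 1)
    if len(parts) != 2 or parts[0].lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(parts[1].strip(), validate=True).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    return user, password


def check(header_value, users=USERS):
    """Return the exit status for the given Authorization header contents."""
    credentials = parse_basic(header_value)
    if credentials is None:
        return AUTH_DENIED
    user, password = credentials
    if users.get(user) == password:
        return AUTH_GRANTED
    return AUTH_DENIED


def main():
    # The server passes the header contents on standard input.
    sys.exit(check(sys.stdin.readline()))
```

A real module would call main() as its last line, and would also use the error statuses from wnauth.h, which this sketch does not model.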
Indexed searches can be supported in WN by auxiliary modules.
Two such modules are provided as examples and maintainers may wish to
create others. To use such a module you should have an HTML <form>
action be something like:
http://host/dir/search=index
Then in the index.wn
file
in the directory record you should have a line like:
Search-Module=/full/path/to/searchmod
The program searchmod
should read the CGI environment
variable QUERY_STRING
and
return a partial HTML document. The typical case would be the program
returns an unordered list of anchors to documents containing a match to
the query string. This list can be wrapped by including a "Searchwrapper=
"
directory directive. If it is not, a default wrapper with text like
"Here are the matches for your search.
" is supplied.
Two simple examples of a search-module (written in perl) are included in the distribution in
the files bin/wnseven_m
and bin/wnsectsearch
.
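For readers who prefer Python to perl, a search module along the lines described above might look roughly like this. The INDEX table and the function names are invented for the example, and the sketch assumes that QUERY_STRING holds the URL-encoded search term itself; a real module would consult a real index.

```python
import html
import os
import sys
from urllib.parse import unquote_plus

# Hypothetical index: document URL -> text to search.
INDEX = {
    "intro.html": "Introduction to the WN server",
    "search.html": "Setting up searches",
}


def search(query, index=INDEX):
    """Return an HTML unordered list of anchors to matching documents."""
    needle = query.lower()
    items = []
    for url, text in sorted(index.items()):
        # An empty query matches nothing in this sketch.
        if needle and needle in text.lower():
            items.append('<li><a href="%s">%s</a></li>'
                         % (html.escape(url, quote=True), html.escape(text)))
    return "<ul>\n%s\n</ul>\n" % "\n".join(items)


def main(environ=os.environ, out=sys.stdout):
    # Assumption: QUERY_STRING is the URL-encoded search term.
    out.write(search(unquote_plus(environ.get("QUERY_STRING", ""))))
```

The output is only a partial HTML document, so the server's default wrapper or one named by a Searchwrapper directive supplies the rest of the page.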
As of version 1.07 the WN server does not require the NCSA
C imagemap.c
program to support clickable images. This
functionality is now built into the server. Try clicking on this image
for an example.
Clickable images like this make use of a "map file" telling the server which coordinates in the image correspond to which document. The format of this file (described below) is the same as that used with the NCSA imagemap program (with a few minor enhancements).
There are a number of image creation and editing programs that can be
used. One (which I have not tried) is xpaint. The
image file must be in png
, jpeg
or
xbm
format.
This is a file containing lines specifying regions in the image and the URL to which the client should be directed after clicking in one of these regions. The format of this file is described below in detail, but roughly a line contains the name of a shape (like a rectangle), the URL to go to, and the pixel coordinates necessary to specify the shape.
There is a commercial program called mapedit and a free tcl/tk program mapmarker (neither of which I have used) that are designed to automate this task.
In the index file for the directory containing the image (let's call it
image.png
) put the lines:
File=image.png
File=image.map
Attributes=imagemap
If your imagemap file redirects the client to other documents on your
server make sure they are also installed. Run wndex
where necessary to
update your index.cache
.
Note: A file named foo.map
(i.e. with suffix ".map
") will automatically be considered to have "Attributes=imagemap
" set when wndex
is run unless a "Content-type=something
" line is supplied for it.
This will also work if the "Attributes=serveall
"
directive is specified for this directory and there is no entry in the index.wn
file for
foo.map
. But it is necessary to run wndex
since the server will
not recognize foo.map
as an imagemap
file if wndex
has not been run on
the directory containing it.
In the HTML file which you want to contain the clickable image you must put a link to it which looks like:
<a href="image.map">
<img src="image.png" ismap>
</a>
assuming that "image.map
", "image.png
" and the
file containing this link are all in the same directory.
An imagemap file is a plain text file consisting of lines, each of which tells the server what to do if a user clicks in a certain region. You will need to know the dimensions of your image and you will have to have a way of reading the coordinates of points on your image when you click on it. I use the program xv to determine the coordinates.
A complete functioning example of imagemap use, illustrating all the methods, is included with the WN distribution.
An imagemap can contain comments which are lines starting with the
character '#
'. These lines are ignored by the server, as
are lines which are blank.
All other lines must start with one of the imagemap "method" keywords:
default
, rect
, circle
,
poly
,
or point
. Here is a
description of each type of method line.
default
keyword
This line contains only the keyword "default
" followed by
white space and the URL to be referenced if none of the other
regions is selected. An example is:
default http://webhost/dir/default.html
rect
keyword
This line contains the keyword "rect
" followed by white
space and the URL to be referenced and more white space followed by the
coordinates of the upper left and lower right corners of the rectangle.
When the coordinates of the clicked on point lie within the rectangle
specified by these coordinates the given URL will be referenced. An
example is:
rect http://webhost/dir/rect.html 20,20 70,100
Note that each corner of the rectangle is given as its X coordinate and its
Y coordinate separated by a comma, and the two corners are separated by
whitespace. This URL will be
returned if the clicked on point lies in the rectangle with opposite
corners (20,20)
and (70,100)
. In screen
coordinates (as opposed to the Cartesian plane) these are the upper left
and lower right corners respectively.
circle
keyword
This line contains the keyword "circle
" followed by white
space and the URL to be referenced and more white space followed by the
coordinates of the center of the circle and the coordinates of a point on
the circle. When the coordinates of the clicked on point lie within the
circle specified by these coordinates the given URL will be referenced.
An example is:
circle http://webhost/dir/circle.html 70,70 70,100
This URL will be returned if the clicked on point lies in the circle
with center at (70,70)
and radius 30
(i.e. the
circle with center (70,70)
containing the point
(70,100)
).
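The rect and circle tests described above can be sketched in a few lines of Python. This shows the geometry only and is not the server's actual code; the function names are made up, and treating points exactly on the boundary as inside is an assumption.

```python
import math


def in_rect(click, corner1, corner2):
    """True if click lies within the rectangle with the given opposite corners."""
    (x, y), (x1, y1), (x2, y2) = click, corner1, corner2
    return min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)


def in_circle(click, center, edge_point):
    """True if click lies within the circle centered at center through edge_point."""
    # The radius is not given directly; it is the distance from the
    # center to the second point on the method line.
    radius = math.dist(center, edge_point)
    return math.dist(click, center) <= radius
```

With the example lines above, in_rect((30, 50), (20, 20), (70, 100)) and in_circle((70, 90), (70, 70), (70, 100)) are both true.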
poly
keyword
This line contains the keyword "poly
" followed by white
space and the URL to be referenced and more white space followed by a
sequence of coordinates of the vertices of a polygon, in order. The
polygon so determined is the one you would obtain by connecting each of
the vertices in order with a straight line and then connecting the last
vertex to the first. The polygon need not be convex.
Technical note: The curve formed in this way may cross itself -- a point will be considered inside if the winding number of the curve around the point is non-zero.
An example is:
poly http://webhost/dir/triangle.html 10,20 70,100 5,150
This URL will be returned if the clicked on point lies in the triangle
with vertices (10,20)
, (70,100)
and
(5,150)
.
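The winding number rule mentioned in the technical note can be illustrated with the following Python sketch. It uses a standard crossing-count formulation of the winding number; whether the server computes it in exactly this way is not stated in this guide, and the function names are invented for the example.

```python
def is_left(p0, p1, p2):
    """Positive if p2 is on one side of the line p0->p1, negative on the other."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])


def winding_number(point, vertices):
    """Count how many times the closed polygon winds around point."""
    wn = 0
    n = len(vertices)
    for i in range(n):
        a = vertices[i]
        b = vertices[(i + 1) % n]  # the last vertex connects back to the first
        if a[1] <= point[1]:
            # Edge crosses the horizontal line through point going one way.
            if b[1] > point[1] and is_left(a, b, point) > 0:
                wn += 1
        else:
            # Edge crosses it going the other way.
            if b[1] <= point[1] and is_left(a, b, point) < 0:
                wn -= 1
    return wn


def in_poly(point, vertices):
    """A point is inside when the winding number is non-zero."""
    return winding_number(point, vertices) != 0
```

For the triangle in the example line, in_poly((30, 90), [(10, 20), (70, 100), (5, 150)]) is true and in_poly((100, 90), ...) with the same vertices is false.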
point
keyword
This line contains the keyword "point
" followed by white
space and the URL to be referenced and more white space followed by a
sequence of coordinates of one or more points in the image. This
method replaces the default
. The default URL
referenced is the one associated with the point in a "point method" line
which is closest to the clicked location. An example is:
point http://webhost/dir/point1.html 10,20 70,100
point http://webhost/dir/point2.html 100,110 80,5
If no other method has returned a URL then the returned URL will be to
point1.html
or point2.html
depending on whether
the clicked on location is closer to one of the points
(10,20)
or (70,100)
or to one of the points
(100,110)
or (80,5)
.
Note: Allowing multiple points on a single point method line is a WN enhancement and will not be portable to other imagemap programs.
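The nearest point rule can be sketched in Python as follows. The data layout (a list of URL and point-list pairs, one per point method line) and the function name are assumptions made for the illustration.

```python
import math


def nearest_point_url(click, point_lines):
    """Return the URL of the point line holding the point closest to click.

    point_lines is a list of (url, [(x, y), ...]) pairs, one per
    "point" method line. Returns None if there are no points at all.
    """
    best_url = None
    best_dist = math.inf
    for url, points in point_lines:
        for p in points:
            d = math.dist(click, p)
            if d < best_dist:
                best_url, best_dist = url, d
    return best_url
```

With the two example lines above, a click at (60,90) is closest to (70,100) and so resolves to point1.html, while a click at (90,10) is closest to (80,5) and resolves to point2.html.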
nocoords
keyword
This line contains the keyword "nocoords
" followed by white
space and the URL to be referenced when a user with a non-graphical
browser selects the substitute text for the image. More precisely,
if any browser makes a request for the document and does not include the
X and Y coordinates of the point clicked on in the image then this URL
will be referenced. An example is:
nocoords http://webhost/dir/message.txt
Note: The nocoords
keyword and functionality is a
WN enhancement and will not be portable to other imagemap
programs.
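Putting the line formats of this section together, a map file could be read with a small parser like the Python sketch below. It is meant as a way to check your own map files, not as a description of the server's parser; the names are invented, and accepting leading white space before a keyword or a '#' is an assumption.

```python
METHODS = {"default", "rect", "circle", "poly", "point", "nocoords"}


def parse_map(text):
    """Parse imagemap text into a list of (method, url, coords) tuples.

    coords is a list of (x, y) integer pairs. Blank lines and comment
    lines starting with '#' are skipped.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        method = fields[0]
        if method not in METHODS or len(fields) < 2:
            raise ValueError("bad imagemap line: %r" % raw)
        coords = []
        for pair in fields[2:]:
            # Each coordinate is written as "X,Y" with no white space.
            x, y = pair.split(",")
            coords.append((int(x), int(y)))
        entries.append((method, fields[1], coords))
    return entries
```

The result keeps the lines in file order, so a caller can try the region methods in the order they were written.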
The URL in the lines listed above can be in one of the following three forms:
http://remote_host/dir/rect.html
/dir/rect.html
rect.html
Here are the differences. The first,
"http://remote_host/dir/rect.html
" is primarily used to link
to a document on a remote host. On seeing this the server sends an HTTP/1.1 redirect to the client
causing the client to automatically access this URL.
The use of "/dir/rect.html
" is equivalent to
"http://local_host/dir/rect.html
". Any URL specified in a
method line which begins with a '/
' is assumed to be on the
local host and the path is assumed relative to the WN root
directory. An HTTP/1.1
redirect is sent because simply sending the file would confuse the client
(it would think that the document rect.html
is in the same
directory as the imagemap file which might not be true in which case
relative URLs would fail to work).
Warning: The WN root directory is changed by use of
http://host/~user/
type URLs.
The use of "rect.html
" or anything which does not contain
':/
' (i.e. which does not start with http://...
or ftp://...
) and does not start with '/
'
causes the server to assume that the document is relative to the
directory containing the imagemap. In this case the server will
immediately serve that document with no redirection. This is the most
efficient way to set up an imagemap, but it requires that the response
files like "rect.html
" be in the same directory as the
imagemap file. There is one exception to this. If the relative URL
contains a '#
' the server prepends the appropriate path to
the URL and sends an HTTP/1.1
redirect to the client. This is necessary for the client to be able to
handle the "name" anchor indicated by the '#
'.
Note: Allowing relative URLs of this type is a WN enhancement and will not be portable to imagemap programs run on other servers. Also note that using a relative URL like "dir/rect.html
" referring to a document in a subdirectory can confuse a browser about the location of documents referred to in "rect.html
". For this reason it should only be used when the document is not an HTML document or contains a <base>
tag specifying its location.
There is one special URL which may be used in map files, namely
<null>
. It has the effect of not sending a redirect
or a document to the client but instead sending a header with status 204.
This instructs the client to do nothing and in particular not to reload
or alter the document it is displaying. For example, the line:
default <null>
in a map file will have the effect of making clicks in the default area behave as if the click had been somewhere outside the image, i.e. nothing will happen.
Note: the <null>
URL is a WN
enhancement and will not be portable to imagemap programs run on other
servers.
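The rules for the different URL forms can be summarized in a short Python sketch. The function name and the labels it returns are invented for this illustration; only the tests it applies come from the text above.

```python
def classify_map_url(url):
    """Return how a map-file URL is handled, following the rules in the text.

    'no-content'       the special URL <null>: send status 204, nothing else
    'redirect-remote'  contains ':/': redirect the client to the URL as is
    'redirect-local'   starts with '/': redirect, path relative to the WN root
    'redirect-anchor'  relative but contains '#': prepend path, then redirect
    'serve'            relative: serve directly from the imagemap's directory
    """
    if url == "<null>":
        return "no-content"
    if ":/" in url:
        return "redirect-remote"
    if url.startswith("/"):
        return "redirect-local"
    if "#" in url:
        return "redirect-anchor"
    return "serve"
```

Only the last case avoids a redirect, which is why relative URLs are the most efficient choice when the target is in the same directory as the imagemap.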
CGI stands for Common Gateway Interface. It provides a standard for Web servers to interact with programs which are not part of the server but may produce output which you wish to serve.
Many functions which are done by CGI programs on other servers are built in features of WN. If your needs can be met by these features then not only will you save yourself considerable effort in creating, setting up, and maintaining programs, but the built in feature will perform much more efficiently and much more securely than a CGI program.
These features include the ability to respond with different text or entirely different documents based on the client request, the client's hostname, IP address, user-agent, or the "referer", the document containing the link. For information about this see the chapter "Parsed Text and Server Side Includes on the WN Server" in this guide. Also support for "imagemaps" or clickable images is built in so there is no need to use CGI for this. See the chapter "Clickable Images and Imagemap files on the WN Server" in this guide. Finally WN supports a variety of methods of searching your data including by title, keyword, or full text. See the chapter "Setting Up Searches on the WN Server" in this guide.
If these features do not meet your needs and something like a CGI program will, then you may wish to consider using a WN filter. These have most of the functionality of CGI programs, but are somewhat more secure and have one advantage: the output of filters can be parsed while CGI output cannot.
It would be nice if one could simply indicate in the appropriate index.wn
file that a
particular file is a CGI program which should be executed rather than
served. Unfortunately, the CGI protocol makes it impossible to implement
this in an efficient way.
There are two mechanisms in fairly common use with other servers for
indicating that a file is a CGI program and WN supports them
both. The first is to give the file name a special extension (by default
it is ".cgi
") which indicates that it is a CGI program.
Thus any file you serve with the name "something.cgi
" will
be treated as a CGI program. The special extension ".cgi
"
can be changed by redefining the macro "#define CGI_EXT
" by
editing the file config.h
and recompiling servers.
The second mechanism is to have specially named directories with the
property that any file in that directory will be assumed to be a CGI
program. The default for this special name is "cgi-bin
".
Thus, if you have a directory /cgi-bin
in your hierarchy the
server will assume that any file served from that directory is a CGI
program. Of course, as always, only files listed in that directory's index.wn
file will be
servable. No files in subdirectories of /cgi-bin
can be
served. This is because the server will always interpret a request for
"/cgi-bin/foo/bar
" as meaning run the program
"/cgi-bin/foo
" with the PATH_INFO
CGI
environment variable set to "bar
". Thus if
"foo
" is actually a directory and "bar
" a file
in it, the request will fail.
There is no need for /cgi-bin
to be at the top of your
hierarchy. It could be anywhere in the hierarchy. And, in fact, you can
have as many directories named "cgi-bin
" as you like. They
will all be treated the same. The special name "cgi-bin
"
can be changed by redefining the macro "#define CGI_BIN
"
by editing the file config.h
and recompiling servers.
It is beyond the scope of this document to provide an extensive tutorial in writing CGI programs. There is an online tutorial at WDVL.internet.com and another available from NCSA. A collection of links to CGI information is available at www.stars.com.
We will provide only a simple example of a CGI program written in perl. More examples can be found in the
/docs/examples
directory of the WN distribution.
#!/usr/local/bin/perl
# Simple example of CGI program.
print "Content-type: text/html\r\n";
# The first line must specify content type. Other
# optional headers might go here.
print "\r\n";
# A blank line ends the headers. All header lines should
# end with CRLF ("\r\n"), but other lines don't need to.
# From now on everything goes to the client
print "<body>\n";
print "<h2>A few CGI environment variables:</h2>\n\n";
print "REMOTE_HOST = $ENV{REMOTE_HOST}<br>\n";
print "HTTP_REFERER = $ENV{HTTP_REFERER}<br>\n";
print "HTTP_USER_AGENT = $ENV{HTTP_USER_AGENT}<br>\n";
print "QUERY_STRING = $ENV{QUERY_STRING}<br>\n";
print "<p>\n";
print "</body>\n";
Notice that the first thing the program does is provide the HTTP/1.1
"Content-type:
" header line. It may be followed by other
optional headers you want the server to send. The end of these headers
is indicated by a blank line. Of course the server will add additional
headers.
By default the WN server assumes that the output of any CGI
program is "dynamic" or different each time the program is run and is
also "non-cachable". Hence the server behaves as if the "Attributes=dynamic,non-cachable
"
directive had been used. The "Attributes=dynamic
"
causes the server not to send a last modified date or a content length
since they might be constantly changing. The "Attributes=non-cachable
"
attempts to dissuade clients and proxies from caching the output by
sending an appropriate HTTP header.
If, in fact, the output of your program is always the same, you can use
the "Attributes=nondynamic
"
directive. Also if you wish it to be cached you must use the "Attributes=cachable
"
directive. In particular, if you want the browser "back" button to
return users to a CGI generated page after they have followed a link
you may need "Attributes=cachable
"
(especially with an HTML "<form method="post">
")
since otherwise the browser may not even cache the page in memory.
The program above is a good example of one which should not be cached as it prints out the client's hostname, user agent and the URL of the document which contains the link to this CGI program. The CGI program gets this information about the client from environmental variables set by the server. A complete list of the standard CGI environment variables and a description of what they contain plus a description of some additional non-standard ones supplied by the WN server can be found in the appendix "CGI and Other Environment Variables on the WN server" in this guide.
In addition to setting these environment variables appropriately the server will change the current working directory of the CGI process to the directory in which the CGI program is located.
Note: In general a CGI program has complete control over its output, so it is responsible for doing things which the server might do for a static document. This means that you cannot use many of the WN features with CGI output. In particular the server will not use a filter or parse it for "<!-- #include -->
", etc. The CGI program must do these things for itself. Also the server will not provide ranges specified in the "Range:
" header. Instead the contents of this header are passed to the program in the environment variable HTTP_RANGE
, so the program can do the range processing.
One thing you should be aware of in writing programs is that the
WN server does not send the UNIX stderr(3)
stream
to the error log file, but leaves it at its
default, the terminal from which the server is invoked. This allows the
maintainer to set it to a file of her choice or leave it directed to the
console window in which wnsd
was invoked. To redirect it to
a file called "my.errs
" simply run wnsd
with a
command like:
wnsd <options> 2>my.errs
if you are using a UNIX sh(1) Bourne-like
shell. This can be useful when debugging CGI programs because their
errors are typically sent to the UNIX stderr(3)
stream so
you can easily view them with the UNIX tail(1)
utility like:
tail -f my.errs
rather than have them buried in a log file.
Sometimes you may have a number of files which are to be processed by the same CGI program. In that case you might consider designating a "handler" for these files instead of putting the name of the CGI program in the URL for each of them.
The file directive:
CGI-Handler=bar.cgi
causes the program "bar.cgi
" to be run and its output to be
served in place of the document requested. This is a way to designate a
CGI program to handle a file somewhat like a filter. The name of the
program need not be in the URL since it is in the index.wn
file. So when
http://host/foo.html
is requested this will cause the
handler, bar.cgi
, to be run with the CGI environment
variable PATH_INFO
set to
/path2/foo.html
. In normal use the program
bar.cgi
will do something to the file foo.html
and serve the output. It is useful if you want a number of files in a
directory to be handled by the same CGI program. Note the file
foo.html
need not be used in any way by the program, but it
must exist or else the server will treat it as a non-existent file.
The directory directive "Default-CGI-Handler=handler.cgi
"
specifies that all files in the directory should be treated as if the
"CGI-Handler=
"
file directive had been set to handler.cgi
. To override
this setting and specify no CGI handler use the "CGI-Handler=<none>
"
directive.
This is an extremely important issue, but one which is beyond the scope of this document. I highly recommend the Safe CGI Programming maintained by Paul Phillips and the WWW Security FAQ maintained by Lincoln Stein.
Unfortunately, I do not have time to answer all the questions sent to me by WN users. I recommend that you subscribe to the mailing list of WN maintainers and ask your questions there. It is read by many knowledgeable WN users (including me).
To subscribe with the address you're sending mail from, send a message to majordomo@cyberwerks.com containing the line:
subscribe wn-maint your.email.address
The people reading this list seem to be very helpful.
I would like to thank the many people who have aided in the creation of the WN package. These people include Earle Ake, David Capshaw, Henry Cejtin, Mike Crowley, Chris Davis, Paul DuBois, Mike Gallaher, Doug Hardie, Stephen Hebditch, Hamish MacEwan, Jishnu Mukerji, Marko Nordberg, Jeff Norden, Willem Pinckaers, Ron Pool, Kenji Rikitake, Jim Rees, Craig Milo Rogers, Chip Rosenthal, Sander Schippers, Kurt Siegl, Doug Simpkinson, Stephen Trier, Rico Tudor, Peter van Dijk, Pim van Riezen, and William Wells.
Special thanks are due to Jean Pierre LeJacq, who has worked extensively on the documentation and Paul Hoffman who maintains the mailing list.
These are the command line options for both the UNIX inetd(8)
system utility based daemon, wnd
, and the stand-alone
daemon, wnsd
.
-a numeric_uid
-- Set
trusted numeric uid for authentication
The server will only allow password authentication if one of
the options -a
, -A
, -t
, or -T
is used. Then the index.cache
file for a
password protected directory must be owned by the user id specified
with the -a
(or -t
) option. The
-a
option only affects authentication while -t
has other effects. The four
command line arguments -a
, -A
, -t
, -T
all take a numeric argument.
Thus the command should be "./wnsd -a 203
" and
not "./wnsd -a joe
" if user
"joe
" has user id "203
".
-A numeric_gid
-- Set
trusted numeric gid for authentication
This is similar to the -a
option except
the group owner (gid) of the
index.cache
file is
used instead of the uid.
-d
-- Really verbose log
When the server is run with this option it will include copies of all the client headers plus some additional information in the log file. This is only useful for debugging or if you are curious about what clients actually send.
-e
-- Forbid CGI and execs
When the server is run with this option it will not execute a CGI program, a filter
or any executable in an include. In
addition the server will reject any request with the HTML <form method="post">
method.
-E
-- Restrict CGI and execs to
trusted users
This is similar to the -e
option except that index.cache
files owned
by a trusted user id or trusted group id (set with the -t
or -T
option) are exempt from the
restrictions. That is, only programs listed in index.cache
files owned
by the trusted user or group will be executed. If neither -t
nor -T
is used then no programs,
filters or includes will be executed. This option takes precedence
over the -u
option, i.e. if
both are used, execution of programs behaves as if the -u
were not present (although
the -u
option still grants
permission for the serving of ordinary files).
-F
-- Run server in foreground
The server (wnsd
only) will run in the foreground
rather than the background which is the default. This is useful
for things like daemontools
.
-h hostname
-- Set
hostname
Tell the server that its name is
hostname
. This is only used for a few
things: to pass to CGI programs and
for redirects generated when a trailing '/' is omitted in a
URL path to a directory. Also if the -i
option is
not used then the server will ascertain the IP address on
which to listen for connections by using the first IP
address obtained from the UNIX gethostbyname(3)
system call with this name.
-i Listen addresses
--
Set the IP addresses and ports on which the server listens
(wnsd
only).
Tell the server to listen on a set of address/ports. Since IPv6 has multiple IP addresses for each interface, multiple address/port combinations can be specified. The format for each entry is 'address/port' where address can be an IPv4 or IPv6 address, or a name resolvable via DNS. Port can either be a port number or the port name from the /etc/services file. Entries are separated by commas and generally should be enclosed in quotes. An example:
-i '2001:a:b:c::5/80, server.org.com/http-alt, 10.0.2.3/http'
In addition, "all" may be used for the address, in which case the server will listen on all IPv4 addresses and all non-local IPv6 addresses. For example: 'all/80'.
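To make the entry format concrete, here is a small Python sketch that splits an argument of this form into address and port pairs. It illustrates the syntax only and is not the server's own parser; the function name is made up, and names are not resolved.

```python
def parse_listen(spec):
    """Split a -i style argument into (address, port) string pairs."""
    pairs = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last '/' so the colons in IPv6 addresses are untouched.
        address, sep, port = entry.rpartition("/")
        if not sep or not address or not port:
            raise ValueError("expected address/port, got %r" % entry)
        pairs.append((address, port))
    return pairs
```

Applied to the example above, this yields the pairs ("2001:a:b:c::5", "80"), ("server.org.com", "http-alt") and ("10.0.2.3", "http").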
-L logfile
-- Set
log file name
Write log information to the file logfile
. This
will override the value specified when you ran the configure
program or by setting the macro #define WN_LOGFILE
in config.h
. If the
empty string ""
is used as the value of this
option then no transaction logging will be done.
-l error_log
-- Set
error log file name
Write information about errors to the file
error_log
. This will override the value
specified when you ran the configure
program or by
setting the macro #define WN_ERRLOGFILE
in config.h
. If the
empty string ""
is used as the value of this
option then the log file given with -L
is used for errors as well
as regular transactions.
-n effective_uid
-- Set
effective user numeric id (wnsd
only)
This option only has an effect when the server is run by
root
in the stand-alone version,
i.e. wnsd
. When invoked with the -n
option and a numeric user id (not a user name) the server will change
the user id under which it runs to the specified numeric value. If
you do not use this option the user id is changed from
root
to the value you set when you ran the configure
program (this is
usually the numeric id of the user "nobody
"; the
'n
' in -n
is a mnemonic for
"nobody
") or edited #define USERID
in config.h
. The
-n
option allows you to override that default
when you run the server. In normal use this option is not necessary
and should not be used unless you have a specific need.
-N effective_gid
-- Set
effective group numeric id (wnsd
only)
This option only has an effect when the server is run by
root
in the stand-alone version, i.e. wnsd
.
This option is identical to the -n
option except it sets the
numeric group id under which the server runs rather than the user id.
-p Removed
-- Replaced
by the -i option above.
PUT, MOVE
and DELETE
methods on the server.
If this option is not used the server will reject any requests
for the HTTP methods PUT, MOVE
and DELETE.
In the directory where objects are to be PUT, the
Put-Authorization-Module, Put-Authorization-Realm, and
Put-Authorization-Type
directives must be used.
Finally in this same directory, the "
Default-Attributes=put
" directive must be used
(or in the case of a single file the " Attributes=put
"
directive.)
-q pid_file
-- Set file
name which contains the (wnsd
) server process id.
Write the process id or "pid" of the main server process to the file
"pid_file
". This file name should be the full
path relative to the system root. If this is not set either with the
-q
option, via the configure
program, or by editing the macro #define SWN_PID_FILE
in the config.h
file
then the pid will be written to the UNIX stdout(3)
stream when the server is started.
-S
-- Use the UNIX syslogd(8)
system utility for logging (deprecated).
This option is deprecated; use the -v option
instead.
Instead of writing log messages to a file specified with -L
or in config.h
, use the
UNIX syslog(3)
facility. If #define WN_ERRLOGFILE
is set to ""
and the server is run with the
-S
option then error logging will be
handled by the syslog(3)
facility. If #define WN_ERRLOGFILE
is given a value or the -l
option is used then
errors will be logged there rather than using the UNIX syslogd(8)
system utility.
-t trusted_uid
-- Set
trusted numeric uid
When invoked with the -t
option alone
wnd
or wnsd
will not serve a document
unless the index.cache
file
listing it has the prescribed owner (uid). This numeric uid should
be that of the maintainer not the one under which wnd
or
wnsd
runs if started by root
. Indeed, for
security reasons the server will refuse to use an index.cache
file whose
owner is the uid under which the server is running, in this case. If
on your server all index.cache
files are
created by a single user or a single group this option or the -T
option is highly
recommended. This added security is weakened somewhat if you also
use the -u
option which
allows index.cache
files owned by untrusted users to permit the serving of files owned
by the same user. If both -u
and -t
are used the trusted user specified by -t
is
exempt from the restrictions imposed by the -u
argument. The four command
line arguments -a
, -A
, -t
, -T
all take a numeric argument.
Thus the command should be "./wnsd -t 203
" and
not "./wnsd -t joe
" if user
"joe
" has user id "203
".
-T trusted_gid
-- Set
trusted numeric gid
This is similar to the -t
option except the numeric group owner (gid) of the index.cache
file is
used instead of the numeric uid.
-u
-- Restrict untrusted users
When this option is invoked the server requires that every file
served (including wrappers and includes) have the same owner as the
index.cache
file
which grants it permission to be served. This means that untrusted
users can only serve files which they own. If the -t
or -T
option is used with
-u
then index.cache
files owned
by the trusted user or trusted group are exempt from this requirement
and they may grant permission to serve any file the server can read.
If the -u
and -E
options are used together
then the -E
takes precedence
for execution of CGI programs, filters and executable includes, but the -u
still
has effect for ordinary files being served.
Notice that if none of -t
,
-T
, and
-u
are used then a user with his own home page can
make a symbolic link to any file readable by the server and that
document will be served. This is true even if the linked to document
is in a directory with limited access or is
outside the server data hierarchy.
-v log_format
-- Set log
format
The legal values for this option are "nolog
",
"common
", "verbose
",
"ncsa
", "syslog
", and
"vsyslog
". The first causes no logging to take
place. The next three cause the log file to be written in
the so-called common log format, or WN's verbose
format including user agent, referrer, virtual server
nickname, and cookies, or in the NCSA extended format which
includes referrer and user agent. The last two cause
logging to be done by the syslogd(8)
daemon and in the case of vsyslog
to include
the information in the verbose
format.
Each base log type may be optionally followed by a colon and
either nodns
or revdns.
If neither
of these is present then the default server action is to do
a DNS lookup on the client's IP address to obtain the
hostname of the client for logging purposes. If the
:revdns
extension is present the server will
additionally do a reverse DNS lookup on this name as a check
against name spoofing. If the :nodns
extension
is present the server will do no DNS lookup and will use the
IP address in the log instead of the host name. For example
the option "-v verbose:nodns
" indicates that
the server should use the verbose
log format,
but should use IP addresses rather than host names in log
entries. Obviously, use of :nodns
is more
efficient and :revdns
is less efficient than
the default.
The default can be set with the #define VERBOSELOG
macro in config.h
.
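The structure of a log format value can be sketched in a few lines of Python. This is only an illustration of the syntax described above, not the server's own parsing code. The function name parse_log_format and the "default" marker are invented for the example, and the sketch assumes that values are matched case-sensitively.

```python
# Base log types and DNS extensions as listed in the text above.
BASE_TYPES = {"nolog", "common", "verbose", "ncsa", "syslog", "vsyslog"}
DNS_MODES = {"nodns", "revdns"}


def parse_log_format(spec):
    """Split a value such as 'verbose:nodns' into (base, dns_mode).

    dns_mode is 'default' when no extension is given, meaning the
    server looks up the client's hostname but does no extra check.
    """
    base, sep, ext = spec.partition(":")
    if base not in BASE_TYPES:
        raise ValueError("unknown log type: %r" % base)
    if not sep:
        return base, "default"
    if ext not in DNS_MODES:
        raise ValueError("unknown DNS extension: %r" % ext)
    return base, ext
```

For example, parse_log_format("verbose:nodns") yields the pair ("verbose", "nodns"), while a bare "common" yields ("common", "default").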
-V virtual_host_file
--
Set file name which contains the list of virtual hosts
The file "virtual_host_file
" should be the name
of the file containing the list of "virtual hosts" and their
corresponding IP numbers and root directories. The format of this
file is one line per virtual host. Each such line should have the
form:
hostname IP_address root_path
with the three parts separated by white space. For example an entry might be:
myhost.school.edu 111.222.333.444 /var/wn
In particular the hostname should be the fully qualified domain name.
Lines in this file which are empty or start with '#
' are
ignored.
If the virtual host file is changed you will need to restart the server for the change to take effect.
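As a rough illustration of the file format just described, the following Python sketch reads a virtual host file into a list of (hostname, IP_address, root_path) tuples. It is not taken from the WN source. The function name is hypothetical, and the sketch assumes that whitespace-only lines count as empty and that a '#' preceded only by white space also starts a comment line.

```python
def parse_virtual_hosts(text):
    """Return a list of (hostname, ip_address, root_path) tuples."""
    hosts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Empty lines and lines starting with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        if len(parts) != 3:
            raise ValueError("line %d: expected 3 fields, got %d"
                             % (lineno, len(parts)))
        hosts.append(tuple(parts))
    return hosts
```

A quick check of a candidate file with a helper like this before restarting the server may catch lines with a missing field.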
wndex
Utility
These are the command line options for the wndex
utility
used to process index.wn
files.
-a
-- Serveall (mnemonic
"all").
Using the -a
option is equivalent to putting the
directive Attributes=serveall
at the beginning of the index.wn
file (or files)
being processed, with one important exception. If
wndex
is used with this option in a directory which
contains no index.wn
file, it will behave as if there were an index.wn
file containing
only the line Attributes=serveall
.
This allows the creation of an index.cache
file
permitting access to all files in a directory without first creating
an index.wn
file. This
can be used with the -r
option to grant access to a directory hierarchy without the need to
create an index.wn
file
in each directory.
-c file_name
-- Use
"file_name
" instead of "index.cache
".
Causes wndex
to write its output to the file
"file_name
" rather than the default
"index.cache
".
-d dir_name
-- Process
directory "dir_name
".
When wndex
is run with this option it will process the
directory "dir_name
" rather than the current
directory. It will use the index.wn
file in
"dir_name
" and write the file
"dir_name/index.cache
".
-i file_name
-- Use
"file_name
" instead of "index.wn
".
Causes wndex
to read its input from the file
"file_name
" rather than the default
"index.wn
".
-q
-- Use quiet mode.
Causes wndex
to suppress any warning messages (except
fatal errors) sent to the UNIX stderr(3)
stream as well as other informational messages.
-r
-- Recursively process
subdirectories.
Causes wndex
to recursively descend through some
subdirectories and process them as well. By default it processes
only those subdirectories listed in the Subdirs= directory directive.
However, this can be changed to be either all subdirectories or all
subdirectories containing an index.wn
file by use of
the -s
option. If the
-r
and -a
options are used together then the
default is to use all subdirectories.
-s all|index
-- Set the
value of the Subdirs=
directory directive automatically.
Using the "wndex -s index
" command with an index.wn
file containing
no Subdirs= directory
directive is equivalent to having such a directive whose value is a
list of all subdirectories which contain an index.wn
file. Using the
"wndex -s all
" command with an index.wn
file containing
no Subdirs= directory
directive is equivalent to having such a directive whose value is a
list of all subdirectories of the directory being processed. If the
-r
and -a
options are used together then
"wndex -s all
" is automatically assumed. This
can be overridden by using a real Subdirs= directory directive
or using "wndex -s index
".
-v
-- Use verbose mode.
Causes wndex
to be extra verbose.
-V
-- Print version.
Causes wndex
to print its version and exit.
-x
-- Use UNIX stdin(3) and stdout(3)
streams.
Causes wndex
to read from the UNIX stdin(3)
stream instead of the file index.wn
and write to the UNIX
stdout(3)
stream rather than the file index.cache
.
This is a list of the items which may be placed in an index.wn
file to be processed
by wndex
. This file
consists of a collection of records each of which consists of a
group of lines pertaining to a single file. Each line of a record begins
with a directive like "Title=
" which indicates that the remainder
of that line is to be taken as the title of the document whose record
contains this line. The "File=
"
directive is special in that it indicates the beginning of a new record.
The value of the "File=
" directive
is the name of the file whose record will follow. Letter case is not
significant in directive keywords.
When the character '#
' is encountered in an index.wn
file it is assumed to
be the start of a comment and everything after it on that line is
ignored. To include the '#
' character in, for example, a
document title, it must be escaped with the '\
' character.
That is, when "\#
" is encountered it does not signify a
comment and the character '#
' (without the backslash) is
treated as a normal character. In fact, since all directives contain the
character '=
', all lines which do not contain this character
are silently ignored. Also a single conceptual line of an index.wn
file can be spread
over several actual lines by ending all but the last line with the
'\
' character. That is, if a line ends with
'\
' that character is removed and the contents of the next
line are considered a continuation of the current line. The maximum
allowed length of a line (including continuation) is 1024 characters.
The maximum allowed length of all the records corresponding to one
document is 8192 characters.
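The lexical rules above (comments, the "\#" escape, continuation lines, and the rule that lines without '=' are ignored) can be summarized in a short Python sketch. It is an illustration only and is not how wndex itself is written. The function names are invented, and the sketch assumes that continuation lines are joined before comments are removed, an ordering this guide does not specify.

```python
MAX_LINE = 1024  # limit on one logical line, as stated above


def logical_lines(text):
    """Join physical lines that end with a backslash."""
    pending = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]      # drop the backslash, keep collecting
            continue
        yield pending + raw
        pending = ""
    if pending:
        yield pending


def strip_comment(line):
    """Drop an unescaped '#' and what follows; turn '\\#' into '#'."""
    out = []
    i = 0
    while i < len(line):
        if line.startswith("\\#", i):
            out.append("#")
            i += 2
        elif line[i] == "#":
            break
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


def parse_directives(text):
    """Return (directive, value) pairs; directive names are lowercased."""
    result = []
    for line in logical_lines(text):
        if len(line) > MAX_LINE:
            raise ValueError("logical line exceeds %d characters" % MAX_LINE)
        line = strip_comment(line).strip()
        if "=" not in line:
            continue                 # lines without '=' are ignored
        key, _, value = line.partition("=")
        result.append((key.strip().lower(), value.strip()))
    return result
```

Directive names are lowercased because letter case is not significant in directive keywords; values are split at the first '=' only, so a value such as "name=value" survives intact.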
The first record in an index.wn
file is special and
is intended to describe attributes of the entire directory rather than
individual files. It contains lines with directives specifying
attributes of the directory as a whole or all the files in it. The next
section is a complete list of these directory directives.
The line:
Accessfile=/dir/accessfile
specifies that the file /dir/accessfile
is to be used to
determine access privileges (by hostname or IP address) for this
directory. If this line is omitted access is allowed for everyone.
Both the path /dir/accessfile
and the path
~/dir/accessfile
are taken relative to the WN
root directory. In particular the accessfile must be in the
WN hierarchy (unlike includes or filters, for example). If
the path does not begin with a '/
' or a '~
'
then it is relative to the directory containing the index.wn
file. See the
chapter "Limiting Access to Your WN
Hierarchy" in this guide.
Access-denied-URL
--
Set URL for requests for which access is denied.
The line:
Access-denied-URL=http://host/dir/foo.html
or the line:
Access-denied-URL=/dir/foo.html
specifies that any request for a document in this directory which is
denied because of an "Accessfile=
" restriction should be
redirected to the given URL. A default value for all directories can
be set by uncommenting the "#define ACCESS_DENIED_URL
"
line in config.h
and
recompiling. If you use this directive be sure that the file
foo.html
does not have restricted access or you can
create an infinite loop. This line has the special feature that it
can also be placed as the first line of the file named by the "Accessfile=
" directive controlling the
directory. A line in the accessfile will override any value set in
the index.wn
file.
Attributes
-- Set
directory attributes.
Currently there are only two directory attributes,
viz. "nosearch
" and "serveall
".
Letter case is not significant in the attribute value.
Attributes=serveall
Specifies that any file, with a few exceptions, in this
directory may be served, not just those listed in the index.wn
file. The
server will attempt to set the content type correctly based on
the file name suffix using the same default correspondences
between type and suffix that wndex
uses. The
exceptions are that files whose name starts with
'.
' or ends with '~
' as well as the
files "index.wn
" and "index.cache
" will
not be served.
Note: When this directive is used in a directory protected by an "Accessfile=
" or a password file be sure that these files have names that start with '.
', or end with a '~
'. Or better, put these files in a different directory from which nothing is served.
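The exceptions to "Attributes=serveall" listed above can be expressed as a simple predicate. The Python sketch below is an illustration of the stated naming rules only. The function name is hypothetical, and the sketch assumes the default file names "index.wn" and "index.cache" are in use.

```python
def may_serve_under_serveall(filename):
    """Return True if serveall would allow a file with this name."""
    if filename in ("index.wn", "index.cache"):
        return False
    # Names starting with '.' or ending with '~' are never served.
    if filename.startswith(".") or filename.endswith("~"):
        return False
    return True
```

A maintainer could use such a check to confirm that an access file or password file kept in a serveall directory has a protected name.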
Attributes=nosearch
Specifies that the index.cache
databases in the
current directory and its subdirectories should not be searched
when the server does a title, keyword or user supplied field search.
Likewise context and grep searches will not be allowed in
this directory. In this case when an attempt is made to do so an
error message is returned to the client. It is also possible to
exclude only some files from searching with the "Attributes=
" file directive.
Authorization-Module
,
Authorization-Realm
,
Authorization-Type
-- Specify authorization module.
Currently WN includes a "basic" authorization
module called
wnauth
. Its use is described in the chapter
"Limiting Access to Your
WN Hierarchy". Alternatively you can make your
own module to handle authorization. Data is passed via
standard input to this module. More specifically such a
module should expect to read Basic
user:password
on standard input for Basic
authentication, where "user" is the client supplied user
name and "password" the client supplied password. The
WN server expects this module to exit with status 0
if authorization is granted and with status 1 if access is
denied. See the source of wnauth in /wnauth/wnauth.c for a
detailed example.
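To make the module interface concrete, here is a minimal Python sketch of a custom authorization module. It is not the wnauth program and should not be used as is. It assumes the module receives a single line of the form "Basic user:password" exactly as described above (if the server passes the credentials in encoded form, a decoding step would be needed first), and the USERS table is a hypothetical stand-in for a real password database.

```python
import hmac
import sys

# Hypothetical user table; a real module would consult a password database.
USERS = {"alice": "wonderland"}


def check_credentials(line, users=USERS):
    """Return 0 to grant access and 1 to deny it."""
    scheme, _, rest = line.rstrip("\r\n").partition(" ")
    if scheme != "Basic":
        return 1
    user, sep, password = rest.partition(":")
    expected = users.get(user)
    if not sep or expected is None:
        return 1
    # Constant-time comparison of the supplied and expected passwords.
    same = hmac.compare_digest(expected.encode(), password.encode())
    return 0 if same else 1


def main():
    """Read the credentials line from standard input."""
    return check_credentials(sys.stdin.readline())

# An installed module would end with:  sys.exit(main())
```

The return value of main() is meant to become the exit status, so that 0 grants authorization and 1 denies it.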
For security reasons when you use an
"Authorization-Module=
" you are required to use either
the -t
or -T
options or the -a
or -A
options and to have
the index.cache
file in the protected directory owned by
the trusted user or group. This is to guard against counterfeit
authorization modules.
Auth-denied-file
-- Specify the name of an HTML file to be used as the error message
when an authentication attempt for a password protected directory
fails.
The line:
Auth-denied-file=~/dir/foo.html
specifies that any request for a document in this directory which is
denied because of an authorization module restriction results in the
file ~/dir/foo.html
being sent instead. A default value
for all directories can be set by uncommenting the "#define AUTH_DENIED_FILE
"
line in config.h
and
recompiling. Note that this is not a URL but the name of a file
whose content is to be sent as error text when authentication is
denied. If the file name starts with '~/
' as above it
is assumed to be relative to the WN root directory.
Otherwise it is assumed to be a path relative to the directory
containing the index.wn
file.
Cache-Module
--
Specify program to be used as interface to database for
index.cache
entries.
If this line specifies a program then instead of looking for file
entries in the index.cache
file this program is executed
after putting the base name of the URL in the environment variable WN_KEY
. This
provides a mechanism to use a real database rather than the file
index.cache
. Note that the directory directives are
still obtained from index.cache
. The output of this
module must be in the format of an index.cache
line. Title, keyword and grep searches are not supported since that would
require reading the entire database.
Default-Attributes
-- Specify the default value of the file
attributes directive for every file served from this directory.
This directive should not be confused with the directory attributes directive.
The line:
Default-Attributes=parse,dynamic
specifies that files in this directory should be parsed and marked as dynamic documents unless they have an attributes directive specifying the contrary.
Default-CGI-Handler
-- Specify a default value for the "CGI-Handler=
" file directive.
The line:
Default-CGI-Handler=/dir/handler
specifies that files in this directory should all be treated as if
the "CGI-Handler=
" file
directive had been set to /dir/handler
. To
override this setting and specify no CGI handler use the
"CGI-Handler=<none>
" directive.
Default-Charset
-- Specify the default character set to be added to the MIME type
for documents with content type text/*
.
The line:
Default-Charset=iso-8859-1
specifies that documents in this directory which have a MIME
type of text/*
should have a character set parameter
with this value appended. E.g. a document of type text/html
will be sent with a content type header
Content-type: text/html; charset=iso-8859-1
The default value for this is iso-8859-1
.
Default-Content
--
Specify the default MIME
content type for items in this directory.
The line:
Default-Content=text/html
specifies that files in this directory which do not end in a suffix
recognizable to wndex
should be given
the type "text/html
". Any legitimate MIME
type may be used as the value.
Default-Cookie
--
Specify the default cookie or cookie generating script for
items in this directory.
The line:
Default-Cookie=name=value
or the line:
Default-Cookie=!my_cookie_script
specifies that files in this directory with no
Set-Cookie
directive
should be treated as if they had such a directive with
the value name=value
or !my_cookie_script
.
Default-Document
-- Specify the default document for this directory.
The line:
Default-Document=foo.html
specifies that a URL pointing to this directory like
http://host/dir/
will result in serving the document
wnroot/dir/foo.html
instead of
wnroot/dir/index.html
. Uses of this include making the
default document a CGI program with
"Default-Document=foo.cgi
" or having a directory with
HTML files all ending with the suffix ".htm
" and using
the directive "Default-Document=foo.htm
". This
directive applies only to the directory containing the index.wn
file, not to any
subdirectories.
Default-Filter
--
Specify a default value for the "Filter=
" file directive.
The line:
Default-Filter=/path2/filter
specifies that files in this directory should all be treated as if
the "Filter=
" file directive
had been set to /path2/filter
. To override this setting
and specify no filter use the "Filter=<none>
" file
directive.
Default-Includes
--
Specify a default value for the "Includes=
" file directive.
The line:
Default-Includes=footer.html
specifies that this line should be used as the "Includes=
" directive for any
document in this directory which does not have an "Includes=
" directive explicitly set.
To override this default value simply specify an explicit "Includes=
" directive or use "Includes=<none>
" to have none.
Default-List-Includes
-- Specify a default value for the "List-Includes=
" file directive.
The line:
Default-List-Includes=header.html,footer.html,disclaimer.html
specifies that this line should be used as the "List-Includes=
" directive for any
document in this directory which does not have an "Includes=
", "Wrappers=
", or "List-Includes=
" directive
explicitly set. To override this default value simply specify an
explicit "List-Includes=
" directive or
use "List-Includes=<none>
" to
have none. Note that the example above grants permission for the
inclusion of the three files listed. It does not require their
insertion. However, it does cause all files in the current directory
to be parsed for includes unless this is overridden with an "Attributes=
" directive.
Default-Max-Age
--
Specify the default value for the "Max-Age=
" file directive.
The line:
Default-Max-Age=2 weeks
specifies that the Cache-Control and Expires headers of all documents served from this directory should be set to expire the document 2 weeks after it is served.
The line:
Default-Max-Age=2 weeks after last-mod
specifies that the Cache-Control and Expires headers of all documents
served from this directory should be set to expire the document 2
weeks after the last-modified date of the document. For more details
see the "Max-Age=
" file
directive.
Default-Wrappers
--
Specify a default value for the "Wrappers=
" file directive.
The line:
Default-Wrappers=wrapper.html
specifies that this line should be used as the "Wrappers=
" file directive for any
document in this directory which does not have a
Wrappers=
directive explicitly set. To override this
default value simply specify an explicit "Wrappers=
" directive or use
"Wrappers=<none>
" to
have none.
File-Module
--
Specify program to be used as interface to database for obtaining
files.
If this line specifies a program then instead of looking for a file
in the current directory this program is executed after putting the
base name of the URL in the environment variable WN_KEY
. The output of
this program is served as if it were a file. This provides a
mechanism to use a real database rather than the file
index.cache
.
If you wish the file module to have access to all the standard CGI environment variables then use the
directive "Default-Attributes=cgi
"
with the File-Module=
directive.
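A file module along these lines might look like the following Python sketch. It is an illustration under stated assumptions, not code from the distribution. The DOCUMENTS table stands in for a real database, and returning a non-zero status for a missing key is a guess, since this guide does not say how a file module should report failure.

```python
import os
import sys

# Hypothetical document store standing in for a real database.
DOCUMENTS = {"hello.html": "<p>Hello from the database.</p>\n"}


def fetch(key, store=DOCUMENTS):
    """Look up the document body for a URL base name, or None."""
    return store.get(key)


def main(environ=None, out=None):
    """Write the document named by WN_KEY to standard output."""
    environ = os.environ if environ is None else environ
    out = sys.stdout if out is None else out
    body = fetch(environ.get("WN_KEY", ""))
    if body is None:
        return 1          # assumption: non-zero status signals failure
    out.write(body)
    return 0

# An installed module would end with:  sys.exit(main())
```

Everything written to standard output is what the server would send in place of a file.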
Logtype
-- Set
the type of log entries to be used for this directory.
As an example, the line:
Logtype=verbose
will cause logging to be done with WN's verbose log format.
To log only IP addresses with no DNS hostnames, add the :nodns extension described below.
The base values possible for this directive are nolog,
common, verbose, ncsa, syslog
and
vsyslog
which respectively have the effect of
turning off logging or using the common log format, WN's
verbose logging, or the NCSA log format in the log file or
using the syslogd(8) daemon with a standard or verbose
format. See the "Managing Log
Files" section of this manual. The default value, used
when this directive is not present, is set in the source file
config.h when the configure script is run.
Each base log type may be optionally followed by a colon and
either nodns
or revdns.
If neither
of these is present then the default server action is to do
a DNS lookup on the client's IP address to obtain the
hostname of the client for logging purposes. If the
:revdns
extension is present the server will
additionally do a reverse DNS lookup on this name as a check
against name spoofing. If the :nodns
extension
is present the server will do no DNS lookup and will use the
IP address in the log instead of the host name. For example
the line
Logtype=verbose:nodns
indicates that the server should use verbose
log format, but should use IP addresses rather than host
names in log entries. Obviously, use of :nodns
is more efficient and :revdns
is less efficient
than the default.
Nomatchsub
-- Set
substitute file for searches on this directory which result in no
matches.
The line:
Nomatchsub=foo.html
specifies that the HTML file foo.html
in the current
directory should be used for the output of all searches (title, keyword, context, grep, etc.) on this directory which
return no matches. It can only be used in conjunction with the
"Searchwrapper=
" file
directive. If Nomatchsub=
is used and a "Searchwrapper=
" has not been
defined an error is logged and the nomatchsub file is ignored. The
file foo.html
must be in the directory being searched
and its name must not contain a '/
'. See also "Nomatchsub=
" for files.
No-Such-File-URL
-- Set substitute URL for requests for non-existent or unservable
files.
The line:
No-Such-File-URL=http://host/dir/foo.html
or the line:
No-Such-File-URL=/dir/foo.html
specifies that any request in this directory for a non-existent file
or a file not listed in the index.wn
file of this
directory should be redirected to the given URL. A default value
for all directories and non-existent directories can be set by
uncommenting the "#define NO_SUCH_FILE_URL
"
line in config.h
and recompiling. The value set here will also be used if an
index.cache
file does not exist. If you use this
directive be sure that the file foo.html
does
exist or you can create an infinite loop.
Owner
-- Specify owner
of directory items.
This should be a line like:
Owner=mailto:maintainer@host
The "mailto:maintainer@host
" may be replaced with a URL
referring to the individual who is responsible for the documents in
this directory. This information is used in an HTTP header. It is
not possible to designate the owner of a single file in a file
directive. However, if the file is an HTML file this can be done
with a <link>
tag
in the header of that file.
Put-Authorization-Module
,
Put-Authorization-Realm
,
Put-Authorization-Type
-- Specify authorization module for PUT, MOVE and DELETE methods.
In order for the server to honor the PUT
,
MOVE
, and DELETE
methods of HTTP,
the server must be run with the -P
option and
in addition an authorization mechanism is required. This
can be the same mechanism used for access authorization,
or different. Currently WN includes a "basic"
authorization module called wnauth
. Its use is
described in the chapter "Limiting Access to Your
WN Hierarchy". Alternatively you can make your
own module to handle PUT authorization. Data is placed in
CGI environment variables.
WN expects this module to exit with status 0 if
authorization is granted and with status 1 if access is
denied.
For security reasons when you use a
"Put-Authorization-Module
" you are required to use either
the -t
or -T
options or the -a
or -A
options and to have
the index.cache
file in the protected directory owned by
the trusted user or group. This is to guard against counterfeit
authorization modules.
Finally in the directory where the
Put-Authorization-Module
directive is used the
"Default-Attributes=put
"
directive must be used (or in the case of a single file the
"Attributes=put
"
directive.)
Put-handler
-- Specify
a module to handle the PUT, MOVE and DELETE methods.
The line
Put-handler=~/dir/puth
tells the server to use the module ROOTDIR/dir/puth
to handle the HTTP methods PUT, MOVE
and
DELETE.
A sample put-handler called puth
is provided with this distribution. The precise specification
of what a put-handler should do is still evolving. The
examples/roam directory shows an example of its use for support
of the Netscape browser roaming feature.
Search-Module
--
Specify program to be used as a search engine.
This directive allows you to create your own search engine. It is invoked with a line like:
Search-Module=/full/path/to/searchmod
The program searchmod
should read the environment
variable QUERY_STRING
and return an HTML fragment. In the typical case the program returns
an unordered list of links to documents containing a match to the
query string. This list can be wrapped by including a "Searchwrapper=
" in the
directory record. If it is not, a default wrapper with text like
"Here are the matches for your search.
" is supplied.
To use this module you should have a form action which is something
like http://host/dir/search=index
. Two simple examples
of a search-module (written in perl) are included in the distribution
in the files bin/wnseven_m
and
bin/wnsectsearch
.
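The two sample modules in the distribution are written in perl. Purely as an illustration of the same interface, here is a minimal sketch in Python. The TITLES table is hypothetical, and the sketch assumes QUERY_STRING holds just the URL-encoded search term; a real module would parse whatever form its search form actually submits.

```python
import html
import os
import sys
from urllib.parse import unquote_plus

# Hypothetical title table; a real module would query its own index.
TITLES = {
    "/docs/intro.html": "Introduction to WN",
    "/docs/search.html": "Searching a WN hierarchy",
}


def search_fragment(query_string, titles=TITLES):
    """Return an HTML fragment listing titles that contain the query."""
    term = unquote_plus(query_string).strip().lower()
    items = []
    for url, title in sorted(titles.items()):
        if term and term in title.lower():
            items.append('<li><a href="{}">{}</a></li>'.format(
                html.escape(url, quote=True), html.escape(title)))
    if not items:
        return "<p>No matches found.</p>\n"
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


def main():
    """Print the fragment for the current QUERY_STRING."""
    sys.stdout.write(search_fragment(os.environ.get("QUERY_STRING", "")))
    return 0

# An installed module would end with:  sys.exit(main())
```

The output is only a fragment (an unordered list), since the surrounding page comes from the search wrapper or the default wrapper.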
Searchwrapper
--
Set wrapper file for searches on this directory.
The line:
Searchwrapper=swrap.html
specifies that the HTML file swrap.html
in the current
directory should be used as a wrapper for the output of all searches
on this directory.
To specify a wrapper for searches on an individual file use the file
directive "Searchwrapper=
".
Subdirs
-- Specify
subdirectories for searching and recursive use of wndex
.
When you run the wndex
utility with the
-r
option (for
recursive), it must know in which subdirectories it should descend
to create a new index.cache
database file. Likewise
when the server does a title, keyword or user defined field search it
recursively descends the data hierarchy and must know for each
directory which subdirectories are part of the hierarchy.
The maintainer provides this information in a line like:
Subdirs=subdir1,subdir2,subdir3
in the directory directives giving a comma separated list of
subdirectories of the directory containing the current index.wn
file.
There are two special forms of the "Subdirs=
" directive.
Using:
Subdirs=<index>
is equivalent to having a "Subdirs=
" directive whose value
is a list of all subdirectories which contain a file named
"index.wn
" (or the name specified with the -i
option to wndex
).
Using:
Subdirs=<all>
is equivalent to having a "Subdirs=
" directive whose value
is a list of all subdirectories.
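The effect of the two special forms can be pictured with a short Python sketch that expands a "Subdirs=" value into an explicit list. It is only an illustration of the rule described above. The function name is invented, and details such as how wndex treats symbolic links or hidden directories are not covered here (this sketch follows links and includes hidden directories).

```python
import os


def expand_subdirs(directory, value, index_name="index.wn"):
    """Expand a Subdirs= value into a sorted list of subdirectory names."""
    if value not in ("<all>", "<index>"):
        # Ordinary form: a comma separated list of names.
        return [name.strip() for name in value.split(",") if name.strip()]
    names = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue
            has_index = os.path.isfile(os.path.join(entry.path, index_name))
            if value == "<index>" and not has_index:
                continue
            names.append(entry.name)
    return sorted(names)
```

The index_name parameter plays the role of the name given with the -i option to wndex.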
A collection of lines in the index.wn
file containing
information about a single file in the directory of the index.wn
file is called a file
record. A new file record begins with a line starting with "File=
" and ends with the start of a new file
record. Each line in a record begins with a file directive. Here is the
complete list:
Attributes
-- Set file
attributes.
Currently several attributes are possible, including
imagemap
, nosearch
, parse
,
noparse
, post_only
, nopost
,
dynamic
, nondynamic
, cachable
,
no-keepalive
, non-cachable
, put
,
and cgi
.
Multiple values, separated by commas can be put on a single
"Attributes=
" line, as in
"Attributes=parse,dynamic,nosearch
". Letter case is not
significant in the attribute value. Also "Attribute=
"
(without the 's
') is synonymous with
"Attributes=
".
See also the directory "Attributes=
" directive.
Attributes=cachable
causes the server not to send the
"Pragma: no-cache
" and
"Cache-control: no-cache
" headers when it
otherwise might. For example these headers are sent by default
for CGI output. If you want the browser
"back" button to return users to a CGI generated page after
they have followed a link you may need
"Attributes=cachable
" since otherwise the browser
may not even cache the page in memory. (See also "Attributes=non-cachable
".)
Attributes=cgi
indicates that the standard CGI
environment variables should be set up before processing this
request. This may be useful if there is a "Filter=
" directive for this
document or if the document has an "Includes=
" which is the output of
a program. In these cases the filter program or include program
can access the CGI environment variables. This line is not
necessary if the document it refers to is actually a CGI program since in that case this attribute
is automatically set. If the document is not actually a CGI program then the environment variable PATH_INFO
will
always be empty. This is because the server always interprets a
request without a ".cgi
" suffix or a
"cgi-bin
" directory in it as the longest possible
sequence of directories and a terminating file, i.e. a request
without PATH_INFO
.
Attributes=dynamic
indicates that the document may change each time it is sent.
This causes the server not to send headers with a content length
or a last modified date. It also will cause the server to ignore
any "If-Modified-Since
" date sent by the client and
always resend the document. It is not necessary to set
Attributes=dynamic
for CGI
programs as it is set by default for them. If you do not
wish this done for a CGI program then use
the directive "Attributes=nondynamic
".
Attributes=imagemap
Indicates that the file is an imagemap used to support clickable images.
Attributes=MD5
Indicates that wndex
should
calculate an MD5
digest or checksum for this file and store it in the
index.cache
file for use as in a
"Content-MD5
" header for this document. If the
document is subsequently modified you must re-run wndex
to
recalculate this digest value. If this is not done and the
document is newer than the calculated MD5 digest, the server will
omit the "Content-MD5
" header and log an error.
Attributes=no-keepalive
indicates that the server should immediately close the connection after fulfilling a request for this document and not honor a request from the client to use a persistent connection.
If this is used as a default attribute and you wish to allow a persistent connection for a specific document then use the line:
Attributes=keepalive
which will override the default.
Attributes=non-cachable
indicates that the server should send the
"Pragma: no-cache
" and
"Cache-control: no-cache
" headers attempting
to encourage clients and proxies not to cache this document. It
is not necessary to set this for CGI
programs or any document requiring authentication as it is
set by default for them. If you wish to allow the output of a
CGI program or authenticated document to
be cached then use the line:
Attributes=cachable
which will override this default. This may be necessary if you want the browser "back" button to return users to this document after they have followed a link, since otherwise the browser may not even cache the page in memory.
Attributes=noget
indicates that the file referenced by this directive may not be
accessed with the GET
method. This might be used
with a CGI program designed only to be accessed via
the POST
method.
Attributes=nondynamic
overrides the default CGI setting of "dynamic". If this is done
the "Last-Modified
" date header of the document
will be that of the program.
Attributes=noparse
indicates that the file referenced by this directive should not be parsed for server includes. This is used to override a default attributes setting to parse all documents. Also this might be done to improve efficiency when, for example, a document has a wrapper but nothing is included in it. Since it has a wrapper parsing will be turned on by default, but it is not necessary since nothing is actually included.
Attributes=nopost
indicates that the file referenced by this directive may not be
accessed with the POST
method. If the item referenced is an ordinary file this
directive is assumed and need not be set. For CGI programs, if this is set and an attempt
to POST
to the object is made by a client an error will be returned.
Attributes=nosearch
indicates that the file referenced by this directive should not be searched when the server does a context or grep search of the current directory.
Attributes=parse
indicates that the file referenced by this directive should be
parsed for conditional text or server-side includes. This line
is not necessary if there is also a "Wrappers=
" line or an
"Includes=
"
line since in that case the parse attribute is assumed. If you
do not wish a document to be parsed when it otherwise would be
the "Attribute=noparse
" can
be used.
Attributes=post_only
indicates that the file referenced by this directive may
only be accessed with the POST
method. If the item referenced is a CGI
program and an attempt is made to access it with the GET
method an error will be returned. This directive may be useful for files
which are filtered or "include" an executed program. In that
case the POSTed data will be placed in a
temporary file. The name of the temporary file can be found by
using "Attributes=cgi
" which will
cause the name to be placed in the environment variable WN_POST_FILE
.
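A filter or include program could pick up the posted data roughly as in the following Python sketch. It is an illustration only. It assumes "Attributes=cgi" is set so that WN_POST_FILE is present in the environment, and the helper name is hypothetical.

```python
import os


def read_posted_data(environ=None):
    """Return the POSTed bytes, or None if WN_POST_FILE is not set."""
    environ = os.environ if environ is None else environ
    path = environ.get("WN_POST_FILE")
    if not path:
        return None
    with open(path, "rb") as handle:
        return handle.read()
```

The data is read as bytes because the sketch makes no assumption about the encoding of the posted form.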
Attributes=put
indicates that the file referenced by this directive may be
accessed with the PUT
method. It must be handled by
your program. The PUT
data will be in placed in a
temporary file. The name of the temporary file can be found by
using "Attributes=cgi
" which will
cause the name to be placed in the environment variable WN_PUT_FILE
.
CGI-Handler
-- Specify the
CGI program with which a file is to be
processed.
The line:
CGI-Handler=/path/to/bar
causes the program "/path/to/bar
" to be run and
its output to be served in place of the document requested.
This is a way to designate a CGI
program to handle a file somewhat like a filter. The name of the program need
not be in the URL since it is in the index.wn
file.
So when http://host/path2/foo.html
is requested
this will cause the handler, say /path/to/bar
, to be
run with the CGI environment variable PATH_INFO
set to /path2/foo.html
. In normal use the
program /path/to/bar
will do something to the file
foo.html
and serve the output. It is useful if
you want a number of files in a directory to be handled by
the same CGI program. Note the file
foo.html
need not be used in any way by the
program, but it must exist or else the server will treat it
as a non-existent file.
There is no reason for the CGI-handler program
bar
to be in the served data hierarchy and it
is good security practice to have it be elsewhere in a place
where it cannot itself be served. This prevents its
execution with the CGI environment variable PATH_INFO
having been set by the client. It should never be located
in a cgi-bin
directory. If the handler name begins
with a '/
' the name is considered as a path
relative to the system root directory. If it begins with
'~/
' as in ~/dir/foo
it is assumed
to be relative to the WN root directory. Otherwise
it is assumed to be a path relative to the directory
containing the index.wn
file.
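The three path rules above (a leading '/', a leading '~/', or neither) can be sketched as follows. This is an illustrative Python sketch of the rule as described, not WN's own code; the function name resolve_program_path and the example directories are hypothetical.

```python
import posixpath

def resolve_program_path(name, wn_root, index_dir):
    """Resolve a handler name the way the text above describes.

    wn_root   -- the WN root directory (hypothetical example value below)
    index_dir -- the directory containing the index.wn file
    """
    if name.startswith("~/"):
        # Relative to the WN root directory.
        return posixpath.join(wn_root, name[2:])
    if name.startswith("/"):
        # Relative to the system root, i.e. already an absolute path.
        return name
    # Otherwise relative to the directory holding index.wn.
    return posixpath.join(index_dir, name)
```

For example, with a WN root of /usr/local/wn, the name "~/dir/foo" would resolve to /usr/local/wn/dir/foo. The same rule is stated for the Filter, Includes, List-Includes and Wrappers directives.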
Charset
-- Specify the character set to be added to the MIME type
for this document (which should have content type text/*
).
The line:
Charset=iso-8859-1
specifies that this document, which should have a MIME
type of text/*
, should have a character set parameter
with this value appended to its content type.
E.g. a document of type text/html will be sent with a content type header
Content-type: text/html; charset=iso-8859-1
If no value is specified for this and the
Default-Charset directive is not used then the value
iso-8859-1
is used. Note that if a
Content-type file directive is also used, it must occur AFTER
the Charset directive.
Content-encoding
--
Specify the content encoding for a file.
The line:
Content-encoding=x-gzip
specifies "x-gzip
" as the content encoding for the file
described by this record. Only two types of content encoding are
supported by common browsers. They are "x-gzip
" and
"x-compress
". They indicate that the file has been
compressed with the GNU gzip(1)
utility or the UNIX compress(1)
utility. The file is then sent by the server in the compressed
format and will be decompressed automatically by the browser, if it
supports this functionality.
In many cases it is unnecessary to specify this explicitly as the
wndex
program will
automatically assign the content-encoding x-gzip
to
a file whose name ends with ".gz
" and the
content-encoding x-compress
to a file whose name ends in
".Z
". Supplying the value "none
" for the
"Content-encoding=
" directive will prevent the server from making
this automatic assignment.
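The automatic assignment by suffix can be pictured with a small Python sketch. It only mirrors the two suffix rules stated above; the function name guess_content_encoding is hypothetical and the real logic lives inside wndex.

```python
def guess_content_encoding(filename):
    """Return the content encoding implied by the file name suffix, if any."""
    if filename.endswith(".gz"):
        return "x-gzip"
    if filename.endswith(".Z"):
        return "x-compress"
    # No recognised suffix: no content encoding is assigned.
    return None
```

Note that the comparison is case sensitive, so "report.Z" and "report.z" are treated differently in this sketch.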
Content-type
-- Specify
the MIME
content type for a file.
The line:
Content-type=audio/basic
specifies "audio/basic
" as the MIME
type for the file described by this record. In many cases this is
unnecessary as the wndex
program will
automatically assign the MIME
type if the file name ends in a suffix listed in the file
lib/mime.types
with a corresponding type. If this line
is supplied it will override the default value of the content type
determined by the suffix.
The mime.types
file should be installed in a known
location. The default location is in the WN
src
hierarchy, but this can be changed by specifying a
different value when the configure
program is
run or by editing the value of "#define MIME_TYPES_FILE
"
in config.h
. The
mime.types
file exists so that you can add to it if you
wish to add new kinds of documents to your server. The format of the
file is explained in the file. A default version of the file is in
lib/mime.types
. The internal defaults are the same as
what is currently in this file. The mime.types
file is
read whenever wndex
is run so wndex
always knows the latest additions. This file is also read by
wnsd
(but not wnd
) on startup for use with
directories with the "Attributes=serveall
".
The wnsd
stand-alone server reads this file when it is
started or restarted, but only takes note of new suffixes and their
mime types
. You cannot change the mime type
corresponding to one of the standard suffixes (as listed in the
default mime.types
file). To do that you need to change
the server source and recompile.
Expires
-- Specify the
expiration date of a document or file.
The line:
Expires=Mon, 01 Sep 1997 14:11:01 GMT
specifies the date and time at which a document expires. Current practice
is to use the format specified by RFC
1123 and illustrated above, rather than the older RFC 850 format. In particular, GMT should be used.
More information about HTTP date formats can be found in RFC
1123. For HTML documents this information is automatically
extracted from the document by wndex
. This requires a
"<meta>
"
line in the head of the HTML document like:
<meta http-equiv="Expires" content="Mon, 10 Oct 1994 14:11:01 GMT">
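Because the date must be in GMT and in exactly this layout, it can be convenient to generate it rather than type it. The following Python sketch uses the standard library to produce an "Expires=" line; the function name expires_directive is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def expires_directive(when):
    """Build an Expires= line from a timezone-aware datetime."""
    in_gmt = when.astimezone(timezone.utc)
    # usegmt=True yields the "GMT" suffix instead of "+0000".
    return "Expires=" + format_datetime(in_gmt, usegmt=True)
```

Calling it with 1 September 1997, 14:11:01 UTC reproduces the directive line shown at the start of this entry.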
If the "Expires=
" directive is also supplied in the index.wn
file it will
override the expiration date in the document. See also the "Max-age=
" file directive.
Field#n
-- Specify a user
supplied field associated with a file.
The line:
Field3=string
specifies "string
" as user supplied field 3
associated with the current document. These are used for field searches. The digit
3
can be replaced with any other single digit allowing
a total of 10 user supplied fields.
File
-- File name.
The line:
File=foo
begins a new file record for the file foo
. It
indicates that permission is granted for this file to be
served. Other file directive lines will apply to
this file until a new file record or text segment is started or the
end of the index.wn
file is reached. The presence of this line causes an entry for this
file to be written in the index.cache
file created by
wndex
.
Filter
-- Specify the filter
with which a file is to be postprocessed.
The line:
Filter=/dir/foo
causes the contents of the file whose record contains this line to be
used as the UNIX stdin(3)
stream of the program foo
and the UNIX stdout(3)
stream of that program to be sent to the client instead of the file
itself. A common use of this is to specify a decompressing program
like the UNIX zcat(1)
utility as the filter so that a compressed version of a file can be
stored on disk and then be decompressed on the fly before being sent
to the client. Another example would be
"Filter=/usr/bin/nroff -man
" which would use the
UNIX nroff(1)
utility to convert a man(1)
page to an ASCII text document on the fly.
If a listed file name begins with a '/
' the name is
considered as a path relative to the system root directory. If it
begins with '~/
' as in '~/dir/foo
' it is
assumed to be relative to the WN root directory. Otherwise
it is assumed to be a path relative to the directory containing the
index.wn
file.
Header
-- Add a line to the HTTP/1.1 header for this
document.
The line:
Header=[some legal HTTP header]
causes the line "[some legal HTTP header]
" to be added
to the HTTP/1.1 header
for this item. This directive can be used multiple times to add
multiple lines to the header.
Note: Don't do this unless you know what you are doing!
HTTP-Status
-- Return a
given HTTP/1.1 status
value.
The line:
HTTP-Status=404 Not Found
causes the response line of the HTTP/1.1 header to be
"HTTP/1.1 404 Not Found
". This is primarily
of use when redirecting requests for non-existent files to an error
message which should be returned with status 404
so
robots understand.
Note: Don't do this unless you know what you are doing!
Includes
-- Specify the files
to be included in a text document.
The line:
Includes=file1,file2,file3
causes the file whose record contains this line to be parsed for
lines like "<!-- #include -->
". When
such a marker is found one of the files listed with the
"Includes=
" file directive is inserted. Subsequent
occurrences of the marker cause the inclusion of subsequent files in
the order in which they occur in this directive.
If a listed file name begins with a '/
' the name is
considered as a path relative to the system root directory. If it
begins with '~/
' as in "~/dir/foo
" it is
assumed to be relative to the WN root directory. Otherwise
it is assumed to be a path relative to the directory containing the
index.wn
file. See
the section of the user guide on includes and
wrappers for more information.
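The way successive markers consume the listed files in order can be sketched in Python. This is only a model of the behaviour described above, working on strings rather than files; the name expand_includes is hypothetical, and leaving surplus markers untouched is an assumption, since the text does not say what happens when markers outnumber files.

```python
import re

# Matches "<!-- #include -->" and also "<!-- #include foo.txt -->",
# where the file name acts only as a comment.
INCLUDE_MARKER = re.compile(r"<!--\s*#include\b.*?-->")

def expand_includes(text, include_contents):
    """Replace each marker with the next entry of include_contents, in order."""
    remaining = iter(include_contents)

    def substitute(match):
        # Assumption: a marker with no file left is passed through unchanged.
        return next(remaining, match.group(0))

    return INCLUDE_MARKER.sub(substitute, text)
```

With "Includes=file1,file2", the first marker in the document would receive the contents of file1 and the second the contents of file2.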
Keywords
-- Specify the
keywords associated with a document or file.
The line:
Keywords=pink, elephant, HTTP
specifies a list of keywords associated with the current document.
These are used for keyword
searches. For HTML documents the keywords are automatically
extracted from the document by wndex
. This requires
a <meta>
line in the head of the HTML document like:
<meta http-equiv="Keywords" content="pink, elephant, HTTP">
If the "Keywords=
" file directive is also supplied in
the index.wn
file it
will override the keywords in the document.
List-Includes
-- Specify
files which may be included in a text document.
The line:
List-Includes=file1,file2,file3
causes the file whose record contains this line to be parsed for
lines like
'<!-- #include "file2" -->
'. When
such a marker is found the contents of file2
are
inserted. The order of the files listed in the directive is not
significant. Note that the example above grants permission for the
inclusion of the three files listed. It does not require their
insertion.
If a listed file name begins with a '/
' the name is
considered as a path relative to the system root directory. If it
begins with '~/
' as in "~/dir/foo
" it is
assumed to be relative to the WN root directory. Otherwise
it is assumed to be a path relative to the directory containing the
index.wn
file. See
the section of the user guide on includes and
wrappers for more information.
Max-Age
-- Specify the HTTP/1.1
Cache-Control
and Expires
headers for an
entry.
The line:
Max-Age=10 days
specifies that an HTTP/1.1
Cache-Control
header should be sent to expire the
document in the specified time. If no "Expires=
" file directive has been set
elsewhere in the index.wn
file or in the
file itself, if it is an HTML file, then the HTTP/1.1
Expires
header will also be sent with a value equal to
the current time plus the time period of the HTTP/1.1
Max-Age
header. The time period in the
"Max-Age=
" file directive can be specified in units of
seconds, minutes, hours, days or weeks, but more than one unit (as in
2 weeks and 3 days) is not allowed.
The line:
Max-Age=10 days after last-mod
specifies that an HTTP/1.1
Cache-Control
header and the Expires
header
(if none is set elsewhere) should be set to expire the document in
the specified amount of time after the last-modified
date of the document. Negative time values for the
Cache-Control
header will be ignored, but
Expires
headers with dates in the past will be used.
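To make the single-unit rule concrete, here is a Python sketch that converts a simple "Max-Age=" value into seconds. It is an illustration only: the name max_age_seconds is hypothetical, the "after last-mod" form is not handled, and accepting singular unit names is an assumption.

```python
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
    "week": 7 * 24 * 60 * 60,
}

def max_age_seconds(value):
    """Convert a value such as "10 days" to a number of seconds."""
    parts = value.split()
    if len(parts) != 2:
        # More than one unit, as in "2 weeks and 3 days", is not allowed.
        raise ValueError("expected '<number> <unit>': %r" % value)
    count = int(parts[0])
    unit = parts[1].lower().rstrip("s")
    if unit not in UNIT_SECONDS:
        raise ValueError("unknown unit: %r" % parts[1])
    return count * UNIT_SECONDS[unit]
```

For "10 days" this gives 864000, which is the kind of value a max-age setting in a Cache-Control header carries.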
Nomatchsub
-- Set
substitute file for searches on this file which result in no matches.
The line:
Nomatchsub=foo.html
specifies that the HTML file foo.html
in the current
directory should be used for the output of all searches on this file which return no matches.
It can only be used in conjunction with the "Searchwrapper=
" file directive.
See also "Nomatchsub=
"
for directories.
Redirect
-- Send an HTTP/1.1 redirect to a new
URL.
The lines:
File=foo
Redirect=http://host/path/bar
cause a request for foo
to be answered with an HTTP/1.1 redirect response.
The client will then automatically request the new URL. The file
foo
need not exist.
The redirection always sends an HTTP/1.1
"301 Moved Permanently
" status header followed
by a "Location:
" header whose value is
"http://host/path/bar
". This means that the value of a
"Redirect=
" file directive should always be a complete
URL, starting with "http://
" or "ftp://
"
etc. The one exception is that you may use
"Redirect=<null>
". This causes the server to send
a status 204
"no response" which tells the client to do
nothing and leave the display alone. The page won't be reloaded and
won't change.
Refresh
-- Set a "Refresh"
header for use with "client-pull".
The line:
Refresh=60
adds an HTTP/1.1 header at the beginning of the transmission of this document. If the client receiving this header supports "client-pull" (currently only Netscape browsers support this) then it will automatically reload the document after 60 seconds. This is useful for documents that are updated very frequently, a stock ticker, for example. If the directive:
Refresh=30; URL=http://host/path/foo
is used then after 30 seconds the URL
http://host/path/foo
is loaded. This can be used to
create an automatic slide show. The Refresh
header is
not part of the HTTP/1.1
standard and hence may evolve. If it does this directive will be
subject to change!
Searchwrapper
-- Set
wrapper file for searches on this file.
The line:
Searchwrapper=swrap.html
specifies that the HTML file swrap.html
in the current
directory should be used as a search wrapper for the output
of all searches on this file.
To specify a wrapper for all searches on a directory use the
directory directive "Searchwrapper=
".
Set-Cookie
-- Set a "Cookie"
header value.
The lines:
Set-Cookie=name1=opaque1
Set-Cookie=name=xxx; Expires=Wed, 19 Jan 2000 08:49:37 GMT
add an HTTP/1.1 header
at the beginning of the transmission of this document. If the
client receiving this header supports cookie caching (currently only
Netscape
browsers support this) then it will save the name=value
pairs and include them in the request headers when documents in the
same directory or sub-directories are accessed. The server will put
the name=value pairs in the CGI environment variable HTTP_COOKIE
for
access by CGI programs. This is useful for
"shopping basket" type applications.
If the value of this directive begins with an '!', as in
Set-Cookie=!my_cookie_script
then it will be interpreted as the name of a program to be run to generate the value of the cookie. The program should not generate the "Set-Cookie:" part of this header, just the value. As elsewhere if the program name starts with '/' it will be taken relative to system root; if it begins with '~' it will be taken relative to the document root; and otherwise it will be assumed to be in the directory where this directive is found. The standard CGI environment variables will be available to the program.
Normally the client will discard the cookie at the end of a session.
However, if an Expires
parameter like the one above is
provided the cookie will be saved between sessions and only
discarded when it expires.
More information about the HTTP/1.1
Set-Cookie
header is available at http://home.netscape.com/newsref/std/cookie_spec.html.
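A program named in a "Set-Cookie=!..." directive only has to print the cookie value. The Python sketch below is one hypothetical example of such a script: the cookie name "visitor" is invented for illustration, and it relies on REMOTE_HOST being among the standard CGI environment variables made available to the program.

```python
import os

def cookie_value(environ=os.environ):
    """Return only the value part of the header, without "Set-Cookie:"."""
    host = environ.get("REMOTE_HOST", "unknown")
    return "visitor=" + host

if __name__ == "__main__":
    # The server is described as using the program's output as the value.
    print(cookie_value())
```

An "; Expires=..." parameter could be appended to the returned string if the cookie should survive between sessions.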
Title
-- Specify the title of a
document or file.
The line:
Title=This is the title
specifies the text "This is the title
" as the title of
the file. If the file is an HTML document this is not necessary as
wndex
will attempt
to read the title from the document itself. If this line is
supplied anyway it will override the title in the document. If this
line is not supplied and the file is not an HTML document the
default title "File <filename>
" is used.
Wrappers
-- Specify the files
to be included in a text document.
The line:
Wrappers=file1
causes "file1
" to be parsed for lines like
"<!-- #include -->
". When such a
marker is found the file whose record contains this line is inserted
and the combined document is sent to the client. It is possible to
list multiple files in this directive. The semantics of this are
explained in the section of the user guide on server-side includes and wrappers.
If a listed file name begins with a '/
' the name is
considered as a path relative to the system root directory. If it
begins with '~/
' as in "~/dir/foo
" it is
assumed to be relative to the WN root directory. Otherwise
it is assumed to be a path relative to the directory containing the
index.wn
file. See
the section of the user guide on includes and
wrappers for more information.
This is a list of all parsing instructions recognized by WN
while parsing an HTML document. Note that only documents with MIME type
text/html
can be parsed for the purposes described here.
All parsing instructions use one of the two equivalent forms:
<!-- #something -->
or:
<?WN something>
There is a maximum allowed size of 2K bytes for the entire
"<!-- #something -->
" expression. Current
versions of WN no longer require this expression to be on a line
by itself.
The second form is considered more SGML/XML friendly by many as
"<?WN something >
" indicates a processing
instruction specific to WN rather than a comment. For
historical reasons this manual describes the other form, but either may
be used. With the first form the '#
' is required but with
the other you may use either:
<?WN #something>
or:
<?WN something>
Also "<?wn #something>
" is fine. The case of the
WN
is not significant.
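The equivalence of the two forms can be illustrated with a short Python sketch that extracts the instruction text from either one. It is a rough model, not the server's parser: the name find_instructions is hypothetical, the 2K size limit is not enforced, and a '>' inside a quoted expression would confuse the second pattern.

```python
import re

INSTRUCTION = re.compile(
    r"<!--\s*#(?P<comment>.*?)\s*-->"    # <!-- #something -->
    r"|<\?wn\s+#?(?P<pi>.*?)\s*>",       # <?WN something> or <?wn #something>
    re.IGNORECASE | re.DOTALL,
)

def find_instructions(text):
    """Return the instruction text of every marker found, in document order."""
    return [m.group("comment") or m.group("pi")
            for m in INSTRUCTION.finditer(text)]
```

Both "<!-- #title -->" and "<?wn #title>" yield the same instruction text, "title", in this sketch.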
#if
and
#elif
This section describes the use of conditionally included text of the form:
<!-- #if some_condition -->
Some conditional text goes here.
<!-- #elif another_condition -->
Some other conditional text goes here.
<!-- #else -->
Alternate text.
<!-- #endif -->
Which will insert the first conditional text only if
some_condition
is satisfied. The
"<!-- #elif another_condition -->
" and
"<!-- #else -->
" are optional. There may be
multiple "#elif
" lines.
In all the examples below the use of the equal-tilde string
'=~
' to indicate a matching regular expression can be
replaced with the two characters '!~
' in which case the if
clause will be true when the regular expression fails to match.
Also in the examples of the form
"<!-- #if accept file="foo" -->
"
the file foo
is assumed to be relative to the current
directory unless it begins with a '/
' in which case it is
taken relative to the WN data hierarchy root. The format of
these files is a list of grep(1)
like regular expressions,
one per line with any white space being taken as part of the expression.
Lines beginning with '#
' are taken to be comments. If a
regular expression is preceded with the character '!
' then
that character is skipped but the truth value of any matches with the
expression is reversed.
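The file format just described can be modelled with the following Python sketch. Several points are assumptions and not taken from WN: Python's re module stands in for grep(1)-style expressions, empty lines are skipped, and the condition is taken to hold as soon as any line yields a true result. The name file_condition is hypothetical.

```python
import re

def file_condition(value, file_text):
    """Test value against a list of regular expressions, one per line."""
    for line in file_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line (assumed ignorable) or comment
        negate = line.startswith("!")
        # White space is kept, since it counts as part of the expression.
        pattern = line[1:] if negate else line
        matched = re.search(pattern, value) is not None
        if matched != negate:
            return True
    return False
```

A file containing the single line "!\.edu$" would, in this model, make the condition true for every value that does not end in ".edu".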
The regular expressions recognized by the WN server are the same
as those of the UNIX grep(1)
utility (though this utility is not used as the server has its own
regular expression functions). The more general regular expressions used
for example in the UNIX egrep(1)
utility are not supported by WN.
The condition in the "#if
" or "#elif
" tags can
be made more complex than those described above by combining simple
conditions using the logical operations '&&
' for
'and
', '||
' for 'or
' and
'!
' for 'not
'. Parentheses may be used for
grouping. For example:
<!-- #if cond_1 && cond_2 -->
Text to show if cond_1 and cond_2 are satisfied.
<!-- #endif -->
Other examples are:
<!-- #if cond_1 || cond_2 -->
<!-- #if !cond_1 -->
<!-- #if (cond_1 || cond_2) && !cond_3 -->
The '&&
' and '||
' operations have equal
precedence and associate from right to left.
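Equal precedence with right-to-left association differs from the C convention, so "a && b || c" groups as "a && (b || c)". The Python sketch below evaluates a condition under that rule, given a table of truth values for the simple conditions. It is a model for checking one's own expressions, not WN's code; the names tokenize and evaluate are hypothetical, and letting '!' bind to the single operand that follows it is an assumption.

```python
import re

TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|\w+)")

def tokenize(condition):
    """Split a condition into operators, parentheses and condition names."""
    tokens, pos = [], 0
    condition = condition.strip()
    while pos < len(condition):
        match = TOKEN.match(condition, pos)
        if match is None:
            raise ValueError("unexpected text at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def evaluate(condition, truth):
    """Evaluate with '&&' and '||' at equal precedence, right to left."""
    tokens = tokenize(condition)
    pos = 0

    def operand():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "!":
            return not operand()
        if token == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        return truth[token]

    def expression():
        nonlocal pos
        left = operand()
        if pos < len(tokens) and tokens[pos] in ("&&", "||"):
            operator = tokens[pos]
            pos += 1
            right = expression()  # everything to the right groups first
            return (left and right) if operator == "&&" else (left or right)
        return left

    return expression()
```

With cond_1 false and cond_2 and cond_3 true, "cond_1 && cond_2 || cond_3" evaluates to false here, whereas "(cond_1 && cond_2) || cond_3" evaluates to true. When in doubt, explicit parentheses avoid the question.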
#if
and
#elif
Conditions
#if accept
-- Match
client's Accept
headers
The lines:
<!-- #if accept =~ "regexp" -->
or:
<!-- #if accept file = "foo" -->
specify that this text segment should be served if (in the first
case) the UNIX grep(1)
utility like regular expression "regexp
" matches any of
the Accept
headers supplied by the client. Or for the
second line if the file "foo
" contains a regular
expression matching any of the Accept
headers.
#if accept_charset
-- Match
client's Accept-Charset
headers
The lines:
<!-- #if accept_charset =~ "regexp" -->
or:
<!-- #if accept_charset file = "foo" -->
specify that this text segment should be served if (in the first
case) the UNIX grep(1)
utility like regular expression "regexp
" matches any of
the Accept-Charset
headers supplied by the client. Or for the
second line if the file "foo
" contains a regular
expression matching any of the Accept-Charset
headers.
#if accept_encoding
-- Match
client's Accept-Encoding
headers
The lines:
<!-- #if accept_encoding =~ "regexp" -->
or:
<!-- #if accept_encoding file = "foo" -->
specify that this text segment should be served if (in the first
case) the UNIX grep(1)
utility like regular expression "regexp
" matches any of
the Accept-Encoding
headers supplied by the client. Or for the
second line if the file "foo
" contains a regular
expression matching any of the Accept-Encoding
headers.
#if accept_language
-- Match client's Accept-Language
headers
The lines:
<!-- #if accept_language =~ "regexp" -->
or:
<!-- #if accept_language file = "foo" -->
specify that this text segment should be served if (in the first
case) the UNIX grep(1)
utility like regular expression "regexp
" matches any of
the Accept-Language
headers supplied by the client. Or
for the second line if the file "foo
" contains a regular
expression matching any of the Accept-Language
headers.
#if after
and
#if before
-- Select text based on date
The lines:
<!-- #if after "date" -->
or:
<!-- #if before "date" -->
specify that this text segment should be served if the current time is after (or before) the specified date. That is, the line:
<!-- #if after "22 Oct 1996 17:41:26" -->
will cause the text segment to be served only after
"22 Oct 1996 17:41:26
" local time. The
date format is rather rigid. It must be in precisely the format
shown above (specified by RFC
1123) and with a single space between each field. Only local
time of the server is supported.
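The date comparison can be pictured with a short Python sketch using the same layout. It is an illustration only: the name is_after is hypothetical, and Python's strptime is somewhat more forgiving about spacing and leading zeros than the rigid format the server requires.

```python
from datetime import datetime

DATE_FORMAT = "%d %b %Y %H:%M:%S"   # e.g. "22 Oct 1996 17:41:26"

def is_after(date_text, now=None):
    """Return True if the current local time is later than date_text."""
    limit = datetime.strptime(date_text, DATE_FORMAT)
    current = now if now is not None else datetime.now()
    return current > limit
```

The "#if before" case is the same comparison with the inequality reversed.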
#if cookie
--
Match client's Cookie
headers
The lines:
<!-- #if cookie =~ "regexp" -->
or:
<!-- #if cookie file= "foo" -->
specify that this text segment should be served if the UNIX
grep(1)
utility like regular expression regexp
matches any of
the Cookie
headers supplied by the client.
More information about the proposed HTTP Set-Cookie
header is available at http://home.netscape.com/newsref/std/cookie_spec.html.
#if environ VAR
--
Match server's environment variable VAR
The lines:
<!-- #if environ VAR =~ "regexp" -->
or:
<!-- #if environ VAR; file= "foo" -->
specify that this text segment should be served if the UNIX
grep(1)
utility like regular expression regexp
matches any of
the contents of the server's environment variable VAR
.
#if field
--
Match document's user defined field
The lines:
<!-- #if field3 =~ "regexp" -->
or:
<!-- #if field3 file= "foo" -->
specify that this text segment should be served if the UNIX grep(1)
utility like regular expression "regexp
" matches the
contents of the user defined field number 3
(in the first case) or if the file "foo
" contains a
matching regular expression (in the second case). Any valid field
number may be used in place of 3.
#if hostname
-- Match client's hostname
The lines:
<!-- #if hostname =~ "regexp" -->
or:
<!-- #if hostname file= "foo" -->
specify that this text segment should be served if the UNIX grep(1)
utility like regular expression "regexp
" matches the
hostname of the client (in the first case) or if the file
"foo
" contains a matching regular expression (in the
second case). For an alternate method of doing this see the "#if accessfile
" syntax
described below.
Be aware that the character '.
' (dot) has a special
meaning in regular expressions and must be escaped with a
'\
' to have its usual meaning.
#if host_header
-- Match server's virtual hostname from client's HTTP Host header
The line:
<!-- #if host_header =~ "regexp" -->
specifies that this text segment should be served if the UNIX grep(1)
utility like regular expression "regexp
" matches the
contents of the HTTP "Host:
" header supplied by the
client in its request.
Be aware that the character '.
' (dot) has a special
meaning in regular expressions and must be escaped with a
'\
' to have its usual meaning.
#if IP
-- Match
client's IP address
The lines:
<!-- #if IP =~ "regexp" -->
or:
<!-- #if IP file= "foo" -->
specify that this text segment should be served if the UNIX grep(1)
utility like regular expression "regexp
" matches the IP
address of the client (in the first case) or if the file
"foo
" contains a matching regular expression (in the
second case). For an alternate method of doing this see the "#if accessfile
" syntax
described below.
Be aware that the character '.
' (dot) has a special
meaning in regular expressions and must be escaped with a
'\
' to have its usual meaning.
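The effect of an unescaped dot is easy to demonstrate. The Python sketch below uses Python's re module as a stand-in for the server's grep(1)-style expressions (the two agree on the meaning of '.' and '\.'); the helper name literal_prefix_pattern is hypothetical.

```python
import re

def literal_prefix_pattern(prefix):
    """Build a pattern matching values that start with prefix, taken literally."""
    return "^" + re.escape(prefix)

# Unescaped, each dot matches any character, so this unintended value matches.
loose = re.search("192.168.1.", "192x168y1z7") is not None

# Escaped, only real dots match.
strict = re.search(literal_prefix_pattern("192.168.1."), "192x168y1z7") is not None
```

Here loose is true and strict is false, while the address 192.168.1.7 matches both patterns. The same caution applies to hostnames and to the Host header.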
#if language
-- Match client's Accept-Language
headers
The lines:
<!-- #if language =~ "regexp" -->
or:
<!-- #if language file = "foo" -->
specify that this text segment should be served if (in the first
case) the UNIX grep(1)
utility like regular expression "regexp
" matches any of
the Accept-Language
headers supplied by the client. Or
for the second line if the file "foo
" contains a regular
expression matching any of the Accept-Language
headers.
These forms are deprecated. The preferred form is to use
#if accept_language
in place of #if language.
#if query
--
Match query string supplied in request URL
The lines:
<!-- #if query =~ "regexp" -->
or:
<!-- #if query file = "foo" -->
specify that this text segment should be served if the UNIX
grep(1)
utility like regular expression "regexp
" matches the
query string supplied by the client in the URL (in the first case) or
if the file "foo
" contains a matching regular expression
(in the second case).
#if referer
--
Match client supplied Referer:
header
The lines:
<!-- #if referer =~ "regexp" -->
or:
<!-- #if referer file = "foo" -->
specify that this text segment should be served if the UNIX grep(1)
utility like regular expression "regexp
" matches the
contents of the Referer:
header supplied by the client (in the first case)
or if the file "foo
" contains a matching regular
expression (in the second case). The Referer:
header
contains the URL of the document containing the link accessed to
obtain the current document.
#if request
--
Match client's request
The lines:
<!-- #if request =~ "regexp" -->
or:
<!-- #if request file = "foo" -->
specify that this text segment should be served if the UNIX grep(1)
utility like regular expression "regexp
" matches the
contents of the full text of the request supplied by the client (in
the first case) or if the file "foo
" contains a matching
regular expression (in the second case). The full request contains
the "method" (GET
or POST
) followed by the
URL requested with the "http://host
" part having been
removed (by the client).
#if TE
-- Match
client's TE
header
The lines:
<!-- #if TE =~ "regexp" -->
or:
<!-- #if TE file = "foo" -->
specify that this text segment should be served if (in the first
case) the UNIX grep(1)
utility like regular expression "regexp
" matches
the TE
header supplied by the client. Or for the
second line if the file "foo
" contains a regular
expression matching the TE
header. The TE
header lists the transfer encodings the client is willing to accept.
#if true
and
#if false
-- Include or exclude text segment
The line:
<!-- #if false -->
specifies that the corresponding text segment should not be served.
It may be useful for "commenting out" a part of a document which is
under construction. The "#if true
" construct is
present for logical completeness.
#if UA
-- Match
client's User-Agent:
header
The lines:
<!-- #if UA =~ "regexp" -->
or:
<!-- #if UA file = "foo" -->
specify that this text segment should be served if the UNIX
grep(1)
utility like regular expression "regexp
" matches the
User-Agent:
header supplied by the client (in the first
case) or if the file "foo
" contains a matching regular
expression (in the second case).
The normal access control files used by WN to limit access to a directory can also be used to conditionally permit or deny access to text segments.
#if accessfile="filename"
-- Check access control file
The line:
<!-- #if accessfile="/dir/accessfile" -->
specifies that the file /dir/accessfile
is to be used to
determine access privileges (by hostname or IP address) for this text
segment. The path /dir/accessfile
is relative to the
server root directory. If this path does not begin with a
'/
' then the path is relative to the directory
containing the file with this text. See the chapter "Limiting Access to Your WN Hierarchy"
in this guide.
#include
-- Insert the
contents of a file
The line:
<!-- #include -->
specifies that the contents of the next file listed in the includes or wrappers should be inserted at this point. It is permissible to add the name of this file, as in:
<!-- #include foo.txt -->
but this acts only as a comment. The actual file inserted depends
only on the "Includes=
" and "Wrappers=
" directives in
the index.wn
file (or
more precisely the index.cache
file created from it).
#section
-- Insert part of
the contents of a file
The line:
<!-- #section -->
specifies that part of the contents of the next file listed in the includes or wrappers should be inserted at this point. It is permissible to add the name of this file, as in:
<!-- #section foo.txt -->
but this acts only as a comment. The actual file inserted depends
only on the "Includes=
" and "Wrappers=
" directives in
the index.wn
file (or
more precisely the index.cache
file created from it).
The part of the file actually included is that portion of the
document between the special comments
"<!-- #start -->
" and
"<!-- #end -->
" inserted in that
document. This requires that these starting and ending comments
occur in the HTML document on lines by themselves. For more
information see the section "More on
Including: the section
Marker" in this guide.
#start
and #end
--
Mark the beginning and end of text to be included
The lines:
<!-- #start -->
and:
<!-- #end -->
mark the beginning and end of the portion of the text to be inserted
from an include or wrapper in response to encountering
"<!-- #section -->
" in the text of a
document being parsed. There can be more than one
"#start
/#end
" pair in a document. For more
information see the section "More on
Including: the section
Marker" in this guide.
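The selection of text between the markers can be sketched in Python as follows. This is a model only: the name extract_sections is hypothetical, and concatenating the text of every "#start/#end" pair is an assumption, since the text above does not spell out how multiple pairs are combined.

```python
START = "<!-- #start -->"
END = "<!-- #end -->"

def extract_sections(text):
    """Return the lines lying between #start and #end marker lines."""
    kept, inside = [], False
    for line in text.splitlines(keepends=True):
        marker = line.strip()
        if marker == START:
            inside = True       # marker lines themselves are not copied
        elif marker == END:
            inside = False
        elif inside:
            kept.append(line)
    return "".join(kept)
```

The markers are only recognised here when they stand on a line by themselves, in keeping with the requirement stated for "#section".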
#title
, #query
, and
#field
-- Insert the title, current search string, or
a user defined field
The lines:
<!-- #title -->
<!-- #query -->
or:
<!-- #field 3 -->
in a parsed document instruct the server to include the title of the
current document, the current search term from the client or the
value of user defined "field #3
" for the current
document. All of these markers must occur on a line by themselves.
For more information see the section "Including Title, Query, Fields and
Environment Variables" in this guide.
#environ
-- Insert the
contents of an environment variable
The line:
<!-- #environ = "WHATEVER" -->
in a parsed document instructs the server to include the contents of
the environment variable WHATEVER
. Remember to use an
"Attributes=parse
"
line when using this construct and to use an "Attributes=cgi
"
when it is a CGI variable like HTTP_REFERER
which is to be included.
#redirect
-- Redirect to a
different URL
The line:
<!-- #redirect = "url" -->
specifies that if no text has yet been sent the server should send an HTTP redirect to the given URL. This might be used as follows. If the text:
<!-- #if hostname =~ "\.uk$" -->
<!-- #redirect = "UK_mirror_url" -->
<!-- #endif -->
is included at the beginning of an HTML document then any request
from a uk
host will automatically be redirected to the
specified URL, the UK_mirror_url
in this case. This
mechanism could also be used to redirect text only browsers to a text
only alternative page, etc.
There must be no text sent before the
'<!-- #redirect = "url" -->
'
is encountered (not even blank lines) since the server cannot send an
HTTP redirect while in the middle of transmitting a document. Thus
the example above would be an error if there are any blank lines
before the "#if hostname
" line or any blank lines
after it before the "#redirect
" line. When such an
error occurs it is logged in the error
file and the "#redirect
" line is ignored.
Note however that:
<!-- #if hostname =~ "\.uk$" -->
[Lots of text here]
<!-- #else -->
<!-- #redirect = "some_URL" -->
<!-- #endif -->
is correct since when the #redirect
line is encountered
no text has been sent.
Normally the URL in the
"<!-- #redirect = "URL" -->
"
line is fully qualified, like "http://host/path/foo
".
However, it can also be simply "foo
" referring to a file
in the same directory as the file being parsed. In this case an HTTP
redirection is not sent, and instead the file "foo
" is
returned immediately to the client.
This is a list of environment variables set by WN. These
variables are set automatically when a CGI program is called. They will
also be set for use by filters or server-side includes which execute files if an
"Attributes=cgi
" line
is added to the file record of a document in the index.wn
file.
This is a list of standard CGI/1.1 environment variables set by WN.
AUTH_TYPE
Contains the type of authentication being used to limit access to the current document. For example, "Basic" or "Digest".
CONTENT_LENGTH
Contains the length of information provided by a client POST. The length is an ASCII string representing the number of bytes.
CONTENT_TYPE
Contains the content type supplied by the client with POST data.
GATEWAY_INTERFACE
Contains "CGI/1.1".
HTTP_ACCEPT
Contains the contents of any "Accept:" headers supplied by the client.
HTTP_ACCEPT_CHARSET
Contains the contents of any "Accept-Charset:" headers supplied by the client.
HTTP_ACCEPT_LANGUAGE
Contains the contents of any "Accept-Language:" headers supplied by the client.
HTTP_COOKIE
Contains the contents of any "Cookie:" header supplied by the client.
HTTP_FROM
Contains the contents of the "From:" header supplied by the client. This may contain the e-mail address of the client user. This is generally unreliable, as users usually choose not to supply this information. If they do give it they can choose any e-mail address they want -- there is no guarantee that this is, in fact, the real e-mail address of the client user.
HTTP_HOST
Contains the contents of the "Host:" header supplied by the client. This should contain one of the aliases for the host on which the server is running. It should be the hostname from the URL that the client is requesting. Thus a client seeking "http://www.serverhost.com:8000/foo.html" should supply "www.serverhost.com" in this header. Many browsers do not do this. It is required in HTTP/1.1.
HTTP_RANGE
Contains the contents of any "Range:" header supplied by the client.
HTTP_REFERER
Contains the contents of the "Referer:" header supplied by the client. This is supposed to contain the URL of the document from which a link to this program was activated.
HTTP_USER_AGENT
Contains the contents of the "User-Agent:" header supplied by the client. This is supposed to contain the name of the browser which the client is using to make this request.
PATH_INFO
Extra information in the "path" of a CGI URL. More precisely, everything after the actual name of the program. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "/stuff1/stuff2/stuff3" in the PATH_INFO variable.
PATH_TRANSLATED
This assumes that PATH_INFO is the path of a file relative to your WN root and provides the path relative to the system root. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "wnroot/stuff1/stuff2/stuff3" in the PATH_TRANSLATED variable.
QUERY_STRING
The contents of the "query" part of an HTTP/1.1 GET request. This is whatever comes after the '?' in the URL. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3?dingbat" would have "dingbat" in the QUERY_STRING variable. If the request was generated by an HTML "<form>" this contains all the form information encoded as "name1=value1&name2=value2...".
REMOTE_ADDR
Contains the IP address of the client or proxy making the request.
REMOTE_HOST
Contains the hostname of the client or proxy making the request, if the server can determine it. Otherwise it contains the IP address. If "#define NO_DNS_HOSTNAMES" is defined in config.h before compilation then no attempt will be made to determine the hostname and REMOTE_HOST will contain the IP address. (This would reduce the load on the server.)
REMOTE_IDENT
Contains the user name supplied by an RFC 931 identd(8) server on the client (if there is one and you have compiled the server with the "#define RFC931_TIMEOUT" macro enabled in the config.h file).
REQUEST_METHOD
Contains either "GET", "POST" or "PUT", depending on the method of the client request.
REMOTE_USER
If the CGI program is password protected this will contain the user name provided by the client.
SCRIPT_FILENAME
The name of the CGI program being executed and its path relative to the system root. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "wnroot/dir/foo.cgi" in the SCRIPT_FILENAME variable.
SCRIPT_NAME
The name of the CGI program being executed and its path relative to the WN root data directory. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "/dir/foo.cgi" in the SCRIPT_NAME variable.
SERVER_NAME
Contains the name of the "virtual host" currently being accessed, if virtual hosting is in use. Otherwise it contains the name of the host on which the server is running. The value of this variable will be identical to the value of HTTP_HOST if the client supports the "Host:" header. Otherwise it will be determined by the IP address to which the request was made and the corresponding entry in the virtual hosts table.
SERVER_PORT
Contains the port on which the server is running.
SERVER_PROTOCOL
Contains "HTTP/0.9", "HTTP/1.0" or "HTTP/1.1" depending on which protocol the client is using.
SERVER_SOFTWARE
Contains "WN <version>". For example, "WN 2.0.0". Recall that version numbers of the form "a.b.c" with 'b' even are intended to be stable releases while 'b' odd indicates a release with new and less tested features.
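The path-related variables above can be worked through for the example URL with a short Python sketch. It is an illustration only and not how WN computes them: the function name "cgi_variables" is hypothetical, the root directory "/usr/local/wn" stands in for the "wnroot" of the examples, and the sketch assumes the program name ends at the first ".cgi" in the path.

```python
from urllib.parse import urlsplit, parse_qs

WNROOT = "/usr/local/wn"   # assumed value for "wnroot" in the examples above

def cgi_variables(url: str, wnroot: str = WNROOT) -> dict:
    """Derive the path-related CGI variables for a URL naming a .cgi program."""
    parts = urlsplit(url)
    # Assumption of this sketch: the program name runs up to and
    # including the first ".cgi"; the rest of the path is PATH_INFO.
    end = parts.path.index(".cgi") + len(".cgi")
    script_name, path_info = parts.path[:end], parts.path[end:]
    return {
        "SCRIPT_NAME": script_name,
        "SCRIPT_FILENAME": wnroot + script_name,
        "PATH_INFO": path_info,
        "PATH_TRANSLATED": wnroot + path_info,
        "QUERY_STRING": parts.query,
    }

env = cgi_variables("http://host/dir/foo.cgi/stuff1/stuff2/stuff3?dingbat")

# Form data arrives in QUERY_STRING as name=value pairs joined by '&'.
form = parse_qs("name1=value1&name2=value2")
```

Here env["PATH_INFO"] is "/stuff1/stuff2/stuff3" and env["SCRIPT_NAME"] is "/dir/foo.cgi", matching the examples in the list, and form maps each name to a list of its values.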
The following are not standard CGI/1.1 environment variables but are provided by the WN server when CGI variables are set:
DOCUMENT_ROOT
Contains the path to your WN root data directory relative to the system root. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "wnroot" in the DOCUMENT_ROOT variable.
HTTP_POST_FILE
Contains the path to the temporary file containing the data sent by the client via the POST method. This file is removed after the CGI program terminates. A CGI program can move it or copy it.
HTTP_PUT_FILE
Contains the path to the temporary file containing the data sent by the client via the PUT method. This file is removed after the CGI program terminates. A CGI program can move it or copy it.
HTTP_VIA
Contains the contents of any "Via" header supplied by the client.
HTTP_X_FORWARDED_FOR
Contains the contents of any "X-Forwarded-For" header supplied by the client.
URL_SCHEME
Contains "http" normally or "https" in case the server has been modified to use the Secure Sockets Layer (SSL) protocol.
WN_DIR_PATH
Contains the path to the directory containing the program relative to the system root. For example, the URL "http://host/dir/foo.cgi/stuff1/stuff2/stuff3" would have "wnroot/dir" in the WN_DIR_PATH variable.
WN_ROOT
Deprecated. See DOCUMENT_ROOT.
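As an illustration of HTTP_POST_FILE, here is a minimal Python sketch of how a CGI program might pick up POSTed form data from the temporary file. The helper names "read_post_data" and "read_post_form" are hypothetical, and the sketch assumes the file holds the request body exactly as the client sent it.

```python
import os
from urllib.parse import parse_qs

def read_post_data(environ=os.environ) -> bytes:
    """Return the POST body from the temporary file named by HTTP_POST_FILE."""
    path = environ.get("HTTP_POST_FILE")
    if environ.get("REQUEST_METHOD") != "POST" or not path:
        return b""
    with open(path, "rb") as fh:
        return fh.read()

def read_post_form(environ=os.environ) -> dict:
    """Decode the body as form data when CONTENT_TYPE says it is URL-encoded."""
    content_type = environ.get("CONTENT_TYPE", "")
    if not content_type.startswith("application/x-www-form-urlencoded"):
        return {}
    return parse_qs(read_post_data(environ).decode("ascii"))
```

Since the server removes the temporary file after the program terminates, a program that needs the data later should copy or move the file first.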
One further environment variable is used by WN in conjunction with external databases.
WN_KEY
WN_KEY is used in conjunction with external databases and either "Cache-Module=" or "File-Module=" directory directives. For more details on these see the chapter "WN Auxiliary Modules" in this guide.
This document is for the ordinary user, not the system "webmaster". It is intended as a short description of how to use the WN server to set up your home page (and related documents if you wish). It is not a primer on writing HTML. There are many of those -- some are listed below. Instead it deals with what to do with a home page document after it is created to put it "on the Web". We assume that your system administrator has already set up the WN server in such a way that users are permitted to have home pages in their own directories. If that isn't the case you will have to persuade your local administrator to do so or read the full "User's Guide for the WN Server" to learn how to do it yourself.
But let's be optimistic and suppose everything is set up and working properly. If you haven't created a home page here is a bare bones one you can copy to expand on later:
<!doctype html public "-//W3C//DTD HTML 3.2 Final//EN">
<head>
<title>John Doe's Home Page</title>
</head>
<body>
<h1>John Doe's Home Page</h1>
<p>
This is the story of my life...
</p>
</body>
Your home page and any other files you want to serve must be located in a particular subdirectory of your home directory (your home directory is where you are when you first login) on the UNIX system which is running the WN server. Usually this subdirectory is called public_html, but your system webmaster may have chosen something different. We'll assume the name is public_html.
It probably doesn't yet exist in your home directory. If not create it with the command:
mkdir public_html<Enter>
The public_html directory also must have the correct "permissions" so that the server can look inside it. For security reasons the WN server isn't very powerful and it can only access files and directories which are not heavily protected. Just to make sure the server won't have trouble accessing your home page run the command:
chmod 755 public_html<Enter>
You must place a copy of your home page in this directory and give it the name index.html. One way to do this is to enter the public_html directory and use an editor to create the home page file there with the name index.html. Alternatively, you could create it elsewhere and copy it to the public_html directory. If you need to rename it the proper UNIX command is:
mv oldname newname<Enter>
Of course the server must also be able to read your home page in order to serve it. If documents you create are automatically protected from viewing by others you will have to change that for your home page. The command:
chmod 644 index.html<Enter>
gives anyone (including the server) permission to read but not change your home page.
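If the numbers 755 and 644 look mysterious, the following Python sketch spells out what they mean. It is only an aid to understanding; the helper name "readable_by_others" is hypothetical and nothing here is required by WN.

```python
import os
import stat

# 755 on a directory: the owner may read, write and enter it;
# everyone else may read and enter it but not change it.
print(stat.filemode(stat.S_IFDIR | 0o755))   # drwxr-xr-x

# 644 on a file: the owner may read and write it;
# everyone else may only read it.
print(stat.filemode(stat.S_IFREG | 0o644))   # -rw-r--r--

def readable_by_others(path: str) -> bool:
    """True if users other than the owner and group may read the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)
```

Calling readable_by_others("index.html") in your public_html directory is a quick way to check that the chmod command above took effect.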
There is one more step before your home page is available to the world. The WN server is very security conscious. It is very careful to refuse to serve documents which users did not intend to be served. This means that in addition to serving only documents in designated directories (such as your "public_html") it will only serve documents which have been explicitly designated for serving in some way.
Here are two different ways to grant the server permission to serve your home page. You only need to do one of these.
The "lazy" way: create a file in the public_html directory named index.wn which contains the line:
Attribute=serveall
Then execute the command:
wndex<Enter>
which will create the file "index.cache". It grants the server permission to serve anything in this directory. If you put documents to be served in subdirectories of public_html, copy the file index.wn to them also and rerun wndex in these subdirectories to grant permission to serve anything from them (actually the file "index.wn" itself is excluded from being served as are files beginning with the '.' character).
The "better" way: create a file in the public_html directory named index.wn which contains the line:
File=index.html
If you have other files you want to serve add additional lines like "File=foo.html" where foo.html is the name of the file you want to serve. Then execute the command:
wndex<Enter>
Files which are HTML files should have names which end with the suffix ".html". Files which are ASCII text files should end with ".txt" and PNG image files should end with ".png". Executing the "wndex" (pronounced "windex") command will create the file "index.cache" as before, but now the server will only have permission to serve those files whose names are listed explicitly in your index.wn file.
If you put documents to be served in subdirectories of public_html, you must create an index.wn file in each of them also which lists the files which can be served.
The "better way", in addition to providing better security, also enables many of the advanced features of the WN server, such as searching capabilities. In the long run if you serve multiple documents you will probably want to use it. The trade off is that the "lazy" way is easier since it only needs to be done once, while the "better" way requires that you edit the index.wn file and rerun wndex each time you add a new file to public_html to be served.
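If editing index.wn by hand each time becomes tedious, the file can be generated. The Python sketch below is one possible approach and not a WN tool: the function name "write_index_wn" is hypothetical, only the three suffixes mentioned above are included, and the blank line written after each "File=" line is an assumption of this sketch about how records are separated. It overwrites any existing index.wn, and you still need to rerun wndex afterwards.

```python
from pathlib import Path

SERVED_SUFFIXES = {".html", ".txt", ".png"}   # the suffixes mentioned above

def write_index_wn(directory: str) -> list:
    """Write index.wn with one File= line per servable file; return the names."""
    folder = Path(directory)
    names = sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file()
        and p.suffix in SERVED_SUFFIXES
        and not p.name.startswith(".")
    )
    # One record per file, each followed by a blank line (an assumption).
    text = "".join(f"File={name}\n\n" for name in names)
    (folder / "index.wn").write_text(text)
    return names
```

A call such as write_index_wn("public_html") lists every matching file in that directory, so check the result before running wndex if some files should stay private.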
So once your home page is functional, how do you access it? Well, your home page address (called a URL which stands for "Uniform Resource Locator") is made up from the name of your host and your user name. For example, if your user name is jdoe and the computer on which the server is running is named www.school.edu then the URL for your home page is probably:
http://www.school.edu/~jdoe/
The "probably" refers to the fact that there are many possible variations which may have been chosen by your webmaster. For example, some people don't like the '~' character in URLs so they might use "http://www.school.edu/homepages/jdoe/". In any case there are a couple of things to notice about your URL.
First it has a synonym:
http://www.school.edu/~jdoe/index.html
This is because whenever a URL ends in '/' the server knows it is referring to a directory (in this case your public_html directory) and looks for the default file "index.html" in that directory. Thus if you want to put some files in subdirectories of public_html that is fine. But keep in mind that if you create the directory "mydir" in public_html then "http://www.school.edu/~jdoe/mydir/" will really refer to "http://www.school.edu/~jdoe/mydir/index.html" and to refer to a file "foo.png" in mydir you should use "http://www.school.edu/~jdoe/mydir/foo.png". You should not use the URL "http://www.school.edu/~jdoe/mydir" referring to a directory but with no '/' at the end.
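The trailing '/' rule can be stated as a tiny Python sketch. The function names "served_file" and "local_path" are hypothetical, as is the home directory "/home/jdoe"; the sketch also assumes the usual "/~" prefix and the public_html directory name, both of which your webmaster may have changed.

```python
def served_file(url_path: str) -> str:
    """Apply the rule above: a path ending in '/' means index.html there."""
    if url_path.endswith("/"):
        return url_path + "index.html"
    return url_path

def local_path(url_path: str, user: str = "jdoe",
               home: str = "/home/jdoe", pub: str = "public_html") -> str:
    """Map a '/~user/...' URL path to a file under the user's public_html."""
    prefix = f"/~{user}/"
    if not url_path.startswith(prefix):
        raise ValueError("not a URL under this user's home page")
    return f"{home}/{pub}/" + served_file(url_path)[len(prefix):]
```

With these assumptions local_path("/~jdoe/mydir/") gives "/home/jdoe/public_html/mydir/index.html", while "/~jdoe/mydir/foo.png" maps to the file foo.png in mydir.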
There are many advanced features you can use with the WN server. To learn to use them you will need to read parts of the manual designed for system administrators. A good place to start is the chapter "An Overview of the WN Server" in this guide. It will give you a good idea of some of the things the server can do. Then you will likely also want to consult the chapter "Creating Your WN Data Directory" in this guide. Neither of these chapters is too long.
To find out more about composing HTML documents consult "A Beginner's Guide to HTML" (at http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html) or see "Creating Web Sites" (at http://home.netscape.com/browsers/createsites/index.html). "http://www.htmlhelp.com" is an excellent site on the various web authoring standards.
These are the MD5 digests of recent releases of WN. The MD5 digesting program is available from "ftp://ftp.cert.org/pub/tools/md5/". My PGP public key is available from my home page.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
MD5 (wn-2.3.8.tar.gz) = 95c9fc262a11d165c8a682aa0eddc9fd
MD5 (wn-2.3.7.tar.gz) = 011a46ce203c35b57d56078eacb984d4
MD5 (wn-2.3.6.tar.gz) = 8e572e06acfe6a1cdb3e326916b13ed9
MD5 (wn-2.3.5.tar.gz) = 32627fc73fbf2bc86cbed45748c720b2
MD5 (wn-2.3.4.tar.gz) = 5af9cc1497563819c33c095aadc03be2
MD5 (wn-2.3.3.tar.gz) = bf81789ddf326f7df40085961d48fab2
MD5 (wn-2.3.2.tar.gz) = bccbbc6651fb148c536c0184f1735a05
MD5 (wn-2.3.1.tar.gz) = a118459246a479aa9ac2ba3823c811fd
MD5 (wn-2.3.0.tar.gz) = 46a240cb6bca73302fcf16161c17f5c4
MD5 (wn-2.2.9.tar.gz) = 4219ac1c9b6511fd239e2f5b9464c9c6
MD5 (wn-2.2.8.tar.gz) = cfe5d853501b828ab593887a3ec519d8
MD5 (wn-2.2.7.tar.gz) = d864acabb2777b89932824e02f3a733c
MD5 (wn-2.2.6.tar.gz) = 248695156c29f86dcafb4fc9183b605a
MD5 (wn-2.2.5.tar.gz) = 91d9a527256d090578d54f76efaf6cfb
MD5 (wn-2.2.4.tar.gz) = 6c43b9a231bc8d8f9f021e30817f5d26
MD5 (wn-2.2.3.tar.gz) = f6c1e3b6ec4c78c0679c441eb04c68de
MD5 (wn-2.2.2.tar.gz) = 63167df9366de648a633b35f13caf395
MD5 (wn-2.2.1.tar.gz) = a622801c8a0dcce91b91f8bde81b94f5
MD5 (wn-2.2.0.tar.gz) = 8d9081585d9b689d8c394d592daf2698
MD5 (wn-2.1.7.tar.gz) = 9348438468529e38b2f6ea313d89d4eb
MD5 (wn-2.1.6.tar.gz) = 63d9523aee8808b272ef3d49b5bab830
MD5 (wn-2.1.5.tar.gz) = 30821b8e63e053e73c924ab46f082728
MD5 (wn-2.1.4.tar.gz) = 9c3a0c39f188c4b06e1ca41385a28317
MD5 (wn-2.1.3.tar.gz) = 9dd9e00d9e1627f215c3e4b13f3b76af
MD5 (wn-2.1.2.tar.gz) = 5af0446f734dd180e2f2da66125c82fd
MD5 (wn-2.1.1.tar.gz) = 84f34b550d0026c6cfa75ce20f6c4e01
MD5 (wn-2.1.0.tar.gz) = f44adec2a03294775af1597fea0321c6
MD5 (wn-2.0.11.tar.gz) = 4a3c5cafe5b30b284d759d43fcb9c689
MD5 (wn-2.0.10.tar.gz) = 44583c7ecae9486cb180ac4b39254cd7
MD5 (wn-2.0.9.tar.gz) = 5fadecb3d64db2a54e40674c06752a13
MD5 (wn-2.0.8.tar.gz) = d0eeec9de4fdfa479bb35d3172d45c9f
MD5 (wn-2.0.7.tar.gz) = 9b4fc7ccf2139563c0de2b654de54384
MD5 (wn-2.0.6.tar.gz) = b1f9505b95f753671c276e0781f501ab *
MD5 (wn-2.0.6.tar.gz) = 1a9f1e43faaad3b7beba3509c35ed999 *
MD5 (wn-2.0.5.tar.gz) = 6d5fb3d7a04aaed2f80a30f20e71c99b
MD5 (wn-2.0.4.tar.gz) = 58ef80d1d49917e88f145cbb727bc79e
MD5 (wn-2.0.3.tar.gz) = 1d2928d67bc56a58449f8feea33a89e2
MD5 (wn-2.0.2.tar.gz) = e9c4edf7535a9465c1f4fee12fb98c85
MD5 (wn-2.0.1.tar.gz) = dfa7d8d9bf83006d04901c494e915ed1
MD5 (wn-2.0.0.tar.gz) = 2fe3889850566ea82decc55729cb1ae4
MD5 (wn-1.19.9.tar.gz) = c92e4009f59738ae521c3fc1a63c8f30
MD5 (wn-1.19.8.tar.gz) = 1544eedfbf54121879bdc51ae9b7bd68
MD5 (wn-1.19.7.tar.gz) = c3233797269691b6289eee0598f8217f
MD5 (wn-1.19.6.tar.gz) = 265453d433ca5c65e0c31ff6d9ac0f67
MD5 (wn-1.19.5.tar.gz) = 4b6b87136758624d41c2b5010a50d13e
MD5 (wn-1.19.4.tar.gz) = 5caedab5efdf43b42745b654f64e83ed
MD5 (wn-1.19.3.tar.gz) = 96c34cad1d8d18b0296dd097740ce2a7
MD5 (wn-1.19.2.tar.gz) = bc052237a751b7225b725f97fa028954
MD5 (wn-1.19.1.tar.gz) = a24d81c0f1e11c5985607c89994da0b0
MD5 (wn-1.19.0.tar.gz) = 0ac7d1bb18086b2348a36180766013c3
MD5 (wn-1.18.7.tar.gz) = 8dfacb70b044e206cebc4271bef01cdc
MD5 (wn-1.18.6.tar.gz) = 90a8352e383462ac5f68852f80fe854d
MD5 (wn-1.18.5.tar.gz) = c8a55826e453eb17b6a72e064dd01987
MD5 (wn-1.18.4.tar.gz) = f7d1807be26031063300c3591281384c
MD5 (wn-1.18.3.tar.gz) = 0f61cf0c7de4bd1d518fe18723dcf2b2
MD5 (wn-1.18.2.tar.gz) = 3e89198e22dcfd0513b26389fc3cb022
MD5 (wn-1.18.1.tar.gz) = 63f5e294cb3c297bcda621585cb0be94
MD5 (wn-1.18.0.tar.gz) = 19aa0a483d4c4d7ead4198d98b8d0e76
(*) There were two versions of 2.0.6 differing in trivial ways.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: For info see http://www.gnupg.org
iD8DBQE5NdM8muNM+ZEudwYRAiCIAJ9XK/sZ7xhs/zBO0t+48sw6v1umYQCdGvfl
ySSq/zYfVcMr3D9XJfz/lqg=
=ni83
-----END PGP SIGNATURE-----
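If the md5 program is not at hand, a digest can also be computed with Python's standard hashlib module and compared with the list above. This is a minimal sketch; the helper names "md5_hex" and "matches_published" are hypothetical.

```python
import hashlib

def md5_hex(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 digest of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so that large tar files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published: str) -> bool:
    """Compare a file's digest with a value copied from the signed list."""
    return md5_hex(path) == published.strip().lower()
```

For example, matches_published("wn-2.3.8.tar.gz", "95c9fc262a11d165c8a682aa0eddc9fd") should be true for an unmodified copy of that release.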
This section summarizes some of the important UNIX cpp(1) utility macros defined in the file config.h.dist that control the configuration of WN.
#define ACCESS_DENIED_URL
Any request for a document in a directory which is denied because of an "Accessfile=" restriction should be redirected to the given URL. This can be overridden with the "Access-denied-URL=" directive.
The default is to give this the value "" which disables the feature.
#define AUTH_DENIED_FILE
This is the document returned when password authentication fails because of an incorrect password or username.
The default is to give this the value "" which disables the feature.
#define CGI_BIN
Directory name to identify CGI programs.
The default is "cgi-bin".
#define CGI_EXT
File extension to identify CGI programs.
The default is ".cgi".
#define DEFAULT_PORT
Default port to run the stand-alone server wnsd on. This can be overridden at runtime by using the -p command line option.
The default is (80).
#define DEFAULT_CHARSET
Default character set to use in the Content-type header for text/* MIME types.
The default is iso-8859-1.
#define BUILTIN_CONTENT_TYPE
This is the MIME Content-type header value used for documents generated by the server, such as error messages or search results.
The default is "text/html; charset=iso-8859-1".
#define DEFAULT_URI
This is the document returned in response to a request with only the hostname (something like "http://hostname.edu/" with no file name at the end). You would need to change it, for example, if you wanted to have the default server response be to run a CGI program.
The default is "/index.html".
#define FORBID_CGI
Define this if you wish to forbid the use of CGI/1.1 programs on your host. This will enhance security but will, of course, decrease functionality. See the chapter "Using CGI Programs on the WN Server" in this guide.
The default is to comment out this macro.
#define GROUPID
Same as for "#define USERID" but for the group id.
#define INDEXFILE_NAME
Default file name if the requested URL does not include a file name.
The default is "index.html".
#define LEAST_UID
Minimum allowable uid for users with personal home directories.
The default is (100).
#define MIME_TYPE_FILE
The full path name of the file containing information permitting wndex to translate the MIME "Content-type" from the suffix of a file name. If this file is not present, running wndex with the -v option will issue a warning but use internal default values. The file exists so that you can add to it if you wish to add new kinds of documents to your server. The format of the file is explained in the file. The default version of the file is in /lib/mime.types. The internal defaults are the same as what is currently in this file.
The default is "/usr/local/lib/wn/mime.types".
#define NO_DNS_HOSTNAMES
If this is set to TRUE the server will not use hostnames in your log file, just IP addresses. This will reduce the load on your server (but probably not speed up responses since the lookups usually take place after the transaction is complete). Keep in mind that setting this TRUE will mean that none of your CGI programs will get the hostname and also that your access files cannot have hostnames in them, just IP addresses.
The default value is FALSE.
#define NO_SERVEALL
Set to TRUE to disable the server's ability to serve documents not listed in an index.cache file. See the chapter "Creating Your WN Data Directory" in this guide for more details.
The default value is FALSE.
#define NO_SUCH_FILE_URL
Specifies that any request in this directory for a non-existent file or a file not listed in the index.wn file of this directory should be redirected to the given URL. Can be overridden with the "No-Such-File-URL=" directory directive in the index.wn file.
The default is to give this the value "" which disables the feature.
#define PUB_HTML
Name of public directory for home directories. See the section "Alternate Hierarchies From a Table Lookup" in this guide for more details.
The default is "/public_html".
#define RFC931_TIMEOUT
Timeout in seconds for the server when attempting to do RFC 931 lookups on clients using the UNIX identd(8) system utility. This may be good for additional logging information, but should not be trusted for authentication. Comment out if no lookup is desired.
Because of bugs in many PC TCP/IP stacks that may cause identification connections to time out instead of being refused (certain firewall configurations may cause the same problem) this defaults to off, and the timeout should be kept low. Ten seconds is about as high as I recommend. Remember, this timeout will need to be fulfilled on every request from that client, so pages with lots of in-line images will become painful.
The default is to comment out this macro.
#define ROOT_DIR
Complete path of your root data directory.
The default is "/usr/local/wn".
#define SWN_PID_FILE
Full path of the file in which you wish the stand-alone server wnsd to deposit its process id on startup. Comment this out or set it to the empty string "" if you want the pid printed to the UNIX stdout(3) stream on startup instead of being placed in a file. This can be overridden at runtime by using the -q command line option.
The default is "/usr/local/wnlogs/wn.pid".
#define TILDE_TABLE
If defined to the path of a file containing name:directory pairs then URLs starting with "TILDE_USER_STRINGname/foo" will be changed to /foo and the root directory will be the directory after the ':' in the table. See the section "Alternate Hierarchies From a Table Lookup" in this guide for more details.
The default is to comment out this macro.
#define TILDE_USER_PWFILE
If defined then the user's name will be looked up in the password file and the corresponding home directory with the value of "#define PUB_HTML" appended will be the root directory. This will fail if the user's uid is less than the value of "#define LEAST_UID". See the section "Hierarchies in User Home Directories" in this guide for more details.
The default is to comment out this macro.
#define TILDE_USER_STRING
URL indicating a user's home page. You can also use something like "/homepages/" or "/people/" for this but use "/~" to make "http://hostname/~user/" work. This appears to be a de facto standard. See the section "Hierarchies in User Home Directories" in this guide for more details.
The default is "/~".
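To make the TILDE_TABLE description concrete, here is a rough Python model of the lookup. It is a sketch of the behavior described above and not WN's code: the function names are hypothetical, and details such as how blank or malformed lines in the table are treated are assumptions.

```python
TILDE_USER_STRING = "/~"   # the default value given above

def parse_tilde_table(text: str) -> dict:
    """Read name:directory pairs, one per line, into a dictionary."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if line and ":" in line:          # assumption: skip other lines
            name, directory = line.split(":", 1)
            table[name] = directory
    return table

def resolve(url_path: str, table: dict):
    """Return (root_directory, new_path), or None if no table entry applies."""
    if not url_path.startswith(TILDE_USER_STRING):
        return None
    name, _, rest = url_path[len(TILDE_USER_STRING):].partition("/")
    if name not in table:
        return None
    # "TILDE_USER_STRINGname/foo" becomes "/foo" under the listed directory.
    return table[name], "/" + rest
```

With a table containing the hypothetical line "jdoe:/home/jdoe/web", the path "/~jdoe/foo" resolves to the root directory "/home/jdoe/web" and the path "/foo".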
#define USE_VIRTUAL_HOSTS
If defined to the full path to the file of virtual hosts, wnsd will support multi-homed or virtual hosts as described in the chapter "Multi-homed or Virtual Servers on the WN Server" in this guide.
The default is to comment out this macro.
#define USERID
The user id you want wnd to switch to when running under the UNIX inetd(8) system utility or when wnsd is run by root. When running under inetd(8) it should agree with the uid you set in the UNIX inetd.conf(5) configuration file.
The default depends on your system type but is generally something like nobody.
#define VERBOSELOG
By default WN uses the "Common Log Format" used by CERN httpd and NCSA httpd. Uncommenting this will cause additional information to be added to the end of each log line. In particular the User-Agent and Referrer are logged.
Note: I have chosen to use the correct spelling of Referrer in the log file. To enable interoperability and to comply with the CGI/1.1 specification it is necessary elsewhere to use the incorrect spelling Referer and the CGI environment variable HTTP_REFERER.
This can be overridden at runtime by using the -v command line option.
The default is to comment out this macro.
#define VIRTUAL_HOSTS_FILE
Full path to virtual hosts file. See the section "Using a Virtual Hosts List File" in this guide.
The default is to comment out this macro.
#define WN_ERRLOGFILE
Same as for "#define WN_LOGFILE" except for error messages. To override use the -l option.
The default is "/usr/local/wnlogs/wn.log.err".
#define WN_HOSTNAME
If the default value of "" is used WN will get the default hostname from your system. If you supply a name it should be a fully qualified domain name. WN will attempt to use the UNIX gethostbyname(3) system call to find the IP address of your server.
The default value is "".
#define WN_LOGFILE
Full path of the file you want to use for a log file. The value set here is overridden by the -L option. To use the UNIX syslogd(8) system utility run wnsd or wnd with the -S command line option. To do no logging set this to "", the empty string.
The default is "/usr/local/wnlogs/wn.log".
#define WN_PAM_ENABLED
Uncomment this to enable PAM based authentication if your system supports PAM (Pluggable Authentication Modules). This support exists on Linux and Solaris at least.
The file docindex.html was produced by the WN utility wnindexmaker.
wnindexmaker produces a collection of HTML unordered lists with entries like:
<li> <a href="search=context?content-type">content-type</a>
This is an anchor to a relative URL. The complete URL would be:
http://host/docs/search=context?content-type
which asks the server to do a context search of the 25 files in the User's Guide with query term "content-type". When this is selected the server returns a list of all files in the directory with matches, with a sublist for each file consisting of the lines that contain a match. Selecting one of the highlighted search terms in a match line causes the server to return the document positioned at the location of the matched item. This is part of all context searches. It is done by inserting an anchor on the fly at the location of the match.
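The way the relative anchor turns into the complete URL can be checked with Python's standard urllib. This sketch is not part of wnindexmaker: the helper names are hypothetical, the base URL "http://host/docs/docindex.html" is an assumed location for the index page, and percent-encoding the term with quote() is an assumption about how unusual characters would be handled.

```python
from html import escape
from urllib.parse import quote, urljoin

def context_search_href(term: str) -> str:
    """Relative URL asking for a context search on the given term."""
    return "search=context?" + quote(term)

def index_entry(term: str) -> str:
    """One list entry in the style shown above."""
    return f'<li> <a href="{context_search_href(term)}">{escape(term)}</a>'

# Resolve the relative anchor against the (assumed) page that contains it.
complete = urljoin("http://host/docs/docindex.html",
                   context_search_href("content-type"))
print(complete)   # http://host/docs/search=context?content-type
```

The relative form is what makes the entries portable: the same docindex.html works wherever the directory of guide files is installed.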
colophon n. An inscription placed usually at the end of a book, giving facts about its publication. [< Gk. kolophon, finishing touch.] - The American Heritage dictionary
The User's Guide for the WN Server was written by John Franks who is also the author of the WN software. Editorial changes and cleanup of the HTML source for the User's Guide were contributed by Jean Pierre LeJacq. This colophon describes the approach and tools used to write and validate the HTML source code.
We had the following goals for the HTML source code:
WN emphasizes adherence to Internet standards such as HTML and HTTP. Basically, we want to practice what we preach in the documentation for WN.
The User's Guide was written using HTML 3.2 instead of the more recent HTML 4.0 since the former is currently better supported in both textual and graphical browsers.
As with any language, valid HTML can be written in a way that makes it difficult to maintain, non-portable and against the basic principles HTML embodies.
As an example of the latter, we consider HTML primarily as a language that describes the semantic structure and not the visual representation of documents. Consequently, only minor use of HTML tags was made for formatting (e.g. centering of a document's title). Even these minor formatting decisions could be eliminated through the use of Style Sheets. However, these are currently not well supported by many browsers so they were not employed.
As an example of a more minor portability issue, we check that there is no intervening white space in the start of an anchor, "<a>", tag. Browsers are inconsistent in how they display this extra white space.
One of the great strengths of HTML is the linking of widely dispersed documents to create a seamless virtual document. We have provided extensive links to relevant standards, manual pages and sites to maximize this advantage. The User's Guide itself is extensively cross-linked. We also wanted to ensure that the links were valid.
One of the unfortunate consequences of the popularity of the Internet is the low quality of many HTML documents. Since learning by example is so important, we wanted to provide WN users an example of well written HTML.
All of the editing and validation of the User's Guide was done using the Linux Operating System developed primarily by Linus Torvalds. We used the Debian GNU/Linux distribution developed by many volunteers on the Internet. Much of the editing was done as part of the packaging of WN for Debian.
All of the editing of the User's Guide was done using the XEmacs text editor. XEmacs stems from a collaboration of Lucid, Inc. with Sun Microsystems, Inc. and the University of Illinois with additional support having been provided by Amdahl Corporation, Altrasoft and a huge amount of volunteer effort.
Much of the power of XEmacs is based on the lisp packages that are either part of the core XEmacs distribution or add-ons. We cannot list all of the packages but we describe below several that are particularly important for editing HTML documents.
hm--html-menus Mode
The hm--html-menus package provides functions and various popup and pulldown menus for an HTML mode called hm--html-mode, a mode for writing HTML pages. The primary author is Heiko Muenkel. It also provides a minor mode, hm--html-minor-mode, which can be used together with another HTML major mode, like the psgml-html mode, which is exactly what we did.
psgml Mode
psgml is a major mode for editing SGML documents. psgml contains a simple SGML parser and can work with any DTD. Functions provided include menus and commands for inserting tags with only the contextually valid tags, identification of structural errors, editing of attribute values in a separate window with information about types and defaults, and structure based editing. It also provides an interface to an external SGML parser, such as SP, to rigorously validate a file. psgml is currently maintained by Lennart Staflin.
Emacs/W3 is a full-featured web browser, written entirely in Emacs-Lisp, that supports all the bells and whistles you will find in use on the web today, including frames, tables, stylesheets, and much more. We used it as a quick validation of the basic presentation of the pages.
SP is a free object-oriented toolkit for SGML parsing and entity management developed by James Clark. It includes the nsgmls program which parses and validates the SGML documents whose document entity is specified by the system identifiers sysid. The Emacs psgml mode supports invoking nsgmls and parsing its output.
We used the following sysid at the beginning of all HTML documents:
<!doctype html public "-//W3C//DTD HTML 3.2 Final//EN">
Weblint is a Perl program for syntax and minimal style checking of HTML developed by Neil Bowers. We use it in conjunction with SP to prevent the use of valid HTML that is either difficult to maintain or poorly supported by current browsers.
linbot is a Python program that allows webmasters to: view the structure of a site; track down broken links; find potentially outdated HTML pages; list links pointing to external sites; view portfolio of inline images; and do all this periodically and without user intervention. Results are displayed in a set of HTML pages.