Version 2.5.0

Parsed Text and Server Side Includes on the WN Server


The WN server can modify an HTML file as it is served. One such feature, called "server-side includes", is the ability to automatically insert one file into another. The usefulness of this capability is greatly enhanced by another feature, conditional text. A simple "if - else - endif" construct allows sections of a document to be served only to certain hosts, or to clients providing an appropriate Accept header or a desired Referer or User-Agent header. The "else" construct allows alternate text segments for clients not meeting the desired criteria.

For these features to work the server must parse the file, looking for the locations of includes or of conditionally served text. That a file should be parsed in this way is indicated in its entry in the index.wn file, with the line "Attributes=parse" in the file record. This line is not necessary if the file record lists wrappers or includes, since parsing is assumed in that case. Only documents with MIME type text/html can be parsed in this way.

6.1 Wrappers, Includes, and List-Includes

The simplest and most common use of parsed text is including additional files in one which is being served or "wrapping" a served file with a second file, i.e. inserting the requested file inside the second at a desired location. The latter is useful, for example, if you wish to place a standard message at the beginning or end (or both) of a large collection of files. All files included or used as wrappers must be listed in the index.wn file. They are not listed in the file in which they are to be included; only the location of the insertion is marked.

To ensure security, various options are available, including the requirement that a served file and all its includes and wrappers have the same owner as the index.cache file listing them. This is done with the -u option.

Another important application of wrappers is to customize the HTML documents returned listing the successful search matches. If, for example, a directory is assigned a wrapper the server assumes that it contains all text describing the search and it merely supplies an unordered list of links to the matching items.

6.2 Simple Including

Suppose your server offers the file foo.html and you wish to have the file bar.html automatically inserted at some point into foo.html. You achieve this by use of the "Includes=" directive in the index.wn file in the directory containing foo.html. That index.wn file should have an entry like:

File=foo.html
Includes=bar.html

and the file foo.html should contain either of the lines:

<!-- #include -->

or:

<?WN #include >

These two forms are equivalent as far as the server is concerned. The second form is considered more SGML friendly by many, as <?WN something> indicates a processing instruction specific to WN rather than a comment. For historical reasons this manual uses the first form, but either may be used. With the first form the '#' is required, but with the second you may use either:

<?WN #include>

or:

<?WN include>

Any of the "<!-- #something -->" lines described in this section may be replaced using the "<?WN something>" syntax. Also with this syntax the case of the WN is not significant.

The marker "<!-- #include -->" should be the only thing on its line but may have white space before it. Which file is inserted at the marker is controlled by the "Includes=" line in the index.wn file. All including of files by WN is done only for text files and only in units of lines.

If the include marker is never found in foo.html, then this file is served and the file bar.html is appended at the end. Thus the "Includes=" directive can be used to append a file without the need of any marker in the main file.

If you wanted to include two files in foo.html, say bar.html and bah.html you would place the marker "<!-- #include -->" at two places in foo.html and have an entry in your index.wn file like:

File=foo.html
Includes=bar.html,bah.html

The two files will then be included at the marked spots in the order that they are listed in the "Includes=" directive, i.e. bar.html will be inserted at the first marker and bah.html at the second. You can have as many included files as you wish. They should all be listed on the "Includes=" line separated by commas.
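The marker-replacement rules just described can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not WN's own code: the function name expand_includes and its arguments are hypothetical, only the "<!-- #include -->" spelling of the marker is recognised, and the handling of surplus files or surplus markers is an assumption that generalises the single-file rule.

```python
import re

# A line holding only the include marker, optionally preceded by white space.
# Assumption: the "<?WN include>" spelling is not handled in this sketch.
INCLUDE_MARKER = re.compile(r'^\s*<!--\s*#include\s*-->\s*$')


def expand_includes(main_lines, included_files):
    """Replace each marker line with the next file from the Includes= list.

    main_lines     -- the lines of the main document
    included_files -- one list of lines per file, in "Includes=" order
    """
    pending = list(included_files)
    output = []
    for line in main_lines:
        if pending and INCLUDE_MARKER.match(line):
            # The n-th marker receives the n-th listed file.
            output.extend(pending.pop(0))
        else:
            # Assumption: a marker with no file left is passed through as-is.
            output.append(line)
    # Assumption: files with no marker left are appended at the end,
    # as the manual describes for the single-file case.
    for leftover in pending:
        output.extend(leftover)
    return output
```

With two markers in the main document and two listed files, the first file lands at the first marker and the second at the second; with no marker at all, the listed file is simply appended.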

Sometimes it is easier to specify the order in which multiple files are to be inserted by adding the file name in the HTML document. This is also possible with WN. However, since good security requires that only files listed in the index.wn file may be served, it is still necessary to list files you want to include there.

So, another way to include two files in foo.html, say, bar and bah, is to place the markers '<!-- #include "bar" -->' and '<!-- #include "bah" -->' at the appropriate places in foo.html and have an entry in your index.wn file like:

File=foo.html
List-Includes=bar,bah

The order of the files listed in the directive is not significant. The example above grants permission for the inclusion of the two files listed. It does not require their insertion.

Note the difference between the two mechanisms: with the "Includes=" directive only "<!-- #include -->" is needed in the HTML and the file inserted is taken (in order) from the list in the directive, while with the "List-Includes=" directive the HTML must contain '<!-- #include "bah" -->' so the server knows to insert file "bah" and the file list in the directive is checked to see that permission to serve this file has been granted. If you use the "List-Includes=" directive you cannot also use either the "Includes=" or "Wrappers=" directives -- the "List-Includes=" and "Includes="/"Wrappers=" mechanisms are mutually exclusive. It is fine to use the line '<!-- #include "bar.html" -->' to insert a file with the "Includes=" directive, but with that directive the "bar.html" is only a comment and has no effect on which file is actually inserted.
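The "List-Includes=" behaviour, in which the marker names the file and the directive only grants permission, might be modelled as follows. This is a hedged sketch rather than WN's implementation: expand_named_includes and the load callback are hypothetical names, and raising an error for a name that is not listed is an assumption, since this section does not say how the server reacts in that case.

```python
import re

# A line holding only a marker of the form <!-- #include "name" -->.
NAMED_MARKER = re.compile(r'^\s*<!--\s*#include\s+"([^"]+)"\s*-->\s*$')


def expand_named_includes(main_lines, permitted, load):
    """Insert files named in the markers, if "List-Includes=" permits them.

    main_lines -- the lines of the main document
    permitted  -- names from the "List-Includes=" directive (order irrelevant)
    load       -- hypothetical callback returning the lines of a named file
    """
    allowed = set(permitted)
    output = []
    for line in main_lines:
        found = NAMED_MARKER.match(line)
        if found is None:
            output.append(line)
            continue
        name = found.group(1)
        if name not in allowed:
            # Assumption: an unlisted name is treated as an error here.
            raise PermissionError(f'"{name}" is not listed in List-Includes=')
        output.extend(load(name))
    return output
```

Because the name comes from the document, the files may be inserted in any order, or not at all, without changing the directive.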

There is a "Default-List-Includes=" directory directive which can be used to grant permission for any HTML file in the directory to include one or more of the listed files. This also causes all HTML files in the directory to acquire the "parse" attribute, as if each had the line "Attributes=parse".

If a file has a filter only that file will be filtered, not any wrappers or includes.

Instead of a file it is possible to include the output of a program. To do this the program is listed in the "Includes=" directive but its name is preceded with the '!' character. For example:

File=foo.html
Includes=!/usr/bin/date,bar.html

will insert the time and date at the first "<!-- #include -->" and the contents of bar.html at the second. "Includes=" and "List-Includes=" should not list arguments to programs. Thus if you wished to use the output of the UNIX date utility to get UTC instead of local time you should create a program named "mydate" containing, for example:

#!/bin/sh
exec /usr/bin/date -u

and use the directive "Includes=!mydate".

This ability to serve the output of programs can be restricted in several ways. If WN is invoked with the -e option then no includes, filters, or CGI programs will be executed.

The -E option in conjunction with the -t or -T options restricts the execution of programs to those listed in index.cache files owned by trusted users or groups. The -u option allows the execution of programs or inclusion of files only if they are owned by the owner of the index.cache file which lists them. If the -E and -u options are used together the -E option takes precedence.

If you wish to have all the standard CGI/1.1 environment variables made available to the executed include program you can do so by adding the line:

Attributes=cgi

to the file record. A list of these environment variables can be found in the appendix "CGI and other Environment Variables for the WN Server" in this guide. Also see the sample CGI program which is located in the file /docs/examples/sample.cgi which accompanies the WN distribution.

6.3 Conditional Text: If, Else, and Endif

Often a server maintainer may wish to serve different versions of a document to different clients. Here is a simple example of how this can be handled with WN.

<!-- #if accept =~ "image/jpeg" -->

<a href="picture.jpg">
Here is the jpeg version of the picture:
</a>

<!-- #else -->

<a href="picture.png">
Here is the png version of the picture:
</a>

<!-- #endif -->

This tells the server to look at the Accept: headers provided by the client and if "image/jpeg" is among them then use the jpeg image and otherwise use the png image. More precisely, the "image/jpeg" part of the "if" line is a regular expression in the style of the UNIX grep(1) utility, and if there is any match for it among the Accept headers the jpeg image will be used. Of course these "if - else - endif" constructs can be nested. A similar construct allows you to make the text served depend on matching a regular expression with the contents of the User-Agent header, the Referer header, the Cookie header, the client hostname, or the client IP address. A complete list of possible test clauses for the #if statement is contained in appendix "Parsed Document Syntax for the WN Server" in this guide.

Note: For this to work the file containing the conditional text needs to be parsed by the server. The server only knows to do this if the file record in the index.wn file contains a line "Attributes=parse".

If, in an "if clause", instead of the equal-tilde string '=~' (to indicate a regular expression match) the character string '!~' is used then the truth value of the match is reversed. For example the lines:

<!-- #if referer !~ "my.host.edu" -->
Here is some text for newcomers to my site.
<!-- #endif -->

would display the "Here is some text ..." to those clients accessing this document via any link which is not on the host my.host.edu.
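The way a single "=~" or "!~" test is decided can be sketched in Python. This is an illustration only: condition_holds and the request dictionary are hypothetical, Python's re module stands in for the grep(1)-style expressions WN uses (the two dialects differ in places), and treating a missing header as an empty string is an assumption.

```python
import re

# Matches lines such as: <!-- #if referer !~ "my.host.edu" -->
IF_LINE = re.compile(r'^\s*<!--\s*#if\s+(\w+)\s*(=~|!~)\s*"([^"]*)"\s*-->\s*$')


def condition_holds(if_line, request):
    """Decide a simple #if test against a dict of request values.

    request -- hypothetical mapping such as {"accept": "...", "referer": "..."}
    """
    found = IF_LINE.match(if_line)
    if found is None:
        raise ValueError("not a simple #if line: " + if_line)
    field, operator, pattern = found.groups()
    # Assumption: a header the client did not send is treated as empty.
    value = request.get(field, "")
    matched = re.search(pattern, value) is not None
    # "!~" reverses the truth value of the match.
    return matched if operator == "=~" else not matched
```

For the referer example above, a request arriving from a page on my.host.edu makes the test false, so the text for newcomers is omitted.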

There is also a redirect command which can conditionally cause the server to send an HTTP/1.1 redirect to a new URL. For example if the text:

<!-- #if hostname =~ "\.uk$" -->
<!-- #redirect = "UK_mirror_url" -->
<!-- #endif -->

is included at the beginning of an HTML document then any request from a UK host will automatically be redirected to the specified URL, the UK_mirror_url in this case. This mechanism could also be used to redirect text only browsers to a text only alternative page, etc. There must be no text sent before the '<!-- #redirect = "url" -->' is encountered (not even blank lines) since the server cannot send an HTTP/1.1 redirect while in the middle of transmitting a document.

Normally the URL in the '<!-- #redirect = "URL" -->' line is fully qualified, like "http://host/path/foo". However, it can also be simply "foo" referring to a file in the same directory as the file being parsed. In this case an HTTP/1.1 redirection is not sent, and instead the file "foo" is returned immediately to the client.
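The distinction between the two kinds of redirect target can be pictured with a small Python sketch. The function name redirect_action and its return labels are hypothetical, and the test used here (a URL counts as fully qualified when it has both a scheme and a host) is an assumption about how the two cases might be told apart, not a description of WN's code.

```python
from urllib.parse import urlsplit


def redirect_action(target):
    """Classify the target of a '<!-- #redirect = "..." -->' line."""
    parts = urlsplit(target)
    if parts.scheme and parts.netloc:
        # Fully qualified, e.g. "http://host/path/foo": send an HTTP redirect.
        return "http-redirect"
    # A bare name such as "foo": return that file from the same directory.
    return "serve-local-file"
```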

Often a single regular expression is not adequate to distinguish whether or not to serve some text. For that reason WN allows you to use a file containing any number of regular expressions and serve a document based on whether any of these expressions match accept headers, referer header, user-agent, hostname, etc. For example, if the file "acceptfile" contains the lines:

image/png
image/jpeg
image/x-xbitmap

then the following conditional text might be appropriate:

<!-- #if accept file = "acceptfile" -->

I see you aren't using a text only browser...
<!-- #endif -->

The format of the file "acceptfile" is one regular expression per line, in the style of the UNIX grep(1) utility. Lines beginning with '#' are taken to be comments. If a regular expression is preceded by the character '!' then that character is skipped but the truth value of the match is reversed. More information about files of regular expressions for conditional text can be found in appendix "Parsed Document Syntax for the WN Server" in this guide.
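The file format just described is simple enough to read with a few lines of Python. The sketch below only parses the file and tests one line at a time; how WN combines the results of several lines is covered in the appendix and is deliberately not modelled here. The names parse_regex_file and entry_holds are hypothetical, skipping blank lines is an assumption, and Python's re module only approximates grep(1)-style expressions.

```python
import re


def parse_regex_file(text):
    """Return (pattern, negated) pairs, one per regular expression line."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Comment lines begin with '#'; blank lines are skipped (assumption).
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        pattern = line[1:] if negated else line
        entries.append((pattern, negated))
    return entries


def entry_holds(entry, value):
    """Test one parsed line against a value such as an Accept header."""
    pattern, negated = entry
    matched = re.search(pattern, value) is not None
    # A leading '!' reverses the truth value of the match.
    return not matched if negated else matched
```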

6.4 Conditional Text and Access Control Files

Another method of using conditional text is with a normal WN access control file. For example:

<!-- #if accessfile="secret/access" -->

Here is a link to a restricted directory.

<a href="/secret/stuff.html">Restricted Local Stuff</a>

Hosts not listed in the file "secret/access" can't look at it, so why show them a link to it?

<!-- #endif -->

will display the included text and the link to "Restricted Local Stuff" only to clients on hosts permitted by the access control file "secret/access".

6.5 Logical Operations in If Conditions

The condition in the "if" or "elif" tags can be made more complex than those described above by combining simple conditions using the logical operations '&&' for 'and', '||' for 'or' and '!' for 'not'. Parentheses may be used for grouping. For example:

<!-- #if hostname !~ "\.uk$" && accessfile="secret/access" -->

Something for those sites in accessfile but not in the UK.

<!-- #endif -->

This is described in greater detail in the section "Matching Regular Expressions: #if and #elif" in this guide.

6.6 More on Including: the section Marker

Often you may want to include HTML files in a larger document. For example, a single logical HTML document often consists of a number of separate files. This can make it very difficult for a client to download and print the entire document. To alleviate this problem you could form a single document concatenating all the pieces, but that can be problematic when the document is altered or updated as it is necessary to change both the piece and the concatenation.

WN offers a solution to this by allowing you to write a small skeleton document which includes all the pieces, forming a "virtual" document which is the concatenation. But if we just used the mechanism above the concatenation document would include the <head> elements of all the pieces. To remedy this you use the marker:

<!-- #section -->

instead of "<!-- #include -->" and the server will include only the portion of the HTML document between the special comments "<!-- #start -->" and "<!-- #end -->" inserted in that document. This requires that these starting and ending comments occur in the HTML document on lines by themselves.
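The effect of the section marker on an included file can be sketched as follows. This is a minimal Python illustration, not WN's implementation: extract_section is a hypothetical name, only the first "<!-- #start -->" / "<!-- #end -->" pair is considered, and returning nothing when no start comment is present is an assumption.

```python
START = "<!-- #start -->"
END = "<!-- #end -->"


def extract_section(lines):
    """Return the lines strictly between the start and end comments."""
    inside = False
    section = []
    for line in lines:
        stripped = line.strip()
        if not inside:
            # Everything before the start comment (e.g. the <head>) is dropped.
            if stripped == START:
                inside = True
        elif stripped == END:
            # Everything after the end comment is dropped as well.
            break
        else:
            section.append(line)
    return section
```

Applied to each piece of a multi-file document, this keeps only the body text, so the skeleton document supplies the single <head> element.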

6.7 Wrapping Files

Suppose you have a large number of files and want a standard header or footer placed on all of them (perhaps a standard disclaimer). You could, of course, make many copies of the standard header with a different "Includes=" directive for each, but this is cumbersome. It has other drawbacks as well, such as the fact that grep and context searches do not search associated include files, only the main file.

To deal with this case WN uses wrappers. Wrapping a file is the inverse of including it. If you have an index.wn file entry like:

File=foo.html
Wrappers=bar.html

then the server will send the file bar.html looking for the marker "<!-- #include -->" and inserting foo.html at the line where it is found. So this is just like the "Includes=" directive except the role of which file is included in the other is reversed. If the include marker is never found the entire wrapper, bar.html, is sent first and the main file, foo.html, is appended at the end. Thus the "Wrappers=" directive can be used to prepend a file without the need of any marker in the wrapper.

6.8 Search Wrappers

A search wrapper is a special kind of wrapper. Its function is to allow you to customize the results of user searches. It is normally associated with an entire directory in which case it applies to all title, keyword, field, grep and context searches of that directory. See the chapter "Setting Up Searches on the WN Server" in this guide.

A line like:

SearchWrapper=foo.html

in the directory record of an index.wn file will cause any search of that directory to return an unordered list of matches wrapped with the file foo.html. The list of matches will be inserted into foo.html at a point where the marker "<!-- #include -->" is found. You can also insert the user supplied search term by using the marker "<!-- #query -->". Both of these markers must occur on a line by themselves.

6.9 Nested Including and Wrapping

Wrapping and including can be arbitrarily nested (though this is not true of search wrappers). It can be a little complicated to get the effect you want.

Here is how to do it. Think of all your files, wrappers, includes and the main file arranged as you wish them to be combined for the final served document. Now imagine inserting an opening (or left) parenthesis at the beginning of each file and a closing (or right) parenthesis at the end of each file. You should have a legally nested and balanced collection of parentheses. To each of the opening parentheses attach the name of the file which begins at that point. Then write down the list of all the file names in the order their corresponding opening parentheses occur. All the files which come before the main file should be wrappers and should be listed in the "Wrappers=" line in the order in which they occur in this list. All the files after the main file should be in the "Includes=" line and should occur in the order they occur in this list.

Here's a simple example. Suppose we have a main file M and other files A, B, C, and D which we want to have nested like:

(D...(B...B)...(M...(A...(C...C)...A)...M)...D)

Then the entry in the index.wn file should look like:

File=M
Wrappers=D,B
Includes=A,C
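The parenthesis rule can be checked mechanically. In the Python sketch below the desired nesting is written as a tree of (name, children) pairs, which is a representation chosen for this illustration only; the function names are likewise hypothetical. Listing the names in the order their opening parentheses occur is a pre-order walk of that tree, and the main file splits the list into the two directives.

```python
def opening_order(node):
    """List file names in the order their opening parentheses occur."""
    name, children = node
    order = [name]
    for child in children:
        order.extend(opening_order(child))
    return order


def directives_for(tree, main):
    """Split the opening order at the main file into wrappers and includes."""
    order = opening_order(tree)
    split = order.index(main)
    return order[:split], order[split + 1:]


# The nesting (D...(B...B)...(M...(A...(C...C)...A)...M)...D) from the example.
nesting = ("D", [("B", []), ("M", [("A", [("C", [])])])])
wrappers, includes = directives_for(nesting, "M")
print("Wrappers=" + ",".join(wrappers))
print("Includes=" + ",".join(includes))
```

For the example nesting this reproduces the "Wrappers=" and "Includes=" lines shown in the entry above.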

6.10 Including Title, Query, Fields and Environment Variables

Often it is useful to insert items like the main document's title into a wrapper, or the search item provided by the client into a searchwrapper for the responses. This is possible with WN parsed documents using the lines:

<!-- #title -->

or:

<!-- #query -->

in the file at the point where the title or search term should be inserted. Both of these markers must occur on a line by themselves.

In addition to the title and query, user supplied fields can be inserted in documents. This is done by including a marker such as:

<!-- #field3 -->

in the file at the point where the value should be inserted.

You can also insert the value of any environment variable into your text with a line like:

<!-- #environ = "HTTP_REFERER" -->

which will be replaced by the contents of the environment variable HTTP_REFERER.

Note: If there is no "Wrappers=" or "Includes=" line in the index.wn file for this entry then there must be a line like:

Attributes=parse

so the server knows it is to parse the file to look for the marker. See the section "Set File Attributes" in this guide for more details. Also if you wish to insert the value of a CGI/1.1 environment variable (as in the example above) you must have an "Attributes=cgi" line.

6.11 Meta Information

Strictly speaking this feature has nothing to do with parsed documents, but it fits here as well as anywhere. If a URL has the parameter meta appended (with a semi-colon separator), as in

http://host/path/foo;meta

for example, then it will return meta data about the resource rather than the resource itself. Here is the meta data returned by this document when the meta parameter is used.


WN version 2.5.0
Copyright © 1998-2005 John Franks <john@math.northwestern.edu>
licensed under the GNU Free Documentation License
Last modified: Sat June 18 2005