
Version 2.5.0

Parsed Document Syntax for the WN Server


This is a list of all parsing instructions recognized by WN while parsing an HTML document. Note that only documents with MIME type text/html can be parsed for the purposes described here. All parsing instructions use one of the two equivalent forms:

<!-- #something -->

or:

<?WN something>

There is a maximum allowed size of 2K bytes for the entire "<!-- #something -->" expression. Current versions of WN no longer require this expression to be on a line by itself.

The second form is considered by many to be more SGML/XML friendly, since "<?WN something >" indicates a processing instruction specific to WN rather than a comment. For historical reasons this manual uses the first form, but either may be used. With the first form the '#' is required, but with the second you may use either:

<?WN #something>

or:

<?WN something>

Also "<?wn #something>" is fine. The case of the WN is not significant.
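As a quick illustration of the rules above, the following four lines should all be read as the same instruction. The "#include" instruction (described later in this appendix) is used here only as a placeholder to show the equivalent spellings; this is a sketch, not a document you would actually write with all four lines.

```html
<!-- #include -->
<?WN #include>
<?WN include>
<?wn include>
```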

C.1 Matching Regular Expressions: #if and #elif

This section describes the use of conditionally included text of the form:

<!-- #if some_condition -->
  Some conditional text goes here.
<!-- #elif another_condition -->
  Some other conditional text goes here.
<!-- #else -->
  Alternate text.
<!-- #endif -->

This inserts the first conditional text only if some_condition is satisfied. The "<!-- #elif another_condition -->" and "<!-- #else -->" lines are optional. There may be multiple "#elif" lines.

In all the examples below the use of the equal-tilde string '=~' to indicate a matching regular expression can be replaced with the two characters '!~' in which case the if clause will be true when the regular expression fails to match.

Also, in the examples of the form "<!-- #if accept file="foo" -->" the file foo is assumed to be relative to the current directory unless it begins with a '/', in which case it is taken relative to the WN data hierarchy root. The format of these files is a list of grep(1)-like regular expressions, one per line, with any white space being taken as part of the expression. Lines beginning with '#' are taken to be comments. If a regular expression is preceded by the character '!' then that character is skipped but the truth value of any match with the expression is reversed.

The regular expressions recognized by the WN server are the same as those of the UNIX grep(1) utility (though this utility is not used as the server has its own regular expression functions). The more general regular expressions used for example in the UNIX egrep(1) utility are not supported by WN.

The condition in the "#if" or "#elif" tags can be made more complex than those described above by combining simple conditions using the logical operations '&&' for 'and', '||' for 'or' and '!' for 'not'. Parentheses may be used for grouping. For example:

<!-- #if cond_1 && cond_2 -->
  Text to show if cond_1 and cond_2 are satisfied.
<!-- #endif -->

Other examples are:

<!-- #if cond_1 || cond_2 -->
<!-- #if !cond_1 -->
<!-- #if (cond_1 || cond_2) && !cond_3 -->

The '&&' and '||' operations have equal precedence and associate from right to left.
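This grouping rule differs from that of C, where '&&' binds more tightly than '||', so a short sketch may help. The conditions used are described later in this appendix and the patterns are purely illustrative. With equal precedence and right-to-left association, the line:

```html
<!-- #if hostname =~ "\.uk$" && UA =~ "Lynx" || UA =~ "Mozilla" -->
```

is grouped as:

```html
<!-- #if hostname =~ "\.uk$" && (UA =~ "Lynx" || UA =~ "Mozilla") -->
```

and not as '(hostname && UA) || UA'. Writing the parentheses explicitly removes any doubt about the intended grouping.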

C.1.1 #if and #elif Conditions

#if accept -- Match client's Accept headers

The lines:

<!-- #if accept =~ "regexp" -->

or:

<!-- #if accept file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches any of the Accept headers supplied by the client (first form), or if the file "foo" contains a regular expression matching any of the Accept headers (second form).

#if accept_charset -- Match client's Accept-Charset headers

The lines:

<!-- #if accept_charset =~ "regexp" -->

or:

<!-- #if accept_charset file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches any of the Accept-Charset headers supplied by the client (first form), or if the file "foo" contains a regular expression matching any of the Accept-Charset headers (second form).

#if accept_encoding -- Match client's Accept-Encoding headers

The lines:

<!-- #if accept_encoding =~ "regexp" -->

or:

<!-- #if accept_encoding file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches any of the Accept-Encoding headers supplied by the client (first form), or if the file "foo" contains a regular expression matching any of the Accept-Encoding headers (second form).

#if accept_language -- Match client's Accept-Language headers

The lines:

<!-- #if accept_language =~ "regexp" -->

or:

<!-- #if accept_language file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches any of the Accept-Language headers supplied by the client (first form), or if the file "foo" contains a regular expression matching any of the Accept-Language headers (second form).

#if after and #if before -- Select text based on date

The lines:

<!-- #if after "date" -->

or:

<!-- #if before "date" -->

specify that this text segment should be served if the current time is after (or before) the specified date. That is, the line:

<!-- #if after "22 Oct 1996 17:41:26" -->

will cause the text segment to be served only after "22 Oct 1996 17:41:26" local time. The date format is rather rigid. It must be in precisely the format shown above (specified by RFC 1123) and with a single space between each field. Only local time of the server is supported.
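As a sketch of how the two conditions can be combined with "#elif" to show different text over a period of time, consider the following. The dates and the text are hypothetical; all times are the local time of the server.

```html
<!-- #if before "10 Dec 1996 00:00:00" -->
  Registration opens on 10 December.
<!-- #elif before "20 Dec 1996 00:00:00" -->
  Registration is now open.
<!-- #else -->
  Registration has closed.
<!-- #endif -->
```

The first branch is served until 10 December, the second from then until 20 December, and the "#else" branch afterwards.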

#if cookie -- Match client's Cookie headers

The lines:

<!-- #if cookie =~ "regexp" -->

or:

<!-- #if cookie file= "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches any of the Cookie headers supplied by the client (first form), or if the file "foo" contains a matching regular expression (second form).

More information about the proposed HTTP Set-Cookie header is available at http://home.netscape.com/newsref/std/cookie_spec.html.

#if environ VAR -- Match server's environment variable VAR

The lines:

<!-- #if environ VAR =~ "regexp" -->

or:

<!-- #if environ VAR; file= "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches the contents of the server's environment variable VAR (first form), or if the file "foo" contains a matching regular expression (second form).

#if field -- Match document's user defined field

The lines:

<!-- #if field3 =~ "regexp" -->

or:

<!-- #if field3 file= "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches the contents of user defined field number 3 (first form), or if the file "foo" contains a matching regular expression (second form). Any valid field number may be used in place of 3.

#if hostname -- Match client's hostname

The lines:

<!-- #if hostname =~ "regexp" -->

or:

<!-- #if hostname file= "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches the hostname of the client (first form), or if the file "foo" contains a matching regular expression (second form). For an alternate method of doing this see the "#if accessfile" syntax described below.

Be aware that the character '.' (dot) has a special meaning in regular expressions and must be escaped with a '\' to have its usual meaning.

#if host_header -- Match server's virtual hostname from client's HTTP Host header

The line:

<!-- #if host_header =~ "regexp" -->

specifies that this text segment should be served if the grep(1)-style regular expression "regexp" matches the contents of the HTTP "Host:" header supplied by the client in its request.

Be aware that the character '.' (dot) has a special meaning in regular expressions and must be escaped with a '\' to have its usual meaning.

#if IP -- Match client's IP address

The lines:

<!-- #if IP =~ "regexp" -->

or:

<!-- #if IP file= "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches the IP address of the client (first form), or if the file "foo" contains a matching regular expression (second form). For an alternate method of doing this see the "#if accessfile" syntax described below.

Be aware that the character '.' (dot) has a special meaning in regular expressions and must be escaped with a '\' to have its usual meaning.
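To make the point about the dot concrete, here is a sketch that serves text only to clients on a hypothetical 192.168.1.x network. The address range and the text are assumptions chosen for illustration.

```html
<!-- #if IP =~ "^192\.168\.1\." -->
  This paragraph is shown only to clients on the local network.
<!-- #endif -->
```

Without the backslashes, the pattern "^192.168.1." would also match an address such as 192.168.10.5, because each unescaped '.' matches any single character, including the '0' that follows "192.168.1".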

#if language -- Match client's Accept-Language headers

The lines:

<!-- #if language =~ "regexp" -->

or:

<!-- #if language file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches any of the Accept-Language headers supplied by the client (first form), or if the file "foo" contains a regular expression matching any of the Accept-Language headers (second form). These forms are deprecated; the preferred form is to use #if accept_language in place of #if language.

#if query -- Match query string supplied in request URL

The lines:

<!-- #if query =~ "regexp" -->

or:

<!-- #if query file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches the query string supplied by the client in the URL (first form), or if the file "foo" contains a matching regular expression (second form).

#if referer -- Match client supplied Referer: header

The lines:

<!-- #if referer =~ "regexp" -->

or:

<!-- #if referer file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches the contents of the Referer: header supplied by the client (first form), or if the file "foo" contains a matching regular expression (second form). The Referer: header contains the URL of the document containing the link that was followed to obtain the current document.

#if request -- Match client's request

The lines:

<!-- #if request =~ "regexp" -->

or:

<!-- #if request file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches the full text of the request supplied by the client (first form), or if the file "foo" contains a matching regular expression (second form). The full request contains the "method" (GET or POST) followed by the URL requested with the "http://host" part having been removed (by the client).

#if TE -- Match client's TE header

The lines:

<!-- #if TE =~ "regexp" -->

or:

<!-- #if TE file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches the TE header supplied by the client (first form), or if the file "foo" contains a regular expression matching the TE header (second form). The TE header lists the transfer codings the client is willing to accept.

#if true and #if false -- Include or exclude text segment

The line:

<!-- #if false -->

specifies that the corresponding text segment should not be served. It may be useful for "commenting out" a part of a document which is under construction. The "#if true" construct is present for logical completeness.
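For example, a draft paragraph can be kept in the file without being served. The paragraph text below is only a placeholder.

```html
<!-- #if false -->
  <p>Draft paragraph, not yet ready to be served.</p>
<!-- #endif -->
```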

#if UA -- Match client's User-Agent: header

The lines:

<!-- #if UA =~ "regexp" -->

or:

<!-- #if UA file = "foo" -->

specify that this text segment should be served if the grep(1)-style regular expression "regexp" matches the User-Agent: header supplied by the client (first form), or if the file "foo" contains a matching regular expression (second form).

C.1.2 Access Control Files

The normal access control files used by WN to limit access to a directory can also be used to conditionally permit or deny access to text segments.

#if accessfile="filename" -- Check access control file

The line:

<!-- #if accessfile="/dir/accessfile" -->

specifies that the file /dir/accessfile is to be used to determine access privileges (by hostname or IP address) for this text segment. The path /dir/accessfile is relative to the server root directory. If this path does not begin with a '/' then the path is relative to the directory containing the file with this text. See the chapter "Limiting Access to Your WN Hierarchy" in this guide.

C.2 Inserting the Contents of a File

#include -- Insert the contents of a file

The line:

<!-- #include -->

specifies that the contents of the next file listed in the includes or wrappers should be inserted at this point. It is permissible to add the name of this file, as in:

<!-- #include  foo.txt -->

but this acts only as a comment. The actual file inserted depends only on the "Includes=" and "Wrappers=" directives in the index.wn file (or more precisely the index.cache file created from it).

#section -- Insert part of the contents of a file

The line:

<!-- #section -->

specifies that part of the contents of the next file listed in the includes or wrappers should be inserted at this point. It is permissible to add the name of this file, as in:

<!-- #section  foo.txt -->

but this acts only as a comment. The actual file inserted depends only on the "Includes=" and "Wrappers=" directives in the index.wn file (or more precisely the index.cache file created from it).

The part of the file actually included is that portion of the document between the special comments "<!-- #start -->" and "<!-- #end -->" inserted in that document. This requires that these starting and ending comments occur in the HTML document on lines by themselves. For more information see the section "More on Including: the section Marker" in this guide.

#start and #end -- Mark the beginning and end of text to be included

The lines:

<!-- #start -->

and:

<!-- #end -->

mark the beginning and end of the portion of the text to be inserted from an include or wrapper in response to encountering "<!-- #section -->" in the text of a document being parsed. There can be more than one "#start/#end" pair in a document. For more information see the section "More on Including: the section Marker" in this guide.
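As a sketch of how these markers fit together, suppose a hypothetical file news.html is listed in the "Includes=" directive for the parsed document. The file name and its contents are assumptions made for this illustration. If news.html contains:

```html
<html>
<head><title>News</title></head>
<body>
<!-- #start -->
<p>The server was upgraded on Saturday.</p>
<!-- #end -->
</body>
</html>
```

then a parsed document containing:

```html
<h2>Latest news</h2>
<!-- #section news.html -->
```

would be served with only the paragraph between the two markers in place of the "#section" line. As noted above, the name news.html on the "#section" line acts only as a comment; the file actually used is the one named in the index.wn file.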

#title, #query, and #field -- Insert the title, current search string, or a user defined field

The lines:

<!-- #title -->
<!-- #query -->

or:

<!-- #field 3 -->

in a parsed document instruct the server to include the title of the current document, the current search term from the client or the value of user defined "field #3" for the current document. All of these markers must occur on a line by themselves. For more information see the section "Including Title, Query, Fields and Environment Variables" in this guide.
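A minimal sketch of a parsed page using these markers might look like the following. The surrounding HTML is illustrative only, and the use of field 3 to hold an author name is an assumption; each marker is placed on a line by itself as required above.

```html
<h1>
<!-- #title -->
</h1>
<p>You searched for:
<!-- #query -->
</p>
<p>Maintained by:
<!-- #field 3 -->
</p>
```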

#environ -- Insert the contents of an environment variable

The line:

<!-- #environ = "WHATEVER" -->

in a parsed document instructs the server to include the contents of the environment variable WHATEVER. Remember to use an "Attributes=parse" line when using this construct and to use an "Attributes=cgi" when it is a CGI variable like HTTP_REFERER which is to be included.
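For instance, a page could report the referring URL as in the sketch below. The surrounding HTML is illustrative, and placing the marker on its own line is a cautious assumption rather than a stated requirement. Since HTTP_REFERER is a CGI variable, the document would need both the "Attributes=parse" and "Attributes=cgi" settings mentioned above.

```html
<p>You arrived here from:
<!-- #environ = "HTTP_REFERER" -->
</p>
```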

C.3 Conditional Redirecting

#redirect -- Redirect to a different URL

The line:

<!-- #redirect = "url" -->

specifies that if no text has yet been sent the server should send an HTTP redirect to the given URL. This might be used as follows. If the text:

<!-- #if hostname =~ "\.uk$" -->
<!-- #redirect = "UK_mirror_url" -->
<!-- #endif -->

is included at the beginning of an HTML document then any request from a host in the .uk domain will automatically be redirected to the specified URL (UK_mirror_url in this case). This mechanism could also be used to redirect text-only browsers to a text-only alternative page, etc.

There must be no text sent before the '<!-- #redirect = "url" -->' is encountered (not even blank lines) since the server cannot send an HTTP redirect while in the middle of transmitting a document. Thus the example above would be an error if there are any blank lines before the "#if hostname" line or any blank lines after it before the "#redirect" line. When such an error occurs it is logged in the error file and the "#redirect" line is ignored.

Note however that:

<!-- #if hostname =~ "\.uk$" -->
  [Lots of text here]
<!-- #else -->
<!--   #redirect = "some_URL" -->
<!-- #endif -->

is correct since when the #redirect line is encountered no text has been sent.

Normally the URL in the "<!-- #redirect = "URL" -->" line is fully qualified, like "http://host/path/foo". However, it can also be simply "foo" referring to a file in the same directory as the file being parsed. In this case an HTTP redirection is not sent, and instead the file "foo" is returned immediately to the client.
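As a sketch of this second usage, the lines below, placed at the very beginning of a document with no blank lines before or between them, would return a text-only page to a text-mode browser. The file name textonly.html and the User-Agent pattern are hypothetical.

```html
<!-- #if UA =~ "^Lynx" -->
<!-- #redirect = "textonly.html" -->
<!-- #endif -->
```

Because "textonly.html" is not a fully qualified URL, the file in the same directory is returned directly and no HTTP redirect is sent to the client.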


WN version 2.5.0
Copyright © 1998-2005 John Franks <john@math.northwestern.edu>
licensed under the GNU Free Documentation License
Last modified: Sat June 18 2005