Contents

* CGI scripts
* Example of calling a CGI script file
* Decoding data sent to a CGI script
* Script to record users of web page
* Post vs. get
* Check list

------------------------------------------------------------

Warning: if you are not using a browser that supports tables, such as Netscape 1.1 or later, then this page will probably be very difficult to read.

------------------------------------------------------------

CGI scripts

A CGI script file is written in a programming language which can be either:

* Compiled to run on the server.
* Interpreted by an interpreter on the server.

Examples of languages used include:

* Compiled languages: C, C++, Ada
* Interpreted languages: perl, JCL (e.g. the Unix sh command language)

The CGI script is executed when an anchor tag or an image tag refers to the CGI script file rather than a normal file. Whether a file is treated as a CGI script or as an ordinary HTML file is determined by the physical placement of the file on the server. Usually this placement is in the web server's cgi-bin directory. However, the exact location of this directory on the server machine is determined by the web administrator. The administrator controls the placement of the cgi-bin directory to prevent the security problems that could occur if arbitrary programs were allowed to be executed by anybody accessing the machine.

Call of a CGI script file

An anchor tag to execute the CGI script dynamic_page on the server www.mc.com is:

    <A HREF="http://www.mc.com/cgi-bin/dynamic_page">Dynamic page</A>

When the web server processes a request to fetch a file, if the requested file is in the server's nominated cgi-bin directory then, as long as this file is marked as being executable, the script will be run on the server. If the file is not executable then an error will be reported. The script eventually returns an HTML page or image to be displayed as the result of its execution.
When a CGI script file executes it may access environment variables to discover additional information about the processing that it is to perform. The first line of the returned data must be:

    Type of returned data    Text
    An HTML page             Content-type: text/html
    A gif image              Content-type: image/gif

A simple CGI script on a Unix-based system to return a list of the current users who are logged onto that system is:

    #!/bin/sh
    echo Content-type: text/html
    echo
    echo
    echo "<HTML>"
    echo "<HEAD>"
    echo "</HEAD>"
    echo "<BODY>"
    echo "<H2>Users logged on the server are:</H2>"
    echo "<PRE>"
    who
    echo "</PRE>"
    echo "</BODY>"
    echo "</HTML>"

Remember:

* The "'s around text with a < or > character.

On a Unix system:

* The first line is #!/bin/sh
* The file is set executable.

Note: The JCL (Job Control Language) command echo echoes the rest of the line to the standard output. The JCL command who lists the current users who are logged onto the system.

Allowing users to create their own CGI scripts can lead to security problems on the server.

The major environment variables that can be accessed by the CGI script when it executes are:

    Environment variable    Contains
    QUERY_STRING            Data sent to the CGI script by its caller. This may
                            be the output from a form, or other dynamically or
                            statically generated data.
    REMOTE_ADDR             The internet address of the host machine making
                            the request.

A C++ program, mas_env.cpp, when run prints many of the environment variables available to a CGI script.

CGI scripts can be written in any language. For example, a CGI script to return the contents of the environment variable QUERY_STRING can be written in Ada 95.

Note: I used the gcc compiler version 2.7.0 to compile this source code. In particular this compiler recognises the new data type bool.

------------------------------------------------------------

Decoding data sent to a CGI script

When a form is used, the information collected in the form is sent to the CGI script for processing. This information is placed in the environment variable QUERY_STRING.

To pass information explicitly to the environment variable QUERY_STRING a modified form of an anchor tag is used. In this modified anchor tag, the data to be sent to the environment variable QUERY_STRING is appended after the URL which denotes the CGI script. The character ? is used to separate the URL denoting the CGI script from the data that is to be sent to the script. For example:

    <A HREF="/cgi-bin/script?name=Your+name&action=find">Link</A>

The data "name=Your+name&action=find" is placed in the environment variable QUERY_STRING and the CGI script named script is executed.
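One way to experiment with this mechanism without a web server is to set QUERY_STRING by hand and run the script body from a shell. The sketch below assumes a Bourne-style shell. The function name show_query and the text/plain content type are illustrative choices, not part of the original scripts.

```shell
#!/bin/sh
# Sketch: a CGI script body that reports the raw QUERY_STRING it was given.
# show_query is a hypothetical name used only for this illustration.
show_query() {
    echo "Content-type: text/plain"
    echo                                   # blank line ends the header
    printf 'QUERY_STRING=%s\n' "${QUERY_STRING:-}"
}

# Simulate what the server does for  .../script?name=Your+name&action=find
QUERY_STRING="name=Your+name&action=find"
export QUERY_STRING
show_query
```

The value is printed exactly as received, so the + and any %HH sequences are still encoded at this point.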
A class written in C++, composed of the specification parse.h and the implementation parse.cpp, is used to extract the individual components in the QUERY_STRING. The header file t99_type.h contains definitions for C++ features not implemented in some compilers. The members of this class are:

    Method        Responsibility
    Parse         Set the string that will be parsed.
    set           Set a different string to be parsed.
    get_item      Return the string associated with the keyword passed as a
                  parameter. If there is no data, return NULL.
    get_item_n    Return the string associated with the keyword passed as a
                  parameter. If there is no data, return the null string.

When using the member functions get_item and get_item_n, the optional second parameter specifies which occurrence of the string associated with a keyword to return. This allows the recovery of information attached to identical keywords. In addition, the returned string will have had the following substitutions made on it:

* +      Will be converted to a space.
* %HH    Will be converted to the character whose hexadecimal value is HH.
* ~user  Will be replaced by the full path to the user's home directory, but only if the optional third parameter is true.

Note: Defining NO_MAP causes the code for ~username processing to be omitted. This allows the code to be compiled for machines which do not support the system function map_uname defined in the header file pwd.h.
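The + and %HH substitutions, and the lookup of the nth occurrence of a keyword, can be sketched in a few lines of shell. This is not the Parse class itself (parse.h and parse.cpp are not reproduced here). url_decode and get_item are hypothetical shell functions that mimic the behaviour described above, and the ~user expansion is left out.

```shell
#!/bin/sh
# Sketch only: url_decode and get_item are hypothetical helpers that mimic
# the behaviour described for the Parse class; they are not its code.

# url_decode STRING: convert '+' to a space and %HH to the character whose
# hexadecimal value is HH. (%00 and trailing newlines are not preserved.)
url_decode() {
    src=$1
    out=
    while [ -n "$src" ]; do
        case $src in
            +*)
                out="$out "
                src=${src#?}
                ;;
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                hex=${src#?}                 # drop the leading '%'
                hex=${hex%"${hex#??}"}       # keep only the two hex digits
                oct=$(printf '%03o' "0x$hex")
                out="$out$(printf "\\$oct")" # emit the character via octal
                src=${src#???}
                ;;
            *)
                out="$out${src%"${src#?}"}"  # copy one ordinary character
                src=${src#?}
                ;;
        esac
    done
    printf '%s\n' "$out"
}

# get_item KEY [N]: print the decoded value of the Nth occurrence of KEY in
# QUERY_STRING (N defaults to 1). Prints an empty line if there is none.
get_item() {
    key=$1
    want=${2:-1}
    seen=0
    rest=${QUERY_STRING:-}
    while [ -n "$rest" ]; do
        pair=${rest%%"&"*}                   # text up to the first '&'
        case $rest in
            *"&"*) rest=${rest#*"&"} ;;
            *)     rest= ;;
        esac
        if [ "${pair%%=*}" = "$key" ]; then
            seen=$((seen + 1))
            if [ "$seen" -eq "$want" ]; then
                url_decode "${pair#*=}"
                return 0
            fi
        fi
    done
    echo
    return 1
}

QUERY_STRING="tag=one&name=mike&action=%2B10%25&tag=two&tag=three"
get_item name       # mike
get_item action     # +10%
get_item tag 2      # two
```

Walking the string one character at a time is slow but keeps the sketch within plain sh. A compiled program such as the C++ class is the better choice for anything beyond experiments.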
For example, if the QUERY_STRING contained:

    tag=one&name=mike&action=%2B10%25&tag=two&log=~mas/log&tag=three

Then the following program when compiled and run:

    enum bool { false, true };   // only for compilers without a built-in bool
    #include <iostream.h>
    #include <stdlib.h>
    #include "parse.h"
    #include "parse.cpp"

    int main()
    {
      char *query_str = getenv("QUERY_STRING");
      Parse list( query_str );

      cout << "name = " << list.get_item_n( "name" ) << "\n";
      cout << "action= " << list.get_item_n( "action" ) << "\n";
      cout << "log = " << list.get_item_n( "log", 1, true ) << "\n";

      // Ask for one more occurrence of "tag" than the query string holds
      for ( int i=1; i<=4; i++ )
      {
        cout << "tag (" << i << ") = ";
        cout << list.get_item_n( "tag" , i ) << "\n";
      }
      return 0;
    }

would produce the following output:

    name = mike
    action= +10%
    log = /usr/staff/mas/log
    tag (1) = one
    tag (2) = two
    tag (3) = three
    tag (4) =

------------------------------------------------------------

Script to record users of web page

By using a URL denoting a CGI script in an <IMG> tag, additional processing can be performed before the image is delivered. This additional processing records details about the current viewer of the web page. Additional information is sent to the CGI script to specify the exact details of the action to take. For example:

    Formatted text             HTML markup required
    [Image] Record not made

The CGI script mas_rec, written in C++, is sent the following information:

    Parameter name    Specifies
    file              The name of the file to which the usage information
                      will be appended.
    page              A name for the page that will be recorded in the log.
    img               The image that will be loaded.

Of course, for this to work the viewer of the page must be viewing, and hence loading, images. Several reasons why images may not be loaded include:

* The browser does not support viewing of images.
* The viewer has de-selected the view image option to improve performance when they have a slow link to the web server.
* The server may not receive the request to return the image.
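The record-then-deliver idea can be sketched in shell as follows. This is not the mas_rec program, whose source is not shown here. record_and_serve is a hypothetical function, its three arguments stand in for the file, page and img parameters, and the log line format is an assumption.

```shell
#!/bin/sh
# Sketch of a page-usage recorder; NOT the author's mas_rec program.
# record_and_serve LOGFILE PAGE IMAGE   (hypothetical helper)
record_and_serve() {
    logfile=$1
    page=$2
    image=$3

    # Append one line per request: time, page name, caller's address.
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$page" \
        "${REMOTE_ADDR:-unknown}" >> "$logfile"

    # Then deliver the image so the page still displays something.
    echo "Content-type: image/gif"
    echo
    cat "$image"
}

# Try it without a server, using throwaway files.
tmp=$(mktemp -d)
printf 'GIF89a' > "$tmp/dot.gif"      # stand-in bytes, not a valid image
REMOTE_ADDR=192.0.2.1
record_and_serve "$tmp/usage.log" "home" "$tmp/dot.gif" > "$tmp/response"
cat "$tmp/usage.log"
rm -rf "$tmp"
```

In a real script the three values would come from QUERY_STRING and should be checked before use, since a caller could otherwise name any file that the script is able to write or read.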
------------------------------------------------------------

Post vs. Get

So far the method used to send information to the CGI script has been GET. When the method GET is used, the data sent is placed in the environment variable QUERY_STRING for the CGI script to process.

An alternative method is to use POST. When the method POST is used, the data is sent by a separate stream and becomes the standard input to the CGI script.

The method used is specified on the <FORM> tag using the attribute METHOD="get" or METHOD="post". The default method is GET. For example:

    [Generated form and the HTML markup required]
When using the POST method, the following environment variables are set:

    Environment variable    Contains
    CONTENT_LENGTH          The length of the data sent via the standard
                            input to the CGI program.
    CONTENT_TYPE            The MIME type of the data.

A simple script to record in a log file data sent by a user is:

    #!/bin/sh
    echo Content-type: text/html
    echo
    echo
    echo "<HTML>"
    echo "<HEAD>"
    echo "</HEAD>"
    echo "<BODY>"
    echo "<H2>Data recorded</H2>"
    echo Use the back arrow on the browser
    echo to return to the original web page
    echo "</BODY>"
    echo "</HTML>"
    cat  >> /home/snowwhite/staff/mas/log
    echo >> /home/snowwhite/staff/mas/log

Remember:

* To use a full path name for the location of the file in which the information is recorded.

On a Unix system:

* The first line is #!/bin/sh
* The file is set executable and setuid to you.

An example of its use is shown below:

    [Generated form and the HTML markup required]
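The script above copies the whole of standard input with cat. A more cautious variant reads only the number of bytes announced in CONTENT_LENGTH, since a server is not guaranteed to close the input stream after the data. The sketch below assumes that concern applies to your server. read_post_body is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read exactly CONTENT_LENGTH bytes of POST data from standard input.
# read_post_body is a hypothetical helper, not part of the original script.
read_post_body() {
    # dd copies CONTENT_LENGTH one-byte blocks and then stops reading.
    dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
}

# Simulate the server: 21 bytes of form data arrive on standard input.
CONTENT_LENGTH=21
printf 'name=mike&action=find' | read_post_body
echo
```

Reading one byte at a time is slow for large submissions, so this form suits short form data only.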
------------------------------------------------------------

Check list

It is important to make sure that the CGI script is:

* Executable.
* Placed in the cgi-bin directory. The location of this directory is defined by the WWW server administrator.
* Capable of accessing any files that it needs. As the CGI script runs in the /cgi-bin directory, any file paths which are relative must be relative to this directory. When returning HTML which accesses images or uses hypertext links, it is easier if these are specified by using the http form of a URL.
* On a Unix server, the set uid bit should be set on the executable script file. This causes the program to execute as if it had been run by the owner of the file. This is important if the program is to access resources that only the creator of the CGI script can access. This of course may open up many security loopholes.
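Part of this check list can be automated. The sketch below is one possible approach, assuming a Unix system. check_cgi is a hypothetical helper, and the test for a directory named cgi-bin is only a heuristic, because the real location is whatever the server administrator has configured.

```shell
#!/bin/sh
# check_cgi FILE: report on the check-list items that can be tested locally.
# Hypothetical helper; returns non-zero if the file is not executable.
check_cgi() {
    file=$1
    status=0

    if [ -x "$file" ]; then
        echo "ok:   executable"
    else
        echo "FAIL: not executable"
        status=1
    fi

    # Heuristic only: the administrator may have chosen another directory.
    case $(dirname "$file") in
        */cgi-bin|cgi-bin) echo "ok:   in a cgi-bin directory" ;;
        *)                 echo "warn: not in a directory named cgi-bin" ;;
    esac

    if [ -u "$file" ]; then
        echo "note: set uid bit is set"
    else
        echo "note: set uid bit is not set"
    fi

    return $status
}

# Try it on a throwaway script.
tmp=$(mktemp -d)
mkdir "$tmp/cgi-bin"
printf '#!/bin/sh\necho Content-type: text/html\necho\n' > "$tmp/cgi-bin/demo"
chmod 755 "$tmp/cgi-bin/demo"
check_cgi "$tmp/cgi-bin/demo"
rm -rf "$tmp"
```

The set uid result is reported as a note and not as a failure, because whether the bit is wanted depends on which files the script has to reach.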