MinCGI
You are tasked with building a web server that will handle multiple connections and execute
applications that are compliant with a subset of the Common Gateway Interface specification. The
Common Gateway Interface is used by web-servers to allow compatible programs to be executed
on the server, to process HTTP requests and compute a HTTP response. The web-server uses
standard input and output as well as environment variables for communication between the CGI
program and the itself. HTTP requests that are received by the web-server can be passed to a CGI
program via shell environment variables and standard input.
What is a web-server?
A web-server is a common application which serves web pages and commonly executes queries
as part of a request. A web-server interacts with clients through the HTTP protocol, a client will
send a HTTP Request, the web server will process it and send a HTTP response to the client. A
client is typically a web-browser such as Mozilla Firefox, Google Chrome, Microsoft Edge and
Apple Safari.
What is CGI?
Common Gateway Interface is a specification for a web-server to allow programs and scripts to
process requests that are given to it via standard input and environment variables. A CGI Program
will process the data and output a HTTP request via standard output.
HTTP Request
The following are request headers that are sent by a web browser. You will need to retrieve each
one. Each header is optional, content-type and content-length are
Accept - Content type of the request
Host - Domain of the server
User-Agent - The browser used by the client
Accept-Encoding - Encoding of the stream
Remote-Address - The IP Address of the client
Content-Type - The content type of the request body
Content-Length - Size of the request body in bytes
Query Strings is data after a ? symbol that is placed at the end of the request URL. A query string
will list variables in a string, each variable and their value pair will be separated by a & character.
Below is an example of a query string where the variables are name and age .
A HTTP request will specify the method, resource name and the HTTP protocol version on the first
line, followed by the request headers on each line afterwards. The request body is separated by
an empty new line after the last request header.
http://localhost:9001/hello?name=James&age=25
Example - GET Request 1
URL Example:
Protocol Example:
Example - GET Request 2
URL Example:
Protocol Example, web browser can provide additional headers:
Example - GET Request 3
URL Example:
Protocol Example:
Example - GET Request 4 (Query String Example)
URL Example:
Protocol Example:
http://127.0.0.1/ GET / HTTP/1.1 http://127.0.0.1/ GET / HTTP/1.1 Host: 127.0.0.1 http://127.0.0.1/js/helper.js GET /js/helper.js HTTP/1.1 Accept: application/javascript http://127.0.0.1/cgibin/posts?id=20 GET /cgibin/posts?id=20 HTTP/1.1 Host: 127.0.0.1
CGI Environment Variables
The following are common gateway variables which you will retrieve from request headers,
configuration file and execution environment. You will need to set these variables (if found in the
above sources) as a shell environment variable for the CGI Program.
HTTP_ACCEPT - Content type of the request, this value is retrieved from the HTTP request,
omitted if it is not part of the request
HTTP_HOST - Domain of the server, this value is retrieved from the HTTP request, omitted if
it is not part of the request
HTTP_USER_AGENT - The browser used by the client, this value is retrieved from the HTTP
request, omitted if it is not part of the request
HTTP_ACCEPT_ENCODING - Encoding of the stream, this value is retrieved from the HTTP
request, omitted if it is not part of the request
REMOTE_ADDRESS - The IP Address of the client, this value is retrieved from the connection
REMOTE_PORT - The port the client is connected on, this value is retrieved from the
connection
REQUEST_METHOD - The HTTP method, this value is retrieved from the HTTP request
REQUEST_URI - The resource requested by the client, this value is retrieved from the HTTP
request
SERVER_ADDR - The server address, this value is retrieved from server socket.
SERVER_PORT - The port the server is listening on, this value is retrieved from configuration
or server socket
CONTENT_TYPE - Content type of the information supplied by the body of a POST request,
omitted if it is not part of the request (POST request only)
CONTENT_LENGTH - Length of the request body, this value is retrieved from HTTP request,
omitted if it is not part of the request (POST request only)
QUERY_STRING - Information after a '?' at the end of the request URL. Will map variables to a
value in a single string, separated by '&' characters, omitted if it is not part of the request
You will need to pass the HTTP request body via standard input to the CGI Program. The request
body is important when form data is sent to the server.
Content-Type Mapping and Processing
Your web-server will need to map each file extension to the following content-type. When a http
requests a file of one of the following type, the content-type should be mapped to the following.
Any file ending with .txt , is mapped to text/plain
Any file ending with .html , is mapped to text/html
Any file ending with .js , is mapped to application/javascript
Any file ending with .css , is mapped to text/css
Any file ending with .png , is mapped to image/png
Any file ending with .jpg or .jpeg , is mapped to image/jpeg
Any file ending with .xml , is mapped to text/xml
HTTP Response
Your server only needs to handle GET HTTP and be able to return statuses for both files and CGI
programs. Your data must return the HTTP Protocol Version, Status Code and Status Message,
However, your web-server does not need to handle all of the different return statuses and these
will typically be defined by the web application itself.
For files that will not involve the command gateway interface, your web-server should do the
following.
If the file exists, it should return 200 OK and the content of the file. Make sure the contenttype is set to what the extension maps to.
If the file does not exist, it should return 404 File not found , your server should send the
following HTML to the client.
Command gateway interface programs may encounter a problem during execution. The web
server should check the exit code of the application executed to detect that the command
executed and if the program returned data. If the command is not found or the program crashes
(exits with non-zero status), the web-server must respond with 500 Internal Server Error .
Special case: HTTP request on / , your server must retrieve a file named index.html at the root
of your staticfiles path.
HTTP Responses follow the following form
Variation for Status Code
CGI programs are able to specify the status code of the HTTP response as well as the message. If
the status code is absent on the first line of the response, your web-server should assume it
returns status code 200 with the message OK ( HTTP/1.1 200 OK ).
However, a CGI Program can specify the status code on the first line in this format.
In the event, the CGI program specifies this format on the first line, your web-server should fix the
HTTP response from CGI Program to format the first line by extracting the Status Code and the
Message and using them as part of the standard response.
404 Not Found 404 Not Found
Status-Code:
Example - Status 200, 1
HTTP Request
Example Response
Example - Status 200, 2
HTTP Request
HTTP Response
Example - Status 404, 1
HTTP Request
HTTP Response
GET / HTTP/1.1 HTTP/1.1 200 OK Content-Type: text/html Hello World!
GET /goodmorning.html HTTP/1.1 Accept: text/html HTTP/1.1 200 OK Content-Type: text/html Good Morning!
GET /missingfile.html HTTP/1.1 HTTP/1.1 404 Not Found Content-Type: text/html 404 Not Found 404 Not Found
Example - Status 200, 3, Image Type
HTTP Request
HTTP Response
Configuration File
You will read in a configuration file that will specify the static file directory and the cgi-bin
directory for your application. As with most web servers, your configuration will need to know the
port it is listening on.
The configuration file is a simple text file where each property is assigned to a value. When
reading the configuration file, you will need to retrieve each of the following properties from the
file and their value.
staticfiles - string, path to the files that will be simply served by the web-server.
cgibin - string, path to the CGI executables.
port - integer, the port the server listens on.
exec - string, executable path or command that will be executed on inbound cgibin requests.
This is a simple file where each line will have a key and a value separated by = .
The configuration file will be supplied as a command line argument for your server. For example,
your server will be invoked like so python webserv.py config.cfg .
In the event the configuration is not passed to the program, the program should output Missing Configuration Argument .
If the path does not exist for the configuration file, the program should output Unable To Load Configuration File . If the configuration file does not contain the four fields specified, the
program should output Missing Field From Configuration File .
Static Files
Your web server must retrieve all static files within a specified directory, this will also need to
retrieve any file that exists within a folder in that directory. However, your web-server should
forbid accessing files that exist in the parent directory of staticfiles.
GET /image.png HTTP/1.1 Accept: image/png,*/* HTTP/1.1 200 OK Content-Type: image/png staticfiles=./root cgibin=./bin port=8070 exec=/bin/python
Parse HTTP/1.1 - Supplementary
Your web server will need to parse HTTP/1.1, you can refer to HTTP/1.1 RFC located here: https://t
ools.ietf.org/html/rfc7230. This is a stateless protocol that will simply acknowledge a request and
serve a file or in the case of cgibin files, parse the request, set the environment variables, launch
the CGI Program and send the request body via standard input to the program.
Specifically, you can refer to section 2 which outlines the HTTP URI scheme.
You may refer to the following examples of HTTP requests that will be sent to a client. You can
also test using your web browser by visiting your web server. You do not need to implement the
full HTTP/1.1 specification such as cache-validation as this can be difficult to get right.
Common Gateway Interface - Supplementary
Common gateway interface for version 1.1 is outlined here: https://tools.ietf.org/html/rfc3875.
Common gateway interface is a web standard protocol for web-servers to communicate to
executables and pass data to and from to these executables. Specifically, cgibin has its own
executable folder and the configuration format outlines the executable to be used with the files
within this directory.
The CGI Request is outlined in section 4 of the Common Gateway Interface RFC. The interface is
quite simple and just involves using the standard input and standard output file descriptors as
described in previous sections.
Extension
Try to implement one of the following extensions for your web server.
Handling POST requests from browsers and accepting form data.
Compressed packets to send back to the browser. Use the gzip module in python.
Your own extension, please check with a staff member about the eligibility of your extension.
It must be relevant to the problem and demonstrate a similar amount of complexity to the
extensions above.
You must provide test cases to demonstrate how it works and provide a README with your
submission outlining your extension.
CGI-Web for everyone
Refer to below as a task list for your web server. Prioritise the configuration, accepting
connections and static files before getting CGI Programs working.
Load a configuration file
Accept connections
Parse HTTP/1.1 data
Set up all environment variables
Fork and execute the CGI Program or retrieve static file
If CGI Program, pipe the data from the application back to the web-server
After implementing your server, try to improve performance and make all requests
asynchronous by forking each connection that is accepted.
Marking - Due 22/11/20 11:59 PM
The assessment weight is 20%, it is divided into 3 parts. Your submission will be assessed on the
following.
Automatic Test Cases (10%) - Automatic test cases your submission will be submitted to. This
will check that your application is compliant with the MinCGI specification.
Test Cases (4%) - Your submission must include your own set of test cases, these test cases
should be shell scripts that make a request to your web-server using curl and check the
final result against an expected file using diff .
Manual Marking (4%) - Your submission will be marked against a style, please make sure you
have a consistent style and conform. You can supply a README file with your submission
that describes your programming. If you do not supply a README that discusses your
programming, we will assess your source against the PEP-8 specification.
Extension (2%) - After implementing your CGI web-server, extend your it by implementing
one of the extensions above.