The goal of this project is to build a functional web server using low-level networking primitives. This assignment will teach you the basics of network programming, client/server architectures, and issues in building high performance servers. In addition to writing your server, you will also write a document explaining its behavior and your major design choices.
This project should be done in teams of two (or three, with prior permission). However, remember that the objective of working in a team is to work *as* a team - i.e., you should not try to approach the project by splitting up the work. All team members are expected to work on all parts of the project.
Your task is to write a simple web server capable of servicing remote clients by sending them requested files from the local machine. Communication between a client and the server is defined by HTTP (the Hypertext Transfer Protocol); your server will need both to understand HTTP requests sent by clients and to respond as defined by HTTP.
Your server must support the core functionality of both the HTTP 1.0 and the HTTP 1.1 standards, with several notable limitations:
You only need to support the GET method, not others like HEAD, POST, etc.
You only need to support status codes 200, 400, 403, and 404 in your server responses.
You only need to include the Content-Type, Content-Length, and Date headers in your server responses.
You only need to support the file types HTML, TXT, JPG, and GIF.
In short, the only request headers you need to be concerned with are "Host" and "Connection", and the only response headers you need to be concerned with are "Date", "Content-Length", and "Content-Type". However, feel free to extend your server to provide any functionality not required by the base specification.
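As a rough illustration of the response side, the sketch below maps a file extension to a Content-Type value and formats a status line followed by the three required headers. The helper names (content_type_for, reason_for, format_headers), the fallback MIME type, and the buffer handling are assumptions made for this example, not part of the assignment specification.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <time.h>

/* Hypothetical helper: pick a Content-Type from the file extension. */
const char *content_type_for(const char *path) {
    assert(path != NULL);
    const char *dot = strrchr(path, '.');
    if (dot == NULL) return "application/octet-stream";
    if (strcasecmp(dot, ".html") == 0) return "text/html";
    if (strcasecmp(dot, ".txt") == 0) return "text/plain";
    if (strcasecmp(dot, ".jpg") == 0) return "image/jpeg";
    if (strcasecmp(dot, ".gif") == 0) return "image/gif";
    return "application/octet-stream"; /* assumed fallback for other types */
}

/* Hypothetical helper: reason phrase for the four required status codes. */
const char *reason_for(int status) {
    switch (status) {
    case 200: return "OK";
    case 400: return "Bad Request";
    case 403: return "Forbidden";
    case 404: return "Not Found";
    default:  return "Unknown";
    }
}

/* Write the status line and headers into buf.
 * Returns the number of bytes written, or -1 if buf is too small. */
int format_headers(char *buf, size_t cap, const char *version, int status,
                   const char *ctype, long length, time_t now) {
    assert(buf != NULL && version != NULL && ctype != NULL);
    char date[64];
    struct tm tm_utc;
    if (gmtime_r(&now, &tm_utc) == NULL) return -1;
    /* HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT". */
    if (strftime(date, sizeof date, "%a, %d %b %Y %H:%M:%S GMT", &tm_utc) == 0)
        return -1;
    int n = snprintf(buf, cap,
                     "%s %d %s\r\n"
                     "Date: %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n",
                     version, status, reason_for(status), date, ctype, length);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The blank line (the final "\r\n") ends the header section; the file contents would follow it on the socket.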
Your server program must be written in C or C++ on Linux and must accept (at least) the following two command-line arguments:
-p [num] to set the port on which the server listens. If this option is omitted, the server should default to port 8888.
-r [path] to set the directory out of which all files are served, called the document root. If this option is omitted, the server should default to the current working directory.
For example, you could start the server on port 8887 using the document root serverfiles like the following:
./server -p 8887 -r serverfiles
Command-line options may appear in arbitrary order; therefore, you should use getopt for parsing arguments. Also note that unless your document root starts with a /, it is a relative path, and therefore is interpreted relative to the current working directory.
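A minimal sketch of the argument handling described above, using getopt. The struct server_config type and the parse_args function are hypothetical names chosen for this example; the defaults (port 8888, current directory) come from the specification.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the two required settings. */
struct server_config {
    int port;
    const char *docroot;
};

/* Fill cfg from argv. Returns 0 on success, -1 on a bad or unknown option. */
int parse_args(int argc, char *argv[], struct server_config *cfg) {
    assert(argv != NULL && cfg != NULL);
    cfg->port = 8888;     /* default port */
    cfg->docroot = ".";   /* default: current working directory */
    optind = 1;           /* restart scanning in case getopt was used before */

    int opt;
    while ((opt = getopt(argc, argv, "p:r:")) != -1) {
        switch (opt) {
        case 'p': {
            char *end;
            long port = strtol(optarg, &end, 10);
            if (*end != '\0' || port < 1 || port > 65535) return -1;
            cfg->port = (int)port;
            break;
        }
        case 'r':
            cfg->docroot = optarg;
            break;
        default:          /* unknown option or missing argument */
            return -1;
        }
    }
    return 0;
}
```

Because getopt handles the scanning, "./server -r serverfiles -p 8887" and "./server -p 8887 -r serverfiles" produce the same configuration.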
Finally, as with most real web servers, requests for a directory (e.g., GET / or GET /catpictures/) should default to fetching index.html (i.e., index.html is the default filename if none is provided).
There are several ways you can test your server. The first is to simply access your server in a browser - if your server is running on port 8888, then you can type turing.bowdoin.edu:8888/index.html into your web browser to access index.html on the server. However, using your browser may not be so helpful during initial debugging, as the browser window will generally simply hang if something's not right. A more effective initial testing approach is to use telnet, which is a tool for sending arbitrarily-formatted text messages to a server. For example, the following connects to bowdoin.edu on port 80 and then sends a valid HTTP-formatted request for the file index.html:
$ telnet www.bowdoin.edu 80
GET /index.html HTTP/1.0
Note that in the above command, you must press Enter twice after the "GET" line (i.e., the request line must be followed by a blank line) in order to complete the request. The response to this request will be the HTTP-formatted response from the server.
As an intermediate step, you can also use the wget or curl utilities. These utilities provide command-line HTTP clients - wget will send HTTP/1.0 requests, while curl will send HTTP/1.1 requests (though it can be configured to send HTTP/1.0 requests as well). Consult the man pages for details on proper usage.
A recommended testing strategy is to use telnet initially, then move to wget and/or curl, then finally graduate to a full-blown browser once things seem to be working.
Important: Do not leave your server running indefinitely! Whenever you are done working, make sure to terminate your server (Control-C) before logging off the server. Leaving a server running will both take up port numbers and potentially expose security flaws to the outside world.
To test that HTTP 1.1 is working properly, you will want to test on a web page with embedded images (so that multiple files will be requested in order to load the full page). Here is a sample document root that you can use for this purpose. Using a relatively simple page such as this will be easier for testing than a full-blown page with many components (e.g., Bowdoin's home page or similar).
This section contains tips and advice on going about various parts of the program.
At a high level, your web server will be structured something like the following:
Forever loop:
    Accept new connection from incoming client
    Parse HTTP request
    Ensure well-formed request (return error otherwise)
    Determine if target file exists and if permissions are set properly (return error otherwise)
    Transmit contents of file to the client (by performing reads on the file and writes on the socket)
    Close the connection (if HTTP/1.0)
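The outer shape of the loop above might look like the following in C. This is only a sketch under several assumptions: the names make_listener and serve_loop are hypothetical, the backlog of 16 is arbitrary, and the per-client work (parsing, file checks, transmission) is left to a caller-supplied handle_client function. It handles one client at a time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on the given port (0 lets the kernel pick).
 * Returns the listening descriptor, or -1 on failure. */
int make_listener(int port) {
    assert(port >= 0 && port <= 65535);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    /* Allow quick restarts of the server on the same port. */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((unsigned short)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept clients one at a time and hand each to handle_client.
 * Returns only if accept fails with a non-transient error. */
void serve_loop(int listen_fd, void (*handle_client)(int client_fd)) {
    assert(handle_client != NULL);
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) {
            if (errno == EINTR || errno == ECONNABORTED) continue;
            break;
        }
        handle_client(client_fd);  /* parse request, send response */
        close(client_fd);          /* HTTP/1.0 behavior: one request per connection */
    }
}
```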
You have three main choices in how you handle multiple clients within the structure of the above simple design:
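A multi-threaded approach, using the pthreads thread library (i.e., pthread_create).
A multi-process approach, in which you may want to use pipe to allow your processes to communicate (and thereby avoid just creating a new process every time).
An event-driven approach, for which the select system call may be quite useful.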
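For the pthreads option, one possible arrangement is a detached thread per accepted connection, sketched below. The names spawn_client_thread and client_thread are hypothetical, and handle_client here is only a placeholder that writes a fixed empty response; a real handler would parse the request and send the file. Older toolchains may need the -pthread compiler flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Placeholder handler (assumption): a real one would parse and serve a file. */
void handle_client(int client_fd) {
    const char *msg = "HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n";
    ssize_t n = write(client_fd, msg, strlen(msg));
    (void)n; /* errors ignored in this sketch */
}

/* Thread entry point: the descriptor is passed by value inside the pointer. */
static void *client_thread(void *arg) {
    int client_fd = (int)(intptr_t)arg;
    handle_client(client_fd);
    close(client_fd);
    return NULL;
}

/* Start a detached thread for one connection.
 * Returns 0 on success; on -1 the caller still owns client_fd. */
int spawn_client_thread(int client_fd) {
    assert(client_fd >= 0);
    pthread_t tid;
    if (pthread_create(&tid, NULL, client_thread,
                       (void *)(intptr_t)client_fd) != 0)
        return -1;
    pthread_detach(tid); /* no join needed; resources are freed on exit */
    return 0;
}
```

Passing the descriptor by value avoids sharing a stack variable between the accepting thread and the new thread, which would be overwritten by the next accept.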
Remember that HTTP requests will specify relative filenames (such as index.html) which are translated by the server into absolute local filenames. For example, if your document root is in ~username/cs3325/proj1/mydocroot, then when a request is received for foo.txt, the file that you should read is actually ~username/cs3325/proj1/mydocroot/foo.txt.
The translated filename may exist and be readable, or it may exist but be unreadable (e.g., due to file permissions), or it may not exist at all. A missing file should result in HTTP error code 404, while an inaccessible file should result in HTTP error code 403.
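The translation and the resulting status codes could be sketched as follows. The function name open_target, the 4096-byte path buffer, and the choice to return 400 for a malformed or overlong target are assumptions for this example. Note that the sketch does not guard against ".." components in the request, which a real server would need to consider.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Translate a request target into a local file under docroot and open it.
 * Returns 200 and stores the descriptor in *fd_out on success,
 * otherwise returns 400, 403, or 404 and leaves *fd_out untouched. */
int open_target(const char *docroot, const char *uri, int *fd_out) {
    assert(docroot != NULL && uri != NULL && fd_out != NULL);
    char path[4096];

    if (uri[0] != '/') return 400;  /* request targets start with '/' */
    int n = snprintf(path, sizeof path, "%s%s", docroot, uri);
    if (n < 0 || (size_t)n >= sizeof path) return 400;

    /* A directory request defaults to index.html inside that directory. */
    struct stat st;
    if (stat(path, &st) == 0 && S_ISDIR(st.st_mode)) {
        const char *sep = (path[n - 1] == '/') ? "" : "/";
        size_t room = sizeof path - (size_t)n;
        int m = snprintf(path + n, room, "%sindex.html", sep);
        if (m < 0 || (size_t)m >= room) return 400;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return (errno == EACCES) ? 403 : 404;  /* unreadable vs. missing */
    *fd_out = fd;
    return 200;
}
```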
Remember that the default filename (i.e., if just a directory is specified) is index.html. This is why the two URLs http://www.bowdoin.edu and http://www.bowdoin.edu/index.html return the same page. Also note that some pages, such as Bowdoin's home page above, actually redirect to a different (i.e., the real) home page. This redirection normally happens automatically in a browser, so you don't even realize it's happening, but if testing with telnet, you may see a very short page simply instructing the browser to request a different file instead.
When you fetch an HTML web page in a browser (i.e., a file of type text/html), the browser parses the file for embedded links (such as images) and then retrieves those files from the server as well. For example, if a web page contains 4 images, then a total of 5 files will be requested from the server. The primary difference between HTTP 1.0 and HTTP 1.1 is how these multiple files are requested.
Using HTTP 1.0, a separate connection is used for each requested file. While simple, this approach is not the most efficient. HTTP 1.1 attempts to address this limitation by keeping connections to clients open, allowing for "persistent" connections and pipelining of client requests. That is, after the results of a single request are returned (e.g., index.html), if using HTTP 1.1, your server should leave the connection open for some period of time, allowing the client to reuse that connection to make subsequent requests.
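One way to decide whether a connection stays open after a response is to look at the request's version and its "Connection" header. The sketch below assumes the full request headers are already in a NUL-terminated buffer; the function name should_keep_open is hypothetical, and the policy shown (HTTP/1.0 always closes, HTTP/1.1 stays open unless the client sends "Connection: close") is one reasonable reading of the behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Return true if the connection should remain open after this request. */
bool should_keep_open(const char *request) {
    assert(request != NULL);

    /* The version is the last token on the request line. */
    const char *eol = strstr(request, "\r\n");
    if (eol == NULL || eol - request < 8) return false;
    if (strncmp(eol - 8, "HTTP/1.1", 8) != 0) return false;

    /* Walk the header lines until the blank line that ends the headers. */
    const char *line = eol + 2;
    while (*line != '\0' && strncmp(line, "\r\n", 2) != 0) {
        /* Header names are case-insensitive. */
        if (strncasecmp(line, "Connection:", 11) == 0) {
            const char *value = line + 11;
            while (*value == ' ' || *value == '\t') value++;
            return strncasecmp(value, "close", 5) != 0;
        }
        const char *next = strstr(line, "\r\n");
        if (next == NULL) break;
        line = next + 2;
    }
    return true; /* HTTP/1.1 with no Connection header: persistent */
}
```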
One key issue here is determining how long to keep the connection open.
This timeout needs to be configured in the server and ideally should be dynamic based
on the number of other active connections the server is currently supporting. Thus if the server
is idle, it can afford to leave the connection open for a relatively long period of time.
If the server is busy servicing several clients at once, it may not be able to afford to
have an idle connection sitting around (consuming kernel/thread resources) for very long.
You should develop a simple heuristic to determine this timeout in your server (but feel
free to start with a fixed value at first).
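As one example of such a heuristic, the sketch below shrinks the idle timeout as the number of active connections grows. The function name idle_timeout_secs and the bounds of 15 seconds and 1 second are arbitrary assumptions chosen for illustration, not recommended values.

```c
#include <assert.h>

/* Assumed bounds for this sketch; tune them for your own server. */
enum { MAX_IDLE_SECS = 15, MIN_IDLE_SECS = 1 };

/* Idle servers wait longer; busy servers reclaim connections sooner. */
int idle_timeout_secs(int active_connections) {
    assert(active_connections >= 0);
    int timeout = MAX_IDLE_SECS / (active_connections + 1);
    return timeout < MIN_IDLE_SECS ? MIN_IDLE_SECS : timeout;
}
```

With these bounds, an idle server waits 15 seconds, a server with 4 other active connections waits 3 seconds, and a heavily loaded server never waits less than 1 second.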
Socket timeouts can be set using setsockopt. Another option for implementing timeouts is the select call.
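For the setsockopt route, a receive timeout can be attached to a client socket with the SO_RCVTIMEO option, as in the sketch below. The wrapper name set_recv_timeout is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Make blocking reads on fd give up after the given number of seconds.
 * Returns 0 on success, -1 on failure (as setsockopt does). */
int set_recv_timeout(int fd, int seconds) {
    assert(seconds > 0);
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}
```

Once the option is set, a recv call that receives no data within the timeout returns -1 with errno set to EAGAIN or EWOULDBLOCK, at which point the server can close the idle connection.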
Since a significant part of this assignment involves working with strings, you will want to refamiliarize yourself with C's string processing routines, such as strcat, strncpy, strstr, etc. Also remember that pointer arithmetic can often result in cleaner code (e.g., by maintaining pointers that you increment rather than numeric indices that you increment).
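To illustrate the pointer-based style, here is a sketch that splits a request line such as "GET /index.html HTTP/1.1" into its three parts using strstr, memchr, and pointer differences. The struct request_line type, its field sizes, and the function names are assumptions for this example; a return value of -1 would correspond to a 400 response.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical holder for the three request-line tokens. */
struct request_line {
    char method[8];
    char uri[1024];
    char version[16];
};

/* Copy the bytes in [start, end) into dst as a NUL-terminated string. */
static int copy_token(char *dst, size_t cap, const char *start, const char *end) {
    size_t len = (size_t)(end - start);
    if (len == 0 || len >= cap) return -1;
    memcpy(dst, start, len);
    dst[len] = '\0';
    return 0;
}

/* Parse "METHOD SP URI SP VERSION CRLF". Returns 0 if well-formed, else -1. */
int parse_request_line(const char *req, struct request_line *out) {
    assert(req != NULL && out != NULL);
    const char *eol = strstr(req, "\r\n");
    if (eol == NULL) return -1;

    /* Locate the two spaces that separate the three tokens. */
    const char *sp1 = memchr(req, ' ', (size_t)(eol - req));
    if (sp1 == NULL) return -1;
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(eol - (sp1 + 1)));
    if (sp2 == NULL) return -1;

    if (copy_token(out->method, sizeof out->method, req, sp1) != 0 ||
        copy_token(out->uri, sizeof out->uri, sp1 + 1, sp2) != 0 ||
        copy_token(out->version, sizeof out->version, sp2 + 1, eol) != 0)
        return -1;

    /* Only the two versions named in the assignment are accepted. */
    if (strcmp(out->version, "HTTP/1.0") != 0 &&
        strcmp(out->version, "HTTP/1.1") != 0)
        return -1;
    return 0;
}
```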
One important thing to remember when sending and receiving data over a network socket is that what you are really doing is reading data from, or copying data to, a lower-level network data buffer. Since these data buffers are limited in size, you may not be able to read or send all desired data at once. In other words, when receiving data, you have no guarantee of receiving the entire request at once, and when sending data, you have no guarantee of sending the entire response at once. As a result, you may need to call send or recv multiple times in the course of handling a single request.
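The usual remedy is to wrap send and recv in loops. The sketch below shows one possible pair of helpers: send_all keeps sending until every byte has been handed to the kernel, and recv_request keeps receiving until the blank line that ends the request headers has arrived. Both function names, and the choice to treat an early close or an oversized request as an error, are assumptions for this example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly len bytes. Returns len on success, -1 on error. */
ssize_t send_all(int fd, const void *buf, size_t len) {
    assert(buf != NULL);
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = send(fd, p, left, 0);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted: retry */
            return -1;
        }
        p += n;                /* advance past the bytes actually sent */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}

/* Receive until "\r\n\r\n" is seen. buf is always NUL-terminated.
 * Returns the number of bytes received, or -1 on error, early close,
 * or if the request does not fit in cap - 1 bytes. */
ssize_t recv_request(int fd, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    size_t used = 0;
    buf[0] = '\0';
    while (used + 1 < cap) {
        ssize_t n = recv(fd, buf + used, cap - 1 - used, 0);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) return -1; /* peer closed before the request completed */
        used += (size_t)n;
        buf[used] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL) return (ssize_t)used;
    }
    return -1;
}
```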
In addition to your program itself, you will also write a short paper (2-4 pages) that describes your server. A typical format for a systems-style paper such as this is something like the following:
While you do not need to rigidly adhere to this structure, it is a good basic framework to follow.
Your writeup should also clearly state anything that does not work correctly, and any major problems that you encountered.
Finally, include a discussion in your writeup addressing the following questions. Since your web server is on campus, it is unlikely that you will notice any significant performance differences between HTTP/1.0 and HTTP/1.1. Can you think of a scenario in which HTTP/1.0 may perform better than HTTP/1.1? Conversely, can you think of a scenario in which HTTP/1.1 outperforms HTTP/1.0? Think about bandwidth, latency, and file size. You should also consider the behavior of TCP when establishing a connection.
Submit your assignment to Blackboard as a tarball, e.g.:
tar czvf proj1.tar.gz your-project-files
(which will create proj1.tar.gz from the files in your-project-files). Your project files should include (1) your source files, and (2) a Makefile that allows me to build your server by running make.
Separately (by the writeup due date, which is 48 hours after the code due date), you should upload your writeup to Blackboard as a PDF.
Finally, if working in a group, you must *individually* email me a group report at the end of the project summarizing your contributions and the contributions of your partner(s) to the project. The goal of this policy is to promote an equitable distribution of work within the group. Your report is not shared with your group, but in the event of clearly uneven contributions, I reserve the right to adjust individual grades up or down from the group grade. Reports need not be lengthy, and in many cases may be as simple as "We worked on the entirety of the project together in front of one machine" or similar. Submit your individual group report to me by email by the writeup deadline. You should only submit a single group report covering both coding and writing.
Your project will be graded on following the assignment specification, your program's design and style, and the quality of your project writeup. You can (and should) consult the Coding Design & Style Guide for tips on design and style issues. Please ask if you have any questions on what constitutes good program design and/or style that are not covered by the guide.
Here is a list of available resources to help you get started with various aspects of this assignment. As always, Google and Linux man pages will also be useful.
fork (process creation) tutorial:
pthreads (thread creation) tutorial:
getopt: