Project 4 - Final Project

The previous three projects have given you experience with single-tier client/server systems, multi-tier client/server systems, and cluster-based systems. The purpose of your fourth and final project is to give you experience with computing in the wide-area, as well as designing a more complex distributed system that will run on many machines.

Your final project should be done in teams of two or three. All team members are expected to contribute equally to all parts of the project (coding, presentation, writeup, etc).

Project Specification

The specification for this project is much more open-ended than for the previous projects, and the type and purpose of your distributed system is left up to you. However, your system is expected to adhere to the following broad guidelines:

Your system should be designed to facilitate reasonable scalability. As a practical guideline, you should be able to demonstrate your system running on at least ~20 machines.
Your system should display reasonable fault-tolerance (to the extent possible, given the architecture of your system). For example, if your design includes some kind of master server, your system might not be able to survive an outage of that particular server, but your system should not go completely offline if any one of your ~20 machines goes down.
You should be able to demonstrate the effectiveness of your system using some well-defined metric(s), which can be empirically tested. Evaluating your system along these metrics will be an important part of your writeup. However, what these metrics are is left up to you.
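
As an illustration of the fault-tolerance guideline, one simple way to notice that a machine has gone down is a timeout-based heartbeat monitor. The following is a minimal sketch, not a required design: the class name HeartbeatMonitor, its methods, and the 5-second timeout are all assumptions made for this example. How heartbeats actually travel between machines (sockets, RPC, etc.) is left out.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat from each node and flag nodes that go quiet.

    Hypothetical helper for illustration; the timeout value is an assumption.
    """

    def __init__(self, timeout_s=5.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen = {}     # node id -> time of most recent heartbeat

    def heartbeat(self, node):
        """Record that `node` was heard from just now."""
        self.last_seen[node] = self.clock()

    def suspected(self):
        """Return nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)

    def alive(self):
        """Return nodes heard from within the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout_s)
```

Keep in mind that in the wide-area a slow machine can look like a dead one, so a node returned by suspected() may only be delayed; the timeout is a tradeoff between detection speed and false suspicions.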

Beyond these guidelines, creativity is encouraged! I will consider most project ideas if they are thoughtful and reasonably feasible.

Technology

The technology and language(s) you choose to use in your system are up to you. While you are welcome to use any of the technologies that we have already used or discussed in the semester to implement your system (e.g., XML-RPC, Java RMI, sockets, etc.), you are not restricted to any particular language or communication framework. You are also permitted to use third-party systems or frameworks within your project, with the understanding that you will not receive credit for functionality provided by third-party systems. For instance, constructing a 50-machine Hadoop cluster and using it to compute an inverted index would technically fulfill the project requirements, but your only real contribution would have been writing the inverted index functions.

Sample Project: P2P File Transfer

If you are struggling to come up with a project idea, a suggested project is to build a simple peer-to-peer file transfer application. In such an application, the basic idea is that a peer that wishes to download a particular file can download it simultaneously from all peers that have at least part of the file. Thus, a group of peers can distribute the file more rapidly than if a single server had to send the file in its entirety to a number of clients that want to download the file. The figure below shows this basic design (in which peers exchange 'chunks' of the file with each other).

Note that even if you choose to pursue this suggested project idea, there is still room for creativity! For example, questions that you might consider in the design of your system include (a) how do peers organize the connections between them, (b) how do peers locate a file that they wish to download, or (c) how do peers decide what chunks to download (and who to download from).
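
To make question (c) concrete, here is a minimal sketch of one possible chunk-selection policy, "rarest first": among the chunks a peer is missing, request one held by the fewest other peers. This is only one option and not the expected answer. The function name pick_next_chunk and its data layout (sets of chunk indices keyed by peer id) are assumptions made for this example.

```python
import random
from collections import Counter


def pick_next_chunk(have, peer_chunks, rng=random):
    """Pick a missing chunk and a peer to fetch it from, rarest chunk first.

    have:        set of chunk indices this peer already holds
    peer_chunks: dict mapping peer id -> set of chunk indices that peer holds
    Returns (chunk_index, peer_id), or None if no peer has anything we need.
    """
    # Count how many peers hold each chunk we are still missing.
    availability = Counter()
    for chunks in peer_chunks.values():
        for index in chunks - have:
            availability[index] += 1
    if not availability:
        return None

    # Choose among the least-replicated chunks; break ties randomly so that
    # peers do not all request the same chunk at the same time.
    rarest = min(availability.values())
    candidates = sorted(i for i, n in availability.items() if n == rarest)
    chunk = rng.choice(candidates)

    # Pick a random peer among those holding the chosen chunk.
    holders = sorted(p for p, chunks in peer_chunks.items() if chunk in chunks)
    return chunk, rng.choice(holders)
```

Other policies (in-order, random, fastest-peer-first) trade off differently against whatever metric you choose to optimize, which is worth discussing in your writeup.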

Infrastructure

As in the previous project, you will be provisioned with a set of Amazon machines with which to run your systems. The infrastructure in this project is different in several key ways, however: (1) all machines are shared among all groups, (2) machines are no longer geographically in the same area, but rather are spread around the world, and (3) you will have access to a greater number of machines, and should be aiming to run your system on as many of them as possible. Some of these machines may not be as responsive as others, or may have slower network connections, etc. This is part of the challenge of operating in the wide-area!

Since these machines are shared, you do not have sudo permissions on these machines. If you need specific software installed, let me know and I can probably install it for you. Also, please be good citizens whenever possible! While some degree of interference is inevitable, you should not try to max out all the machines transferring data at full speed for long periods at a time. This will make your classmates unhappy.

General Advice

Design

Regardless of whether you choose your own project or opt for the suggested project, it is important that you actively manage the complexity of your system, especially at first. It is much better to start with a simple design, implement it, then add features later, rather than starting with an overly complex design and never getting a working prototype! Ideally, you should start by planning a base 'core' of the system that you are sure you can implement, then a set of extensions that you can add once the base system is running.

Be thoughtful about what metric(s) you are optimizing for when you build your system. For example, in the context of the file transfer application, are you trying to minimize transfer time, aggregate bandwidth, or something else? Make sure you discuss these decisions in your writeup.
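
Whatever metric you pick, you will need a repeatable way to measure it. The following is a rough sketch, assuming the metric is transfer time for a file of known size; the helper names measure and summarize are hypothetical, and `operation` stands in for whatever call triggers a transfer in your system.

```python
import statistics
import time


def measure(operation, trials=5):
    """Run `operation` several times; return the elapsed seconds of each run."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings, bytes_moved):
    """Summarize trial timings for a transfer of `bytes_moved` bytes."""
    median = statistics.median(timings)
    return {
        "median_s": median,
        "min_s": min(timings),
        "max_s": max(timings),
        # Throughput in megabits per second, based on the median run.
        "throughput_mbps": (bytes_moved * 8 / 1e6) / median,
    }
```

Running several trials and reporting the median along with the spread matters more than usual here, since the shared wide-area machines can vary a lot from one run to the next.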

Finally, remember that system design is all about tradeoffs (e.g., performance vs fault tolerance, complexity vs scalability), and you are almost certainly going to need to make compromises. The key point is to be conscious of what these compromises are!

Implementation

One of the challenges of this project is managing and running a distributed application without an off-the-shelf control infrastructure (such as provided in Hadoop). Don't try to run your system on 20 machines by opening up 20 terminal windows! Instead, it is strongly recommended that you automate the process of deploying and running your application through scripts whenever possible. Every systems programmer should be proficient in at least one scripting language (e.g., Python, Bash, Perl), and having such a language in your toolkit will save you lots of time trying to run your system. For instance, rather than SSHing to 20 machines and issuing the same command one-by-one, just write a script that automatically issues the command over SSH to all the machines you're trying to run on! Consider automating any task that you find yourself repeatedly performing and keep the principle of DRY in mind (Don't Repeat Yourself).
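
As a starting point, a script that issues one command over SSH to many machines might look like the sketch below. It assumes that the ssh client is installed and that key-based login to each machine is already set up; the function names, the 10-second connect timeout, and the 60-second per-command timeout are choices made for this example, not requirements.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(host, command, user=None):
    """Return the argv list for running `command` on `host` over SSH."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout keeps one slow machine from stalling the whole run.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            target, command]


def run_on_hosts(hosts, command, user=None, max_workers=20,
                 runner=subprocess.run):
    """Run `command` on every host in parallel.

    Returns {host: (returncode, output)}; returncode is None if the
    command could not be run at all (timeout, ssh missing, etc.).
    `runner` is injectable so the fan-out logic can be tested without SSH.
    """
    def run_one(host):
        try:
            result = runner(
                build_ssh_command(host, command, user),
                capture_output=True, text=True, timeout=60,
            )
            return host, (result.returncode, result.stdout + result.stderr)
        except (subprocess.TimeoutExpired, OSError) as exc:
            # An unreachable or hung machine is reported, not fatal.
            return host, (None, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, hosts))
```

For example, run_on_hosts(hosts, "uptime"), where hosts is a list of the machine names you were given, returns one (exit code, output) pair per machine. Reading the host list from a file makes it easy to start with a few machines and scale up later.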

Of course, initially you will be better served by just running on a few machines during development, and then scaling up to more machines once your system is running.

You are welcome to use the regular class server while implementing your system. For instance, you might choose to develop your application on the class server, and run scripts from there to deploy and run your system on the Amazon servers.

Presentation and Writeup

In addition to designing and building your system, you will write a paper detailing and evaluating your system, as well as present your project to the class at the end of the semester as detailed below.

Paper: Your final paper should be more substantial than your writeups thus far and should be written in the style of a research paper (e.g., intro, design/architecture, implementation, evaluation, related work, conclusions). This document should be a self-contained, detailed description of your project and your evaluation of your system. You should aim for 8-10 double-spaced pages in length (~6 pages in single-spaced double-column format) and should include, at a minimum, a figure depicting the architecture of your system and 3 graphs evaluating your system with accompanying analysis. Don't underestimate how much time running experiments may take!
Presentation: You will also present the design and results of your system in a 15-minute presentation to the class during the last ~week of the semester. While your project does not need to be complete at the time of your presentation (since this will be a week or more before your final paper and code are due), you should be able to present at least preliminary results of running your system. Managing the initial complexity of your design will help with your presentation by forcing you to build your system incrementally and making sure you have a working early prototype.

Submission and Dates

You will need to submit a project proposal as well as a series of intermediate checkpoints leading to the final due date, as detailed below.

Proposal (Tuesday, April 16): 1 page describing your project and where you plan to be at each checkpoint. If I think your project is under-specified or not feasible, I may ask you to revise.
Checkpoint 1 (Tuesday, April 23): ~1 page describing your progress and your current code (even if non-functional).
Checkpoint 2 (Tuesday, April 30): ~1 page describing your progress and your current code (even if non-functional, though that would probably concern me at this point).
Presentations (Thursday, May 2 and Tuesday, May 7): in-class, order TBD. Groups presenting on the second day will have had an extra 5 days to work, which will be taken into account when considering what you have accomplished by the time of your group's presentation.
Due Date (Tuesday, May 14, 5pm): final code and writeup. This deadline is firm. Your final submission should include all source code files, any scripts you may have written to aid in running your application, your Makefile (or compile instructions) and any other information that might be needed to run your code. Also please submit individual final group reports to me by email.

All submissions (intermediate and final, except for group reports) should be made via gzipped tarball to Blackboard.

Evaluation

Your project will be graded on adherence to the assignment specification, your program's design and style, the quality of your presentation, and the quality of your final writeup. While intermediate checkpoints will not be explicitly graded as such, they are required, and failure to satisfactorily complete them may result in penalties. You can (and should) consult the Coding Design & Style Guide for tips on design and style issues. Please ask if you have any questions on what constitutes good program design and/or style that are not covered by the guide.

Note that the quality of your presentation and especially your writeup are of particular importance to this project, since they are the primary means by which I will understand what you have accomplished. Just like in a real systems research project, the final deliverable is less the system itself than the design, evaluation, and discussion of the system. As such, it is essential that you do not neglect these aspects of your project.