CSCI 3325
Distributed Systems

Bowdoin College
Spring 2019
Instructor: Sean Barker

Project 4 - Final Project

The previous three projects have given you experience with single-tier client/server systems, multi-tier client/server systems, and cluster-based systems. The purpose of your fourth and final project is to give you experience with computing in the wide-area, as well as designing a more complex distributed system that will run on many machines.

Your final project should be done in teams of two or three. All team members are expected to contribute equally to all parts of the project (coding, presentation, writeup, etc).

Project Specification

The specification for this project is much more open-ended than for the previous projects, and the type and purpose of your distributed system are left up to you. However, your system should adhere to the following broad guidelines:

Beyond these guidelines, creativity is encouraged! I will consider most project ideas if they are thoughtful and reasonably feasible.

Technology

The technology and language(s) you choose to use in your system are up to you. While you are welcome to use any of the technologies that we have already used or discussed in the semester to implement your system (e.g., XML-RPC, Java RMI, sockets, etc.), you are not restricted to any particular language or communication framework. You are also permitted to use third-party systems or frameworks within your project, with the understanding that you will not receive credit for functionality provided by third-party systems. For instance, constructing a 50-machine Hadoop cluster and using it to compute an inverted index would technically fulfill the project requirements, but your only real contribution would have been writing the inverted index functions.

Sample Project: P2P File Transfer

If you are struggling to come up with a project idea, a suggested project is to build a simple peer-to-peer file transfer application. In such an application, the basic idea is that a peer that wishes to download a particular file can download it simultaneously from all peers that have at least part of the file. Thus, a group of peers can distribute the file more rapidly than if a single server had to send the file in its entirety to a number of clients that want to download the file. The figure below shows this basic design (in which peers exchange 'chunks' of the file with each other).

[Figure: architecture of the P2P file transfer design, with peers exchanging chunks of the file]

Note that even if you choose to pursue this suggested project idea, there is still room for creativity! For example, questions that you might consider in the design of your system include (a) how do peers organize the connections between them, (b) how do peers locate a file that they wish to download, or (c) how do peers decide what chunks to download (and who to download from).
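As one illustration of question (c), the sketch below picks the next chunk to request using a "rarest first" policy: among the chunks a peer still needs, it prefers the one held by the fewest other peers. This is only one possible policy, not a required design. The function name, the peer names, and the representation of chunk ownership as a dictionary of sets are all assumptions made for the example.

```python
from collections import Counter

def pick_next_chunk(needed, peer_chunks):
    """Return (chunk_index, peer) for the rarest needed chunk, or None.

    needed      -- set of chunk indices this peer still lacks (hypothetical)
    peer_chunks -- dict mapping peer name -> set of chunk indices it holds
    """
    # Count how many peers hold each chunk that we still need.
    availability = Counter()
    for chunks in peer_chunks.values():
        for c in chunks:
            if c in needed:
                availability[c] += 1
    if not availability:
        return None  # no known peer has anything we need
    # Rarest first; ties are broken by the lowest chunk index.
    chunk = min(availability, key=lambda c: (availability[c], c))
    # Any holder would do; pick the alphabetically first for determinism.
    peer = min(p for p, chunks in peer_chunks.items() if chunk in chunks)
    return chunk, peer

if __name__ == "__main__":
    peers = {"a": {0, 1}, "b": {0, 2}, "c": {0}}
    print(pick_next_chunk({0, 1, 2}, peers))
```

A real system would likely also weigh how fast or how loaded each peer is when choosing who to download from, which is exactly the kind of tradeoff worth discussing in your writeup.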

Infrastructure

As in the previous project, you will be provisioned with a set of Amazon machines with which to run your systems. The infrastructure in this project is different in several key ways, however: (1) all machines are shared among all groups, (2) machines are no longer geographically in the same area, but rather are spread around the world, and (3) you will have access to a greater number of machines, and should be aiming to run your system on as many of them as possible. Some of these machines may not be as responsive as others, or may have slower network connections, etc. This is part of the challenge of operating in the wide-area!

Since these machines are shared, you do not have sudo permissions on them. If you need specific software installed, let me know and I can probably install it for you. Also, please be good citizens whenever possible! While some degree of interference is inevitable, you should not try to max out all the machines by transferring data at full speed for long periods of time. This will make your classmates unhappy.

General Advice

Design

Regardless of whether you choose your own project or opt for the suggested project, it is important that you actively manage the complexity of your system, especially at first. It is much better to start with a simple design, implement it, then add features later, rather than starting with an overly complex design and never getting a working prototype! Ideally, you should start by planning a base 'core' of the system that you are sure you can implement, then a set of extensions that you can add once the base system is running.

Be thoughtful about what metric(s) you are optimizing for when you build your system. For example, in the context of the file transfer application, are you trying to minimize transfer time, aggregate bandwidth, or something else? Make sure you discuss these decisions in your writeup.

Finally, remember that system design is all about tradeoffs (e.g., performance vs fault tolerance, complexity vs scalability), and you are almost certainly going to need to make compromises. The key point is to be conscious of what these compromises are!

Implementation

One of the challenges of this project is managing and running a distributed application without an off-the-shelf control infrastructure (such as provided in Hadoop). Don't try to run your system on 20 machines by opening up 20 terminal windows! Instead, it is strongly recommended that you automate the process of deploying and running your application through scripts whenever possible. Every systems programmer should be proficient in at least one scripting language (e.g., Python, Bash, Perl), and having such a language in your toolkit will save you lots of time trying to run your system. For instance, rather than SSHing to 20 machines and issuing the same command one-by-one, just write a script that automatically issues the command over SSH to all the machines you're trying to run on! Consider automating any task that you find yourself repeatedly performing and keep the principle of DRY in mind (Don't Repeat Yourself).
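A minimal sketch of such a script is shown here in Python. It assumes that key-based SSH login to each machine is already set up and that the `ssh` client is on your path. The hostnames, function names, and timeout values are placeholders chosen for the example, not details of the course infrastructure.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_ssh_command(host, command, user=None):
    """Build the argument list for running `command` on `host` over SSH."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes makes ssh fail rather than prompt for a password;
    # ConnectTimeout keeps an unresponsive machine from stalling the run.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            target, command]

def run_on_host(host, command, user=None, timeout=60):
    """Run `command` on one host; return (host, exit_code, stdout)."""
    try:
        proc = subprocess.run(build_ssh_command(host, command, user),
                              capture_output=True, text=True,
                              timeout=timeout)
        return host, proc.returncode, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        return host, None, "timed out"

def run_everywhere(hosts, command, runner=run_on_host, max_workers=20):
    """Run `command` on all hosts in parallel; results keep host order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda h: runner(h, command), hosts))

if __name__ == "__main__":
    # Hypothetical hostnames; replace with the machines you were assigned.
    hosts = ["node1.example.com", "node2.example.com"]
    for host, code, out in run_everywhere(hosts, "uptime"):
        print(f"{host}: exit={code} {out}")
```

The `runner` parameter exists so that the fan-out logic can be tried with a stand-in function before touching any real machines. The explicit timeouts matter in the wide area, where a single slow or unreachable machine would otherwise hold up the entire run.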

Of course, initially you will be better served by just running on a few machines during development, and then scaling up to more machines once your system is running.

You are welcome to use the regular class server while implementing your system. For instance, you might choose to develop your application on the class server, and run scripts from there to deploy and run your system on the Amazon servers.

Presentation and Writeup

In addition to designing and building your system, you will write a paper detailing and evaluating your system, as well as present your project to the class at the end of the semester as detailed below.

Submission and Dates

You will need to submit a project proposal as well as a series of intermediate checkpoints leading to the final due date, as detailed below.

All submissions (intermediate and final, except for group reports) should be made via gzipped tarball to Blackboard.

Evaluation

Your project will be graded on how well it follows the assignment specification, your program's design and style, the quality of your presentation, and the quality of your final writeup. While intermediate checkpoints will not be explicitly graded as such, they are required, and failure to complete them satisfactorily may result in penalties. You can (and should) consult the Coding Design & Style Guide for tips on design and style issues. Please ask if you have any questions on what constitutes good program design and/or style that are not covered by the guide.

Note that the quality of your presentation and especially your writeup are of particular importance to this project, since they are the primary means by which I will understand what you have accomplished. Just like in a real systems research project, the final deliverable is less the system itself than the design, evaluation, and discussion of the system. As such, it is essential that you do not neglect these aspects of your project.