Requirements Documentation

 

DISTRIBUTED VERSIONING SYSTEM

 

VOLVOX

 

 


 

URL: http://www.cmu.edu/volvox

By:

Rahul Raheja

Adam Goldhammer

Karthik Krishnan

Meghal Gosalia


 

Contents

Abstract
Project Description
Quality Attributes/ Functional Requirements
System Design
Architecture (diagrams using ACME Architecture Description Language - ADL)
Communication Architecture: Client Server (Request Response Type)
Node Architecture: Tiered Architecture (Call Return Type)
Architectural Elements
Code Organization - Class Diagram
Data Structures
Tracker
Reverse Lookup
Project Config file
Volvox Config file
Revision Trees
Interactions & Communication Protocol - Sequence Diagrams
Add File
Checkout
Update
Commit
Synchronization
Demo sequence
Midpoint Demo Sequence
Final Demo Sequence
Use Cases
Stretch Goals
Schedule And Responsibilities
References

Abstract

 

Most current versioning systems rely on a central server to store the project, from which users check out and check in files.  This leaves a single point of failure in the network and incurs costs for hosting the files.  A distributed approach would be cheaper, since it uses the local disks of the users for storage, and more failure tolerant, since no single node can fail and bring down the entire system.  Even with these advantages, current distributed versioning systems still try to force users into a linear workflow; users are expected to fold changes back in at each commit.  The Volvox versioning system provides distributed file storage and nondestructive editing designed for networks that are prone to fragmentation, allowing for a more streamlined workflow than other distributed versioning projects while still offering these advantages.  Volvox will allow users to commit files to the repository even if they are operating with only one other visible node, and will attempt to automatically resolve conflicts during a commit.

Project Description

 

Revision control is the management of multiple versions of a file and is commonly used in collaborative projects. Manually managing files is difficult when multiple people are working on the same project, making automated systems very helpful in tracking changes, recovering old versions, and synchronizing users.

 

There are many popular version control systems, such as Subversion and Perforce. However, these systems are based on centralized server access: users contact a designated node each time they need to check a file in or out.  This kind of architecture makes the server a single point of failure. For example, if the server is compromised or fails due to some fault, the users can’t collaborate; version tracking is also lost when the network is partitioned.  Moreover, people have to buy server space on nodes that provide versioning capabilities to host their projects. Our motivation comes from the fact that all the users have enough storage space on their own machines to host their project and even to version it. Hence, we propose to remove the central server and host the version files on the collaborating users’ own machines. This gives rise to the concept of a distributed versioning system.

 

Today, some systems already provide distributed versioning capabilities; Mercurial is one of them.  Instead of using one computer as a server, Mercurial uses file storage on all users’ machines, avoiding the central failure point. However, we believe that such systems still fall short on availability.

 

 

Consider the following scenario:

 

Two users, Alice and Bob, are working on a project (Figure 1). Both have the same working head, “d”. Bob then commits twice, creating “e” and “f”, and gets a conflict since Alice has her own working copy. Bob resolves the conflicts and then finalizes the commit. At this commit, the changes are not pushed out to other users. If Bob then goes down or goes offline, Alice cannot get a consistent set of files when she wants to commit or update. The problem here is pushing committed changes out to other users for redundancy, so that Alice has availability in most situations.

 

 

Figure 1. Picture from [1]

Volvox is a distributed versioning system that attempts to address this issue. 

Volvox has all the features of a distributed versioning system and adds more availability at the cost of a minor increase in network bandwidth.  Volvox also allows the project tree to branch as users commit, and then attempts to merge the branches automatically; where that is not possible, users resolve the conflicts later. Volvox’s distributed nature prevents any single node failure from stopping the network. Storing the files on multiple nodes removes the single point of failure, which makes the system more fault-tolerant and less expensive.  The project branching provides a much easier workflow than Mercurial-type systems, allowing coders to concentrate on coding while the system merges most files.

 

The following diagram (Figure 2) shows the overall architecture of the system and how users communicate with each other. Even in the case of a network partition, users can commit and get the latest versions within their partition (with nodes going offline or leaving partitions); later, when the partition rejoins the main network, the partitions merge.

 

 

 

Figure 2. Volvox System View


Quality Attributes/ Functional Requirements

 

 

Quality Attribute: Availability
Stimulus: Not able to get access to revision details, updates or revision files
Source(s) of the stimulus: Checkout or update of project
Relevant Environmental Conditions: Work timing differences or inter-continental fiber optic cable disruptions
Architectural Elements: Data Access Logic
System Response: User can get the latest versions of files even if not all nodes are online. Project revision files remain accessible whether or not all project members are online, and even when the node is in a partitioned network
Response Measure: All files updated

 

 

Functional Requirement: Portability
Stimulus: Installing the Volvox software on a COTS hardware unit fails
Source(s) of the stimulus: Customer
Relevant Environmental Conditions: Volvox software is being prepared for deployment at the customer location
Architectural Elements: Volvox application software and hardware
System Response: Volvox software is loaded on the device hardware platform
Response Measure: All application properties and features are installed in a reasonable period of time

 

Functional Requirement: Usability
Stimulus: User rejects the software because it requires changing the way he is used to working with a versioning system
Source(s) of the stimulus: Volvox use cases
Relevant Environmental Conditions: During initial distribution of product
Architectural Elements: User Interface component of Volvox
System Response: Behavior as expected from a client server system
Response Measure: User study showing good acceptance

 

Quality Attribute: Modifiability
Stimulus: Changing existing modules to add/remove functionality requires changes in more than one place
Source(s) of the stimulus: Developers/Maintainers
Relevant Environmental Conditions: Adding new functionality or improving existing functionality
Architectural Elements: Volvox application components
System Response: Each piece of functionality is implemented as a separate module, which eases modification
Response Measure: Modification time is as low as possible

 

 

Quality Attribute: Security
Stimulus: The files are compromised during transfer. (Volvox assumes that the individual nodes are secure and files need not be hashed.)
Source(s) of the stimulus: Volvox application software
Relevant Environmental Conditions: File transfer
Architectural Elements: Application Network Interface Module
System Response: Files are encrypted during transfer
Response Measure: -

 

 

Quality Attribute: Performance
Stimulus: Response time is too high
Source(s) of the stimulus: Volvox use cases
Relevant Environmental Conditions: When few users are online
Architectural Elements: Push/Pull update module
System Response: Parallel download and update of files (limited by the number of users online)
Response Measure: Action completes in a reasonable time using the minimum resources possible

 

 

Functional Requirement: Flexibility
Stimulus: Volvox cannot be integrated with existing editors and hence loses market interest
Source(s) of the stimulus: Promote/increase product influence and sales by having it as an embeddable plug-in in existing prominent project editors
Architectural Elements: Volvox components should have distinct APIs available, and the development has to be modularized
System Response: Embed distributed versioning techniques in existing editors
Response Measure: Ability to include Volvox versioning techniques in different existing editors, e.g. Eclipse, NetBeans, etc.

 

 

 

Quality Attribute: Scalability
Stimulus: The responsiveness of the system degrades
Source(s) of the stimulus: More users working on/joining the project
Relevant Environmental Conditions:
Architectural Elements:
System Response:
Response Measure: Responsiveness not reduced when more users are working at the same time

 

 

 

Functional Requirement: Parallel Development
Stimulus: Product not delivered
Source(s) of the stimulus: 2 months development time and testing time
System Response: Components Modularized
Response Measure: Each developing in parallel

System Design

Architecture (diagrams using ACME Architecture Description Language - ADL)

Communication Architecture: Client Server (Request Response Type)

 

 

 

 

CSConnT (connector protocol for volvox) - HTTP

 

 

Quality attributes & Requirements promoted by Client Server architecture:

 

  1. Scalability (users can join and start making connections)
  2. Availability (if one client goes down during file transfer or any other operation, connections to others can be made using the Coconut protocol)
  3. Performance (multiple clients make connections concurrently)

Node Architecture: Tiered Architecture (Call Return Type)

 

 

 

 

 

Quality attributes & Requirements promoted by Tiered Architecture:

 

  1. Modifiability (all 3 components have separate functionality that is accessible through well-defined interfaces, and hence changes in one are hidden from the others)
  2. Parallel Development (all 3 components have separate functionality that is accessible through well-defined interfaces, and hence changes in one are hidden from the others)
  3. Flexibility (modularized components give rise to well-defined access APIs that enable the software to be integrated with multiple editors)

 

 

 


Architectural Elements

 

  1. Data Access Logic - Server software and other helper modules
  2. User Interface
  3. Project File System
  4. Connector defining communication protocol
  5. Ports used for accepting data (and converting formats if required)

 

Each node will act as a client as well as a server.

 

The node behaving as a client should be able to perform the following functions:

 

  1. Check out project
  2. Update project
  3. Add files
  4. Delete files
  5. Commit files
  6. Work on multiple projects
  7. Make secure connections to other clients

 

The node behaving as a server should be able to perform the following functions:

 

  1. Receive connections from other clients
  2. Return responses for the following requests:
     a. Current highest revision number
     b. Diffs (for file changes) between two file versions
     c. File metadata
     d. Project revision tree
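
The four server-side requests listed above could be routed as in the following minimal sketch. It is an illustration under stated assumptions, not the Volvox implementation: the class name `NodeServer`, the request dictionaries, and the in-memory storage layout are all hypothetical, and the network transport is left out entirely. Diffs are computed with Python's standard `difflib` module only as a stand-in for whatever difference format Volvox uses.

```python
import difflib


class NodeServer:
    """Hypothetical in-memory stand-in for the server side of a node."""

    def __init__(self, revision_tree, file_versions, file_metadata):
        self.revision_tree = revision_tree    # list of revision numbers (assumed layout)
        self.file_versions = file_versions    # (file_id, revision) -> file text
        self.file_metadata = file_metadata    # file_id -> metadata dict

    def handle(self, request):
        """Route one request dict to the matching handler."""
        handlers = {
            "highest_revision": self._highest_revision,
            "diff": self._diff,
            "file_metadata": self._file_metadata,
            "revision_tree": self._revision_tree,
        }
        handler = handlers.get(request.get("type"))
        if handler is None:
            return {"ok": False, "error": "unknown request type"}
        return {"ok": True, "value": handler(request)}

    def _highest_revision(self, request):
        return max(self.revision_tree)

    def _diff(self, request):
        # Line-based diff between two stored versions of one file.
        old = self.file_versions[(request["file_id"], request["from"])]
        new = self.file_versions[(request["file_id"], request["to"])]
        return list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))

    def _file_metadata(self, request):
        return self.file_metadata[request["file_id"]]

    def _revision_tree(self, request):
        return list(self.revision_tree)


server = NodeServer(
    revision_tree=[1, 2],
    file_versions={("fileId1", 1): "hello", ("fileId1", 2): "hello\nworld"},
    file_metadata={"fileId1": {"path": "foo.c"}},
)
highest = server.handle({"type": "highest_revision"})
diff = server.handle({"type": "diff", "file_id": "fileId1", "from": 1, "to": 2})
print(highest["value"], diff["value"][-1])
```

Keeping the routing in one table makes it straightforward to add further request types without touching the existing handlers, which fits the modifiability goal stated earlier.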

 

 

 

 


Code Organization - Class Diagram

 

1. System View

 

2. DataAccessLogic Package

 

 

 

3. ServerPackage

 

4. FileAccess Package

 

 

 

 


Data Structures

 

Tracker

Maps user IDs to the IP addresses used to contact them.

(If a user moves, the change is propagated to the trackers at other nodes using the Tracker Update protocol, which can be implemented separately or as part of other protocols.)

 

Userid1= xxx.xxx.xxx.xxx

Userid2= xxx.xxx.xxx.xxx
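
As a rough sketch of how such a tracker file might be read, the following assumes one `userid= address` entry per line, as in the example above. The function names `parse_tracker` and `update_tracker` are hypothetical, and the addresses in the sample are documentation-range placeholders rather than real nodes.

```python
import ipaddress


def parse_tracker(text):
    """Parse 'userid= address' lines into a dict of user ID -> IP address."""
    tracker = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between entries
        user_id, sep, address = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in tracker line: {line!r}")
        # ip_address() raises ValueError for a malformed address
        tracker[user_id.strip()] = ipaddress.ip_address(address.strip())
    return tracker


def update_tracker(tracker, user_id, new_address):
    """Record that a user has moved to a new address (hypothetical helper)."""
    tracker[user_id] = ipaddress.ip_address(new_address)


sample = "Userid1= 192.0.2.10\n\nUserid2= 192.0.2.11\n"
tracker = parse_tracker(sample)
update_tracker(tracker, "Userid2", "192.0.2.12")
print(tracker["Userid2"])
```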

 

 

Reverse Lookup

Maps project revision numbers to their associated file IDs

ProjRevsionId1 = fileId1, fileId2, fileId3

ProjRevsionId2 = fileId1, fileId2, fileId4
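
One way to load this mapping is sketched below, assuming one `revision = file, file, ...` entry per line as in the example. The helper names `parse_reverse_lookup` and `files_only_in` are hypothetical and only illustrate the kind of query the table makes cheap.

```python
def parse_reverse_lookup(text):
    """Parse 'revision = file, file, ...' lines into revision -> list of file IDs."""
    lookup = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between entries
        revision_id, sep, files = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in reverse lookup line: {line!r}")
        lookup[revision_id.strip()] = [f.strip() for f in files.split(",") if f.strip()]
    return lookup


def files_only_in(lookup, revision_a, revision_b):
    """File IDs associated with revision_a but not with revision_b."""
    other = set(lookup[revision_b])
    return [f for f in lookup[revision_a] if f not in other]


sample = (
    "ProjRevsionId1 = fileId1, fileId2, fileId3\n"
    "ProjRevsionId2 = fileId1, fileId2, fileId4\n"
)
lookup = parse_reverse_lookup(sample)
print(files_only_in(lookup, "ProjRevsionId2", "ProjRevsionId1"))  # ['fileId4']
```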

 

 

 

Project Config file

Will store the project related global constants such as:

 

 

 

Volvox Config file

Will store the software (Volvox) related global constants such as:

 

 

 

Revision Trees

Each project will consist of a project file and multiple file revision tree files.  All of these files will be stored in a file named “.volvox” at the project root path.  The project file is structured as a list of successive project revisions, with each entry containing the revision ID, the parent revision ID, and a list of file ID/file name pairs altered in this revision.  Each file revision tree is structured similarly, with a header specifying the file path, followed by entries containing a revision ID, the parent revision offset, and a set of differences from the parent.  The revision ID in both files consists of a monotonically increasing number and the username of the author.

 

 

 

Project revision Tree:

Format of 1 Entry -

 

Example

ProjectRevId1:0:FileId1|| ProjectRevId2:1:FileId3, FileId4|| ProjectRevId4:1:FileId1|| ProjectRevId3:3:FileId5

 

 

 

The project revision tree is read into memory and remains there for further access.
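
Reading the example entry string into memory might look like the sketch below. The delimiters (`||` between entries, `:` between fields, `,` between file IDs) are inferred from the example alone, the parent field is kept as uninterpreted text, and the names `ProjectRevision`, `parse_project_tree`, and `children_of` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectRevision:
    revision_id: str
    parent: str           # kept as text; its exact meaning is not assumed here
    file_ids: List[str]


def parse_project_tree(text):
    """Split 'rev:parent:files|| rev:parent:files' into ProjectRevision entries."""
    entries = []
    for chunk in text.split("||"):
        chunk = chunk.strip()
        if not chunk:
            continue
        revision_id, parent, files = chunk.split(":", 2)
        file_ids = [f.strip() for f in files.split(",") if f.strip()]
        entries.append(ProjectRevision(revision_id.strip(), parent.strip(), file_ids))
    return entries


def children_of(entries, parent):
    """Revision IDs whose parent field equals `parent`; more than one means a branch."""
    return [e.revision_id for e in entries if e.parent == parent]


example = (
    "ProjectRevId1:0:FileId1|| ProjectRevId2:1:FileId3, FileId4|| "
    "ProjectRevId4:1:FileId1|| ProjectRevId3:3:FileId5"
)
tree = parse_project_tree(example)
print(children_of(tree, "1"))  # ['ProjectRevId2', 'ProjectRevId4']
```

In the example, two entries share the parent value 1, which is how a branch in the project tree shows up in this flat list.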

 

 

File Revision Tree:

Path = “\foo.c”, creator = “washington”

Revision ID | Parent offset | Differences
washington1 | NULL | Difference string
adams3 | 1 | Difference string
washington4 | 2 | Difference string
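
The file revision tree example can be modeled in memory as in the following sketch. Two points are assumptions rather than stated design: the parent offset is read as the 1-based position of the parent entry within the file (so that `adams3` points at `washington1`), and usernames are assumed not to end in a digit so that a revision ID can be split. The names `FileRevision`, `split_revision_id`, and `ancestry` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileRevision:
    revision_id: str         # e.g. "adams3"
    parent: Optional[int]    # assumed 1-based position of the parent entry; None for the root
    diff: str                # opaque difference string


def split_revision_id(revision_id):
    """Split 'adams3' into ('adams', 3); assumes usernames do not end in a digit."""
    match = re.fullmatch(r"(.*?)(\d+)", revision_id)
    if match is None:
        raise ValueError(f"not a revision ID: {revision_id!r}")
    return match.group(1), int(match.group(2))


def ancestry(entries, position):
    """Return the entries from the root down to the entry at 1-based `position`."""
    chain = []
    current = position
    while current is not None:
        entry = entries[current - 1]
        if entry.parent is not None and entry.parent >= current:
            # A parent must appear earlier in the file, otherwise the walk could loop.
            raise ValueError("parent offset does not point to an earlier entry")
        chain.append(entry)
        current = entry.parent
    chain.reverse()
    return chain


entries = [
    FileRevision("washington1", None, "Difference string"),
    FileRevision("adams3", 1, "Difference string"),
    FileRevision("washington4", 2, "Difference string"),
]
chain_ids = [e.revision_id for e in ancestry(entries, 3)]
print(chain_ids)  # ['washington1', 'adams3', 'washington4']
```

Applying the difference strings along this chain, in order, would rebuild the file at the requested revision; the diff format itself is not specified here and is left opaque.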

 

 

 

 

Interactions & Communication Protocol - Sequence Diagrams

Add File

Checkout

Update

Commit

Synchronization

Demo sequence

The midpoint and final demos will both be performed with 4 computers running the Volvox program.

 

Midpoint Demo Sequence:

This demo will show the basic functionality of the Volvox program in the presence of a full network and no conflicts.  We will assume that the users have downloaded the tracker file for a project, have the Volvox software deployed, and have already joined the project.  One user will then add a file to the project and commit it.  The other three users will update to the new revision and make a change to a file.  They will then commit the changed file.

 

Final Demo Sequence:

This demo will show the full Volvox functionality: its ability to merge files committed while the network was partitioned, and its ability to handle node failures during commit and update.  First, one user will create a project with the creation command and will send the tracker file by email to the other 3 users.  These users will use the file to join the project.  One user will then add a file to the project and commit it.  The other three users will update to the new revision, and the network will then be partitioned.  One user on each subnet will change the file and commit the change.  The networks will then be rejoined and the users will update, showing the automerge feature.  One user will then make changes to the file and start a commit.  During the commit, that user's computer will be disabled and the network will be forced to recover, showing the failure tolerance of the system.  Finally, two different people will each create a file with the same name, add it to the project, and attempt to commit it.  This will show the system’s ability to resolve file name conflicts.


Use Cases

 

 

 

 

 

Stretch Goals

 

File level access rights – Our current design assumes that all users working on a given project can access all files. This may be true for academic or free software, but larger commercial software is often developed as modules worked on independently by separate groups.  For such projects, it would make sense to provide read and write access at the directory or file level.

 

User specific public/private key encryption – Our current design does not account for malicious users; these users may be part of a project and try to poison a function, or may be external actors who have simply acquired the shared key for the project.  User specific public keys would ensure that all changes come from known users, and would allow a malicious revision to be traced back to a specific user.

 

Graphical User Interface – Our current design uses command line interaction with the user; while this is a flexible and powerful way of interacting, it is not friendly for new users.  A graphical interface that hides the command line would improve the user experience.

Schedule And Responsibilities

References

 

[1] http://www.selenic.com/mercurial/wiki/index.cgi/UnderstandingMercurial