Our Experiences with CVS
Jim Blandy
Before starting Cyclic Software, I spent several years working for the Free Software Foundation (FSF), the organization which has produced GNU Emacs, the GNU C Compiler, and many other widely used programming tools. I worked on version 19 of GNU Emacs, a system containing 400,000 lines of source code in 700 source files. At the time, we used the Revision Control System (RCS) to manage our sources.
I am not sure if you are familiar with RCS; I will summarize its operation here. RCS uses a lock-modify-unlock development model. That is, before a developer can modify a file, she must first ask RCS for a lock on the file. When she completes her changes, the developer checks in her changes as a new revision of the file, and releases the lock. Only one developer may hold a lock on a file at a time, and one must hold a lock on the file (perhaps only briefly) to check in a new revision of the file.
RCS's locking mechanism effectively serializes all changes to a given file. Under this system, there is clearly no risk that two developers will simultaneously make incompatible modifications to a file; the locks ensure that one of them must complete her changes before the other can begin. However, this also means that developers cannot make even non-conflicting changes to the file (in separate areas, for example) without one waiting for the other. RCS can therefore serialize changes more than necessary, and thus delay work. While at the FSF, I found I had to mail the other Emacs developers frequently, asking them to release locks on files; usually, they had either finished their work and forgotten to release the lock, or were working on a separate region of the file, in which case I could have made my change safely without their involvement.
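The lock-modify-unlock model described above can be sketched as a toy in-memory store. This is only an illustration of the rules, not how RCS is implemented; the class `LockingRepository`, its methods, and the file and developer names are all hypothetical.

```python
class LockError(Exception):
    """Raised when a rule of the toy locking model is violated."""


class LockingRepository:
    """Toy lock-modify-unlock store: at most one lock holder per file."""

    def __init__(self):
        self.revisions = {}  # filename -> list of checked-in contents
        self.locks = {}      # filename -> developer holding the lock

    def lock(self, filename, developer):
        holder = self.locks.get(filename)
        if holder is not None and holder != developer:
            raise LockError(f"{filename} is locked by {holder}")
        self.locks[filename] = developer

    def check_in(self, filename, developer, content):
        # A developer must hold the lock to check in a new revision.
        if self.locks.get(filename) != developer:
            raise LockError(f"{developer} does not hold the lock on {filename}")
        self.revisions.setdefault(filename, []).append(content)
        del self.locks[filename]  # checking in releases the lock


repo = LockingRepository()
repo.lock("buffer.c", "alice")

# Bob must wait, even if he only wants to touch a different part of the file.
try:
    repo.lock("buffer.c", "bob")
    blocked = False
except LockError:
    blocked = True

repo.check_in("buffer.c", "alice", "revision by alice")
repo.lock("buffer.c", "bob")  # succeeds only after Alice's check-in
```

In this sketch, a forgotten lock corresponds to Alice never calling `check_in`: Bob stays blocked until someone asks her to release it.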
I eventually completed my work at the FSF, and took a position in the Indiana University Biology Department, working on a gene editor, for use in comparative analysis of organisms. The completed editor contained 36,000 lines of code in 89 files. I collaborated on the editor with Karl Fogel, who is now my partner at Cyclic Software. Since Karl was at the University of Illinois at Urbana-Champaign, we needed a system for keeping our sources synchronized across the Internet. A friend provided us with a network-transparent version of CVS, which we used to manage the project; that code eventually became the official version of CVS.
Unlike RCS, CVS uses a copy-modify-merge development model. Under this model, each developer has her own working copy of each source file, which she may edit at any time. When her changes are complete, she performs an update operation, which merges into her working files any changes made by other developers. She then performs a commit operation, to publish the merged sources to the group. CVS flags any textual conflicts between the developer's own changes and those made by others, and requires her to resolve the conflicts before committing her changes.
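The merge step of copy-modify-merge can be illustrated with a deliberately simplified three-way merge. The sketch below assumes every edit changes a line in place, so all three versions have the same number of lines; real merge tools also handle inserted and deleted lines, and this is not the algorithm CVS itself uses. The function name `merge_lines` and the sample file contents are hypothetical.

```python
def merge_lines(base, mine, theirs):
    """Line-by-line three-way merge for versions of equal length.

    base   -- the common ancestor both developers started from
    mine   -- my working copy
    theirs -- the version committed by another developer
    Returns (merged_lines, conflict_line_numbers).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:      # both sides agree, or neither changed the line
            merged.append(m)
        elif m == b:    # only the other developer changed this line
            merged.append(t)
        elif t == b:    # only I changed this line
            merged.append(m)
        else:           # both changed it differently: a textual conflict
            merged.append(m)
            conflicts.append(number)
    return merged, conflicts


base = ["int a = 1;", "int b = 2;", "int c = 3;"]
mine = ["int a = 10;", "int b = 2;", "int c = 3;"]

# Changes in separate areas merge cleanly.
theirs = ["int a = 1;", "int b = 2;", "int c = 30;"]
clean, clean_conflicts = merge_lines(base, mine, theirs)

# Changes to the same line are flagged for the developer to resolve.
theirs_overlapping = ["int a = 99;", "int b = 2;", "int c = 3;"]
_, overlap_conflicts = merge_lines(base, mine, theirs_overlapping)
```

Only the second case would require manual resolution before a commit; in the first, both developers' edits survive without either one waiting for the other.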
In contrast with RCS, CVS performs no locking on source files; any developer can edit any file at any time. Instead, synchronization occurs via the merge/commit process. The question now becomes, how often do conflicts occur, and how difficult are they to resolve? In our experience, conflicts occur rarely. During the period Karl and I used CVS to manage the gene editor sources, we found one conflict roughly every two months. All our conflicts were straightforward to resolve. I believe conflict frequency depends partially on how cleanly the team has divided the project.
The rarity of serious conflicts may be surprising, until one realizes that they occur only when two developers disagree on the proper design for a given section of code; such a disagreement suggests that the team has not been communicating properly in the first place. In order to collaborate under any source management regimen, developers must agree on the general design of the system; given this agreement, overlapping changes are usually straightforward to merge.
Of course, CVS can only detect textual conflicts between changes, not semantic conflicts. If one developer changes the semantics of a function, and another developer adds a new call to that function expecting the old semantics, CVS alone will not warn them of this situation. However, one can configure CVS to require the program under development to pass a test suite before committing any changes. This warns the developer of any semantic conflicts that are visible to the test suite. Since our gene editor project was relatively small, Karl and I simply agreed to test changes manually before committing them.
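A semantic conflict of this kind can be shown with a small, hypothetical example. Suppose one developer changes a function's argument from seconds to milliseconds while another adds a call that still assumes seconds. The two edits touch different lines, so a textual merge succeeds, and only a test exposes the problem. All names below are invented for illustration, and `may_commit` is a stand-in for a pre-commit check, not actual CVS configuration.

```python
def delay_in_seconds(milliseconds):
    # Developer A's change: the argument is now in milliseconds
    # (before the change it was in seconds and returned unchanged).
    return milliseconds / 1000


def startup_delay():
    # Developer B's new call, written against the OLD semantics:
    # the intent is "wait 5 seconds".
    return delay_in_seconds(5)


def run_test_suite():
    """Return the names of failing tests; an empty list means all passed."""
    failures = []
    if startup_delay() != 5:
        failures.append("startup_delay_is_five_seconds")
    return failures


def may_commit():
    # Hypothetical gate: allow the commit only if the test suite passes.
    return not run_test_suite()


failures = run_test_suite()
allowed = may_commit()
```

Here the merged code is textually consistent, yet the gate refuses the commit because the test observes a delay of 0.005 seconds instead of 5. A conflict that no test exercises would still slip through.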
I do not have personal experience using CVS to manage very large projects. However, I do know that my successors at the FSF are now using CVS to manage GNU Emacs, which is rather large. Also see the paper "CVS II: Parallelizing Software Development", by Brian Berliner, which describes the author's experiences using an ancestor of CVS to manage the SunOS 4.0 kernel source tree, which contains "over a thousand files spread across a hierarchy of dozens of directories."
Our experience with CVS has generally been quite positive. The ancestors of CVS have been in widespread use in the Unix community for several years now. CVS itself performs reliably for us; the internal design is relatively clean, so the bugs we have encountered have been straightforward to fix. Because Karl and I collaborated across a wide-area network, we relied on CVS's network transparency. It is indeed transparent; merging worked just as well remotely as locally. In our view, its most serious shortcoming was the lack of serious support and development; we intend Cyclic Software to fill that need.
