CVS interface to RCS library
Although the title of this page refers to an RCS library, the main point of writing this page is to imagine replacing the backend revision storage of CVS with something rather different than RCS. We refer to the backend library as the "version control subsystem" or "the library". Possible backends include:
- PRCS 2.x with xdelta,
- SCCS (the most obvious win is checksums, the file format has some other potential advantages too)
- some kind of database format with separated metadata (the goals of this item might end up being fulfilled by the PRCS item above, depending on how PRCS develops)
- whatever backends decide to implement the Versioning Extensions to DAV; see that page for more discussion.
The idea is that the interface would be an API based on the RCS_* interface which is currently used between the "RCS library" in CVS (rcs.c and rcscmds.c, roughly) and the rest of CVS. Trying to do this on top of CORBA, a network/pipe protocol, or some such approach is likely to be too slow and/or awkward, and the existing protocols like CVS and CVSup are probably a better approach to working across a network anyway.
The rest of this file contains detailed notes about making the RCS_* interface useful in this kind of way. It ranges from detailed to sketchy, so consider yourself warned.
Things to include
Design goal: all data is passed via callbacks, arguments, &c; no files are used to communicate between CVS and the version control subsystem.
To adopt the way that CVS does things now, we would need to make some fields of the RCSNode part of the interface. Probably better off going with the RCS_getexpand style. I think that cleaning this up would remove the need for RCS_reparsercsfile in the interface (currently called at the start of admin_fileproc).
Need some kind of function to free/close an RCSNode (see call to RCS_reparsercsfile at the end of admin_fileproc). Also see freercsnode.
Magic branches: ideally would largely hide the difference between 1.2.3 and 1.2.0.3. Not sure quite how.
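One possible way to hide the difference, sketched here only as an illustration: the library could normalize branch numbers before handing them to callers. The function name vc_normalize_branch is hypothetical, and the test for "magic" (an even number of components with 0 as the second-to-last one) is my reading of how CVS numbers such branches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of CVS: copy REV into OUT, dropping the
   ".0" that marks a magic branch, so "1.2.0.3" becomes "1.2.3".
   Returns 0 on success, -1 if OUT is too small. */
int vc_normalize_branch(const char *rev, char *out, size_t outsize)
{
    size_t len = strlen(rev);
    size_t dots = 0;
    const char *p;
    const char *last = strrchr(rev, '.');

    if (len + 1 > outsize)
        return -1;

    for (p = rev; *p != '\0'; p++)
        if (*p == '.')
            dots++;

    /* Assumed rule: a magic branch has an even number of components (an
       odd number of dots, at least three) and "0" as its second-to-last
       component. */
    if (dots >= 3 && dots % 2 == 1 && last[-1] == '0' && last[-2] == '.') {
        size_t prefix = (size_t)(last - rev) - 2;  /* length of "1.2" */
        memcpy(out, rev, prefix);
        strcpy(out + prefix, last);                /* append ".3" */
        return 0;
    }

    /* Anything else is passed through unchanged. */
    strcpy(out, rev);
    return 0;
}
```

This only covers the direction from the stored form to the visible form; going the other way is where the "not sure quite how" remains.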
Locks: I guess lock_tree_for_write, Lock_Cleanup, &c, or some analogues thereof, want to be part of this interface too, since a database is going to want to do locking its own way. Haven't thought about this carefully - the lock_tree_for_write interface seems a bit ugly, especially in this context.
Date format: At the API level, probably want to insist on the variation of the RCS date with 4 digit years (even if the year is before 2000). Either that or just go for an ISO8601 subset. There is no particular win, and many headaches, associated with using the operating system's format (time_t, VMS quadword, Win32's format, &c).
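A minimal sketch of what insisting on 4 digit years could look like at the API boundary, assuming the layout YYYY.MM.DD.hh.mm.ss. The names struct vc_date, vc_date_format and vc_date_parse are made up for illustration. The code checks only the layout, not whether the month, day, and so on are in range.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical broken-down date; the field names are illustrative. */
struct vc_date { int year, mon, day, hour, min, sec; };

#define VC_DATE_LEN 19  /* strlen("YYYY.MM.DD.hh.mm.ss") */

/* Write D into OUT in the fixed-width form. Returns 0 on success, -1 if
   the result does not have the expected width or does not fit. */
int vc_date_format(const struct vc_date *d, char *out, size_t outsize)
{
    int n = snprintf(out, outsize, "%04d.%02d.%02d.%02d.%02d.%02d",
                     d->year, d->mon, d->day, d->hour, d->min, d->sec);
    return (n == VC_DATE_LEN && (size_t)n < outsize) ? 0 : -1;
}

/* Parse S, accepting only the fixed-width form with a 4 digit year.
   Returns 0 on success, -1 on any other layout (such as a 2 digit year). */
int vc_date_parse(const char *s, struct vc_date *d)
{
    size_t i;

    if (strlen(s) != VC_DATE_LEN)
        return -1;
    for (i = 0; i < VC_DATE_LEN; i++) {
        int want_dot = (i == 4 || i == 7 || i == 10 || i == 13 || i == 16);
        if (want_dot ? s[i] != '.' : !isdigit((unsigned char)s[i]))
            return -1;
    }
    if (sscanf(s, "%d.%d.%d.%d.%d.%d", &d->year, &d->mon, &d->day,
               &d->hour, &d->min, &d->sec) != 6)
        return -1;
    return 0;
}
```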
File attributes. File attributes want to be part of the backend (I think, might want to think a bit more about cases like SCCS). Probably an interface similar to the current fileattr_* would work but would be nice to clean it up (look for interfaces like dbm for ideas).
The RCS_checkout analogue should only support the callback interface, not access files directly. The callback interface needs to be extended to pass the mode and PreservePermissions information. This separation needs to happen anyway (it is the cleanest way to avoid storing entire files in memory, for example).
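To make the extended callback interface a little more concrete, here is a rough sketch. Everything in it (struct vc_checkout_sink, its meta and data members, vc_checkout_demo) is hypothetical, and only the mode is passed; the rest of the PreservePermissions information would presumably travel the same way. The "backend" is a toy that hands out a string in small pieces.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sink that the caller hands to the checkout function. */
struct vc_checkout_sink {
    void *closure;
    /* Called once, before any data, with the permission bits to apply. */
    int (*meta)(void *closure, unsigned int mode);
    /* Called zero or more times with consecutive pieces of the file. */
    int (*data)(void *closure, const char *buf, size_t len);
};

/* Toy stand-in for a backend: delivers TEXT in CHUNK-byte pieces, so that
   the caller never needs the whole file in memory at once. */
int vc_checkout_demo(const char *text, unsigned int mode, size_t chunk,
                     const struct vc_checkout_sink *sink)
{
    size_t left = strlen(text);

    if (chunk == 0)
        return -1;
    if (sink->meta(sink->closure, mode) != 0)
        return -1;
    while (left > 0) {
        size_t n = left < chunk ? left : chunk;
        if (sink->data(sink->closure, text, n) != 0)
            return -1;
        text += n;
        left -= n;
    }
    return 0;
}

/* Example caller side: remember the mode and collect the data into a
   small fixed buffer. */
struct vc_collect {
    unsigned int mode;
    size_t used;
    char buf[64];
};

int collect_meta(void *closure, unsigned int mode)
{
    struct vc_collect *c = closure;
    c->mode = mode;
    return 0;
}

int collect_data(void *closure, const char *buf, size_t len)
{
    struct vc_collect *c = closure;
    if (len > sizeof c->buf - 1 - c->used)
        return -1;                      /* would overflow the buffer */
    memcpy(c->buf + c->used, buf, len);
    c->used += len;
    c->buf[c->used] = '\0';
    return 0;
}
```

A caller that wants a file on disk would open it in meta (using the mode) and write in data; a caller like RCS_cmp_file would compare instead of writing.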
RCS_deltas - need the annotate (not checkout) variant of this. Unless there should be some lower-level alternative. As with RCS_checkin and a few of the others, issues of how to replace cvs_output. Could do something like tagged text, but probably separate callbacks for each kind of data is cleaner. Error handling would, of course, continue to be a general-purpose callback.
RCS_checkin - similar to status quo but providing file contents via a read()-like callback.
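A sketch of what a read()-like callback might look like, under the assumption that it returns the number of bytes stored, 0 at end of input, and -1 on error. The names vc_read_fn, vc_checkin_demo and vc_mem_read are invented for the example, and the toy backend only counts bytes where a real one would compute and store a delta.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical read()-like callback: fill BUF with up to SIZE bytes and
   return the number stored, 0 at end of input, or -1 on error. */
typedef long (*vc_read_fn)(void *closure, char *buf, size_t size);

/* Toy stand-in for the backend side of a checkin: drain the reader and
   report how many bytes were supplied, or -1 if the reader failed. */
long vc_checkin_demo(vc_read_fn reader, void *closure)
{
    char buf[8];    /* deliberately small, to force several calls */
    long total = 0;
    long n;

    while ((n = reader(closure, buf, sizeof buf)) > 0)
        total += n;
    return n < 0 ? -1 : total;
}

/* Example caller side: serve the contents from memory. */
struct vc_mem_source {
    const char *p;
    size_t left;
};

long vc_mem_read(void *closure, char *buf, size_t size)
{
    struct vc_mem_source *s = closure;
    size_t n = s->left < size ? s->left : size;

    memcpy(buf, s->p, n);
    s->p += n;
    s->left -= n;
    return (long)n;
}
```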
RCS_parse - Note that "repository" and "file" are the way that the caller identifies what to operate on, but they need not correspond to actual files in a filesystem.
RCS_setattic - Note that as far as the interface is concerned, "atticness" is just a single bit that the version control subsystem maintains. Trying to have the API specify the performance characteristics of this bit, if any, is probably unnecessary (although of course callers do care about performance).
Find_Names Find_Directories - We probably don't want the List data structure to be a part of the interface (although some walklist-like callbacks might be). There is a somewhat subtle ugliness in the vicinity of here, RCS_parse, and so on, namely that RCS_parse would really like to know whether we found the file in the Attic or not, so that it doesn't have to look for the file again (with bad results for error handling). So it seems like Find_Names wants to return an inattic flag which gets passed to RCS_parse, or something like that. Of course, with a different backend all that might end up being degenerate (but harmless).
RCS_check_kflag and such - basically two choices. One is that the interface provides a way for a caller to get lists of possible kflag settings and such. The other is that the interface just contains an "enum kflag" or similar. I think I kind of like the latter.
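For the "enum kflag" choice, something along these lines might do. The names carry a vc_ prefix to make clear that they are hypothetical rather than the existing CVS declarations; the six values are the standard RCS keyword substitution modes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum covering the RCS keyword substitution modes. */
enum vc_kflag {
    VC_KFLAG_KV,   /* -kkv: keyword and value (the default) */
    VC_KFLAG_KVL,  /* -kkvl: keyword, value, and locker */
    VC_KFLAG_K,    /* -kk: keyword only */
    VC_KFLAG_V,    /* -kv: value only */
    VC_KFLAG_O,    /* -ko: old keyword string, unchanged */
    VC_KFLAG_B     /* -kb: binary; like -ko, with no line-end conversion */
};

static const char *const vc_kflag_names[] = {
    "kv", "kvl", "k", "v", "o", "b"
};

#define VC_KFLAG_COUNT (sizeof vc_kflag_names / sizeof vc_kflag_names[0])

/* Map a mode string such as "kv" to the enum. Returns 0 on success,
   -1 if S is not a known mode. */
int vc_kflag_parse(const char *s, enum vc_kflag *out)
{
    size_t i;

    for (i = 0; i < VC_KFLAG_COUNT; i++) {
        if (strcmp(s, vc_kflag_names[i]) == 0) {
            *out = (enum vc_kflag)i;
            return 0;
        }
    }
    return -1;
}

/* Map the enum back to its mode string. */
const char *vc_kflag_name(enum vc_kflag k)
{
    return vc_kflag_names[k];
}
```

With this, validating a user-supplied mode stays on the caller side, and the backend only ever sees enum values.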
Some analogue to add_rcs_file. Haven't thought too much about what the interface should be.
RCS_getdate - vendor branch, or default branch, stuff is an open issue which I haven't thought too much about.
RCS_gettag RCS_tag2rev RCS_getversion RCS_whatbranch RCS_getbranch RCS_branch_head - might think about whether some shorter set of functions would be simpler.
RCS_exist_rev RCS_exist_tag
RCS_magicrev - hmm, need to see how this is used.
RCS_head RCS_setbranch RCS_getexpand RCS_setexpand
RCS_isdead - also, access to the state field (or equivalent) in general. admin.c and log.c deal with ->state directly, need some kind of programmatic interface. And an attempt to figure out how to handle this if the backend doesn't do things quite the same way.
RCS_nodeisbranch
RCS_lock RCS_unlock - doing this one file at a time isn't necessarily quite right in the long run, but for now this will do.
RCS_delete_revs
RCS_addaccess RCS_delaccess RCS_getaccess - kind of specific to RCS files (I think).
Need a way for the caller to specify the CVSUMASK.
RCS_rewrite - would need to think fairly hard about the semantics of this (it is some sort of analogue to a database commit, in which the changes don't count until we call RCS_rewrite).
RCS_getrevtime - want a version of this which uses RCS format dates rather than time_t.
RCS_symbols - need to see how this is used outside of rcs.c. We probably don't want the List data structure to be a part of the interface (although some walklist-like callbacks might be). RCS_getlocks - likewise.
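As an illustration of a walklist-like callback that keeps List out of the interface, consider the sketch below. The names vc_symbol_fn, vc_walk_symbols and struct vc_symbol are hypothetical, and the array is a toy stand-in for whatever the backend stores internally. The convention that a nonzero return stops the walk is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-symbol callback. A nonzero return stops the walk and
   is passed back to the caller of vc_walk_symbols. */
typedef int (*vc_symbol_fn)(const char *tag, const char *rev, void *closure);

/* Toy backend storage; a real backend would keep this private. */
struct vc_symbol {
    const char *tag;
    const char *rev;
};

/* Call FN once per symbol, in order, until it returns nonzero. */
int vc_walk_symbols(const struct vc_symbol *syms, size_t n,
                    vc_symbol_fn fn, void *closure)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int ret = fn(syms[i].tag, syms[i].rev, closure);
        if (ret != 0)
            return ret;
    }
    return 0;
}

/* Example callback: look for one tag and remember its revision. */
struct vc_find_tag {
    const char *want;
    const char *found;
};

int vc_find_tag_cb(const char *tag, const char *rev, void *closure)
{
    struct vc_find_tag *f = closure;

    if (strcmp(tag, f->want) == 0) {
        f->found = rev;
        return 1;       /* found it; stop walking */
    }
    return 0;
}
```

The same shape would presumably serve for RCS_getlocks, with a locker and a revision in place of a tag and a revision.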
make_file_label - need to see how this is used.
RCS_settag RCS_deltag
RCS_check_tag
RCS_valid_rev - not completely sure what a caller would use this for, but I suppose it is harmless enough to include. Although I don't think I like the idea of it varying between version control subsystems, so maybe the API should just specify a syntax and callers can do their own variant of RCS_valid_rev.
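If the API did just specify a syntax, a caller's own check could be as small as the following. The syntax assumed here (one or more decimal numbers separated by single dots) is an example of what the API might specify, not a statement of what RCS_valid_rev accepts today, and vc_valid_rev is a made-up name.

```c
#include <ctype.h>

/* Hypothetical syntax check: one or more decimal numbers separated by
   single dots. Returns 1 if REV matches, 0 otherwise. */
int vc_valid_rev(const char *rev)
{
    int need_digit = 1;     /* true at the start and right after a dot */

    for (; *rev != '\0'; rev++) {
        if (isdigit((unsigned char)*rev))
            need_digit = 0;
        else if (*rev == '.' && !need_digit)
            need_digit = 1;
        else
            return 0;       /* stray character, or an empty component */
    }
    return !need_digit;     /* reject "" and a trailing dot */
}
```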
Things to leave out (or think harder about)
RCS_parsercsfile - this is used by "cvs admin -A". The solution is just to have "cvs admin -A" give an error if we are using vc_ops. commit.c (fixaddfile) - can just be using RCS_parse, no? (checkaddfile): Related to the add_rcs_file case. mkmodules.c (checkout_file): Could be using RCS_parse, I think.
RCS_fully_parse - This is an interface which gets all the metadata (that is, everything except the actual deltas/files) from an RCS file. Only used by log.c (log_fileproc). Need some such interface but may or may not be a good idea to include it in this particular form.
RCS_datecmp - Callers and version control subsystems can independently implement functionality as trivial as this.
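To show just how trivial: assuming both dates are in the fixed-width form with 4 digit years (YYYY.MM.DD.hh.mm.ss), lexicographic order is chronological order, so a comparison is a single strcmp. The name vc_datecmp is hypothetical. Note that this would not hold if 2 digit years were allowed to mix with 4 digit ones.

```c
#include <string.h>

/* Compare two dates in the fixed-width "YYYY.MM.DD.hh.mm.ss" form.
   Returns <0, 0, or >0 as A is earlier than, equal to, or later than B.
   Assumes both strings use 4 digit years. */
int vc_datecmp(const char *a, const char *b)
{
    return strcmp(a, b);
}
```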
RCS_cmp_file - Can be implemented in terms of the RCS_checkout callback interface (once the latter passes PreservePermissions info back to the caller).
rcs_change_text - this is a somewhat ugly interface (in that it wants files to be stored contiguously in memory). What to do with it is pretty much a separate issue from RCS_* stuff anyway.
RCS_exec_rcsdiff RCS_merge - this wants to be on the caller side, I think. (It sits on top of RCS_checkout and friends). Although note that making that decision does assume RCS-style (diff3-based) merging rather than SCCS-style (which is based on tracing lines back to the common ancestor, I believe). In that vein, see the above notes concerning a low-level alternative to RCS_deltas.
Details
Probably best to make the library pluggable at run-time, which is what we sketch here.
Setting in CVSROOT/config picks a library to dlopen().
vc_ops vector contains:
- a function pointer for each of the RCS_* functions, filled in by version control subsystem.
- a data pointer for use by the version control subsystem.
- some functions provided by CVS (error(), cvs_output_*, &c).
The library contains a vc_version function, which gets passed a version number and returns the version number supported. The vc_init function gets passed a pointer to the vc_ops structure which corresponds to that version, and the library fills it in.
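The following is a rough sketch of that handshake, with many assumptions. Only the names vc_ops, vc_version and vc_init come from the notes above; the member names, the signatures, the single exist_rev operation, and the "return the lower of the two versions" rule are all invented for illustration. To keep the example self-contained, the "library" is linked in directly; in the real arrangement CVS would find vc_version and vc_init with dlsym() after the dlopen().

```c
#include <stddef.h>
#include <string.h>

#define VC_API_VERSION 1    /* version this toy library implements */

/* Hypothetical layout of the ops vector for version 1. */
struct vc_ops {
    /* Filled in by the version control subsystem: one pointer per RCS_*
       analogue (only one is shown here). */
    int (*exist_rev)(void *data, const char *rev);
    /* Filled in by the subsystem, for its own use. */
    void *data;
    /* Provided by CVS before vc_init is called. */
    void (*error)(int status, const char *msg);
};

/* --- library side --- */

struct toy_state {
    const char *head;       /* the only revision this toy knows about */
};

static struct toy_state toy = { "1.1" };

static int toy_exist_rev(void *data, const char *rev)
{
    struct toy_state *st = data;
    return strcmp(rev, st->head) == 0;
}

/* Given the version CVS asks for, return the version the library will
   speak (assumed rule: the lower of the two). */
int vc_version(int requested)
{
    return requested < VC_API_VERSION ? requested : VC_API_VERSION;
}

/* Fill in the library's part of OPS. Returns 0 on success, -1 if CVS did
   not supply its own callbacks first. */
int vc_init(struct vc_ops *ops)
{
    if (ops == NULL || ops->error == NULL)
        return -1;
    ops->exist_rev = toy_exist_rev;
    ops->data = &toy;
    return 0;
}

/* --- CVS side (stand-in for error()) --- */

int demo_error_count;

void demo_error(int status, const char *msg)
{
    (void)status;
    (void)msg;
    demo_error_count++;
}
```

One thing the sketch glosses over is that a different version would mean a different struct layout, so CVS has to pick the structure to pass to vc_init based on what vc_version returned.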
Legal aspects (note the two interpretations of the GPL, with respect to whether dlopen() creates a derived work).
