BMI Students

Sunday, February 27, 2005

geekery: setting up a version control system for your files.

For a while, I've been meaning to set up a CVS or CVS-like repository for some of the files I actively edit, so I can track changes and roll back errors. I think this should be useful for more than just code: one of my roommates swears by CVS for tracking the edits he makes to LaTeX documents, and I figure I will use it for keeping multiple revisions of artwork/figures that I make in Illustrator, or presentation files, or entire websites, or whatever.

So a few days ago, I bit the bullet and set up a repository. I chose Subversion as my versioning system over CVS because it is a bit more modern and has the maddening bugs of CVS removed. (Subversion, for example, manages directory trees whereas CVS manages files only; this allows Subversion to be much smarter about moving, renaming and deleting directories of stuff than CVS is.)

Here's a quick guide to how to do it.

These directions are just for setting up a repository you alone plan on using. Things get (only slightly) more complex if you want to allow multiple users: you'll have to RTFM for that.
  1. Get the Subversion user manual (html | pdf). It's excellent.
  2. Download copies of the Subversion binaries for your platform, or get the source and compile it yourself. Or you could use a ports manager like fink, or BSD ports, or apt-get or whatever. For OS X, I found that the fink install had a lot of largely unnecessary dependencies, so I just installed the binary package from the Subversion page.
  3. Install Subversion on (a) the machine you intend to host your repository, and (b) the machine that you will use the repository from. (Machines a and b can be the same!) Note that if you want to do network client/server business but don't have admin rights on the box you want to use as a server, that's no problem. All you need is a user account and the ability to SSH to that account: just put the Subversion binaries somewhere in your home directory (say under ~/bin), and make sure to add that directory to your path in your .tcshrc or .bashrc. SSH will find them.
  4. Logged into machine (a), the Subversion host, run the following command: svnadmin create /path/to/repository (it's really that easy). In my case, the command was svnadmin create /Users/zpincus/Projects . Note that the repository directory (e.g. "Projects") need not exist beforehand.
  5. Now, go to machine (b), the client. We'll make a checkout of the Subversion repository by running the following command: svn checkout svn+ssh:// path/to/local/checkout where "path/to/local/checkout" is the directory that you want to check the remote repository into. If you look in the directory that you just checked the repository out into, you'll find it is empty save for a hidden ".svn" administrative directory. (If you want to run the Subversion repository on the same machine you will use as a client, then ditch the svn+ssh stuff, and just use the following URL style: file:///path/to/repository .)
  6. Now, let's add files that we want Subversion to manage. Copy some directory into the local checkout -- for example, copy a directory called "thesis_latex" to the checkout. Now run the command svn add thesis_latex . This command tells Subversion to start managing that directory (and all contents). Currently, only the client knows about these files. Now it's time to transmit the additions over to the server: svn commit --message "Added thesis_latex directory to repository." . Note that a message is required!
  7. You could now do a checkout of this repository onto a third machine, and you would receive the latest version of the thesis_latex directory. After making some set of local changes to the files, you can then commit these changes to the repository with a svn commit command. To receive the newest version of the files from a repository, use svn update. (Note that you'll need to use the commands "svn move", "svn copy" and "svn delete" instead of mv, cp and rm if you want Subversion to be able to properly keep track of move/copy/delete actions!)
  8. OK, now for an example of reverting a file to a previous state. Say at some point, we notice that a change we made to "thesis_latex/chapter1.tex" was bad. If we haven't committed this change to the repository, we can just do svn revert thesis_latex/chapter1.tex to go back to the original local copy we made the last time we ran "svn checkout" or "svn update". However, if the changes were already committed to the repository, then it's time to roll back to a previous version. For that we use "svn merge", which basically takes a patch (a set of changes) and applies it to a file. In our case, we want to figure out which changes we want to reverse. So we look at the log for this file with svn log thesis_latex/chapter1.tex to figure out where we made the bad change. We could also use svn diff --revision X:Y thesis_latex/chapter1.tex to compare the precise differences in this file between revisions X and Y. So say we find that between revisions 10 and 12 we made some bad changes. However, we're now at revision 50, having added a lot of good stuff after revision 12. We just want to selectively back out the bad changes. Here's how: svn merge --revision 12:10 thesis_latex/chapter1.tex. This command takes the changes between revision 12 and revision 10, and applies them to the current copy of the file. Note that we take the changes in reverse: things added going from 10 to 12 are removed when you go from 12 to 10. So we take these removals and apply them to the current version. (If we just want to revert wholesale, we can do the following: svn merge --revision HEAD:10 thesis_latex/chapter1.tex noting that "HEAD" signifies the most recent version of the repository.)
  9. OK, the last thing for now: undeleting a file or directory. Suppose we deleted a file, and then committed that deletion to the svn repository. Now we want it back. It's just a matter of copying the file we want back from the repository at a particular revision. Suppose we deleted the file "chapter10.tex" and committed that deletion as revision 20. (We could find this information out with judicious use of "svn log".) Well, all we need to do is svn copy --revision 19 svn+ssh:// chapter10.tex : this goes to the repository and selects the file "chapter10.tex" as of revision 19 and copies it to the local machine as "chapter10.tex".
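Steps 4 through 8 can be strung together into one rough script, run entirely on a single machine with the file:// URL style. This is only a sketch under some assumptions: the scratch directory, the WORK and DEMO_STATUS variables, and the file contents are made up for illustration, and the revision numbers are 1 and 2 rather than 10 and 12 because the repository is brand new.

```shell
#!/bin/sh
# Sketch: steps 4-8 on one machine, in a throwaway directory.
# WORK, DEMO_STATUS and the file contents are hypothetical.
set -e

if ! command -v svn >/dev/null 2>&1 || ! command -v svnadmin >/dev/null 2>&1; then
    echo "Subversion is not installed; skipping the demo." >&2
    DEMO_STATUS="skipped"
else
    WORK=$(mktemp -d)

    # Step 4: create the repository (host side).
    svnadmin create "$WORK/repo"

    # Step 5: check it out (client side), using the same-machine URL style.
    svn checkout "file://$WORK/repo" "$WORK/checkout"
    cd "$WORK/checkout"

    # Step 6: add a directory and commit it (this becomes revision 1).
    mkdir thesis_latex
    echo "first draft" > thesis_latex/chapter1.tex
    svn add thesis_latex
    svn commit --message "Added thesis_latex directory to repository."

    # Step 7: make a change we will regret and commit it (revision 2),
    # then update so the whole working copy is at the newest revision.
    echo "a bad edit" >> thesis_latex/chapter1.tex
    svn commit --message "Made a bad change to chapter1."
    svn update

    # Step 8: inspect the history, then apply the 2 -> 1 change in reverse.
    cd thesis_latex
    svn log chapter1.tex
    svn diff --revision 1:2 chapter1.tex
    svn merge --revision 2:1 chapter1.tex
    svn commit --message "Backed out the bad change from revision 2."

    DEMO_STATUS="ok"
fi
```

For real use, the repository would live somewhere permanent instead of a temporary directory, and the checkout URL would be whichever form from step 5 matches your setup.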
OK, that's all for now. This really isn't a complete introduction to Subversion, but it should give you an idea of how simple a repository is to set up, and how powerfully it can be used.


  • I've been thinking of checking my entire research project directories into subversion. Since I have to work on multiple different computers, it would be an easy way to keep them all in sync. CVS couldn't do it because its handling of binary files is broken. However, I'm a little worried whether svn will scale up to the task.

    1. Do you know how well it handles large data files, e.g. hundreds of MB?

    2. How much overhead does it typically add to your files?

    3. Will I lose the ability to use normal unix commands on the files? For example, if I want to rename my file, do I do:
    mv foo.c bar.c; svn mv foo.c bar.c
    or just:
    svn mv foo.c bar.c

    That is, does "svn mv" actually do the moving for me, or is it just a command to tell the repository what happened? The documentation is unclear, and I suspect the latter, which seems like it would make my life much more complicated. I'd have to remember whether I'm working in a svn directory, and whether the file is being tracked by svn.

    The ideal would be for svn to accept either way of doing it...

    4. How hard is it to move a SVN repository to another machine? While I know that's not preferred, it does happen occasionally. In CVS, for all the checked out copies, I'd just write a perl one-liner that changes the CVS/Root files. However, since SVN info is stored in a BSDDB database, you can't do that anymore. Is there any way to tell it that the repository location has changed?

    5. Any annoying bugs?

    By Blogger jchang, at 8:02 PM  

  • It looks like svn has some pretty cool features, but I must say setting up the repository doesn't look so quick and easy as cvs. I guess that's the cost for the features it has. The database system seems to be more flexible, but you'll have to work through their tools to get it right.

    I think I would still recommend RCS or CVS for beginners, and SVN for more advanced controls. Maybe as I get more familiar with SVN, I'll change my mind.

    A quick start for RCS is this:

    You are working in your project directory, and you decide you want source control. Just create a directory called RCS, "mkdir RCS", in that project directory, and you're on a roll. For any file you want under revision control, type "ci -l FILENAME". Every time you have a new version to check in, do the same thing. This is an easy per-file revision control system. Rolling back to a previous version is just "co -l -rVERSIONNUM FILENAME". Note that this setup is for personal use only. By using the "-l" lock flag on the commands, you can always modify your source code, but this prevents sharing of the repository.

    If you're working with others, then I would recommend using CVS. Quick steps:

    # First, create repository
    mkdir ~/cvs
    # Then, set the CVSROOT environment variable, e.g.
    export CVSROOT=$HOME/cvs
    # Finally initialize the repository
    cvs init

    Then inside your working directory, you can import all the files with:

    # This imports all the files in the current directory
    cvs import -m "initial revision" PROJDIR $USER initial

    The -m flag specifies the initial message. PROJDIR is the name of your project, usually the directory name. $USER is just the "vendorid". I usually use my user name. "initial" is just the first tag name, you can specify whatever, I just use that as a default.

    Finally, you have to re-checkout the directory so that CVS knows about it. Usually, from your working directory:

    # move up a directory
    cd ..
    # rename the working directory (PROJDIR) so the checkout does not collide with it
    mv PROJDIR PROJDIR.orig
    # checkout the project
    cvs checkout PROJDIR

    You can get rid of PROJDIR.orig when you're comfortable.

    Now you can make changes to the files, and when you want to commit, just do:
    cvs commit

    To add new files, use:
    cvs add FILENAME

    Hehe, now that I think about it, SVN may require just about as many steps. So maybe it's not much more complicated. :p

    By Blogger Mike, at 12:24 PM  
