Friday, December 5, 2008

File version control with Subversion

CCOM uses Subversion (svn) on its server so that folks can keep their important documents under version control. What this means is that I can commit a file to the svn repository on the server, and each time I commit a modified copy of the file, a new version is saved. If I realize I made a mistake or need to go back to a previous copy of the file, I simply "checkout" an earlier version. This is great for constantly changing documents such as source code, HTML, my thesis proposal, etc. One neat thing is that every time you "commit" a new version of a document to svn, it only saves the actual changes. Therefore, the disk space used on the server grows by only the minimum required amount each time.

To access svn on my Windows machine, I use a program called Tortoise (TortoiseSVN). Tortoise allows me to upload (import), checkout, update, and commit files easily to svn. For example, initially I uploaded some source code of mine to the repository. It is now safely stored on the server. When I checkout the file from svn, the most recent version (unless I specify otherwise) is dropped onto my local machine. Once I am happy with any edits I make to this local copy, I simply commit the file back to svn. This new copy automatically gets a new version number (called a revision in svn) and is stored as the most recent version. The update command brings any checked out copies on your local machine up to the most recent version in the svn repository. This is handy when multiple people are working on the same file.

The only potentially confusing thing thus far is that revision numbering in svn is global. This means that every time I commit a file, the revision number for the whole repository increments by one, no matter which file changed. For example, say I upload a file and svn assigns it a revision number of 25. Then I checkout a different file that was last changed in revision 2. Once I edit that file and commit it back to svn, its revision number will update to 26, instead of 3.
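
To convince myself of this, here is a tiny Python sketch of a repository with a single shared counter. It is only a toy model, not how svn is actually implemented, and the class name and file names are made up for illustration.

```python
class ToyRepository:
    """Toy model of a repository with one global revision counter (not real svn)."""

    def __init__(self):
        self.revision = 0        # one counter shared by every file
        self.last_changed = {}   # path -> revision in which it last changed

    def commit(self, path):
        """Record a change to `path` and return the new global revision."""
        self.revision += 1
        self.last_changed[path] = self.revision
        return self.revision


repo = ToyRepository()
repo.commit("thesis.txt")          # revision 1
repo.commit("process_data.py")     # revision 2
for _ in range(23):                # 23 more commits, all to another file
    repo.commit("notes.txt")

print(repo.revision)                     # 25
print(repo.commit("process_data.py"))    # 26, not 3
```

The second file was last touched in revision 2, but because the counter belongs to the repository rather than to the file, its next commit lands on 26.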

A typical svn repository is set up with three main directories (note this is only a recommended structure):

  • trunk - this is the main directory for your files and where you commit regular changes
  • tags - a collection of snapshots of the trunk (or a branch) at a user-defined point in time. In svn a tag is a cheap copy that effectively points to a specific version rather than duplicating the files. This is good for a version you want to be able to access quickly. For example, if I used a specific version of my source code to process some data for a publication, I will want to quickly be able to access this exact version.
  • branches - active variations of the project compared to the trunk (or even another branch). Branches are good for when you are editing a file already in the trunk, but you do not have a working version yet. Perhaps you are trying out some new addition to your code, and you are still in the testing phase. This is also a good place to store any fixes you have to make to older versions of files in your trunk directory. For example, say I use version X of my code to process data for a paper. Now, a year later, I am on version Z of that code. Someone using version X from my paper finds a bug in it that I need to fix, but I do not wish to give them version Z yet. I can fix the bug in version X and commit the new version of that code to the branches directory. Version Z of the code is still the most recent version in the trunk directory.
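
As a rough sketch of how this layout is used, the Python below builds (but does not run) the `svn copy` commands that would create a tag and a branch. The repository URL, the tag and branch names, the revision number, and the helper functions are all hypothetical; only the `svn copy` subcommand and its `-r` and `-m` options come from svn itself.

```python
REPO_URL = "https://svn.example.org/repos/myproject"  # hypothetical repository URL


def layout_url(area, name=None):
    """Return the URL of trunk, or of a named tag or branch."""
    if area == "trunk":
        return f"{REPO_URL}/trunk"
    if area in ("tags", "branches"):
        if not name:
            raise ValueError(f"{area} needs a name")
        return f"{REPO_URL}/{area}/{name}"
    raise ValueError(f"unknown area: {area}")


def copy_command(source_url, dest_url, message, revision=None):
    """Build an `svn copy` command list; tags and branches are both made this way."""
    cmd = ["svn", "copy"]
    if revision is not None:
        cmd += ["-r", str(revision)]   # copy the source as it was at this revision
    cmd += [source_url, dest_url, "-m", message]
    return cmd


# Snapshot the trunk as it was at (hypothetical) revision 25, the code used for a paper.
tag_cmd = copy_command(
    layout_url("trunk"),
    layout_url("tags", "paper-2008"),
    "Tag the code used for the paper",
    revision=25,
)

# Later, start a branch from that tag to fix a bug without touching the trunk.
branch_cmd = copy_command(
    layout_url("tags", "paper-2008"),
    layout_url("branches", "paper-2008-bugfix"),
    "Branch from the paper tag for a bug fix",
)

print(" ".join(tag_cmd))
print(" ".join(branch_cmd))
```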
I found a good guide on svn here. This guide is for a specific svn client called Subclipse, but I find it a better tutorial than the one for Tortoise. Now I can stop pestering my poor boyfriend with svn questions over IM.

Oh, I should also note that Windows users who use Cygwin can also use svn via the command line.
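
For anyone going the command-line route, here is a small Python sketch that assembles the commands matching the Tortoise actions described earlier (checkout, update, commit). It only builds and prints the commands. The URL, the working copy folder name, and the helper function names are my own assumptions; `checkout`, `update`, `commit`, `-r`, and `-m` are standard svn.

```python
import shlex

TRUNK_URL = "https://svn.example.org/repos/myproject/trunk"  # hypothetical URL


def checkout(url, dest, revision=None):
    """Build an `svn checkout`; without a revision, svn fetches the newest one."""
    cmd = ["svn", "checkout"]
    if revision is not None:
        cmd += ["-r", str(revision)]
    return cmd + [url, dest]


def update(path="."):
    """Build an `svn update` to bring a working copy up to the latest revision."""
    return ["svn", "update", path]


def commit(path, message):
    """Build an `svn commit` that sends local edits back with a log message."""
    return ["svn", "commit", path, "-m", message]


commands = [
    checkout(TRUNK_URL, "mycode"),               # get the most recent version
    update("mycode"),                            # pick up other people's commits
    commit("mycode", "Fix typo in header"),      # save my edits as a new revision
    checkout(TRUNK_URL, "mycode-old", revision=2),  # go back to an earlier version
]

for cmd in commands:
    print(shlex.join(cmd))   # shell-quoted form, ready to paste into a terminal
```

To actually run one of these from Python instead of pasting it into a terminal, passing the list to `subprocess.run(cmd, check=True)` should work, assuming svn is installed and on the PATH.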
