This article will fall under the ‘duh, obvious’ category for any software engineer with professional experience (at least I hope it does!), so it is intended for students and non-engineers who haven’t heard of revision control systems (RCS) or don’t understand the point of them. It is not an article about how to use any particular software package; instead it outlines some reasons why this kind of tool is important and useful.
Revision control systems are extremely important for any software project that takes more than five minutes to create. They are especially critical when working in a team (of any size), but still very useful if you are the only developer. Unfortunately, universities and colleges tend to omit or gloss over this category of tools, despite their importance!
What is revision control? I’ll grab a definition from Wikipedia (http://en.wikipedia.org/wiki/Revision_control):
Revision control, also known as version control and source control (and an aspect of software configuration management), is the management of changes to documents, computer programs, large web sites, and other collections of information. Changes are usually identified by a number or letter code, termed the “revision number”, “revision level”, or simply “revision”. For example, an initial set of files is “revision 1”. When the first change is made, the resulting set is “revision 2”, and so on. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and with some types of files, merged.
So if you are working on a project that involves a collection of files, an RCS helps you track the lifetime of each file and every change made to it. You can even track who made each change, and when. Revision control systems help teams to coordinate their efforts, reduce the chance of errors, and provide a living history of changes for their future selves (or new team members) to refer to. All changes have a person associated with them, so you know who to ask if you need more information about a change, or if something broke with a recent change.
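To make that definition a bit more concrete, here is a minimal Python sketch of what a revision history records. It is not how any real tool is implemented: the `Revision` and `FileHistory` names, the authors, and the check-in messages are all made up for illustration, and it keeps a full copy of the file at every revision purely for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    number: int          # "revision 1", "revision 2", ...
    author: str          # who made the change
    timestamp: datetime  # when it was checked in
    message: str         # the check-in comment
    content: str         # full file contents at this revision


@dataclass
class FileHistory:
    revisions: list[Revision] = field(default_factory=list)

    def check_in(self, author: str, message: str, content: str) -> Revision:
        """Record a new revision and return it."""
        rev = Revision(
            number=len(self.revisions) + 1,
            author=author,
            timestamp=datetime.now(timezone.utc),
            message=message,
            content=content,
        )
        self.revisions.append(rev)
        return rev

    def restore(self, number: int) -> str:
        """Return the file contents as of the given revision number."""
        if not 1 <= number <= len(self.revisions):
            raise ValueError(f"no such revision: {number}")
        return self.revisions[number - 1].content


history = FileHistory()
history.check_in("alice", "Add greeting script", "print('hello')\n")
history.check_in("bob", "Greet the whole world instead of nobody in particular",
                 "print('hello, world')\n")

latest = history.revisions[-1]
print(latest.number, latest.author, latest.message)
print(history.restore(1), end="")  # the file as it was at revision 1
```

Every revision carries a number, a person, a timestamp, and a comment, which is exactly the information you want when something breaks and you need to know what changed and who to ask.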
You may be bored already, but oddly enough I get excited talking about this subject. Whenever I encounter someone who has never heard of the process, explaining it for the first time is fun! I find using revision control to be so incredibly useful, so passing those skills on to others is rewarding.
Every computing student at university has likely had the experience of working on a programming assignment that took several days to write. You get something working in a few hours, then after a few more hours, it’s all completely broken. What did you change that broke it all? I know I didn’t remember, and would end up with copies of files all over the place, desperately undoing my changes. I would be hoping to somehow recover a working version, without losing all the work I had just done. This sucks.
Just use revision control. You can check in your changes as frequently as you’d like, see the differences between revisions clearly (oh look, that’s how I broke it) and switch between older versions and your current version easily (ah, so it was always broken).
But you have to make it part of your process. Everyone needs an account, and should be checking their changes in when appropriate. How you group your check-ins may be up to you, or your team may have a policy. Sometimes changes are grouped by feature, sometimes they are more incremental. A good policy is that no check-in should break the project, whether by introducing compilation errors or runtime errors.
During my second internship at Relic Entertainment I was working on Company of Heroes: Opposing Fronts. It was an expansion pack (that you could play standalone, lovingly referred to as an ‘expandalone’) of a game that had a four-year development cycle, and was based on technology that predated it by a number of years as well. What that meant was a huge codebase with many authors, spread across many years. How could I know the intent of the original author of a single line in a single file in that massive project?
Comment and change history were a lifesaver. Even if the original author of some code was still at the company, how likely is it that they remember what they did a year ago? Even a week ago? With RCS tools you can track the lifetime of even a single line of code: see how it changed, where it moved, and what else was going on that led to those changes.
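Tracking the lifetime of a single line is a feature many tools expose under a name like ‘blame’ or ‘annotate’. As a rough illustration of the idea only (real tools are far more sophisticated, for example when following lines that move between files), here is a small Python sketch using the standard `difflib` module. The authors and file contents are invented, and the `blame` function is my own toy, not any tool’s actual algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical history of one file: (author, lines) per revision, oldest first.
revisions = [
    ("alice", ["speed = 10", "health = 100", "spawn_unit()"]),
    ("bob",   ["speed = 10", "health = 150", "spawn_unit()"]),
    ("carol", ["speed = 10", "health = 150", "armor = 5", "spawn_unit()"]),
]


def blame(revisions):
    """For each line of the newest revision, find the revision that last changed it."""
    first_author, lines = revisions[0]
    origin = [(1, first_author)] * len(lines)  # every line starts at revision 1

    for number, (author, new_lines) in enumerate(revisions[1:], start=2):
        new_origin = []
        matcher = SequenceMatcher(a=lines, b=new_lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whatever revision they came from.
                new_origin.extend(origin[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision
                # (deleted lines contribute nothing, since j2 == j1).
                new_origin.extend([(number, author)] * (j2 - j1))
        lines, origin = new_lines, new_origin

    return list(zip(origin, lines))


for (number, author), line in blame(revisions):
    print(f"r{number} {author:<6} {line}")
```

Each line of the newest revision is labelled with the revision and person that last touched it, which tells you whose check-in comment to go and read.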
By comments, I don’t mean source code comments. Some (including myself) would argue that you should avoid source code comments as much as you can, because they often go stale (unlike check-in comments, they are not tied to a point in time) and end up being misleading over time. But that is for another article! Revision control systems allow you to write a comment about your collection of changes each time you perform a ‘check in’ (or ‘commit’; the language varies across different RCS packages, but the meaning is the same). Use this comment as an opportunity to help the future maintainers of the project (likely you): be descriptive. I cannot overstate this. Don’t use comments like “changed stuff” or “fixed problems”, or even “fixed bug 3425”. Describe the intent and effect of your changes. If you are referring to a bug, at the very least include the title of the bug, since you may change bug reporting systems in the future, and the bug reference number may end up being useless.
Diffs make all the difference
Instead of storing a copy of every file every time you change even the slightest bit of it, many revision control systems store only the differences. This not only saves storage space, but also helps you see exactly what has changed from one version of a file to the next. Since source code is all text, it is (fairly) easy to detect differences between two files, and even display those differences graphically. When it comes to images, audio, or any other binary format, things are not as easy. Those files usually end up being stored as complete copies, and the differences between them are not useful. But being able to look at previous versions and the comment history is still important.
For comparing text files, there are a number of ‘diff’ tools that allow you to view and interact with two files (or even three, when you need to merge changes into one master version), so you can examine exactly what changed. At a basic level they are very useful. Merging can get a bit more complex, and is usually only done (and only needed) by software engineers.
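If you have never seen a text diff, here is a small Python sketch that produces one with the standard `difflib` module. The two ‘revisions’ and the `area.py` file name are invented for the example; a real revision control system would generate a view like this for you.

```python
import difflib

# Two hypothetical revisions of the same small source file.
revision_1 = """\
def area(width, height):
    return width + height

print(area(3, 4))
""".splitlines(keepends=True)

revision_2 = """\
def area(width, height):
    return width * height

print(area(3, 4))
""".splitlines(keepends=True)

# unified_diff yields header lines, then each line prefixed with
# ' ' (unchanged), '-' (removed), or '+' (added).
diff = list(difflib.unified_diff(
    revision_1, revision_2,
    fromfile="area.py (revision 1)",
    tofile="area.py (revision 2)",
))
print("".join(diff), end="")
```

The output marks the old line with a `-` and the new line with a `+`, so the one-character bug fix stands out immediately instead of hiding in two nearly identical files.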
Not a magic bullet
Using an RCS takes planning and discipline. To many it seems like extra work that gets in the way of the creative process. But using it properly is very worthwhile, and critical for successful collaboration and maintaining quality. At first some of the concepts may seem complex, and there are many tools and functions that are not very user friendly. This is improving, and learning the current tools is a valuable exercise. All popular revision control systems are available for free (or free up until a certain number of users), and there are plenty of free tutorials and reference material available to learn how to use them.
It’s not easy (at first) but it is worth it. The key is to make it part of your workflow, and as painless as possible for all team members to use. Once you have some experience, I recommend listing it on your resume. I know I look for that when hiring, and will ask about it as well.
Quick list of resources:
- Interesting article from primarily Perforce developers who migrated to Git: http://www.altdevblogaday.com/2013/07/30/git-off-my-lawn-develop-2013/
- A good quick overview of Git vs Perforce from Insomniac Games: http://www.insomniacgames.com/jonathan-adamczewski-an-introduction-to-the-git-revision-control-system/
“File folders” image courtesy of http://www.flickr.com/photos/86937504@N00/3154846755/