Revision Control Smells
As software developers, we talk about code smells, design smells, and even configuration smells. While reviewing some code, it occurred to me that revision control smells are also distinguishable and important. Here are some obvious ones and my recommendations for avoiding them. An orderly revision control repository is a sign of professionalism.
Commented-out code
Delete code in the same revision in which it stops being needed. Leaving it lying around hinders readability, and the revision history still has it if you ever want it back.
Generated artifacts
You should build object files, debugger databases, libraries, and so on, from the source in the repository, not store them in it. The revision control system is for managing revisions; it is not a shared drive.
Binary artifacts
Avoid committing binary files, because they are difficult to diff and merge. Instead, choose applications and file formats that have a readable text-file representation. For example, adopt Markdown or LaTeX, rather than storing Microsoft Office or LibreOffice documents in your repo. Or, instead of a relational database blob, store the corresponding SQL data definition language commands. If you need to commit drawings, look for a way to specify them declaratively as text. Also, obtain required libraries and tools through a configuration management system, such as Puppet, Ansible, Chef, or CFEngine, rather than including them in the repository.
If your project contains binary files (e.g. icons, pictures, music, videos, voice prompts), consider maintaining the binary resources in a separate repository. This ensures that the large storage space often required for storing binary files doesn't burden your source code repository, which remains lean and mean. Binary resources are still under version control, but for many use cases developers won't need to pay the associated cost. Note that Git's submodule mechanism allows you to link the binary file repository with the source code one.
Commits lacking comments or with badly written ones
Follow this excellent guide on how to write great commit messages. In brief, always include a short (up to 50 characters), descriptive, summary message written in imperative style. Follow that with a blank line and then explanatory text (wrapped at 72 columns) explaining the what and the why behind the change, rather than the how.
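The mechanical parts of these rules (summary length, the blank line, body wrapping) are easy to check automatically. The following is a minimal sketch in Python; the function name check_commit_message is hypothetical, and the check cannot tell whether the summary is descriptive or written in imperative style.

```python
def check_commit_message(message: str) -> list[str]:
    """Return the formatting problems found in a commit message (empty if none)."""
    problems = []
    lines = message.rstrip("\n").split("\n")

    # The first line is the summary: present and at most 50 characters.
    summary = lines[0].strip()
    if not summary:
        problems.append("missing summary line")
    elif len(summary) > 50:
        problems.append("summary longer than 50 characters")

    # A blank line must separate the summary from the explanatory text.
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line after summary")

    # The explanatory text should be wrapped at 72 columns.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"line {number} longer than 72 columns")

    return problems
```

A function like this could be called from a commit-msg hook, rejecting the commit when the returned list is not empty.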
Inconsistently named branches and tags
The names of branches and tags are at least as important as your code's identifiers. Establish and follow guidelines for naming releases, feature branches, bug-fix branches, developer-private branches, and so on.
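Once you have guidelines, you can enforce them mechanically. As an illustration only, the sketch below assumes a made-up convention of a kind prefix (feature, bugfix, release, or private) followed by a slash and lowercase words separated by hyphens or dots; your own project's rules will differ.

```python
import re

# Hypothetical convention: <kind>/<lowercase words or numbers joined by - or .>
BRANCH_PATTERN = re.compile(
    r"(feature|bugfix|release|private)/[a-z0-9]+(?:[-.][a-z0-9]+)*"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if the whole branch name follows the assumed convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

Under this assumed convention, names such as feature/login-form and release/1.4.0 pass, whereas Fix_Stuff does not.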
Whitespace changes
Changes in whitespace between revisions, such as the conversion of newlines into CR-LF pairs, or soft tabs to hard ones, introduce noise, making it hard to see the actual changes and developer contributions. Although Git has options for working around these problems, applying them takes effort, and the gratuitous changes can still get in the way of other tools.
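To make the problem concrete, here is a rough sketch of how one might detect that two versions of a file differ only in whitespace. The function names and the default tab width of 4 are assumptions made for this example; the sketch treats line endings, tab expansion, and trailing spaces as noise and nothing else.

```python
def normalize_whitespace(text: str, tab_width: int = 4) -> str:
    """Use LF line endings, expand tabs, and strip trailing whitespace."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.expandtabs(tab_width).rstrip() for line in lines)


def is_whitespace_only_change(old: str, new: str, tab_width: int = 4) -> bool:
    """Return True if old and new differ, but only in the whitespace handled above."""
    return old != new and (
        normalize_whitespace(old, tab_width) == normalize_whitespace(new, tab_width)
    )
```

A revision that converts a file to CR-LF line endings and hard tabs changes every line, yet this check reports it as a whitespace-only change.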
Encryption keys, access tokens, and passwords
Don't commit these to the repository, or you'll regret it. Following this advice without getting in the way of automated builds, tests, and deployments is tricky. One way around this problem is to store them in key files, which you protect with a pass phrase.
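A simple scan can catch some of these before they are committed. The sketch below is a crude heuristic, not a complete secret detector: the two patterns (a PEM private key header and a quoted value assigned to a name like password or token) and the function name find_suspect_lines are assumptions chosen for illustration, and they will miss many real secrets.

```python
import re

# Illustrative patterns only; real secrets take many other forms.
SECRET_PATTERNS = [
    # Header line of a PEM-encoded private key.
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A quoted value assigned to a suspiciously named setting.
    re.compile(r"(?i)\b(password|passwd|secret|token)\b\s*[:=]\s*['\"][^'\"]+['\"]"),
]


def find_suspect_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that match any secret pattern."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if any(pattern.search(line) for pattern in SECRET_PATTERNS)
    ]
```

Running such a check over the files staged for a commit gives a cheap early warning, but it complements careful handling of credentials rather than replacing it.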
Log comments and metadata embedded in the file
The file's last change date, its authors, and a chronological list of changes belong in the revision control system, not in a long-outdated comment at the beginning of the file. Similarly, in most cases you should put text detailing the background of implemented changes in the commit messages, not in code comments. The code represents the now, the revisions the history.
Slapdash reorganizations
Although modern revision control systems can follow files and folders that are moved and renamed, such operations can still cause confusion. Therefore, avoid frequently renaming files or moving them between directories. When you start a project, carefully plan its directory layout and naming conventions, and stick to them for as long as it is practical to do so. If possible, adopt widely accepted directory structures, such as Maven's Standard Directory Layout.
Also, when you move things around, consult the other project members and give them a timely heads-up.
More?
Am I missing some obvious smells? Please comment below.
This blog entry was last changed on September 15th to add a section on file renames and moves suggested by Rafael Chaves, and to propose how to handle binary files in response to a query by Panos Papadopoulos.