Internal Documentation: Definition, Examples, and Applications

Git is an open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. It is easy to learn and has a tiny footprint with lightning-fast performance. Compared with SCM tools such as Subversion, CVS, Perforce, and ClearCase, it offers features like cheap local branching, convenient staging areas, and multiple workflows.

Git was created by Linus Torvalds in 2005 for the development of the Linux kernel, with other kernel developers contributing to its initial development. Junio Hamano has been its maintainer since 2005. As with most other distributed version control systems, and unlike most client-server systems, every Git directory on every computer is a full-fledged repository with complete history and full version-tracking abilities, independent of network access or a central server.

Definition of Git

Git is a distributed version control system: the entire codebase and its history are available on every developer's computer, so branching and merging can happen locally. The feature that really sets Git apart from nearly every other SCM, however, is its branching model.

Git allows and encourages you to have multiple local branches that can be entirely independent of each other. Creating, merging, and deleting those lines of development takes seconds. This enables frictionless context switching: you can create a branch to try out an idea, commit a few times, switch back to where you branched from, apply a patch, switch back to where you are experimenting, and merge it in.

The Git Data Model

The data model that Git uses ensures the cryptographic integrity of every bit of your project. Every file and commit is checksummed and retrieved by its checksum when checked back out. It’s impossible to get anything out of Git other than the exact bits you put in.
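To make the checksumming concrete, here is a minimal Python sketch of how Git derives the ID of a file's contents (a "blob") in a repository that uses the default SHA-1 object format: it hashes a short header, `blob <size>` followed by a NUL byte, together with the raw bytes. The function name `blob_id` is our own; for the same input it should produce the same ID that `git hash-object` prints. Repositories created with the newer SHA-256 object format use a different hash.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content.

    Assumes Git's default SHA-1 object format (not SHA-256 repositories).
    """
    # Header: object type, a space, the size in bytes, then a NUL byte.
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same bytes in, same ID out: the ID depends only on the content.
    print(blob_id(b"test content\n"))
    # Changing a single byte yields a completely different ID.
    print(blob_id(b"test content!\n"))
```

Because the ID is computed from the bytes alone, two identical files anywhere in the history share one ID, and any change to a file produces a new one.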

Git generally only adds data. You can lose information that you don’t carefully protect, but it is very difficult to make Git do anything that cannot be undone, including erasing data. Git even provides mechanisms, such as the reflog, to recover lost commits.
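The idea behind recovering "lost" commits can be shown with a toy Python model: commits are only ever added to a store, and a log records every position a reference has held, so a commit that HEAD no longer points at can still be found. This is loosely modelled on Git's reflog and its `HEAD@{n}` syntax; the `ToyRepo` class and its methods are hypothetical, and real Git may eventually garbage-collect unreachable commits once their reflog entries expire.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyRepo:
    """Toy model of an append-only commit store plus a reflog.

    Hypothetical names throughout: this is not Git's API.
    """

    objects: dict[str, str] = field(default_factory=dict)  # commit ID -> message
    head: Optional[str] = None                             # commit HEAD points at
    reflog: list[str] = field(default_factory=list)        # every value HEAD has held

    def commit(self, commit_id: str, message: str) -> None:
        # New data is only ever added to the store, never overwritten.
        self.objects[commit_id] = message
        self._move_head(commit_id)

    def reset(self, commit_id: str) -> None:
        # Moving HEAD back leaves later commits in the store, just unreferenced.
        self._move_head(commit_id)

    def _move_head(self, commit_id: str) -> None:
        self.head = commit_id
        self.reflog.append(commit_id)

    def head_at(self, n: int) -> str:
        # Analogous to Git's HEAD@{n}: where HEAD pointed n moves ago.
        return self.reflog[-1 - n]
```

After `commit("c1", ...)`, `commit("c2", ...)`, and `reset("c1")`, HEAD is back at `"c1"`, yet `"c2"` is still in `objects` and `head_at(1)` returns `"c2"`, which is how the commit would be found again.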

Git's Speed

Git is fast. Nearly all operations are performed locally, which gives Git a large speed advantage over centralized systems that must constantly communicate with a server. Because Git was built for the Linux kernel, it has had to handle large repositories efficiently from day one. It is written in C, which avoids the runtime overhead associated with higher-level languages. Speed and performance have been primary design goals of Git from the start.

Branching in Git is an extremely lightweight operation. Creating a new branch is as quick and easy as writing 41 bytes to a file: the 40-character hexadecimal SHA-1 ID of a commit followed by a newline. This makes practical some tasks that would be prohibitively expensive in other VCSs, such as creating a new branch for every commit you have made in order to test each one.
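The sketch below writes such a file into a scratch directory to show where the 41 bytes come from. It mirrors the layout of a "loose" ref under `.git/refs/heads/`, but the helper `create_branch` and the dummy commit ID are hypothetical. For real repositories, use `git branch` or `git update-ref`, which also take a lock, record a reflog entry, and may keep refs in a `packed-refs` file instead.

```python
import tempfile
from pathlib import Path


def create_branch(git_dir: Path, name: str, commit_id: str) -> Path:
    """Write a loose branch ref: the commit's 40-hex-digit ID plus a newline.

    A sketch only; it skips the locking, name validation, and reflog
    handling that real Git performs.
    """
    if len(commit_id) != 40 or not all(c in "0123456789abcdef" for c in commit_id):
        raise ValueError("expected a 40-character lowercase hex SHA-1")
    ref_path = git_dir / "refs" / "heads" / name
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    # write_bytes avoids newline translation, so the file is exactly 41 bytes.
    ref_path.write_bytes((commit_id + "\n").encode("ascii"))
    return ref_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A scratch directory stands in for .git; no real repository is touched.
        path = create_branch(Path(tmp), "experiment", "ab" * 20)
        print(path.stat().st_size)  # 41
```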

Explanation of Git

Git stores and thinks about information very differently from other version control systems such as Subversion and Perforce, even though the user interface is fairly similar. Understanding those differences will help keep you from becoming confused while using it.

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they store as a set of files and the changes made to each file over time.

Git's Data as Snapshots

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a stream of snapshots of a miniature filesystem. Every time you commit, or save the state of your project in Git, it essentially takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if a file has not changed, Git doesn’t store it again; it stores only a link to the identical file it has already stored.
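A toy Python model makes the snapshot idea concrete: every commit records the full set of files (path mapped to content ID), but content is stored under its hash, so an unchanged file simply reuses the ID already in the store. `SnapshotStore` is a hypothetical simplification; real Git also stores directories (trees) and commits as objects, and compresses and delta-packs objects on disk.

```python
from __future__ import annotations

import hashlib


class SnapshotStore:
    """Toy content-addressed store: each commit is a full snapshot
    (path -> blob ID), yet identical contents are stored only once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}          # blob ID -> content
        self.snapshots: list[dict[str, str]] = []  # one snapshot per commit

    def commit(self, files: dict[str, bytes]) -> dict[str, str]:
        snapshot: dict[str, str] = {}
        for path, content in files.items():
            # Git-style blob ID: SHA-1 over "blob <size>\0" plus the bytes.
            blob = hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()
            # setdefault stores the content only if this exact blob is new.
            self.blobs.setdefault(blob, content)
            snapshot[path] = blob
        self.snapshots.append(snapshot)
        return snapshot
```

Committing two files and then committing again with only one of them changed leaves three blobs in the store, not four: the unchanged file's entry in the second snapshot points at the blob stored the first time.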

This is an important distinction between Git and nearly all other VCSs. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation. This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS. We’ll explore some of the benefits you gain by thinking of your data this way when we cover branching in Git.

Git's Data Integrity

The most important aspect of any VCS is data integrity. The main reason you use a VCS is to be able to get back previous versions of your work, so it’s critical that you can trust it to give you the data you want. Git places a strong emphasis on data integrity.

Everything in Git is checksummed (by default with a SHA-1 hash) before it is stored and is then referred to by that checksum. This means it’s impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its philosophy: you can’t lose information in transit or suffer file corruption without Git being able to detect it.
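Because every object is filed under the hash of its own contents, corruption can be detected simply by re-hashing on read. The Python sketch below shows that check in a toy store; `VerifyingStore` and `CorruptObjectError` are hypothetical names, though real Git performs comparable checks, for example when `git fsck` verifies a repository.

```python
from __future__ import annotations

import hashlib


class CorruptObjectError(Exception):
    """Raised when stored bytes no longer match the ID they are filed under."""


def object_id(kind: str, body: bytes) -> str:
    # Git-style ID: SHA-1 over "<type> <size>\0" followed by the body.
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


class VerifyingStore:
    """Toy object store that re-checks each object's checksum when it is read."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[str, bytes]] = {}

    def put(self, kind: str, body: bytes) -> str:
        oid = object_id(kind, body)
        self._objects[oid] = (kind, body)
        return oid

    def get(self, oid: str) -> bytes:
        kind, body = self._objects[oid]
        # Recompute the hash; any changed byte makes it differ from the key.
        if object_id(kind, body) != oid:
            raise CorruptObjectError(f"object {oid} is corrupt")
        return body
```

If a single byte of a stored object is altered (say, by a disk error), the next `get` for that ID raises `CorruptObjectError` instead of silently returning the damaged data.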

History of Git

The history of Git begins in 2005, with the Linux kernel, an open-source software project of fairly large scope. During the early years of Linux kernel maintenance (1991–2002), changes to the software were passed around as patches and archived files. In 2002, the Linux kernel project began using a proprietary distributed version control system (DVCS) called BitKeeper.

In 2005, the relationship between the community that developed the Linux kernel and the commercial company that developed BitKeeper broke down, and the tool’s free-of-charge status was revoked. This prompted the Linux development community (and in particular Linus Torvalds, the creator of Linux) to develop their own tool based on some of the lessons they learned while using BitKeeper. Some of the goals of the new system were as follows:

Speed

Performance was a primary design goal from the start. Benchmarks have reported Git to be an order of magnitude faster than some version control systems and several orders of magnitude faster than others, and it is generally among the fastest widely used version control systems for operations such as making a new commit.

Git is also significantly faster than other version control systems at switching between different versions of files. Git’s design philosophy is to do the simplest thing that can possibly work, and this simplicity makes it robust and fast.

Simple Design

Git was designed to be easy to understand and use, a goal it shares with other version control systems. However, Git’s simplicity lies more in its concepts than in its user interface.

Git has a simple design with a strong emphasis on speed and efficiency, and it is built to handle large projects such as the Linux kernel. Because it does not need a constant connection to a central repository, developers can work anywhere and collaborate asynchronously across time zones.

Use Cases of Git

Git is used for version control of files, much like tools such as Mercurial, Bazaar, Subversion, CVS, Perforce, and Team Foundation Server. It’s mostly used for source code management (SCM) in software development, but it can be used to keep track of changes in any set of files. As a distributed revision control system, it is aimed at speed, data integrity, and support for distributed, non-linear workflows.

Git has been designed with the following features to support distributed, non-linear workflows:

Strong support for non-linear development
Distributed development
Compatibility with existing systems and protocols
Efficient handling of large projects
Cryptographic authentication of history
Toolkit-based design
Pluggable merge strategies

Non-Linear Development

Git supports rapid branching and merging and includes specific tools for visualizing and navigating a non-linear development history. A core assumption in Git is that a change will be merged more often than it is written, as it is passed among various reviewers. Branches in Git are very lightweight: a branch is only a reference to a single commit, and the full branch structure can be reconstructed by following that commit's chain of parent commits.
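A short Python model of that idea, using made-up commit IDs: a branch is just a name mapped to one commit, and the branch's history is recovered by following parent links, with a merge commit having two parents. The `Commit` class and `history` function are illustrative only, not Git's internals; the walk below is depth-first, whereas `git log` by default orders commits by date.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    oid: str
    message: str
    parents: tuple[str, ...] = ()  # a merge commit has two or more parents


def history(commits: dict[str, Commit], tip: str) -> list[str]:
    """List every commit reachable from `tip` by following parent links,
    depth-first, visiting each commit once."""
    seen: set[str] = set()
    order: list[str] = []
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        order.append(cid)
        # Push parents in reverse so the first parent is visited first.
        stack.extend(reversed(commits[cid].parents))
    return order


commits = {
    "a": Commit("a", "initial"),
    "b": Commit("b", "work on main", ("a",)),
    "c": Commit("c", "more work on main", ("b",)),
    "d": Commit("d", "feature work", ("a",)),
    "m": Commit("m", "merge feature into main", ("c", "d")),
}
# Branches are nothing more than names pointing at single commits.
branches = {"main": "m", "feature": "d"}
```

Here `history(commits, branches["feature"])` yields `["d", "a"]`, while the walk from `main` reaches every commit through the merge's two parents.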

Git's branching and merging capabilities give it a significant advantage over many centralized systems. Branching and merging are core concepts in Git, and typical Git workflows are built around them. Because branches are cheap to create and easy to merge, Git accommodates a wide variety of workflows: for example, you can use only one branch in your project, or you can keep multiple branches for developing features, fixing bugs, and staging and testing changes.

Distributed Development

Like Darcs, BitKeeper, Mercurial, SVK, Bazaar, and Monotone, Git gives each developer a local copy of the entire development history, and changes are copied from one such repository to another. These changes are imported as additional development branches, and can be merged in the same way as a locally developed branch.

Users can publish subsets of their changes, and can pull updates from other repositories. In this way, a set of changes can be pulled from one user, modified, and then offered to the original user to pull back.
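The exchange described above can be modelled in a few lines of Python. The sketch copies only the commits a repository is missing and files the other repository's branches under a name prefix, so they appear as additional branches that can later be merged like local ones. This loosely mirrors `git fetch` and remote-tracking branches such as `origin/main`, but `Repo` and `fetch` are hypothetical, and real Git negotiates over a network protocol and transfers packfiles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy repository: commit ID -> parent IDs, plus named branch refs."""

    commits: dict[str, tuple[str, ...]] = field(default_factory=dict)
    refs: dict[str, str] = field(default_factory=dict)


def fetch(local: Repo, remote: Repo, remote_name: str) -> int:
    """Copy commits `local` lacks and record the remote's branches as
    `<remote_name>/<branch>`. Returns the number of commits copied.

    Assumes each repo holds the full history of every commit it contains.
    """
    copied = 0
    stack = list(remote.refs.values())
    while stack:
        cid = stack.pop()
        if cid in local.commits:
            continue  # already present, so its ancestors are present too
        local.commits[cid] = remote.commits[cid]
        copied += 1
        stack.extend(remote.commits[cid])
    # Remote branches become extra local branches; local ones are untouched.
    for branch, cid in remote.refs.items():
        local.refs[f"{remote_name}/{branch}"] = cid
    return copied
```

If Alice has commits `a` and `b` on `main` and Bob has only `a`, fetching from Alice copies just `b` and gives Bob an `alice/main` branch, while his own `main` stays where it was until he chooses to merge.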

Specific Examples of Git

Let's take a look at a few specific examples of how Git can be used in real-world scenarios. These examples should help to illustrate the power and flexibility of Git as a version control system.

Perhaps the most common use case for Git is a collaborative software development project. Each developer has their own copy of the repository, makes changes in that copy, and then pushes those changes to a central repository, from which other developers pull them and merge them into their own copies.

Example 1: Collaborative Software Development

In a collaborative software development project, Git can be used to manage and track changes to the codebase. Each developer has their own copy of the repository, allowing them to work independently on their own tasks. Once a task is completed, the changes can be committed to the repository, providing a record of what changes were made, who made them, and why they were made.
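The "who" and "why" are part of each commit itself. The Python sketch below builds the text of a commit object in the layout Git uses (a `tree` line, one `parent` line per parent, `author` and `committer` lines with a Unix timestamp and time-zone offset, a blank line, then the message) and hashes it the same way as other objects. The function `commit_object` and all sample values are hypothetical, and for simplicity it assumes the committer is the author; `git cat-file -p <commit>` shows the real content of any commit in a repository.

```python
from __future__ import annotations

import hashlib


def commit_object(tree: str, parents: list[str], name: str, email: str,
                  timestamp: int, message: str, tz: str = "+0000") -> tuple[str, bytes]:
    """Return (SHA-1 ID, raw body) for a Git-style commit object.

    Simplification: the committer is assumed to be the same as the author.
    """
    ident = f"{name} <{email}> {timestamp} {tz}"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    # Same hashing scheme as blobs, but with the "commit" type in the header.
    oid = hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()
    return oid, body
```

Since the author line and the message are hashed into the commit ID, rewriting either one produces a different commit rather than silently altering the recorded history.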

Other developers can then pull these changes from the repository and merge them into their own copies of the codebase. This allows for a highly collaborative and efficient workflow, where developers can work on different parts of the codebase simultaneously without stepping on each other's toes.

Example 2: Documentation Versioning

Git can also be used to manage and version documentation. This is particularly useful in projects where the documentation needs to be kept in sync with the codebase: each time a change is made to the codebase, the corresponding changes can be made to the documentation and committed to the Git repository, providing a record of what changed and why.

This is also useful when several people work on the documentation at the same time. Each person edits their own copy and commits their changes, and the others pull those changes and merge them into their own copies, so everyone can work in parallel without stepping on each other's toes.

Example 3: Configuration Management

Git can also be used for configuration management, keeping track of changes to configuration files. Each time a configuration file changes, the change can be committed to the Git repository, providing a record of what changed, who changed it, and why.

This is particularly useful when several people are responsible for managing and updating the configuration files. Each person can work on their own copy, commit their changes, and pull and merge the changes made by others, so the team can update the configuration simultaneously without stepping on each other's toes.
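When two people change different settings in the same configuration file, the merge can usually be resolved automatically by comparing each version with their common ancestor (a three-way merge). The Python sketch below shows that rule in a deliberately simplified form that assumes lines are only edited, never inserted or deleted. `merge_lines` is a hypothetical helper; Git's own merge machinery works on diff hunks and is considerably more general.

```python
from __future__ import annotations


def merge_lines(base: list[str], ours: list[str],
                theirs: list[str]) -> tuple[list[str], list[int]]:
    """Line-by-line three-way merge for equal-length inputs.

    Returns the merged lines and the indices of conflicting lines; for a
    conflict, our version is kept as a placeholder.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this simplified merge needs equal-length inputs")
    merged: list[str] = []
    conflicts: list[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            merged.append(o)  # both sides agree, or only we changed the line
        elif o == b:
            merged.append(t)  # only they changed the line
        else:
            merged.append(o)  # both changed it differently: a conflict
            conflicts.append(i)
    return merged, conflicts
```

For example, if one person changes `port=80` to `port=8080` while another changes `debug=false` to `debug=true`, both edits end up in the merged file with no conflicts; only a line that both people changed in different ways is reported.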

Internal Documentation

What is Internal Documentation?