Git Partial Clone

What is Git Partial Clone?

Git Partial Clone is a feature that allows cloning a repository without downloading all of its objects. Depending on the filter chosen, Git downloads only part of the repository up front (for example, every commit and directory listing but only the file contents needed to check out the branch being cloned) and fetches the remaining objects on demand. This can significantly reduce initial clone time and disk usage for large repositories.

In software development, Git has become an essential tool for version control. It allows multiple developers to work on a project simultaneously, tracking their changes and helping them merge their work. One of the features Git offers for very large repositories is the partial clone. This article explains what a Git partial clone is, how it works, when it was introduced, where it is useful, and how to create one.

Understanding partial clones requires a solid grasp of Git's fundamental operations. Git is a distributed version control system: by default, every clone of a repository contains the full history of all changes, including every version of every file. In some scenarios, such as very large repositories or repositories with many large binary files, downloading all of that data up front is unnecessary and even burdensome. This is where partial clone comes into play.

Definition of Git Partial Clone

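A Git partial clone is a clone of a Git repository that does not contain all of the repository's objects. With the commonly used filters, every commit is still downloaded, so the commit history is complete; what is left on the server until needed are other objects, such as file contents (blobs) or directory listings (trees). Which objects are omitted is controlled by a filter specified by the user. This can significantly reduce the size of the clone, making it faster to create and easier to manage, especially for large projects.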
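The difference between "missing history" and "missing objects" can be checked in a small experiment. The script below is a minimal sketch, assuming a POSIX shell and a recent Git; it builds a throwaway local repository (the names 'src', 'server.git', 'partial', and 'file.txt' are hypothetical) and clones it over 'file://'. The 'uploadpack.allowFilter' and 'uploadpack.allowAnySHA1InWant' settings are there because a plain local repository does not accept filtered requests by default; hosting services that support partial clone handle this on their side.

```shell
set -eu

# Throwaway workspace and identity (hypothetical values) so commits succeed anywhere.
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical project: one file rewritten in three commits.
git init -q src
for i in 1 2 3; do
  echo "version $i" > src/file.txt
  git -C src add file.txt
  git -C src commit -q -m "commit $i"
done

# Bare "server" copy that honors --filter and lazy fetches by object ID.
git clone -q --bare src server.git
git -C server.git config uploadpack.allowFilter true
git -C server.git config uploadpack.allowAnySHA1InWant true

# Partial clone that omits all file contents (blobs) up front.
git clone -q --filter=blob:none "file://$demo/server.git" partial

# Every commit is present...
echo "commits: $(git -C partial rev-list --count HEAD)"
# ...but the older versions of file.txt were never downloaded.
# --missing=print marks absent objects with "?" instead of fetching them.
echo "missing objects: $(git -C partial rev-list --objects --all --missing=print | grep -c '^?')"
```

On a recent Git this should report three commits and two missing objects: the first two versions of 'file.txt'.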

The partial clone feature was introduced in Git 2.19, released in September 2018. It was designed to address the issue of bloated repositories that could slow down operations and consume excessive disk space. With a partial clone, users can choose to download only what they need, improving efficiency and performance.

Related Features: Sparse Checkout and Shallow Clone

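A partial clone is often used together with two related but separate features. The first is sparse checkout, which lets users define a set of paths they are interested in; only those paths are checked out into the working directory. On its own, sparse checkout only shrinks the working directory, because a full clone still downloads every object. Combined with a partial clone that filters out file contents, the contents of files outside the chosen paths stay on the server, which also reduces the size of the local clone.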
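A minimal sketch of combining the two, assuming a recent Git (one that supports 'git clone --sparse' and 'git sparse-checkout'); the directories 'docs' and 'app' and the other names are hypothetical, and the throwaway server is configured as in a local test setup:

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical project with two top-level directories.
git init -q src
mkdir -p src/docs src/app
echo "user guide" > src/docs/guide.txt
echo "application code" > src/app/main.txt
git -C src add .
git -C src commit -q -m "initial layout"
app_blob=$(git -C src rev-parse HEAD:app/main.txt)

# Bare "server" copy that honors --filter and lazy fetches by object ID.
git clone -q --bare src server.git
git -C server.git config uploadpack.allowFilter true
git -C server.git config uploadpack.allowAnySHA1InWant true

# Blob-less partial clone whose working tree starts with top-level files only.
git clone -q --filter=blob:none --sparse "file://$demo/server.git" work
# Ask for docs/; only the blobs under it are fetched and checked out.
git -C work sparse-checkout set docs
ls -R work

# The blob for app/main.txt is neither checked out nor downloaded.
git -C work rev-list --objects --all --missing=print | grep "^?$app_blob" && echo "app/main.txt not downloaded"
```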

The second is a shallow clone, which limits the depth of the history that is cloned. Users can specify a depth, such as the last 10 commits, and only those commits and their associated objects are downloaded. Unlike a partial clone, a shallow clone truncates the commit history itself, and Git does not fetch the missing commits on demand.
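The contrast between the two is easy to see side by side. This is a minimal sketch, assuming a recent Git and a throwaway local repository with three commits (all names hypothetical); 'git rev-parse --is-shallow-repository' reports whether history was truncated.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical project: one file rewritten in three commits.
git init -q src
for i in 1 2 3; do
  echo "version $i" > src/file.txt
  git -C src add file.txt
  git -C src commit -q -m "commit $i"
done

# Bare "server" copy that honors --filter and lazy fetches by object ID.
git clone -q --bare src server.git
git -C server.git config uploadpack.allowFilter true
git -C server.git config uploadpack.allowAnySHA1InWant true

# Shallow clone: history cut off after the newest commit.
git clone -q --depth 1 "file://$demo/server.git" shallow
# Partial clone: full commit history, file contents fetched lazily.
git clone -q --filter=blob:none "file://$demo/server.git" partial

for repo in shallow partial; do
  echo "$repo: $(git -C "$repo" rev-list --count HEAD) commit(s)," \
       "shallow=$(git -C "$repo" rev-parse --is-shallow-repository)"
done
```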

How Git Partial Clone Works

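The process of creating a Git partial clone involves several steps. The first step is to use the 'git clone' command with the '--filter' option. This option specifies a filter that defines which objects should be left out of the clone. Common filters are 'blob:none' (omit all file contents), 'blob:limit=<size>' (omit file contents at or above a given size), and 'tree:0' (omit all trees and file contents); Git also supports filters based on object type and on a sparse-checkout specification. The server must support the requested filter; if it does not, Git warns and ignores the filter, producing a full clone.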

Once the clone is created, Git has downloaded only the objects the filter does not exclude, plus whatever is needed to check out the working tree. The omitted objects remain on the server. When a command later needs an object that is missing locally, Git automatically fetches it from the server. This is known as on-demand (or lazy) fetching, and it allows the clone to remain small while still providing access to the entire repository.
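To see how the choice of filter changes what is downloaded, the sketch below clones the same throwaway repository twice, once with 'blob:none' and once with 'tree:0'. It is a minimal sketch under stated assumptions: a recent Git, hypothetical names, and a local server configured to accept filters.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical project: one file rewritten in three commits.
git init -q src
for i in 1 2 3; do
  echo "version $i" > src/file.txt
  git -C src add file.txt
  git -C src commit -q -m "commit $i"
done

# Bare "server" copy that honors --filter and lazy fetches by object ID.
git clone -q --bare src server.git
git -C server.git config uploadpack.allowFilter true
git -C server.git config uploadpack.allowAnySHA1InWant true

for filter in blob:none tree:0; do
  dir="clone-${filter%%:*}"          # clone-blob, clone-tree
  git clone -q --filter="$filter" "file://$demo/server.git" "$dir"
  # Commits are always complete; what differs is which other objects are absent.
  echo "$filter: $(git -C "$dir" rev-list --count HEAD) commits," \
       "$(git -C "$dir" rev-list --objects --all --missing=print | grep -c '^?') missing objects"
done
```

With 'tree:0', Git generally has to fetch directory listings as well as file contents when it needs them, trading a smaller initial download for more on-demand requests later.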

On-Demand Fetching

On-demand fetching is what makes a partial clone usable in practice. Commands such as 'git checkout', 'git diff', or 'git log -p' work as usual; when one of them needs an object that is not in the clone, Git fetches it from the server and stores it in the local object database, so later accesses to that object do not need the network.
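A minimal sketch of on-demand fetching, assuming a recent Git and a throwaway local repository (names hypothetical): reading an old version of a file in a blob-less clone pulls exactly that blob from the server.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical project: one file rewritten in three commits.
git init -q src
for i in 1 2 3; do
  echo "version $i" > src/file.txt
  git -C src add file.txt
  git -C src commit -q -m "commit $i"
done

# Bare "server" copy that honors --filter and lazy fetches by object ID.
git clone -q --bare src server.git
git -C server.git config uploadpack.allowFilter true
git -C server.git config uploadpack.allowAnySHA1InWant true

git clone -q --filter=blob:none "file://$demo/server.git" partial

# Count objects referenced by history but absent locally (this does not fetch them).
count_missing() {
  git -C "$demo/partial" rev-list --objects --all --missing=print | grep -c '^?' || true
}

echo "missing before: $(count_missing)"
# The oldest version of file.txt is not in the clone, so Git fetches it
# from the promisor remote (origin) automatically before printing it.
git -C partial show HEAD~2:file.txt
echo "missing after: $(count_missing)"
```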

This feature is made possible by the promisor remote, a remote that promises to provide any missing objects on demand. When you run 'git clone' with '--filter', Git records the origin remote as the promisor remote in the repository configuration, together with the filter that was used. Other remotes can also be configured as promisor remotes, and this configuration can be changed later if necessary.
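The configuration that a filtered clone writes can be inspected directly. A minimal sketch, assuming a recent Git and a throwaway local server (names hypothetical):

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical single-commit project.
git init -q src
echo "hello" > src/file.txt
git -C src add file.txt
git -C src commit -q -m "initial commit"

# Bare "server" copy that honors --filter and lazy fetches by object ID.
git clone -q --bare src server.git
git -C server.git config uploadpack.allowFilter true
git -C server.git config uploadpack.allowAnySHA1InWant true

git clone -q --filter=blob:none "file://$demo/server.git" partial

# origin is marked as the promisor remote, and the filter is recorded.
git -C partial config --get remote.origin.promisor
git -C partial config --get remote.origin.partialclonefilter

# Later fetches from origin apply the recorded filter.
git -C partial fetch -q origin
```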

Use Cases for Git Partial Clone

There are several scenarios where a Git partial clone can be beneficial. One common use case is for large projects with a long history. Cloning the entire repository can be time-consuming and consume a lot of disk space. With a partial clone, users still receive the complete commit history but download only the file contents they actually need, saving time and disk space.

Another use case is for projects with large binary files. These files can significantly increase the size of the repository, making it slow to clone and difficult to manage. With a partial clone, users can choose to exclude these files, making the clone much smaller and faster to create.

Large Projects

Large projects with a long history can benefit greatly from Git partial clones. These projects often have thousands of commits and hundreds of contributors, and every version of every file is stored in the repository. Cloning the entire repository can be slow and consume a lot of disk space. With a partial clone, users skip downloading historical file contents they never look at, making the clone much smaller and faster to create.

Furthermore, large projects often have many branches and tags, which can also increase the size of the clone. Limiting which branches and tags are fetched is handled by separate 'git clone' options, such as '--single-branch' and '--no-tags', which can be combined with a partial clone to reduce the size of the clone further.
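A minimal sketch of combining these options with a filter, assuming Git 2.28 or later (for 'git init -b'); the branch 'feature', the tag 'v1.0', and the other names are hypothetical:

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical project with a tag on main and an extra feature branch.
git init -q -b main src
echo "one" > src/a.txt
git -C src add a.txt
git -C src commit -q -m "main commit"
git -C src tag v1.0
git -C src checkout -q -b feature
echo "two" > src/b.txt
git -C src add b.txt
git -C src commit -q -m "feature commit"
git -C src checkout -q main

# Bare "server" copy that honors --filter and lazy fetches by object ID.
git clone -q --bare src server.git
git -C server.git config uploadpack.allowFilter true
git -C server.git config uploadpack.allowAnySHA1InWant true

# Blob-less clone of main only, without tags.
git clone -q --filter=blob:none --single-branch --branch main --no-tags \
  "file://$demo/server.git" work

# Only main's remote-tracking ref is present; no feature branch, no tags.
git -C work for-each-ref --format='%(refname)'
```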

Projects with Large Binary Files

Projects with large binary files can also benefit from Git partial clones. Binary files, such as images, videos, and compiled binaries, can significantly increase the size of the repository. These files are often not necessary for development and can be excluded from the clone to save disk space.

Furthermore, binary files often compress poorly against their earlier versions. Git stores every version of a file as a separate object and relies on delta compression in its packfiles to keep the repository small; for binary formats such as compressed images or videos, those deltas save little, so each revision adds close to its full size to the repository. With a partial clone that uses a size-based filter, users can leave these large files on the server until they are needed, making the clone much smaller and faster to create.
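A minimal sketch of a size-based filter, assuming a recent Git and a throwaway local server; 'big.bin' is a hypothetical stand-in for a binary asset (it just contains spaces), revised once so that an old large version exists in history:

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical project: a 4 KiB "binary" later replaced by an 8 KiB one,
# plus a small text file.
git init -q src
printf '%4096s' '' > src/big.bin
echo "small notes" > src/notes.txt
git -C src add .
git -C src commit -q -m "add files"
printf '%8192s' '' > src/big.bin
git -C src commit -q -a -m "update big.bin"
old_big=$(git -C src rev-parse HEAD~1:big.bin)

# Bare "server" copy that honors --filter and lazy fetches by object ID.
git clone -q --bare src server.git
git -C server.git config uploadpack.allowFilter true
git -C server.git config uploadpack.allowAnySHA1InWant true

# Omit every blob of 1 KiB (1024 bytes) or more.
git clone -q --filter=blob:limit=1k "file://$demo/server.git" work

# The current big.bin was fetched for checkout; its old revision was not.
git -C work rev-list --objects --all --missing=print | grep '^?' || echo "nothing missing"
echo "old big.bin revision: $old_big"
```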

Examples of Git Partial Clone

Let's look at some specific examples. The first example shows how to limit a clone to the last 10 commits; strictly speaking, this is a shallow clone rather than a partial clone, but the two are often used for similar reasons. The second example shows how to create a partial clone that leaves file contents, including large binary files, on the server until they are needed.

Cloning Only Recent History

To create a clone that includes only the last 10 commits, you can use the 'git clone' command with the '--depth' option:


git clone --depth 10 https://github.com/example/repo.git

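This command creates a shallow clone of the repository at the specified URL that includes only the last 10 commits and the objects they reference. Unlike a partial clone, a shallow clone does not fetch older commits on demand; history-walking commands such as 'git log' simply stop at the oldest commit that was fetched. To retrieve more history later, run 'git fetch --deepen=<n>' or 'git fetch --unshallow'.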
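A minimal sketch of extending a shallow clone explicitly, assuming a recent Git and a throwaway three-commit repository (names hypothetical):

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical project: one file rewritten in three commits.
git init -q src
for i in 1 2 3; do
  echo "version $i" > src/file.txt
  git -C src add file.txt
  git -C src commit -q -m "commit $i"
done
git clone -q --bare src server.git

# Shallow clone with only the newest commit (file:// so --depth is honored).
git clone -q --depth 1 "file://$demo/server.git" shallow
echo "after clone: $(git -C shallow rev-list --count HEAD) commit(s)"

# Older commits are never fetched on demand; request them explicitly.
git -C shallow fetch -q --deepen=1
echo "after --deepen=1: $(git -C shallow rev-list --count HEAD) commit(s)"

git -C shallow fetch -q --unshallow
echo "after --unshallow: $(git -C shallow rev-list --count HEAD) commit(s)"
```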

Excluding Large Binary Files

To create a partial clone that leaves file contents, including large binary files, on the server until they are needed, use the 'git clone' command with the '--filter' option:


git clone --filter=blob:none https://github.com/example/repo.git

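This command creates a clone of the repository at the specified URL that includes every commit and tree but omits all blobs. In Git, a blob is the object that stores the contents of a single file, whether text or binary. During checkout, Git fetches the blobs needed for the branch being checked out; every other blob stays on the server and is fetched on demand. To skip only large files instead, use a size-based filter such as '--filter=blob:limit=1m', which omits blobs of 1 MiB or more.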

By understanding and using Git partial clones, developers can work more efficiently with large repositories, saving both time and disk space.
