Using GitPython to List All Files Affected by a Commit

Have you ever needed to see exactly which files were changed in a Git commit? When working on a project with a large and complex codebase, it can be invaluable to query commit history and analyze changes to specific files over time.

In this post, we’ll explore how to use the Python Git library GitPython to easily get a list of all files affected by a given commit. We’ll look at practical examples using GitPython so you can directly apply this knowledge in your own projects. Let’s dive in!

What is GitPython?

GitPython is a powerful Python library that provides access to Git objects and repositories. It allows you to leverage Git functionality in your Python applications and scripts.

Some key things you can do with GitPython:

  • Open local repositories and clone remote ones
  • Inspect commits, trees, blobs
  • Traverse commits and branches
  • Compare file changes between commits
  • And more!

By using GitPython, we avoid shelling out to Git and parsing its output by hand. Instead, we work with repository objects directly in our Python code.

Installing GitPython

GitPython can be installed via pip:

pip install GitPython

Or conda:

conda install -c conda-forge gitpython

That installs the latest release and its Python dependencies. GitPython drives the git executable for many operations, so Git itself must also be installed and available on your PATH.

Setup Repo Access

To open a Git repo, first import the Repo class:

from git import Repo

Then open a repository:

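repo = Repo("/path/to/repository")

Repo() takes a path on the local file system. A remote URL cannot be opened directly; clone it into a local directory first with Repo.clone_from().

Some examples:

# Local repo
repo = Repo("/users/project/code")

# Clone a remote into a local directory, then work with the clone
repo = Repo.clone_from("https://github.com/user/repo.git", "/path/to/clone")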

This repo object gives us access to all GitPython methods and the underlying Git data.

Get a Commit Object

Next we’ll see how to grab a specific commit from the repository’s history.

GitPython has powerful commit traversal and querying abilities. Common ways to get commits:

# From branch tip 
head_commit = repo.head.commit

# By SHA hex string
commit = repo.commit("0737db7")

# By branch or other ref name
commit = repo.commit("my-feature-branch")

To walk through history rather than fetch a single commit, use repo.iter_commits().

For our example here, we’ll just grab the head commit of our repo:

commit = repo.head.commit
print(commit)

# 5d466f4a3ca9995eb7e3ac36e27e4c0872d6b3b6

Which gives us a commit object we can now inspect.

List Changed Files using GitPython

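So we have our commit instance ready. Next we can get the files that were changed in this commit with commit.stats.files. It is a dictionary keyed by file path, where each value holds the line counts for that file, so wrapping it in list() gives just the paths:

files = list(commit.stats.files)

print(files)
# ['file1.py', 'scripts/helper.py', 'README.md']

Behind the scenes, GitPython asks Git for a --numstat diff between the commit and its first parent (for a root commit, which has no parent, it uses git diff-tree --root) and parses the per-file line counts.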

This includes files that were:

  • Modified
  • Added
  • Renamed
  • Copied
  • Or deleted

So it gives us the full set of file changes in that commit.

Handling Renames and Deletions

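commit.stats.files reports paths and line counts, but not what kind of change each file underwent. A few special cases are worth noting:

  • Deleted files ARE included in commit.stats.files, with their removed lines counted as deletions. Nothing in the stats marks them as deleted, though.
  • How a renamed file appears can depend on your GitPython version: it may be listed as a single combined "old => new" entry, or as the old path and the new path separately. Don't rely on parsing either form.

To identify deletions and renames reliably, diff the commit against its parent and ask for the change type you want:

diffs = commit.parents[0].diff(commit)
deletes = [d.a_path for d in diffs.iter_change_type("D")]

# ['old_script.py']

Each diff entry for a rename carries rename_from and rename_to, so both the old and new paths are available. A root commit has no parents, so check commit.parents before indexing into it.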

Filtering By Change Type

When you retrieve the list of committed files, they will include files with any type of change:

  • Additions
  • Modifications
  • Copies
  • Renames
  • Deletions

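You can also filter to only the files that were added, modified, and so on. The stats dictionary does not record change types, so filter the diff against the parent commit instead.

For example, to only see modified files:

diffs = commit.parents[0].diff(commit)
modified_files = [d.b_path for d in diffs.iter_change_type("M")]

print(modified_files)
# ['helper.py', 'README.md']

Options for filtering by change type:

  • "A" – Added (new) files
  • "C" – Copied files
  • "D" – Deleted files
  • "M" – Modified files
  • "R" – Renamed files

Copies are only reported when Git is asked to detect them, so "C" is usually empty for a plain diff.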
This can be useful for reviewing changes focused on one change type.

Performance Considerations when Using GitPython

Calling commit.stats.files is very convenient. But under the hood, Git has to diff the contents of every changed file to count inserted and deleted lines.

For large commits or repos with long histories, getting full file stats can take a while.

If you only need the file names, there is a faster alternative: use git show with --name-only:

commit_files = [name for name in repo.git.show(commit.hexsha, name_only=True, format="").splitlines() if name]

This prints just the filenames and skips the line counting, so it is usually quicker than building full stats.

The tradeoff here is that you only get the names, without the per-file line counts that commit.stats.files provides.

But when iterating through hundreds of commits, for example, it can make your script much faster!

Full Script Example for GitPython

Let’s walk through a full script to solidify the concepts:

from git import Repo
import datetime

# Cutoff: commits made within the last seven days
last_week = datetime.datetime.now() - datetime.timedelta(weeks=1)

# Open repository
repo = Repo("/path/to/my/repo")

print("Getting last week's commits...")

# Get all commits from the last week (the date string is passed to git as --since)
commits = repo.iter_commits(since=last_week.strftime("%Y-%m-%d %H:%M:%S"))

for commit in commits:
    print(f"Commit: {commit.hexsha[:9]}, Author: {commit.author.name}")

    # Get files changed in commit (the stats dict is keyed by path)
    files = list(commit.stats.files)
    print(f"Changed files: {files}")

    # Change types come from a diff against the first parent;
    # a root commit has no parent, so there is nothing more to report for it
    if not commit.parents:
        print()
        continue
    diffs = commit.parents[0].diff(commit)

    # Get deleted files
    deletes = [d.a_path for d in diffs.iter_change_type("D")]
    print(f"Deleted files: {deletes}")

    # Filter only modified
    modified = [d.b_path for d in diffs.iter_change_type("M")]
    print(f"Modified files: {modified}")
    print()

print("Script complete!")

Running this would print output like:

Getting last week's commits...
Commit: 467de124e, Author: John
Changed files: ['helper.py', 'scripts/process.py']
Deleted files: [] 
Modified files: ['helper.py']

Commit: 9b1aff2fc, Author: Sarah
Changed files: ['README.md', 'docs/quickstart.md'] 
Deleted files: []
Modified files: ['README.md', 'docs/quickstart.md']

Script complete!

So here we:

  • Found commits from the last week
  • Printed commit SHA and author name
  • Listed total changed files
  • Checked for deleted files
  • Filtered to only display modified files

This demonstrates applying the GitPython techniques covered here to efficiently analyze commits and files changed.

Summary

Let’s recap what we covered:

  • GitPython provides Python access to Git repos
  • Easily get commit objects to traverse history
  • Use commit.stats.files to list all changed files
  • Find deleted files by diffing a commit against its parent and selecting change type "D"
  • Filter file changes by type, such as modified, with iter_change_type()
  • Option to use faster git show to just get names

With these GitPython building blocks, you can gain powerful insights into commit histories and file changes in your Git repositories.

Whether it’s reviewing recent commits on a branch, analyzing diffs for a troublesome merge, or extracting commit metadata – GitPython has you covered!