Determine Directory Sizes in Python Like the Linux ‘du’ Command

In Linux, analyzing disk usage is straightforward because commands like ‘du’ summarize directories and their sizes. You can achieve something similar with the modules in Python’s standard library. Here we take a closer look at one implementation.

Also Read: How to check if a process is still running using Python on Linux?

Python’s Role in Directory Size Calculation

Python’s standard library includes several modules for working with files. One of them is the ‘os’ module, which provides functions for interacting with the operating system, including listing the contents of a directory and reading file metadata such as size.

Import the Module

The first step is to import the ‘os’ module, which helps us in working with files and folders across our system.

import os

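Define the Size Function

A recursive function is one that calls itself on a smaller piece of the problem until it reaches a base case; for directories, that means calling itself on each subdirectory until none are left. The function below is the building block for that approach: it scans a single directory and records the size of every file it finds there, without descending into subdirectories.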

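def get_file_sizes(directory='.'):
    # Map each file name in the directory to its size in bytes.
    file_sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # Subdirectories and other non-file entries are skipped.
            if entry.is_file():
                file_sizes[entry.name] = entry.stat().st_size
    return file_sizes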

Here, ‘get_file_sizes’ loops over the entries that os.scandir returns for the given path. For each entry that is a file, it stores the size in bytes under the file’s name, and it returns the resulting dictionary at the end. Subdirectories are skipped rather than entered.
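To total a whole tree the way ‘du -s’ does, the same os.scandir loop can call itself on each subdirectory. The following is a minimal sketch rather than part of the original walkthrough: the name get_dir_size is hypothetical, and symbolic links are deliberately not followed so that nothing is counted twice. It sums apparent file sizes (st_size), whereas du reports allocated disk blocks by default, so the two numbers can differ.

```python
import os


def get_dir_size(directory='.'):
    """Return the total size in bytes of all regular files under directory."""
    total = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
            elif entry.is_dir(follow_symlinks=False):
                # Recursive call: add everything below this subdirectory.
                total += get_dir_size(entry.path)
    # Base case: a directory with no subdirectories simply returns its file total.
    return total
```

Calling get_dir_size('.') returns the byte total for the current directory and everything beneath it.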

Format Size Unit

To make the output more human-readable, you can add a formatting function. It converts a raw byte count into the largest unit (B, KB, MB, GB or TB) that keeps the number below 1024.

def format_size(size):
    # Divide by 1024 until the value fits the current unit; TB is the largest unit.
    for unit in ['B', 'KB', 'MB', 'GB']:
        if size < 1024.0:
            return f"{size:.2f} {unit}"
        size /= 1024.0
    return f"{size:.2f} TB"

Here the “format_size” function repeatedly divides the size by 1024 until the value drops below 1024, then returns it with the matching unit (B, KB, MB, GB or TB), rounded to two decimal places.
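To see the arithmetic, a few sample values can be traced by hand: 1536 bytes is not below 1024, so it is divided once to give 1.5 and labeled KB. The snippet below restates the function so that it runs on its own. As an assumption of this sketch, it returns from inside the loop so that sizes of 1024 TB or more are not divided one extra time.

```python
def format_size(size):
    """Format a byte count using binary (1024-based) units."""
    for unit in ['B', 'KB', 'MB', 'GB']:
        if size < 1024.0:
            return f"{size:.2f} {unit}"
        size /= 1024.0
    # Anything that reaches this point is at least 1 TB.
    return f"{size:.2f} TB"


print(format_size(512))          # 512.00 B
print(format_size(1536))         # 1.50 KB  (1536 / 1024 = 1.5)
print(format_size(5 * 1024**2))  # 5.00 MB
print(format_size(3 * 1024**5))  # 3072.00 TB
```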

Print the File Sizes

Next, add a display function that takes the desired directory path and calls the two previous functions. It prints the size of each file as follows:

def print_file_sizes(directory='.'):
    file_sizes = get_file_sizes(directory)
    for file, size in file_sizes.items():
        formatted_size = format_size(size)
        print(f"{file}: {formatted_size}")

print_file_sizes(r'C:\Users\Lenovo\Desktop\gitfile')

I have used a demo location here, so replace the path with the directory whose sizes you want to inspect. On Windows, put ‘r’ before the path to make it a raw string, so that Python treats the backslashes as literal characters rather than as the start of escape sequences.
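The effect of the ‘r’ prefix is easy to check with a short, purely illustrative path (C:\temp\new is a made-up example, not a location from this article). With the demo path above the prefix matters even more: without it, the \U in C:\Users is read as the start of a Unicode escape and Python rejects the literal with a SyntaxError.

```python
# A normal string literal interprets backslash escapes; a raw string does not.
normal = 'C:\temp\new'   # \t becomes a tab and \n becomes a newline
raw = r'C:\temp\new'     # backslashes are kept as literal characters

print(len(normal))  # 9
print(len(raw))     # 11
print(raw)          # C:\temp\new
```

Forward slashes, as in 'C:/temp/new', are another way to avoid the problem, since Windows accepts them in paths as well.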

Handling Exceptions

It is important to handle situations where an error might occur. When reading the file system, the directory may not exist, access to it may be denied, or files may be removed while the scan is running. Python handles such cases with a try-except block.

We will add a try-except block to the “get_file_sizes” function. The try block holds the code that can raise an exception, and the except block catches the error.

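def get_file_sizes(directory='.'):
    file_sizes = {}
    try:
        # scandir raises OSError if the directory is missing or unreadable.
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_file():
                    file_sizes[entry.name] = entry.stat().st_size
    except OSError as e:
        # Report the problem and fall through to return what was collected.
        print(f"Error accessing {directory}: {e}")
    return file_sizes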

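The try-except block catches any OSError raised while the directory is being read, which includes missing paths (FileNotFoundError) and denied access (PermissionError). If one occurs, the message in the except block is printed and the function returns whatever sizes it had collected up to that point.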
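The same guard carries over to a recursive total. Wrapping each directory scan in its own try block lets one unreadable subdirectory be skipped without abandoning the rest of the tree, much as du prints a warning and carries on. This is a sketch under that assumption; safe_dir_size and its errors list are hypothetical names, and problems are collected rather than printed so the caller can decide how to report them.

```python
import os


def safe_dir_size(directory='.', errors=None):
    """Sum file sizes under directory, recording paths that could not be read."""
    if errors is None:
        errors = []
    total = 0
    try:
        with os.scandir(directory) as entries:
            for entry in entries:
                try:
                    if entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
                    elif entry.is_dir(follow_symlinks=False):
                        # The recursive call appends to the same errors list.
                        sub_total, _ = safe_dir_size(entry.path, errors)
                        total += sub_total
                except OSError as e:
                    # The entry vanished or is unreadable; note it and move on.
                    errors.append((entry.path, e))
    except OSError as e:
        # The directory itself could not be opened (missing, no permission, ...).
        errors.append((directory, e))
    return total, errors
```

The function returns a (total, errors) pair, so a caller can print the total and then list each skipped path alongside its exception.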

Which Python library can be used to obtain directory sizes?

For this task, the ‘os’ module from the standard library is sufficient: os.scandir lists the entries of a directory, and each entry’s stat() result reports the file size in st_size. No third-party packages are required.

What programming logic can be used for calculating directory sizes in our devices?

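Recursion suits this task: a function adds up the files in a directory and then calls itself on each subdirectory, stopping when a directory has no subdirectories left. The functions in this article handle one directory at a time, so applying them to every subdirectory in this way yields du-style totals for a whole tree.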
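An alternative to writing the recursion by hand is os.walk, which visits every directory in a tree for you. The sketch below is one possible approach, not the method used in this article: walking bottom-up lets each directory reuse the totals already computed for its children, giving one cumulative size per directory in the spirit of du’s listing. The function name du is only illustrative, and sizes are apparent sizes in bytes rather than disk blocks.

```python
import os


def du(top='.'):
    """Return {directory: cumulative size in bytes}, subdirectories included."""
    sizes = {}
    # topdown=False yields each directory after all of its subdirectories.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        total = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
        for name in dirnames:
            # Child totals are already known because of the bottom-up order;
            # symlinked directories are not walked, so they contribute 0.
            total += sizes.get(os.path.join(dirpath, name), 0)
        sizes[dirpath] = total
    return sizes
```

Iterating over du('.').items() and passing each size through a formatter such as format_size prints one line per directory.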

Summary

Python offers simple yet powerful capabilities for system interaction. With directory scanning, recursion and exception handling, we can approximate Linux utilities like ‘du’ in Python itself. Measuring directories is just one example; Python’s standard library can support many more Linux admin tasks. What other Linux admin tasks can be achieved with Python scripting and its libraries?
