Garbage Collection in Python

In this article, I will introduce you to the concept of garbage collection in Python. Garbage collection is the way Python manages its memory automatically.

It does so mainly with the use of a reference counter. So before we get into garbage collection itself, let’s understand what a reference counter is.

What is a Reference Counter in Python?

A reference counter is a count of the references made to an object within a running program. It allows the Python interpreter to know when an object is still in use and when it is safe to remove it from memory.

This relieves programmers of the job of keeping track of objects filling up system resources and allows them to focus on creating programs.
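To make the idea concrete, here is a minimal sketch that watches a reference counter change using sys.getrefcount(). It assumes CPython; the names data and alias are made up for illustration. The absolute numbers vary between versions (the call itself temporarily adds a reference), so the sketch only compares differences.

```python
import sys

# A list is a regular heap object, so its reference counter behaves predictably.
data = ["a", "list", "object"]

# getrefcount() reports a value that includes the temporary reference
# created by passing the object to the function, so we compare differences.
baseline = sys.getrefcount(data)

alias = data                        # a second name now refers to the same list
with_alias = sys.getrefcount(data)

del alias                           # the second name is removed again
after_del = sys.getrefcount(data)

print("added by alias:", with_alias - baseline)
print("after del alias:", after_del - baseline)
```

The first difference shows one extra reference while alias exists, and the second shows the counter returning to its starting value.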

How does Garbage Collection in Python Work?

Let’s understand how Python uses reference counters to perform garbage collection behind the scenes. We can understand this with a simple example.

We will first go through how references are counted, and then look at how Python identifies when there are no references left to an object.

Have a look at the code below:

# Conceptual walk-through: the counts in the comments follow the simplified model.
# (CPython caches small integers such as 9, so the real counts for it are much higher.)

# The reference count increases as more variables refer to the object
reference1 = 9           # reference count for the value 9 becomes 1
reference2 = reference1  # reference count for the value 9 becomes 2
reference3 = reference1  # reference count for the value 9 becomes 3

# The reference count decreases as the variables are rebound to other values
reference2 = 10  # reference count for the value 9 decreases to 2
reference3 = 5   # reference count for the value 9 decreases to 1
reference1 = 1   # reference count for the value 9 decreases to 0

# Once the reference count reaches 0, the object is deleted from memory

As is clear from the above, the value 9 has no more references once the last variable referring to it, reference1, is rebound to 1.

As soon as the reference count of an object drops to zero, the Python interpreter deallocates the memory held by that object to free up space.
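The behaviour described above can be observed with a weak reference, which lets us check whether an object still exists without keeping it alive. This is only a sketch, assuming CPython's reference counting; the class Payload and the variable names are hypothetical and not part of any library.

```python
import weakref


class Payload:
    """Hypothetical class; plain ints and lists cannot be weakly referenced."""


first = Payload()
second = first                # two names now refer to the same object

# A weak reference does not increase the reference count.
probe = weakref.ref(first)

first = None                  # one reference is gone, one remains
still_alive = probe() is not None

second = None                 # the last reference is gone
gone = probe() is None

print("alive with one reference left:", still_alive)
print("freed after the last reference went away:", gone)
```

On CPython the object is released the moment the last name is rebound, without any separate collection step.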

What is a Reference Cycle?

Let’s look at another concept called the reference cycle. A reference cycle occurs when an object refers to itself, either directly or through other objects. Have a look at the example code below:

>>> a = []
>>> a.append(a)
>>> print(a)
[[...]]

First, a = [] creates an empty list. The call a.append(a) then adds something to that list: in this case, a itself. Note that this does not add a second empty list; it stores a reference to the same list inside it.

That is why printing a shows [[...]]: what looks like two lists is really one list that contains itself.

The list object now has two references: the variable a and the element stored inside the list. If we delete a, or the program simply stops using it, the reference counter only drops to 1.

So even though our program can no longer reach the list, its reference counter is stuck at 1, and reference counting alone will never free it.

Python has a way to remove reference cycles, but it does not do so immediately. A separate cycle collector keeps a running count of container objects (such as lists, dicts and class instances) that have been allocated minus those that have been deallocated.

Once that count exceeds a threshold, Python runs its garbage collection and examines the container objects it is tracking.

When it does, it sees that the only reference to our list comes from the list itself, so nothing in our program can reach it even though it has a reference count of 1.

So it goes ahead and removes it.
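The following sketch shows a cycle surviving until the collector runs. It assumes CPython; the class Node is hypothetical and is used instead of a list only because lists cannot be weakly referenced. Automatic collection is switched off for the duration so that the timing is predictable.

```python
import gc
import weakref


class Node:
    """Hypothetical class whose instances refer to themselves."""

    def __init__(self):
        self.me = self        # the object holds a reference to itself


gc.disable()                  # keep the automatic cycle collector out of the way

node = Node()
probe = weakref.ref(node)     # lets us check for existence without a strong reference

del node                      # the name is gone, but the self-reference remains
alive_before = probe() is not None

gc.collect()                  # run the cycle collector by hand
alive_after = probe() is not None

gc.enable()                   # restore automatic collection

print("alive after del:", alive_before)
print("alive after gc.collect():", alive_after)
```

Deleting the name is not enough here; the object only disappears once the cycle collector has run.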

How do we know when the garbage collection is going to run?

Well, we can find out by using a Python module called gc (short for garbage collection), which we import with import gc.

We then get the thresholds that decide when the garbage collection is going to go ahead and catch these reference cycles.

We can pull that information up by calling gc.get_threshold().

import gc
print(gc.get_threshold())

The above code displays output like the following; the exact defaults can vary between Python versions.

(700, 10, 10)

Let’s take a closer look at the output. The value 700 means that once the number of container-object allocations minus deallocations exceeds 700, Python goes ahead and collects reference cycles among the youngest objects. The other two values control how often older objects are examined; in the classic three-generation scheme, an older generation is scanned after the generation below it has been collected 10 times.

In simple terms, after roughly 700 net allocations, Python runs an algorithm that goes through and cleans up your memory.

Python frees an object automatically as soon as its reference counter gets to 0. When a reference counter is stuck at 1 because of a reference cycle, the object is only freed once this threshold is reached and the garbage collector runs to catch the cycles.
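To see how close the interpreter currently is to that threshold, the gc module also offers gc.get_count(), which returns the current collection counts. The sketch below simply reads both tuples side by side; the variable names are my own, and the numbers you see will depend on your Python version and on what the program has done so far.

```python
import gc

# Thresholds that trigger a collection, one per generation.
threshold0, threshold1, threshold2 = gc.get_threshold()

# Current counts, in the same order as the thresholds.
count0, count1, count2 = gc.get_count()

print("thresholds:", (threshold0, threshold1, threshold2))
print("current counts:", (count0, count1, count2))
```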

Working with Garbage Collection Manually

We can change these thresholds using the gc module, through gc.set_threshold(). We’re not going to cover tuning in detail in this article, but just be aware that you can change them.
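As a brief sketch of what changing the threshold looks like, the snippet below raises the first threshold and then restores the original settings. The values 1000, 15 and 15 are arbitrary illustrations, not recommendations.

```python
import gc

original = gc.get_threshold()     # remember the current settings

# Illustrative values only: make the youngest generation wait longer
# before a collection is triggered.
gc.set_threshold(1000, 15, 15)
changed = gc.get_threshold()

gc.set_threshold(*original)       # put the original settings back
restored = gc.get_threshold()

print("original:", original)
print("changed:", changed)
print("restored:", restored)
```

A higher first threshold means fewer collections but more cyclic garbage kept around between them, so any change is worth measuring in your own program.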

You can also turn automatic garbage collection off and on, and trigger a collection yourself. There’s so much you can do with the module.

The code below disables automatic collection and then calls the collector manually.

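import gc

gc.disable()  # turn off the automatic cycle collector

class Track:
    def __init__(self):
        print("Initialising your object here")

    def __del__(self):
        print("Deleting and clearing memory")

print("A")
A = Track()
print("B")
B = Track()

print("deleting here...")
del A  # the reference count reaches 0, so __del__ runs immediately
del B

gc.collect()  # run a full collection manually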

To explain the above code in short: I have imported the gc module and disabled automatic garbage collection at the beginning of the code using gc.disable().

This ensures that the automatic cycle collector does not run; reference counting itself cannot be switched off. Then, a class Track is defined with just a constructor and a destructor. Two objects, A and B, are created, and each prints Initialising your object here in the console as it is created.

The objects are then deleted using the del statement. Because neither object is part of a reference cycle, its reference count drops to 0 straight away, and each prints Deleting and clearing memory in the console as it is destroyed.

The final gc.collect() call runs a full collection by hand. Here A and B have already been freed, so there is nothing left for it to reclaim; the call would matter if the objects had been caught in reference cycles.
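To show a case where gc.collect() does make the difference, here is a variation on the example above in which two objects refer to each other. This is a sketch assuming CPython 3.4 or later; the name parameter, the partner attribute and the events list are additions of mine for illustration.

```python
import gc

events = []                       # records which destructors have run


class Track:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        events.append(f"deleted {self.name}")


gc.disable()                      # only manual collections from here on
gc.collect()                      # start from a clean slate

a = Track("A")                    # not part of any cycle
del a                             # freed at once by reference counting
after_plain_del = list(events)

b = Track("B")
c = Track("C")
b.partner = c                     # B refers to C ...
c.partner = b                     # ... and C refers back to B: a cycle
del b, c                          # the names are gone, the cycle remains
after_cycle_del = list(events)

found = gc.collect()              # the cycle collector finds B and C
after_collect = sorted(events)

gc.enable()

print(after_plain_del)
print(after_cycle_del)
print(after_collect, "unreachable objects found:", found)
```

The destructor for A runs as soon as its name is deleted, while the destructors for B and C only run once gc.collect() is called.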

There is a lot more that can be done with the gc module. But for now, just know that Python does a very good job of managing our memory.

What could be the reason if garbage collection is not taking place?

One other thing I want to point out is that garbage collection is not free: a collection pass itself needs some memory and time to run, so if your memory is close to being used up, it may not be able to help in time.

So say your program is very large and is using up a lot of memory; if cyclic garbage builds up faster than it is collected, you may start to see exceptions such as MemoryError and a bunch of other issues.

So just be aware: if you’re having a lot of issues like that, you might have to use the gc module to run a collection, for example with gc.collect(), a little bit earlier in your program.

Conclusion

Hope this article has been insightful. Do let us know what you think in the feedback section below.

References

https://docs.python.org/3/library/gc.html