The mmap Function in Python- A Quick Reference

Mmap function In Python

There are many ways for file I/O in Python and mmap is the coolest but rarely used method for the same. In this tutorial, we will learn the mmap function in Python and we will also learn about memory mapping which is the underlying concept of the mmap function.

What is memory mapping?

Memory mapping is a process through which machine level constructs are used to map a file directly for use in a program from disk. It maps the entire file in the disk to a range of addresses within the computer program’s address space. The program can access the files on the disk in the same way it accesses data from random access memory.

Memory management in a Computer

To clearly understand the process behind memory mapping and working of mmap, let us briefly understand the types of computer memory.

  • Physical Memory: It is the random access memory(RAM) and is a volatile memory. It is available to programs while they are active.
  • Virtual Memory: If we have a 64-bit machine, we can access upto 17 billion gigabytes of data. But in reality, our physical memory is 8 or 16 gigabytes maximum in personal computers. The computer maps the physical memory to its virtual space and uses part of storage disk called swap space to compensate for less memory if we are running larger programs and we don’t have to worry about the size of the file even if it’s too large. Different paging techniques are used for swapping data from disk to memory during usage.
  • Shared Memory: With the help of techniques like paging and virtual memory, multiple programs can run simultaneously using a single physical memory even if it’s low in capacity. The data which isn’t being used is sent to swap memory and data which is to be used is copied to main memory thus all the programs work.

The mmap function uses the concept of virtual memory to make it appear to the program that a large file has been loaded to main memory.

But in reality the file is only present on the disk. The operating system just maps the address of the file into the program’s address space so that program can access the file.

How to use mmap function in Python?

We can use mmap module for file I/O instead of simple file opeartion. Let us understand how to use mmap with the help of following example.

#import module
import mmap

#define filepath
filepath="/home/aditya1117/askpython/sample.txt"

#create file object using open function call
file_object= open(filepath,mode="r",encoding="utf8")

#create an mmap object using mmap function call
mmap_object= mmap.mmap(file_object.fileno(),length=0,access=mmap.ACCESS_READ,offset=0)

#read data from mmap object
txt=mmap_object.read()

#print the data
print("Data read from file in byte format is:")
print(txt)
print("Text data is:")
print(txt.decode())

Output:

Data read from file in byte format is:
b'This is a sample file for mmap tutorial.\n'
Text data is:
This is a sample file for mmap tutorial.

In the example above,

  1. we first import the mmap module
  2. then define the filepath of the file in the disk
  3. then we create the file_object using open() system call
  4. After getting the file object we create a memory mapping of file into the address space of the program using the mmap function
  5. Then we read the data from mmap object
  6. and print the data.

Description of mmap function

mmap_object= mmap.mmap(file_object.fileno(),length=0,access=mmap.ACCESS_READ,offset=0)

mmap requires a file descriptor for the first argument.

The argument length takes the size of memory in bytes to be mapped and the argument access informs the kernel how the program is going to access the memory.

The argument offset instructs the program to create a memory map of the file after certain bytes specified in the offset.

  • The file descriptor for the first argument is provided by the fileno() method of the file object.
  • The length in the second argument can be specified 0 if we want the system to automatically select a sufficient amount of memory to map the file.
  • The access argument has many options. ACCESS_READ allows the user program to only read from the mapped memory. ACCESS_COPY and ACCESS_WRITE offer write mode access. In ACCESS_WRITE mode the program can change both the mapped memory and file but in ACCESS_COPY mode only the mapped memory is changed.
  • The offset argument is often specified 0 when we want to map the file from the start address.

How to write data to memory mapped file?

To write some data to a memory-mapped file, we can use the ACCESS_WRITE option in the access argument and use mmap_object.write() function to write to the file after creating the file object by opening the file in r+ mode.

Here we have to take care of the point that mmap doesn’t allow mapping of empty files. This is due to the reason that no memory mapping is needed for an empty file because it’s just a buffer of memory.

If we will open a file using “w” mode, mmap will cause ValueError.

#import module
import mmap

#define filepath
filepath="/home/aditya1117/askpython/sampleoutput.txt"

#create file object using open function call
file_object= open(filepath,mode="r+",encoding="utf8")
print("Initial data in the file is:")
print(file_object.read())

#create an mmap object using mmap function call
mmap_object= mmap.mmap(file_object.fileno(),length=0,access=mmap.ACCESS_WRITE,offset=0 )

#write something into file
text="Aditya is writing this text to file  "
mmap_object.write(bytes(text,encoding="utf8"))

# read data from file 
nfile_object= open(filepath,mode="r+",encoding="utf8")

print("Modified data from file is:")
print(nfile_object.read())

Output:

Initial data in the file is:
This is a sample file for mmap tutorial in python.

Modified data from file is:
Aditya is writing this text to file  al in python.

An important point we have to keep in mind regarding the above example is that the input should be converted into bytes before writing to mmap.

Also, mmap starts writing data from the first address of the file and overwrites the initial data. If we have to save previous data, we can do so by specifying the appropriate offset in mmap function call.

How to access a certain portion of the file using mmap?

We can access a part of a file directly using mmap objects. mmap objects can be sliced as we use slicing on python lists.

mmap objects show the behavior of strings and many operations which are done on strings can be applied to mmap objects.

mmap objects can be sliced as we use slicing on Python lists. Suppose we want to read from the 10th to 99th character of the file. We can do so as shown in the following example.

#import module
import mmap

#define filepath
filepath="/home/aditya1117/askpython/sample.txt"

#create file object using open function call
file_object= open(filepath,mode="r",encoding="utf8")

#create an mmap object using mmap function call
mmap_object= mmap.mmap(file_object.fileno(),length=0,access=mmap.ACCESS_READ,offset=0 )
print("Complete data in byte format is:")
print(mmap_object.read())
print("Data from 9th to 97th character is:")
#print 10th to 99th character
print(mmap_object[9:98])

Output:

Complete data in byte format is:
b'We can access a part of file directly using mmap objects. mmap objects can be sliced as we use slicing on python lists.mmap objects show the behavior of strings and many operations which are done on strings can be applied to mmap objects.mmap objects can be sliced as we use lists in python. Suppose we want to read from the 10th to 99th character of the file. We can do so as shown in the following example.\n'
Data from 9th to 97th character is:
b'cess a part of file directly using mmap objects. mmap objects can be sliced as we use sli'

Why use mmap in Python?

Simple read/write operations make many system calls during execution which causes multiple copying of data in different buffers in the process.

Using mmap provides us a significant improvement in terms of performance because it skips those function calls and buffers operations especially in programs where extensive file I/O is required.

Conclusion

In this tutorial, first, we have seen what memory mapping is. Then we looked at memory management techniques. Then we saw how to use mmap in Python using various examples and also saw some technical aspects behind the function’s working. Happy Learning!