Python YAML Processing using PyYAML


YAML stands for YAML Ain’t Markup Language. It is widely used to write and store configuration files for many DevOps tools and applications. A YAML document is a plain-text file that is easy for humans to read and understand, and it uses the .yaml or .yml extension. It is a data serialization language, similar to JSON and XML.

Data serialization is the process of converting data structures into a standard format that can be stored (for example, as a config file) or transferred over the network, and later reconstructed.

Data serialized to YAML in Python can be sent over the network and then deserialized by a program written in another language. Many languages, including Python, JavaScript, and Java, have YAML libraries that can parse and use a YAML file. YAML also supports common data structures such as lists (sequences) and dictionaries (mappings). In this article, we use Python to explore YAML through some examples.
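To make the serialize/deserialize idea concrete, here is a minimal round trip in Python, assuming PyYAML is installed (installation is covered later). yaml.safe_dump() turns a Python dictionary into YAML text, and yaml.safe_load() rebuilds an equal dictionary from that text. The settings dictionary is a made-up example; in practice the text would be written to a file or sent over the network, where a YAML library in another language could read it.

```python
import yaml

# A hypothetical settings dictionary, used only for illustration
settings = {"name": "my-api-config", "retries": 3, "debug": False}

# Serialize: Python objects -> YAML text
text = yaml.safe_dump(settings, sort_keys=False)
print(text)

# Deserialize: YAML text -> equivalent Python objects
restored = yaml.safe_load(text)
print(restored == settings)  # True
```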

YAML vs XML vs JSON

Let’s look at the same configuration file written in all three formats to get an overview of each syntax.

YAML – YAML Ain’t Markup Language
configuration:
  name: my-api-config
  version: 1.0.0
  about: "some description"
  description: |
    Lorem ipsum dolor sit amet, 
    consectetur adipiscing elit
  scripts:
    "start:dev": main.py
  keywords: ["hello", "there"]
XML – Extensible Markup Language
<configuration>
  <name>my-api-config</name>
  <version>1.0.0</version>
  <about>some description</about>
  <description>Lorem ipsum dolor sit amet, consectetur adipiscing elit</description>
  <scripts>
    <script name="start:dev">main.py</script>
  </scripts>
  <keywords>hello</keywords>
  <keywords>there</keywords>
</configuration>
JSON – JavaScript Object Notation
{
    "configuration": {
        "name": "my-api-config",
        "version": "1.0.0",
        "about": "some description",
        "description": "Lorem ipsum dolor sit amet, \nconsectetur adipiscing elit\n",
        "scripts": {
            "start:dev": "main.py"
        },
        "keywords": [
            "hello",
            "there"
        ]
    }
}
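One way to check that the YAML and JSON versions above describe the same data is to parse both and compare the results. This sketch uses yaml.safe_load() and the standard-library json.loads(); to keep the comparison exact, the trailing space after "amet," in the description is dropped from both snippets.

```python
import json
import textwrap

import yaml

yaml_text = textwrap.dedent("""\
    configuration:
      name: my-api-config
      version: 1.0.0
      about: "some description"
      description: |
        Lorem ipsum dolor sit amet,
        consectetur adipiscing elit
      scripts:
        "start:dev": main.py
      keywords: ["hello", "there"]
""")

# Raw string so the JSON "\n" escapes reach json.loads() unchanged
json_text = r"""
{
    "configuration": {
        "name": "my-api-config",
        "version": "1.0.0",
        "about": "some description",
        "description": "Lorem ipsum dolor sit amet,\nconsectetur adipiscing elit\n",
        "scripts": {"start:dev": "main.py"},
        "keywords": ["hello", "there"]
    }
}
"""

from_yaml = yaml.safe_load(yaml_text)
from_json = json.loads(json_text)
print(from_yaml == from_json)  # True
```

Note that PyYAML reads version: 1.0.0 as the string "1.0.0", because the value does not match YAML's number patterns.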

Breaking down a YAML File

The YAML file in our example holds some configuration settings for an application. Compared with the other two formats it is noticeably cleaner, since XML and JSON rely on far more punctuation: tags, braces, commas, and quotes. At its core, YAML stores data as key: value pairs. Keys are usually strings and can be written with or without quotes. Values can be of many data types, such as integers, floats, strings, booleans, lists, and nested mappings.
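The sketch below loads a few hypothetical keys with yaml.safe_load() and prints the Python type PyYAML produces for each value. Quoting a value, as with quoted_port, forces it to be read as a string.

```python
import yaml

# Hypothetical keys chosen to cover the common value types
doc = """
port: 8080
ratio: 0.75
enabled: true
name: my-api-config
quoted_port: "8080"
stack: [python, django]
nothing: null
"""

data = yaml.safe_load(doc)
for key, value in data.items():
    # type(value).__name__ gives a short name such as 'int' or 'list'
    print(f"{key}: {value!r} ({type(value).__name__})")
```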

Indentation matters in YAML because it defines how data is nested. Indent with spaces only: tab characters are not allowed, and a tab-indented file will fail to parse. Apart from that, the format is simple and human-readable; unlike XML or JSON, there are few symbols to read past.
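The tab rule is easy to see in practice. This sketch parses the same two-line mapping twice, once indented with spaces and once with a tab; PyYAML accepts the first and raises a yaml.YAMLError subclass for the second.

```python
import yaml

spaces = "configuration:\n  name: my-api-config\n"
tabs = "configuration:\n\tname: my-api-config\n"

# Indenting with spaces parses normally
print(yaml.safe_load(spaces))

# Indenting with a tab character is rejected by the scanner
tab_error = None
try:
    yaml.safe_load(tabs)
except yaml.YAMLError as exc:
    tab_error = exc
    print("Parse error:", exc)
```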

configuration:
  name: my-api-config
  version: 1.0.0
  about: "some description"
  # This is a comment
  description: |
    Lorem ipsum dolor sit amet, 
    consectetur adipiscing elit
  scripts:
    "start:dev": main.py
  keywords: ["hello", "there"]

The file above also contains a comment, which starts with # and is ignored by the parser, and a multi-line string written with the | (pipe) character, which keeps the line breaks in the text.
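To check what the parser actually produces, this sketch loads a trimmed version of the example. The summary key is hypothetical and uses the > (folded) block style, which the example above does not use; it is included only to contrast with |.

```python
import textwrap

import yaml

doc = textwrap.dedent("""\
    configuration:
      # This comment is ignored by the parser
      description: |
        Lorem ipsum dolor sit amet,
        consectetur adipiscing elit
      summary: >
        Lorem ipsum dolor sit amet,
        consectetur adipiscing elit
""")

config = yaml.safe_load(doc)["configuration"]
print(list(config))                 # the comment does not become a key
print(repr(config["description"]))  # | keeps the line break
print(repr(config["summary"]))      # > folds it into a space
```

Both styles keep a single trailing newline by default.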

YAML File Processing with PyYAML

In this section, we perform some basic operations on a YAML file, such as reading, writing, and modifying data, using the PyYAML module for Python.

  • Installing PyYAML
pip install pyyaml
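A quick way to confirm that the installation worked is to import the module and print its version. The __with_libyaml__ flag reports whether the optional LibYAML C bindings are available; the loaders and dumpers used in this article work either way.

```python
import yaml

# The installed PyYAML version string, e.g. "6.0.1"
print(yaml.__version__)

# True if PyYAML can use the faster LibYAML C bindings
print(yaml.__with_libyaml__)
```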

Reading a yaml file

Let’s say we have a yaml file with some configuration and we want to read the contents using Python.

Filename: config_one.yml

configuration:
  name: my-api-config
  version: 1.0.0
  about: some description
  stack:
    - python
    - django

Next, we create a new Python file and read the YAML file.

Filename: PyYaml.py

import yaml
with open("config_one.yml", "r") as first_file:
    data = yaml.safe_load(first_file)
    print(type(data))
    print(data)
"""
Output:

<class 'dict'>
{'configuration': {'name': 'my-api-config', 'version': '1.0.0', 'about': 'some description', 'stack': ['python', 'django']}}

"""

Explanation:

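We import PyYAML with import yaml. To read a YAML file, we open it in read mode and load its contents with yaml.safe_load(). PyYAML offers several loaders, which differ in the constructors they use to turn YAML into Python objects. The general yaml.load() function combined with an unsafe loader such as yaml.UnsafeLoader can construct arbitrary Python objects from tags in the document, so loading untrusted input that way can execute malicious code. safe_load() uses SafeLoader, which only builds standard types such as dictionaries, lists, strings, and numbers and never creates arbitrary objects; that is why it is the recommended choice. Since PyYAML 6.0, yaml.load() also requires an explicit Loader argument.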
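The difference between the loaders can be shown without running anything dangerous. This sketch feeds safe_load() a document with a Python-specific tag; yaml.UnsafeLoader would call os.getcwd() while building the data, but SafeLoader refuses and raises an error. The tag is a harmless stand-in for a malicious payload.

```python
import yaml

# A Python-specific tag: an unsafe loader would call os.getcwd() here
untrusted = "!!python/object/apply:os.getcwd []"

rejected = False
try:
    yaml.safe_load(untrusted)
except yaml.YAMLError as exc:
    rejected = True
    print("Rejected:", exc)

# safe_load(stream) is shorthand for load(stream, Loader=yaml.SafeLoader)
same = yaml.load("stack: [python, django]", Loader=yaml.SafeLoader)
print(same == yaml.safe_load("stack: [python, django]"))  # True
```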

In PyYaml.py, we also print the type of the loaded data. The console shows <class 'dict'>: the file's contents have become a Python dictionary of key: value pairs, with the configuration section as a nested dictionary and the stack entries as a list.
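Because safe_load() returns ordinary Python objects, nested values are reached with normal indexing. This sketch parses the same content as config_one.yml from a string (safe_load() accepts a string as well as an open file) so it runs without the file; the port key is a hypothetical missing key used to show .get().

```python
import yaml

text = """
configuration:
  name: my-api-config
  version: 1.0.0
  about: some description
  stack:
    - python
    - django
"""

data = yaml.safe_load(text)
config = data["configuration"]    # the inner dictionary
print(config["name"])             # my-api-config
print(type(config["stack"]))      # <class 'list'>
print(config["stack"][0])         # python
# .get() returns a default instead of raising KeyError for a missing key
print(config.get("port", 8000))   # 8000
```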

Modifying our yaml file

Once loaded, the data is made of ordinary Python objects, so we modify it the same way we would any dictionary. First check each value's type: a string can simply be replaced with a new value, but if a key needs to hold several values, its value must be a list before we can append to it.
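The type check described above can be sketched as a small helper. add_item is a hypothetical name, not part of PyYAML: it wraps a single string in a list before adding to it, appends directly to an existing list, and refuses other types.

```python
import yaml


def add_item(section, key, item):
    """Hypothetical helper: add item to section[key], converting a single
    string value into a list first so the key can hold several items."""
    value = section.get(key)
    if value is None:
        section[key] = [item]
    elif isinstance(value, str):
        # A string cannot be appended to, so wrap it in a list
        section[key] = [value, item]
    elif isinstance(value, list):
        value.append(item)
    else:
        raise TypeError(f"cannot add an item to a {type(value).__name__}")


config = yaml.safe_load("stack: python\nkeywords: [hello]\n")
add_item(config, "stack", "django")    # str -> list
add_item(config, "keywords", "there")  # existing list -> appended
print(config)  # {'stack': ['python', 'django'], 'keywords': ['hello', 'there']}
```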

import yaml

with open("config_one.yml", "r") as first_file:
    data = yaml.safe_load(first_file)
    print(type(data))
    # Accessing our <class dict> and modifying value data using a key
    data["configuration"]["stack"] = ["flask", "sql"]
    # Appending data to the list
    data["configuration"]["stack"].append("pillow")
    print(data)


"""
Output:

<class 'dict'>
{'configuration': {'name': 'my-api-config', 'version': '1.0.0', 'about': 'some description', 'stack': ['flask', 'sql', 'pillow']}}

"""

Explanation:

The data is a nested dictionary, so we reach the value we want to change by indexing with its keys. We replace the stack list and then call the list's append() method to add another item. These modifications exist only in memory; config_one.yml on disk is unchanged. In the next step, we write the updated data to a new YAML file.
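To confirm that the changes live only in memory, this sketch writes a small copy of config_one.yml into a temporary directory, modifies the loaded data, and then reloads the file from disk. The temporary directory is used only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import yaml

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config_one.yml"
    path.write_text("configuration:\n  stack:\n    - python\n    - django\n")

    with open(path) as f:
        data = yaml.safe_load(f)
    data["configuration"]["stack"].append("pillow")  # changes the dict only

    with open(path) as f:
        on_disk = yaml.safe_load(f)

print(data["configuration"]["stack"])     # ['python', 'django', 'pillow']
print(on_disk["configuration"]["stack"])  # ['python', 'django']
```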

Writing a yaml file with the modified data

The above data along with the modified values can be written in a new file with just a few lines of code.

import yaml

with open("config_one.yml", "r") as first_file:
    data = yaml.safe_load(first_file)
    print(type(data))

    # Accessing our <class dict> and modifying value data using a key
    data["configuration"]["stack"] = ["flask", "sql"]

    # Appending data to the list
    data["configuration"]["stack"].append("pillow")
    print(data)

# Writing a new yaml file with the modifications
with open("new_config.yaml", "w") as new_file:
    yaml.dump(data, new_file)

Filename: new_config.yaml

configuration:
  about: some description
  name: my-api-config
  stack:
  - flask
  - sql
  - pillow
  version: 1.0.0

Explanation:

We open new_config.yaml in write mode and pass two arguments to yaml.dump(): data, which holds the original configuration along with our changes, and new_file, the open file object to write to. The new file contains the original values plus our modifications. Notice that the keys now appear in alphabetical order, because yaml.dump() sorts mapping keys by default.
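If you would rather keep the original key order, PyYAML's dump functions accept sort_keys=False (available since PyYAML 5.1). This sketch uses yaml.safe_dump(), the dumping counterpart of safe_load() that only represents standard types, to compare the default output with sort_keys=False, and then checks that the text loads back to an equal dictionary.

```python
import yaml

data = {
    "configuration": {
        "name": "my-api-config",
        "version": "1.0.0",
        "about": "some description",
        "stack": ["flask", "sql", "pillow"],
    }
}

# Default: mapping keys are sorted alphabetically
print(yaml.safe_dump(data))

# sort_keys=False keeps the dictionary's insertion order
ordered = yaml.safe_dump(data, sort_keys=False)
print(ordered)

# Round trip: loading the dumped text gives back an equal dictionary
print(yaml.safe_load(ordered) == data)  # True
```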

Summary

In this article, we went through the basic structure of a YAML file and used PyYAML to read it, modify the data, and write the configuration to a new file. We also compared YAML with JSON and XML by writing the same configuration in each format. YAML's minimal syntax makes it simple and human-readable, which is a large part of why it is such a popular configuration format across a wide variety of technology stacks.
