Using YAML to Configure Complex Environments

By Thorsten Scherf, 15 February, 2023

It is difficult to imagine the DevOps world without YAML. Strictly speaking, YAML (as of version 1.2) is a superset of JavaScript Object Notation (JSON), so a YAML parser can generally read JSON documents, too. JSON, however, focuses more on data serialization (e.g., to make data available to an API).

In contrast, YAML plays to its strengths when used as a configuration language because the format is easier to read than JSON. Python programmers love YAML because, unlike JSON, it uses indentation instead of braces and brackets to define objects.

Basic YAML Syntax

Listing 1 shows a simple YAML document. The --- string in the first line marks the start of a document, which allows a single file to contain several documents. It is followed by typical key-value pairs, which will be familiar if you have used JSON. The first pair maps a key to a simple scalar, here a string value, although numbers and booleans are also allowed. The key that follows holds a list. In this case, the list contains only numeric values, each introduced by a hyphen and indented with spaces.

Listing 1: YAML Objects

---
name: starwars collection
year of publication:
    - 1977
    - 1980
    - 1983
movies:
# Only movies from the original trilogy (OT) are listed here.
    ot:
       - Episode IV - A New Hope
       - Episode V - The Empire Strikes Back
       - "Episode VI - Return of the Jedi Knights."

YAML does not allow tab characters for indentation, so always indent with spaces. By the way, you do not have to enclose strings in quotation marks; the quotes in the final line of Listing 1 are optional. A collection of key-value pairs like this is a dictionary (a mapping in YAML terms). Unlike JSON, YAML also supports comments: A hash mark (#) starts a comment, either on a line of its own or after the content of a line, and the comment runs to the end of that line.
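To make the readability comparison with JSON concrete, the following sketch writes the data from Listing 1 as a Python dictionary and serializes it to JSON with the standard library. The dictionary is typed by hand to mirror Listing 1 and is not produced by a YAML parser; the variable names are chosen for illustration only.

```python
import json

# Hand-written equivalent of the data in Listing 1 (an assumption, not parser output).
starwars = {
    "name": "starwars collection",
    "year of publication": [1977, 1980, 1983],
    "movies": {
        "ot": [
            "Episode IV - A New Hope",
            "Episode V - The Empire Strikes Back",
            "Episode VI - Return of the Jedi Knights.",
        ]
    },
}

# Serialize to JSON: braces, brackets, and mandatory quotes replace YAML's indentation.
as_json = json.dumps(starwars, indent=4)
print(as_json)
```

The comment from Listing 1 has no counterpart in the output because JSON has no comment syntax.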

To process data stored in this way with Python, you could use the PyYAML module. It converts YAML objects into Python dictionary (dict) objects, which you can then process further according to your own requirements. Listing 2 shows a simple Python script that reads the data from the starwars.yaml file and formats it before output.

Listing 2: starwars.py

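#!/usr/bin/env python3

import yaml
from yaml.loader import SafeLoader

# Parse the YAML file into Python objects with the safe loader
with open('starwars.yaml') as f:
    sw = yaml.load(f, Loader=SafeLoader)
# Print the data as block-style YAML with an indent of four spaces
print(yaml.dump(sw, indent=4, default_flow_style=False))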

Command-Line YAML Parser

The yq parser (see the yq documentation) is a very good choice for processing configuration files written in YAML. Because it is modeled on the well-known jq JSON processor, it uses very similar syntax. A nice side effect is that you can process JSON data with yq, as well. If your Linux distribution does not offer a prebuilt yq package, simply install the software directly from the GitHub page:

wget https://github.com/mikefarah/yq/releases/download/v4.14.1/yq_linux_amd64 -O ~/bin/yq
chmod u+x ~/bin/yq

On macOS, you can also install the software with the Homebrew package manager.

Searching YAML Documents

A typical task when processing YAML files is to search for a specific key and the value assigned to that key. For example, if you want to extract all years of publication from the starwars.yaml file, use:

yq eval '.["year of publication"][]' starwars.yaml

If you only want to know when the first movie was released, index the first element of the list:

yq eval '.["year of publication"][0]' starwars.yaml
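For readers who think in Python, the two queries correspond roughly to iterating over a list and indexing it. The sketch below uses a hand-written dictionary as a stand-in for the data a YAML loader would return for Listing 1; only the key needed here is included.

```python
# Stand-in for the loaded data from Listing 1 (an assumption, trimmed to one key).
sw = {"year of publication": [1977, 1980, 1983]}

years = sw["year of publication"]

# Counterpart of the [] query: emit every element of the list.
for year in years:
    print(year)

# Counterpart of the [0] query: only the first element.
first_release = years[0]
print(first_release)
```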

Listing 3 contains a slightly more extensive YAML document that describes a Kubernetes pod. For an initial overview of which top-level keys this file contains, run the command:

yq eval keys pod.yaml

Listing 3: Kubernetes Pod in YAML

apiVersion: v1
kind: Pod
metadata:
    name: my-pod
spec:
    containers:
    - name: db1-container
      image: k8s.gcr.io/busybox
      env:
      - name: DB_URL
        value: postgres://db_url:5431
    - name: db2-container
      image: k8s.gcr.io/busybox
      env:
      - name: DB_URL
        value: postgres://db_url:5432

You can view a list of all container names with:

yq eval ".spec.containers[].name" pod.yaml
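Expressed in Python, the keys query and the container name query might look like the following sketch. The pod dictionary is a hand-written stand-in for what a YAML loader would return for Listing 3, and the variable names are hypothetical.

```python
# Hand-written stand-in for the loaded content of Listing 3 (an assumption).
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "my-pod"},
    "spec": {
        "containers": [
            {
                "name": "db1-container",
                "image": "k8s.gcr.io/busybox",
                "env": [{"name": "DB_URL", "value": "postgres://db_url:5431"}],
            },
            {
                "name": "db2-container",
                "image": "k8s.gcr.io/busybox",
                "env": [{"name": "DB_URL", "value": "postgres://db_url:5432"}],
            },
        ]
    },
}

# Counterpart of the keys query: the top-level keys of the document.
top_level_keys = list(pod)

# Counterpart of .spec.containers[].name: one name per container.
container_names = [c["name"] for c in pod["spec"]["containers"]]

print(top_level_keys)
print(container_names)
```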

The select function filters the results. The following command returns only the environment values that end in 32:

yq eval '.spec.containers[].env[].value | select(. == "*32")' pod.yaml
postgres://db_url:5432
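The same filter can be approximated in Python. In this sketch, fnmatchcase from the standard library serves as a rough stand-in for the wildcard comparison in select; the two are not guaranteed to behave identically for other patterns. The containers list is a trimmed, hand-written copy of the relevant part of Listing 3.

```python
from fnmatch import fnmatchcase

# Trimmed stand-in for .spec.containers in Listing 3 (an assumption).
containers = [
    {"name": "db1-container",
     "env": [{"name": "DB_URL", "value": "postgres://db_url:5431"}]},
    {"name": "db2-container",
     "env": [{"name": "DB_URL", "value": "postgres://db_url:5432"}]},
]

# Flatten every env value of every container, like .containers[].env[].value.
values = [entry["value"] for c in containers for entry in c["env"]]

# Keep only the values matching the wildcard pattern, like select(. == "*32").
matches = [v for v in values if fnmatchcase(v, "*32")]
print(matches)
```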

Validating Values

The length function is useful for validating values. Applied to a string, it returns the number of characters, so the following command prints the length of each container name:

yq eval ".spec.containers[].name | length" pod.yaml
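A validation step built on these lengths could look like the following Python sketch. The limit MAX_NAME_LENGTH is a hypothetical value chosen for illustration, and the names are copied by hand from Listing 3.

```python
# Hypothetical upper bound for a container name, chosen for illustration only.
MAX_NAME_LENGTH = 63

# Container names copied by hand from Listing 3.
names = ["db1-container", "db2-container"]

# Counterpart of "| length": the number of characters in each name.
lengths = [len(name) for name in names]

# The actual validation: collect every name that exceeds the limit.
too_long = [name for name in names if len(name) > MAX_NAME_LENGTH]

print(lengths)
print("invalid names:", too_long)
```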

You can also use yq to modify a YAML template and create configurations for different environments. For example, if you want to insert the hostname of your production database into the DB_URL variable of the first container in the pod.yaml template, you can use:

yq eval '.spec.containers[0].env[0].value = "postgres://prod.example.com:5431"' pod.yaml > prod-pod.yaml

So that the change does not just appear on the screen, the command redirects the modified YAML document to a new file named prod-pod.yaml and leaves the template untouched. You can check the modification with the command:

yq eval ".spec.containers[0].env[0].value" prod-pod.yaml
postgres://prod.example.com:5431
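The template approach carries over to Python as well. The sketch below copies a hand-written stand-in for the first container of the pod.yaml template and changes the value in the copy only, in the same way that the redirect to prod-pod.yaml leaves the original file unchanged. The variable names are hypothetical.

```python
import copy

# Trimmed stand-in for the pod.yaml template (an assumption, first container only).
template = {
    "spec": {
        "containers": [
            {"name": "db1-container",
             "env": [{"name": "DB_URL", "value": "postgres://db_url:5431"}]},
        ]
    }
}

# Work on a deep copy so that the nested lists and dicts of the template stay intact.
prod = copy.deepcopy(template)
prod["spec"]["containers"][0]["env"][0]["value"] = "postgres://prod.example.com:5431"

print(prod["spec"]["containers"][0]["env"][0]["value"])
print(template["spec"]["containers"][0]["env"][0]["value"])
```

A shallow copy would not be enough here, because the nested env list would still be shared with the template.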

With Kubernetes, this function proves to be extremely useful, because you can use it to change existing configurations immediately. For example, you can simply forward the output of the Kubernetes kubectl tool with