4 September 2021

JSON and YAML are file formats used to structure data. Both are primarily meant to be parsed by software, but JSON and YAML files are also fairly easy for humans to read. For that reason the formats are also frequently used for configuration files, such as Docker Compose files.

The formatting rules for both JSON and YAML are quite strict. For instance, YAML uses whitespace indentation to structure data, and the indentation has to consist of spaces rather than tabs. JSON relies on braces, brackets and commas instead; any indentation is only there for readability. Both formats also have strict rules for how key-value pairs and lists (arrays) are written. I won’t cover the exact syntax, as there are already hundreds of useful resources available online. The official documentation for JSON and YAML is also a good starting point (but a bit dry). In this article I instead cover how you can parse JSON and YAML files on the command line.

JSON

To illustrate parsing JSON and YAML I will use the output of a whmapi1 command. The utility is available on all cPanel servers and is used to perform admin tasks from the command line. For instance, the below command lists information about domains owned by the user example, formatted as JSON:

# /usr/local/cpanel/bin/whmapi1 \
--output=jsonpretty \
get_domain_info \
api.filter.enable=1 \
api.filter.a.field=user \
api.filter.a.arg0=example
{
   "metadata" : {
      "result" : 1,
      "reason" : "OK",
      "command" : "get_domain_info",
      "version" : 1
   },
   "data" : {
      "domains" : [
         {
            "php_version" : "ea-php74",
            "modsecurity_enabled" : 1,
            "parent_domain" : "example.com",
            "docroot" : "/home/example/public_html",
            "ipv6" : null,
            "ipv4_ssl" : "84.18.207.55",
            "port" : "80",
            "domain_type" : "parked",
            "ipv6_is_dedicated" : 0,
            "ipv4" : "84.18.207.55",
            "user" : "example",
            "user_owner" : "root",
            "domain" : "example.acai.temporarywebsiteaddress.com",
            "port_ssl" : "443"
         },
         {
            "domain" : "example.com",
            "user" : "example",
            "user_owner" : "root",
            "port_ssl" : "443",
            "ipv4" : "84.18.207.55",
            "ipv6_is_dedicated" : 0,
            "domain_type" : "main",
            "port" : "80",
            "ipv6" : null,
            "ipv4_ssl" : "84.18.207.55",
            "docroot" : "/home/example/public_html",
            "parent_domain" : "example.com",
            "modsecurity_enabled" : 1,
            "php_version" : "ea-php74"
         }
      ]
   }
}

There are a few things to note about the output. Firstly, there are two top-level nodes: metadata contains information about the command itself (i.e. the name of the command and whether it succeeded) and data contains the actual data.

The data node contains another node: domains. This is an array with one object for each of the two domains owned by example. Each object is a set of key-value pairs.

Parsing JSON files

Applications can filter specific fields. For instance, you might have a script that retrieves the domain and the php_version fields (and then perhaps updates the PHP version if it is no longer supported). A commonly used tool for parsing JSON files is jq. It is not installed by default on most RHEL-based servers, but it is available from the EPEL repository on RHEL 7 servers and the AppStream repository on RHEL 8 systems.

To give you an idea of how to use jq, the below command prints just the data.domains node. It is worth mentioning that I have also changed the --output option to json rather than jsonpretty. The former option prints a blob of JSON data without any indentation. However, because I pipe the output to jq the formatting used by whmapi1 no longer matters – the formatting is now done by jq:

# /usr/local/cpanel/bin/whmapi1 \
--output=json get_domain_info \
api.filter.enable=1 \
api.filter.a.field=user \
api.filter.a.arg0=example \
| jq '.data.domains[]'
{
  "php_version": "ea-php74",
  "modsecurity_enabled": 1,
  "parent_domain": "example.com",
  "ipv4_ssl": "84.18.207.55",
  "ipv6": null,
  "docroot": "/home/example/public_html",
  "domain_type": "parked",
  "port": "80",
  "ipv4": "84.18.207.55",
  "ipv6_is_dedicated": 0,
  "port_ssl": "443",
  "domain": "example.acai.temporarywebsiteaddress.com",
  "user_owner": "root",
  "user": "example"
}
{
  "ipv4": "84.18.207.55",
  "ipv6_is_dedicated": 0,
  "domain": "example.com",
  "user_owner": "root",
  "user": "example",
  "port_ssl": "443",
  "docroot": "/home/example/public_html",
  "ipv6": null,
  "ipv4_ssl": "84.18.207.55",
  "port": "80",
  "domain_type": "main",
  "parent_domain": "example.com",
  "php_version": "ea-php74",
  "modsecurity_enabled": 1
}
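The same path syntax also reaches the metadata node, which lets a script confirm that the call succeeded before it touches the data. The following is only a sketch: the response variable holds a trimmed, hand-written copy of the output shown earlier rather than a live whmapi1 call, and the variable names are my own.

```shell
# Trimmed, hand-written sample standing in for `whmapi1 --output=json ...`
response='{"metadata":{"result":1,"reason":"OK","command":"get_domain_info","version":1},"data":{"domains":[]}}'

# -e makes jq exit non-zero when the last output value is false or null
if printf '%s\n' "$response" | jq -e '.metadata.result == 1' > /dev/null; then
    status="ok"
else
    # -r prints the reason as plain text instead of a quoted JSON string
    status="failed: $(printf '%s\n' "$response" | jq -r '.metadata.reason')"
fi
echo "$status"
```

With the sample above the script prints ok. In a real script you would replace the response variable with the output of the whmapi1 command.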

To print specific values you can add a pipe inside the jq command and specify the fields you want to print inside curly brackets:

# /usr/local/cpanel/bin/whmapi1 \
--output=json \
get_domain_info \
api.filter.enable=1 \
api.filter.a.field=user \
api.filter.a.arg0=example \
| jq '.data.domains[] | {user, domain, domain_type}'
{
  "user": "example",
  "domain": "example.acai.temporarywebsiteaddress.com",
  "domain_type": "parked"
}
{
  "user": "example",
  "domain": "example.com",
  "domain_type": "main"
}
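You can also filter on a value before picking fields. As a minimal sketch, the snippet below uses jq's select function to keep only the main domain. The sample variable is a hand-written excerpt of the output above, used here so the example runs without a cPanel server.

```shell
# Hand-written excerpt of the whmapi1 output, for illustration only
sample='{"data":{"domains":[
  {"user":"example","domain":"example.acai.temporarywebsiteaddress.com","domain_type":"parked"},
  {"user":"example","domain":"example.com","domain_type":"main"}
]}}'

# select() passes an object through only when the condition is true;
# -c prints each result on a single line
main_domain=$(printf '%s\n' "$sample" \
  | jq -c '.data.domains[] | select(.domain_type == "main") | {user, domain}')
echo "$main_domain"
```

This prints {"user":"example","domain":"example.com"}, as the parked domain does not match the condition.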

And you can also just print the values (without the keys). In that case you leave out the curly brackets and just provide a comma-separated string with keys. The main gotcha is that each key has to start with a dot:

# /usr/local/cpanel/bin/whmapi1 \
--output=json \
get_domain_info \
api.filter.enable=1 \
api.filter.a.field=user \
api.filter.a.arg0=example \
| jq '.data.domains[] | .user, .domain, .domain_type'
"example"
"example.acai.temporarywebsiteaddress.com"
"parked"
"example"
"example.com"
"main"
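The values above are still JSON strings, which is why they are quoted, and every value ends up on its own line. For a script such as the PHP version check mentioned earlier it is often handier to get one line per domain. A possible approach, sketched here with a hand-written sample instead of a live whmapi1 call, is to collect the values in an array and combine jq's -r option with the @tsv filter:

```shell
# Hand-written excerpt of the whmapi1 output, for illustration only
sample='{"data":{"domains":[
  {"domain":"example.acai.temporarywebsiteaddress.com","php_version":"ea-php74"},
  {"domain":"example.com","php_version":"ea-php74"}
]}}'

# [...] collects the values per domain, @tsv joins them with tabs,
# and -r prints the result as raw text rather than a quoted string
rows=$(printf '%s\n' "$sample" \
  | jq -r '.data.domains[] | [.domain, .php_version] | @tsv')
printf '%s\n' "$rows"
```

Each line now holds a domain and its PHP version separated by a tab, which is easy to process further with tools such as cut, awk or a while read loop.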

YAML

By default, whmapi1 formats its output in YAML. I personally find the output more human-readable than JSON. There are fewer brackets, and keys and values don’t need to be inside double quotes. Here is what the output of the command I ran earlier looks like in YAML:

# /usr/local/cpanel/bin/whmapi1 get_domain_info \
api.filter.enable=1 \
api.filter.a.field=user \
api.filter.a.arg0=example
---
data:
  domains:
    -
      docroot: /home/example/public_html
      domain: example.acai.temporarywebsiteaddress.com
      domain_type: parked
      ipv4: 84.18.207.55
      ipv4_ssl: 84.18.207.55
      ipv6: ~
      ipv6_is_dedicated: 0
      modsecurity_enabled: 1
      parent_domain: example.com
      php_version: ea-php74
      port: 80
      port_ssl: 443
      user: example
      user_owner: root
    -
      docroot: /home/example/public_html
      domain: example.com
      domain_type: main
      ipv4: 84.18.207.55
      ipv4_ssl: 84.18.207.55
      ipv6: ~
      ipv6_is_dedicated: 0
      modsecurity_enabled: 1
      parent_domain: example.com
      php_version: ea-php74
      port: 80
      port_ssl: 443
      user: example
      user_owner: root
metadata:
  command: get_domain_info
  reason: OK
  result: 1
  version: 1

Parsing YAML files

A handy utility for parsing YAML is yq. It is not as full-featured as jq and the utility is not installed by default on RHEL-based servers (and not available in the repositories either). However, you can simply download the utility and make it executable. At the time of writing the latest version is 4.11.2. Here, I install yq in the /root/bin directory:

# wget -O /root/bin/yq https://github.com/mikefarah/yq/releases/download/v4.11.2/yq_linux_amd64
# chmod u+x /root/bin/yq

# yq --version
yq (https://github.com/mikefarah/yq/) version 4.11.2

yq works much like jq. For instance, you can print just the domain and domain_type fields in the data.domains node. The eval command is used to evaluate the expression ('.data.domains[] | (.domain, .domain_type)') and the trailing dash (-) tells yq to read from standard input, which is what you need when you pipe the output of a command to yq.

# /usr/local/cpanel/bin/whmapi1 get_domain_info \
api.filter.enable=1 \
api.filter.a.field=user \
api.filter.a.arg0=example \
| yq eval '.data.domains[] | (.domain, .domain_type)' -
example.acai.temporarywebsiteaddress.com
example.com
parked
main

Note that the output is grouped by field rather than by domain: both domain values are printed first, followed by both domain_type values. That is not quite what we want. Also, there is currently no easy way to include the keys in the last command. There are various tricks you can try to get the output exactly how you want it, but it is much easier to simply pipe the output to grep:

# /usr/local/cpanel/bin/whmapi1 get_domain_info \
api.filter.enable=1 \
api.filter.a.field=user \
api.filter.a.arg0=example \
| yq eval '.data.domains[]' - \
| grep -E '^(domain|domain_type):'
domain: example.acai.temporarywebsiteaddress.com
domain_type: parked
domain: example.com
domain_type: main
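If you would rather have each domain and its type on a single line, a small awk filter can pair them up. This is only a sketch under a few assumptions: the input looks like the output of yq eval '.data.domains[]' shown above (keys at the start of the line, domain listed before domain_type) and the values contain no spaces. The yq_output variable is a hand-written sample used in place of the real pipeline.

```shell
# Hand-written sample standing in for the output of
# `... | yq eval '.data.domains[]' -`
yq_output='docroot: /home/example/public_html
domain: example.acai.temporarywebsiteaddress.com
domain_type: parked
parent_domain: example.com
docroot: /home/example/public_html
domain: example.com
domain_type: main
parent_domain: example.com'

# Remember the most recent domain value and print it
# together with the domain_type that follows it
pairs=$(printf '%s\n' "$yq_output" \
  | awk '$1 == "domain:" { d = $2 } $1 == "domain_type:" { print d, $2 }')
printf '%s\n' "$pairs"
```

Because awk compares the whole first field, keys such as parent_domain are ignored without the need for a regular expression.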