Using Hiera with Puppet

With Hiera, you can externalize your systems' configuration data and easily understand how those values are assigned to your servers. With that data separated from your Puppet code, you then can encrypt sensitive values, such as passwords and keys.

Separating code and data can be tricky. In the case of configuration management, there is significant value in being able to design a hierarchy of data—especially one with the ability to cascade through classifications of servers and assign one or several options. This is the primary value that Hiera provides—the ability to separate the code for "how to configure the /etc/ntp.conf" from the values that define "what ntp servers should each node use". In the most concise sense, Hiera lets you separate the "how" from the "what".

The idea behind separating code and data is more than just having a cleaner Puppet environment; it allows engineers to create more re-usable Puppet modules. It also puts your variables in one place so that they too can be re-used, without importing manifests across modules. Hiera's use cases include managing packages and versions or using it as a Node Classifier. One of the most compelling use cases for Hiera is for encrypting credentials and other sensitive data, which I talk about later in this article.

Puppet node data originally was managed through node inheritance, which is no longer supported, and subsequently through using a params.pp module subclass. Before Hiera, it was necessary to modify the params.pp module class locally within the module, which frequently damaged the re-usability of the module. params.pp still is used in modules today, but as of Puppet version 3, Hiera is not only the default, but also the first place checked for variable values. When a variable is defined in both Hiera and a module, Hiera takes precedence by default. As you'll see, it's easy to use a module with params.pp and store some or all of the variable data in Hiera, making it easy to migrate incrementally.

To get started using Hiera with your existing Puppet 3 implementation, you won't have to make any significant changes or code migrations. You need only a hierarchy file for Hiera and a yaml file with a key/value pair. Here is an example of a Hiera hierarchy:


hiera.yaml:

:backends:
    - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "node/%{::fqdn}"
  - "environment/%{::env}/main"
  - "environment/%{::env}/%{calling_module}"
  - defaults

And a yaml file:


/etc/puppet/hieradata/environment/prod/main.yaml:
---
$nginx::credentials::basic_auth: 'password'

Hiera can have multiple back ends, but for now, let's start with yaml, which is the default and requires no additional software. The :datadir: is just the path to where the hierarchy search path should begin, and is usually a place within your Puppet configuration. The :hierarchy: section is where the core algorithm of how Hiera does its key/value lookups is defined. The :hierarchy: is something that will grow and change over time, and it may become much more complex than this example.

Within each of the paths defined in the :hierarchy:, you can reference any Puppet variable, even $operatingsystem and $ipaddress, if set. Using the %{variable} syntax will pull the value.

This example is actually a special hierarchical design that I use and recommend, which employs a fact assigned to all nodes called @env from within facter. This @env value can be set on the hosts either based on FQDN or tags in EC2 or elsewhere, but the important thing is that this is the separation of one large main.yaml file into directories named prod, dev and so on, and, therefore, the initial separation of Hiera values into categories.

The second component of this specific example is a special Hiera variable called %{calling_module}. This variable is unique and reserved for Hiera to indicate that the yaml filename to search will be the same as the Puppet module that is performing the Hiera lookup. Therefore, the way this hierarchy will behave when looking for a variable in Puppet is like:


$nginx::credentials::basic_auth 

First, Hiera knows that it's looking in /etc/puppet/hieradata/node for a file named <hostname.domain.tld>.yaml and for a value for nginx::credentials::basic_auth. If either the file or the variable isn't there, the next step is to look in /etc/puppet/hieradata/environment/<prod|stage|dev>/main.yaml, which is a great way to have one yaml file with most of your Hiera values. If you have a lot of values for the nginx example and you want to separate them for manageability, you simply can move them to the /etc/puppet/hieradata/environment/<prod|stage|dev>/nginx.yaml file. Finally, as a default, Hiera will check for the value in defaults.yaml at the top of the hieradata directory.

Your Puppet manifest for this lookup should look something like this:


modules/nginx/manifests/credentials.pp


class nginx::credentials (
  basic_auth = 'some_default',
){}

This class, when included, will pull the value from Hiera and can be used whenever included in your manifests. The value set here of some_default is just a placeholder; Hiera will override anything set in a parameterized class. In fact, if you have a class you are thinking about converting to pull data from Hiera, just start by moving one variable from the class definition in {} to a parameterized section in (), and Puppet will perform a Hiera lookup on that variable. You even can leave the existing definition intact, because Hiera will override it. This kind of Hiera lookup is called Automatic Parameter Lookup and is one of several ways to pull data from Hiera, but it's by far the most common in practice. You also can specify a Hiera lookup with:


modules/nginx/manifests/credentials.pp


class nginx::credentials (
  basic_auth = hiera('nginx::credentials::basic_auth'),
){}

These will both default to a priority lookup method in the Hiera data files. This means that Hiera will return the value of the first match and stop looking further. This is usually the only behavior you want, and it's a reasonable default. There are two lookup methods worth mentioning: hiera_array and hiera_hash. hiera_array will find all of the matching values in the files of the hierarchy and combine them in an array. In the example hierarchy, this would enable you to look up all values for a single key for both the node and the environment—for example, adding an additional DNS search path for one host's /etc/resolv.conf. To use a hiera_array lookup, you must define the lookup type explicitly (instead of relying on Automatic Parameter Lookup):


modules/nginx/manifests/credentials.pp


class nginx::credentials (
  basic_auth = hiera_array('nginx::credentials::basic_auth'),
){}

A hiera_hash lookup works in the same way, only it gathers all matching values into a single hash and returns that hash. This is often useful for an advanced create_resources variable import as well as many other uses in an advanced Puppet environment.

Perhaps Hiera's most powerful feature is the ability to pull data from a variety of back-end storage technologies. Hiera back ends are too numerous to list, but they include JSON, Redis, MongoDB and even HTTP to create a URL-driven Puppet value API. Let's take a look at two useful back ends: Postgres and hiera-eyaml.

To start with the psql back end, you need to install the hiera-psql gem on your Puppet master (or each node if you're using masterless Puppet runs with Puppet apply), with a simple hiera.yaml file of:


:hierarchy:
* 'environment/%{env}'
* default 
:backends: 
* psql 
:psql:
  :connection:
    :dbname: hiera 
    :host: localhost 
    :user: root 
    :password: password

You can do lookups on a local Postgres installation with a single database called hiera with a single table called config with three columns: Path, Key and Value.


path                   key                               value

'environment/prod'   'nginx::credentials::basic_auth'   'password'

This is extremely useful if you want to expose your Hiera data to custom in-house applications outside Puppet, or if you want to create a DevOps Web console or reports.

Storing credentials in Puppet modules is a bad idea. If you store credentials in Puppet and your manifests on an external code repository, you're not only unable to share those manifests with developers with less-secure access, but you're obviously exposing vital security data outside the organization, and possibly in violation of various types of compliance. So how do you encrypt sensitive data in Puppet while keeping your manifests relevant and sharable? The answer is with hiera-eyaml.

Tom Poulton created hiera-eyaml to allow engineers to do just that: encrypt only the sensitive string of data inside the actual file rather than encrypting the entire file, which also can be done with hiera-gpg (a very useful encryption gem but not covered in this article).

To get started, install the hiera-eyaml gem, and generate a keypair on the Puppet master:


$ eyaml createkeys

Then move the keys to a secure location, like /etc/puppet/secure/keys. Your hiera.yaml configuration should look something like this:


hiera.yaml:
---
:backends:
  - eyaml
  - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:eyaml:
  :datadir: /etc/puppet/hieradata
  :extension: 'yaml'          # <- so all files can be named .yaml
  :pkcs7_private_key: /path/to/private_key.pkcs7.pem
  :pkcs7_public_key: /path/to/public_key.pkcs7.pem
:hierarchy:
  - "node/%{::fqdn}"
  - "environment/%{::env}/main"
  - "environment/%{::env}/%{calling_module}"
* defaults

To encrypt values, you need only the public key, so distribute it to anyone who needs to create encrypted values:


$ eyaml encrypt -s 'password'

This will generate an encrypted block that you can add as the value in any yaml file:


main.yaml:
nginx::credentials::user: slackey       #cleartext example value
nginx::credentials::basic_auth : >      #encrypted example value
ENC[PKCS7,Y22exl+OvjDe+drmik2XEeD3VQtl1uZJXFFF2Nn
   /HjZFXwcXRtTlzewJLc+/gox2IfByQRhsI/AgogRfYQKocZg
   IZGeunzwhqfmEtGiqpvJJQ5wVRdzJVpTnANBA5qxeA==]

Editing encrypted values in place is one of the coolest features of the hiera-eyaml back end. eyaml edit opens a copy of the eyaml file in your editor of choice and automatically decrypts all of the values in the file. Here you can modify the values just as though they were plain text. When you exit the editor by saving the file, it automatically encrypts all of the modified values and saves the new file in place. You can see that the unencrypted plain text is marked to allow the eyaml tool to identify each encrypted block, along with the encryption method that originally was used. This is used to make sure that the block is encrypted again only if the clear text value has changed and is encrypted using the original encryption mechanism:


nginx::credentials::user: user1
nginx::credentials::basic_auth : DEC(1)::PKCS7[very secret password]! 

Blocks and strings of encrypted text can get rather onerous once you have more than a hundred entries or so. Because these yaml files are meant to be modified by humans directly, you want them to be easy to navigate. In my experience, it makes sense to keep your encrypted values in a separate file, such as a secure.yaml, with a hierarchy path of:


:hierarchy:
  - "node/%{::fqdn}"
 - "environment/%{::env}/secure"
  - "environment/%{::env}/main"
  - "environment/%{::env}/%{calling_module}"

This isn't necessary, as each value is encrypted individually and can be distributed safely to other teams. It may work well for your environment, however, because you can store the encrypted files in a separate repository, perhaps in a different Git repository. Only the private keys need to be protected on the Puppet master. I also recommend having separate keys for each environment, as this can give more granular control over who can decrypt different datafiles in Hiera, as well as even greater security separation. One way to do this is to name the keys with the possible values for the @env fact, and include that in the path of the hierarchy. You'll need to encrypt values with the correct key, and this naming convention makes it easy to tell which one is correct:


:pkcs7_private_key: /path/to/private_key.pkcs7.pem-%{::env}
:pkcs7_public_key: /path/to/public_key.pkcs7.pem-%{::env}

When using Hiera values within Puppet templates, either encrypted or not, you must be careful to pull them into the class that contains the templates instead of calling the values from within the template across classes—for example, in the template mytest.erb in a module called mymodule:


mytest.erb:
...
username: user1
passwd: <%= scope.lookupvar('nginx::credentials::basic_auth') %> 
 ↪#don't do this
...

Puppet may not have loaded a value into nginx::credentials::basic_auth yet because of the order of operations. Also, if you are using the %calling_module Hiera variable, the calling module in this case would be mymodule, and not nginx, so it would not find the value in the nginx.yaml file, as one might expect.

To avoid these and other issues, it's best to import the values into the mymodule class and assign local values:


mymodule.pp:
class mymodule {
  include nginx::credentials
  $basic_auth = "${nginx::credentials::basic_auth}"
  file { '/etc/credentials/boto_cloudwatch.cfg':
     content =>  template ("mymodule/mytest.erb"),
  }

And then reference the local value from the template:


mytest.erb:
...
username: user1
passwd: <%= @basic_auth %>

You're now ready to start introducing encrypted Hiera values gradually into your Puppet environment. Maybe after you separate data from your Puppet code, you can contribute some of your modules to the PuppetForge for others to use!

Resources

Docs—Hiera 1 Overview: https://docs.puppetlabs.com/hiera/1

"First Look: Installing and Using Hiera": https://puppetlabs.com/blog/first-look-installing-and-using-hiera

TomPoulton/hiera-eyaml: https://github.com/TomPoulton/hiera-eyaml

dalen/hiera-psql: https://github.com/dalen/hiera-psql

"Encrypting sensitive data in Puppet": https://www.theguardian.com/info/developer-blog/2014/feb/14/encrypting-sensitive-data-in-puppet

Load Disqus comments