Secure Server Deployments in Hostile Territory
Would you change what you said on the phone, if you knew someone malicious was listening? Whether or not you view the NSA as malicious, I imagine that after reading the NSA coverage on Linux Journal, some of you found yourselves modifying your behavior. The same thing happened to me when I started deploying servers into a public cloud (EC2 in my case).
Although I always have tried to build secure environments, EC2 presents a number of additional challenges both to your fault-tolerance systems and your overall security. Deploying a server on EC2 is like dropping it out of a helicopter behind enemy lines without so much as an IP address.
In this article, I discuss some of the techniques I use to secure servers when they are in hostile territory. Although some of these techniques are specific to EC2, most are adaptable to just about any environment.
Behind Enemy Lines
So, what makes EC2 so hostile anyway? When you secure servers in a traditional environment, you may find yourself operating under a few assumptions. First, you likely assume that the external network is the main threat and that your internal network is pretty safe. You also typically assume that you control the server and network hardware, and if you use virtualization, the hypervisor as well. If you use virtualization, you probably also assume that other companies aren't sharing your hardware, and you probably never would think it is possible that a malicious user might share your virtualization platform with you.
In EC2, all of those assumptions are false. The internal and external network should be treated as potentially hostile. The server and network hardware are under someone else's control. Someone else also controls the hypervisor that manages all of the virtual machines, and your virtual machines definitely share hardware with other companies. Finally, although it may not be something that happens every day, it's definitely possible that your virtualization neighbor might be malicious.
EC2-Specific Practices
Although many of the practices I describe here could be used in just about any environment, a few of them are specific to EC2, but even then, you may find ways to map these notions to other cloud environments. Most of these practices revolve around Security Groups. EC2 Security Groups can be thought of in some ways like a VLAN in a traditional network. With Security Groups, you can create firewall settings to block incoming traffic to specific ports for all servers that are members of a specific group. Unlike traditional VLANs, you can create firewall rules within Security Groups that block traffic between members of that group. Servers can be members of multiple Security Groups, although it's important to know that Security Groups are assigned only when an instance is created—you can't add or remove Security Groups from an instance after you create it.
I generally use Security Groups like most people might use VLANs only with some changes. Every group of servers that share a common purpose have their own Security Group. All groups block all outside traffic by default, and I open ports only as I need them. For the most part, Security Groups allow no external access. I also have kept the "default" Security Group EC2 puts in place and make every server a member of that group as well; however, I lock down that group and use it only when I want to grant access from a different Security Group to all of my servers. For instance, I might use changes to the default Security Group to allow all servers to talk to my Puppetmaster server on its custom port. As another example, I use a VPN to access my cloud network, and that VPN is granted access to SSH into all of the servers in my environment.
Finally, I never store a secret in my userdata file. Often when you spawn a server in EC2, you provide the server with a userdata file. A number of AMIs (Amazon Machine Images—the OS install image you choose) are configured to execute the userdata script. Although in some cases this file is used to pass specific configuration values on to the server, many people (myself included) use the file as a post-install script. In my case, I use it to configure my configuration management system (Puppet) and from that point on let it take over the configuration of the system. What you may not know is that the contents of the userdata script are available via an API call to any user who is on the system throughout the life of the instance. If you use the userdata file to inject any sort of secrets (certificates or SSH private keys, passwords or shared secrets the system uses in its configuration, or anything you wouldn't want a regular user to see), those secrets will be visible to any user on the system. In fact, if you happen to use Puppet yourself (or otherwise have facter installed on the system), facter itself will return the contents of that userdata script for you.
Handling Secrets
It's incredibly important to think about how you manage secrets in a cloud environment beyond just the userdata script. The fact is, despite your best efforts, you still often will need to store a private key or password in plain text somewhere on the system. As I mentioned, I use Puppet for configuration management of my systems. I store all of my Puppet configuration within Git to keep track of changes and provide an audit trail if I ever need it. Having all of your configuration in Git is a great practice, but the first security practice I recommend with respect to secrets is to avoid storing any plain-text secrets in your configuration management system. Whenever possible, I try to generate secrets on the hosts that need them, so that means instead of pushing up a GPG or SSH key pair to a server, I use my configuration management system to generate one on the host itself.
Another practice I put in place is to store secrets in ramdisk whenever possible. This avoids the problem of securely deleting files on a hard drive that may go away at any moment. We just wrote a basic module in our configuration management system that automates the process of moving a sensitive file to /dev/shm, named after its checksum, and creating a symlink in the original location. Of course, if you go this route, you will need to be aware that any files stored in ramdisk will disappear after reboot (that's the whole idea), so don't store anything there you can't regenerate after a reboot, although I generally just respawn instances instead of reboot them in the cloud.
In many cases, the secret I generate on the host doesn't ever need to leave the system. In other situations, I may need to share the public component of a key pair to other servers. In that instance, Puppet imports and stores the generated public key in its database and then can export it onto any other host that needs it. In some cases, I've had to write some custom Puppet facts to aid in exporting some of this data.
Even with all of these practices in place, sometimes it is a lot simpler just to store a database password in your configuration management system. In those cases, I use an add-on module to Puppet called hiera-gpg that allows you to store GPG-encrypted YAML files with your Puppet configuration that contain sensitive configuration settings. Each independent environment (such as development and production) has its own Puppetmasters and its own GPG keys. Each sysadmin adds each of these keys to his or her own GPG keychain, and we use a wrapper script to aid in editing these hiera-gpg files. The wrapper script is told what file to edit and what environment it's for, and it makes sure the file is encrypted so that any sysadmin and the Puppetmaster in that particular environment can open it.
These techniques barely scratch the surface of the sorts of measures you should put in place when hosting sensitive servers in the cloud, and this is by no means the complete list of systems I have in place. In a future article, I will follow up with other techniques I use to keep my servers safe in hostile territory.