Linux on Azure—a Strange Place to Find a Penguin
Linux enthusiasts might think the idea of running a Linux virtual machine on Microsoft's Azure service is like finding a penguin sunbathing in the Sahara. Linux in the heart of the Microsoft cloud? Isn't that just wrong on so many levels?
Why would anyone want to run Linux on Microsoft servers? For the hobbyist, I suppose for the same reason people climb Mount Everest: because it's there. For the business user, the prospect of spinning up Linux VMs in Microsoft's fabric offers new options for collocating open-source technologies with existing Microsoft Azure services. For the cloud market in general, more competition is good news for consumers.
The Cloud Marketplace
Virtual machines in the form of virtual private servers (VPSes) have been offered for nearly a decade from a galaxy of providers, using virtualization technologies such as Xen, Virtuozzo/OpenVZ and KVM. These providers subdivide a physical server into multiple small virtual servers. Users typically subscribe on a monthly basis, with an allotment of memory, disk and network bandwidth.
Later vendors, such as Amazon, Rackspace and now Microsoft, offer the same service with a finer-grained commitment. Users can spin up a VM (or a hundred) by the hour, pay for bandwidth by the gigabyte and utilize more advanced features, such as private networks, SAN-like storage features, offloaded database engines and so on.
All of this diversity is good news for end users. In 2002, a VPS with 128MB of RAM cost nearly $100/month. In 2006, you could get a VPS with 512MB of RAM for $40/month. Today, such VPSes can be found for less than $5/month in the VPS market or for pennies per hour from cloud providers.
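To put those figures on a common footing, here is a rough per-megabyte comparison. It treats "nearly $100" and "less than $5" as exact ceilings and assumes that "such VPSes" today means a 512MB plan, so read the results as approximate upper bounds rather than market data.

```python
# Monthly price per megabyte of RAM for the three data points quoted above.
# Assumption: today's "less than $5/month" plan has 512MB of RAM.
plans = [
    ("2002", 100.0, 128),
    ("2006", 40.0, 512),
    ("today", 5.0, 512),
]


def price_per_mb(monthly_usd, ram_mb):
    """Return the monthly cost in US dollars per megabyte of RAM."""
    return monthly_usd / ram_mb


per_mb = {label: price_per_mb(usd, mb) for label, usd, mb in plans}

for label, value in per_mb.items():
    print(f"{label}: ${value:.4f} per MB per month")

# Ratio between the oldest and newest figures.
drop = per_mb["2002"] / per_mb["today"]
print(f"2002 cost per MB was about {drop:.0f}x today's")
```

On these assumptions, the cost of a megabyte of VPS memory fell by roughly a factor of ten between each pair of data points.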
Microsoft Enters the Market
Amazon enjoyed early success with its Elastic Compute Cloud, and other vendors, such as Rackspace, soon followed suit. Microsoft originally opted for a different, more complex cloud strategy. Azure was built as a "platform as a service" offering (see the Cloud Flavors sidebar) in which developers could write applications that ran in various roles and talked to Azure APIs. In theory, this allowed developers to concentrate on code and not worry about the abstracted hardware underneath.
In practice, developers were forced to write Azure-centric applications, and adoption was slow. Many enterprises with mixed Windows/Linux environments found that hosting their own self-managed servers on Amazon and other cloud environments was more attractive than spending time porting and debugging their applications.
In 2012, Microsoft added "infrastructure as a service" (virtual machines) offerings to its lineup, allowing users to run and administer Windows and Linux virtual machines they directly control.
Azure virtual machines are still in "Community Preview", which is Microsoft lingo for "Beta". Support is limited to forums, and as you'll see, some sandpapering of the offering is still needed. However, after using the service for a couple of months, I find Linux on Azure to be stable, easy to use and a good performer. At the time of this writing, Microsoft has not set a date for General Availability.
Cloud Flavors
Cloud-based services come in several different forms, depending on what's being abstracted and provisioned.
- Software as a Service (SaaS): the provider runs an application and exposes an interface to subscribers. In practice, this simply means using a Web-based application. Examples include Salesforce.com, NetSuite and 37signals' Basecamp.
- Platform as a Service (PaaS): the provider offers a platform that users can use to build applications. Subscribers write and provide code, which runs on abstracted hardware and software services. Examples include Microsoft's original Azure offerings, Oracle's Fusion cloud and Google's App Engine.
- Infrastructure as a Service (IaaS): the provider delivers virtual machines and other infrastructure pieces that users can configure as they like. Examples include Amazon's EC2, Rackspace Cloud, Google Compute Engine and Microsoft's Azure virtual machines. Virtual Private Servers also are IaaS offerings with a different financial model.
Comparing Azure to Amazon EC2
Azure's chief competition is Amazon EC2, and it's not hard to see that Microsoft patterned its IaaS offering after its rival's success.
Like EC2, Azure is priced by the hour, and the rates are similar. While in Community Preview, pricing is slightly discounted compared to expected General Availability pricing. Actual price comparisons for hourly VMs depend on how long a term commitment is made. For example, Amazon offers both spot instances and prepaid reserved instances, while Microsoft also discounts longer-term commitments. Storage and bandwidth pricing are very similar. In general, running a VM on Azure will cost you about the same as EC2, which is probably just what Microsoft planned.
EC2 offers features Microsoft has not caught up to yet. For example, the storage underlying a virtual machine's disks (Elastic Block Store) can be snapshotted to S3 storage. However, many Amazon services have parallels in the Azure world. For example, Amazon's SimpleDB is analogous to Azure's Tables. Both vendors offer complex networking features, caching, monitoring and Content Delivery Network options. In either environment, a VM can be sited in the Americas, Europe or Asia, with global CDN nodes.
The Azure value proposition is not "we are a better cloud" but rather "you can do EC2-like things here alongside your Azure platforms". For shops that have deep Microsoft deployments, were early adopters of Azure or want to develop applications that move into and out of Azure, the new IaaS offerings will be appealing.
Taking Azure for a Spin
Using Microsoft Azure requires a free Windows Live account, as well as a credit card to open a charge account. If your employer participates in the Microsoft Developer Network (MSDN) program, you already may be entitled to a free quantity of Azure services every month. Once your account is set up, you can head to the management portal and start adding services.
The Azure control panel is, quite simply, gorgeous. Perhaps pretty controls are not a big selling point for a cloud service, but the Azure interface is marvelously interactive. As you set up services, messages on the status of operations appear at the bottom of the screen asynchronously. Performance graphs and history are integrated into the display, and the panel feels much more like a desktop app than a "click-submit-and-wait" Web interface.
Figure 1. The Azure management portal is easy to use and attractively designed.
Figure 2. The Azure portal displays task messages and status.
Azure offers several flavors of Linux: CentOS 6.2, Ubuntu 12.04, SUSE Linux Enterprise Server and openSUSE 12.1. It's possible to roll your own image and upload it, but this requires working with Microsoft's Hyper-V server product, which is something the average Linux user is unlikely to have handy.
Figure 3. Five different Linux options are offered when creating a VM.
For this article, I create a CentOS 6.2 VM called "penguin1". A DNS name is created automatically for the VM in the cloudapp.net domain, which then can be CNAME'd if you own your own domain.
Figure 4. Setting Up penguin1 in Azure
VMs are not directly exposed to the Internet, but rather are given 10.x IP addresses. Inside the Azure panel, users then can configure endpoints to open firewall ports and map them as they like. For example, to set up a Web server, it's necessary to create a port 80 (and perhaps 443) endpoint, which can be mapped to any port desired on the VM.
Figure 5. Adding a Network Endpoint to Set Up a Web Server
This network firewall is a nice security feature. By default, only port 22 (SSH) is configured. If you intend to change your default SSH port (as often is done to prevent script-kiddie scanning), you'll need to change the endpoint in the Azure management portal as well. Alternatively, you can change only the public port in the management portal and map it back to port 22 on the VM, leaving sshd untouched.
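After adding or changing an endpoint, it can be handy to confirm from outside that the public port actually answers. The following is a minimal sketch using only the Python standard library. The helper names are mine, and the host name in the usage note is an assumption based on the VM name used in this article.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(host, ports, timeout=3.0):
    """Return a dict mapping each port to True (reachable) or False."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, check_endpoints("penguin1.cloudapp.net", [22, 80, 443]) would report which of the three public ports accept connections. A False result cannot distinguish a missing endpoint from a service that is not listening on the VM, so check both sides.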
Creating a VM with the "From Gallery" option gives you the most choices. After you supply basic information, such as name and size, provisioning begins immediately and, in my experience, takes about ten minutes to complete.
What's the Root Password!?!?
Users of other VPS systems may wonder where they specify the root password. The answer is simple: you don't. Whatever account you specify will be given sudo authority to su to root:
[andrew@penguin1 ~]$ sudo su -
[sudo] password for andrew:
[root@penguin1 ~]#
From that point, you could set the root password if you want with passwd.
Storage
Linux virtual machines have three types of storage.
First, each is given a 30GB root volume. Storage is locally redundant and optionally can be made geographically redundant for about a 33% cost increase. Unfortunately, short of creating your own template, there is no way to modify this 30GB configuration if you prefer a different filesystem layout or want a smaller monthly storage bill. Root volume data persists across reboots and is a permanent BLOB in Azure storage. So if you delete a VM, it's possible to retain its root volume and later mount it up on another system or use it as the root volume for a new VM.
You also can create as many other volumes as you'd like. These live as BLOBs in Azure storage and are persistent. They function much like SAN volumes, allowing you to create and attach them to one VM, then later detach them and attach them to another. Unfortunately, there is no way to resize these volumes, which is a disappointing limitation. While you can create a larger volume, move the data and delete the old volume, this obviously is not a scalable approach.
When you create a new disk and attach it to the VM, it appears as a new SCSI device that you can mkfs and mount. After creating a new 20GB disk in the Azure portal, penguin1's dmesg shows:
scsi 4:0:0:0: Direct-Access Msft Virtual Disk 1.0 PQ: 0 ANSI: 4
sd 4:0:0:0: Attached scsi generic sg3 type 0
sd 4:0:0:0: [sdc] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 0f 00 10 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
sdc: unknown partition table
sd 4:0:0:0: [sdc] Attached SCSI disk
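The two sizes on the [sdc] line describe the same capacity in different units, which is why a "20GB" disk is also reported as 21.4 GB. A quick check, using only the block count and block size from the output above:

```python
# Figures taken from the dmesg line for sdc above.
logical_blocks = 41943040
block_size = 512  # bytes per logical block

size_bytes = logical_blocks * block_size
size_gib = size_bytes / 2**30   # binary gibibytes (what the portal calls 20GB)
size_gb = size_bytes / 10**9    # decimal gigabytes

print(f"{size_bytes} bytes = {size_gib:.2f} GiB = {size_gb:.2f} GB")
```

The result is exactly 20 GiB, or about 21.47 decimal gigabytes, which the kernel message shows as 21.4 GB.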
Now you can fdisk, mkfs and mount:
[root@penguin1 ~]# fdisk /dev/sdc
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-2610, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-2610, default 2610):
Using default value 2610
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@penguin1 ~]# mkfs.ext4 /dev/sdc1
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
1310720 inodes, 5241198 blocks
262059 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
160 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736,
1605632, 2654208, 4096000
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 21 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@penguin1 ~]# mkdir /data
[root@penguin1 ~]# mount /dev/sdc1 /data
[root@penguin1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       28G  2.0G   24G   8% /
tmpfs                 872M     0  872M   0% /dev/shm
/dev/sda1             485M   86M  374M  19% /boot
/dev/sdb1              69G  180M   66G   1% /mnt/resource
/dev/sdc1              20G  172M   19G   1% /data
As you might expect, if I were to delete this disk in the Azure portal, the operation would fail unless I first unmounted it.
Finally, virtual machines also come with a variable amount of truly local storage. This storage does not live in the Azure cloud but rather is provisioned on the actual physical node. If there is a hardware or other fault in the Azure fabric and your VM migrates to a new host, data stored in this space is lost. This filesystem is meant to be used for state information, scratch data and other temporary files. On Linux images, it shows up as /mnt/resource. Small VMs receive a 70GB filesystem, and the size increases with VM size, up to 800GB for Extra Large VMs.
I have experienced a hardware fault on Azure. My VM shut down, migrated to a new hardware host and booted up on the new server. Services that I'd defined to start up came up normally, all network endpoints were moved correctly, disks I'd mounted were remounted, and the VM worked fine. However, the scratch data I had in /mnt/resource was lost, and the filesystem was empty, as designed.
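Because /mnt/resource can be emptied at any time, anything stored there should be cheap to regenerate. The sketch below shows one way an application might treat it as a disposable cache. The function names are hypothetical, and the fallback to the system temporary directory is my own choice for machines that lack the resource disk.

```python
import os
import tempfile

# Path of the local scratch filesystem on Azure Linux images, per the text above.
RESOURCE_DIR = "/mnt/resource"


def scratch_dir(preferred=RESOURCE_DIR):
    """Return a writable directory for disposable data.

    Uses the preferred path when it exists and is writable; otherwise
    falls back to the system temporary directory.
    """
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    return tempfile.gettempdir()


def load_or_rebuild(name, rebuild, base=None):
    """Read a cached file from scratch space, rebuilding it if it is gone."""
    base = base or scratch_dir()
    path = os.path.join(base, name)
    if not os.path.exists(path):
        # The cache vanished (for example, after a host migration): recreate it.
        with open(path, "w") as handle:
            handle.write(rebuild())
    with open(path) as handle:
        return handle.read()
```

The point of the pattern is that callers never assume the file survived. They always go through load_or_rebuild, and the authoritative copy of the data lives on a persistent disk or elsewhere.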
Rough Edges
The CentOS image, which is provided by OpenLogic, could use some improvement. I've noted some things that make me scratch my head:
- A swap partition is configured, but does not appear in /etc/fstab. Because there isn't much advantage to creating a swap partition but not using it, this is presumably an oversight.
- Some default services run without justification. For example, why is CUPS needed? I'm unlikely to print in the cloud. Given that RAID redundancy is provided by the Azure storage layer and software RAID is not needed, why is mdmonitor set to run at boot?
- I periodically receive crash reports from fprintd. Removing this service is straightforward, but why the CentOS image is created to support fingerprint biometric authentication in a virtual, cloud-based environment mystifies me.
- iptables is enabled with a single rule to accept bootp, though the INPUT chain has a default ACCEPT policy anyway.
Hopefully, these things will be corrected as Linux settles into Azure.
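The swap oversight in the first item is easy to check for on any image. The sketch below compares the swap entries in /etc/fstab with what the kernel reports in /proc/swaps. The function names are mine, and the parsing assumes the usual whitespace-separated layout of both files.

```python
def fstab_swap_entries(fstab_text):
    """Return the device fields of fstab lines whose filesystem type is swap."""
    devices = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "swap":
            devices.append(fields[0])
    return devices


def active_swap_devices(proc_swaps_text):
    """Return device names from /proc/swaps-style text (header line skipped)."""
    lines = proc_swaps_text.splitlines()[1:]
    return [line.split()[0] for line in lines if line.strip()]


def read_text(path):
    """Return the full contents of a text file."""
    with open(path) as handle:
        return handle.read()


def report():
    """Print the configured and active swap devices on this machine."""
    configured = fstab_swap_entries(read_text("/etc/fstab"))
    active = active_swap_devices(read_text("/proc/swaps"))
    print("swap in /etc/fstab:", configured or "none")
    print("swap currently active:", active or "none")
```

Calling report() on a Linux system prints both lists. If both come back as "none" while fdisk shows a swap partition, the partition exists but is never enabled at boot, which is the situation described above.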
Performance
In general, I/O performance is excellent. Measuring with ioping, I found the /data disk I created earlier averaged 6ms latency, while /mnt/resource showed a zippy 0.4ms. The root disk was a slower 18ms, but as Microsoft explains in its documentation, it optimizes the I/O performance on volumes tagged "OS Disk" differently:
The operating system disk and data disk has a host caching setting (sometimes called host-cache mode) that enables improved performance under some circumstances. However, these settings can negatively affect performance in other circumstances, depending on the application. Host caching is OFF by default for both read operations and write operations for data disks. Host-caching is ON by default for read and write operations for operating system disks. As noted, these should work best in most cases. However, your mileage may vary. We recommend you place data intensive operations on a data disk separate from the OS disk.
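If ioping is not installed, a rough comparison between disks is still possible with a few lines of Python. This sketch is not equivalent to ioping: it times small writes followed by fsync rather than ioping's default read requests, so its numbers will differ from the ones quoted above. It is only meant for comparing one mount point against another on the same VM.

```python
import os
import tempfile
import time


def write_latency_ms(directory, samples=10, size=4096):
    """Time small fsync'd writes in `directory`; return per-write latencies in ms."""
    payload = b"\0" * size
    timings = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to the device before stopping the clock
            timings.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return timings


def summarize(directory, samples=10):
    """Print and return the average write latency for one directory."""
    timings = write_latency_ms(directory, samples)
    average = sum(timings) / len(timings)
    print(f"{directory}: avg {average:.2f} ms over {len(timings)} writes")
    return average
```

Running summarize("/data") and summarize("/mnt/resource") in turn gives a like-for-like comparison of the data disk and the local scratch disk. Keep in mind that host caching settings, as described in the quotation above, affect write timings in particular.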
Figure 6. The Integrated Display of Performance Metrics in the Azure Portal
Compute performance depends on the VM size you select. Under the covers, Microsoft is using AMD gear, as evidenced by /proc/cpuinfo:
processor : 1
vendor_id : AuthenticAMD
cpu family : 16
model : 8
model name : AMD Opteron(tm) Processor 4171 HE
stepping : 1
cpu MHz : 2094.702
cache size : 512 KB
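To collect the same details from several VMs without reading the file by eye, /proc/cpuinfo is simple to parse: each line is a "key : value" pair, and a blank line separates processors. A minimal sketch follows, with function names of my own choosing.

```python
def parse_cpuinfo(text):
    """Parse /proc/cpuinfo-style text into a list of dicts, one per processor."""
    processors = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates one processor's stanza from the next.
            if current:
                processors.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


def summarize_cpus(path="/proc/cpuinfo"):
    """Print the processor count and the distinct model names found."""
    with open(path) as handle:
        cpus = parse_cpuinfo(handle.read())
    models = {cpu.get("model name", "unknown") for cpu in cpus}
    print(f"{len(cpus)} logical processor(s): {', '.join(sorted(models))}")
```

Calling summarize_cpus() on the VM prints the number of logical processors and their model names. The excerpt above shows only the stanza for processor 1.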
Summary
Some may say the best thing about Linux in Azure is that it gives competition to Amazon and Google. For the end user, performance and pricing are very similar, while the enterprise administrator may find the ability to collocate Linux alongside Windows and Azure-specific deployments to be advantageous. While running Linux in the Azure cloud might seem like a penguin pitching his tent in Mordor, one more quality player in the cloud space is good news for all.