Date: 30 September 2013
Author: Rob Amos

One of the challenges in the new world of Amazon Web Services’ EC2-VPC environment is automatically replacing (“healing”) a single EC2 instance. Inside a VPC, the replacement itself is easy: create an Auto Scaling group from your AMI, set the minimum and maximum group size appropriately and go; Auto Scaling will take care of the rest. But what if you need to map an Elastic IP to those instances? Inside the VPC you’re isolated from the EC2 API and from the APIs of other services.

Traditional Approach

The usual way to approach this problem looks like this:

AWS - Auto-Healing Public EC2 Instances

When an Auto Scaling event occurs, such as the Auto Scaling service replacing your failed instance, it posts a notification to a Simple Notification Service (SNS) topic, which is then delivered to a Simple Queue Service (SQS) queue. A separate EC2 instance with some custom code monitors the SQS queue and re-maps your Elastic IP onto the new instance.

Given the polling cycles involved, this can take several minutes to perform, during which your new EC2 instance is unreachable from the Internet.

Elastic Load Balancing

You can always use Elastic Load Balancing in front of your single instance. You can then “load balance” across a single instance, and the ELB will be updated automatically by the Auto Scaling Group when your instance is replaced.

But, as Elastic Load Balancers do not support a static IP address, this was not an appropriate solution for my needs.

My Approach

The base problem here is that a newly started instance inside a publicly addressable VPC subnet cannot reach the EC2 API, as it only has a private IP address. But many VPC scenarios include private subnets and a NAT instance (or, if you’re doing things right, multiple NAT instances) to allow traffic from the private subnets to reach the general Internet.


If this is you, read on.

Mangling the Default Gateway

With NAT instances available, the solution became obvious:

  1. On boot, change the default gateway to point to your NAT instance.
  2. Connect out to the EC2 API to forcefully re-map the Elastic IP to yourself.
  3. Change the default gateway back to the VPC’s router.
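Step 3 needs the address of the VPC's router. AWS reserves the first host address of every subnet (the network address plus one) for that router, so it can be derived from the subnet's CIDR block instead of being hard-coded. Here is a minimal sketch of that calculation. The function name vpcRouterFor and the example CIDR blocks are my own illustrations and not part of the original script, and it assumes a 64-bit PHP build. The CIDR block itself could come from the metadata service (the network/interfaces/macs/<mac>/subnet-ipv4-cidr-block path).

```php
<?php
/**
 * Returns the VPC router address for a subnet: network address + 1.
 * Hypothetical helper for illustration only.
 *
 * @param  string      $subnetCidr  e.g. '172.31.0.0/24'
 * @return string|null              The router IP, or null on bad input.
 */
function vpcRouterFor($subnetCidr)
{
    $parts = explode('/', $subnetCidr);
    if (count($parts) !== 2 || ip2long($parts[0]) === false) {
        return null;
    }

    // Build the netmask from the prefix length.
    $bits = (int) $parts[1];
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);

    // Mask down to the network address, then step to the first host address.
    return long2ip((ip2long($parts[0]) & $mask) + 1);
}

echo vpcRouterFor('172.31.0.0/24'), "\n"; // 172.31.0.1
```

With something like this in place, the hard-coded gateway address in the code sample could be replaced by a value computed at boot.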

This will work across multiple Availability Zones or subnets, but there needs to be a NAT instance in the same subnet as the instance you are booting. For this particular scenario, an Auto Scaling Group with a minimum and maximum group size of 1 was created across two Availability Zones, allowing a replacement instance to be spawned in a different AZ should the current one become unavailable. To accomplish that, you’ll need to keep a mapping of subnet to NAT instance addresses.
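One way to keep that mapping is a simple array keyed by subnet CIDR block, looked up with the instance's own private IP (available from the metadata service as meta-data/local-ipv4). This is only a sketch under assumptions: the function names ipInCidr and natInstanceFor, the CIDR blocks and the NAT addresses are all made up for illustration, and it assumes a 64-bit PHP build.

```php
<?php
/**
 * Checks whether an IPv4 address falls inside a CIDR block.
 * Hypothetical helper for illustration only.
 */
function ipInCidr($ip, $cidr)
{
    $parts = explode('/', $cidr);
    if (count($parts) !== 2 || ip2long($ip) === false || ip2long($parts[0]) === false) {
        return false;
    }

    $bits = (int) $parts[1];
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);

    // Same network once both addresses are masked.
    return (ip2long($ip) & $mask) === (ip2long($parts[0]) & $mask);
}

/**
 * Picks the NAT instance that lives in the same subnet as $privateIP.
 *
 * @return string|null  The NAT instance IP, or null if no subnet matches.
 */
function natInstanceFor($privateIP, array $natBySubnet)
{
    foreach ($natBySubnet as $cidr => $natIP) {
        if (ipInCidr($privateIP, $cidr)) {
            return $natIP;
        }
    }
    return null;
}

// Placeholder mapping: one NAT instance per subnet / Availability Zone.
$natBySubnet = array(
    '172.31.0.0/24' => '172.31.0.10',
    '172.31.1.0/24' => '172.31.1.10',
);

echo natInstanceFor('172.31.1.57', $natBySubnet), "\n"; // 172.31.1.10
```

Returning null for an unknown subnet lets the caller fail loudly instead of routing through the wrong NAT instance.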

The EC2 Metadata Service will not be usable through the NAT instance: any metadata returned will be that of the NAT instance. Before changing the default gateway, add a static route to the Metadata Service IP (169.254.169.254) via the normal gateway.
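The order of the routing changes matters, so it may help to see it spelled out. The sketch below only builds and prints the ip route commands; it does not execute anything. The function names rerouteCommands and restoreCommands and the example addresses are assumptions of mine, and in practice the commands would need to run with root privileges.

```php
<?php
// Address of the EC2 Metadata Service.
const METADATA_IP = '169.254.169.254';

/**
 * Commands to send outbound traffic via the NAT instance.
 * The metadata route is pinned to the VPC router first, so metadata
 * lookups keep describing this instance rather than the NAT instance.
 */
function rerouteCommands($routerIP, $natIP)
{
    return array(
        sprintf('/sbin/ip route add %s via %s', METADATA_IP, $routerIP),
        sprintf('/sbin/ip route del default via %s', $routerIP),
        sprintf('/sbin/ip route add default via %s', $natIP),
    );
}

/**
 * Commands to undo rerouteCommands(), in the reverse order:
 * restore the default route, then drop the pinned metadata route.
 */
function restoreCommands($routerIP, $natIP)
{
    return array(
        sprintf('/sbin/ip route del default via %s', $natIP),
        sprintf('/sbin/ip route add default via %s', $routerIP),
        sprintf('/sbin/ip route del %s via %s', METADATA_IP, $routerIP),
    );
}

// Example addresses only.
foreach (rerouteCommands('172.31.0.1', '172.31.0.10') as $command) {
    echo $command, "\n";
}
```

Keeping the command lists as plain data makes the sequence easy to log or check before anything touches the routing table.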

Code Sample

Here’s how we did it with the AWS PHP SDK 2.

This sample was written for Ubuntu instances and assumes the user executing the script has sudo access. The sample is copy/pasted from working code, but has not itself been tested.

<?php
/**
 * AssociateElasticIP.php
 *
 * Forcefully associates the Elastic IP with this instance
**/

// Make sure we include the AWS stuff we need
require_once dirname(__FILE__) . '/aws/aws-autoloader.php';

use Aws\Common\Aws;
use Aws\Ec2\Ec2Client;
use Aws\Common\InstanceMetadata\InstanceMetadataClient;

class AssociateElasticIP
{
	/**
	 * The external Elastic IP that this instance should forcefully take.
	 *
	 * @var string
	**/
	private $externalIP = '1.1.1.1';

	/**
	 * NAT Instance IP
	 *
	 * The IP Address of the NAT instance in the same subnet as the instance on
	 * which this script is running. You should keep a mapping on a per-subnet
	 * basis and select the appropriate NAT instance.
	 *
	 * @var string
	**/
	private $natInstanceIP = '172.31.0.10';

	/**
	 * Default Gateway IP
	 *
	 * This is the IP of the default gateway assigned by the VPC. You should
	 * probably detect this dynamically.
	 *
	 * @var string
	**/
	private $defaultGatewayIP = '172.31.0.1';

	/**
	 * Runs the recipe
	 *
	 * @return void
	 * @author Rob Amos
	**/
	public function run ()
	{
		// do we have a public IP already bound?
		$publicIP = $this->getInstanceMetadata('meta-data/public-ipv4');
		if (!empty($publicIP) && $publicIP == $this->externalIP)
			return;

		// no public IP means we need to change the default route on the instance
		$rerouted = false;
		if (!$publicIP)
		{
			$this->changeDefaultGateway();
			$rerouted = true;
		}

		// find our instance Id
		$instanceId = $this->getInstanceMetadata('meta-data/instance-id');
		if ($instanceId === false)
		{
			if ($rerouted)
				$this->resetDefaultGateway();
			throw new Exception('Could not obtain instance ID from the metadata service.');
		}

		// find the Elastic IP's details
		$aws = Aws::factory();
		$ec2 = $aws->get('ec2');
		$response = $ec2->describeAddresses(array('PublicIps' => array($this->externalIP)));
		if (!$response || !isset($response['Addresses']) || count($response['Addresses']) == 0)
		{
			if ($rerouted)
				$this->resetDefaultGateway();
			throw new Exception(sprintf('Could not obtain information from the EC2 API about the Elastic IP %s.', $this->externalIP));
		}

		// bind it to this server
		$ip = $response['Addresses'][0];
		$params = array
		(
			'AllocationId' => $ip['AllocationId'],
			'InstanceId' => $instanceId,
			'AllowReassociation' => true
		);
		$response = $ec2->associateAddress($params);
		if (!$response)
		{
			if ($rerouted)
				$this->resetDefaultGateway();
			throw new Exception(sprintf('Could not associate Elastic IP %s with Instance %s.', $this->externalIP, $instanceId));
		}

		// only undo the routing change if we actually made one
		if ($rerouted)
			$this->resetDefaultGateway();
	}

	/**
	 * Changes the default gateway on the server to our NAT instance - typically syd-gw.core.rdas.com.au
	 *
	 * @return void
	 * @author Rob Amos
	**/
	private function changeDefaultGateway ()
	{
		// keep the metadata service reachable via the VPC's router
		@exec(sprintf('/usr/bin/sudo /sbin/ip route add 169.254.169.254 via %s', $this->defaultGatewayIP), $output, $exitCode);
		if ($exitCode !== 0)
			throw new Exception(sprintf("Could not add a separate route for the metadata service. Output: \n%s", join("\n", $output)));

		// swap the default route from the VPC's router to the NAT instance
		@exec(sprintf('/usr/bin/sudo /sbin/ip route del default via %s && /usr/bin/sudo /sbin/ip route add default via %s', $this->defaultGatewayIP, $this->natInstanceIP), $output, $exitCode);
		if ($exitCode !== 0)
			throw new Exception(sprintf("Could not change the default gateway. Output: \n%s", join("\n", $output)));
	}

	/**
	 * Removes the NAT instance from the list of default gateways - it's no longer necessary.
	 *
	 * @return void
	 * @author Rob Amos
	**/
	private function resetDefaultGateway ()
	{
		// swap the default route back from the NAT instance to the VPC's router
		@exec(sprintf('/usr/bin/sudo /sbin/ip route del default via %s && /usr/bin/sudo /sbin/ip route add default via %s', $this->natInstanceIP, $this->defaultGatewayIP), $output, $exitCode);
		if ($exitCode !== 0)
			throw new Exception(sprintf("Could not change the default gateway. Output: \n%s", join("\n", $output)));

		// the dedicated metadata route is no longer needed
		@exec(sprintf('/usr/bin/sudo /sbin/ip route del 169.254.169.254 via %s', $this->defaultGatewayIP), $output, $exitCode);
		if ($exitCode !== 0)
			throw new Exception(sprintf("Could not remove the route for the metadata service. Output: \n%s", join("\n", $output)));
	}

	/**
	 * Obtains the appropriate metadata result from the Instance Metadata Service
	 *
	 * @param	string		$uri	The metadata URI to load
	 * @return 	string				The page content, or false on failure.
	 * @author 	Rob Amos
	**/
	protected function getInstanceMetadata ($uri)
	{
		$client = InstanceMetadataClient::factory();
		$client->waitUntilServiceAvailable();

		// load it up from the metadata service
		try
		{
			$request = $client->get($uri);
			$request->getCurlOptions()->set(CURLOPT_TIMEOUT, 1)->set(CURLOPT_CONNECTTIMEOUT, 1);
			return $request->send()->getBody(true);

		} catch (Exception $e)
		{
			return false;
		}
	}
}

$recipe = new AssociateElasticIP();
$recipe->run();

Thanks to our resident AWS certified tech Rob for this blog post! Learn more about AWS here.
