Tuesday 28 May 2019

Remove Route53 record on EC2 instance termination with Lambda

Creating auto scaling infrastructure requires connecting a lot of small pieces that, in statically created infrastructure, are done once and then forgotten. One of the first problems you face is how to name the instances created by auto scaling so that you can always reach any of them in a straightforward way. During my work at Rocket Internet I arrived at a simple solution: the name of an instance always starts with its role name, which in most cases is the same as the name of the Auto Scaling Group, followed by a dash and the instance ID. With this setup each EC2 instance in an ASG can always be identified easily and has a unique name. One of the first and, I think, most common roles is the bastion host, which is the entry point for accessing all resources in a VPC. Our bastion is therefore named something like bastion-1234567890abcdef0. The name might look long, but this is mostly due to the longer IDs introduced by AWS some time ago.
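The naming rule can be sketched in a few lines of Python. This is only an illustration: the function name instance_hostname is hypothetical, and dropping the leading "i-" of the instance ID is an assumption inferred from the bastion example above.

```python
def instance_hostname(role: str, instance_id: str) -> str:
    """Build '<role>-<id>' from a role name and an EC2 instance ID."""
    # Assumption: the leading "i-" of the instance ID is dropped,
    # as the bastion example in the text suggests.
    short_id = instance_id[2:] if instance_id.startswith("i-") else instance_id
    return f"{role}-{short_id}"


print(instance_hostname("bastion", "i-1234567890abcdef0"))
```

Because the instance ID is unique, two instances of the same role can never end up with the same name.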

To make real use of the hostnames, they are also added to our internal zone in Route53. This makes it much easier to connect to an instance by its name rather than its IP (and it does not matter whether you use IPv4, IPv6 or dual stack). Making those names available to SSH auto-completion is another matter, but that will be a topic for another post.

Everything works smoothly at first, but after some time and many scale-up and scale-down operations you will notice that your Route53 zone is full of rubbish: entries for old instances that no longer exist but are still resolvable. What is more, you may hit problems with reverse DNS lookups, because private IPs get reused. How quickly you face this problem depends on your subnet size and the number of scale up/down operations.

There are multiple possible solutions to this problem, among them:

  • Running a cron job that cleans up Route53 entries of instances that no longer exist
  • Deleting the Route53 record on instance termination via an rc.d script
  • Using CloudWatch Events to run a Lambda that removes the entry after instance termination

Below I will focus on the third solution, CloudWatch Events and Lambda, as it is the most elegant and follows AWS good practices. A guide could have you click through the AWS Console or run a bunch of CLI commands to get this result, but that is not (or at least should not be) how you manage your infrastructure nowadays. Of course, for smaller setups you can still do it that way, but even there you will quickly lose the overview of your infrastructure, not to mention managing it with more than one person. One of the best and, I think, most widely used modern tools for this purpose is Terraform from HashiCorp. If you have never heard of it, you should definitely check it out, as it makes managing infrastructure much easier.
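To make the trigger concrete, here is a small Python illustration (not the Terraform code itself) of the event pattern a CloudWatch Events rule would need in order to react to terminated instances. The pattern keys and the event fields follow the EC2 Instance State-change Notification format; the matches helper is hypothetical and implements only a tiny subset of the real matching rules.

```python
import json

# Event pattern for a CloudWatch Events rule that fires on terminated instances.
TERMINATION_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Hypothetical helper: simplified matching of an event against a pattern.

    Every key in the pattern must exist in the event. A list in the pattern
    means "the event value must be one of these"; a dict means "recurse".
    """
    for key, allowed in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(allowed, dict):
            if not isinstance(value, dict) or not matches(allowed, value):
                return False
        elif value not in allowed:
            return False
    return True


# Trimmed-down sample of the event delivered to the Lambda function.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-1234567890abcdef0", "state": "terminated"},
}

print(json.dumps(TERMINATION_PATTERN, indent=2))
print(matches(TERMINATION_PATTERN, sample_event))
```

The JSON printed by the first call is the kind of document a rule's event pattern expects, and the instance ID the Lambda needs arrives in detail["instance-id"].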


The Terraform code may seem long at first glance for such a simple feature as deleting a record on instance termination, but that is down to the granular AWS permissions, which must be assigned to each role via a policy, the setup of logging and other features. As you look through the code, you should easily see that a bit less than the first 100 lines set up permissions for Lambda, CloudWatch Events and CloudWatch Logs. After that comes the creation of the zipped code, which AWS requires for each Lambda function. A Security Group is required because our Lambda function must have access to the VPC and therefore must also be located in it. This is done by passing the vpc_config parameter. The last step is to grant CloudWatch Events permission to invoke the Lambda function. Without that, every other step will work well, but when an instance terminates, all you will see in CloudWatch Logs is that the invocation failed. Solving this problem actually took me some time and was the cherry on top.
Keep in mind that some parts, such as your aws_vpc and probably the path to the Python script, may differ.
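The invoke permission is the piece that is easiest to forget, so here is a rough Python sketch of what it contains, expressed as the arguments of the Lambda AddPermission API call (the Terraform aws_lambda_permission resource carries the same information). The function name, statement ID and rule ARN below are hypothetical placeholders, not values from my setup.

```python
def invoke_permission_args(function_name: str, rule_arn: str) -> dict:
    """Arguments allowing one CloudWatch Events rule to invoke one Lambda."""
    return {
        "FunctionName": function_name,
        # Any ID that is unique within the function's policy will do (hypothetical value).
        "StatementId": "AllowExecutionFromCloudWatchEvents",
        "Action": "lambda:InvokeFunction",
        # Service principal of CloudWatch Events.
        "Principal": "events.amazonaws.com",
        # Restrict the permission to the single rule that should trigger the function.
        "SourceArn": rule_arn,
    }


args = invoke_permission_args(
    "route53-cleanup",  # hypothetical function name
    "arn:aws:events:eu-west-1:123456789012:rule/ec2-terminated",  # hypothetical rule ARN
)
print(args)
```

With boto3 these arguments could be passed as boto3.client("lambda").add_permission(**args), although in this post the permission is managed by Terraform.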

Speaking of the Python script, below you will also find the code for the Lambda function, which deletes the record for the instance that was terminated. It assumes that the record was created in the form instance_name.internal_domain, where the domain is configured at the top of the file.
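As a rough idea of how such a function can work, here is a minimal sketch, which is not necessarily identical to the script described above. It assumes that record names end in "-<instance id without the i- prefix>.<internal domain>", that boto3 is available in the Lambda runtime, and that INTERNAL_DOMAIN and HOSTED_ZONE_ID (both hypothetical placeholder values) are set at the top of the file. The optional route53 argument exists only so the handler can be tried out with a stub client.

```python
INTERNAL_DOMAIN = "example.internal"  # hypothetical internal domain
HOSTED_ZONE_ID = "Z0000000EXAMPLE"    # hypothetical hosted zone ID


def records_for_instance(record_sets, instance_id, domain=INTERNAL_DOMAIN):
    """Return the record sets whose name ends in '-<id>.<domain>.'."""
    # Assumption: the hostname carries the instance ID without its "i-" prefix.
    short_id = instance_id[2:] if instance_id.startswith("i-") else instance_id
    # Route53 returns fully qualified names with a trailing dot.
    suffix = f"-{short_id}.{domain.rstrip('.')}."
    return [r for r in record_sets if r["Name"].endswith(suffix)]


def delete_batch(records):
    """Build a Route53 change batch that deletes the given record sets."""
    # DELETE must repeat the record set exactly as it is stored in the zone.
    return {"Changes": [{"Action": "DELETE", "ResourceRecordSet": r} for r in records]}


def handler(event, context, route53=None):
    """Lambda entry point: remove the DNS records of a terminated instance."""
    if route53 is None:
        import boto3  # imported lazily so the helpers work without boto3
        route53 = boto3.client("route53")

    instance_id = event["detail"]["instance-id"]

    # Walk through the whole zone; fine for small zones, slow for huge ones.
    paginator = route53.get_paginator("list_resource_record_sets")
    pages = paginator.paginate(HostedZoneId=HOSTED_ZONE_ID)
    record_sets = [r for page in pages for r in page["ResourceRecordSets"]]

    stale = records_for_instance(record_sets, instance_id)
    if stale:
        route53.change_resource_record_sets(
            HostedZoneId=HOSTED_ZONE_ID, ChangeBatch=delete_batch(stale)
        )
    # Return the deleted names so they show up in the invocation result.
    return [r["Name"] for r in stale]
```

Searching by the instance ID suffix means the function does not need to know the role name, at the cost of listing the whole zone on every termination.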


The Terraform code was written with version 0.11.7, which was the latest stable release at the time of writing, but it should also work with 0.12, the current stable release. The Python code was written for 3.7, as support for Python 2.x ends at the end of this year.
