To make hostnames really useful, we also add them to our internal zone in Route53. This makes it much easier to connect to an instance by its name rather than its IP (and it no longer matters whether you use IPv4, IPv6 or dual stack). Another topic is making autocompletion of those names available in SSH, but that will be a topic for another post.
Everything works smoothly at first, but after some time and many scale-ups and scale-downs you will notice that your Route53 zone is full of rubbish: entries for old instances that no longer exist but still resolve. What is more, you may hit problems with reverse DNS lookups, because private IPs get reused. How quickly you run into this problem depends on your subnet size and the number of scale up/down operations.
There are several possible solutions to this problem, among them:
- Running a cron job that cleans up Route53 entries of non-existent instances
- Deleting Route53 records on instance termination via an rc.d script
- Using CloudWatch Events to run a Lambda that removes the entry after instance termination
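To make the first option a bit more concrete: the core of such a cleanup job is a comparison between the records in the zone and the instances that still exist. The snippet below is only a minimal sketch of that comparison. The function name `find_stale_records` and the shape of its inputs are my assumptions for illustration; fetching the records and the live hostnames (for example with boto3) is left out.

```python
def find_stale_records(records, live_hostnames):
    """Return A records whose name does not belong to any existing instance.

    records: iterable of dicts with 'Name' and 'Type' keys, shaped like the
             entries Route53 returns when listing a zone (assumed input).
    live_hostnames: iterable of fully qualified names of existing instances.
    """
    # Route53 returns names with a trailing dot and DNS names are
    # case-insensitive, so normalise both sides before comparing.
    live = {name.rstrip('.').lower() for name in live_hostnames}
    return [
        record for record in records
        if record['Type'] == 'A'
        and record['Name'].rstrip('.').lower() not in live
    ]
```

Only A records are considered here, so NS and SOA entries of the zone are never reported as stale.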
The rest of this post focuses on the third solution, CloudWatch Events and Lambda, as it is the most elegant one and follows AWS good practices. I could guide you through clicking in the AWS Console or running a bunch of CLI commands to get that result, but that is not (or at least shouldn't be) how you manage your infrastructure these days. Of course, for smaller setups you can still do it that way, but even there you will quickly lose the overview of your infrastructure, not to mention managing it with more than one person. One of the best, and I think most widely used, modern tools for this purpose is Terraform from HashiCorp. If you have never heard of it, you should definitely check it out, as it makes managing infrastructure much easier.
resource "aws_cloudwatch_event_rule" "ec2_terminate" {
  name        = "ec2_terminate"
  description = "Remove record from R53 on EC2 scale down"

  event_pattern = <<PATTERN
{
  "source": [
    "aws.ec2"
  ],
  "detail-type": [
    "EC2 Instance State-change Notification"
  ],
  "detail": {
    "state": [
      "terminated"
    ]
  }
}
PATTERN
}

resource "aws_cloudwatch_event_target" "lambda" {
  rule      = "${aws_cloudwatch_event_rule.ec2_terminate.name}"
  target_id = "SendToLambda"
  arn       = "${aws_lambda_function.remove_ec2_from_route53.arn}"
}

resource "aws_iam_role" "iam_for_lambda" {
  name = "iam_lambda_remove_ec2_from_route53"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "lambda_role_policy" {
  name = "lambda_remove_ec2_from_route53_policy"
  role = "${aws_iam_role.iam_for_lambda.id}"

  policy = <<EOF
{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "route53:ChangeResourceRecordSets",
        "route53:TestDNSAnswer"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_iam_policy" "lambda_logging" {
  name = "lambda_logging"
  path = "/"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*",
      "Effect": "Allow"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "lambda_logs" {
  role       = "${aws_iam_role.iam_for_lambda.name}"
  policy_arn = "${aws_iam_policy.lambda_logging.arn}"
}

data "archive_file" "lambda_function" {
  type        = "zip"
  source_file = "files/remove_ec2_from_route53.py"
  output_path = "files/remove_ec2_from_route53.zip"
}

resource "aws_security_group" "lambda" {
  name        = "allow lambda to internet"
  description = "Allow outgoing traffic for Lambda"
  vpc_id      = "${aws_vpc.sysops.id}"

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_cloudwatch_log_group" "remove_r53" {
  name              = "/aws/lambda/remove_ec2_from_route53"
  retention_in_days = 14
}

resource "aws_lambda_function" "remove_ec2_from_route53" {
  filename         = "files/remove_ec2_from_route53.zip"
  function_name    = "remove_ec2_from_route53"
  role             = "${aws_iam_role.iam_for_lambda.arn}"
  handler          = "remove_ec2_from_route53.lambda_handler"
  source_code_hash = "${data.archive_file.lambda_function.output_base64sha256}"
  runtime          = "python3.7"

  vpc_config {
    subnet_ids         = ["${aws_subnet.sysops_private_subnets.*.id}"]
    security_group_ids = ["${aws_security_group.lambda.id}"]
  }

  depends_on = ["aws_iam_role_policy_attachment.lambda_logs", "aws_cloudwatch_log_group.remove_r53"]
}

resource "aws_lambda_permission" "allow_cloudwatch" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.remove_ec2_from_route53.function_name}"
  principal     = "events.amazonaws.com"
}
Keep in mind that some parts, such as your aws_vpc and probably the path to the Python script, may differ in your setup.
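If you wonder which events will actually reach the Lambda, the event pattern from the rule can be read as a filter: every key in the pattern has to be present in the event, and the event's value has to be one of the listed values. Here is a rough sketch of that idea in Python. The `matches` helper is a simplified illustration written for this post, not the real matching logic of the service (it only handles exact values and nested objects), and the instance ID in the sample event is made up.

```python
# Same pattern as in the aws_cloudwatch_event_rule resource.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}

# Trimmed-down example of a state-change event (hypothetical instance ID).
SAMPLE_EVENT = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}


def matches(pattern, event):
    """Simplified check: exact-value matching with nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: the event must have a nested object that matches.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True
```

With this pattern, events for other states such as "running" or "stopped" are filtered out before the Lambda is ever invoked, so the function itself does not need to check the state.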
Speaking of the Python script, below you will also find the code for the Lambda function, which deletes the record of the instance that was terminated. It assumes that the record was created in the form instance_name.internal_domain, where the domain is configured at the top of the file. The hosted zone ID is hard-coded in remove_from_r53, so replace it with your own.
#!/usr/bin/env python3.7
import boto3
import socket

INTERNAL_DOMAIN = 'krzyzakp.internal'

ec2_client = boto3.client('ec2')
r53_client = boto3.client('route53')


def lambda_handler(event, context):
    # The state-change event carries the ID of the terminated instance.
    instance_id = event['detail']['instance-id']
    instance_details = get_instance_details(instance_id)
    remove_from_r53(instance_details)


def get_instance_details(instance_id):
    # Look up the Name tag of the instance to rebuild its hostname.
    tags = ec2_client.describe_tags(Filters=[
        {
            'Name': 'resource-id',
            'Values': [instance_id]
        },
        {
            'Name': 'key',
            'Values': ['Name']
        }
    ])
    hostname = '{0}.{1}'.format(tags['Tags'][0]['Value'], INTERNAL_DOMAIN)
    # Resolve the current value of the record; the DELETE below needs it.
    private_ip = socket.gethostbyname(hostname)
    return {
        'hostname': hostname,
        'private_ip': private_ip
    }


def remove_from_r53(hostname_details):
    response = r53_client.change_resource_record_sets(
        HostedZoneId='ZUIP836UDHCG7',
        ChangeBatch={
            'Changes': [
                {
                    'Action': 'DELETE',
                    'ResourceRecordSet': {
                        'Name': hostname_details['hostname'],
                        'Type': 'A',
                        'TTL': 300,
                        'ResourceRecords': [{'Value': hostname_details['private_ip']}]
                    }
                }
            ]
        }
    )
    return response
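Two details of the script are worth spelling out. First, a DELETE in Route53 only succeeds if name, type, TTL and value match the existing record, which is why the script resolves the IP first and why the TTL of 300 has to be the one used when the record was created. Second, the script takes the first tag from the response, so an instance without a Name tag would end in an IndexError. Below is a hedged sketch of how both parts could be pulled into small helper functions that are easy to test without AWS access. The names `hostname_from_tags` and `build_delete_batch` are hypothetical and not part of the original script.

```python
INTERNAL_DOMAIN = 'krzyzakp.internal'


def hostname_from_tags(tags_response, domain=INTERNAL_DOMAIN):
    """Build the FQDN from a describe_tags response, or return None
    if the instance has no (non-empty) Name tag."""
    for tag in tags_response.get('Tags', []):
        if tag.get('Key') == 'Name' and tag.get('Value'):
            return '{0}.{1}'.format(tag['Value'], domain)
    return None


def build_delete_batch(hostname, private_ip, ttl=300):
    """Build the ChangeBatch for deleting a single A record.

    The ttl default of 300 mirrors the script above and is assumed to be
    the TTL the record was created with.
    """
    return {
        'Changes': [
            {
                'Action': 'DELETE',
                'ResourceRecordSet': {
                    'Name': hostname,
                    'Type': 'A',
                    'TTL': ttl,
                    'ResourceRecords': [{'Value': private_ip}]
                }
            }
        ]
    }
```

The handler could then skip the Route53 call whenever `hostname_from_tags` returns None, instead of failing on instances that were never registered.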
The Terraform code was written with version 0.11.7, which was the latest stable release at the time it was created, but it should also work with 0.12, which is the current stable release. The Python code was written for 3.7, as Python 2.x support ends at the end of this year.