Sometimes, when you face a challenge, you might be able to solve it with routine processes. But other times you need to try something completely new, something that you know nothing about.
Usually in these scenarios you should apply engineering thinking. For me, these moments are the most insightful and I want to share some of mine with the community.
Here I will guide you through the steps that my team and I took when we connected existing AWS infrastructure to a large private network using Direct Connect.
What we'll cover
- Problems to Solve
- What is Direct Connect?
- How to embed it
- Transit VPC using Terraform
- Direct Connect using Terraform
- Peering between main and transit VPCs
- Do you use OpenVPN (optional)?
- Router Service
- Closing thoughts
Problems to Solve
We had services within our VPC that needed to communicate with other services in a separate virtual private network.
In order to establish the connection, we needed to accept an AWS hosted connection from a network provider as part of a signed contract to grant access to the VPN using AWS Direct Connect.
So how were we to implement all of this? How were we going to embed it into a current solution that was managed using Terraform? Were there any best practices for doing that?
What is Direct Connect?
AWS Direct Connect makes it easy to establish a dedicated network connection from your premises to your Amazon VPC or among Amazon VPCs. This option can potentially reduce network costs, increase bandwidth throughput, and provide a more consistent network experience than the other VPC-to-VPC connectivity options. (source)
Essentially you have a network provider who has AWS facilities in a shared data centre. Then you both can make a direct connection between your AWS network components and the network using the provider's hardware (literally a patchcord in the nest) with subsequent access.
A generic implementation on the AWS side looks like the following:
- You configure one or two (reserved) Direct Connections in the console, which creates a Direct Connect Gateway.
- Then you attach a private VIF (one per connection) to the gateway.
- Once you make a few calls with the provider's network engineers and exchange routing policies, it is done.
Usually all instructions regarding how to enable the connection will be sent over to you by the provider.
How to embed it
Our first assumption was that we would enable the connection in the VPC and create a routing configuration that sent the required requests to the Direct Connect gateway (for example, we'd distinguish them by the "Host" header or by IPs).
At a high level, it would look something like this:
During a call with the provider's network engineers, they asked us about the IP range that we advertised to the network. We wondered why. It was because routing over Direct Connect is driven by a protocol called BGP. If you want more info, there are a lot of videos that will teach you about this major Internet protocol that runs under the hood.
Our initial thought was that it needed to be the subnet containing the services that we wanted to give access to the network. After that, we were asked to configure the subnet
10.1.2.0/24 as an allowed prefix in our Direct Connect configuration.
Long story short, "allowed prefixes" here stand for an IP range that we were going to advertise to the network provider that they would register in the routing policies.
Well, after all that, it did not work. The provider did not "see" our advertised routes despite the fact that we could see them.
A bit of investigation and voilà:
AWS will allocate private IPs (/30) in the 169.x.x.x range for the BGP session and will advertise the VPC CIDR block over BGP. You can advertise the default route via BGP.
Additionally, we found other folks who seemed to have faced the same issue:
we ended up with creating a new VPC with smaller CIDR our partner wanted.
So basically, the /30 range is what AWS allocates for the BGP session itself, and what gets advertised over Direct Connect is the whole VPC CIDR. You cannot advertise individual subnets of the VPC.
Our network CIDR was 10.1.0.0/16 and we had an issue with that: it was too large for the network provider to accept. On top of that, during the call we discovered another thing we had to do when connecting to the network: we needed to contact the network's IP address management department (if the network was large enough, I suppose) and ask them to provide us with a unique range within the network. That range would then become our new VPC CIDR.
We decided to create a separate VPC. To validate the idea, we found some official guides from AWS such as this one. Shortly after this, we learned that the AWS community has a dedicated term for such a separate VPC: a transit VPC.
Before getting a reply to our request for a unique IP range in the network, we asked the provider about currently unused IP ranges so we could implement it quickly on our side. This would give us the proof of concept we needed for a solution. Everything worked perfectly.
The next step was to implement everything (Direct Connect configuration + VPC peering) in our existing Terraform configuration.
Transit VPC using Terraform
First of all, before we start to dig into the code, I want to say that you can find all the code below on GitHub here.
Let's first recap what we discussed before. We had an existing VPC, and we wanted some services within it to be able to communicate through the network that we connected to using Direct Connect.
We were granted two AWS-hosted connections (primary and secondary, in order to ensure connection fallback). The main idea was to extend our existing infrastructure somehow. Somehow meant Transit VPC – the solution that helped us integrate with such connections.
Now let's look at some code to represent what we have discussed. The first thing to define is going to be our main VPC. I want to present it for illustration purposes only, so it makes all further steps seem more consistent.
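As a minimal sketch of what such a main VPC could look like (not the original configuration): the resource names `main` and `main_private` and the `Name` tag are hypothetical, and the CIDR simply reuses the 10.1.0.0/16 range mentioned earlier.

```hcl
# Illustrative main VPC; names and tags are assumptions for this sketch.
resource "aws_vpc" "main" {
  cidr_block           = "10.1.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "main-vpc"
  }
}

# Private route table that the services' subnets would be associated with.
# It is needed later to route traffic towards the transit VPC.
resource "aws_route_table" "main_private" {
  vpc_id = aws_vpc.main.id
}
```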
Next, some of the main VPC's parameters are going to be used in the transit VPC. So let's define them as output:
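One possible shape for these outputs, assuming the main VPC is declared as `aws_vpc.main` with a private route table `aws_route_table.main_private` (both names are hypothetical):

```hcl
# Values from the main VPC state that the transit VPC state will read.
output "vpc_id" {
  value = aws_vpc.main.id
}

output "vpc_cidr_block" {
  value = aws_vpc.main.cidr_block
}

output "private_route_table_id" {
  value = aws_route_table.main_private.id
}
```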
Now we can start to configure our transit VPC. For the sake of good design, we decided to manage it in a separate state under a separate folder (e.g. transit-vpc/). Let's first import the above outputs as locals:
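A sketch of how the import might look with the `terraform_remote_state` data source. The S3 backend, bucket, key, and region below are placeholders, and the output names (`vpc_id`, `vpc_cidr_block`, `private_route_table_id`) are assumed rather than taken from the original code:

```hcl
# Read the main VPC's state; backend settings are placeholders.
data "terraform_remote_state" "main" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "main-vpc/terraform.tfstate"
    region = "eu-west-1"
  }
}

# Short aliases for the values used throughout the transit VPC configuration.
locals {
  main_vpc_id                 = data.terraform_remote_state.main.outputs.vpc_id
  main_vpc_cidr               = data.terraform_remote_state.main.outputs.vpc_cidr_block
  main_private_route_table_id = data.terraform_remote_state.main.outputs.private_route_table_id
}
```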
Next, we can start defining the transit VPC configuration. First, I want to list all variables that we need (pay attention to the IPs of the DNS servers in the network that we want to connect to. You should know them to specify as DNS servers in transit VPC):
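For example, the variables could be declared roughly like this. The names are hypothetical, and no defaults are given because the real values come from the network's IP address management team and the provider:

```hcl
variable "transit_vpc_cidr" {
  description = "Unique CIDR range within the private network, assigned for the transit VPC"
  type        = string
}

variable "network_dns_servers" {
  description = "IPs of the DNS servers inside the private network"
  type        = list(string)
}
```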
And, secondly, the configuration:
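A minimal sketch of such a configuration, assuming the variables `var.transit_vpc_cidr` and `var.network_dns_servers` exist (hypothetical names). Because the assigned range is expected to be small, this sketch uses a single subnet covering the whole VPC CIDR, which is a simplification:

```hcl
resource "aws_vpc" "transit" {
  cidr_block           = var.transit_vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "transit-vpc"
  }
}

# Resolve names through the private network's DNS servers.
resource "aws_vpc_dhcp_options" "transit" {
  domain_name_servers = var.network_dns_servers
}

resource "aws_vpc_dhcp_options_association" "transit" {
  vpc_id          = aws_vpc.transit.id
  dhcp_options_id = aws_vpc_dhcp_options.transit.id
}

# Single private subnet spanning the whole (small) transit CIDR.
resource "aws_subnet" "transit_private" {
  vpc_id     = aws_vpc.transit.id
  cidr_block = aws_vpc.transit.cidr_block
}

resource "aws_route_table" "transit" {
  vpc_id = aws_vpc.transit.id
}

resource "aws_route_table_association" "transit_private" {
  subnet_id      = aws_subnet.transit_private.id
  route_table_id = aws_route_table.transit.id
}
```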
Direct Connect using Terraform
Let's continue with the Direct Connect configuration. First, let's define all variables that we need in order to continue. You should get all these values from your network provider. I assume they will be sent over to you (the same worked for us) in a separate document like a spreadsheet:
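One way to model those values is a single map with one entry per hosted connection. The variable name and field layout below are assumptions for illustration; the actual values come from the provider's document:

```hcl
# One entry per hosted connection, e.g. keys "primary" and "secondary".
# bgp_auth_key is a secret, so keep the real values out of version control.
variable "dx_connections" {
  description = "Parameters of the AWS hosted connections supplied by the network provider"
  type = map(object({
    connection_id    = string # ID of the hosted connection (dxcon-...)
    vlan             = number # VLAN assigned to the hosted connection
    bgp_asn          = number # BGP ASN of the provider's side
    bgp_auth_key     = string # BGP MD5 authentication key
    amazon_address   = string # Amazon-side peer IP in CIDR notation
    customer_address = string # provider-side peer IP in CIDR notation
  }))
}
```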
And now we can do the rest of the configuration:
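The following is a sketch rather than the original code. It assumes a `var.dx_connections` map as described above, a transit VPC declared as `aws_vpc.transit` with a route table `aws_route_table.transit`, and a `var.transit_vpc_cidr` variable (all hypothetical names). The Amazon-side ASN is an example value:

```hcl
resource "aws_dx_gateway" "this" {
  name            = "transit-dx-gateway"
  amazon_side_asn = "64512" # example private ASN, adjust to your setup
}

# Accept the hosted connections offered by the provider.
resource "aws_dx_connection_confirmation" "this" {
  for_each      = var.dx_connections
  connection_id = each.value.connection_id
}

# One private VIF per connection, attached to the Direct Connect gateway.
resource "aws_dx_private_virtual_interface" "this" {
  for_each = var.dx_connections

  connection_id    = aws_dx_connection_confirmation.this[each.key].id
  name             = "vif-${each.key}"
  vlan             = each.value.vlan
  address_family   = "ipv4"
  bgp_asn          = each.value.bgp_asn
  bgp_auth_key     = each.value.bgp_auth_key
  amazon_address   = each.value.amazon_address
  customer_address = each.value.customer_address
  dx_gateway_id    = aws_dx_gateway.this.id
}

# Virtual private gateway that ties the transit VPC to the DX gateway.
resource "aws_vpn_gateway" "transit" {
  vpc_id = aws_vpc.transit.id
}

resource "aws_dx_gateway_association" "transit" {
  dx_gateway_id         = aws_dx_gateway.this.id
  associated_gateway_id = aws_vpn_gateway.transit.id
  allowed_prefixes      = [var.transit_vpc_cidr] # advertise the whole VPC CIDR
}

# Let routes learned over BGP appear in the transit route table.
resource "aws_vpn_gateway_route_propagation" "transit" {
  vpn_gateway_id = aws_vpn_gateway.transit.id
  route_table_id = aws_route_table.transit.id
}
```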
Now, if you go to your AWS console, under Direct Connect you should see something like this:
Peering between main and transit VPCs
The last issue to solve is to configure connectivity between our services and transit VPC in order to establish access to the network.
To accomplish this, we decided to use VPC peering. Here we will need some of the local values that we imported before:
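A sketch of the peering, assuming both VPCs live in the same account and region (so `auto_accept` can be used), locals named `local.main_vpc_id`, `local.main_vpc_cidr`, and `local.main_private_route_table_id`, and transit resources named `aws_vpc.transit` and `aws_route_table.transit`. All of these names are hypothetical:

```hcl
resource "aws_vpc_peering_connection" "main_to_transit" {
  vpc_id      = local.main_vpc_id
  peer_vpc_id = aws_vpc.transit.id
  auto_accept = true # only valid for same-account, same-region peering

  tags = {
    Name = "main-to-transit"
  }
}

# Main VPC -> transit VPC
resource "aws_route" "main_to_transit" {
  route_table_id            = local.main_private_route_table_id
  destination_cidr_block    = aws_vpc.transit.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.main_to_transit.id
}

# Transit VPC -> main VPC (return traffic)
resource "aws_route" "transit_to_main" {
  route_table_id            = aws_route_table.transit.id
  destination_cidr_block    = local.main_vpc_cidr
  vpc_peering_connection_id = aws_vpc_peering_connection.main_to_transit.id
}
```

Note that VPC peering requires the two CIDR ranges not to overlap, which holds here because the transit range was assigned separately.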
Next we need to allow inbound HTTP traffic from the main VPC. That configuration can be done like this:
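For instance, the rule could be sketched as below. It assumes a `local.main_vpc_cidr` value and a transit VPC declared as `aws_vpc.transit`; the resource names are hypothetical, and the rules are kept as separate `aws_security_group_rule` resources so more can be attached later:

```hcl
resource "aws_security_group" "transit" {
  name   = "transit-vpc-SG"
  vpc_id = aws_vpc.transit.id
}

# Allow HTTP from anything inside the main VPC.
resource "aws_security_group_rule" "http_from_main" {
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = [local.main_vpc_cidr]
  security_group_id = aws_security_group.transit.id
}

# Allow all outbound traffic, e.g. towards the private network.
resource "aws_security_group_rule" "all_egress" {
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.transit.id
}
```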
Great. Now we have two VPCs that are peered and coexist together.
Do you use OpenVPN (optional)?
In our case, we have an OpenVPN server to manage access (SSH) to the main VPC's internal resources. And we wanted to access the transit VPC's resources in the same way. In order to do that we needed to create a few additional resources within the transit VPC:
And then add an ingress rule to transit-vpc-SG that was created in the previous step:
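A sketch of such an ingress rule, assuming the security group is available as `aws_security_group.transit` and the OpenVPN server's private IP is passed in through a hypothetical `var.openvpn_private_ip` variable:

```hcl
variable "openvpn_private_ip" {
  description = "Private IP of the OpenVPN server in the main VPC"
  type        = string
}

# Allow SSH only from the OpenVPN server.
resource "aws_security_group_rule" "ssh_from_openvpn" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["${var.openvpn_private_ip}/32"]
  security_group_id = aws_security_group.transit.id
}
```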
To make all of this work you need to specify the transit VPC's CIDR along with the main VPC's CIDR in the OpenVPN server routing setting under the VPN Setting section:
Router Service
So now we are almost there. The last thing to do is to design and configure how our services within the main VPC will be able to programmatically access the network.
To recap, the main reason why we've done all of that is that we need to be able to access other services in the network (for example request or submit data). We found two possible ways to achieve that here:
- Migrate required services to the transit VPC and use them there, assigned with new private IPs. Main VPC internal routing should be adjusted. On top of that, any access to DB servers, logs' storage, and so on should be managed as well.
- Create a router service (running HAProxy or NGINX) within the transit VPC. Add the router's private IP to the hosts file in each service in the main VPC that wants to access the network, so that the required domain names resolve to that IP.
We chose the second option as it seemed to be the most aligned with the open-closed principle. Here is roughly how it looks:
Let's configure it in Terraform:
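A minimal sketch of the router instance. The AMI variable, instance type, and the resource names `aws_subnet.transit_private` and `aws_security_group.transit` are assumptions, and `router_init.sh` is expected to sit next to the configuration:

```hcl
variable "router_ami_id" {
  description = "AMI for the router instance (for example, a Linux image with Docker available)"
  type        = string
}

resource "aws_instance" "router" {
  ami                    = var.router_ami_id
  instance_type          = "t3.micro" # example size
  subnet_id              = aws_subnet.transit_private.id
  vpc_security_group_ids = [aws_security_group.transit.id]

  # Bootstrap script that configures and launches the proxy.
  user_data = file("${path.module}/router_init.sh")

  tags = {
    Name = "transit-router"
  }
}

# Private IP to put into the hosts file of services in the main VPC.
output "router_private_ip" {
  value = aws_instance.router.private_ip
}
```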
router_init.sh contains a script to configure and launch the HAProxy service in a container. For illustration purposes, let's assume that we want to access two internal domain names in the network:
The last step is to check that our domains were added to the hosts file on the instances in the main VPC and start making requests over HTTP.
Closing thoughts
In this article, I showed you how to integrate Direct Connect into your existing AWS infrastructure. I also talked about how you can efficiently manage it using Terraform.
Then I discussed what approach would be appropriate for a network routing configuration that would make the solution transparent and easy to maintain as much as possible.
Transit VPC, which is recommended by AWS to solve such challenges, was indeed easy to configure. And the approach we tried, with a router service within the transit VPC to access the private network, proved to work. But it didn't seem to be any better than other alternatives.
Lastly, I introduced Terraform code snippets that will hopefully be useful for anyone who wants to do something similar.
I hope you enjoyed this article and found it helpful!