One of the most common questions I get as a Technical Account Manager (TAM) from clients considering a migration from their on-premise data center to AWS is “what are the biggest differences I will need to address?” In an earlier post, MySQL to the Cloud! Thoughts on Migrating Away from On-Premise, I discussed some best practices for actually performing the migration. The goal of this post is to address some of the most common infrastructure and architectural differences I have encountered while helping clients make the transition into AWS. For the sake of this post, let’s assume you are going to spin up EC2 instances and manage your own databases.
When comparing AWS EC2 to a traditional on-prem deployment, the flexibility and variety of instance types can be overwhelming. In a traditional on-premise deployment, you provision hardware based on your anticipated traffic and growth over a set time period. This often results in over-allocating hardware to the database tier: when the workload doesn’t match projections or growth is slower than expected, you can be left with a lot of idle capacity.
As a DBA, while I would hope my projections are always accurate, I’d much rather over-provision my hardware than limit performance through inadequate processing power. I’d also like the ability to tailor my hardware to the workload as needed, rather than acquire several racks of identical machines and force the usage patterns onto that hardware.
With a wide variety of instance types (compute-optimized, memory-optimized, etc.) available to target different workloads, you are much more likely to properly provision your servers. You also have the ability to start with smaller instances and scale up the hardware as your workload actually increases (or scale down if you don’t hit your projections).
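To illustrate how lightweight that scaling can be, resizing an instance is just a stop/modify/start cycle against the EC2 API. This is a minimal sketch using boto3; the instance ID, target type, and region are hypothetical placeholders, and the API calls are wrapped in a function so nothing touches your account until you invoke it:

```python
def resize_params(instance_id, new_type):
    # Build the arguments for ec2.modify_instance_attribute()
    return {"InstanceId": instance_id, "InstanceType": {"Value": new_type}}

def resize_instance(instance_id, new_type, region="us-east-1"):
    """Stop an instance, change its type, and start it again (sketch)."""
    import boto3  # lazy import: resize_params() needs no AWS credentials
    ec2 = boto3.client("ec2", region_name=region)
    # The instance type can only be changed while the instance is stopped
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(**resize_params(instance_id, new_type))
    ec2.start_instances(InstanceIds=[instance_id])
```

Keep in mind the stop/start is a brief outage for that node, which is one more reason to keep a replica you can promote while resizing.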
Depending on how you currently manage networking in your on-premise deployment, there are likely some significant differences when you migrate to AWS. Features such as floating IPs (managed with commands like ip, or through a tool like keepalived) that are fairly trivial when you fully control the network do not map 1:1 in AWS.
While you do have some control over your network when using a VPC, IP addresses are attached differently. If you want to attach a specific, movable IP address to an instance, you need to associate an Elastic IP (which is public). Moving that address requires the AWS CLI or an SDK and references servers by instance ID, which is a very different approach from issuing gratuitous ARP on a network you own.
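As a sketch of what a keepalived-style failover becomes in AWS, the snippet below re-points an Elastic IP at a new instance with boto3. The IDs are hypothetical, and the actual API call is isolated in a function so the validation helper runs without credentials:

```python
import re

def is_allocation_id(value):
    # VPC Elastic IP allocations are identified by "eipalloc-..." IDs
    return bool(re.fullmatch(r"eipalloc-[0-9a-f]+", value))

def failover_eip(allocation_id, new_instance_id):
    """Re-point an Elastic IP at another instance: the AWS analogue
    of moving a floating IP between hosts."""
    import boto3  # lazy import so is_allocation_id() runs without AWS credentials
    ec2 = boto3.client("ec2")
    # AllowReassociation lets the address move even if it is currently
    # attached to the old (possibly failed) instance
    resp = ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=new_instance_id,
        AllowReassociation=True,
    )
    return resp["AssociationId"]
```

Whatever previously triggered the VIP move (a health check script, for example) would call something like failover_eip() instead of manipulating interfaces directly.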
Along the same lines, there are differences in the I/O options as well. In traditional on-premise deployments, you typically have local storage for your active data directory and often some type of network storage (SAN, NFS, etc.) for things like backups. You are generally locked into a single storage class across your fleet and standardize on a general-purpose configuration.
However, in AWS, you have numerous options for managing your database tier’s I/O layer. The primary options include:

- Ephemeral (instance store) volumes, local to the physical host
- EBS magnetic volumes
- EBS general purpose (SSD) volumes
- EBS provisioned IOPS (SSD) volumes
When using ephemeral storage, your data won’t persist after the instance is stopped. However, in some designs (such as synchronous replication with Percona XtraDB Cluster), this isn’t a huge problem, as durability comes from the cluster rather than from the local storage.
When using EBS volumes, you can often achieve the performance you need while gaining extra flexibility. EBS volumes allow you to choose the storage tier (magnetic vs. SSD) and even provision the IOPS capacity you need. If you have an I/O-bound workload, this flexibility lets you tailor instances to your specific needs. EBS volumes can also be expanded dynamically, so you don’t have to over-provision capacity up front in the hope that you’ll eventually grow into it. Instead, you can size the storage tier to what you need now and grow as your storage requirements increase.
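To make the sizing math concrete: a gp2 (general purpose SSD) volume earns a baseline of 3 IOPS per GiB, with a floor of 100 IOPS and a cap of 16,000 at the time of writing, so sizing the volume effectively sizes the I/O. The sketch below models that baseline and shows how a volume could be grown in place with boto3; the volume ID is a hypothetical placeholder:

```python
def gp2_baseline_iops(size_gib):
    # gp2 earns 3 IOPS per GiB, with a 100 IOPS floor and a 16,000 IOPS cap
    return min(max(3 * size_gib, 100), 16_000)

def grow_volume(volume_id, new_size_gib):
    """Expand an EBS volume in place (sketch, hypothetical volume_id)."""
    import boto3  # lazy import: the sizing helper above needs no AWS credentials
    ec2 = boto3.client("ec2")
    ec2.modify_volume(VolumeId=volume_id, Size=new_size_gib)
    # Once the volume enters the "optimizing" state, grow the filesystem
    # (e.g. resize2fs or xfs_growfs) from inside the instance.

# A 500 GiB gp2 volume gets a 1,500 IOPS baseline
print(gp2_baseline_iops(500))
```

If your workload needs guaranteed IOPS beyond what the general purpose baseline provides, that is where the provisioned IOPS (io1) tier comes in.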
In part two of this post, I’ll cover the biggest differences I’ve encountered from an architectural point of view. Check back soon!