EC2 is one of the fundamental AWS services. It has gone down a long, bumpy path and differs greatly from its archetype. Curious about the architectural changes that have made EC2 the foundation of most environments launched in AWS? Let's dive in.
First 5m below sea level – PV
Generally, it’s believed that AWS started with the Xen virtualization type. They began with PV (paravirtualization), which can be considered a lighter form of virtualization. Its key selling point is near-native speed in comparison to full virtualization. However, the guest system needs to be modified to make it aware of the hypervisor and to let it make efficient hypercalls. In other words, these modifications allow the hypervisor to export a modified version of the underlying hardware to an instance. This is considered a drawback: imagine a situation where you’d like to recover or rebuild an EC2 instance in another AWS region. You’d need to find a matching kernel – which might be time-consuming and laborious.
Another thing worth keeping in mind is that in such a scenario the kernel makes hypercalls instead of the well-known privileged instructions, which in turn leads to significant overhead. Additionally, the storage and network drivers used by the system are paravirtualized.
10m below sea level – HVM
With PV on board, AWS started using another hypervisor configuration, called HVM (hardware-assisted virtualization). In this scenario, the guest system runs as if it were placed on a bare-metal platform, and it’s not aware that it’s sharing the underlying hardware with other guests on the same host.
At the end of the day, HVM can use hardware extensions that provide fast access to the underlying hardware without any guest modifications. If you need to take advantage of enhanced networking or GPU processing, HVM is your best choice.
12,5m below sea level – PVHVM
PVHVM brought another improvement. As you’ve probably noticed, paravirtual guest systems tend to perform better at network and storage operations than their younger sibling, HVM. The higher performance is achieved by using special drivers for I/O instead of emulating network and disk hardware (as in the case of HVM), which significantly reduces the overhead. With the PVHVM release, PV drivers became available for HVM guests. These drivers are also called PV-on-HVM – paravirtualized drivers that take advantage of HVM features.
After this change I was a little confused: AWS presented instances that could run as PV or HVM, but as you dig deeper you come to realize that both options were available: an HVM instance could boot and then run PV drivers, or alternatively just run PVHVM (paravirt-on-HVM) drivers. The conclusion, quite simply, is that the instances labelled HVM in AWS are HVM with PV drivers on board.
The key argument – that HVM is slower than PV because it doesn’t use as much paravirtualization – became untrue, and some companies using AWS at large scale began to see benefits in migrating their workloads from PV to PVHVM, mainly workloads relying on CPU and memory.
15m below sea level – towards Enhanced Networking
In 2013, after the journey through hypervisors, AWS started introducing instances with hardware virtualization support for network interfaces, called SR-IOV (Single Root I/O Virtualization). In the SR-IOV model, each NIC exposes a physical function (PF) – a full-featured PCIe function – together with multiple virtual functions (VFs), lightweight PCIe functions that share the PF’s resources. VFs were designed solely to move data in and out. The main outcome was a reduced I/O bottleneck and, of course, an increase in the number of virtual machines a physical server can realistically support. In contrast to a Xen virtualized driver, an SR-IOV driver running on a cloud instance can perform DMA (Direct Memory Access) to the NIC hardware to achieve better performance. What’s really important here is that DMA from the device to virtual machine memory does not compromise the safety of the underlying hardware.
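To make the PF/VF relationship more concrete, here is a minimal sketch of the idea – a toy model, not an AWS or kernel API, with all names invented: one full-featured physical function carves out lightweight virtual functions that borrow from its shared pool of hardware resources and only move data in and out.

```python
class PhysicalFunction:
    """Toy SR-IOV PF: a full-featured PCIe function that can carve out
    lightweight VFs which share its hardware resources."""
    def __init__(self, total_queues):
        self.free_queues = total_queues   # shared resource pool
        self.vfs = []

    def create_vf(self, queues):
        # Each VF borrows queues from the PF's shared pool
        if queues > self.free_queues:
            raise RuntimeError("PF out of resources")
        self.free_queues -= queues
        vf = VirtualFunction(vf_id=len(self.vfs), queues=queues)
        self.vfs.append(vf)
        return vf


class VirtualFunction:
    """Toy SR-IOV VF: only knows how to move data in and out."""
    def __init__(self, vf_id, queues):
        self.vf_id = vf_id
        self.queues = queues
        self.rx, self.tx = [], []

    def receive(self, frame):   # data in
        self.rx.append(frame)

    def send(self, frame):      # data out
        self.tx.append(frame)


pf = PhysicalFunction(total_queues=8)
vf0 = pf.create_vf(queues=2)   # e.g. attached to one guest instance
vf1 = pf.create_vf(queues=2)   # ...and to another
vf0.receive("frame-for-guest-0")
```

The point of the sketch is the resource accounting: the VFs are cheap because everything heavyweight lives in the PF they share.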
Initially AWS implemented the Intel 82599 with speeds up to 10 Gbps, and after the ENA (Elastic Network Adapter) driver announcement in 2016 they boosted it up to 25 Gbps, reduced latency and increased the packet rate. After the first release in 2014 it was a stunning breakthrough, but the world was still waiting for hardware virtualization for volumes.
AWS’s long-term goal for the ENA driver was to let customers take advantage of higher bandwidth options in the future without installing newer drivers or making other configuration changes. One of the features that enabled the performance improvement was Receive Side Scaling.
15,5m below sea level – Receive Side Scaling
To start from the beginning, we should explain two basic terms: hardware interrupt and softirq. A hardware interrupt is generally a signal from a device, sent to the CPU when the device needs to perform an input or output operation. In other words, the device ‘interrupts’ the CPU to draw its attention while the CPU is doing something else. Then we’ve got the softirq, which is similar to a hardware interrupt request but not as critical. So, when data arrives at the NIC and is copied to a buffer, an interrupt signal is generated for the CPU. Upon interruption, the processor’s interrupt service routine reads the Interrupt Status Register to determine what type of interrupt occurred and what action needs to be taken. After that, an acknowledgement is sent to the NIC with the message “Hey, I’m ready to serve.” Basically, the ‘interrupt’ work therefore combines two things:
• First, the CPU ACKs the NIC, saying “Hey, I’ve got it”, at which point the hardware interrupt is complete and the NIC returns to its previous job. What‘s worth mentioning is that the hardware interrupt handler needs to be quick, so the system isn’t held up by prolonged interruptions.
• Second, the packet is placed on the CPU’s backlog queue as a softirq, so whenever the CPU gets a chance, it starts processing the packet and moving it up the TCP/IP stack.
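The two-stage handling above can be sketched like this – a toy model, not kernel code, with all names invented: the hardware interrupt handler does the bare minimum (queue and ACK), and the deferred softirq pass drains the backlog later.

```python
from collections import deque


class NIC:
    """Toy NIC: only counts the ACKs it receives from the CPU."""
    def __init__(self):
        self.acks = 0

    def ack(self):
        self.acks += 1          # "Hey, I've got it" -- NIC resumes its work


class CPU:
    """Toy model of the two-stage interrupt handling described above."""
    def __init__(self):
        self.backlog = deque()  # per-CPU backlog queue
        self.delivered = []     # packets handed up the TCP/IP stack

    def hard_irq(self, nic, packet):
        # Top half: must be quick -- just queue the packet and ACK the NIC
        self.backlog.append(packet)
        nic.ack()

    def run_softirq(self):
        # Bottom half: drain the backlog whenever the CPU gets a chance
        while self.backlog:
            self.delivered.append(self.backlog.popleft())


nic, cpu = NIC(), CPU()
for pkt in ["p1", "p2", "p3"]:
    cpu.hard_irq(nic, pkt)      # fast: nothing heavy happens here
cpu.run_softirq()               # deferred processing up the stack
```

Note how nothing expensive happens inside `hard_irq` – that is exactly the “needs to be quick” property from the first bullet.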
Receive Side Scaling is a hardware feature of the NIC that provides multiple rx queues, so the softirq workload can be distributed among several CPUs – which in effect prevents network traffic from being bottlenecked on a single NIC hardware queue. With a mono-queue NIC, the hardware interrupt is generated from a single queue, and the same CPU is also responsible for processing the softirq.
Bottleneck scenarios also happened with RPS (Receive Packet Steering – a software implementation done for mono-queue NICs) enabled, where incoming packets are hashed and the load is distributed across multiple CPUs, but the hardware interrupts still land on a single CPU. Therefore, EC2 instances equipped with ENAs launched a new era of greater performance, achieved via improvements such as the described RSS.
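The queue-selection idea behind RSS (and its software cousin RPS) can be sketched as hashing a flow’s 4-tuple and picking a queue – and hence a CPU – from the result. Real NICs use a keyed Toeplitz hash; plain `crc32` stands in for it here, and all names are illustrative:

```python
import zlib

NUM_RX_QUEUES = 4  # e.g. one rx queue per CPU


def rx_queue_for(src_ip, src_port, dst_ip, dst_port):
    """Pick an rx queue by hashing the flow's 4-tuple.
    Hardware RSS uses a keyed Toeplitz hash; crc32 is a stand-in."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_RX_QUEUES


# Every packet of a given flow hashes to the same queue, so one flow is
# handled by one CPU (preserving packet order) while distinct flows can
# spread across the other queues/CPUs.
flows = [("10.0.0.1", 40000 + i, "10.0.0.2", 443) for i in range(16)]
assignment = {flow: rx_queue_for(*flow) for flow in flows}
```

The per-flow determinism is the important property: a TCP connection never bounces between CPUs, while the aggregate load fans out.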
20m below sea level – deeper EBS optimization
As I’ve mentioned before, ENA brought enhancements on the EC2 networking side, but the world expected new solutions for handling volumes. For this reason, Amazon Web Services launched EBS-optimized instances, starting with i3 instances, which gave users a dedicated link from an instance to the EBS service, instead of one shared with other AWS traffic. They used SR-IOV and the NVMe (Non-Volatile Memory Express) storage driver.
And what is NVMe? NVMe is an interface protocol for accessing flash storage over a PCIe bus. Unlike traditional all-flash architectures, which are limited to a single queue, NVMe supports tens of thousands of parallel queues, each with the ability to support tens of thousands of concurrent commands.
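A rough sketch of that queue model – many independent submission/completion queue pairs, e.g. one per CPU, instead of one global queue. This is a toy illustration with invented names, not the real driver interface:

```python
from collections import deque


class NvmeQueuePair:
    """Toy model of one NVMe submission queue (SQ) + completion queue (CQ)
    pair; a real controller supports up to 64K such pairs."""
    def __init__(self):
        self.sq, self.cq = deque(), deque()

    def submit(self, command):
        self.sq.append(command)          # host places a command on the SQ

    def process(self):
        # Device side: consume SQ entries, post completions to the CQ
        while self.sq:
            cmd = self.sq.popleft()
            self.cq.append(("done", cmd))


# One queue pair per CPU -- cores submit in parallel, no shared lock
queue_pairs = {cpu: NvmeQueuePair() for cpu in range(4)}
queue_pairs[0].submit(("read", "lba=0", "len=8"))
queue_pairs[1].submit(("write", "lba=64", "len=8"))
for qp in queue_pairs.values():
    qp.process()
```

The parallelism comes from the fact that each CPU owns its own pair, so submissions never contend on a single queue.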
With the Xen hypervisor underneath, dom0 is involved in the I/O path. Dom0, as the management VM, uses the NVMe driver to access the EBS volume, whereas on an active EC2 instance the Xen paravirtual split driver model for block devices is used to handle I/O. Dom0 takes the request over the shared ring from the EC2 instance, translates it into an NVMe request, and after completion sends the response back to the instance.
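The dom0 translation step described above can be sketched as follows – a toy model of the shared ring between the guest’s frontend and dom0’s backend, with all names and fields invented for illustration:

```python
from collections import deque

shared_ring = deque()   # ring shared between the guest (domU) and dom0


def guest_submit(ring, blkif_request):
    """Frontend (in the EC2 instance): push a paravirtual block request."""
    ring.append(blkif_request)


def dom0_backend(ring):
    """Backend (in dom0): pop blkif requests from the shared ring, rewrite
    each one as an NVMe command for the EBS volume, and return responses."""
    responses = []
    while ring:
        req = ring.popleft()
        nvme_cmd = {"opcode": req["op"],       # translated request
                    "slba": req["sector"],
                    "nlb": req["count"]}
        responses.append({"id": req["id"],     # completion back to guest
                          "status": "OK",
                          "nvme": nvme_cmd})
    return responses


guest_submit(shared_ring, {"id": 1, "op": "read", "sector": 0, "count": 8})
replies = dom0_backend(shared_ring)
```

This hop through dom0 is precisely the per-request work that the Nitro card later removed from the I/O path.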
The main goal of the Nitro card release was to provide performance close to bare metal.
22,5m below sea level – local NVMe storage Nitro
In 2017 AWS introduced local NVMe Nitro cards, which protect the flash storage from unauthorized traffic coming from the local EC2 instance. The key required to retrieve data from a flash device is stored only on the Nitro card, never on the flash itself. Therefore, whenever an instance was terminated, the key was destroyed and the data written on the flash device became irretrievable.
25m below sea level – 25Gbps Enhanced Networking
With the announcement of the elimination of the Intel chip, AWS moved into a new era of high-performance, scalable architecture. Only the Nitro card remained, where all the processing required for communication with EBS volumes, as well as the cryptographic work, took place. It brought a lot of benefits, among them more available resources on EC2 instances – around 12,4% more for the largest c4 instances. This was all possible because there was no longer a need for dedicated cores in dom0 to take requests over the shared ring from EC2 instances. A good example is the c5 instance, which benefits from the Nitro card not only in terms of networking but also in EBS connectivity. It uses a combination of EBS NVMe (custom silicon cards by Annapurna Labs) and ENA for enhanced networking, and eliminates the need for the dom0 management VM.
AWS mentioned that eventually most, or probably all, instances were going to use the Nitro hypervisor.
50m below sea level – Bare Metal!
The most recent instances, announced by AWS during its last re:Invent conference, are bare metal (i3.metal), adding no virtualization overhead. They’re a great choice for applications that need to run in non-virtualized environments, for example because of licensing requirements.
As you can see, Amazon has come a long way to get where it is now, providing customers with a wide variety of instances that meet different kinds of requirements. Time will tell what their next step will be, but I’m pretty sure they haven’t said their last word in this area.