Load Distribution Strategies
Load Patterns - Resource Contention
Let's consider a single data center VM, which is an EC2 instance that we have provisioned on AWS. What are the limitations on the physical resources for this particular instance? It has limited:
- Compute Capacity - The instance has a specified capacity of instruction execution. More complex requests require more compute cycles to fulfill, which means that as the volume or complexity of requests increases, the total number of requests the instance can serve goes down.
- Memory Capacity - The instance has a specified capacity of physical memory. Requests that use more memory may prevent other requests or tasks from running. Again, as the number of requests goes up, or if certain requests make higher demands on memory, the throughput of the application will most likely go down.
- Disk I/O - The instance has a cap on the number of reads and writes that can be performed per second. As with CPU and memory, the more disk reads and writes a request requires, the more the instance's ability to service other requests diminishes.
- Network I/O - The instance also has a limit on how much network traffic can pass through it. Larger request/response payloads increase resource contention and slow down traffic for all requests passing through the instance.
In practice, any request causes contention on all of the resources mentioned above. What happens when a resource is overwhelmed? Suppose your service consumes all of the available memory while the OS still needs memory to keep the instance alive. The resulting Out of Memory (OOM) condition leaves your system unable to allocate more memory, and the kernel usually kills the currently running task. The outcome is typically reduced QoS and/or dropped requests.
So how do we address resource contention? So far, the answer has been to provision bigger or more instances. But that costs more money, and even then there are always limits on capacity. We need to make sure that we distribute the load evenly so that all of our data center VM instances are equally utilized.
Round Robin
Using Round-Robin, each request is distributed to the data center instances in circular order, handling all requests without priority and without starvation.
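A minimal sketch of this circular routing, assuming a hypothetical pool of three VM names:

```python
from itertools import cycle

# Hypothetical pool of data center VM instances.
servers = ["vm-1", "vm-2", "vm-3"]

# cycle() yields the servers in circular order indefinitely.
rr = cycle(servers)

def route(request):
    """Assign the request to the next server in circular order."""
    return next(rr)

assignments = [route(f"req-{i}") for i in range(7)]
print(assignments)
# ['vm-1', 'vm-2', 'vm-3', 'vm-1', 'vm-2', 'vm-3', 'vm-1']
```

Note that the balancer needs no information about the servers at all, which is exactly why this strategy is so cheap to implement.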
Why is Round Robin a good load distribution strategy? This approach is one of the simplest approaches to implement.
When could Round-Robin become ineffective? An important drawback of this approach is that resource-heavy client requests may cluster on a subset of the data center VMs, causing a few nodes to be overwhelmed while others remain underutilized. This compromises both cost and QoS.
Will Round-Robin always struggle with heterogeneous loads? The short answer is no: it depends on the number of clients making requests and the number of data center instances serving them. For a sufficiently large system and client pool, the probability of overwhelming a single data center VM is low. However, it is not impossible, and it depends on the service and the expected load profile.
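The intuition above can be checked with a toy simulation. In this sketch the request "costs" are drawn from an exponential distribution purely as a stand-in for heterogeneous workloads; with many requests per server, the per-server totals end up close together:

```python
import random

random.seed(42)
NUM_SERVERS = 4
NUM_REQUESTS = 10_000

# Each request gets a random cost (e.g. compute cycles consumed);
# the exponential draw models a mix of light and heavy requests.
costs = [random.expovariate(1.0) for _ in range(NUM_REQUESTS)]

# Round-Robin: request i goes to server i mod NUM_SERVERS.
load = [0.0] * NUM_SERVERS
for i, cost in enumerate(costs):
    load[i % NUM_SERVERS] += cost

# Relative spread between the most- and least-loaded servers.
mean = sum(load) / NUM_SERVERS
spread = (max(load) - min(load)) / mean
print(f"relative spread across servers: {spread:.2%}")
```

With 2,500 requests per server, the spread comes out at only a few percent; with a handful of very heavy clients or very few requests, it can be far worse.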
Intelligent Load Distribution Strategies
In light of what we discussed above, can we do better? The answer is yes, but at a cost. Instead of simply routing requests in circular order (or, even simpler, at random), we can first try to establish which data center VM is best placed to service a particular request, and then send the request to that resource. How can we identify the best data center VM?
We need to:
- Observe the traffic to establish what type of resource contention we are dealing with for a request.
- Keep track of the resource utilization of each data center VM.
- Decide, based on the utilization, where to place the next incoming request, so that the QoS of each request is uniform.
Can you think of a good strategy that will help achieve all of the above (some of these are described below)? Can you think of a good data structure to help you keep track of resource utilization of data center VMs and then decide which one to choose for the next request?
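One natural answer to the data structure question is a min-heap keyed on utilization: the least-loaded VM is always at the root. A minimal sketch, assuming hypothetical polled utilization readings and a rough per-request cost estimate:

```python
import heapq

# Hypothetical utilization readings (0.0-1.0) polled from each VM.
utilization = {"vm-1": 0.62, "vm-2": 0.35, "vm-3": 0.48}

# A min-heap of (utilization, vm) pairs gives O(log n) access to
# the least-loaded VM.
heap = [(u, vm) for vm, u in utilization.items()]
heapq.heapify(heap)

def place_request(estimated_cost):
    """Send the request to the least-loaded VM and update its entry."""
    load, vm = heapq.heappop(heap)
    heapq.heappush(heap, (load + estimated_cost, vm))
    return vm

# The first request goes to vm-2, the least-loaded node.
first = place_request(0.05)
print(first)  # vm-2
```

In a real system the heap entries would be refreshed from monitoring data rather than from local estimates, since the balancer's view of utilization drifts between polls.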
Is this approach strictly better than Round-Robin? No. As you can imagine, implementing all of the steps above requires more resources and more computation, and hence introduces additional latency into the system. Frequent polling of the data center VMs may also add overhead and resource contention on the VMs themselves. It is important to note that this overhead is worthwhile only if it improves QoS and cost in a significant way.
Here are a couple of intelligent load distribution strategies.
Request execution time based strategies
These strategies use a priority scheduling algorithm in which predicted request execution times determine the most appropriate order of load distribution. The main challenge with this approach is predicting the execution time of a particular request.
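The ordering step can be sketched with a priority queue. The request names and predicted times below are hypothetical stand-ins for estimates a real system might learn from historical latency data:

```python
import heapq

# Hypothetical predicted execution times (seconds) per request,
# e.g. derived from historical latency for each endpoint.
predicted = {"GET /health": 0.01, "POST /report": 2.5, "GET /user": 0.1}

queue = []
for arrival, (req, t) in enumerate(predicted.items()):
    # (predicted_time, arrival_order, request): shorter jobs are
    # dispatched first; arrival order breaks ties between equals.
    heapq.heappush(queue, (t, arrival, req))

order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(order)  # ['GET /health', 'GET /user', 'POST /report']
```

Note that pure shortest-job-first ordering like this can starve long requests under sustained load, so practical schedulers usually add aging or a fairness bound.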
Resource utilization based strategies
These strategies use the resource utilization on each data center VM to balance the utilization across all data center VMs. The load balancer maintains an ordered list of data center VMs based on their utilization and thus directs requests to the least loaded node.
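The ordered-list bookkeeping described above can be sketched as follows; the VM names, utilization figures, and per-request cost estimate are hypothetical:

```python
import bisect

# Hypothetical (utilization, vm) pairs, kept sorted by utilization.
vms = sorted([(0.62, "vm-1"), (0.35, "vm-2"), (0.48, "vm-3")])

def route(estimated_cost):
    """Pop the least-loaded VM, charge it, and re-insert in order."""
    load, vm = vms.pop(0)
    bisect.insort(vms, (load + estimated_cost, vm))
    return vm

# Each request goes to whichever VM is least loaded at that moment.
first = route(0.20)
second = route(0.20)
print(first, second)  # vm-2 vm-3
```

Re-inserting with `bisect.insort` keeps the list ordered as estimates change; as with the heap-based variant, real deployments would correct these local estimates with fresh utilization data polled from the VMs.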