The Resource based algorithms described in this chapter seek to maintain the process mix hypothesis, which is:
Process Mix: The distribution of a balanced mix of CPU, memory and IO bound processes to each host will lead to the more effective use of resources and delay the onset of bottlenecking.
Difficulties arise from restrictions imposed by both the real time constraints of a computer system, and the implementation practicalities of a load distribution mechanism. Either of these restrictions can degrade or obscure the effect of load distribution, and make the algorithm useless for evaluating the hypothesis.
To truly maintain the process mix of a system, would require an instantaneous and free redistribution of all processes in the system. The problem can be simplified to one of reducing the imbalance in demand for individual resources within a host and between the hosts, which is a much more realistic proposition.
This approximation is equivalent to the process mix hypothesis, under the restriction that only one process or job may be distributed at any one time. This equivalence arises in the following way:
The remainder of this section is devoted to describing the implementation details of the resource based algorithms. The common metric and fitting techniques are developed in sections 8.3.1 to 8.3.2 and the resulting initial placement and process migration algorithms are presented in sections 8.3.3 and 8.3.4 respectively.
The n resources a host can provide to service processes can be considered to form an n dimensional space. In particular, if a host provides CPU, file IO and memory, then these form a 3 dimensional space, in which a single point locates the current state of the host.
If the values of resource use are constrained to the range of
, then a completely unloaded host would
locate at the origin <0,0,0> and a completely loaded host would
be located on the opposite corner at <1,1,1>. This space is
illustrated in figure
Figure 8.3: The 3 dimensional space used to describe current host state,
and the three points indicate potential host loads.
Examination of this space identifies two separate components of the load on a host:
To enable hosts to be ranked for placement suitability, a single
metric is required to combine these two load components. One metric
is to measure the Euclidean distance between the point and the origin,
or in slightly different terms, the norm (length) of the vector
from the origin to the point. This is shown graphically with two
resource dimensions in Figure
, is ranked equal to a
less balanced, yet more lightly loaded
host a.
This tradeoff between balance and load works due the property that a vector on the diagonal always encloses a greater volume than one of equal length offset to either side.
Figure 8.4: The different volumes enclosed by two vectors with the
same norm (length).
The area (load) alone is not suitable for ranking hosts a and b, as this ranking would always prefer a to b, and therefore does not account for the fact that resource 1 is closer to saturation. The variance of the host resources is also not sufficient on its own, as it provides no means of detecting an overloaded host (just an unbalanced one).
Thus if two hosts are to be compared, the norm provides a simple means of ranking them when both balance and load are significant.
Section 8.3.1 detailed how a 3 dimensional vector can be used to describe the current state of the host, and that the norm of this vector can be used to rank host loads. Jobs can also be described and ranked using these same resource dimensions, and if they are expressed in the same units as the host, the job can be added to and subtracted from the host vector. The unit that most naturally expresses both the resource availability on a host, and resource demand from a process is utilisation. This enables jobs to be trial fitted to hosts in order to select the one that is most suitable.
There are three standard methods of fitting resource requests to resources available that can be adapted from the domain of physical memory allocation.
There are a number of differences in the application of the physical memory fitting algorithms when used for allocating host resources. Firstly, physical memory is a finite resource and cannot be over allocated, whereas host resource limits are softer and can withstand a degree of overallocation yet still satisfy the request. Secondly, the ideal situation is to completely allocate a segment of physical memory leaving other segments completely free (ignoring fragmentation), this is opposite to the desired effect of host resource allocation, where it is desirable to have all hosts with the same degree of allocated resources (in other words, as condensed versus as spread, as possible).
At this point it is useful to describe the fitting methods in more detail before discussing the actual load distribution algorithms.
First fit consists of two functions, the selection and the fitting. These two phases are often intertwined as in the fastest response of V [TLC85], described in section 2.3.5. The essence of this system is that the first host to reply to a multicast request, by its very reply implies that it has sufficient resources to service the request. The first fit implementation described in this section has a more explicit fitting phase.
For a resource based algorithm, first fit can be implemented by adding the components of the job and host vectors, and comparing the results to the maximum value defined for each component. If none of the maximums are exceeded, the job is considered to fit, and is executed on that host.
The best fit from Cena et al. [CCG95] detailed in section 4.2.5, attempted a best fit based on specified job resource requirements. The goal of this scheme is to ensure the highest degree of remaining allocable resources in a heterogeneous system. It did not attempt to mix the processes on a host to ensure complementary behaviour.
Best fit requires that the resource request is most closely satisfied, which has the direct effect of loading up already loaded hosts, in order to leave the larger `more allocable' blocks free. This is contrary to the basic aim of load distribution, and for this reason alone, best fit is not pursued any further in this thesis.
Worst fit operates in the opposite way to best fit. Instead of
choosing the closest fit, worst fit selects the host that results
in the largest leftover. This is implemented by selecting the
host for which the addition of the job results in the smallest
norm as shown graphically by figure
Figure 8.5: In this 2D space formed by resources 1 and 2, there are two hosts M1 and M2, of equal load. M1 is heavily utilising resource 1 and host 2 is heavily utilising resource 2. Job j, which is resource 1 bound, is trial added to both hosts. The resulting length of L2 is less than L1, and therefore the job is allocated to M2.
This fitting technique is used to approximate the maintenance of the process mix on each host.
From this discussion on resource fitting it is clear that there are two fitting techniques suitable for load distribution, first and worst fit. Finding the worst fit approximates the maintenance of a balanced process mix, and is a reasonable approach for both initial placement and process migration. In contrast, first fit does not maintain the process mix nor is it a natural location policy for process migration, and as such, the only first fit algorithm implemented uses initial placement.
As the fitting techniques are borrowed from physical memory allocation, the names are somewhat unsuitable. I will however continue to refer to each technique by its original name to avoid confusion.
Resource based initial placement needs the resource usage of a job in order to fit it to the most suitable host. Since initial placement requires that jobs are scheduled before execution, any resource requirements must be provided by estimates based on historical behaviour.
First fit initial placement (ff-ip) is a direct implementation of section 8.3.2.
The candidate selection (or to be more accurate, the candidate eligibility) is based on the availability of predicted job characteristics, where the expected lifetime of a job j must be greater than the cost of placing it:
where,
The location selects the first host that can meet the
resource requirements of an incoming job
:
where
is the maximum amount of resource R that the host cs
can supply.
The implementation of the first fit algorithm uses the periodic global state
information provided by the periodic updates. First fitting occurs in a
cyclic fashion starting at the host following the last successful fit.
The new load is estimated after each placement and this is described
on page
.
Thus ff-ip can be summarised by:
First fit uses the standard prediction mechanism to provide job resource requirements prior to execution.
Worst fit initial placement (wf-ip) seeks to locate the host i that results in
the shortest norm of the sum of the host,
, and job,
,
vectors.
The candidate selection policy is the same as that used by average and first fit, with:
Given the job is eligible, the fit of the job is computed for each host, and the destination host is the:
No specific favour is given to the local host, other than in the computation of the job's contribution to IO utilisation, where the utilisation estimate uses the cheaper local IO cost.
In summary, wf-ip consists of the following:
As an alternative to using a load threshold, the location policy could be extended to consider the difference between the local and selected host's norms. If they are within some closeness factor, say a percentage difference, then the local host should be selected.
The load distribution mechanism supplies periodic global state updates to all of the load distribution policies. This presents a problem for the two initial placement fitting algorithms, as no immediate feedback to a job placement is provided. Therefore fittings subsequent to the first in each period are made with data that is known to be out of date, and similar problems to those encountered by the least loaded initial placement algorithm may arise.
One advantage of using predicted resource requirements is that the same trial fitting technique that is used to select the destination host, can be used to provide an estimate of the load after job placement. Thus the result of the fitting computation is used to directly update the load value stored for the destination host.
No explicit action is taken to correct the load estimates, even when a short job (less than the remaining period) is scheduled, as I have assumed that the periodic state update is sufficient to correct for this.
Process migration presents greater choice in the implementation of resource balancing than the equivalent initial placement scheme. The primary reason being, that any process being executed in the system may be selected at any time, compared to initial placement, where only an arriving job can be considered for remote execution.
Therefore two process migration approaches are presented for worst fit, rather than only one as with initial placement.
Whilst on the surface these two approaches appear similar, the first chooses the best candidate to suit the source and destination hosts, while the second chooses the destination to best suit the candidate.
The `better' alternative will be determined experimentally in chapter 9.
As with hbd-mig, all of the resource based process migration algorithms are periodic. They require global host state information, and as a consequence their activity follows the global state update. There is at most one migration in the system per global state update, and therefore both approaches must be considered as load levelling rather than load balancing. As migration is initiated after each global state update, any migration decisions are made with the most recently available data, and there is no need to estimate the load between updates.
Both migration approaches share the same eligibility test in the candidate selection policy, which is incidentally the same as for ll-mig:
where,
This algorithm consists of two major parts, the location selection and the candidate selection:
This is a sender initiated policy, where on receiving the global
state update, the machine which is most loaded identifies itself,
,
and also identifies the least loaded host,
. The metric is the
based on the norm:
¯
A tie is resolved by selecting the host with the lowest id. With the source and destination locations decided, the next step is to identify the candidate.
This algorithm is based on the process mix hypothesis, so there is no sense in selecting the largest process from the source host, and moving it to the destination without considering the effect on the process mix of the destination.
This being the case, the ideal candidate is the one that would decrease the norm of the source host by the maximum amount and increase the norm of the destination host by the minimum amount. These contrary demands can be compromised by selecting the candidate process p with the minimum:
Fairest worst fit is summarised as:
This differs from the previous algorithm in that the candidate process is selected before the destination host, and therefore there are no conflicting requirements to meet in the selection of the candidate process.
This is a sender initiated policy, where on receiving the global state update, the machine which is most loaded identifies itself, with the following metric based on the norm:
Any ties are again resolved by selecting the host with the lowest id. With the source location decided, the next step is to identify the candidate.
There is more flexibility in the selection of candidates for this algorithm, as there is no destination host yet specified to restrict the choice of candidates. Thus there are four different candidate ranking formulae presented:
¯
The destination selection policy must now locate the host which
is most suitable for the candidate
. This is done in exactly the
same manner as wf-ip:
Highest contributing worst fit is summarised as: