next up previous contents
Next: Implicit Load Distribution via Up: Related Work Previous: Probabilistic Models

Explicit Load Distribution

 

The following sections all detail different approaches to using some form of prediction to improve the quality of load distribution. Workload classification is a common research topic [CF86, BB94, Raa93], however there has been less research on how it can be applied to load distribution, two examples are discussed in sections 4.2.2 and 4.2.3.

Average

 

Prediction need not be complex, and in some situations, quite simple schemes can prove useful. If all processes in a system are homogeneous, or at least similar, then a static constant can be used to estimate the resource requirements. The simplest dynamic solution is persistence, where the resources required by a process during the most recent execution are predicted for the next execution. A running average is another simple approach for providing predictions, and can be combined with persistence to favour the values from more recent executions.

The use of averages is a common technique, examples include Bond [Bon93] who used averages of CPU, IO and memory in a prototype of the Simple Task Allocation using Resource Selection (STARS) system, and Osser [Oss92] who used average lifetime to exclude short jobs from remote execution.

Load Distribution without Host State

 

Goswami et al. [GDI93] used the state transition prediction mechanism developed by Devarakonda and Iyer [DI89] to provide estimates of process resource use. This estimate is then used to compute the host loads for placement.

State-Transition Model

The prediction system uses states and state transition probabilities to estimate the future resource requirements of a process.

A k-meansgif clustering algorithm was used to identify seven high density clusters in a 3 dimensional space (CPU, memory and IO). The centroid of each cluster was defined as a state representing process resource usage. The states in general agree with the categories of process found in Leland and Ott [LO86], with CPU bound, IO bound and neutral categories. In particular, an approximate total of 20% of all process instances were found to fall into the CPU bound and IO bound categories (roughly 10% each) and an additional category of heavy memory use was identified that consisted of around tex2html_wrap_inline5755 % of all instances. The processes falling into the remaining four categories were all light resource consumers.

With the observation that a process may change the state it occupies between separate executions, transition probabilities are defined between states for each program, and the last N program executions are used to estimate the state transition probabilities tPr:

displaymath5737

Prediction Scheme

Given the state-transition model for a program, its future resource usage is predicted by:

The program's resource requirements r, are computed by multiplying each of the transition probabilities tex2html_wrap_inline5769 , from the state occupied during the process's last execution l, with the centroid of each subcluster tex2html_wrap_inline5773 :

displaymath5761

where N is the number of clusters (seven).

Load Metric and Communication

Goswami et al. argue that conventional load distribution systems require too great an overhead if the load information is communicated frequently, and note that performance of the load distribution falls as the frequency of load updates is reduced. They propose that proper initial placement of processes based on predicted resource requirements will produce better response times with lower communication overheads.

The essential idea is that the load on a host need not be collected, but may be computed from the resource requirements of scheduled processes. This unconventional approach means that the only updates required by the load distribution mechanism are notification of process termination.

The scheduler now takes on the responsibility of providing the load on each host, without measuring the state of the host. This is achieved by treating predictions as `perfect' and recording the estimated resource usage from each placement made to a host. These values are then used to compute the `load' on each host before another placement is made:

  equation563

where

The load is updated when a process starts or finishes. When a process terminates it is buffered, and the updates are periodically sent to the scheduler. The scheduler removes the terminated processes from the host load computation and then passes the termination information to the prediction system for further learning.

Policies

Goswami et al. developed four load distribution policies, two centralised and two distributed. The centralised policies are sufficient to explain the approach, and as discussion of the distributed policies will not contribute to the explanation, they are omitted.

The first placement policy was a simple scheme that used the computed CPU load to select the least loaded host. The second policy estimates the expected response time tex2html_wrap_inline5793 that a tex2html_wrap_inline5795 will receive at each tex2html_wrap_inline5797 , and then selects the most favourable:

displaymath5791

where tex2html_wrap_inline5781 is the number of processes on tex2html_wrap_inline5797 .

This equation sums the CPU requirements of all processes on tex2html_wrap_inline5779 that are smaller than the new process, and for each process larger, the new process's CPU requirement is summed. The estimates the response time experienced with a round robin scheduling algorithm.

This is primarily to avoid problems with round robin scheduling and too long a quantum.

Summary

 

Goswami et al. use a prediction system for providing a form of a priori information about processes before execution. The information is used to compute the load on the hosts in the system, rather than obtaining this information from the hosts themselves. No attempt is made to perform any kind of candidate selection, other than a simple age filter in one of the distributed algorithms.

Due to the CPU load computation from equation 4.1, the load estimates will be exaggerated for interactive processes or processes waiting on timed events. This could reduce the load distribution to random, as no checking is performed to ensure that the load estimates are representative of the actual system load. The load computation method also requires that the load distribution is pervasive, as load that is not under direct scheduler control will invalidate the estimates.

Weighted Resource Allocation

 

Bond [Bon93] also employs classification in STARS to predict the requirements of a job prior to execution, but exploits this information to select the best site for executing the process in a system of heterogeneousgif hosts.

Prediction Scheme

The prediction scheme developed for STARS, classifies tasks incrementally and seeks to overcome shortcomings evident in the prediction mechanism from Devarakonda and Iyer [DI89]. In particular, Bond claims that additions to the state transition model require regeneration of the entire model, and this alone makes it unsuitable for incremental updating. In addition, Bond suggests that prediction can be improved by considering additional predictors, such as group and time rather than simply relying on program name.

Load Distribution

STARS begins by forming a set of compatible hosts for executing a new job. This set of eligible hosts is then ranked by capability, using a set of weights that reflect the relative importance of resources to the particular job. A typical weight vector for a CPU bound program may consist of:

displaymath5805

which gives tex2html_wrap_inline5807 weighting to an idle CPU, tex2html_wrap_inline5807 weighting to a powerful processor and tex2html_wrap_inline5811 weighting to a minimum load. These weights are then applied to the resource availability of each eligible host, and the highest value indicates the most suitable host for execution of the job.

STARS and the prediction system were not completely integrated, and the system tested in Bond's thesis used moving averages to tailor the resource weights for each job.

Tracing

 

In a novel approach, Ju et al. [JXY95] trace a process through the first second of its life, to identify processes suitable for remote execution. If the process is suitable, being non interactive and non IO bound, then the process is terminated and restarted on an appropriate host.

Predicting Future Behaviour

Processes are only traced for one second from the start of execution. Therefore Ju et al. require the assumption that a process will exhibit the same behaviour over its entire lifetime as it did in that first second.

The data collected during the trace period is intended to determine if the process is suitable for remote execution. Ju et al. collect the total memory requirement, the length of all opened files, the total CPU time and the amount of data read and written to memory, and to files. These values are used to estimate the CPU and IO utilisation of the process, and the speed of memory and file access.

The expected lifetime tex2html_wrap_inline5813 , of a process, is estimated from the time required to access all memory pages allocated to the process and to read all opened files:

  equation595

If the program has been executed before, then the values obtained during previous executions are available to adjust the lifetime estimate from equation 4.2.

Load Distribution

An eligible candidate is preferably CPU bound, has an expected lifetime tex2html_wrap_inline5813 of at least ten seconds and is non interactive.

The process is considered interactive if any of the open files are stdingif or stdout. The recorded CPU and IO utilisation is used to classify the process as CPU ( tex2html_wrap_inline5817 utilisation) bound, IO ( tex2html_wrap_inline5819 utilisation) bound or neutral.

If the candidate is eligible, it is terminated and restarted on the least loaded host.

Best Fit Resource Allocation

 

Cena et al. [CCG95] take a priori information as given, and concentrate on the actual distribution of load to hosts. Although a neural network is used, load distribution is explicit, as the a priori information is provided to rather than by the neural network.

Load Distribution via Best Fit

This location selection policy is designed for a heterogeneousgif distributed system, and as a consequence, focuses on the capability of a particular host to supply resources required by a job.

The prototype system divides available system resources into available memory and available CPU (normalised). Processes are characterised by the required amount of memory and the desired response time. A conventional best fit algorithm, as described in section 8.3.2, is used to allocate processes to the available resources in the following order:

  1. Determine the set of hosts that can satisfy the memory request.
  2. Select the host that leaves the minimum unallocated CPU while still satisfying the request.

If the memory request can not be satisfied, then the process can not be executed. If however there is insufficient CPU, the request is satisfied with the best available (as the response time is desired, not mandatory).

Neural Network Implementation

There is no requirement for the fitting to be implemented by a neural network, indeed, the training set was computed conventionally. However a stated goal of LAHNOS (Local Area Heterogeneous Operating System) for which the scheduler was designed, is to increase the `automation' of the system and therefore a brief description of the implementation is included for completeness.

The distribution algorithm was implemented by training a feedforward neural network using the Back Propagation (BP) learning rule [HN90].

The input vector encodes the requirements of the job and the current availability of the system (restricted to three machines in the prototype system). The output is a single value representing the selected host.


next up previous contents
Next: Implicit Load Distribution via Up: Related Work Previous: Probabilistic Models

Kris Bubendorfer
Fri Nov 1 11:26:21 NZDT 1996