Learning mechanisms are used to make placement decisions, in which, depending on the success of the last placement, the decision is reinforced or degraded. The candidate and location selection policies are combined into a single operation, and the policies are therefore implicit.
A basic stochastic learning automaton is a set of permissible
actions
(i.e., place a job on machine X), each of which has an
associated probability. The probabilities associated with each action
are modified depending on the success of each placement decision (the
learning rule). If a poor decision is made, the probability
associated with that action is decreased (penalised) and if the
decision is good, then the probability is increased (rewarded).
Kunz [Kun91b] extends this basic approach by giving the automaton a set of states, each of which contains a complete set of permissible actions and associated probabilities. A mapping derived from the system status (load) is used to select the appropriate state from which action probabilities are drawn.
Figure
Figure 4.1: A single stochastic learning automaton.
The mapping in Kunz is based on the current status (load conditions)
of the system as a whole. The use of multiple states increases the
stability of the stochastic learning automaton as it allows different
load conditions to be represented. State 1 in figure
The probabilities of the stochastic learning automaton are adjusted
depending on the success of a job placement. An error function is a
means of determining the success of a placement, and a learning rule is
used to adjust the weights. Kunz investigated three error functions
and three learning rules for training the automaton. The most basic
error function is a binary measure of goodness
, where a placement
is considered `good' if the destination host was underloaded, and
`bad' if the destination host was overloaded. The other two error
functions return quantitative rather than qualitative results:
where cs
is the source and rs is the destination host.
It is obvious that the two quantitative error functions require different learning rules to control the learning with the extra error information, however the binary case is simpler and provides a suitable example for demonstrating this approach.
The basic learning rule is the linear reward-penalty scheme, where the following reward and penalty functions are used to modify the current action probabilities:
where
Kunz's learning automaton does not consider the job being scheduled. This is not a restriction of the stochastic learning automaton itself, as the job could be used as a factor in the mapping of states.
A later study by Schaerf et al. [SST95] into multiagent stochastic learning for load distribution is also of interest, and further extends the approach outlined above.
Koch et al. [KRK94] developed a load distribution scheme based on
a single adaptive linear element (Adaline
), illustrated in figure
resulting in an output y.
Figure 4.2: The adaptive linear element or Adaline.
The input vector x, is comprised of elements representing the current load on the system. The job and machine are not represented in the input vector, but are instead used to select the weight vector w used by the Adaline to compute the weighted sum y. Each job and machine pair is represented by it's own weight vector, a strategy similar to Kunz's additional states, to improve the stability of the Adaline.
On presentation of a job j, the Adaline is used to predict the expected execution time y for j on each machine. The machine with the shortest expected response time for j is selected and j is initiated there.
The Adaline is trained by measuring the difference
between the
expected execution time y, and the actual execution time d:
The
term (error) is then used to update the weights using
the standard Widrow and Hoff Least Mean Square (LMS) delta rule:
where