Learning mechanisms can be used to make placement decisions: depending on the success of the last placement, the decision that produced it is either reinforced or weakened. The candidate and location selection policies are combined into a single operation, and the policies are therefore implicit.
A basic stochastic learning automaton is a set of permissible actions (e.g., place a job on machine X), each of which has an associated probability. The probabilities associated with each action are modified according to the success of each placement decision (the learning rule). If a poor decision is made, the probability associated with that action is decreased (penalised); if the decision is good, the probability is increased (rewarded).
Kunz [Kun91b] extends this basic approach by giving the automaton a set of states, each of which contains a complete set of permissible actions and associated probabilities. A mapping derived from the system status (load) is used to select the appropriate state from which action probabilities are drawn.
Figure 4.1: A single stochastic learning automaton.
The mapping in Kunz is based on the current status (load conditions) of the system as a whole. The use of multiple states increases the stability of the stochastic learning automaton, as it allows different load conditions to be represented; each state in figure 4.1 holds its own set of action probabilities, which are drawn upon under the corresponding load conditions.
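To make the structure concrete, the following Python sketch shows a multi-state automaton of the kind described above. The class and method names and the simple load-to-state mapping are illustrative assumptions rather than Kunz's exact formulation.

```python
import random

class MultiStateAutomaton:
    """Sketch of a multi-state stochastic learning automaton (after Kunz).

    Each state holds a complete probability vector over the permissible
    actions, one action per candidate machine.  The names and the simple
    load-to-state mapping are illustrative assumptions.
    """

    def __init__(self, num_states, num_machines):
        self.num_states = num_states
        self.num_machines = num_machines
        # Every state starts with uniform action probabilities.
        self.probs = [[1.0 / num_machines] * num_machines
                      for _ in range(num_states)]

    def load_to_state(self, system_load):
        # Hypothetical mapping from overall system load (0..1) to a state index.
        return min(int(system_load * self.num_states), self.num_states - 1)

    def choose_machine(self, system_load):
        # Select the state from the current load, then sample an action
        # (a destination machine) from that state's probability vector.
        state = self.load_to_state(system_load)
        machine = random.choices(range(self.num_machines),
                                 weights=self.probs[state])[0]
        return state, machine
```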
The probabilities of the stochastic learning automaton are adjusted depending on the success of a job placement. An error function is a means of determining the success of a placement, and a learning rule is used to adjust the action probabilities accordingly. Kunz investigated three error functions and three learning rules for training the automaton. The most basic error function is a binary measure of `goodness', where a placement is considered `good' if the destination host was underloaded, and `bad' if the destination host was overloaded. The other two error functions return quantitative rather than qualitative results, expressed in terms of the source host cs and the destination host rs.
The two quantitative error functions require learning rules that can exploit the extra error information; the binary case, however, is simpler and provides a suitable example for demonstrating this approach.
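As an illustration of the binary case, a placement can be judged simply by whether the destination host was overloaded when the job arrived. The load measure and threshold below are assumptions made for the sake of the example.

```python
def binary_goodness(destination_load, overload_threshold=0.8):
    """Binary measure of goodness: True ('good', reward) if the destination
    host was underloaded, False ('bad', penalise) if it was overloaded.
    The load scale and the threshold value are illustrative assumptions."""
    return destination_load < overload_threshold
```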
The basic learning rule is the linear reward-penalty scheme, in which a reward function and a penalty function are used to modify the current action probabilities after each placement.
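A minimal sketch of the standard linear reward-penalty update is given below: it moves probability mass towards the chosen action after a good placement and away from it after a bad one, while keeping the probability vector normalised. The reward and penalty rates a and b are illustrative constants, not values taken from Kunz.

```python
def reward_penalty_update(probs, chosen, good, a=0.1, b=0.1):
    """Standard linear reward-penalty update on an action probability vector.

    probs  : list of action probabilities (sums to 1)
    chosen : index of the action that was taken
    good   : True if the placement was judged 'good', False otherwise
    a, b   : reward and penalty rates (illustrative values)
    """
    r = len(probs)
    new = list(probs)
    if good:
        # Reward: move probability mass towards the chosen action.
        for j in range(r):
            new[j] = (1 - a) * probs[j]
        new[chosen] = probs[chosen] + a * (1 - probs[chosen])
    else:
        # Penalty: move probability mass away from the chosen action.
        for j in range(r):
            new[j] = b / (r - 1) + (1 - b) * probs[j]
        new[chosen] = (1 - b) * probs[chosen]
    return new
```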
Kunz's learning automaton does not consider the job being scheduled. This is not a restriction of the stochastic learning automaton itself, as attributes of the job could be used as a factor in the mapping onto states.
A later study by Schaerf et al. [SST95] into multiagent stochastic learning for load distribution is also of interest, and further extends the approach outlined above.
Koch et al. [KRK94] developed a load distribution scheme based on a single adaptive linear element (Adaline), illustrated in figure 4.2. The Adaline computes a weighted sum of an input vector x and a weight vector w, resulting in an output y.
Figure 4.2: The adaptive linear element or Adaline.
The input vector x is composed of elements representing the current load on the system. The job and machine are not represented in the input vector, but are instead used to select the weight vector w that the Adaline uses to compute the weighted sum y. Each job and machine pair is represented by its own weight vector, a strategy similar to Kunz's additional states, which improves the stability of the Adaline.
On presentation of a job j, the Adaline is used to predict the expected execution time y for j on each machine. The machine with the shortest predicted execution time for j is selected, and j is initiated there.
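A minimal sketch of the placement step follows, assuming one weight vector per (job, machine) pair as described above; the table lookup, the bias input and the function names are illustrative assumptions.

```python
def predict_time(weights, load_vector):
    """Adaline output y: weighted sum of the load inputs plus a bias term."""
    x = [1.0] + list(load_vector)      # leading 1.0 is an assumed bias input
    return sum(w * xi for w, xi in zip(weights, x))

def place_job(job, machines, weight_table, load_vector):
    """Select the machine with the shortest predicted execution time for job.

    weight_table is assumed to map (job, machine) pairs to weight vectors,
    mirroring the one-weight-vector-per-pair strategy described above."""
    return min(machines,
               key=lambda m: predict_time(weight_table[(job, m)], load_vector))
```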
The Adaline is trained by measuring the difference between the expected execution time y and the actual execution time d. This error term is then used to update the weights with the standard Widrow and Hoff Least Mean Square (LMS) delta rule, which adjusts each weight in proportion to the error and the corresponding input.
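Under the same assumptions, the training step can be sketched as follows; the learning rate is an illustrative constant.

```python
def lms_update(weights, load_vector, actual_time, learning_rate=0.01):
    """Widrow-Hoff LMS delta rule: w <- w + rate * (d - y) * x."""
    x = [1.0] + list(load_vector)      # same assumed bias input as prediction
    predicted = sum(w * xi for w, xi in zip(weights, x))
    error = actual_time - predicted     # d - y
    return [w + learning_rate * error * xi for w, xi in zip(weights, x)]
```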