Coevolving the "Ideal" Trainer
Description of the Problem
One-dimensional Cellular Automata
A one-dimensional cellular automaton (CA) is a linear wrap-around array composed of N cells,
each of which can take one of k possible states.
A rule is defined for each cell in order to update its state.
This rule determines the next state of a cell given its current state and the state
of cells in a predefined neighborhood.
For the model discussed in this proposal, this neighborhood is composed of cells whose distance
is at most r from the central cell.
This operation is performed synchronously for all the cells in the CA.
For the work presented in this proposal, the state of cells is binary (k = 2), N = 149, and
r = 3.
A rule is thus a mapping from the $2^{2r+1} = 128$ possible neighborhood configurations
to a next state, so the size of the rule space is $2^{2^{2r+1}} = 2^{128}$.
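As a concrete illustration (the function and variable names below are mine, not from the proposal), one synchronous update of such a CA amounts to a table lookup over the $2r+1$ neighborhood bits of each cell:

```python
def ca_step(state, rule_table, r=3):
    # state: list of 0/1 cell values (wrap-around array of N cells).
    # rule_table: list of length 2**(2*r + 1) mapping each neighborhood,
    # read as a binary number (leftmost cell = most significant bit),
    # to the next state of the central cell.
    n = len(state)
    nxt = []
    for i in range(n):
        pattern = 0
        for offset in range(-r, r + 1):
            pattern = (pattern << 1) | state[(i + offset) % n]
        nxt.append(rule_table[pattern])
    return nxt  # all cells are updated synchronously
```

With r = 3 the table has 128 entries, one per neighborhood, which is where the $2^{128}$ rule-space size above comes from.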
Cellular automata have been studied widely as they represent one of the simplest systems in
which complex emergent behaviors can be observed.
This model is very attractive as a means to study complex systems in nature.
Indeed, the evolution of such systems is governed by simple, locally-interacting components,
from which global, coordinated activity emerges.
The Majority Function
The task consists in discovering a rule for the one-dimensional CA
which implements the majority function as accurately as possible.
This is a density classification task: one wants the cells of the CA to relax to all 0's
or all 1's, depending on the density of the initial configuration (IC) of the CA,
within a maximum of M time steps.
Following [Mitchell et al., 1994], $\rho_c$ denotes the threshold for the classification
task (here, $\rho_c = 1/2$), $\rho$ denotes the density of 1's in a configuration and
$\rho_0$ denotes the density of 1's in the initial configuration.
Figure 1 presents two examples of the space-time evolution of a CA,
with $\rho_0 < \rho_c$ on the left and $\rho_0 > \rho_c$ on the right.
The initial configuration is at the top of the diagram and the evolution in time of
the different configurations is represented downward.
Figure 1: Two space-time diagrams describing the evolution of CA states.
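The relaxation criterion can be sketched as follows. The value M = 320 (roughly 2N) is my assumption, as the proposal only says "within a maximum of M time steps"; all names are mine:

```python
def classify(rule_table, ic, r=3, max_steps=320):
    # Iterate the CA from initial configuration `ic` until it reaches a
    # uniform fixed point (all 0's or all 1's) or max_steps is exhausted.
    state = list(ic)
    n = len(state)
    for _ in range(max_steps):
        if sum(state) in (0, n):
            break
        nxt = []
        for i in range(n):
            pattern = 0
            for offset in range(-r, r + 1):
                pattern = (pattern << 1) | state[(i + offset) % n]
            nxt.append(rule_table[pattern])
        state = nxt
    if all(c == 0 for c in state):
        return 0
    if all(c == 1 for c in state):
        return 1
    return None  # did not relax within max_steps

def correct(rule_table, ic):
    # For rho_c = 1/2 and odd N, the CA should relax to all 1's exactly
    # when the IC contains a strict majority of 1's.
    want = 1 if 2 * sum(ic) > len(ic) else 0
    return classify(rule_table, ic) == want
```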
The task $\rho_c = 1/2$ is known to be difficult.
In particular, it has been proven that no rule exists that will result in the CA relaxing
to the correct state for all possible ICs [Land & Belew, 1995].
Indeed, the density is a global property of the initial configuration while individual
cells of the CA have access to local information only.
Discovering a rule that will display the appropriate computation by the CA with the highest
accuracy is a challenge, and the upper limit for this accuracy is still unknown.
Table 1 describes the performance for that task for different
published rules and different values of N along with a new best rule discovered
using the coevolutionary approach presented in this document.
Table 1: Performance of different published CA rules and the new best rule
discovered by coevolution for the majority function.

    N             149              599              999
    Coevolution   0.863 +/- 0.001  0.822 +/- 0.001  0.804 +/- 0.001
    Das rule      0.823 +/- 0.001  0.778 +/- 0.001  0.764 +/- 0.001
    ABK rule      0.824 +/- 0.001  0.764 +/- 0.001  0.730 +/- 0.001
    GKL rule      0.815 +/- 0.001  0.773 +/- 0.001  0.759 +/- 0.001
The Gacs-Kurdyumov-Levin (GKL) rule was designed in 1978 for a different goal than
the $\rho_c = 1/2$ task [Mitchell et al., 1994].
However, for a while it provided the best known performance.
[Mitchell et al., 1994] and [Das et al., 1994] used Genetic Algorithms (GAs) to explore
the space of rules.
This work resulted in an analysis of some of the complex behaviors exhibited by CAs
using ``particles''.
The GKL and Das rules are human-written, while the Andre-Bennett-Koza (ABK) rule was discovered
using the Genetic Programming paradigm [Andre et al., 1996].
Learning in a Fixed Environment: Evolution

Canonical model of evolution
The traditional model for evolutionary approaches to problem solving is to design
a representation for solutions and a fitness function.
The absolute performance of individuals is evaluated with respect to that fitness
function and search operators are used to explore the state space.
As an example, we implemented such an evolutionary approach for the discovery of
CA rules for the $\rho_c = 1/2$ task.
This implementation is similar to the one described in [Mitchell et al., 1994].
Each rule is coded as a binary string of length $2^{2r+1} = 128$.
One-point crossover is used with a 2% per-bit mutation probability.
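These two operators could look as follows; beyond "one-point crossover" and the 2% bit-mutation rate, the names and details are my assumptions:

```python
import random

def one_point_crossover(a, b):
    # Pick a single cut point and swap the tails of the two parent strings.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, p=0.02):
    # Flip each bit independently with probability p (2% here).
    return [b ^ 1 if random.random() < p else b for b in bits]
```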
The population size is n_R = 200 for rules and n_{IC} = 200 for ICs.
The population of ICs is composed of binary strings of length N = 149.
This population is fixed and it is initialized according to a uniform distribution
over $\rho_0 \in [0.0, 1.0]$.
The population of rules is also initialized according to a uniform distribution
over $[0.0, 1.0]$ for the density.
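One common way to realize a density-uniform initialization is two-stage sampling: draw a target density first, then fill the bits with that probability. This spreads densities evenly over [0, 1] rather than leaving them binomially concentrated near 0.5 (the sketch and its names are mine):

```python
import random

def random_ic(n=149):
    # Draw a target density uniformly from [0, 1], then set each of the
    # n bits to 1 with that probability.
    rho = random.random()
    return [1 if random.random() < rho else 0 for _ in range(n)]
```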
A natural definition for the fitness of rules is the number of Initial Configurations (ICs)
for which the CA relaxes to the correct state:

$f(R_i) = \sum_{j=1}^{n_{IC}} \mathrm{covered}(R_i, IC_j)$

where $\mathrm{covered}(R_i, IC_j) = 1$ if the CA using rule $R_i$ relaxes to the correct
state for $IC_j$ within $M$ time steps, and $0$ otherwise.
Figure 2 describes the evolution of the distribution of the density of rules over time.
A common behavior resulting from such an evolutionary setup is that
the entire population quickly converges to a small region
of the search space in the neighborhood of the current best individual.
Some progress is still observed over time, but the variance of the final
performance over several runs is usually large.
Figure 2: Evolution of the distribution of the density of rules over time.

Maintaining diversity by introducing resource sharing
Several techniques have been designed to improve search in a fixed landscape.
Usually those techniques maintain diversity in the population in order to
avoid premature convergence.
[Mahfoud, 1995] presents different niching techniques that are
useful in particular for the exploration of multimodal landscapes.
A technique that we successfully exploited in the past to maintain diversity
is called resource sharing [Juillé & Pollack, 1996].
Resource sharing implements a coverage-based heuristic by
giving a higher payoff to problems that few individuals can solve.
One way to implement this technique for the evolution of CA rules is to define
the fitness of rules as follows:

$f(R_i) = \sum_{j=1}^{n_{IC}} w_j \, \mathrm{covered}(R_i, IC_j)$

where $\mathrm{covered}(R_i, IC_j) = 1$ if rule $R_i$ classifies $IC_j$ correctly
($0$ otherwise), and $w_j = 1 / \#\{R_k : \mathrm{covered}(R_k, IC_j) = 1\}$
is the weight of $IC_j$.
In this definition, the weight of an IC corresponds to the payoff it returns
if a rule covers it.
If few rules cover an IC, this weight will be much larger than if a lot
of rules cover that same IC.
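A minimal sketch of this payoff scheme, assuming the weight of an IC is the inverse of the number of rules covering it (the names and the exact 1/c weighting are my assumptions):

```python
def shared_fitness(cover):
    # cover[i][j] = 1 if rule i correctly classifies IC j, else 0.
    # Each IC pays 1/c_j to every rule that covers it, where c_j is the
    # number of rules covering IC j, so rarely-solved ICs pay more.
    n_rules, n_ics = len(cover), len(cover[0])
    weights = []
    for j in range(n_ics):
        c = sum(cover[i][j] for i in range(n_rules))
        weights.append(1.0 / c if c else 0.0)
    return [sum(cover[i][j] * weights[j] for j in range(n_ics))
            for i in range(n_rules)]
```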
Figure 3 shows the evolution of the distribution of rules density
over several generations.
It can be observed that diversity is maintained for about 500 generations during
which different alternatives are explored simultaneously by the population
of rules.
Then, a solution is discovered that moves the search into the region where density is close
to 0.5.
Afterward, several niches might still be explored; however, the encoding used to construct
the figures cannot represent those different niches, since all the rules have a similar density.
Usually, this technique results in better performance on average, but it also takes more
time to converge: a trade-off between speed and quality of solutions.
Figure 3: Evolution of the distribution of the density of rules over time.
Learning in an Adapting Environment: Coevolution
The idea of using coevolution in search was introduced by [Hillis, 1992].
In his work, a population of parasites coevolves with a population of sorters.
The goal of parasites is to exploit weaknesses of sorters by discovering input vectors
that sorters cannot solve correctly while sorters evolve strategies to become
perfect sorting networks.
In coevolution, individuals are evaluated with respect to
other individuals instead of a fixed environment (or landscape).
As a result, agents adapt in response to other agents' behavior.
For the work presented in this document, I will define coevolutionary learning
as a learning procedure involving the coevolution of two populations:
a population of learners and a population of problems.
Moreover, the fitness of an individual in a population is defined only
with respect to members of the other population.
Two cases can be considered in such a framework, depending on whether the two populations
benefit from each other or whether they have different interests (i.e., if they are in conflict).
Those two modes of interaction are called cooperative and
competitive respectively.
In the following paragraphs, those modes of interaction are described experimentally
in order to stress the different issues related to coevolutionary learning.

Cooperation between populations
In this mode of interaction, improvement on one side results in positive feedback
on the other side.
As a result, there is a reinforcement of the relationship between the two populations.
From a search point of view, this can be seen as an exploitative strategy.
Agents are not encouraged to explore new areas of the search space but only to
perform local search in order to further improve the strength of the relationship.
Following a natural extension of the evolutionary case to the cooperative model,
the fitness of rules and ICs can be defined as follows for the $\rho_c = 1/2$ task:

$f(R_i) = \sum_{j} \mathrm{covered}(R_i, IC_j) \qquad f(IC_j) = \sum_{i} \mathrm{covered}(R_i, IC_j)$

where $\mathrm{covered}(R_i, IC_j) = 1$ if the CA using rule $R_i$ relaxes to the correct
state for $IC_j$, and $0$ otherwise: a rule scores for each IC it classifies correctly and,
symmetrically, an IC scores for each rule that classifies it correctly.
For our experiments, the same setup as the one described previously is used.
The population size is n_R = 200 for rules and n_{IC} = 200 for ICs.
The population of rules and ICs are initialized according to a uniform distribution
over $[0.0, 1.0]$ for the density.
At each generation, the top 95% of each population reproduces into the next generation, and
the remaining 5% is the result of crossover between parents from the top 95%,
selected using a fitness-proportionate rule.
Figure 4 presents the evolution of the density
of rules and ICs for one run using this cooperative model.
Unsurprisingly, the populations of rules and ICs quickly converge to a region
of the search space where ICs are easy for rules and rules consistently solve ICs.
There is little exploration of the search space.
Figure 4:
Evolution of the distribution of the density of rules (left) and the density of ICs (right) over time.

Competition between populations
In this mode of interaction, the two populations are in conflict.
Improvement on one side results in negative feedback for the other population.
Two outcomes are possible:
- Unstable behavior: agents in one population alternately
outperform members of the other population until the latter discover
a strategy to defeat the former, and vice versa.
This results in unstable behavior where species appear,
disappear, and reappear in later generations.
- A stable state is reached: this happens if one population
consistently outperforms the other, for instance when agents
in one population discover a very strong strategy.
For the $\rho_c = 1/2$ task, the fitness defined in the cooperative case
can be modified as follows to implement the competitive model:

$f(R_i) = \sum_{j} \mathrm{covered}(R_i, IC_j) \qquad f(IC_j) = \sum_{i} \big(1 - \mathrm{covered}(R_i, IC_j)\big)$

where $\mathrm{covered}(R_i, IC_j) = 1$ if rule $R_i$ classifies $IC_j$ correctly,
and $0$ otherwise.
Here, the goal of the population of rules is to discover rules that defeat
ICs in the other population by allowing the CA to relax to the correct state,
while the goal of ICs is to defeat the population of rules by discovering initial
configurations that are difficult to classify.
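In code, these opposed objectives could be computed from a precomputed cover matrix (a sketch; the matrix representation and names are my assumptions):

```python
def competitive_fitness(cover):
    # cover[i][j] = 1 if rule i classifies IC j correctly, else 0.
    # Rules score for each IC they solve; ICs score for each rule they defeat.
    n_rules, n_ics = len(cover), len(cover[0])
    rule_fit = [sum(cover[i][j] for j in range(n_ics)) for i in range(n_rules)]
    ic_fit = [sum(1 - cover[i][j] for i in range(n_rules)) for j in range(n_ics)]
    return rule_fit, ic_fit
```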
Figure 5 describes an example of a run using this definition of the fitness.
In the first stage, the two populations exhibit a cyclic behavior.
Rules with low density have an advantage because there are more ICs with low density.
In response, ICs with high density gain an evolutionary advantage and
get more representatives in the population.
In turn, rules with high density gain an evolutionary advantage, and so on.
This unstable behavior exhibited by the two populations is an instance
of the Red Queen effect [Cliff & Miller, 1995]: fitness landscapes are changing
dynamically as a result of agents from each population adapting in response
to the evolution of members of the other population.
The performance of individuals is evaluated in a changing environment,
making continuous progress difficult.
A typical consequence is that agents have to learn again what they already knew in the past.
In the context of evolutionary search, this means that domains of the state space
that have already been explored in the past are searched again.
Then, a stable state is reached: in this case, the population of rules adapts
faster than the population of ICs, so that the population focuses only
on rules with high density and eliminates all instances of low-density rules
(a finite population is considered).
Low-density ICs then exploit those rules and take over the entire
population of ICs.
Figure 5:
Evolution of the distribution of the density of rules (left) and the density of ICs (right) over time.

Resource sharing and mediocre metastable states
We have seen that by introducing resource sharing, it is possible to explore
simultaneously different alternatives by maintaining several niches in the
population.
This section presents some experiments where resource sharing is introduced
in the competitive mode of interaction presented in the previous section.
The fitness of rules and ICs is then defined as follows:

$f(R_i) = \sum_{j} w_j \, \mathrm{covered}(R_i, IC_j)$

where $w_j = 1 / \#\{R_k : \mathrm{covered}(R_k, IC_j) = 1\}$, and:

$f(IC_j) = \sum_{i} v_i \, \big(1 - \mathrm{covered}(R_i, IC_j)\big)$

where $v_i = 1 / \#\{IC_k : \mathrm{covered}(R_i, IC_k) = 0\}$, so that each
population shares the payoff earned from the opponents it defeats.
This framework allows the presence of multiple niches in both populations,
each niche corresponding to a particular competence relevant to defeat some
members of the other population.
Figure 6 describes one run for this definition of the interaction between the two populations.
It can be seen that the unstable behavior observed in the previous section
no longer occurs and that two species coexist in the population of rules:
one of low-density rules and another of high-density rules.
Those two species drive the evolution of ICs towards the domain of initial configurations
that are the most difficult to classify (i.e., $\rho_0 = 1/2$).
However, the two populations have entered a mediocre metastable state:
multiple average-performance niches coexist in both populations
in a stable manner.
Figure 6:
Evolution of the distribution of the density of rules (left) and the density of ICs (right) over time.
Coevolving the "Ideal" Trainer
The central idea of the coevolutionary learning approach consists in
exposing learners to problems that are just beyond those they already know how
to solve.
By maintaining this constant pressure towards slightly more difficult problems,
an arms race among learners is induced, such that learners that adapt better
have an evolutionary advantage.
The underlying heuristic implemented by this arms race is that {\em adaptability} is
the driving force for improvement.
In order to achieve this result, our system tries to satisfy the following
two goals:
- To provide an ``optimal'' gradient for search: the training environment
defined by the population of problems must reliably determine which learners
are the most promising at each stage of the search process.
This means that problems must be informative;
if problems are too difficult or too easy, learners get little feedback and
there is no gradient to determine in which direction the search should
focus.
- To allow continuous progress: the goal is to avoid the Red Queen effect
by providing a training environment that continues to test learners
on problems they solved in the past.
In a sense, the training environment must also play the role of a memory.
The difficulty resides in the accurate implementation of those concepts
in a search algorithm.
Experimental Results
It is believed that ICs become more and more difficult to
classify correctly as their density gets closer to the threshold $\rho_c$.
Therefore, our idea is to construct a framework that adapts the density
distribution of the population of ICs as CA rules get better at solving
the task.
The following definition for the fitness of rules and ICs has been used
to achieve this goal:

$f(R_i) = \sum_{j} w_j \, \mathrm{covered}(R_i, IC_j)$

where $w_j = 1 / \#\{R_k : \mathrm{covered}(R_k, IC_j) = 1\}$, and:

$f(IC_j) = \sum_{i} v_i \, \big(1 - \mathrm{covered}(R_i, IC_j)\big) \, E(R_i, \rho(IC_j))$

where $v_i = 1 / \#\{IC_k : \mathrm{covered}(R_i, IC_k) = 0\}$.
This definition implements the competitive relationship with resource
sharing.
However, a new component, namely $E(R_i, \rho(IC_j))$, has been added
in the definition of the ICs' fitness.
The purpose of this new component is to give a low payoff to $IC_j$ if
ICs with density $\rho(IC_j)$ return little information about the rule $R_i$.
We consider that information is collected when rule $R_i$ does better
or worse than random guessing over ICs with density $\rho(IC_j)$
(that is, when $R_i$ covers strictly more or strictly fewer than 50% of those ICs).
Following this idea, we defined $E()$ as the complement of the entropy of
the outcome between a rule and ICs with a given density:

$E(R_i, \rho(IC_j)) = 1 + p \log_2 p + q \log_2 q$

where $p$ is the probability that an IC with density $\rho(IC_j)$ defeats
the rule $R_i$, $q = 1 - p$, and $0 \log_2 0$ is taken to be $0$.
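This quantity, one minus the binary entropy of the outcome, can be sketched as follows (handling the $0 \log_2 0 = 0$ convention explicitly; the function name is mine):

```python
import math

def entropy_complement(p):
    # E = 1 - H(p), where H is the binary entropy of the outcome.
    # E is 1 when p is 0 or 1 (the outcome is fully predictable) and
    # 0 when p = 0.5, where results look like random guessing and
    # therefore carry no information about the rule.
    if p <= 0.0 or p >= 1.0:
        return 1.0
    q = 1.0 - p
    return 1.0 + p * math.log2(p) + q * math.log2(q)
```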
Figures 7 and 8 describe the evolution of the density of rules and ICs for two runs.
It can be seen that, as rules improve, the density of ICs is distributed on
two peaks on each side of $\rho = 1/2$.
In the case of Figure 7, it is only after 1,300 generations
that a significant improvement is observed for rules.
Only at that point does the population of ICs adapt dramatically
in order to propose more challenging initial configurations.
Figure 7:
Evolution of the distribution of the density of rules (left) and the density of ICs (right) over time.
Figure 8:
Evolution of the distribution of the density of rules (left) and the density of ICs (right) over time.
Download our papers about this work:
Coevolutionary Learning: a Case Study
(with J. B. Pollack),
Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98),
Madison, Wisconsin, USA, July 24-26, 1998, to appear.
Coevolving the ``Ideal'' Trainer: Application to the Discovery of Cellular Automata Rules
(with J. B. Pollack),
Proceedings of the Third Annual Genetic Programming Conference (GP-98),
Madison, Wisconsin, USA, July 22-25, 1998, to appear.
References
[Andre et al., 1996] Andre, D., Bennett, F. H. & Koza, J. R. (1996). Evolution of intricate long-distance communication
signals in cellular automata using genetic programming. In Artificial Life V: Proceedings of the Fifth International
Workshop on the Synthesis and Simulation of Living Systems. Nara, Japan.
[Cliff & Miller, 1995] Cliff, D. & Miller, G. F. (1995). Tracking the red queen: Measurements of adaptive
progress in coevolutionary simulations. In Proceedings of the Third European Conference on Artificial Life,
pp. 200-218. Springer-Verlag. LNCS 929.
[Das et al., 1994] Das, R., Mitchell, M. & Crutchfield, J. P. (1994). A genetic algorithm discovers particle-based
computation in cellular automata. In Parallel Problem Solving from Nature - PPSN III, pp. 344-353.
Springer-Verlag. LNCS 866.
[Hillis, 1992] Hillis, W. D. (1992). Coevolving parasites improves simulated evolution as an optimization procedure.
In Langton, C. et al. (Eds), Artificial Life II, pp. 313-324. Addison Wesley.
[Juillé & Pollack, 1996] Juillé, H. & Pollack, J. B. (1996).
Coevolving intertwined spirals. In Proceedings of the Fifth Annual Conference on Evolutionary Programming,
pp. 461-468. MIT Press.
[Land & Belew, 1995] Land, M. & Belew, R. K. (1995). No perfect two-state cellular automata for density
classification exists. Physical Review Letters, 74(25):5148-5150.
[Mahfoud, 1995] Mahfoud, S. W. (1995). Niching methods for genetic algorithms. Technical report,
University of Illinois at Urbana-Champaign. IlliGAL Report No. 95001.
[Mitchell et al., 1994] Mitchell, M., Crutchfield, J. P. & Hraber, P. T. (1994).
Evolving cellular automata to perform computations: Mechanisms and impediments.
Physica D, 75:361-391.