Global-to-Local Design for Self-Organized Task Allocation in Swarms



Introduction
A primary challenge that complicates the spread of applications of large collections of embodied agents [1,2] is how to design individual agent controllers for a given desired collective behavior. The canonical, local-to-global approach [3] involves trial-and-error refinement of individual agent control rules followed by a macroscopic analysis of the resulting swarm behaviors [4,5] or a formal verification of specific properties of interest [6][7][8][9]. Designing agent controllers for a target swarm behavior in a noniterative way, without continuous refinement, has proven challenging. At present, only a few methods exist, and they are tailored to specific scenarios (e.g., task allocation [10][11][12], formation control [13], self-assembly [14][15][16], and collective construction [17]). The challenge is to find a generic method to automate global-to-local programming [18].
To tackle this challenge, we propose a novel global-to-local design approach. Our key idea is to compose a heterogeneous swarm [19][20][21][22] using groups of behaviorally different agents such that the resulting swarm behavior approximates a user input representing the desired behavior of the entire swarm. This idea is arguably related to the concept of population coding from neuroscience, where certain cognitive phenomena (e.g., the perception of directional movement) result from averaging the different individual contributions of populations of neurons [23]. Analogously to population coding [24], we derive a global-to-local design method that performs a function approximation of the user input as a linear combination of basis vectors, each representing the global influence of a different agent controller.
We illustrate this idea by defining a prescriptive design method for self-organized task allocation [25][26][27]. Specifically, we tackle variants of the single-task robots, multirobot tasks problem (ST-MR) [28] with static, sequential, and periodic swarm allocations. This problem is generally known as the coalition formation problem and has been extensively studied in the multiagent community [29,30]. Standard multiagent approaches, however, require complex cooperation strategies with a priori negotiation or bidding for task assignments and unconstrained global communication. These approaches are not suitable for large-scale swarms of unreliable agents with high scalability requirements that need to avoid communication bottlenecks.
The swarm intelligence community developed a rich framework of algorithms for self-organized task allocation that are suitable for unreliable, embodied agents. Instead of forming a priori coalitions, self-organized multiagent systems achieve task allocation as a result of the continuous interaction among agents and between agents and the environment. Popular approaches include threshold-based algorithms [26,31] and recruitment strategies inspired by the foraging behavior of ant colonies [25,27]. More recently, Castillo-Cagigal et al. [32] investigated periodic binary task allocation and proposed a self-organized synchronization strategy whose macroscopic behavior results in a bimodal distribution of robots over two tasks, a scenario we consider here, too. All these studies, however, employ local-to-global design approaches (i.e., bottom-up approaches in which global behaviors are hard to anticipate). Although the proposed strategies are often supported by descriptive macroscopic models [33], these studies do not provide a prescriptive design method like the one we present here.
A notable exception is the method proposed by Berman et al. [10] for the design of task allocation. In their work, the authors consider problems with more than two tasks. Each agent decides to switch tasks independently of other agents, and therefore, agents do not interact with each other. Their method, based on a linear continuous model, optimizes a set of transition rates between pairs of tasks that define a unique agent controller with the aim to converge as quickly as possible to a given swarm allocation. Differently from the probability distribution over swarm allocations considered here, they assume a single swarm allocation as input corresponding to the mode of our unimodal scenario (see Section 3.1). Due to the lack of interactions among agents, their method cannot achieve the nonlinear, oscillatory dynamics of the swarm that we describe in our multimodal scenarios (see Section 3.2). The approach of Berman et al. [10] has also been extended to incorporate feedback gathered from the environment by individual agents [11,12]. Agents keep track of the number of successfully completed tasks and report this information to a centralized authority (called the hive) that, in turn, updates the parameters of their stochastic control policy. Differently, our design approach provides a completely self-organized solution that does not require any centralized authority.
We consider a simple scenario where a user wants to design a swarm that allocates its members to only two tasks. Despite the simplicity of this task allocation problem, a few variations are possible. Let us say we have task 0 and task 1 and we want to design a swarm with 80% of agents working on task 0 and 20% working on task 1. In a trivial approach, we could statically assign agents to tasks before deployment. However, it is generally beneficial to require the swarm to allocate agents to tasks after deployment so as to increase robustness to individual agent failures. Agents have only local perception and cannot accurately estimate the number of agents currently assigned to either task.
Hence, the swarm behavior is inherently stochastic. Even for a good design of the agent controller, we can only hope to have 80% of the agents assigned to task 0 on average over time, due to the variance introduced by each agent's limited accuracy in assessing the current state of the swarm. A variant of this scenario arises when the user wants to define the variance of the swarm allocation (i.e., increase it beyond the accuracy-limited value), for example, to increase the swarm's potential for exploration over exploitation. Another variant is represented by sequential task allocation. For instance, we initially want the 80/20% allocation as above, followed in a later phase by a 30/70% allocation, possibly triggered by external factors. For example, in a surveillance task, a swarm may need to monitor the inside and outside of a facility, allocating agents in different proportions during the day and night. Yet another variant is periodic task allocation. We allow the swarm to autonomously decide when and how often to switch from 80/20% to 30/70%.
In more formal terms, a swarm allocation corresponds to a partitioning of the agents into two working groups, one for each of the two tasks. The user provides a description of the desired swarm allocations as a probability distribution over the space of all possible swarm allocations. To define a specific swarm behavior, the user manipulates the number and positions of the distribution's modes (i.e., local probability maxima) with each mode corresponding to a target swarm allocation (e.g., as above with the two modes 80/20% and 30/70%). The user can specify a static task allocation scenario by means of a unimodal distribution. A sequential task allocation scenario is defined through a sequence of unimodal distributions and a criterion to trigger a switch in swarm allocation. Finally, a user can define a periodic task allocation scenario using a distribution with two or more modes. The swarm periodically and autonomously changes the allocation of agents as specified by the modes in a stochastic manner. While this approach may seem unnecessarily contrived for this simple case of binary task allocation, we believe it can be extended to more complex task allocation problems as well as other swarm problems that can be approached using probabilistic finite state machines (see Figure 1).
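As a concrete illustration, such a user input is simply a probability distribution over the N + 1 possible swarm allocations whose local maxima sit at the desired allocations. The sketch below builds one as a mixture of binomial distributions, one component per desired mode. The helper names are ours, not the paper's; any distribution over {0, …, N} with strictly positive entries and the desired modes would serve equally well.

```python
from math import comb

def binom_pmf(N, p):
    """Probability mass function of Binomial(N, p) over {0, ..., N}."""
    return [comb(N, k) * p ** k * (1 - p) ** (N - k) for k in range(N + 1)]

def user_input_pi(N, modes, weights=None):
    """Hypothetical helper: a distribution over all swarm allocations with
    one local maximum near each requested mode (mixture of binomials)."""
    weights = weights or [1.0 / len(modes)] * len(modes)
    pi = [0.0] * (N + 1)
    for mode, w in zip(modes, weights):
        for k, p in enumerate(binom_pmf(N, mode / N)):
            pi[k] += w * p
    return pi

N = 100
pi_static = user_input_pi(N, modes=[80])        # static 80/20% allocation
pi_periodic = user_input_pi(N, modes=[80, 30])  # periodic 80/20% <-> 30/70%
```

A unimodal input encodes a static allocation, a sequence of such inputs encodes sequential allocation, and a multimodal input encodes periodic allocation, as described above.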
Our global-to-local design method hinges on three main ideas and assumptions: (a) nonprogrammable agents, (b) means of predicting global system behavior from local agent behavior, and (c) accepting and leveraging the probabilistic nature of swarm behavior. We give up the freedom of having programmable agents, as we assume hard-wired behaviors of predefined controller types. However, we regain that freedom at the global level by composing heterogeneous swarms with wisely chosen doses of several agent controller types. For these predefined local agent controllers, we know their global swarm effect, which we can model via the abovementioned basis vectors. By appreciating the probabilistic nature of swarms, we can model individual behaviors using probabilistic finite state machines (PFSMs), generalizing our approach to a wide range of scenarios representable by PFSMs, and similarly understand global swarm behavior via population models. We perceive the swarm as a stochastic dynamical system that makes probabilistic, autonomous decisions to switch between global states. We define an arbitrarily large number of agent controllers, that is, sets of predefined control rules. For each agent controller, we derive a basis vector that models its global-level contribution to the swarm dynamics. Specifically, each basis vector describes the transient dynamics of a homogeneous swarm where all agents run the same controller. The probability distribution over swarm allocations given by the user as input defines the desired asymptotic behavior of the swarm. From this input, we mathematically derive a response vector to describe the desired transient dynamics of the swarm. These transient dynamics are such that the swarm will asymptotically converge to the stationary distribution given as input by the user.
We then use the response vector as a reference to select the necessary agent controllers through a linear combination of our initial set of basis vectors. Finally, we systematically search for a proper composition of a heterogeneous swarm by estimating the coefficients of a lasso regression [34] between the response vector and a linear combination of basis vectors. We use penalized regression to limit the set of selected controller types to the few that are indeed required (i.e., basis vectors with strictly positive coefficients) and use the values of the coefficients to define the proportions of agents executing each of the selected controllers.

Experimental Design.
We build on the idea of behavioral heterogeneity to define a global-to-local design method for ST-MR task allocation problems. We consider the problem of designing a swarm of N agents that allocates its members to a pair of tasks (task 0 and task 1) as defined by a user input. The user input, formally defined in Section 2.2, prescribes a desired swarm allocation by means of a stationary probability distribution π defined over the space of all possible allocations of N agents to 2 tasks. We leverage the degrees of freedom that can be gained at the global level by mixing different agent controllers at the local level. Contrary to local-to-global approaches that manually explore a possibly infinite space of design solutions to obtain a single agent controller for a homogeneous swarm, we restrict our design space to a large but finite number of alternatives and systematize our search to obtain a combinatorial solution. To do so, we consider a set of parameterized control rules as the basis of the individual agent controller and obtain a large albeit finite number of different controllers by varying these parameters (see Section 2.3). The outcome of this search, called swarm composition, is a heterogeneous swarm formed of groups of agents with different controllers.
Let C = {(⟨G_1; b_1⟩, c_1), ⋯, (⟨G_m; b_m⟩, c_m)} represent a swarm composition with m ≪ N agent controllers. ⟨G_i; b_i⟩ represents the ith agent controller, whereas c_i gives the number of agents in the swarm with that controller. For each agent controller, parameters G_i and b_i define, respectively, the number of task allocations that an agent needs to observe from other agents before applying a decision rule and the enumeration index identifying the specific set of decision rules associated with the ith controller (see Section 2.3). We consider the problem of finding a swarm composition C that approximates the stationary distribution π defined by the user (see Section 2.2). We tackle this problem with a prescriptive, model-driven approach by defining a macroscopic model that, given the local agent controllers, describes both the transient and the stationary behavior of the swarm, and a method to derive, from the desired stationary behavior π, a (reference) model of a transient swarm behavior that converges to π. The first model, introduced in Section 2.4, provides a space E of basis vectors, each describing the behavior of a homogeneous swarm executing a specific agent controller. The second model, introduced in Section 2.5, gives a response vector y that describes the behavior of the heterogeneous swarm as required by the user (i.e., the behavior of the target swarm composition). Finally, we can formulate a linear combination y = Eβ of basis vectors. As described in Section 2.6, the key is to find appropriate coefficients β to select basis vectors. They represent the desired swarm composition C via c_i ∝ β_i.

User Input.
Let (X, N − X) represent a swarm allocation, where X ∈ 𝒳 is the number of agents allocated to task 0 (respectively, N − X to task 1), and 𝒳 = {0, 1, ⋯, N} is the set of all possible macroscopic states of the swarm (i.e., all possible distributions of agents over the two tasks). The user inputs a desired stationary probability distribution π = (π_0, ⋯, π_N), π_i > 0, over the macroscopic state space. Entries π_i, i ∈ 𝒳, give the probability that the swarm allocation is (i, N − i); each mode of π (i.e., each local probability maximum) defines a desired swarm allocation (X, N − X) by virtue of representing those allocations that are most likely to be realized at any given time. The number of modes of the user input determines the particular variant of the task allocation scenario. A distribution π with one unique mode corresponds to a single, static swarm allocation; given a sequence π_1, π_2, ⋯ of such distributions, we can design swarms for sequential task allocation by including a triggering criterion for agents to change their control rules. When the user input π is a multimodal distribution, the swarm behavior requested by the user is periodic task allocation. With a certain frequency, the swarm changes the allocation of its members according to the different swarm allocations determined by the modes of π.

Agent Controllers.
We consider agents with only local perception of their environment and local agent-to-agent communication. Building on these limited capabilities, we define a recipe to enumerate finitely many different agent controllers. We achieve this by considering a template of a control rule that can be instantiated with different configurations and that allows us to enumerate different agent controllers. We abstract from any domain-specific actions that an agent would need to execute in a particular task and application.
Instead, we focus on the agent interactions and the decision-making necessary to fulfill the swarm allocations desired by the user. We consider a system where tasks are uniformly distributed in a closed environment and where agents (a) move randomly while working on either of the two tasks, (b) perceive the allocation of their neighbors, and (c) can trigger a change in allocation (one agent at a time).
Agents act stochastically and asynchronously. They repeatedly apply two control rules: self-switching and switch-or-recruit. When executing the self-switching rule, an agent changes its current allocation to the alternative task. The global effect of the self-switching rule is comparable to the spontaneous switching behavior of unreliable agents subject to internal noise [35,36]. Using the switch-or-recruit rule, an agent has a greater influence on the current swarm allocation. It can decide to either increase or decrease the number of agents allocated to a task by one unit. As a function of its current allocation and those of its neighbors, the agent either self-switches to the other task or recruits a neighbor from those with the alternative allocation. When the agent acts as a recruiter, the recruited neighbor always switches its task allocation, and it does so independently of its internal state and of its actual agent controller. That is, passively recruited agents always switch their task allocation without objection. Control rules are executed randomly by individual agents: self-switching with rate σ and switch-or-recruit with rate ρ (i.e., with probabilities p_σ = σ/(σ + ρ) and p_ρ = ρ/(σ + ρ), respectively).
All agents have only local perception of the current global task allocation. Each agent perceives the currently assigned tasks of its neighbors (i.e., agents within proximity, for example, within communication range). Note that all agents move at all times and neighborhoods are subject to change; that is, the underlying network is dynamic. Each agent knows the number N_0 of neighbors currently assigned to task 0, the number N_1 = G − 1 − N_0 of neighbors currently assigned to task 1, and its own currently assigned task, forming a set of information from G ≪ N agents (G − 1 neighbors plus the considered agent). We define agent controllers ⟨G; b⟩, b ∈ {1, ⋯, 2^(G−1)}, that differ from each other by the logical function Δ_{G,b}. We use function Δ_{G,b} to define the local task-switching behavior of an agent and to determine the global effect of the switch-or-recruit rule. Function Δ_{G,b} takes as input a group of task allocations of size G. This group includes the task allocation of the agent applying the switch-or-recruit rule and that of its G − 1 neighbor agents. Parameter b is an index that encodes a particular task-switching behavior and ranges over all possible agent controllers based on the same group size G. For a group of task allocations of size G, we have G + 1 possible group compositions. We do not assign any action to homogeneous groups (i.e., groups with either 0 or G agents allocated to task 0). Therefore, Δ_{G,b} has G + 1 possible inputs and three possible outputs (i.e., switch allocation, recruit a neighbor, no action). Moreover, since the no-action output is fixed in each agent controller, we obtain 2^(G−1) possible functions Δ_{G,b} = (Δ_1, ⋯, Δ_{G−1}). Δ_i gives the change in the number of agents allocated to task 0 when an agent applies the switch-or-recruit rule over a group of allocations that contains i ∈ {1, ⋯, G − 1} entries for task 0.
Given a particular choice of values for parameters G and b, we set Δ_i = +1 if the ith bit of b (expressed in the binary numeral system) equals 0; otherwise, we set Δ_i = −1. Table 1 shows an example of an agent controller defined by Δ_{3,1} = (+1, −1); in this case, the switch-or-recruit rule corresponds to the majority rule often used in swarm robotics research [37,38]. By enumerating all Δ_{G,b} for increasing values of G, we obtain an arbitrarily large set B = {⟨G_1; b_1⟩, ⟨G_2; b_2⟩, ⋯} of agent controllers. Note three properties that result from the controllers defined above. (a) Agent controllers are independent of the controllers of neighboring agents and only require knowledge of their task allocations. (b) The interplay between self-switching and direct recruitment eases the mixing of task allocations among agents with different controllers. (c) Due to self-switching, the resulting decision-making process is ergodic, which prevents its absorption at extreme states where all agents are allocated to one of the two tasks [39]. These properties are fundamental for the definition of our global-to-local design method because they allow us to predict the global behavior of a heterogeneous swarm.
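Under one plausible reading of this bit encoding (writing b with G − 1 bits, most significant bit first, chosen here because it makes ⟨3; 1⟩ reproduce the majority rule (+1, −1) of Table 1), the enumeration of controllers can be sketched as:

```python
def delta(G, b):
    """Decode controller index b into outcomes (Delta_1, ..., Delta_{G-1}):
    Delta_i = +1 if the i-th bit of b is 0, and -1 otherwise. We read the
    bits most-significant-first over G-1 positions (our interpretation,
    chosen so that <3; 1> yields the majority rule (+1, -1) of Table 1)."""
    bits = format(b % 2 ** (G - 1), "0" + str(G - 1) + "b")
    return tuple(+1 if bit == "0" else -1 for bit in bits)

# enumerate all controllers for group sizes G in {3, ..., 6}
controllers = [(G, b) for G in range(3, 7) for b in range(1, 2 ** (G - 1) + 1)]
```

For each G this yields the 2^(G−1) functions mentioned above, so the search space grows quickly but remains finite and fully enumerable.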
We have developed a simple microscopic multiagent simulator to validate our design method. In our simulations, agents are situated, massless points moving within an environment of 100 × 100 space units with a velocity of 2 space units per time step. We consider a time step of 0.1 seconds and, at every time step, we update the position in space of each agent in the swarm. Since agents are asynchronous, we update their task allocation only when they execute a control rule. Agents are always assumed to work on one of the two tasks and to periodically execute their agent controller. Independently of the controller ⟨G; b⟩, agents perform a random walk. They do not collide with each other but can collide with the boundaries of the environment. In the case of a collision with a boundary, the agent bounces back with a mirrored angle of incidence. When executing the switch-or-recruit rule, agents note their own task allocation and sample the task allocations of their G − 1 closest neighbors. In the following, we show the average of 10^4 simulations, each lasting 10^5 seconds.
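A minimal sketch of such a microscopic simulator is given below. It follows the description above (random walk at 2 space units per step, mirrored bounces at the boundary, sampling of the G − 1 closest neighbors), but the class and function names are ours, and some details (e.g., how a recruit is picked from the group) are simplifying assumptions rather than the paper's exact implementation.

```python
import math
import random

class Agent:
    def __init__(self, G, delta, task):
        self.x, self.y = random.uniform(0, 100), random.uniform(0, 100)
        self.angle = random.uniform(0, 2 * math.pi)
        self.G, self.delta, self.task = G, delta, task

    def move(self, v=2.0, size=100.0):
        self.x += v * math.cos(self.angle)
        self.y += v * math.sin(self.angle)
        if not 0 <= self.x <= size:              # bounce off a vertical wall
            self.angle = math.pi - self.angle
            self.x = min(max(self.x, 0.0), size)
        if not 0 <= self.y <= size:              # bounce off a horizontal wall
            self.angle = -self.angle
            self.y = min(max(self.y, 0.0), size)

def step(agents, p_sigma):
    """Move everyone, then let one random agent execute a control rule."""
    for a in agents:
        a.move()
    a = random.choice(agents)
    if random.random() < p_sigma:                # self-switching rule
        a.task = 1 - a.task
        return
    # switch-or-recruit rule over the G-1 closest neighbors
    nbrs = sorted((n for n in agents if n is not a),
                  key=lambda n: (n.x - a.x) ** 2 + (n.y - a.y) ** 2)[:a.G - 1]
    n0 = sum(1 for m in [a] + nbrs if m.task == 0)
    if 0 < n0 < a.G:                             # homogeneous group: no action
        target = 0 if a.delta[n0 - 1] == +1 else 1
        if a.task != target:
            a.task = target                      # self-switch
        else:
            pool = [n for n in nbrs if n.task != target]
            random.choice(pool).task = target    # recruit a neighbor
```

A heterogeneous swarm is then just a list of `Agent` objects instantiated with different `(G, delta)` pairs in the proportions given by a swarm composition.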

Basis Vectors.
For each agent controller, we use a discrete-time Markov chain {X(t) ∈ 𝒳 : t ⩾ 0} to describe the global dynamics of a homogeneous swarm of N agents executing the same controller. In the derivation of the Markov chain, we assume for simplicity that, at each time step, exactly one agent in the swarm executes a control rule. Note that this assumption does not lead to a slowdown of the allocation dynamics nor to any other loss in performance. Since agents act in real time (which is continuous) and are not synchronized, the probability of two or more concurrent executions of control rules by different agents is zero (given the time step of 0.1 seconds set in our multiagent simulations and the chosen values of parameters σ and ρ, we rarely observed concurrent executions of control rules by two or more agents). The discretization of time in terms of the number of control rule executions allows us to simplify our mathematical derivations without introducing approximations. In Section 3.2, we provide means to recover time as a continuous quantity from the number of control rule executions. A direct consequence of this assumption is that the number X of agents in the swarm allocated to task 0 changes by Δ ∈ {+1, 0, −1} units per time step. Therefore, the resulting Markov process is described by a tridiagonal matrix P_{G,b} of size (N + 1) × (N + 1). For a given agent controller ⟨G; b⟩ with function Δ_{G,b} = (Δ_1, ⋯, Δ_{G−1}), the transition matrix P_{G,b} is defined by Equations (1)-(3). Probability P_{G,b}(X, X + 1) (Equation (1)) models the transition X ⟶ X + 1 of gaining one more agent assigned to task 0. It is the sum of two contributions: the probability that an agent currently allocated to task 1 self-switches its allocation to task 0, and the probability that any agent increases the value of X by applying the switch-or-recruit rule with a change Δ_k = +1.
Occurrences of particular group compositions (i.e., the current task allocations of a considered agent and its current neighbors, forming a set of size G) are modeled using the hypergeometric distribution. Equation (2) models the transition X ⟶ X with no effect. P_{G,b}(X, X) results from those agents that do not execute any action as a result of applying the switch-or-recruit rule over a homogeneous group with either 0 or G allocations for task 0. Equation (3) is derived similarly to Equation (1).
Equations (1)-(3) define an ergodic Markov chain P_{G,b}. Based on P_{G,b}, we derive a basis vector e_{G,b} that gives the expected change e_{G,b}(X) of the swarm allocation X resulting from the next agent executing a control rule. We obtain

e_{G,b}(X) = P_{G,b}(X, X + 1) − P_{G,b}(X, X − 1). (4)

Given a system state X, the function e_{G,b}(X) returns the expected change E[X(t + 1) − X(t) | X(t) = X]. We obtain an arbitrarily large space E by considering all basis vectors e_{G,b}, b ∈ {1, ⋯, 2^(G−1)}, for groups of increasing sizes G. For intervals on 𝒳 with e_{G,b}(X) > 0, the transient behavior of the swarm drives the task allocation process towards the extreme allocation on the right, X = N. For intervals on 𝒳 with e_{G,b}(X) < 0, there is a push to the left, towards X = 0. Zeros e_{G,b}(X) = 0 would indicate stable and unstable fixed points in a deterministic system but need to be interpreted here as random dynamical attractors [40] due to the system's stochasticity. These points represent task allocations X that either attract or repel the swarm allocation process. Attraction points identify the modes of the stationary distribution π_{G,b} of P_{G,b}. Basis vectors are pairwise symmetric with each other around p_σ(1 − X/N), X ∈ 𝒳; that is, for each b, there exists b′ such that e_{G,b} = p_σ(1 − X/N) − e_{G,b′}. Therefore, an equal number of agents with controllers ⟨G; b⟩ and ⟨G; b′⟩ cancel each other's effect of the switch-or-recruit rule and leave only the contribution of the self-switching rule.
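Because the chain is tridiagonal (a birth-death chain, hence reversible), both the basis vector of Equation (4) and the stationary distribution follow directly from the off-diagonal entries. A sketch, where `p_up` and `p_down` stand in for the entries given by Equations (1) and (3), which are not reproduced here; the linear rates in the example are purely illustrative, not one of the paper's controllers:

```python
def basis_vector(p_up, p_down):
    """Expected one-step change e(X) = P(X, X+1) - P(X, X-1), as in Equation (4)."""
    return [u - d for u, d in zip(p_up, p_down)]

def stationary(p_up, p_down):
    """Stationary distribution of the ergodic birth-death chain, via
    detailed balance: pi[X+1] = pi[X] * P(X, X+1) / P(X+1, X)."""
    pi = [1.0]
    for X in range(len(p_up) - 1):
        pi.append(pi[-1] * p_up[X] / p_down[X + 1])
    total = sum(pi)
    return [p / total for p in pi]

# illustrative example: linear rates pushing the allocation towards X = N/2
N = 4
p_up = [0.3 * (N - X) / N for X in range(N + 1)]    # P(X, X+1)
p_down = [0.3 * X / N for X in range(N + 1)]        # P(X, X-1)
e = basis_vector(p_up, p_down)
pi = stationary(p_up, p_down)
```

In this toy example, e(X) > 0 for X < N/2 and e(X) < 0 for X > N/2, so X = N/2 acts as an attraction point and is the mode of the stationary distribution, matching the interpretation given above.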

Response Vector.
In our global-to-local design method, we obtain the response vector y, which represents the expected change of the user-desired swarm, from the stationary distribution π (see Section 2.2). We first construct a Markov chain P_y that converges to π itself and then compute y from P_y with Equation (4).
The stationary distribution π of an ergodic Markov chain with transition matrix P can be uniquely determined by solving the system of equations πP = π [41]. The inverse problem, however, is less trivial, and its solution is in general not unique. In the case of our tridiagonal transition matrix, this problem implies the exploration of a manifold {P} characterized by 2N dimensions. This number of dimensions is due to the sparse structure of tridiagonal matrices and to the fact that transition matrices are row-stochastic (i.e., row entries are nonnegative and sum up to 1). As a consequence, in order to construct our response vector y, we need to find a set of 2N additional constraints.

Table 1: Example of the logical function Δ_{G,b} with G = 3 and b = 1. Symbol a gives the current task allocation of the focal agent, N_0 is the number of neighbors allocated to task 0, and (Δ_1 = +1, Δ_2 = −1) defines the outcome of Δ_{G,b} (in this case, a majority rule). Symbol "-" represents no action.
The stationary distribution π defined by the user imposes a set of N + 1 linear constraints on this manifold through the equation

π_i = ∑_{j∈𝒳} π_j P(j, i), i ∈ 𝒳. (5)

The intuitive interpretation is that the probability π_i of state i has to be the sum of all influxes from any state j to i (including i = j). Due to the linear relation ∑_{i∈𝒳} π_i = 1, one of these constraints is redundant, and the stationary distribution π reduces the number of dimensions of {P} from 2N to N. Therefore, a general transition matrix P_y that converges to π can be parameterized by N constant values referred to as ψ = (ψ_1, ⋯, ψ_N). By constraining the transition matrix P_y to be row-stochastic, we obtain the set of inequalities in Equation (6). Any choice of values for the parameters ψ that satisfies this set of inequalities defines a transition matrix P_y that satisfies πP_y = π. Since probabilities π_i, i ∈ 𝒳, are nonnegative by definition, all entries in the parameter vector ψ can always be chosen sufficiently small to satisfy the set of inequalities in Equation (6). Using Equation (6), we have obtained N of the 2N constraints necessary to determine a transition matrix P_y that asymptotically converges to π.
In order to uniquely determine a transition matrix P_y, we still require N additional constraints. By inspecting Equation (2), we see that all agent controllers ⟨G; b⟩, b ∈ {1, ⋯, 2^(G−1)}, have equal diagonal entries P_{G,b}(X, X). Furthermore, the probabilities P_{G,b}(X, X) converge for increasing group sizes G, as indicated by the example group sizes G ∈ {2, ⋯, 15} shown in Figure 2(b). This implies that, by making a simple initial guess for parameters G, ρ, and σ, we can easily impose an additional set of N + 1 linear constraints and uniquely determine a matrix P_y. As we will see in the following, this initial guess of parameters is not binding and can be revised during the application of the method.
For a desired stationary distribution π and initial parameters G, ρ, and σ, we can solve πP_y = π and obtain the transition matrix P_y. The solution of the system of equations is subject to two constraints: the diagonal entries of P_y are constant and equal to diag(P_y) = diag(P_{G,b}) (for any choice of b ∈ {1, ⋯, 2^(G−1)}), and all rows of P_y are nonnegative and sum up to 1. Since the first and last rows of P_y have only two nonzero entries, these two constraints suffice to compute P_y(0, 1) and P_y(N, N − 1). We compute all remaining entries P_y(X, X − 1) and P_y(X, X + 1) recursively following the sequence given by Equations (7)-(10).
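This recursion can be sketched as follows. Since Equations (7)-(10) are not reproduced here, we derive the entries from the row-sum constraint together with the flux balance π_X P_y(X, X+1) = π_{X+1} P_y(X+1, X), which any tridiagonal chain with stationary distribution π satisfies; the function name is ours, and `diag` stands for the constant diagonal taken from P_{G,b}.

```python
def transition_from_pi(pi, diag):
    """Recover the off-diagonals of a tridiagonal, row-stochastic P_y with
    stationary distribution pi and prescribed diagonal diag, using the
    birth-death flux balance pi[X] * up[X] = pi[X+1] * down[X+1], where
    up[X] = P_y(X, X+1) and down[X] = P_y(X, X-1)."""
    N = len(pi) - 1
    up, down = [0.0] * (N + 1), [0.0] * (N + 1)
    up[0] = 1.0 - diag[0]           # first row has only two nonzero entries
    for X in range(1, N + 1):
        down[X] = pi[X - 1] * up[X - 1] / pi[X]
        if X < N:
            up[X] = 1.0 - diag[X] - down[X]
    # last-row consistency: down[N] must equal 1 - diag[N]
    assert abs(down[N] - (1.0 - diag[N])) < 1e-8
    return up, down

# symmetric example: pi = Binomial(4, 1/2) with constant diagonal 0.4
pi = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
up, down = transition_from_pi(pi, [0.4] * 5)
y = [u - d for u, d in zip(up, down)]   # response vector, as in Equation (4)
```

If the initial guess of G, ρ, and σ makes some computed entry negative or the last row inconsistent, the guess has to be revised, in line with the remark above that it is not binding.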

Finally, the response vector y is obtained from P_y by computing its expected change as in Equation (4).

Regression Problem.
Starting from an arbitrary set B = {⟨G_1; b_1⟩, ⟨G_2; b_2⟩, ⋯} of agent controllers, Equation (4) allows us to define our search space using a matrix E whose columns are the transposed basis vectors e_{G,b}, ⟨G; b⟩ ∈ B. The response vector y is derived from the stationary distribution π using Equations (7)-(10) and (4). In order to determine our swarm composition C, we need to find a column vector β of regression coefficients that satisfies

y = Eβ. (11)

Coefficients β_i are required to form a conical combination. Therefore, we require β_i ⩾ 0, so that c_i ≃ Nβ_i results in a nonnegative number of agents with controller ⟨G_i; b_i⟩.
In general, the accuracy of a solution to the regression problem in Equation (11) increases with the number of basis vectors whose coefficient β i is greater than zero (i.e., a greater number of involved basis vectors helps to fine-tune the result). However, that would mean to use many different agent controllers in rather small subpopulations. This increased heterogeneity would, for example, complicate production and handling of robot swarms. More importantly, it might compromise the robustness of the designed swarm. In fact, a swarm composition based on many different agent controllers is more affected by agent failures because each agent controller is likely to be represented by only a few agents in the swarm. As a result, in the case of agent failures, the actual swarm composition might soon depart from the designed one. In contrast, a swarm with few agent controllers but big subpopulations for each suffers less from the loss of agents because these losses are more likely to be homogeneously distributed across agent controllers. The swarm is more robust as it will still approximately allocate its agents as specified by the user input. To design for robustness, we seek to maximize the number of agents using each of the selected agent controller and therefore to minimize the number of different agent controllers used in the designed swarm composition.
We therefore search for a solution that minimizes the number of nonzero coefficients β_i. An ideally suited method for this objective is to define the regression problem as a lasso problem [34] with positivity constraints:

β̂ = argmin_{β_i ⩾ 0} (‖y − Eβ‖₂² + λ‖β‖₁). (12)

The first summand implements the actual minimization of the approximation error. The regularization coefficient λ in the second summand determines the weight of the ℓ1-penalization term and controls the sparsity of the solution β̂. Given a solution β̂ of the lasso problem (12), we normalize each coefficient β_i according to β_i ← β_i / ∑_{j=1}^m β_j so that the coefficients β_i sum to 1 and satisfy the physical conservation of the swarm size. The final swarm composition C is obtained by computing the number c_i of agents with controller ⟨G_i; b_i⟩ as c_i = Nβ_i and rounding these values in order to have integer numbers of agents for each controller and a swarm of any desired size N.
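A minimal, dependency-free sketch of this step is given below: a pure-Python coordinate descent for the nonnegative lasso. In practice an off-the-shelf lasso solver with positivity constraints would be used; the function and variable names here are ours, and the two-column example is a toy, not one of the paper's basis-vector matrices.

```python
def nonneg_lasso(E, y, lam=1.0, iters=200):
    """Coordinate descent for min ||y - E b||^2 + lam * ||b||_1 with b >= 0
    (under b >= 0 the l1 term reduces to lam * sum(b)). E is a list of
    columns (basis vectors); the result is normalized to sum to 1."""
    m = len(E)
    beta = [0.0] * m
    for _ in range(iters):
        for j, col in enumerate(E):
            # residual with column j's contribution removed
            r = [yi - sum(E[k][i] * beta[k] for k in range(m) if k != j)
                 for i, yi in enumerate(y)]
            rho = sum(c * ri for c, ri in zip(col, r))
            norm = sum(c * c for c in col)
            beta[j] = max(0.0, (rho - lam / 2) / norm)
    total = sum(beta)
    return [b / total for b in beta] if total else beta

# toy example with two orthogonal "basis vectors"
E = [[1.0, 0.0], [0.0, 1.0]]
beta = nonneg_lasso(E, [0.8, 0.2], lam=0.01)
c = [round(100 * b) for b in beta]   # agents per controller for N = 100
```

The normalization enforces ∑β_i = 1 so that c_i = Nβ_i conserves the swarm size; increasing `lam` drives more coefficients to exactly zero, trading approximation accuracy for fewer controller types, as discussed above.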

Results
We apply our method to design heterogeneous swarms for both unimodal and multimodal user inputs π. As discussed above, heterogeneous swarms formed by many different agent controllers might not be robust to failures. Hence, we minimize the number of agent controllers and give priority to the robustness of the designed solution; that is, we prefer qualitative over quantitative accuracy in the approximation of π. In the following, we design swarms with N = 100 agents. Since π is independent of the magnitudes of ρ and σ and only depends on the probabilities p_ρ and p_σ, we set ρ = 1 and vary σ in [0, 1]. In the multiagent simulations, ρ and σ are divided by a factor of 100.

Unimodal User Input.
Figures 3(a) and 3(b) show the results of the proposed method applied to a unimodal user input. The red solid line in Figure 3(a) represents the user input π, which defines the desired allocation (25, 75). From π, we derive a response vector y by first constructing an equivalent Markov chain as in Equations (7)-(10) and successively applying Equation (4). We initially set the parameters to G = 6 and σ = 0.1. The resulting response vector y (red solid line in Figure 3(b)) shows sudden jumps for values of X ∈ [10, 40]. These jumps can be reduced by tuning the initial values of G and σ. However, we observe that tuning is not necessary and might even worsen the accuracy of our design method.
We consider asymmetric agent controllers for G ∈ {3, ⋯, 6} and solve the lasso problem (12) for λ = 1. We obtain the swarm composition C_1 = {(⟨6, 7⟩, 39), (⟨6, 11⟩, 5), (⟨6, 15⟩, 56)}, which consists of three agent controllers with G = 6. Due to the requirement of sparsity, the expected change ŷ_fitted computed from C_1 using the Markov chain does not accurately fit the response vector y (see Figure 3(b)). This also applies to the expected change ŷ_agent that we measured empirically in multiagent simulations. Nevertheless, the designed solution fulfills the essential requirements, such as the zeros and the signs of y in the region of interest ([10, 40]). This suffices to design a composition C_1 that closely meets the user input, as shown in Figure 3(a) by the distribution π̂_fitted predicted using the Markov chain model (blue circles) and the distribution π̂_agent resulting from multiagent simulations (histograms).
Similarly to the solution proposed by Berman et al. [10], our method can also be used to implement sequential task allocation. Let us consider a series of user inputs π_1, ⋯, π_k. By applying our method to each user input, we can derive a set of swarm compositions {C_1, ⋯, C_k}. Individual agents in the swarm could be programmed to change their controller over time according to {C_1, ⋯, C_k}. Depending on the scenario, the change of agent controllers can be coupled to external signals broadcast by the designer, a predefined time schedule, or changing environmental cues. We performed a simple experiment where the agents in the swarm change their agent controller after a certain predefined time. Initially, the swarm is required to allocate its agents around the swarm allocation (25, 75), as specified by the distribution π_1 = π given in Figure 3(a), and uses the swarm composition C_1^1 = {(⟨6, 7⟩, 39), (⟨6, 11⟩, 5), (⟨6, 15⟩, 56)}. In a second time period, the swarm is required to change the distribution. The second distribution π_2 over swarm allocations (not shown here) defines the swarm allocation (75, 25) and is obtained by the swarm composition C_1^2 = {(⟨6, 19⟩, 59), (⟨6, 20⟩, 41)}. A video recording of this simulation is provided in Movie 1 (see supplementary material). Agents are initialized using the swarm composition C_1^1 and readily converge to π_1. At time t = 500 seconds, the agents in the swarm change their agent controllers from the initial swarm composition C_1^1 to the second swarm composition C_1^2. Soon after the change of agent controllers, the swarm updates its allocation and converges to π_2.
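A minimal sketch of such a time-scheduled controller switch, using the two compositions from this experiment; the helper functions and the schedule data structure are our own illustration, not the simulation code used in the paper.

```python
# Compositions from the experiment: key = controller <G, b>, value = agent count.
C1 = {(6, 7): 39, (6, 11): 5, (6, 15): 56}
C2 = {(6, 19): 59, (6, 20): 41}
SCHEDULE = [(0.0, C1), (500.0, C2)]   # switch agent controllers at t = 500 s

def expand(composition):
    """Per-agent controller list implied by a composition."""
    agents = []
    for ctrl, count in composition.items():
        agents.extend([ctrl] * count)
    return agents

def controllers_at(t, schedule=SCHEDULE):
    """Controller assignment for all agents that is active at time t."""
    active = schedule[0][1]
    for t_switch, comp in schedule:
        if t >= t_switch:
            active = comp
    return expand(active)
```

Before t = 500 s, every agent runs a controller drawn from the first composition; afterwards, all 100 agents are reassigned according to the second one.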

Multimodal User Input.
A multimodal stationary distribution π defines a task allocation scenario characterized by multiple swarm allocations (X_1, N − X_1), ⋯, (X_k, N − X_k), one for each mode of π. The result of such a user input is a swarm that periodically switches between different swarm allocations. Contrary to the above-discussed case of sequential tasks, switches between pairs of swarm allocations are stochastic and characterized by a certain mean period of time (see [42] for a bimodal example). Thus, multimodal user inputs define a periodic task allocation scenario. Figures 4(a) and 4(b) show an example application of our method to a bimodal user input. The red solid line in Figure 4(a) defines a scenario with two swarm allocations: (30, 70) and (70, 30). The response vector y (red line in Figure 4(b)) has been derived using the initial parameters G = 6 and σ = 0.575. We consider all agent controllers ⟨G, b⟩ resulting from group sizes G ∈ {3, ⋯, 9} and solve the lasso problem for λ = 3. The solution of Equation (12) yields the swarm composition C_2 = {(⟨9, 78⟩, 92), (⟨9, 141⟩, 4), (⟨9, 207⟩, 4)} characterized by 3 agent controllers with group size G = 9. Compared to the unimodal scenario, we increased the value of the regularization parameter λ to obtain a sparse solution β̂. Both the stationary distribution π̂_fitted predicted using the Markov chain and the distribution π̂_agent computed from multiagent simulations (shown in Figure 4(a)) qualitatively match the user input π, with only a small deviation of the distribution around X ∈ [45, 55]. A video recording of a multiagent simulation of the bimodal scenario can be found in Movie 2 (see supplementary material).
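The predicted distribution over swarm allocations is the stationary distribution of the designed Markov chain. As a hedged sketch (with a stand-in 3-state transition matrix rather than the paper's actual chain), it can be extracted as the left eigenvector of the transition matrix for eigenvalue 1:

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of row-stochastic P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    v = np.abs(v)
    return v / v.sum()

# Stand-in chain whose stationary distribution is bimodal: the two outer
# states are "sticky", so probability mass piles up at both extremes.
P = np.array([[0.95, 0.05, 0.00],
              [0.50, 0.00, 0.50],
              [0.00, 0.05, 0.95]])
pi = stationary_distribution(P)   # modes at the two outer states
```

For this toy chain, detailed balance gives π = (10/21, 1/21, 10/21): two modes, mirroring how a bimodal user input makes the swarm dwell near two allocations and stochastically switch between them.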
Additionally, the user might also express requirements on the mean switching time T_{X_1→X_2}, that is, the time necessary for the swarm to reallocate its agents from (X_1, N − X_1) to (X_2, N − X_2). Using the Markov chain model resulting from C_2, we can compute the mean and the variance of the number of control rule executions necessary for this purpose. By multiplying these statistics by the mean duration ρ_N^{-1} + σ_N^{-1} between the execution of two control rules, we obtain T_{30→70} as a function of the rates ρ and σ; that is, we recover the time in its continuous form from the discrete number of executions of control rules. Figure 4(c) shows the prediction of the Markov chain (T_{30→70}^{model}, shaded area) compared to multiagent simulations (T_{30→70}^{agent}, box plots) when σ = 0.575ρ and ρ^{-1} ∈ [25, 420]. We obtain a good agreement of both the means (dashed line versus diamond symbols) and the variances of the two models.
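The expected number of control-rule executions to first reach a target allocation can be computed from the chain by solving a linear system over the non-target states. The sketch below is illustrative (a toy 3-state matrix and a generic step_duration argument standing in for the mean time between executions) and covers the mean only, not the variance:

```python
import numpy as np

def mean_hitting_steps(P, start, target):
    """Expected number of steps for the chain P to first reach state `target`."""
    n = P.shape[0]
    keep = [i for i in range(n) if i != target]
    Q = P[np.ix_(keep, keep)]          # transitions among non-target states
    # Solve (I - Q) h = 1 for the vector of expected hitting times.
    h = np.linalg.solve(np.eye(len(keep)) - Q, np.ones(len(keep)))
    return h[keep.index(start)]

def mean_switching_time(P, start, target, step_duration):
    """Continuous-time estimate: expected steps times mean duration per step."""
    return mean_hitting_steps(P, start, target) * step_duration
```

For the toy chain P = [[0.5, 0.5, 0], [0, 0.5, 0.5], [0, 0, 1]], the expected number of steps from state 0 to state 2 is 4, which a hand calculation with the one-step recurrence confirms.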
Finally, we apply our global-to-local design method to a trimodal user input π. The stationary distribution π (red line in Figure 4(d)) defines a task allocation scenario where the swarm alternates its workforce among three possible allocations: (10, 90), with the majority of agents working on task 1; (50, 50), with agents equally allocated to both tasks; and (90, 10), with the majority of agents working on task 0. We compute the response vector y using the initial parameters G = 5 and σ = 0.2 (data not shown). We define the minimization problem (12) by considering only asymmetric agent controllers with G = 7 (i.e., the first 32 basis vectors). The solution of the lasso problem for λ = 1 gives the swarm composition C_3 = {(⟨7, 22⟩, 56), (⟨7, 26⟩, 44)}. The distribution π̂_fitted predicted using the Markov chain and the distribution π̂_agent resulting from simulations qualitatively match the user requirements (blue circles and histograms).

Discussion
We have shown that our method can be used to design swarms that allocate their agents as defined by a user input. The user input is a stationary probability distribution over swarm allocations and defines the probability of any possible swarm allocation. Our method allows the user to specify scenarios with a single swarm allocation using a unimodal distribution and scenarios where the swarm alternates between different swarm allocations using a multimodal distribution. A unimodal stationary distribution π defines a task allocation scenario with a single swarm allocation (X, N − X). In principle, this scenario could be tackled by a static assignment of agents to each of the two tasks. However, such an approach is not robust to individual agent failures. Consider, for example, a collective construction scenario where task 0 and task 1 require, respectively, digging and removing the excavation material from a construction site. Due to the workload disparity between tasks, agents are likely to experience uneven failure rates. Over time, the swarm might significantly depart from the desired allocation (X, N − X). Without complete knowledge of individual agent failures, a designer would be prevented from restoring the initial static allocation (e.g., by deploying new agents). Our design method is robust to such situations. Since agents repeatedly switch tasks, the workload is shared equally among agents. Agents are thus equally subject to wear as well as failures, and the desired proportions of agents with each controller are preserved. Over the system's lifetime, an operator can add new agents with the same proportions of agent controllers as originally designed to counter degrading swarm performance. Note that the addition of new agents can even be used as a means to reprogram the swarm behavior by considering the original swarm composition as an additional constraint in our design method.
As discussed in Section 3.1, we can use a chronological sequence of unimodal distributions to design sequential task allocation scenarios. This is achieved by letting agents change their agent controllers over time and results in a swarm that switches from one swarm allocation to the next in the sequence. Our method can be used to design a swarm composition for each distribution in the sequence. However, this approach to sequential task allocation requires agents with a mechanism (e.g., based on an external signal or fixed time scheduling) that triggers changes of agent controllers.
Alternatively, the user might provide a multimodal distribution as input. In this case, with a single swarm composition that does not change over time, we obtain a swarm behavior that naturally oscillates between the swarm allocations defined by the modes of the user input. This type of self-organizing swarm behavior is similar to the one investigated by Silk et al. [43] for the design of self-organizing networks. Periodic task allocation offers an alternative approach to implement sequential task allocation. It might be useful in extreme applications where hardware limitations prevent agents from perceiving external signals or from being programmable (e.g., hard-wired controllers in nanorobotics applications [44]). This alternative approach to sequential task allocation could be useful, for example, to increase the penetration of nanobots into tumors [45,46]. Nanobots with multifunctional capabilities (e.g., sensing, imaging, and therapy) [47] could be designed to initially perform tissue penetration and diagnosis and later to deliver the drugs in their payloads.
Through the example application of task allocation, we introduced the idea that swarms that achieve user-specified objectives can be designed by leveraging behavioral heterogeneity. This idea was inspired by the concept of population coding from the neurosciences [24], where a population of neurons performs a function approximation by combining different heterogeneous contributions. Leveraging behavioral heterogeneity substantially differs from most existing global-to-local design approaches. Largely focused on self-assembly [14] and formation control problems [13], existing methods tend to be tailored to their respective applications. The method proposed by Klavins, for example, makes use of graph grammars and optimizes their execution rates to design a system that self-assembles into simple controlled shapes [15]. Rubenstein et al. [16] proposed a distributed algorithm for self-assembly and experimented with a swarm of more than a thousand robots. In their study, robots are given a blueprint of the desired shape and follow only local cues to incrementally position themselves according to the blueprint. In collective construction, Werfel et al. [17] propose a compilation method that decomposes a user-specified structure into a set of construction paths that robots follow to build a desired artifact. Similarly to our approach, all these studies use optimization methods to explore a (possibly constrained) design space.
The potential of behavioral heterogeneity has been previously investigated in an aggregation scenario [48]. In this study, the authors show, by means of evolutionary computation techniques, that heterogeneous swarms can outperform their homogeneous counterparts. Our method has some similarities with the approach used by Hamann et al. [49] to analyze collective motion in locust swarms; they use a linear combination of polynomials to fit a network model to macroscopic measurements of simulations. In their approach, the regression coefficients provide information about the spatial distribution of agents in the swarm. We have previously published an approach that is conceptually similar to the one we present here [50,51]. One of the main differences is that we used evolutionary computation to select controller types instead of the more sophisticated and efficient optimization technique used in this paper. In addition, we also made use of simulations to estimate global effects in contrast to the formal approach we propose in this study.
In future work, we plan to extend the method and apply it to task allocation scenarios with more than two tasks. This extension will require the definition of other linear constraints, in addition to those defined in Section 2.5, that are necessary to uniquely derive a response vector from the user input. This could be achieved, for example, by considering different priorities among the tasks to be executed. We note that the total number of agent controllers is an exponential function of the number of tasks. However, penalized regression techniques allow us to consider high-dimensional search spaces and to investigate a reasonable range of application scenarios. We also plan to perform a thorough algebraic characterization of our basis vectors and response vectors with the aim of improving the performance of the design method. We believe that our design idea of behaviorally heterogeneous agents has potential for a wider range of applications beyond task allocation. Our primary goal is therefore to deepen our understanding of the fundamental principles of behavioral heterogeneity. We want to extend our approach to many different swarm scenarios, such as collective decision-making and spatially organizing tasks.
We believe that our proposed approach is a fundamentally novel paradigm for the design of robot swarms and that the idea of programming the swarm at a global level by following a recipe that describes how to put together the right amounts of different robot controller types, almost as if they were ingredients of a cake, is particularly intriguing. With our approach, the swarm can be reprogrammed on a global level at runtime by adding robots of different robot controller types, without the need for the individual robots to be programmable.