Global-to-Local Design for Self-Organized Task Allocation in Swarms

Programming robot swarms is hard because system requirements are formulated at the swarm level (i.e., globally) while control rules need to be coded at the individual robot level (i.e., locally). Connecting the global to the local level, or vice versa, through mathematical modeling to predict the system behavior is generally considered the grand challenge of swarm robotics. We propose to approach this problem by programming directly at the swarm level. Key to this solution is the use of heterogeneous swarms that combine appropriate subsets of agents whose hard-coded behaviors have known global effects. Our novel global-to-local design methodology allows us to compose heterogeneous swarms for the example application of self-organized task allocation. We define a large but finite number of local agent controllers and focus on the global dynamics of behaviorally heterogeneous swarms. The user inputs the desired global task allocation for the swarm as a stationary probability distribution of agents allocated over tasks. We provide a generic method that implements the desired swarm behavior by mathematically deriving appropriate compositions of heterogeneous swarms that approximate these global user requirements. We investigate our methodology over several task allocation scenarios and validate our results with multi-agent simulations. The proposed global-to-local design methodology is not limited to task allocation problems and can pave the way to formal approaches for the design of other swarm behaviors.


A primary challenge that complicates the spread of applications of large collections of embodied agents [1,2] is how to design individual agent controllers for a given desired collective behavior. The canonical, local-to-global approach [3] includes a trial-and-error refinement of individual agent control rules followed by a macroscopic analysis of the resulting swarm behaviors [4,5] or a formal verification of specific properties of interest [6][7][8][9]. Designing agent controllers for a target swarm behavior in a non-iterative way, without continuous refinements, has proven challenging.

Related approaches to task allocation rely on feedback gathered from the environment by individual agents [11,12]. Agents keep track of the number of successfully completed tasks and report this information to a centralized authority (called the hive) that, in turn, updates the parameters of their stochastic control policy. Differently, our design approach provides a completely self-organized solution that does not require any centralized authority.

We consider a simple scenario where a user wants to design a swarm that allocates its members to only two tasks. Despite the simplicity of this task allocation problem, a few variations are possible. Let's say we have task 0 and task 1 and we want to design a swarm with 80% of agents working on task 0 and 20% working on task 1. In a trivial approach we could statically assign agents to tasks before deployment. However, it is generally beneficial to require the swarm to allocate agents to tasks after deployment so as to increase robustness to individual agent failures. Agents have only local perception and cannot accurately estimate the number of agents currently assigned to either task. Hence, the swarm behavior is inherently stochastic.
Even for a good design of the agent controller, we can only hope to have 80% of the agents assigned to task 0 on average over time, due to the variance introduced by each agent's limited accuracy in assessing the current state of the swarm. A variant of this scenario arises when the user wants to define the variance of the swarm allocation (i.e., increasing it over the accuracy-limited value), for example, to increase the swarm's potential for exploration over exploitation. Another variant is sequential task allocation. For instance, we initially want the 80/20% allocation as above, followed in a later phase by a 30/70% allocation, possibly triggered by external factors. For example, in a surveillance task a swarm may need to monitor the inside and outside of a facility, allocating agents in different proportions during the day and night. Yet another variant is periodic task allocation, where we allow the swarm to autonomously decide when and how often to switch from 80/20% to 30/70%.

In more formal terms, a swarm allocation corresponds to a partitioning of the agents into two working groups, one for each of the two tasks. The user provides a description of the desired swarm allocations as a probability distribution over the space of all possible swarm allocations. To define a specific swarm behavior, the user manipulates the number and positions of the distribution's modes (i.e., local probability maxima), with each mode corresponding to a target swarm allocation (e.g., as above with the two modes 80/20% and 30/70%). The user can specify a static task allocation scenario by means of a unimodal distribution; a sequential task allocation scenario is defined through a sequence of such distributions.

Figure 1: Finite state machine of the agent controller. The agent controller is a probabilistic finite state machine with states task 0 and task 1 and transitions T_{0→1} and T_{1→0} for the simplistic two-task allocation.
Transition conditions T_{0→1} and T_{1→0} are defined by the respective agent controller type and need to depend on locally measurable features only.
We regain that freedom at the global level by composing heterogeneous swarms with wisely chosen doses of several agent controller types. For these predefined local agent controllers, we know their global swarm effect, which we model via the above-mentioned basis vectors. By appreciating the probabilistic nature of swarms, we can model individual behaviors using probabilistic finite state machines (PFSM), generalizing our approach to a wide range of scenarios representable by PFSM, and similarly understand global swarm behavior via population models. We perceive the swarm as a stochastic dynamical system that makes probabilistic autonomous decisions switching between global states. We define an arbitrarily large number of agent controllers, that is, sets of predefined control rules. For each agent controller, we derive a basis vector that models its global-level contribution to the swarm dynamics. Specifically, each basis vector describes the transient dynamics of a homogeneous swarm where all agents run the same controller. The probability distribution over swarm allocations given by the user as input defines the desired asymptotic behavior of the swarm. From this input, we mathematically derive a response vector that describes the desired transient dynamics of the swarm. These transient dynamics are such that the swarm asymptotically converges to the stationary distribution input by the user. We then use the response vector as a reference to select the necessary agent controllers through a linear combination of our initial set of basis vectors. Finally, we systematically search for a proper composition of a heterogeneous swarm by estimating the coefficients in a lasso regression [34] between the response vector and a linear combination of basis vectors.
We use penalized regression to limit the set of selected controller types to the few that are indeed required (i.e., basis vectors with strictly positive coefficients) and use the values of the coefficients to define the proportions of agents executing each of the selected controllers.

We build on the idea of behavioral heterogeneity to define a global-to-local design method for ST-MR task allocation problems. We consider the problem of designing a swarm of N agents that allocates its members to a pair of tasks (task 0 and task 1) as defined by a user input. The user input, formally defined in Sec. 2.2, prescribes a desired swarm allocation by means of a stationary probability distribution π defined over the space of all possible allocations of N agents to 2 tasks. We leverage the degrees of freedom that can be gained at the global level by mixing different agent controllers at the local level. Contrary to local-to-global approaches that manually explore a possibly infinite space of design solutions to obtain a single agent controller for a homogeneous swarm, we restrict our design space to a large but finite number of alternatives and systematize our search.

Let (X, N − X) represent a swarm allocation, where X ∈ X is the number of agents allocated to task 0 (respectively, N − X to task 1), and X = {0, 1, . . . , N} is the set of all possible macroscopic states of the swarm (i.e., all possible distributions of agents over the two tasks). The user inputs a desired stationary probability distribution π = (π_0, . . . , π_N), π_i > 0, over the macroscopic state space.
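As a concrete illustration, such a user input can be any strictly positive distribution over X. A minimal sketch of a bimodal target like the 80/20% and 30/70% example from the introduction, built as a mixture of two binomials (the binomial components and the names `binom_pmf` and `bimodal_pi` are illustrative assumptions, not the paper's prescription):

```python
import math

def binom_pmf(n, p, k):
    """Binomial probability mass function."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def bimodal_pi(N=100, p1=0.8, p2=0.3, w=0.5):
    """A strictly positive, bimodal target distribution pi over the
    macroscopic states X = {0, ..., N}: a mixture of two binomials with
    modes near N*p1 and N*p2 (the 80/20% and 30/70% allocations)."""
    pi = [w * binom_pmf(N, p1, k) + (1 - w) * binom_pmf(N, p2, k)
          for k in range(N + 1)]
    total = sum(pi)
    return [v / total for v in pi]   # normalize: entries sum to 1

pi = bimodal_pi()   # modes sit at X = 80 and X = 30
```

Since 0 < p1, p2 < 1, every entry π_i is strictly positive, as required above.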

Entries π_i, i ∈ X, give the probability that the swarm allocation is (i, N − i); each mode of π (i.e., a local probability maximum) defines a desired swarm allocation (X, N − X) by virtue of representing those allocations that are most likely to be realized at any given time. The number of modes of the user input determines the particular variant of the task allocation scenario. A distribution π with one unique mode corresponds to a single and static swarm allocation; given a sequence π_1, π_2, . . . of such distributions, we can design swarms for sequential task allocation by including a triggering criterion for agents to change their control rules. When the user input π is a multimodal distribution, the swarm is designed to switch autonomously between the allocations defined by the modes.

Agent controllers
We consider agents with only local perception of their environment and local agent-to-agent communication. Building on these limited capabilities, we define a recipe to enumerate finitely many different agent controllers. We achieve this by considering a template of a control rule that can be instantiated with different configurations and that allows us to enumerate different agent controllers. We abstract from any domain-specific actions that an agent would need to execute in a particular task and application. Instead, we focus on the agent interactions and the decision-making necessary to fulfill the swarm allocations desired by the user. We consider a system where tasks are uniformly available to the agents. The control rules allow an agent to either increase or decrease the number of agents allocated to a task by one unit. As a function of its current allocation and those of its neighbors, the agent either self-switches to the other task or recruits a neighbor from those with the alternative allocation. When the agent acts as a recruiter, the recruited neighbor always switches its task allocation, and it does so independently of its internal state and of its actual agent controller. That is, passively recruited agents always switch their task allocation without objections. Control rules are executed randomly by individual agents: self-switching with rate σ and switch-or-recruit with rate ρ (respectively, with probabilities p_σ = σ/(σ+ρ) and p_ρ = ρ/(σ+ρ)).
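The two stochastic rules can be sketched as a single asynchronous agent update; this is a minimal illustration under assumed names (`step`, `delta`), not the paper's implementation. The function `delta` abstracts the controller's switch-or-recruit decision, which is specified below:

```python
import random

def step(agent_task, neighbor_tasks, delta, sigma, rho, rng=random):
    """One asynchronous update of a single agent.

    Self-switching fires with probability p_sigma = sigma / (sigma + rho),
    switch-or-recruit with p_rho = rho / (sigma + rho).  `delta` maps
    (own task, neighbor tasks) to 'switch', 'recruit', or None.
    Returns ('switch',), ('recruit', neighbor_index), or None.
    """
    p_sigma = sigma / (sigma + rho)
    if rng.random() < p_sigma:
        return ('switch',)                      # self-switch rule
    action = delta(agent_task, neighbor_tasks)  # switch-or-recruit rule
    if action == 'switch':
        return ('switch',)
    if action == 'recruit':
        # a recruited neighbor always switches its allocation, regardless
        # of its own internal state or controller
        others = [j for j, t in enumerate(neighbor_tasks) if t != agent_task]
        if others:
            return ('recruit', rng.choice(others))
    return None                                 # no action
```

Calling `step` for every agent with its current neighborhood yields one round of a microscopic multi-agent simulation like the one used for validation.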

All agents have only local perception of the current global task allocation. Each agent perceives the currently assigned tasks of its neighbors (i.e., agents within proximity, for example, within communication range). Note that all agents move at all times and neighborhoods are subject to change, that is, the underlying network is dynamic. Each agent knows the number N_0 of neighbors currently assigned to task 0, the number N_1 = G − 1 − N_0 of neighbors currently assigned to task 1, and its own currently assigned task, forming a set of information from G ≪ N agents (G − 1 neighbors plus the considered agent). We define agent controllers ⟨G; b⟩, b ∈ {1, . . . , 2^{G−1}}, that differ from each other by the logical function ∆_{G,b}. We use function ∆_{G,b} to define the local task-switching behavior of an agent and to determine the global effect of the switch-or-recruit rule. Function ∆_{G,b} takes as input a group of task allocations of size G. This group includes the task allocation of the agent applying the switch-or-recruit rule and that of its G − 1 neighbor agents. Parameter b is an index that encodes a particular task-switching behavior and ranges over all possible agent controllers based on the same group size G. For a group of task allocations of size G, we have G + 1 possible group compositions. We do not assign any action to homogeneous groups (i.e., groups with either 0 or G agents allocated to task 0). Therefore, ∆_{G,b} has G + 1 possible inputs and three possible outputs (i.e., switch allocation, recruit a neighbor, no action). Moreover, since the no-action output is fixed for the two homogeneous groups, each agent controller is determined by a binary choice on each of the remaining G − 1 inputs, which yields the 2^{G−1} possible controllers per group size.

Table 1: Example of logical function ∆_{G,b} with G = 3 and b = 1. Symbol a gives the current task allocation of the focal agent, N_0 is the number of neighbors allocated to task 0, and (∆_1 = +1, ∆_2 = −1) define the outcome of ∆_{G,b} (in this case a majority rule). Symbol '-' represents no action.
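Since ∆_{G,b} depends on the group only through its composition, the 2^{G−1} controllers per group size can be enumerated mechanically. A sketch that encodes b in binary and outputs the global effect on X as in Table 1 (+1, −1, or 0); the particular bit-to-composition assignment, and hence which b yields the majority rule, is an illustrative assumption:

```python
def make_delta(G, b):
    """Logical function Delta_{G,b}: map the group composition n0 (number
    of group members, focal agent included, allocated to task 0) to the
    resulting change of the global count X: +1, -1, or 0 (no action).
    b in {1, ..., 2**(G-1)} selects one assignment of {+1, -1} to the
    G - 1 non-homogeneous compositions."""
    assert 1 <= b <= 2 ** (G - 1)
    bits = [(b - 1) >> i & 1 for i in range(G - 1)]
    def delta(n0):
        if n0 == 0 or n0 == G:
            return 0                  # homogeneous group: no action
        return -1 if bits[n0 - 1] else +1
    return delta

# a majority rule for G = 3: a task-0 majority (n0 = 2) attracts one more
# agent to task 0, a task-1 majority (n0 = 1) repels one
majority = make_delta(3, b=2)
assert [majority(n0) for n0 in range(4)] == [0, -1, +1, 0]
```

Varying b over {1, . . . , 2^{G−1}} enumerates all distinct controllers of group size G.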
Table 1 shows an example of an agent controller. These predefined agent controllers are the building blocks of our design method because they allow us to predict the global behavior of a heterogeneous swarm. We have developed a simple microscopic multi-agent simulator to validate our design method.

For a given agent controller ⟨G; b⟩ with function ∆_{G,b}, probability P_{G,b}(X, X + 1) models the transition X → X + 1 of gaining one more agent assigned to task 0. It is the sum of two contributions: the probability that an agent currently allocated to task 1 self-switches its allocation to task 0, and the probability that any agent increases the number of agents allocated to task 0 through the switch-or-recruit rule.

In our global-to-local design method, we obtain the response vector y, which represents the expected change of the user-desired swarm, from the stationary distribution π (see Sec. 2.2). We first construct a Markov chain P_y that converges to π itself and then compute y from P_y with Eq. (4).
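Given any transition matrix of this form, the corresponding basis (or response) vector is its expected one-step change. A minimal sketch, assuming Eq. (4) is the first moment of the jump size, which for a tridiagonal matrix reduces to P(X, X+1) − P(X, X−1):

```python
import numpy as np

def expected_change(P):
    """y(X) = sum_j (j - X) * P(X, j): expected one-step change of the
    number X of task-0 agents under transition matrix P.  For a
    tridiagonal P this equals P[X, X+1] - P[X, X-1]."""
    states = np.arange(P.shape[0])
    return (P * (states[None, :] - states[:, None])).sum(axis=1)
```

Applied to a homogeneous-swarm matrix P_{G,b} this yields a basis vector; applied to P_y it yields the response vector y.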

The stationary distribution π of an ergodic Markov chain with transition matrix P can be computed from the eigenvector equation πP = π. The set {P} of tridiagonal transition matrices over the state space X forms a manifold of dimension 2N. This number of dimensions is due to the sparse structure of tridiagonal matrices and to the fact that transition matrices are row-stochastic (i.e., row entries are non-negative and sum up to 1). As a consequence, in order to construct our response vector y we need to find a set of 2N additional constraints.

The stationary distribution π defined by the user imposes a set of N + 1 linear constraints on this manifold through the equation π_i = Σ_{j∈X} π_j P(j, i), i ∈ X. The intuitive interpretation is that the probability π_i of state i has to be the sum of all influxes from any state j to i (including i = j). Due to the linear relation Σ_{i∈X} π_i = 1, one of these constraints is redundant and the stationary distribution π reduces the number of dimensions of {P} from 2N to N. Therefore, a general transition matrix P_y that converges to π can be parameterized by N constant values referred to as ψ = (ψ_1, . . . , ψ_N). By constraining the transition matrix P_y to be row-stochastic we obtain the set of inequalities in Eq. (5). Any choice of values for the parameters ψ that satisfies this set of inequalities defines a transition matrix P_y that satisfies πP_y = π. Since the probabilities π_i, i ∈ X, are non-negative by definition, all entries in the parameter vector ψ can always be chosen to be sufficiently small to satisfy the set of inequalities in Eq. (5). Using Eq. (5) we have obtained N of the 2N constraints necessary to determine a transition matrix P_y that asymptotically converges to π.

In order to uniquely determine a transition matrix P_y, we still require N additional constraints.
From Eq. (2), we see that all agent controllers ⟨G; b⟩, b ∈ {1, . . . , 2^{G−1}}, have equal diagonal entries P_{G,b}(X, X). Furthermore, the probabilities P_{G,b}(X, X) converge for increasing group sizes G, as indicated by the example group sizes G ∈ {2, . . . , 15} shown in Figure 2b. This implies that, by making a simple initial guess for the parameters G, ρ, and σ, we can easily impose an additional set of N + 1 linear constraints and uniquely determine a matrix P_y. As we will see in the following, this initial guess of parameters is not binding and can be revised during the application of the method.

For a desired stationary distribution π and initial parameters G, ρ, and σ, we can solve πP_y = π and obtain the transition matrix P_y. The solution of the system of equations is subject to two constraints: the diagonal entries of P_y are constant and equal to diag(P_y) = diag(P_{G,b}) (for any choice of b ∈ {1, . . . , 2^{G−1}}); and all rows of P_y are non-negative and sum up to 1. Since the first and last rows of P_y have only two non-zero entries, these two constraints suffice to compute P_y(0, 1) and P_y(N, N − 1). We compute all remaining entries P_y(X, X − 1) and P_y(X, X + 1) recursively, alternating between the stationarity condition πP_y = π and the row-stochasticity condition P_y(X, X + 1) = 1 − P_y(X, X) − P_y(X, X − 1).
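This recursive construction can be sketched as follows. It exploits the fact that stationarity of a tridiagonal (birth-death) chain is equivalent to detailed balance, π(X−1) P_y(X−1, X) = π(X) P_y(X, X−1). The function name is an assumption, and the final assertion stands in for revising the initial guess of G, ρ, and σ when the chosen diagonal is incompatible:

```python
import numpy as np

def build_Py(pi, diag):
    """Construct a tridiagonal transition matrix P_y with stationary
    distribution pi and prescribed diagonal entries `diag` (a sketch of
    the recursion; uses detailed balance of birth-death chains:
    pi[x-1] * P[x-1, x] == pi[x] * P[x, x-1])."""
    N = len(pi) - 1
    P = np.zeros((N + 1, N + 1))
    np.fill_diagonal(P, diag)
    P[0, 1] = 1.0 - diag[0]                  # first row has two entries
    for x in range(1, N + 1):
        P[x, x - 1] = pi[x - 1] * P[x - 1, x] / pi[x]   # stationarity
        if x < N:
            P[x, x + 1] = 1.0 - diag[x] - P[x, x - 1]   # row-stochasticity
    # an incompatible diagonal guess yields invalid probabilities; the
    # method then revises the initial parameters G, rho, sigma
    assert (P >= -1e-12).all() and np.allclose(P.sum(axis=1), 1.0)
    return P
```

The response vector then follows from P_y as its expected change, y(X) = P_y(X, X+1) − P_y(X, X−1).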
Finally, the response vector y is obtained from P_y by computing its expected change as in Eq. (4).

Each basis vector corresponds to an agent controller ⟨G; b⟩ ∈ B. The response vector y is derived from the stationary distribution π using Eqs. (6-9) and Eq. (4). In order to determine our swarm composition C, we need to find a column vector β of regression coefficients that satisfies the regression problem in Eq. (10), that is, that approximates y by a linear combination of the basis vectors. The coefficients β_i are required to form a conical combination, that is, we require β_i ⩾ 0, so that c_i ≃ Nβ_i results in a non-negative number of agents with controller ⟨G_i; b_i⟩.
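The selection step can be sketched as a non-negative lasso solved by cyclic coordinate descent; this plain-numpy stand-in for the penalized regression (and the names below) is an assumption, not the paper's exact solver:

```python
import numpy as np

def nonneg_lasso(B, y, lam=1.0, n_iter=500):
    """Find beta >= 0 minimizing 0.5 * ||y - B @ beta||^2 + lam * sum(beta)
    (lasso with a non-negativity/conical constraint) by cyclic coordinate
    descent.  Columns of B are basis vectors, y is the response vector."""
    n_basis = B.shape[1]
    beta = np.zeros(n_basis)
    col_sq = (B ** 2).sum(axis=0)            # per-column squared norms
    for _ in range(n_iter):
        for j in range(n_basis):
            # partial residual with coordinate j removed
            r = y - B @ beta + B[:, j] * beta[j]
            # closed-form coordinate update, clipped at zero
            beta[j] = max(0.0, (B[:, j] @ r - lam) / col_sq[j])
    return beta
```

Basis vectors with β_i = 0 are discarded; the surviving coefficients, rescaled to the swarm size, give the agent counts c_i ≃ Nβ_i per selected controller.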

In general, the accuracy of a solution to the regression problem in Eq. (10) increases with the number of basis vectors whose coefficient β_i is greater than zero (i.e., a greater number of involved basis vectors helps to fine-tune the result). However, that would mean using many different agent controllers.

3 Results

We apply our method to design heterogeneous swarms for both unimodal and multimodal user inputs π. As discussed above, heterogeneous swarms formed by many different agent controllers might not be robust to failures. Hence, we minimize the number of agent controllers and give priority to the robustness of the designed solution. We prefer qualitative over quantitative accuracy in the approximation of π. In the following, we design swarms with N = 100 agents. Since π is independent of the magnitudes of ρ and σ, and only depends on the probabilities p_ρ and p_σ, we set ρ = 1 and vary σ in [0; 1]. In the multi-agent simulations, ρ and σ are divided by a factor of 100.

Figure 4: Results for the bimodal and multimodal scenarios. Illustration of the design method and comparison with multi-agent simulations. For the bimodal scenario, a) depicts the stationary distribution, b) the expected change, and c) the mean switching time. Panel d) depicts the stationary distribution of the trimodal scenario.
Many agent controllers are not necessary and might even worsen the accuracy of our design method. We consider asymmetric agent controllers for G ∈ {3, . . . , 6} and solve the lasso problem (11) for λ = 1. We obtain the swarm composition C_1 = {(⟨6; 7⟩, 39), (⟨6; 11⟩, 5), (⟨6; 15⟩, 56)} that consists of three agent controllers with G = 6. Due to the requirement of sparsity, the expected change ŷ_fitted computed from C_1 using the Markov chain does not perfectly fit the response vector y (see Figure 3b). This also applies to the expected change ŷ_agent that we measured empirically in multi-agent simulations.

Additionally, the user might also express requirements over the mean switching time T_{X_1→X_2}, that is, the time necessary for the swarm to reallocate its agents from (X_1, N − X_1) to (X_2, N − X_2).
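Such a mean switching time can be obtained from the designed Markov chain as a mean first-passage time. A sketch using the standard absorbing-chain formulation, solving (I − Q)t = 1 with the target state made absorbing; this is one standard way to compute T_{X1→X2}, not necessarily the exact procedure used in the paper:

```python
import numpy as np

def mean_switching_time(P, X1, X2):
    """Mean first-passage time T_{X1 -> X2}: expected number of steps for
    the swarm Markov chain P to first reach state X2 starting from X1.
    Q is P restricted to the non-target states; t solves (I - Q) t = 1."""
    n = P.shape[0]
    keep = [i for i in range(n) if i != X2]
    Q = P[np.ix_(keep, keep)]
    t = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return t[keep.index(X1)]
```

The variance of the switching time can be derived from the same absorbing-chain machinery via the second moment of the first-passage time.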

Using the Markov chain model resulting from C_2, we can compute the mean and the variance of the switching time.

As discussed in Section 3.1, we can use a chronological sequence of unimodal distributions to design sequential task allocation scenarios. This is achieved by letting agents change their agent controllers over time and results in a swarm that switches from one swarm allocation to the next in the sequence. Our method can be used to design a swarm composition for each distribution in the sequence. However, this approach to sequential task allocation requires agents with a mechanism (e.g., based on an external signal or a fixed time schedule) that triggers the changes of agent controllers.
Extending our method to more tasks requires additional constraints, similar to those defined in Section 2.5, that are necessary to uniquely derive a response vector from the user input. This could be achieved, for example, by considering different priorities among the tasks to be executed. We note that the total number of agent controllers is an exponential function of the number of tasks. However, penalized regression techniques allow us to consider high-dimensional search spaces and to investigate a reasonable range of application scenarios. We also plan to perform a thorough algebraic characterization of our basis vectors and response vectors with the aim to improve the performance of the design method. We believe that our design idea of behaviorally heterogeneous agents has potential for a wider range of applications beyond task allocation. Our primary goal is therefore to deepen our understanding of the fundamental principles of behavioral heterogeneity. We want to extend our approach to many different swarm scenarios, such as collective decision-making and spatially organizing tasks.

We believe that our proposed approach is a fundamentally novel paradigm for the design of robot swarms and that the idea of programming the swarm at a global level by following a recipe that describes how to put together the right amounts of different robot controller types, almost as if they were ingredients of a cake, is particularly intriguing. With our approach, the swarm can be reprogrammed on a global level at runtime by adding robots of different robot controller types, without the need for the individual robots to be programmable.

[2] E. Şahin, "Swarm robotics: From sources of inspiration to domains of application," in Swarm