1. Introduction
Optimally approaching or chasing a moving target is a common task in natural and human settings: animals such as lions and sharks forage for prey, phagocytes chase and kill bacteria, predatory bacteria feed on other bacteria (Dashiff et al. 2011; Pérez et al. 2016), missiles intercept invading aircraft, and shooters aim at running targets. The optimal foraging of natural creatures may remain elusive; however, analogous chasing applications in defence and robotic systems have matured thanks to optimal control theory.
In these scenarios, the controlled agent and its target are either in a dry environment such as air or in a liquid-filled wet environment. In air, unless the agent and target are closely spaced or in a specific configuration (e.g. a missile in the wake of a high-speed aircraft), neither significantly affects the motion of the other by disturbing the air flow; that is, each experiences only a weak additional aerodynamic force induced by the motion of the other. In a liquid, the interaction between the agent and target is no longer weak because the viscosity of a liquid is much larger than that of air (the dynamic viscosity of water is about fifty times that of air). Hence an agent moving in a liquid such as water disturbs the surrounding flow, which in turn exerts a considerable hydrodynamic force on a nearby target. These hydrodynamic interactions between the agent and target can significantly influence the chasing dynamics and the associated optimal chasing strategies. They become especially pronounced and long-ranged when the agent approaches or chases its target in a low-Reynolds-number flow. This scenario is common for microorganisms or millimetre-scale organisms swimming towards motile particulate objects (e.g. bacteria or phytoplankton cells) or non-motile ones such as organic debris (Kiørboe et al. 2009).
Both the type of swimmer and the size of target span a wide range (Jabbarzadeh & Fu 2018): a typical planktonic grazer is much larger than its prey (Hansen, Bjornsen & Hansen 1994; Kiørboe 2016); an organism swims towards a similarly sized member of the same species during bacterial conjugation (Clark & Adelberg 1962) or mating of copepods (Strickler 1998); and the target can also be much larger than the approacher, as exemplified by a spermatozoon swimming towards an egg or marine microbes targeting biological debris for nutrient uptake and habitation (Kiørboe 2003). Besides such natural events, similar situations may arise in applications of future medical microrobots, which need to approach targets of varying sizes such as bacteria and human cells (Nelson, Kaliakatsos & Abbott 2010; Ceylan et al. 2019). In these low-Reynolds-number configurations, where hydrodynamic interactions are important and long-ranged, the previously explored predator–prey dynamics of terrestrial and aerial animals or of high-Reynolds-number aquatic animals, together with their optimal chasing/approaching strategies, would not apply. Tiny swimmers have instead evolved a variety of unique strategies suited to the viscous environment. For instance, zooplankton feed by ambushing (Kiørboe et al. 2009), generating feeding currents (Fenchel 1980), cruising (Kiørboe et al. 2009) and colonizing marine snow aggregates.
A better understanding of predator–prey dynamics in viscous low-Reynolds-number flows would help analyse the predatory and evasive behaviours of microorganisms and explore their evolutionary advantages. Designing optimal predatory and evasive strategies may also prove useful for steering future medical microrobots to capture bacteria or escape from hostile immune cells. Apart from a substantial body of studies on a related topic – nutrient uptake and feeding of swimming microorganisms (Magar, Goto & Pedley 2003; Langlois et al. 2009; Michelin & Lauga 2011; Tam & Hosoi 2011; Lambert et al. 2013; Kiørboe et al. 2014; Dölger et al. 2017; Andersen & Kiørboe 2020) – a number of studies have addressed the interaction between a swimming predator and an individual particle or prey nearby. Without considering swimming-induced hydrodynamic effects, Sengupta, Kruppa & Löwen (2011) proposed and investigated a discrete chemotactic predator–prey model describing a chasing predator and an escaping prey, each sensing the chemicals diffusing from the other. Pushkin, Shum & Yeomans (2013) studied theoretically and numerically the advection of a tracer, and of a material sheet of tracers, by a microswimmer moving along an infinite straight path. Mathijssen, Jeanneret & Polin (2018) combined experiments, theory and simulations to analyse in depth the hydrodynamic entrainment of a particle by a swimming microorganism.
Using a bispherical coordinate system, Jabbarzadeh & Fu (2018) analytically studied a forced spherical particle approaching another; they also numerically investigated the head-on approach of a self-propelled swimmer to a passive particle. Słomka et al. (2020) modelled the ballistic encounter between elongated model bacteria and a much larger, sedimenting marine snow particle. Very recently, Borra et al. (2022) studied a pair of point-like predator and prey including their hydrodynamic interactions; they used a multi-agent reinforcement learning scheme to explore efficient, physically explainable predatory and evasive strategies. In addition, Visser & Kiørboe (2006) discussed how ballistic or diffusive motility patterns of planktonic organisms influence their encounter rates with prey, mates or predators.
In this work, we explore the optimal strategies of a finite-size swimming predator chasing a non-motile prey represented by a tracer point or a finite-size sphere. The motion of the tracer is driven purely by the propulsion-induced disturbance flow of the predator, whereas for the spherical prey we account for the two-way hydrodynamic coupling between predator and prey through numerical simulations. To find the most time-saving or energy-efficient pursuit strategies of the predator, we adopt a numerical optimal control approach for a point prey and reinforcement learning (RL) for general cases. The RL-based optimal solutions agree qualitatively with, and capture the essential features of, the globally optimal solutions identified by the former. We will demonstrate the emergence of non-intuitive optimal solutions in seemingly simple configurations. We will also interpret the physical mechanisms of the optimal strategies and discuss their implications for developing synthetic microrobots designed to capture moving objects.
2. Problem set-up, assumptions and methods
2.1. Problem set-up
We consider a microscale predator that swims to approach a prey in low-Reynolds-number flows (see figure 1a). The predator is modelled as a spherical squirmer of radius $A$, which attains propulsion and rotation through a surface actuation described by a slip velocity $\bar {\boldsymbol {u}}_{{s}}(\theta ^{\prime },\phi ^{\prime })$, where $\theta ^{\prime } \in [0,{\rm \pi} ]$ and $\phi ^{\prime } \in [0,2{\rm \pi} ]$ are the polar and azimuthal angles with respect to the squirmer's swimming orientation $\boldsymbol {e}_{{s}}$; here, $\boldsymbol {e}_{{s}}$ coincides with $\boldsymbol {e}_{z^{\prime }}$ of the reference coordinate system $\boldsymbol {e}_{x^{\prime } y^{\prime } z^{\prime }}$ translating and rotating with the squirmer. The prey is modelled by a passively moving point tracer or a finite-size spherical particle of radius $\mathcal {A}$. The ratio $\chi =\mathcal {A}/A$ indicates the size of the prey relative to that of the predator; $\chi =0$ for a point prey. From here on, an overbar $\bar {\ }$ denotes dimensional variables. The squirmer model was proposed by Lighthill (1952) and Blake (1971) for the ciliary propulsion of microorganisms such as Paramecium and Volvox. It has been used successfully to study microscale propulsion in the context of rheological complexity (Datt et al. 2015; Lintuvuori, Würger & Stratford 2017; Li, Lauga & Ardekani 2021), stratified fluids (More & Ardekani 2020), viscosity gradients (Datt & Elfring 2019), boundary effects (Spagnolie & Lauga 2012; Ishimoto & Gaffney 2013; Zhu, Lauga & Brandt 2013), suspensions of active particles (Ishikawa, Brumley & Pedley 2021), and so on.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_fig1.png?pub-status=live)
Figure 1. (a) A squirming predator chases a point prey in the $y=0$ plane (finite-size spherical prey will be investigated in § 4.2). Here, $\boldsymbol {e}_{x^{\prime } y^{\prime } z^{\prime }}$ denotes the local coordinate system translating and rotating with the squirmer, while $\boldsymbol {e}_{xyz}$ is the global counterpart. (b) The predator captures the prey at time $t=T$, when the latter is within a cut-off distance $\varepsilon \ll 1$ of the surface of the former. The coordinates of the squirmer centre and prey are $\boldsymbol {r}_{{s}}$ and $\boldsymbol {r}_{{p}}$, respectively; $\boldsymbol {r} = \boldsymbol {r}_{{p}} - \boldsymbol {r}_{{s}} = r( \sin \theta \,\boldsymbol {e}_x + \cos \theta \,\boldsymbol {e}_z )$ is their relative displacement. The orientation $\boldsymbol {e}_{{s}}$ of the squirmer deviates from $\boldsymbol {e}_z$ by an angle $\alpha$. The travelling distances of the predator and prey are denoted by $L$ and $\tilde {L}$, respectively. The insets show the disturbance velocity fields in the $y=0$ plane (in the lab frame) generated by a squirmer using the $B_{{F}}=1$, $B_{{L}}=1$ and $B_{{S}}=1$ modes. All variables here are dimensionless.
We now describe the predator–prey problem. At time $\bar {t} = 0$, the orientation $\boldsymbol {e}_{{s}}$ of the squirming predator is in the $\boldsymbol {e}_z$ direction, and the predator's centre, located at $\bar {\boldsymbol {r}}_{{s}}$, and the prey's centre, located at $\bar {\boldsymbol {r}}_{{p}}$, both lie in the $\bar {y}=0$ plane. To simplify the setting, we assume that the surface actuation velocity $\bar {\boldsymbol {u}}_{{s}}$ is symmetric about this plane, so that the $\bar {y}$ components of the predator's swimming velocity and of its disturbance velocity vanish in the plane. The latter implies that the prey remains in the plane, as shown in figure 1(b); with $\alpha$ the angle between the squirmer's orientation $\boldsymbol {e}_{{s}}$ and the $\boldsymbol {e}_z$ axis, we have $\boldsymbol {e}_{{s}} = \sin {\alpha }\,\boldsymbol {e}_x + \cos \alpha \,\boldsymbol {e}_z$. We also assume that the predator detects the instantaneous position of the prey, following previous works on intelligently controlled swimmers (Gazzola et al. 2016; Mirzakhanloo, Esmaeilzadeh & Alam 2020). This assumption minimizes the complexity of the problem, but we note that most natural predators or prey do not ‘see’ nearby motile organisms; rather, they detect their presence through hydrodynamic (Jakobsen 2001; Visser 2001) or chemical (Svensen & Kiørboe 2000) cues. The predator is actuated by four squirming modes, $\bar {B}_{{F}} (\bar {t})$, $\bar {B}_{{L}} (\bar {t})$, $\bar {B}_{{R}} (\bar {t})$ and $\bar {B}_{{S}} (\bar {t})$: the first two allow it to translate forward and laterally, respectively, with respect to its orientation $\boldsymbol {e}_{{s}}$, i.e. in the $\boldsymbol {e}_{z^{\prime }}$ and $\boldsymbol {e}_{x^{\prime }}$ directions; $\bar {B}_{{R}}$ enables its rotation about the $\boldsymbol {e}_y$ axis; and $\bar {B}_{{S}}$ introduces a stresslet flow. We bound the strength of the surface actuations as $\bar {B}_{i} \in [-\bar {B}^{{max}}_{i},\bar {B}^{{max}}_{i}]$, $i={F},{L},{R},{S}$. These modes are described in dimensionless form below.
We choose $A$ and $4 \bar {B}^{{max}}_{{F}}/A^3$ as the characteristic length and velocity, respectively. For convenience, we define the dimensionless displacement $\boldsymbol {r}=\boldsymbol {r}_{{p}} - \boldsymbol {r}_{{s}}$ between the predator and prey, which remains in the $y=0$ plane and can be described by its magnitude $r = |\boldsymbol {r}|$ and the angle $\theta$ between $\boldsymbol {r}$ and $\boldsymbol {e}_z$ as $\boldsymbol {r} = r( \sin \theta \,\boldsymbol {e}_x + \cos \theta \,\boldsymbol {e}_z )$. The dimensionless slip velocity $\boldsymbol {u}_{{s}}(\theta ^{\prime },\phi ^{\prime })$ on the surface of the squirmer, in its own reference frame $\boldsymbol {e}_{x^{\prime } y^{\prime } z^{\prime }}$, reads (Pak & Lauga 2014)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn1.png?pub-status=live)
We seek the time sequences of the predator's surface actuations $[\bar {B}_{{F}}, \bar {B}_{{L}}, \bar {B}_{{R}}, \bar {B}_{{S}}] (\bar {t})$ that lead to optimal predation. A standard goal is to minimize the predation time $\bar {T}$; we call this the time-optimal (TO) optimization. Besides the capture time, we also optimize the predatory energy efficiency $\eta$, defined as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn2.png?pub-status=live)
where $\mu$ is the dynamic viscosity of the fluid, $\bar {d}_0 = |\bar {\boldsymbol {r}}(\bar {t}=0)|-A-\mathcal {A}$ is the initial surface-to-surface distance between prey and squirmer, and $\bar {T}$ and $\bar {E}$ are the time and energy used by the predator to capture the prey, respectively. The numerator of (2.2) represents the energy needed to drag a passive (dead) predator over the distance $\bar {d}_0$ to reach the prey within the capture time $\bar {T}$. Using $4{\rm \pi} \mu \bar {B}^{{max}}_{{F}}/3A$ as the characteristic energy scale, we write
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn3.png?pub-status=live)
where $E=\int _0^T P(t)\,{\rm d} t$ and $P$ denotes the dimensionless power consumption of the squirmer. Accordingly, maximizing the energy efficiency $\eta$ is termed the efficiency-optimal (EO) optimization problem.
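As a dimensional sanity check of the efficiency definition, the sketch below assumes, consistently with the description of the numerator of (2.2) but not copied from that equation, that the reference energy is the Stokes-drag work $6{\rm \pi}\mu A(\bar d_0/\bar T)\bar d_0$ of a rigid sphere of radius $A$ dragged at constant speed over $\bar d_0$, so that $\eta = 6{\rm \pi}\mu A\bar d_0^2/(\bar T\bar E)$. The function name and the numerical values are hypothetical.

```python
import math

def predation_efficiency(mu, A, d0, T, E):
    """Assumed efficiency eta = 6*pi*mu*A*d0**2 / (T*E) (all inputs in SI units).

    Numerator: work needed to drag a passive sphere of radius A at constant
    speed d0/T over the distance d0 (Stokes drag 6*pi*mu*A*U times d0).
    Denominator: energy E actually consumed by the squirmer.
    """
    if T <= 0 or E <= 0:
        raise ValueError("capture time and energy must be positive")
    drag_work = 6.0 * math.pi * mu * A * (d0 / T) * d0
    return drag_work / E

# Hypothetical numbers: water (mu = 1e-3 Pa s), a 10 micron predator,
# an initial gap of 20 microns closed in 1 s while consuming 1e-15 J.
eta = predation_efficiency(mu=1e-3, A=10e-6, d0=20e-6, T=1.0, E=1e-15)
print(f"eta = {eta:.3e}")
```

For a fixed gap $d_0$, this form gives $\eta \propto 1/(TE)$, which is why the EO problem reduces to minimizing the product $TE$.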
2.2. Optimal control for a point prey
We adopt a numerical optimal control approach when the prey is modelled by a point tracer, as described here. The point prey is advected passively by the flow; hence the translational and rotational velocities of the predator and the translational velocity of the prey in dimensionless form can be derived as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn4.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn5.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn6.png?pub-status=live)
Also, the dimensionless power consumption of the squirmer reads
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn7.png?pub-status=live)
Since $d_0 = |\boldsymbol {r}(t=0)| - 1 -\chi = |\boldsymbol {r}(t=0)| - 1$ (recall that $\chi =\mathcal {A}/A=0$ for a point prey) does not change over time, maximizing the predation efficiency $\eta$ is equivalent to minimizing the product of time $T$ and energy $E$. The optimal control problem for a point prey then reads as follows. Given an initial relative displacement $\boldsymbol {r}(t=0) = \boldsymbol {r}_{0}=r_0 ( \sin \theta _0\,\boldsymbol {e}_x + \cos \theta _0\,\boldsymbol {e}_z )$ with $|r_0|>1$ between the predator and the prey, the squirming predator captures the prey at time $t=T$ when $|\boldsymbol {r}(t=T)|-1 \leq \varepsilon$, that is, when the point prey is within a small cut-off distance $\varepsilon \ll 1$ of the predator's surface. The small parameter $\varepsilon$ is introduced for theoretical convenience: as the prey approaches the squirmer's surface, $|\boldsymbol {r}|\rightarrow 1$, their relative velocity tends to zero because our squirmer has only tangential and no radial surface actuation; hence, mathematically, the prey never exactly touches the squirmer's surface. In real situations, other physical ingredients come into play once they are sufficiently close; for example, Brownian diffusion would allow them to touch (Jabbarzadeh & Fu 2018). In this work, we use a fixed value $\varepsilon =4\times 10^{-3}$ unless otherwise specified. We have checked that varying $\varepsilon$ in the range $[10^{-3},10^{-1}]$ does not alter the optimal chasing paths qualitatively, although the capture time increases with decreasing $\varepsilon$, as anticipated. Without loss of generality, the predator is initially oriented in the $\boldsymbol {e}_z$ direction, i.e. $\alpha =0$ at $t=0$. The predatory process corresponds to the evolution of $\boldsymbol {r}$, described by $r$ and $\theta$. Using ${{\rm d} \boldsymbol {r}}/{{\rm d} t} = {{\rm d} \boldsymbol {r}_{{p}}}/{{\rm d} t}-{{\rm d} \boldsymbol {r}_{{s}}}/{{\rm d} t}$ and (2.4), we obtain the dynamical system characterizing the predatory process:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn8.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn9.png?pub-status=live)
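Why a cut-off $\varepsilon$ is needed, and why the capture time grows as $\varepsilon$ shrinks, can be illustrated with a deliberately simplified model that is not the four-mode squirmer studied here. The sketch below assumes a neutral squirmer with only Blake's (1971) first mode, of unit radius and swimming speed $U$, whose lab-frame flow is $\boldsymbol{u}=(U/2r^3)(3\hat{\boldsymbol{r}}\hat{\boldsymbol{r}}-\boldsymbol{I})\cdot\boldsymbol{e}_s$, and lets it always point straight at a point prey. The prey then stays on the swimming axis and is pushed ahead at $U/r^3$, so ${\rm d}r/{\rm d}t=-U(1-r^{-3})$; the capture time is a closed-form integral whose $\tfrac{1}{3}\ln(r-1)$ term diverges as $\varepsilon\rightarrow 0$. The function names and $U=1$ are illustrative choices.

```python
import math

def antiderivative(r):
    """F(r) with dF/dr = r**3 / (r**3 - 1) = 1 / (1 - r**-3), valid for r > 1."""
    s3 = math.sqrt(3.0)
    return (r + math.log(r - 1.0) / 3.0
            - math.log(r * r + r + 1.0) / 6.0
            - math.atan((2.0 * r + 1.0) / s3) / s3)

def capture_time_closed_form(r0=3.0, eps=4e-3, U=1.0):
    """Capture time of the assumed B1-only squirmer (radius 1) in pure pursuit.

    The tracer stays on the swimming axis and is pushed ahead at U / r**3, so
    dr/dt = -U * (1 - r**-3); integrate from r0 down to the cut-off 1 + eps.
    """
    if r0 <= 1.0 + eps:
        return 0.0
    return (antiderivative(r0) - antiderivative(1.0 + eps)) / U

def capture_time_numeric(r0=3.0, eps=4e-3, U=1.0, n=20000):
    """Same time via the trapezoidal rule in s = ln(r - 1), where the
    integrand r**3 / (r**2 + r + 1) is smooth."""
    a, b = math.log(eps), math.log(r0 - 1.0)
    h = (b - a) / n
    total = 0.0
    for k in range(n + 1):
        r = 1.0 + math.exp(a + k * h)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * r**3 / (r * r + r + 1.0)
    return total * h / U

for eps in (1e-1, 4e-3, 1e-3):
    print(eps, capture_time_closed_form(eps=eps), capture_time_numeric(eps=eps))
```

With $r_0=3$ the bare travel time would be $d_0/U=2$; the computed capture times exceed it because the straight-swimming predator continually pushes the prey ahead of itself.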
We seek the optimal sequences of the bounded actuation modes $[B_{{F}},B_{{L}},B_{{R}},B_{{S}}](t)$ that minimize the capture time $T$ or maximize the predation efficiency $\eta$, with these modes subject to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn10.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn11.png?pub-status=live)
Unless otherwise specified, $B^{{max}}_{{L}} = B^{{max}}_{{R}} =B^{{max}}_{{S}} = 1$. This optimal control problem is solved numerically with the open-source library ‘FALCON.m’ (Rieck et al. 1999) implemented in MATLAB. The state variables are discretized in time by trapezoidal collocation, and the resulting nonlinear optimization problem is solved by the built-in open-source solver IPOPT (Wächter & Biegler 2006).
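In trapezoidal collocation, the trajectory is represented by its values at nodes $t_k$, and the dynamics become algebraic equality constraints (defects) $\boldsymbol{x}_{k+1}-\boldsymbol{x}_k-\tfrac{1}{2}h_k(\boldsymbol{f}_k+\boldsymbol{f}_{k+1})=\boldsymbol{0}$, which the NLP solver enforces together with the control bounds. The Python sketch below only evaluates these defects; it neither reproduces FALCON.m nor calls IPOPT, and its toy dynamics (a non-rotating predator translating with a controlled velocity $(v_x, v_z)$ towards a fixed prey, with flow-induced prey motion ignored) are a hypothetical stand-in for (2.6).

```python
import numpy as np

def trapezoidal_defects(x, u, t, f):
    """Trapezoidal-collocation defects for dx/dt = f(x, u).

    x: (N, nx) states at nodes t (N,); u: (N, nu) controls at the same nodes.
    Returns an (N-1, nx) array that an NLP solver would drive to zero.
    """
    fx = np.array([f(xk, uk) for xk, uk in zip(x, u)])
    h = np.diff(t)[:, None]
    return x[1:] - x[:-1] - 0.5 * h * (fx[:-1] + fx[1:])

def toy_dynamics(x, u):
    # Hypothetical: relative displacement (x, z) of a fixed prey with respect
    # to a predator translating with controlled velocity u = (v_x, v_z).
    return -np.asarray(u, dtype=float)

t = np.linspace(0.0, 2.0, 11)
u = np.tile([0.0, 1.0], (t.size, 1))                # constant control
x = np.column_stack([np.zeros_like(t), 3.0 - t])    # exact straight-line solution
print(np.abs(trapezoidal_defects(x, u, t, toy_dynamics)).max())
```

An exact trajectory yields vanishing defects, whereas any inconsistent state value produces non-zero entries that the optimizer must remove while it minimizes $T$ or $TE$.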
2.3. Reinforcement learning for a point or finite-size prey
Besides considering a passively moving point prey, we also model the prey as a finite-size spherical particle of dimensionless radius $\chi = \mathcal {A}/A$ that interacts hydrodynamically with the squirmer. The velocities of predator and prey then cannot be derived analytically in closed form as in (2.4) and are instead solved numerically. Accordingly, the numerical optimal control approach used for the point prey is inconvenient here, and we instead adopt a deep RL scheme to identify the optimal predatory strategy. Naturally, the RL scheme also applies to the point prey model, as we demonstrate in § 4.
We extend an extensively validated solver based on the boundary integral method (BIM) to simulate a swimming squirmer approaching a spherical prey. Variants of the solver have been developed to study microlocomotion inside a tube (Zhu et al. 2013) or a droplet (Reigh et al. 2017), the dynamics of a particle-encapsulated droplet in shear flow (Zhu & Gallaire 2017), and a sedimenting sphere near a corrugated wall (Kurzthaler et al. 2020). A brief description of the BIM implementation in dimensionless form is provided below.
In the spirit of the BIM, we express the dimensionless velocity $\boldsymbol {u}(\boldsymbol {x})$ at any position $\boldsymbol {x}$ in the domain as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn12.png?pub-status=live)
where $\boldsymbol {q}$ is the density of the so-called single-layer potential on the surfaces of the squirmer, $S_{{s}}$, and of the prey, $S_{\mathrm {p}}$. Here, $\boldsymbol{\mathsf{G}}(\boldsymbol {x},\boldsymbol {x}^{\prime }) = {\boldsymbol {I}}/{|\boldsymbol {x}-\boldsymbol {x}^{\prime }|} + {( \boldsymbol {x}-\boldsymbol {x}^{\prime })( \boldsymbol {x}-\boldsymbol {x}^{\prime })}/{|\boldsymbol {x}-\boldsymbol {x}^{\prime }|^3}$ is the free-space Green's function, also known as the Stokeslet. Both the squirmer and the finite-size prey are spherical and are discretized by zero-order quadrilateral elements. The hydrodynamic force and torque exerted on each of them vanish, and these conditions determine their translational and rotational velocities.
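To make the single-layer representation concrete, the sketch below evaluates the Stokeslet $\boldsymbol{\mathsf{G}}$ defined above and approximates the surface integral with one collocation point per zero-order element, $\boldsymbol{u}(\boldsymbol{x})\approx c\sum_j \boldsymbol{\mathsf{G}}(\boldsymbol{x},\boldsymbol{x}_j)\cdot\boldsymbol{q}_j\,\Delta S_j$. The prefactor $c$ (defaulting here to $-1/8{\rm \pi}$) and its sign are assumptions, since the BIM equation above is not reproduced in this text. The sketch is valid only for field points away from the surface, because the self-element integral of a zero-order element requires a dedicated singular quadrature.

```python
import numpy as np

def stokeslet(x, xp):
    """Free-space Green's function G(x, x') = I/d + d d^T / d^3, with d = x - x'."""
    d = np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)
    dist = np.linalg.norm(d)
    return np.eye(3) / dist + np.outer(d, d) / dist**3

def single_layer_velocity(x, nodes, q, areas, prefactor=-1.0 / (8.0 * np.pi)):
    """Velocity at a field point x (off the surface) from a discretized single layer.

    nodes: (M, 3) element centroids; q: (M, 3) single-layer densities;
    areas: (M,) element areas; prefactor: assumed normalization.
    """
    u = np.zeros(3)
    for xj, qj, dSj in zip(nodes, q, areas):
        u += stokeslet(x, xj) @ qj * dSj   # one-point (midpoint) rule per element
    return prefactor * u

# On the axis of a unit point force along e_z: G . e_z = 2 e_z / d.
print(stokeslet([0.0, 0.0, 2.0], [0.0, 0.0, 0.0]) @ np.array([0.0, 0.0, 1.0]))
```

In a full solver, the same kernel sums, written for field points on the surfaces, form a linear system for $\boldsymbol{q}$, closed by the zero-force and zero-torque conditions.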
Having introduced the BIM implementation, we now describe the RL algorithm. Unlike optimal control, which requires the predator–prey dynamics (2.6), RL does not rely on prior knowledge of the dynamics; instead, the squirmer, acting as the predating agent, learns through continuous interaction with the environment to adapt and optimize its chasing strategy (its policy, in RL terminology). RL algorithms have recently been used in similar swimming-related scenarios, e.g. to optimize the swimming gaits or navigation routes of microswimmers at low Reynolds number (Colabrese et al. 2017; Schneider & Stark 2019; Mirzakhanloo et al. 2020; Tsang et al. 2020; Muiños-Landin et al. 2021; Nasiri & Liebchen 2022; Qiu et al. 2022) and in turbulent flows (Alageshan et al. 2020; Qiu et al. 2020), or of macroscopic swimmers such as fish in viscous flows (Gazzola et al. 2016; Verma, Novati & Koumoutsakos 2018) or in potential flow (Jiao et al. 2021). In particular, pioneering experiments (Muiños-Landin et al. 2021) have demonstrated RL-based real-time navigation of micron-sized thermophoretic particles, opening a new horizon for swimming microrobots endowed with artificial intelligence.
In this work, we adopt the open-source deep RL framework ‘Tensorforce’ (Kuhnle, Schaarschmidt & Fricke 2017) and train the agent with a policy-based RL scheme, proximal policy optimization (PPO) (Schulman et al. 2017). The general idea of policy-based RL methods is to parametrize the policy function ${\rm \pi} _{\varTheta }$ by an artificial neural network (ANN) with a set of weights $\varTheta$. The agent equipped with the parametrized policy takes certain characteristic information about the environment, the state $s$, as input to the ANN, and then selects an action $a$ according to the ANN's output. For the predator–prey system considered here, the state that the predator observes is the position of the prey relative to itself ($r$ and $\theta$), and the actions of the agent are the bounded squirming modes $B_i$. The selected action advances the environment from the current to the next state, and its effectiveness is quantified by an instantaneous reward $R$. An appropriate reward function favours actions that bring the predator closer to the prey.
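The observed state is simply the relative displacement expressed in the polar form used throughout, $\boldsymbol{r}=r(\sin\theta\,\boldsymbol{e}_x+\cos\theta\,\boldsymbol{e}_z)$. A minimal Python sketch of this observation, together with the bearing $\theta-\alpha$ of the prey relative to the swimming direction $\boldsymbol{e}_s=\sin\alpha\,\boldsymbol{e}_x+\cos\alpha\,\boldsymbol{e}_z$, is given below. Positions are assumed to be $(x, z)$ pairs in the $y=0$ plane, and the helper names are hypothetical, not part of Tensorforce.

```python
import math

def observe_state(r_s, r_p):
    """Return (r, theta) of the prey relative to the predator centre.

    r_s, r_p: (x, z) coordinates in the y = 0 plane. theta is measured from
    e_z towards e_x, so that r_p - r_s = r * (sin(theta), cos(theta)).
    """
    dx, dz = r_p[0] - r_s[0], r_p[1] - r_s[1]
    return math.hypot(dx, dz), math.atan2(dx, dz)

def bearing(theta, alpha):
    """Angle theta - alpha between r and e_s, wrapped to [-pi, pi]."""
    return math.atan2(math.sin(theta - alpha), math.cos(theta - alpha))

# Prey on the predator's right (theta_0 = 90 degrees) with alpha = 0.
r, theta = observe_state((0.0, 0.0), (3.0, 0.0))
print(r, math.degrees(bearing(theta, 0.0)))
```

Feeding $(r,\theta)$ rather than absolute positions keeps the observation invariant to where the encounter takes place in the plane.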
To achieve the TO predation, we choose the reward function
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn13.png?pub-status=live)
where $t$ effectively corresponds to the capture time $T$, and $\beta _{{T}}$ is a lower-bound estimate of $T$. Here, the term $-r$ encourages the predator to take only the actions needed to approach the prey, because any unnecessary ones decrease the accumulated reward. The term $\varGamma$ contributes in two ways: first, it penalizes the agent for wandering far from the prey ($r > 4$) with a substantial negative reward; second, it encourages the predator to expedite the capture by offering a positive reward $\propto 1/( t-\beta _{{T}})$ upon a successful capture. We choose $\beta _{{T}} \approx d_0$, the initial surface-to-surface distance between predator and prey. For the EO setting, the reward function is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn14.png?pub-status=live)
where $\hat {\alpha }$ is a positive weight introduced to reduce the instantaneous power consumption $P$; we choose $\hat {\alpha } = 0.1$. Also, $\beta _{{E}}$ is a lower-bound estimate of $ET$, chosen as half the value for a squirmer with $B_{{F}}=1$ travelling a distance $d_0$.
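The structure of the TO reward can be sketched as follows. Only its ingredients come from the text: the term $-r$, a wandering penalty for $r>4$ and a capture bonus $\propto 1/(t-\beta_T)$. The penalty magnitude, the bonus prefactor and the small floor guarding against $t\le\beta_T$ are hypothetical placeholders, not the values used in the reward function above.

```python
def time_optimal_reward(r, t, captured, beta_T,
                        r_wander=4.0,           # wandering threshold from the text
                        wander_penalty=-100.0,  # hypothetical magnitude
                        bonus_scale=10.0,       # hypothetical prefactor
                        min_gap=1e-6):          # hypothetical guard for t <= beta_T
    """Instantaneous TO reward R = -r + Gamma (illustrative sketch)."""
    gamma = 0.0
    if r > r_wander:
        gamma += wander_penalty                          # discourage wandering off
    if captured:
        gamma += bonus_scale / max(t - beta_T, min_gap)  # faster capture, larger bonus
    return -r + gamma

# d0 = 2, so beta_T is about 2; a capture at t = 3 earns more than one at t = 6.
print(time_optimal_reward(1.004, 3.0, True, beta_T=2.0))
print(time_optimal_reward(1.004, 6.0, True, beta_T=2.0))
```

For the EO task, the text indicates an additional penalty on the instantaneous power $P$ weighted by $\hat\alpha = 0.1$; its exact combination with $\varGamma$ is not reproduced in this sketch.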
Having defined the reward functions, we now describe the training process. The objective of RL here is to equip the agent with the optimal policy that maximizes the expected cumulative reward. After the agent executes the current ($k$th) policy ${\rm \pi} _{\varTheta _k}$, we collect a set of trajectories $\{\nu _i\}$ (a trajectory is a sequence of states and actions, $\nu = \{s_0, a_0,\ldots, s_n, a_n,\ldots \}$) to determine the new ($(k+1)$th) policy ${\rm \pi} _{\varTheta _{k+1}}$ via
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn15.png?pub-status=live)
Here, $L_{{clip}}$ is the so-called clipped surrogate advantage (Achiam 2018), which measures the performance of a general policy ${\rm \pi} _{\varTheta }$ relative to the current one ${\rm \pi} _{\varTheta _k}$:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_eqn16.png?pub-status=live)
Here, $\mathbb {E}$ denotes the mathematical expectation, and $p(\varTheta )= {\rm \pi}_{\varTheta }(a\,|\,s)/{\rm \pi} _{\varTheta _k}(a\,|\,s)$ is the probability ratio, whose numerator and denominator are the probabilities of taking action $a$ in state $s$ under the policies ${\rm \pi} _{\varTheta }$ and ${\rm \pi} _{\varTheta _{k}}$, respectively; $\hat {A}^{{\rm \pi} _{\varTheta _k}}(s, a)$ is the advantage function, which describes the advantage of choosing a specific action $a$ in state $s$ over an action drawn at random from the current policy ${\rm \pi} _{\varTheta _k}$. The function $\text {clip}[]$ confines $p ( \varTheta )$ to the range $[1-\hat {\epsilon },1+\hat {\epsilon }]$; the hyperparameter $\hat {\epsilon }$ controls how far the new policy may deviate from the current one and is fixed at $\hat {\epsilon }=0.2$ in this work.
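The standard clipped objective of Schulman et al. (2017), $L_{clip}=\mathbb{E}[\min(p\hat{A},\ \text{clip}(p,1-\hat\epsilon,1+\hat\epsilon)\hat{A})]$, can be estimated on a batch of sampled state–action pairs as in the NumPy sketch below. It illustrates the objective only; it is not Tensorforce's internal implementation, and the log-probabilities and advantage estimates are assumed to be supplied by the caller.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Sample estimate of the PPO clipped surrogate objective.

    logp_new, logp_old: log pi(a|s) under the candidate and current policies;
    advantages: advantage estimates A(s, a); eps: clipping parameter.
    """
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    # Taking the smaller term removes any gain from pushing p outside [1-eps, 1+eps].
    return float(np.mean(np.minimum(unclipped, clipped)))

# Two samples: a favoured action (ratio 1.5, A = +1) is capped at 1.2, and a
# disfavoured one (ratio 0.5, A = -1) is credited as if the ratio were 0.8.
print(clipped_surrogate(np.log([1.5, 0.5]), np.zeros(2), [1.0, -1.0]))
```

Maximizing this estimate over $\varTheta$ by gradient ascent gives the policy update described above, with $\hat\epsilon=0.2$ as in this work.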
3. Results: optimal control for a point prey ($\chi =0$)
We begin with a point prey, $\chi =0$. In all cases, the squirming predator is initially aligned with the $\boldsymbol {e}_z$ axis, i.e. $\alpha (t=0)=0$; the initial distance between predator and prey is $|r_0|=3$, so that $d_0=|r_0|-1=2$. We vary the initial bearing of the prey with respect to the predator, namely the angle $\theta -\alpha$ at $t=0$ between the predator–prey displacement vector $\boldsymbol {r}$ and the squirmer's orientation $\boldsymbol {e}_{{s}}$ (see figure 1). Since $\alpha (t=0)=0$, this initial angle reduces to $\theta _0$. Here, $\theta _0=0^{\circ }$, $90^{\circ }$ and $180^{\circ }$ correspond to the prey being in front of, on the right side of and behind the predator, respectively.
3.1. A predator combining forward and lateral motions
We first consider a predator swimming forward and laterally via the $B_{{F}}$ and $B_{{L}}$ squirming modes, respectively; this combination is denoted ‘$F+L$’. With either mode alone, the predator could swim only vertically (in $\boldsymbol {e}_z$) or only horizontally (in $\boldsymbol {e}_x$) and thus could not reach a prey located at an arbitrary bearing $\theta _0$. Figure 2(a) compares the predator and prey trajectories of the TO and EO cases. The EO predator swims directly towards the prey with the maximum forward movement ($B_{{F}}=-1$) and zero lateral movement ($B_{{L}}=0$), following the intuitive strategy of taking the straight, and thus shortest, path. In contrast, the TO predator chooses an L-shaped route, heading first north-west and then, after a sharp $90^\circ$ turn, north-east. Correspondingly, $B_{{F}}=-1$, producing the maximum forward motion, holds during the whole course, while $B_{{L}}$ jumps sharply from $-1$ to $1$ at the turning time $t=\tau$. We now discuss the mechanism behind this peculiar strategy. Initially, the predator lags behind the prey by a distance $3$ in $\boldsymbol {e}_z$. Both the TO and EO predators adopt $B_{{F}}=-1$ to maintain the same maximum movement in that direction; the difference in the capture time $T$ then depends on the prey's velocity $\tilde {U}^z$ in $\boldsymbol {e}_z$. Figure 2(b) compares $\tilde {U}^z$ of the two prey and their accumulated travelling distances $\tilde {L}^z$ along $\boldsymbol {e}_z$. An important observation is that the EO prey's $\tilde {U}^z$ remains positive, indicating consistent motion away from the predator, whereas the TO prey's $\tilde {U}^z$ becomes negative for $t<\tau$, implying motion towards the predator. The inset of figure 2(b) depicts the instantaneous flow field around the squirmer right before $t=\tau$, which produces the negative velocity $\tilde {U}^z$ experienced by the prey (red dot). To sum up, the EO predator swimming straight towards the prey generates a flow field that always pushes the prey away, whereas the TO predator adjusts its position relative to the prey, and its surface actuation, to best exploit its disturbance flow to attract the prey, which leads to the initial leftward-upward movement. Besides the $\theta _0=0^{\circ }$ case, similar TO and EO trajectories are found for an arbitrary initial bearing $\theta _0 \neq 45^{\circ }$ of the prey, as shown in figure 2(e).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_fig2.png?pub-status=live)
Figure 2. Time-optimal (TO) and efficiency-optimal (EO) predating strategies using combined forward and lateral squirming modes of magnitudes $B_{{F}}$ and $B_{{L}}$, respectively. (a) Trajectories of the predator (blue) and point prey (red), where the latter is initially ahead of the former ($\theta _0=0^{\circ }$); circles and stars denote their initial and final positions. The bottom images show how the two modes evolve in time. The left- and right-hand images depict the TO and EO strategies, respectively. (b) The prey’s velocity $\tilde {U}^z$ and accumulated travelling distance $\tilde {L}^z$ along $\boldsymbol {e}_z$; $\tau$ indicates when the TO predator takes the sharp turn. The inset illustrates its disturbance flow field in the $y=0$ plane right before $t=\tau$. (c) Capture time $T$ and predating efficiency $\eta$ of the TO and EO predators versus the initial bearing $\theta _0$ of the prey with respect to the predator. (d) Similar to (c), but for the predator’s consumed energy $E$ and travelling distance $L$. (e) The TO and EO predating behaviours when $\theta _0 = 30^{\circ }$ and $45^{\circ }$.
We then examine how the initial bearing $\theta _0\in [0,90]^{\circ }$ of the prey with respect to the predator affects the predating dynamics; owing to the fore–aft symmetry, the results for $\theta _0\in [90,180]^{\circ }$ are not shown. We depict the capture time $T$ and efficiency $\eta$ of the TO and EO strategies in figure 2(c), and the predator's energy consumption $E$ and travelling distance $L$ in figure 2(d). All these quantities are mirror-symmetric about $\theta _0=45^{\circ }$, where the prey lies exactly to the north-east of the predator. In this particular configuration, the TO and EO predators adopt the same strategy, namely swimming straight towards the prey, as shown in figure 2(e). This symmetry can be anticipated because the forward ($B_{{F}}$) or lateral ($B_{{L}}$) mode alone allows the predator to approach the prey at $\theta _0 = 0^{\circ }$ or $90^{\circ }$, respectively. Since the two modes share the same maximum magnitude, $B^{{max}}_{{F}}=B^{{max}}_{{L}}=1$, the predatory scenario for the bearing $90^{\circ }-\theta _0$ can be obtained by interchanging the time sequences of $B_{{F}}$ and $B_{{L}}$ of its $\theta _0$ counterpart. In addition, the EO predators achieve an optimal efficiency $\eta _{{eo}}\approx 0.15$ independent of $\theta _0$, which is almost double that of the TO predator when $\theta _0=0^{\circ }$ and $90^{\circ }$. In contrast, the TO predator captures the prey slightly faster than its EO counterpart, reducing the capture time by at most around $0.2$, which occurs when the prey is ahead of or beside the predator. Both the TO and EO predators catch the prey fastest when it is initially to the north-east ($\theta _0=45^{\circ }$), and take longer as the initial bearing deviates from this direction. Also, the TO predator consumes more energy than the EO predator.
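The symmetry about $\theta _0=45^{\circ }$, and the fastest capture at that bearing, already appear in a purely kinematic baseline. The sketch below is not the paper's solver and rests on stated assumptions: the velocity components equal the mode magnitudes, so $|B_{{F}}|,|B_{{L}}|\le 1$ confines the predator after time $T$ to a square of half-width $T$; the point prey is held fixed (no hydrodynamic advection); and capture means the centre-to-prey distance reaching 1 from $|r_0|=3$. The function name is hypothetical.

```python
import math


def kinematic_capture_time(theta0_deg, r0=3.0, contact=1.0, tol=1e-10):
    """Baseline capture time for an 'F+L' swimmer with |B_F|, |B_L| <= 1.

    Assumptions: the velocity components equal the mode magnitudes, the
    point prey stays fixed (no advection), and capture occurs when the
    centre-to-prey distance reaches `contact`.
    """
    th = math.radians(theta0_deg)
    px, pz = abs(r0 * math.sin(th)), abs(r0 * math.cos(th))  # prey offset (x, z)

    def gap(T):
        # After time T the swimmer centre can be anywhere in [-T, T] x [-T, T];
        # this is the Euclidean distance from the prey to that square.
        return math.hypot(max(px - T, 0.0), max(pz - T, 0.0))

    lo, hi = 0.0, r0  # gap(r0) == 0, so capture is always possible by T = r0
    while hi - lo > tol:  # bisection on the monotonically decreasing gap
        mid = 0.5 * (lo + hi)
        if gap(mid) <= contact:
            hi = mid
        else:
            lo = mid
    return hi


for theta0 in (0, 15, 30, 45, 60, 75, 90):
    print(theta0, round(kinematic_capture_time(theta0), 4))
```

Under these assumptions the baseline gives $T=2$ at $\theta _0=0^{\circ }$ and $90^{\circ }$ and $T=\sqrt {2}$ at $45^{\circ }$, with $T(\theta _0)=T(90^{\circ }-\theta _0)$; the remaining differences from figure 2(c) reflect, among other things, the prey's advection by the disturbance flow, which this baseline ignores.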
3.2. The stresslet squirming mode facilitates predation
Having observed how the predator exploits its squirming-induced disturbance flow to catch the prey faster, we then examine the influence of the stresslet mode $B_{{S}}$, which is known to alter the disturbance flow without affecting the swimming speed of an isolated squirmer (Blake 1971). According to our definition (which differs in sign from classical definitions), $B_{{S}}<0$ corresponds to a puller microswimmer, e.g. the biflagellated alga Chlamydomonas, whereas $B_{{S}}>0$ indicates a pusher, exemplified by most flagellated microorganisms. The puller draws fluid from its front and rear towards itself, while the pusher drives the flow in the opposite direction. Compared with the baseline ‘$F+L$’ predator using only the $B_{{F}}$ and $B_{{L}}$ modes, we show in figure 3 that introducing the stresslet mode $B_{{S}}$ can significantly enhance the all-round predatory performance under the EO policy. Figure 3(a) shows that the ‘$F+L+S$’ predator (with $B_{{F}}$, $B_{{L}}$ and $B_{{S}}$ modes) captures the prey faster than the ‘$F+L$’ competitor (with $B_{{F}}$ and $B_{{L}}$ modes) for all initial bearings $\theta _0$ of the prey except $\theta _0=45^{\circ }$. Moreover, incorporating the stresslet considerably enhances the predatory efficiency $\eta _{{eo}}$ of the EO predator, with a maximum relative enhancement approaching $70\,\%$ at $\theta _0=0^{\circ }$, while decreasing its energy consumption $E_{{eo}}$ and travelling distance $L_{{eo}}$ for all initial bearings of the prey (see figure 3b).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_fig3.png?pub-status=live)
Figure 3. Adding a stresslet mode $B_{{S}}$ on top of the $B_{{F}}$ and $B_{{L}}$ modes of the EO predator enhances its all-round performance. (a) Capture time $T_{{eo}}$ and predatory efficiency $\eta _{{eo}}$ versus the initial bearing $\theta _0$ of the prey. (b) Similar to (a), but for the predator’s energy consumption $E_{{eo}}$ and travelling distance $L_{{eo}}$. (c) Trajectories of the EO predator and prey with (left-hand image) and without (right-hand image) the $B_{{S}}$ mode; the bottom images show the time sequences of the squirming modes. (d) The prey's velocity $\tilde {U}_{{eo}}^z$ and accumulated travelling distance $\tilde {L}_{{eo}}^z$ along $\boldsymbol {e}_z$, with and without the $B_{{S}}$ mode. The inset shows the flow field when $\tilde {U}_{{eo}}^z$ reaches its minimum.
In particular, we examine in figure 3(c) the straight trajectories of the EO predators with and without the stresslet mode, together with those of their respective prey. The prey chased by the stresslet-equipped predator is displaced by a negligible distance compared with that chased by the stresslet-free one. The former predator, with a negative $B_{{S}}$ mode (shown in the bottom left image of figure 3c), behaves as a puller, which strongly sucks the prey ahead towards it. The associated sucking disturbance flow generated by the predator is depicted in the inset of figure 3(d). This stresslet-accelerated predation mechanism is also confirmed by the evident backward motion of the prey, i.e. negative $\tilde {U}_{{eo}}^z$ and $\tilde {L}_{{eo}}^z$, shown in figure 3(d). Analogously, Dölger et al. (2017) found that biflagellated organisms can enhance their feeding performance by adopting a puller-style locomotory gait. As expected, when the prey is initially on the right side of the predator ($\theta _0=90^{\circ }$), the predator activates a positive $B_{{S}}$ mode; accordingly, the prey is attracted laterally towards this pusher-style swimmer (not shown here).
Intuitively, a puller predator can exploit such a stresslet disturbance flow to accelerate capture. Hence one would naively expect the TO predator to run the $B_{{S}}$ mode at full gear for the fastest capture. This intuition is confirmed by figure 4(a), which shows that $T_{{to}}$ decreases monotonically with increasing $B^{{max}}_{{S}}$ for $\theta _0 = 0^{\circ }$ and $60^{\circ }$. On the other hand, a growing stresslet raises the power consumed during predation because of the stronger viscous dissipation in the fluid. Figure 4(b) shows that the energy consumption $E_{{to}}$ decreases weakly with $B^{{max}}_{{S}}$ when $B^{{max}}_{{S}} <1$, but increases sharply when $B^{{max}}_{{S}} >1$. The slightly negative correlation between $E_{{to}}$ and $B^{{max}}_{{S}}$ in the regime $B^{{max}}_{{S}}<1$ has two causes: first, in this regime, the power is consumed mainly by the forward and lateral motions of the squirmer rather than by the stresslet flow; second, the capture time $T_{{to}}$, which decreases with $B^{{max}}_{{S}}$, tends to lower the total energy. As $B^{{max}}_{{S}}$ grows beyond around $1$, the stresslet-induced power becomes increasingly dominant, because the swimming power scales quadratically with the mode magnitudes, while the forward ($B_{{F}}$) and lateral ($B_{{L}}$) modes are bounded by $1$.
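The dominance of the stresslet power beyond $B^{{max}}_{{S}}\approx 1$ follows from the quadratic scaling alone. The sketch below assumes a swimming power of the form $w_F B_{{F}}^2 + w_L B_{{L}}^2 + w_S B_{{S}}^2$ with no cross terms between modes, and uses placeholder unit weights; these are not the squirmer coefficients used in the paper, so only the trend, not the numbers, carries over. The helper name is hypothetical.

```python
def stresslet_power_share(b_s, b_f=1.0, b_l=1.0, w_f=1.0, w_l=1.0, w_s=1.0):
    """Fraction of a model swimming power attributed to the stresslet mode.

    Assumed power: P = w_f*B_F**2 + w_l*B_L**2 + w_s*B_S**2 (quadratic in the
    mode magnitudes, no cross terms). The unit weights are placeholders only.
    """
    p_stresslet = w_s * b_s ** 2
    p_total = w_f * b_f ** 2 + w_l * b_l ** 2 + p_stresslet
    return p_stresslet / p_total


# B_F and B_L saturate at their bound of 1 while B_S keeps growing.
for b_s in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"B_S = {b_s:4.2f}: stresslet share = {stresslet_power_share(b_s):.3f}")
```

With $B_{{F}}=B_{{L}}=1$ held at their bounds, the stresslet share in this placeholder model stays below one third for $B_{{S}}<1$ and exceeds one half once $B_{{S}}>\sqrt {2}$.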
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_fig4.png?pub-status=live)
Figure 4. Varying the maximum magnitude $B^{{max}}_{{S}}$ of the stresslet mode for the TO predator that swims using the ‘$F+L+S$’ combination of squirming modes. (a) Capture time $T_{{to}}$ versus $B^{{max}}_{{S}}$ for two initial bearings, $\theta _0=0^{\circ }$ and $\theta _0=60^{\circ }$, of the prey with respect to the predator. (b) Energy consumption $E_{{to}}$ and travelling distance $L_{{to}}$ versus $B^{{max}}_{{S}}$. (c) Similar to (a), but for the predating efficiency $\eta _{{to}}$. (d) Trajectories of the predator and prey for $B^{{max}}_{{S}}=0.1$, $0.75$ and $2$ marked in (c).
Next, we show in figure 4(c) that the predating efficiency $\eta _{{to}}$ depends non-monotonically on $B^{{max}}_{{S}}$, attaining its maximum when $B^{{max}}_{{S}}$ is approximately 0.7–0.8. This non-monotonic dependence can be explained by how $T_{{to}}$ and $E_{{to}}$ vary with $B^{{max}}_{{S}}$. We then examine in figure 4(d) three characteristic chasing scenarios for $\theta _0=0^{\circ }$. For $B^{{max}}_{{S}}=0.1$, the predator first approaches the prey straight forward and then takes a two-fold zigzag path reminiscent of the TO predatory strategy without the $B_{{S}}$ mode shown in figure 2(a). The initial straight chase reflects the predator's tendency to use the $B_{{S}}$-induced flow to suck in the prey. Increasing $B^{{max}}_{{S}}$ to $0.75$ yields the optimal efficiency $\eta _{{to}} \approx 0.23$, more than double that of the predators with $B^{{max}}_{{S}}=0.1$ and $5$. At this $B^{{max}}_{{S}}$ level, the suction flow is strong enough to overcome the forward flow generated by the $B_{{F}}$ mode. Hence the prey initially moves backward towards the predator, and then moves forward as their distance decreases. When $B^{{max}}_{{S}}$ grows to $2$, the stresslet-induced suction completely dominates the forward flow, so that the prey moves backward continuously until it is captured.
3.3. Incorporating rotation or using only translations?
In the above scenarios, the predator with a zero rotational mode, $B_{{R}}=0$, does not rotate, so its orientation remains along $\boldsymbol {e}_z$. To capture a prey at an arbitrary bearing, the predator must activate both the forward and lateral translational modes. However, if allowed to rotate freely by adopting a non-zero $B_{{R}}$ mode, the predator needs only one translational mode. We therefore ask how such a combination of rotational and translational modes compares with a combination of purely translational modes in TO predation. We show in figure 5(a) the minimal capture time $T_{{to}}$, and in figure 5(b) the corresponding predatory efficiency $\eta _{{to}}$, of the predator using three combinations of squirming modes: (1) forward plus rotational, ‘$F+R$’; (2) forward plus lateral, ‘$F+L$’; and (3) forward plus lateral plus rotational, ‘$F+L+R$’. The minimal capture time $T_{{to}}$ of these three combinations decreases in this order for every initial bearing $\theta _0$ of the prey. Also, the ‘$F+L+R$’ combination outperforms the other two in efficiency over most of the range of $\theta _0$. Moreover, for the ‘$F+R$’ combination, $T_{{to}}$ (resp. $\eta _{{to}}$) increases (resp. decreases) monotonically with $\theta _0$. This trend contrasts starkly with the profiles of $T_{{to}}(\theta _0)$ and $\eta _{{to}}(\theta _0)$ for the ‘$F+L$’ and ‘$F+L+R$’ combinations, which are symmetric about $\theta _0=45^{\circ }$.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_fig5.png?pub-status=live)
Figure 5. Effects of the rotational mode $B_{{R}}$ of the squirmer on its TO predatory performance. (a) Capture time $T_{{to}}$ and (b) predating efficiency $\eta _{{to}}$ versus the initial bearing $\theta _0$ of the prey with respect to the predator. Three combinations of squirming modes, ‘$F+R$’, ‘$F+L$’ and ‘$F+L+R$’, are adopted. Chasing dynamics of the ‘$F+R$’ predator are illustrated for (c) $\theta _0=0^{\circ }$, (d) $\theta _0=45^{\circ }$ and (e) $\theta _0=90^{\circ }$; those of the ‘$F+L+R$’ predator are shown for (f) $\theta _0=0^{\circ }$ and (g) $\theta _0=90^{\circ }$. The insets plot the time evolutions of the modes.
We then analyse the chasing dynamics in detail. We first illustrate in figures 5(c–e) how the ‘$F+R$’ predator chases its prey initially at $\theta _0 = 0^{\circ }$, $45^{\circ }$ and $90^{\circ }$. Intuitively, to reach an arbitrarily located prey, the predator using one rather than two translational modes has to rotate to align its swimming direction exactly with the prey. In the special case $\theta _0=0^{\circ }$, where the prey is initially ahead of the predator, no rotation is required, as shown in figure 5(c). When $\theta _0=45^{\circ }$, the predator runs the rotational mode at full gear, $B_{{R}}=1$, during $t\in [0,\tau _1]$, and then switches it off to swim straight towards the prey for $t>\tau _1$. Thus, for non-zero $\theta _0$, the capture time comprises two parts: the first for rotational reorientation and the second for straight swimming. The reorientation time clearly increases with the initial angular difference $\theta _0$, which explains the monotonic increase of the capture time $T_{{to}}$ with $\theta _0$. A less intuitive scenario occurs for $\theta _0=90^{\circ }$, when the prey is initially on the predator's right side: first, the predator moves backward and rotates rightward simultaneously during $t<\tau _1$, with both modes at full gear; then it stops the backward translation while keeping the full right rudder until $t=\tau _2$, when it faces the prey exactly; finally, the predator swims straight forward to the prey. At first glance, the initial backward movement seems counterproductive, since it lengthens the predator–prey distance. In fact, this seemingly awkward strategy embodies the wisdom of retreating in order to advance. Moving backward reduces the angle $\theta -\alpha$ (which equals $\theta _0$ at $t=0$; see figure 1b) between the predator–prey displacement and the predator's orientation, thus decreasing the time needed for reorientation and yielding a net time saving.
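The retreat-to-advance trick can be reproduced qualitatively in a deliberately crude kinematic toy, sketched below. Everything here is an assumption rather than the paper's model: the prey is a fixed point (no advection), the swimmer translates at unit speed along $\pm \boldsymbol {e}_{{s}}$, it turns at a hypothetical constant rate `omega` (this section does not give the mapping from $B_{{R}}$ to an angular velocity), and capture means a centre-to-prey distance of 1. The hypothetical strategy is: retreat while turning for `t_retreat`, turn in place until facing the prey, then swim straight.

```python
import math


def fr_capture_time(theta0_deg, t_retreat, omega=1.0, speed=1.0, r0=3.0, dt=1e-4):
    """Toy capture time of a forward + rotation ('F+R') swimmer.

    Assumptions (not the paper's model): fixed point prey, unit translation
    speed along +/- e_s, constant turning rate `omega`, capture at distance 1.
    Angles are measured from e_z towards e_x; the swimmer starts at the origin
    facing e_z (alpha = 0).
    """
    th0 = math.radians(theta0_deg)
    px, pz = r0 * math.sin(th0), r0 * math.cos(th0)  # prey position (x, z)
    sx = sz = alpha = t = 0.0  # swimmer position, heading and elapsed time

    def bearing():
        # theta - alpha: angle from the heading e_s to the swimmer-prey vector
        return math.atan2(px - sx, pz - sz) - alpha

    # Phase 1: swim backwards (along -e_s) while turning towards the prey.
    while t < t_retreat and bearing() > 0.0:
        sx -= speed * math.sin(alpha) * dt
        sz -= speed * math.cos(alpha) * dt
        alpha += omega * dt
        t += dt
    # Phase 2: turn in place until the swimmer faces the prey.
    t += max(bearing(), 0.0) / omega
    # Phase 3: swim straight until the centre-to-prey distance equals 1.
    t += max(math.hypot(px - sx, pz - sz) - 1.0, 0.0) / speed
    return t


for t_r in (0.0, 0.1, 0.25, 0.5):
    print(f"theta0 = 90 deg, retreat {t_r:.2f}: T = {fr_capture_time(90, t_r):.4f}")
```

In this toy, with $\theta _0=90^{\circ }$ and `omega=1`, a brief retreat (`t_retreat=0.25`) gives a shorter total time than turning in place, because moving along $-\boldsymbol {e}_{{s}}$ swings the prey's bearing towards the heading faster than the distance grows; a long retreat loses that advantage again. With `t_retreat=0` the total time is simply the turning time plus the straight swim, which grows with $\theta _0$.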
As discussed above, replacing the lateral mode by the rotational mode, i.e. shifting from the ‘$F+L$’ to the ‘$F+R$’ combination, makes the profiles of $T_{{to}}(\theta _0)$ and $\eta _{{to}}(\theta _0)$ asymmetric about $\theta _0=45^{\circ }$. As reflected by the distinct chasing dynamics for $\theta _0=0^{\circ }$, $45^{\circ }$ and $90^{\circ }$, this asymmetry is caused by the predator's need to rotate to face the prey. Hence we postulate that the ‘$F+L+R$’ squirming predator might exhibit similarly asymmetric profiles of $T_{{to}}(\theta _0)$ and $\eta _{{to}}(\theta _0)$ owing to the rotational mode at play. This postulation is, however, disproved by the symmetric profiles shown in figures 5(a,b), which can be elucidated by scrutinizing how the ‘$F+L+R$’ predator chases its prey initially at $\theta _0=0^{\circ }$ and $90^{\circ }$, as depicted in figures 5(f) and 5(g), respectively. The predator and prey trajectories for $\theta _0=0^{\circ }$ match those for $\theta _0=90^{\circ }$ in shape: a $90^{\circ }$ rotation maps one set onto the other. Unlike the ‘$F+R$’ predator, which rotates to face the prey exactly ($\theta -\alpha =0^{\circ }$), this ‘$F+L+R$’ predator orients itself so that the prey lies to its north-west ($\theta -\alpha =-45^{\circ }$) or north-east ($\theta -\alpha =45^{\circ }$) before swimming straight towards it. This particular angle of $45^{\circ }$ enables the predator to exploit its maximum translational speed, $\sqrt {(B^{{max}}_{{F}})^2+(B^{{max}}_{{L}})^2}=\sqrt {2}$, to reach the prey in the post-rotation period $t>\tau _2$. This maximum translational speed makes the ‘$F+L+R$’ predator faster than its ‘$F+R$’ and ‘$F+L$’ counterparts, as shown in figure 5(a). Indeed, the ‘$F+R$’ predator translates at a speed of at most $1$, whereas the non-rotating ‘$F+L$’ predator can move at $\sqrt {2}$ directly towards the prey only when the prey lies along its body-frame diagonal ($\theta _0=45^{\circ }$). We note that the different initial bearings, $\theta _0=0^{\circ }$ and $90^{\circ }$, account for the clockwise and counter-clockwise rotations of the predator, which result in $\theta -\alpha =-45^{\circ }$ and $45^{\circ }$, respectively. This reasoning also justifies the $90^{\circ }$ rotational mapping between the two sets of trajectories associated with the two bearings.
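The $\sqrt {2}$ argument can be made concrete. Assuming, as in the text, that each translational mode contributes a speed equal to its magnitude, with $|B_{{F}}|\le B^{{max}}_{{F}}$ and $|B_{{L}}|\le B^{{max}}_{{L}}$, the largest speed along a direction at angle $\phi$ from $\boldsymbol {e}_{{s}}$ is set by whichever component saturates first. The hypothetical helper below evaluates this bound and checks that its maximum, $\sqrt {2}$, is reached at $\phi =45^{\circ }$.

```python
import math


def max_speed_along(phi_deg, bf_max=1.0, bl_max=1.0):
    """Largest speed along body-frame direction phi (degrees from e_s),
    given |forward component| <= bf_max and |lateral component| <= bl_max."""
    phi = math.radians(phi_deg)
    c, s = abs(math.cos(phi)), abs(math.sin(phi))
    # Stretch the unit direction until one velocity component hits its bound.
    limits = [bound / comp for bound, comp in ((bf_max, c), (bl_max, s)) if comp > 0.0]
    return min(limits)


best_phi = max(range(0, 91), key=max_speed_along)
print(best_phi, max_speed_along(best_phi))
```

The bound equals 1 along $\boldsymbol {e}_{{s}}$ or perpendicular to it and peaks at $\sqrt {2}$ on the diagonal, which is why the ‘$F+L+R$’ predator rotates until the prey sits at $\theta -\alpha =\pm 45^{\circ }$.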
4. Results: RL for a point prey and a finite-size prey
4.1. RL-based optimization in the case of a point prey
To substantiate our study, we adopt an RL scheme to seek the optimal predatory strategies of a predator limited to the ‘$F+L$’ squirming modes. Before studying a finite-size prey ($\chi >0$), we first address the scenario of a point prey ($\chi =0$), for which the solutions obtained by the optimal control approach (see figure 2) can be regarded as globally optimal and used for benchmarking. As shown in figure 6, both the minimal capture time $T_{{to}}$ and the maximum efficiency $\eta _{{eo}}$ obtained by RL agree well with their optimal-control counterparts. As expected, a close inspection of figure 6 shows that RL performs slightly worse than optimal control, which further supports that the latter indeed provides the globally optimal solutions. Comparing further in figure 7 the learned trajectories of the TO predator and its prey with their optimal-control counterparts, we observe that the RL-trained TO predator learns to execute the two-fold zigzag path identified by the optimal control approach. In particular, when $\theta _0=0^{\circ }$, the trajectories obtained from the two approaches almost collapse onto each other; the sharp turn in the trajectory and the associated steep jump in the lateral mode (action) $B_{{L}}$ are captured quantitatively by RL, as shown in figures 7(a,d). However, as the initial bearing $\theta _0$ increases to $15^{\circ }$ and $30^{\circ }$, the RL solutions reproduce the two-fold path only qualitatively, failing to capture its sharp turn and the sudden jump of the swimming action (see figures 7b,c,e,f).
In fact, the degrading performance of RL at larger bearing angles $\theta _0 < 45^{\circ }$ can be rationalized. For a two-fold path identified by optimal control, we define its two-foldedness as $\tau /T$, i.e. the time $\tau$ of the sharp turn scaled by the capture time $T$. The two-foldedness decreases monotonically with $\theta _0\in [0, 45]^{\circ }$ and vanishes at $\theta _0=45^{\circ }$, which corresponds to a straight chasing path (see the inset of figure 7b). This trend implies that the time saved by executing a two-fold path diminishes with increasing $\theta _0$; in other words, the extra time incurred by executing the straight path, a sub-optimal solution, instead of the globally optimal one decreases as $\theta _0<45^{\circ }$ grows. Hence at a sufficiently large $\theta _0$, where the capture times of the sub-optimal and globally optimal strategies differ negligibly, it becomes challenging for the RL algorithm to pinpoint exactly the globally optimal one.
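The two-foldedness $\tau /T$ can be extracted from a sampled action history. The sketch below is a hypothetical post-processing helper, not the authors' code: it locates the sharp turn as the largest jump in $B_{{L}}$ between consecutive samples, places $\tau$ at the midpoint of that interval, takes $T$ as the last sample time, and returns zero when no jump reaches a chosen threshold (a straight path, as at $\theta _0=45^{\circ }$). The data in the usage lines are synthetic.

```python
def two_foldedness(times, b_lateral, min_jump=1.0):
    """Estimate tau/T from samples of the lateral mode B_L(t).

    tau: midpoint of the sampling interval with the largest |jump| in B_L.
    T: final sample time (capture). Returns 0.0 if no jump reaches min_jump.
    """
    if len(times) != len(b_lateral) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    jumps = [abs(b_lateral[i + 1] - b_lateral[i]) for i in range(len(times) - 1)]
    k = max(range(len(jumps)), key=jumps.__getitem__)
    if jumps[k] < min_jump:
        return 0.0  # no sharp turn: straight chasing path
    tau = 0.5 * (times[k] + times[k + 1])
    return tau / times[-1]


# Synthetic history: B_L switches from -1 to +1 between t = 1.1 and t = 1.2.
t = [0.1 * i for i in range(31)]
b_l = [-1.0 if i < 12 else 1.0 for i in range(31)]
print(two_foldedness(t, b_l))        # about 1.15 / 3.0
print(two_foldedness(t, [1.0] * 31))  # constant B_L (straight path)
```

The threshold is a design choice: RL actions that only approximate the jump (as for $\theta _0=15^{\circ }$ and $30^{\circ }$) may be smeared over several samples, so a lower `min_jump` or a smoothed derivative would be needed there.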
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_fig6.png?pub-status=live)
Figure 6. Using RL to obtain the minimal capture time $T_{{to}}$ and maximum efficiency $\eta _{{eo}}$ of an ‘$F+L$’ predator chasing a point prey. The results are compared against those based on the optimal control approach.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_fig7.png?pub-status=live)
Figure 7. Trajectories of the TO predators (blue) and their point prey (red) with initial bearing (a) $\theta _0=0^{\circ }$, (b) $\theta _0=15^{\circ }$ and (c) $\theta _0=30^{\circ }$, based on optimal control (curves) and RL (symbols). The time evolutions of the lateral mode (action) $B_{{L}}$ of the predators are shown in (d), (e) and (f), respectively. The forward mode $B_{{F}} \approx -1$ for both methods, as in figure 2(a), and is thus not shown here. The inset of (b) shows $\tau /T$ versus $\theta _0$ for the optimal control solutions.
Returning to figure 7, strictly speaking, we do not regard our RL-trained strategies as globally optimal when $\theta _0=15^{\circ }$ and $30^{\circ }$. Nevertheless, they capture the essential feature, i.e. the two-fold path, of the globally optimal solutions. This promising observation motivates us to employ RL to optimize the predating strategy of a squirmer chasing a finite-size prey, for which globally optimal solutions are not available, as presented in § 4.2. We add two comments before proceeding. First, although the RL solutions deviate from the globally optimal ones as $\theta _0<45^{\circ }$ increases, at $\theta _0=45^{\circ }$, where the globally optimal path is straight (figure 2e), RL again reproduces this solution exactly. Also, in perfect agreement with the optimal control approach, RL identifies the straight paths that reach the optimal efficiency regardless of the initial bearing of the prey; these optimal straight paths are not shown here. Second, we have found the term $\varGamma$ in (2.9), which represents a positive reward granted at the capture time, to be crucial. Without this reward, the RL approach yields straight chasing trajectories regardless of the initial bearing $\theta _0$ of the prey, evidently being trapped in local optima.
4.2. RL-based optimization in the case of a finite-size prey ($\chi >0$)
Here, we use RL to optimize the predatory strategies of an ‘$F+L$’ squirmer for capturing a finite-size spherical prey of dimensionless radius $\chi > 0$. For the BIM-based RL environment, we use a cut-off distance $\varepsilon =5\times 10^{-2}$, larger than the previous value $4\times 10^{-3}$, because the accuracy of the BIM degrades when two surfaces come very close. The finite size of the prey does not change the typical two-foldedness of the TO chasing path, as exemplified by a $\chi =0.25$ prey initially at $\theta _0=0^{\circ }$ and $30^{\circ }$ (see figures 8(a) and 8(b), respectively). Examining further in figure 8(c) the $\chi$-dependent TO predatory trajectory for $\theta _0=15^{\circ }$, we observe that its two-foldedness decreases for a larger prey. For a sufficiently large prey, $\chi =6$, the TO path eventually becomes straight. We provide here a phenomenological explanation of this change. As discussed in § 3.1, the two-fold path executed by the TO ‘$F+L$’ predator enables it to exploit its propulsion-induced disturbance flow to adjust the advection of the prey. This effect, however, diminishes for a larger prey, which is harder to advect owing to its larger hydrodynamic resistance coefficient. This trend accounts for the decreasing two-foldedness of the TO predatory path with increasing $\chi$. Furthermore, we depict in figure 8(d) the dependence of the minimal capture time $T_{{to}}$ on the prey size $\chi$. While the profile of $T_{{to}}(\theta _0)$ remains symmetric for all $\chi$, the capture time $T_{{to}}$ declines monotonically with the prey size. This negative relation can be rationalized by noting that the disturbance flow of the ‘$F+L$’ squirmer, overall, expels the prey, as evidenced by the prey's paths (see figures 7 and 8a,b). Therefore, a larger prey is expelled over a shorter distance, reducing the capture time. We also note that the wide range of $\chi$ chosen here is motivated by the various realistic scenarios introduced in § 1.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220621235356407-0236:S0022112022004761:S0022112022004761_fig8.png?pub-status=live)
Figure 8. Trajectories of the RL-trained TO ‘$F+L$’ predator and its finite-size spherical prey of radius $\chi =0.25$ initially located at (a) $\theta _0=0^{\circ }$ and (b) $\theta _0=30^{\circ }$; the lateral $B_{{L}}$ and forward $B_{{F}}$ modes of the squirmer are depicted in the bottom images. The circles (resp. stars) denote the initial (resp. final) positions of the predator and prey. (c) Dependence of the TO trajectory on the prey size $\chi \in [0, 6]$ when $\theta _0=15^{\circ }$. (d) Minimal capture time $T_{{to}}$ versus initial bearing $\theta _0$ and size $\chi$ of the prey; $\chi =0$ corresponds to a point prey studied via both optimal control and RL, whereas the results for a finite-size prey ($\chi >0$) are obtained by RL. For all cases here, the cut-off distance is $\varepsilon = 5\times 10^{-2}$, and the initial surface-to-surface distance between predator and prey is $|d_0| = 2$ as in § 3.
5. Conclusion
In this work, we study, in the creeping flow regime, a swimming predator modelled by a spherical squirmer chasing a non-motile point or finite-size spherical prey that is advected by the disturbance flow generated by the predator. Using optimal control for a point prey and RL for general situations, we optimize the predatory strategies of the squirmer, which can translate forward (‘$F$’) and laterally (‘$L$’), generate a stresslet (‘$S$’) flow, or rotate (‘$R$’) in the fluid. We have identified the best time sequences of the squirming modes to achieve time-optimal (TO) or efficiency-optimal (EO) predation, i.e. to minimize the capture time or to maximize the predatory efficiency, respectively.
We first focus on a point prey. The EO ‘$F+L$’ predator swims straight towards the prey regardless of its initial bearing with respect to the predator. In contrast, the TO predator follows an L-shaped route, hence travelling a longer distance than the EO predator. This chasing strategy can be understood by examining how the disturbance flow of the predator advects the prey: the EO predator generates a flow that persistently expels the prey, whereas the TO predator has been optimized to adapt its position relative to the prey so that its disturbance flow can be harnessed to advect the prey towards itself to some extent. This peculiar route is not easily anticipated by intuition. We also show that incorporating an additional stresslet mode of magnitude $B^{{max}}_{{S}}=1$ allows the ‘$F+L+S$’ EO predator to considerably outperform its ‘$F+L$’ counterpart in every aspect of predatory performance: generally, the former captures the prey faster, consumes less energy, travels a shorter distance and attains a higher predatory efficiency (see figure 3). We recall that for an isolated squirmer, introducing the stresslet mode does not change its speed, but does increase its energy expenditure and decrease its efficiency (Blake 1971). For predation, the counterintuitive energy saving and efficiency enhancement result from the predator's largely reduced capture time and travelling distance. This reduction is most pronounced when the prey is initially ahead of the predator, which then uses the stresslet flow to suck the prey towards itself. A similar scenario was revealed by Tam & Hosoi (2011) for a biflagellated swimmer that exploits its stroke-induced currents to achieve optimal nutrient uptake. We have then examined how the maximum magnitude $B^{{max}}_{{S}}$ of the stresslet mode influences the performance of an ‘$F+L+S$’ TO predator. Increasing the magnitude reduces the predator's capture time and travelling distance as expected; however, it significantly increases the consumed energy. This competition results in a non-monotonic variation of the predatory efficiency with $B^{{max}}_{{S}}$; accordingly, the TO predator attains its highest predatory efficiency at an optimal value $B^{{max}}_{{S}} \approx 0.7$–$0.8$. In addition, we have investigated the potential role of rotational motion in TO predation. Compared with a translating ‘$F+L$’ predator, the ‘$F+R$’ counterpart combining forward translation and rotation spends more time catching the prey, while the ‘$F+L+R$’ squirmer, using two translational motions and rotation, seizes the prey faster than the other two. Unlike the ‘$F+L$’ TO predator following an L-shaped path, the ‘$F+R$’ predator first rotates to face the prey exactly and then swims straight to it (see figure 5d). Thus the total capture time comprises two parts, one used for rotational reorientation and the other for straight chasing. For a prey initially exactly on its right side, which requires a considerable rotation, the TO predator has been optimized to adopt a non-intuitive strategy of retreating in order to advance: it first swims backwards, apparently leaving rather than approaching the prey; this trick effectively reduces the angular difference between its orientation and the prey's bearing, and thus the time needed for rotation, leading to a net time saving. We note that the rotational motion activated by the predator may not represent realistic situations, because in our model this motion consumes zero power, which is unphysical. Future modelling improvements are needed to account for the reorientation of the predator.
Besides optimal control, we have used RL to seek the optimal strategies of an ‘$F+L$’ squirmer chasing a point prey or a spherical prey of radius $\chi >0$. For the latter, a BIM solver is developed to emulate the RL environment. For a point prey, our RL-based solutions perfectly reproduce the globally optimal EO strategies (featuring straight chasing paths) derived by optimal control, while the TO solutions agree qualitatively (in some cases even quantitatively) with the globally optimal ones. In particular, RL captures the non-intuitive two-foldedness of the TO path. Applying RL to a spherical prey of radius $\chi >0$, we have found that the two-foldedness of the TO path decreases with increasing $\chi$, and the TO path eventually becomes straight at a sufficiently large $\chi$. We have also shown that the minimal capture time decreases monotonically with $\chi$ because a larger prey is more difficult for the predator to advect.
It is worth noting that, despite the substantial body of work applying RL to optimize the locomotory gaits or path planning of various swimmers, to the best of our knowledge no previous study has shown that the RL-trained swimming strategy represents or resembles the globally optimal one. Indeed, RL is well known to become easily trapped in local optima (Liepins & Vose 1991; Lehman & Stanley 2011; Sutton & Barto 2018). The only exception may be the very recent work of Nasiri & Liebchen (2022), who used RL to find asymptotically optimal navigation strategies for a point swimmer that closely replicate the globally optimal solutions identified previously by Daddi-Moussa-Ider, Löwen & Liebchen (2021). Together with Daddi-Moussa-Ider et al. (2021) and Nasiri & Liebchen (2022), our work indicates that RL-based optimization of swimming gaits or paths can be trapped in locally optimal solutions, as anticipated, but can also identify the global optima when RL is implemented properly.
We finally discuss the squirmer as the model predator. The model is adopted here for its mathematical simplicity and for important features such as time-dependent strokes, the resulting disturbance flows and its finite size. However, it does not capture key features of certain predatory microflagellates in planktonic environments. Marine microflagellates are mostly sessile predators rather than free swimmers as studied here, and they intercept prey or nutrient particles using their flagella, filters or slender tentacles (Fenchel 1986) rather than chasing them directly.
Acknowledgements
We acknowledge useful discussions with Prof C.J. Ong and support from the development team of ‘FALCON.m’. We also thank the anonymous reviewer for bringing the literature on planktonic predation to our attention.
Funding
L.Z. acknowledges support from the A*Star Advanced Manufacturing and Engineering Young Individual Research Grants, AME-YIRG (A2084c0175), the BUA-NUS Strategic Research Partnership for Global Health Initiative (R-265-000-A35-133) and the start-up grant (R-265-000-696-133) provided by the National University of Singapore. The computational work for this article was performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg).
Declaration of interests
The authors declare no conflicts of interest.
Author contributions
L.Z. designed the research. L.Z. and G.Z. built the numerical implementation. W.F. and G.Z. performed the research. All authors analysed the results and made the figures. L.Z. and G.Z. wrote the manuscript.