1 Introduction
Multi-agent systems (MASs) are decentralized environments in which agents must communicate and coordinate with their neighbours to achieve tasks. Trust between agents is a vital component to ensure agents interact with competent and cooperative agents to complete tasks to a high standard (Castelfranchi & Falcone, 1998). We adopt as the definition of trust the expectation an agent has in another to achieve a task satisfactorily (Gambetta, 2000). Methods to assess trust and reputation have been developed for domains including P2P networks (Kamvar et al., 2003; Tahta et al., 2015; Xiong & Liu, 2004), pervasive computing (D'Angelo et al., 2017; Klusch & Gerber, 2002), the internet of things (Chen et al., 2016) and many more. While existing trust models are often tailored to particular contexts or require domain-dependent information, recent trust models increasingly use machine learning to generalize trust assessment (Liu et al., 2014; Taylor et al., 2018). However, two main challenges remain: how to assign a trust value in the absence of evidence, and how to capture and detect dynamic behaviours (Traverso et al., 2017). Some trust and reputation literature states that adapting to changes in agent behaviour is an essential requirement of a trust and reputation model (Fullam et al., 2005; Salehi-Abari & White, 2012); however, the existing techniques used to address this, namely sliding windows and forgetting factors, are limited in highly dynamic and distributed environments, as we discuss further in Section 2.3.
Examples of context-dependent causes of behaviour change in MAS include smart grid demand shifts from seasonal changes, device faults, malicious behaviour (Hales & Edmonds, 2003; Salehi-Abari & White, 2012; Srivasta et al., 2005) and phenomena such as inflation rate fluctuations which go on to affect behaviour (Harries et al., 1998). Some of these changes can be random and unpredictable, whereas others may follow cyclical patterns. Additionally, agent behaviours may change as a result of being in a group (Griffiths, 2006; Nguyen & Bai, 2018). Group and coalition formation can improve the scalability and robustness of a network by improving task delegation amongst a smaller set of trusted agents. Traditional static coalition formation algorithms are not appropriate in MAS because of their dynamic, decentralized nature, and therefore trust and reputation have proven important in dynamic coalition formation (Klusch & Gerber, 2002). Groups in an MAS affect trust assessment because agents rely on other members of the group to delegate subtasks to, so an individual agent's reputation may reflect group activity. Tasks are completed to a standard dependent on the behaviour of the whole group, with individual members' contributions unknown. Assigning reputation to agents completing subtasks is considered in the group trust literature (Nguyen & Bai, 2014) and is out of the scope of our work; we assume that agents from a group have similar behaviour. Groups can also cause dynamic behaviour, as the relationships between the agents of a group can change due to changes in their motivations, a changing group population or external factors such as a changing environment (Nguyen, 2017).
In this paper, we propose RaPTaR, a method that sits on top of existing trust and reputation methods to detect and adjust to behaviour changes to supply the underlying trust and reputation algorithm with only relevant data by deleting old instances using a modification of the Adaptive Windowing (AdWin) algorithm. The point of behaviour change is identified by monitoring the window of outcomes from a group of agents using the Kolmogorov–Smirnov (K-S) statistical test. Furthermore, transitions between behaviour changes are recorded to learn patterns in group behaviour for future exploitation. RaPTaR also provides an initial assessment of an agent belonging to a group for any trust model that uses an a priori trust estimate based on both recent behaviour and any learnt behavioural patterns of the group.
RaPTaR offers several contributions to improving trust and reputation assessment in groups. First, RaPTaR allows agents to react to any statistically detected changes without a strong dependence on any prior tuned parameters. RaPTaR has only one input parameter, and we demonstrate that RaPTaR is not overly sensitive to this value. This allows agents to adapt to their situation, which in MAS can vary between agents. Second, inspired by the Reactive Proactive (RePro) concept drift algorithm (Wu & Zhu, 2005), agents can learn patterns in behaviour such that any future changes can be predicted. Third, RaPTaR calculates expected behaviour by considering all known behaviours weighted by the probability that each is active, addressing a limitation of RePro, which can only predict the single most likely behaviour at a time. Fourth, RaPTaR learns how long agents spend acting with a particular behaviour, with the intuition that if the behaviour occurs again, it may have a similar duration, and therefore a change in behaviour can be anticipated. Finally, RaPTaR includes a probability that the agent will switch to an unknown behaviour. We compare our work to sliding windows and forgetting factors, the techniques commonly used by trust and reputation models to account for changes in behaviour. We also evaluate RaPTaR with different trust algorithms, to demonstrate its performance with continuous and discrete inputs.
Our results show that RaPTaR is significantly more robust to dynamic behaviours than existing methods. If a group of agents change their behaviour such that they are no longer the most trustworthy group, RaPTaR adapts quickly. The extent of RaPTaR's improvement increases as the frequency of change in the ranking of agents increases. When behaviour is static, RaPTaR performs equally to other methods as they all converge on learning the true behaviour of an agent. However, we show that in general, RaPTaR offers statistically significant improvements over other techniques for managing agent information, for all values of its parameter $\alpha$, demonstrating that its performance is not sensitive to this value.
The remainder of this paper is organized as follows. In Section 2, we describe the related work, including existing trust and reputation models, techniques to manage interaction data in trust and reputation models, and methods for concept drift detection. The RaPTaR algorithm is described in Section 3. We present the evaluation environment in Section 4 and our results in Section 5. Finally, Section 6 concludes the paper.
2 Related work
In this section, we introduce the use of trust and reputation to assess agent behaviour, techniques for coping with dynamic behaviour and mechanisms for detecting concept drift which inspire our approach.
2.1 Trust and reputation
Selecting the most appropriate trust and reputation model depends on what information is available, including representations of outcomes, task decomposition, agent connectivity and social connections, amongst others. Additionally, some algorithms require information about the environment, either in advance or at run-time, to accurately tune the parameters of the model. RaPTaR is a complementary method to existing trust and reputation models, which improves their estimates by managing agents' experience data to adapt to dynamic behaviours. We therefore include a brief review of relevant trust and reputation literature to understand the application of RaPTaR, the models we evaluate it with and why they are limited in coping with dynamic behaviours.
An agent assessing a potential interaction partner with a trust algorithm uses their direct past experiences with them to estimate how they will behave in the future (Keung & Griffiths, 2010). Reputation mechanisms are similar, but additionally collect, aggregate and distribute feedback about agents' past behaviour from witnesses (Resnick et al., 2000). In the remainder of this paper, we refer to trust and reputation interchangeably as we use aggregated witness reports in our trust assessments. Trust and reputation literature is extensive and addresses a variety of problems which arise in trust and reputation assessment. Biased reputation sources can be statistically altered to account for liars or perception bias using models such as FIRE, TRAVOS and BLADE (Huynh & Jennings, 2004; Regan et al., 2006; Teacy et al., 2006). Similarly, HABIT is a Bayesian inference system which addresses the need for a general behaviour representation and improves computational efficiency (Teacy et al., 2012). Stereotyping techniques improve assessments of newcomers or agents for whom there is no past experience (Burnett et al., 2010; Sensoy et al., 2016). More recently, trust assessment has been viewed as a machine-learning problem with a trust model including a learning component and a predictive component (Lu & Lu, 2017; Taylor et al., 2018).
There are no existing reputation systems that focus on statistically identifying whether witness reports are representative of the target agent's current behaviour given that it may have changed. The main contribution of RaPTaR is to provide this feature to existing trust and reputation models, and in this paper we select the Beta Reputation System (BRS) (Jøsang & Ismail, 2002), the Dirichlet Reputation System (DRS) (Josang & Haller, 2007) and TRAVOS (Teacy et al., 2006) as representative trust mechanisms.
A trustor agent, tr, using BRS to assess a trustee, te, can perceive interaction outcomes as good or bad and collects the sum of good and bad experiences with te, $r_{te}$ and $s_{te}$, respectively. All the past interactions tr has had with trustees are stored in a history, $\mathcal{O}_{tr}$. These are the input parameters to a beta probability density function (PDF), whose expected value is the estimate of the trustee's behaviour, trust(te). An agent can aggregate witness reports from available agents to improve the accuracy of the assessment. For now, we assume that these values are relevant data about the current behaviour of te. The belief, $b_{te}$, in te is their expected behaviour based solely on their previous interactions, as calculated below:
$$b_{te} = \frac{r_{te}}{r_{te} + s_{te} + 2} \qquad (1)$$
BRS weights older interactions less according to a forgetting factor, described below in Section 2.3. To account for uncertainty in this belief, an a priori, a, is weighted by an uncertainty factor, $u_{te}$:
$$u_{te} = \frac{2}{r_{te} + s_{te} + 2} \qquad (2)$$
which decreases as more interaction experiences are collected, signifying a higher confidence in the belief factor. The default value for the a priori, a, is 0.5, which assumes that an agent is neither good nor bad, unless the default can be replaced by the output of another model:
$$a = \begin{cases} \textit{prior}(te) & \text{if another model provides an a priori estimate} \\ 0.5 & \text{otherwise} \end{cases} \qquad (3)$$
This is discussed further in Section 2.2. The overall expected behaviour according to BRS, trust(te), is calculated using subjective logic, namely:
$$trust(te) = b_{te} + a \cdot u_{te} \qquad (4)$$
The values of $r_{te}$ and $s_{te}$ are an aggregation of good and bad interactions with te from all available trustors, known as opinion providers, $op \in \mathcal{A}_{tr}$, defined as:
$$r_{te} = \sum_{op \in \mathcal{A}_{tr}} r^{op}_{te}, \qquad s_{te} = \sum_{op \in \mathcal{A}_{tr}} s^{op}_{te} \qquad (5)$$
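To make the calculation concrete, the following is a minimal sketch of the BRS computation in Equations (1)–(5), assuming binary outcome counts are already available; it is an illustration, not the published implementation.

```python
# Minimal sketch of BRS (Equations (1)-(5)); illustrative only.
# Names follow the paper's notation.

def brs_trust(r_te, s_te, a=0.5):
    """Expected behaviour of trustee te from good/bad counts r and s."""
    b_te = r_te / (r_te + s_te + 2)   # belief, Equation (1)
    u_te = 2 / (r_te + s_te + 2)      # uncertainty, Equation (2)
    return b_te + a * u_te            # trust(te), Equation (4)

def aggregate_reports(reports):
    """Sum good/bad counts over opinion providers (Equation (5)).
    `reports` is assumed to be a list of (r, s) pairs, one per provider."""
    r_te = sum(r for r, _ in reports)
    s_te = sum(s for _, s in reports)
    return r_te, s_te

# Example: pooling 8 good and 2 bad outcomes from two witnesses.
r, s = aggregate_reports([(5, 1), (3, 1)])
print(brs_trust(r, s))  # 0.75
```

Note that with no evidence ($r_{te} = s_{te} = 0$) the estimate collapses to the a priori a, which is the integration point RaPTaR later exploits.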
In BRS, the witness reports are assumed to be honest, and therefore both direct and indirect experiences are weighted equally (Burnett et al., 2010; Jøsang & Ismail, 2002). If it is not possible to assume that reputation providers will be honest or unbiased, then TRAVOS can be used. TRAVOS expands on BRS by altering witness reports to weight opinions which have statistically been shown to align with the trustor's true experiences in the past. The accuracy of a witness is calculated from tr's perspective by comparing the beta distribution of the opinion provider's reports about any te with the beta distribution of the actual outcomes tr ultimately received. However, this is computationally expensive because it requires collating witness reports from all trustors whenever reputation information is needed.
DRS is a generalization of BRS that divides interaction outcomes into k possible values. While BRS is a prominent, mathematically rigorous model, the limitation of only two possible outcomes from an interaction can be unrealistic. For example, if $k=5$, interactions might fall into one of the following categories: 1-bad, 2-mediocre, 3-average, 4-good or 5-excellent (Josang & Haller, 2007). When tr calculates the reputation of te at time t using DRS, the set of tr's past interactions with te, $\mathcal{O}^{te}_{tr}\subset \mathcal{O}_{tr}$, are each labelled with one of k values to become the input parameters to a multivariate Dirichlet PDF. From a Dirichlet distribution, a reputation value can be calculated using a point estimate representation, which is easily computable and human understandable. Generalizing the storage of interaction outcomes from BRS, agents have an evidence vector, $\overrightarrow{R}_{te} = (R_{te}(i) \mid i = 1 \ldots k)$, where $R_{te}(i)$ indicates the number of interactions from $\mathcal{O}^{te}_{tr}$ that were rated in the $i^{th}$ category. Each category i is given a point value, $v(i) =\frac{i-1}{k-1}$, such that the values are evenly distributed in the range [0,1]. An overall reputation value is a weighted average calculated as $reputation(te) = \sum_{i=1}^k v(i)S_{te}(i)$ given that:
$$S_{te}(i) = \frac{R_{te}(i) + C \cdot a(i)}{C + \sum_{j=1}^{k} R_{te}(j)} \qquad (6)$$
where each element, $S_{te}(i)$, in the vector $\overrightarrow{S}_{te}$ describes the multinomial probability reputation of the $i^{th}$ element given the aggregate reports in $\overrightarrow{R}_{te}$, and the value of C weights the importance of the a priori. The literature in which this model was proposed recommends a value of $C=2$, but optionally a higher value for C will lower the importance of the a priori (Josang & Haller, 2007). The a priori, a(i), is the prior probability of the $i^{th}$ category. Without any other information, all k intervals are given an equal a priori probability. Stereotype techniques can provide a more informative a priori value, $a \in [0,1]$, for an agent, which is substituted into DRS by assigning $a(i)=1$ to the $i^{th}$ category that a falls into, i.e. $i = \lfloor a \times k\rfloor$, and all other $a(i)=0$:
$$a(i) = \begin{cases} 1 & \text{if } i = \lfloor a \times k \rfloor \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$
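The following is a minimal sketch of the DRS point-estimate calculation in Equations (6) and (7), assuming $C = 2$ and zero-indexed categories; it is illustrative rather than the published implementation.

```python
# Minimal sketch of DRS point-estimate reputation (Equations (6)-(7));
# illustrative only. Categories are zero-indexed here, i = 0..k-1.

def drs_reputation(R, a=None, C=2):
    """R[i] counts interactions rated in category i."""
    k = len(R)
    if a is None:
        a = [1 / k] * k                      # uninformative uniform a priori
    total = sum(R)
    S = [(R[i] + C * a[i]) / (total + C) for i in range(k)]  # Equation (6)
    v = [i / (k - 1) for i in range(k)]      # evenly spaced point values
    return sum(v[i] * S[i] for i in range(k))

def stereotype_prior(a_scalar, k):
    """Convert a scalar a priori in [0,1] to a(i), as in Equation (7)."""
    idx = min(int(a_scalar * k), k - 1)      # clamp so a = 1 stays in range
    return [1 if i == idx else 0 for i in range(k)]

# Example: k = 5 categories with mostly 'good' outcomes.
print(drs_reputation([0, 1, 2, 6, 1]))
```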
A disadvantage of using point estimates is that no standard deviation is calculated, so an agent who behaves well half the time but badly otherwise will have an equally good reputation compared to an agent who consistently performs averagely.
2.2 Groups
Trust and reputation models are data-based approaches that may perform poorly when assessing a new agent for which there are no previous interactions. This is a recurring problem in dynamic groups, where agents may have had past interactions with other agents who were previously in their neighbourhood, but none with the agents to whom they are now connected. One possible solution is to provide the trust algorithm with an a priori value for behaviour until the agent can collect direct experiences to learn from (Burnett et al., 2010; Teacy et al., 2012; Nguyen & Bai, 2018). This may be a base value representing average performance, or an inferred value based on interactions with other agents. Grouping agents together by how they appear and behave was initially proposed in the Swift Trust algorithm (Debra et al., 1995). This method is known as stereotyping and uses the assumption that agents who are observably similar may act similarly. For example, a decision tree can be trained on past experience data, learning correlations between agents' observable features and the trust held in those agents (Burnett et al., 2010). The a priori is given less precedence as direct experiences become available. HABIT uses the average value of the first interactions with other agents in the group the trustee is associated with. This represents the notion that if previous first interactions with agents in that group were typically bad, the trustee might also be bad. StereoTrust assigns agents an a priori assessment of their behaviour based on the trust held in other agents who are observably similar, weighted by the extent of that similarity (Liu et al., 2009). If agents of a group can be assumed to have the same static behaviour, we extend this to assume their behaviour at any time point is similar, given that it can change.
2.3 Dynamic behaviour
Dynamic behaviour is a challenge to trust assessment because behaviour can change at a variety of speeds and times for the agents in an MAS, rendering traditional methods of forgetting old data at a constant rate ineffective. Despite this limitation, prominent trust and reputation models still rely on techniques such as sliding windows and forgetting factors (Huynh & Jennings, 2004; Jøsang & Ismail, 2002; Liang & Shi, 2005; Regan et al., 2006; Teacy et al., 2006). While other literature mentions the importance of coping with dynamic behaviour, it remains an open challenge (Anders et al., 2016).
A sliding window of size n retains the most recent n experiences. The intuition is that the most recent n instances capture an agent's current behaviour, and older interaction records that are no longer relevant are deleted. For a new instance, $o^t$, at time t and a window, W, of size n, the window is updated as follows:
$$W \gets \{o^{t-n+1}, \ldots, o^{t-1}, o^{t}\} \qquad (8)$$
A forgetting factor, also referred to as a longevity factor, forgets at a constant rate: all instances are retained and more recent interactions are weighted higher. An interaction from time t, used in trust assessment at the current time t', is weighted as:
$$w^{t} = \lambda^{m}, \qquad \lambda \in [0,1] \qquad (9)$$
where m is the elapsed time since $o^t$ was recorded, that is, $m = t' - t$.
. Forgetting factors can be implemented recursively by storing one value which is updated over time instead of storing all values, reweighting and collating them at every time point. This improves on both time and space efficiency as older interactions do not need to be searched or saved. However, not saving old data loses potentially useful information, especially about behaviour change over time.
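As an illustration, the two techniques can be sketched as follows; the window size, the value of $\lambda$ and the BRS-style good/bad counts are assumptions for the example.

```python
# Illustrative sketch of a sliding window and a recursive forgetting factor.
from collections import deque

window = deque(maxlen=50)       # sliding window: only the n newest outcomes
def window_update(o_t):
    window.append(o_t)          # Equation (8): the oldest outcome drops out

LAMBDA = 0.9                    # forgetting factor
r, s = 0.0, 0.0                 # single running pair instead of a history
def forgetting_update(good):
    """Recursive form of Equation (9): discount all older evidence once,
    then add the newest outcome at full weight."""
    global r, s
    r, s = LAMBDA * r, LAMBDA * s
    if good:
        r += 1
    else:
        s += 1
```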
Unfortunately, several issues impact on the generality of using sliding windows or forgetting factors to aggregate information. First, choosing the window size or the rate of forgetting is problematic. If an agent forgets old instances too quickly, they will lose relevant data that could help them make more accurate calculations. Conversely, if too much information is retained, then agents will make assessments based on data that no longer represents current behaviours. The optimal values will depend on the application. Second, the optimal values for window sizes or forgetting factors can vary over time. For example, behaviours may change frequently for a period and then remain static, and this would require using very recent data initially and then a larger window size. Third, agents may not change their behaviour at the same rates, and so a global window size or forgetting factor will not suit all agents. Finally, sliding windows and forgetting factors are only effective in coping with gradual change rather than sudden changes, which may occur at different times for different agents.
Existing MAS literature outside of the scope of trust and reputation has looked at identifying the best behaviour policy to use from a set of policies, by identifying other agents' behaviours and responding to them. Part of this task involves identifying when other agents have changed their own policy (Hernandez-Leal et al., 2016). One reason we do not compare against the policy change literature is that, in our context, agent behaviours are not a series of static policies that are swapped between, but rather continuous values that can change gradually or suddenly. Agents need to learn behaviours through outcome rewards and cannot immediately identify discrete action choices of other agents. Identifying agent behaviour changes through analyzing a continuous variable is a different problem to policy detection, and this depends on the information available in the environment (Hernandez-Leal et al., 2017). Finally, as with all existing techniques for estimating agent behaviour, work on reacting to policy changes also does not consider the probability of an upcoming change.
In this paper, we propose a method that uses concept drift techniques to identify changes in behaviour. Machine learning models find correlations between a feature set, X, and a class value, Y, typically assuming that the joint probability of the features and the target remains static over time, that is, $P_t(XY) = P_{t^{\prime}}(XY)$ for all times t and t' (Webb et al., 2016). Concept drift techniques aim to detect when such relationships change over time, meaning $P_t(Y|X) \neq P_{t^{\prime}}(Y|X)$ (Tsymbal, 2004). In stereotyping, agents are grouped based on their observable features, and the correlation between those features and agent behaviour is learned. In this paper, we address the problems that arise from agent behaviour changing over time, meaning that the correlation between features and behaviour may change.
A concept is a learned correlation between the features and the target, or class value, which in our context is the trust value. If the correlation between a set of feature values and a class value instantly changes to a new class value, then sudden drift has occurred. A slow progression of one concept becoming less prevalent while another becomes more prevalent is known as gradual drift. Gradual drift is harder to detect, especially if the differences between the two concepts are not empirically large but are important (Hoens et al., 2012). Sliding windows and forgetting factors will have an optimal size or value to handle only one speed of change; however, agents may change behaviour at different times and rates, and a single window size or forgetting factor may be insufficient. The method proposed in this paper handles agent behaviours changing at different speeds or at different times without requiring parameter tuning.
In our approach, we draw upon concept drift techniques that can identify the point of change in the data, allowing the agent to update its interaction history. The Adaptive Windowing (AdWin) model uses a variable window size which expands with new, incoming data if that data is statistically assessed to be drawn from the same distribution as the rest of the data in the window (Bifet & Gavaldà, 2007). If there is a split of the window such that it is believed the two subsets come from different distributions, then change has been detected and data from the oldest subset is forgotten, which is synonymous with shrinking the window. The method we propose in this paper uses a similar adaptive window as part of its reactive component. Once AdWin deems a concept to have ended, the data from that concept is forgotten forever. RePro retains such information with the aim of improving predictions in an environment where concepts might reoccur (Wu & Zhu, 2005). Once a concept has changed, RePro remembers the concept and records the transition to it from the previous concept. If the concept reoccurs in future, it can be proactively predicted given the concept which came before it, without being relearned from scratch, therefore saving time and improving predictive accuracy. RePro was designed for categorical predictions; however, we present a simple adaptation for continuous predictions. RePro can also only predict one concept at a time, as the proactive mode selects the concept that most frequently follows the current concept and there is no consideration of unknown concepts. The proactive mode requires a trigger, such as detecting a small amount of error, to determine whether a different concept has taken over. Work on detecting concept drift in the error of agent behaviour estimates exists (Hernandez-Leal et al., 2017); however, in the context of trust and reputation, estimates of behaviour can be highly inaccurate but not affect the ability to choose good partners. We find it more prudent to analyze interaction outcomes for concept drift, which have not been biased by agents' potentially inaccurate trust estimates, which allows RaPTaR to detect changes effectively regardless of which trust algorithm is being used.
3 Reacting to dynamic agent behaviours and predicting trust and reputation
In this section, we propose a method, RaPTaR (Reacting and Predicting in Trust and Reputation), to detect and adapt to changes in agent behaviour such that trust assessment is as relevant as possible to an agent's current behaviour. By improving the accuracy of trust assessment and identifying the current best partner to interact with, given that agents' behaviours can change, we aim to improve the average utility agents receive. The following assumptions are made regarding the environment in which agents are situated. First, we assume that agent behaviour can change at any rate or time. Second, we assume that agents may interact with a subset of the population, as determined by an underlying network topology. An agent's connectivity limits its ability to gather reputation information and its choice of available partners. Third, we assume that agents belong to a group, G, such that members of the group have the same behaviour. If some factor causes an agent to change its behaviour, then we assume that the behaviour changes for all members of the group. Fourth, we assume that agents provide accurate reputation information because they are honest and unbiased. In principle, because RaPTaR is used alongside an underlying trust assessment mechanism, this assumption can be relaxed by combining it with trust techniques that cope with dishonest agents. In this paper, our focus is on the mechanism for selecting appropriate information on which to assess reputation, rather than addressing dishonesty.
In the remainder of this section, we describe our method, which improves trust assessment in two ways. First, statistically detecting changes in recent outcomes from interactions, and updating the data used in trust assessment accordingly, enables agents to assess trust appropriately for others' behaviour. Second, RaPTaR produces an estimate of an agent's behaviour based on its group by exploiting any learned patterns.
An agent uses RaPTaR to monitor the outcomes of a subset of agents, $G \subset \mathcal{A}$, to account for how a group may change its behaviour at different times and speeds compared to other groups. Each identifiable group in the population is monitored separately. Groups may represent some known coalitions but, if groups are not explicitly identifiable, it may be possible to group agents using techniques such as stereotypes (Player & Griffiths, 2018). Our previous work showed how we can react to dynamic behaviours of stereotypes. In this paper, we build on this idea to also consider how we can predict behaviour changes. We found in our previous work that the level of noise prevented accurately identifying change; we therefore aim to reduce the noise to study more complex agent behaviours, and in this paper we assume that groups are identifiable. RaPTaR aims to make use of past learned behaviour information after an agent has changed behaviour, which is not something considered by previous methods. An agent stores the outcome of its interactions in a history, $\mathcal{O}$, which is divided into subsets representing the outcomes of interactions with agents from each group they have encountered, $\mathcal{O}_{G} \subset \mathcal{O}$. RaPTaR can then monitor $\mathcal{O}_{G}$ for behavioural changes within each group G. In presenting RaPTaR, we use the notation in Table 1.
RaPTaR has a learning component and a predictive component. When the outcome of an interaction is saved, it will either be assumed to come from the same distribution as other recent outcomes or from a different distribution if a change in behaviour is detected. For the learning component, we use a modified version of the AdWin algorithm to detect change and remove irrelevant data. If a change is detected, the behaviours on either side of the change are learned. Agents record how long a behaviour was believed to be active for and which behaviour succeeded it, to learn behavioural patterns and improve predictions of future behaviour changes. Some trust and reputation algorithms will accept an a priori as input to improve trust estimates when there is little experience data. Similar to stereotype models, the predictive component of RaPTaR produces an a priori estimate based on experiences from members of the group who are assumed to behave similarly. RaPTaR uses the adaptive window of recent interactions to estimate the current behaviour, how long it has been active for and possible successor behaviours to assess an overall expected utility. We discuss the learning and predictive components in Sections 3.1 and 3.2, respectively.
Table 1 RaPTaR notation
3.1 Learning component
The learning component comprises two parts: the first detects changes in agent behaviour using interaction outcomes and the second learns patterns in changes of behaviour.
RaPTaR maintains a window, W, of outcomes from interacting with the agents of a group, G, such that after an interaction the outcome, $o^t$, is appended to the end of W. For every split of W into $W_0$ and $W_1$, we perform the K-S test to determine whether the null hypothesis, 'the data in the two sets is drawn from the same distribution', can be rejected, that is, whether there has been a change in the agents' behaviour. The K-S test is appropriate because the data is continuous (Press et al., 2007). There is only one parameter in the K-S test, $\alpha \in [0,1]$, where the confidence in the decision of accepting or rejecting the null hypothesis is $1-\alpha$. Therefore, for a high level of confidence in this decision, $\alpha$ should be a low value. We show in Section 5 that in our results the efficacy of RaPTaR does not fluctuate with differing values for $\alpha$.
In the original AdWin algorithm, W is iteratively split into all combinations of $W_0$ and $W_1$, starting by isolating the oldest instance and incrementally moving the oldest element of $W_1$ into $W_0$, until either change is detected or all the instances in W are assumed to be from the same distribution. We identified that there is a performance difference in the AdWin algorithm if W is split into $W_0$ and $W_1$ in reverse order, such that the most recent instance is isolated first into $W_1$ with all other instances in $W_0$, then iteratively moving the last instance of $W_0$ into $W_1$. To keep the retained data as relevant as possible, conducting the split starting at the most recent instance is more effective. This is because if there is a gradual change in the distribution over the data, which does not have one clearly defined time point of change, the null hypothesis could be rejected at any of several consecutive split points. By identifying the most recent split point, the most relevant data is retained in $W_1$. Alternatively, starting the tests from older instances will result in the change being detected closer to the start of the window and thus retaining more, possibly irrelevant, data in $W_1$. Algorithm 1 describes how RaPTaR learns behavioural patterns, splitting W into $W_0$ and $W_1$ by starting to check for behaviour change at the end and then working backwards.
If there exists a split of W into $W_0$ and $W_1$ such that a change in behaviour is detected, RaPTaR stores information about the change for future predictions about behaviour. When a change is detected, the instances in $W_0$ can be learnt as a concept. The currently active concept, $c_c$, which best estimates $W_1$, is still active and may still change, so we cannot record a transition to it yet. Therefore, transitions are recorded between the two concepts that were active prior to $c_c$. Concept $c_y$ represents the behaviour in $W_0$, and $c_x$ was the concept learned before $c_y$, learned from data which has since been removed from W. Concept $c_x$ is initialized to the unknown concept, and then updated at the end of this process as $c_x \gets c_y$, ready to record the next transition. The form of the transition matrix is depicted in Table 2. The first column of TM, index 0, is for the unknown concept, to show how frequently other concepts are followed by a new concept. When a change in behaviour is detected, as described above, the concept $c_y$ that describes the behaviour in $W_0$ needs to be identified as either a concept seen before or be learnt as a new concept. The value of each known concept in $\overrightarrow{C}$ is compared to the mean of the instances in $W_0$, $\mu_{W_0}$, and if there exists a concept $c \in \overrightarrow{C}$ such that $|\mu_{W_0} - \mu_{c}| < cet$, where cet is the conceptual equivalence threshold, then $c_y \gets c$. Storing mean values is a simple approach, and future work might consider storing more complex agent policies. If no concept is equivalent, a new concept is learnt with the value $\mu_{W_0}$. If the length of $W_0$ is less than a stable learning size, s, then it is not considered substantial enough to record a transition for. The previous concept, $c_x$, is remembered even though the data for it has since been deleted, and we can now record the transition between $c_x$ and the newly learned $c_y$. The list, $L_{[c_x,c_y]}$, records the length of time spent at $c_x$, $tl_{c_x}$, before moving to $c_y$, for every occurrence of the transition. The time at $c_x$ is initialized to 0 for the first recording. The reactive and learning component is summarized in Algorithm 1.
Table 2 Transition Matrix
Algorithm 1 Learning Concepts by Updating TM
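The published pseudocode of Algorithm 1 is not reproduced here; the following minimal sketch illustrates the learning step it summarizes. The concept list and the transition dictionary are illustrative simplifications of $\overrightarrow{C}$ and TM, while cet and s follow the paper's parameters.

```python
# Illustrative sketch of the learning step summarized by Algorithm 1.
from statistics import mean

CET = 0.1        # conceptual equivalence threshold, cet
S_STABLE = 3     # stable learning size, s

concepts = []    # learned concept values (mu_c)
transitions = {} # (c_x, c_y) -> list of durations spent at c_x, L[c_x,c_y]
c_x = None       # the unknown concept, before anything has been learned
t_at_cx = 0      # time spent at c_x; initialized to 0 for the first record

def learn_concept(W0):
    """Match W0 to a known concept within cet, else learn a new one,
    then record the transition c_x -> c_y with the duration at c_x."""
    global c_x, t_at_cx
    if len(W0) < S_STABLE:
        return                        # too brief to record a transition for
    mu = mean(W0)
    c_y = next((c for c in concepts if abs(mu - c) < CET), None)
    if c_y is None:                   # no equivalent concept: learn mu
        concepts.append(mu)
        c_y = mu
    transitions.setdefault((c_x, c_y), []).append(t_at_cx)
    c_x, t_at_cx = c_y, len(W0)       # c_y is now the previous concept
```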
3.2 Predictive component
For any trust and reputation model that accepts an a priori trust estimate, RaPTaR calculates an expected utility based on the value of each concept, $\mu_c \in \overrightarrow{C}$, in the concept history for a group, and the probability that it may currently be active, p(c). The expected utility draws upon the information recorded in TM by the learning component and is calculated as:
$$EU = \sum_{c \in \overrightarrow{C}} \mu_c \cdot p(c) \qquad (10)$$
First, tr identifies whether any of the known concepts, $c \in \overrightarrow{C}$, could be the currently active concept, $c_c$, by evaluating the difference between the concept values, $\mu_c$, and the mean of the values currently in W, $\mu_W$, checking whether $|\mu_{c_c} - \mu_W| < cet$. The trustor assesses the length of time this concept has likely been active, $t_{c_c}$, as the current time, t, minus the time the first outcome in W was recorded:
$$t_{c_c} = t - t_{W[0]}$$
The first known occurrence of the currently active concept should be in the first element of the adaptive window, W[0], as the reactive component statistically readjusted the window to contain data believed to be generated by the same underlying distribution.
The next step is to calculate the probability, p(c), that each concept $c \in \overrightarrow{C}$ will succeed $c_c$, given that $c_c$ is assumed to have been active for $t_{c_c}$. A PDF is built with a Gaussian kernel density estimator using the lengths of time spent at $c_c$ in the past before it was succeeded by c. These times were recorded in TM, illustrated in Table 2, $\forall tl_{c_c,c} \in L_{[c_c,c]}$. A PDF interprets the probability of the length of the concept, which is advantageous because the reactive component may not identify the time of change perfectly, but the PDF will combine all the times to give an estimate. As $|L_{[c_c,c]}|$ increases, the PDF becomes more accurate at pinpointing the time. The probability, $\dot{p}(c)$, that c succeeds $c_c$ after $t_{c_c}$ time at $c_c$ is:
$$\dot{p}(c) = \textit{PDF}_{L_{[c_c,c]}}\!\left(t_{c_c}\right) \times \left|L_{[c_c,c]}\right|$$
where $|L_{[c_c,c]}|$ is the length of the list, and therefore the number of times c has succeeded $c_c$. Multiplying by the frequency of transitions turns the probability into a frequency distribution estimate, which gives precedence to the concepts that more frequently succeed $c_c$ when the probabilities are normalized.
To normalize the probabilities of each possible next concept (including the possibility that $c_c$ is followed by $c_c$ at this time), the probabilities must sum to one, $\sum_{c \in \overrightarrow{C}} p(c) = 1$. Therefore, p(c) is given as:
$$p(c) = \frac{\dot{p}(c)}{\sum_{c' \in \overrightarrow{C}} \dot{p}(c')} \qquad (11)$$
If no known concept is active, the currently active concept is assumed to be new, and it takes on the value $\mu_W$ with probability 1. No other concept has a probability of being active, as we have no information about what or when another behaviour might follow the currently active concept.
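The predictive component can be sketched as follows, using SciPy's Gaussian kernel density estimator. It reuses `concepts` and `transitions` from the learning sketch above, requires at least two distinct recorded durations before a KDE can be built, and omits TM's unknown-concept column for brevity; these simplifications are assumptions for illustration.

```python
# Illustrative sketch of the predictive component (Equations (10)-(11)).
from scipy.stats import gaussian_kde

def expected_utility(c_c, t_cc):
    """Equation (10): concept values weighted by the probability p(c)
    that each succeeds c_c after t_cc time steps at c_c."""
    weights = {}
    for c in concepts:
        durations = transitions.get((c_c, c), [])
        if len(durations) >= 2:
            pdf = gaussian_kde(durations)
            # p-dot(c): density at the elapsed time times transition count
            weights[c] = float(pdf(t_cc)[0]) * len(durations)
        else:
            weights[c] = 0.0
    total = sum(weights.values())
    if total == 0:
        return c_c                # no recorded successors: expect c_c itself
    # normalize the weights (Equation (11)) and take the weighted average
    return sum(c * w / total for c, w in weights.items())
```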
3.3 Integrating RaPTaR with trust and reputation models
RaPTaR has two points of integration with trust and reputation algorithms. First, RaPTaR detects changes in the behaviour of agents in a group, G, and deletes any records believed to be irrelevant from $\mathcal{O}$. The effect is that the trust and reputation model draws upon records in $\mathcal{O}$ which are believed to be representative of current behaviours. This contrasts with the use of a fixed-size sliding window, which can only delete the oldest interaction record to accommodate space for the newest, regardless of the relevance of either record. Second, RaPTaR outputs an expected utility which can be used as an a priori estimate in trust and reputation models that incorporate such a value when there are no, or few, direct experiences with an agent on which to otherwise base a decision.
4 Experimental methodology
In this section, we describe the evaluation environment used in this paper. RaPTaR is an extension to trust assessment requiring minimal parameter tuning, and therefore RaPTaR is applicable in multiple domains. We describe general environmental parameters that can be altered to model a variety of application types.
An interaction between two agents can represent an exchange of goods or services between producer and consumer. A trustor agent, tr, makes a trust assessment about a trustee agent, te, and if they are the most trustworthy available agent, they have an interaction. Agents interact over time steps and gain experiences with other agents which they use to assess their future performance. Agents behave as both trustors and trustees. In a round, a trustor chooses a partner to execute their task, but they can also be selected as a trustee by other agents. In our evaluation, an agent can act as a trustee in up to 10 requests per time step. This mimics evaluations from existing literature where there are more trustees than trustors, as well as enabling agents to multi-task (Nguyen & Bai, 2018). When there are more possible partners to choose from, trustors have a more difficult task of identifying the best partner; therefore, the results showcase which partner selection technique is more successful.
An agent is described by the tuple $\langle ID, G, b(t) \rangle$, where ID is the agent's identifier, G is the agent's group and $b(t) \in [0,1]$ describes the agent's behaviour, which is dependent on time t and their group. The behaviour at any time point is described below. An agent maintains a set of historical records of their past interactions, $\mathcal{O}$, where each tuple represents one interaction in the form $\langle j, o_t, t \rangle$, where j is the ID of their interaction partner and $o_t$ is the value of the interaction at time t.
4.1 Dynamic behaviours
The trustor, tr, receives an outcome as a result of interacting with a trustee te, $o_t \in [0,1]$, dependent on the behaviour of te at time t. This value can be seen as the extent to which, or the probability that, te cooperates or defects. Static agent behaviour, b, is typically defined within a range of values where the bounds are application dependent. Trust and reputation algorithms then normalize these bounds such that $b \in [0,1]$, where 0 indicates defective behaviour, whereas a value tending towards 1 indicates a cooperative agent. We extend this to define dynamic behaviours, $b(t) \in [0,1]$, as a function dependent on time. These functions are randomly generated in each experiment, so that the length of time concepts are active for and the speed of transitions vary. Our evaluation then shows how each method copes given there is no advance information about agent behaviours. The outcome of an interaction is then drawn from a Gaussian distribution with mean b(t) and standard deviation 0.2, to represent that an agent will not always perform exactly the same, or that agents sharing the behaviour of a group do not behave uniformly. Examples of dynamic behaviours are given in Figure 1. These functions could be repetitive patterns over time, as seen in Figure 1(a), or random dynamic behaviour, as depicted in Figure 1(b). An agent's behaviour is defined by the group they belong to.
Figure 1 Example of groups’ dynamic behaviours
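A minimal sketch of this outcome model is given below; the particular cyclical b(t) and the clipping of outcomes to [0,1] are illustrative assumptions.

```python
# Illustrative sketch of the outcome model: outcomes are Gaussian around
# the group's behaviour b(t) with standard deviation 0.2.
import math, random

def behaviour(t):
    # example cyclical group behaviour (an assumption, cf. Figure 1(a))
    return 0.5 + 0.4 * math.sin(2 * math.pi * t / 200)

def interaction_outcome(t, sd=0.2):
    o = random.gauss(behaviour(t), sd)
    return min(max(o, 0.0), 1.0)     # outcomes lie in [0, 1]
```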
4.2 Groups
In the model assumptions stated in Section 3, agents are identifiable by group, and their behaviour is dictated by their group. Figure 1 illustrates how a group's behaviour can change over time. Groups may be explicit, as we have assumed in this paper, but they may also represent profiles of entities in the application. For example, Smart Grid users might be classified into groups including a single person living alone, large houses with multiple occupancy, or industrial offices. As smart meters become more advanced and prevalent in the electricity grid, they can provide more information to improve these classifications using stereotype techniques when group identities are not explicit. For the evaluation in this paper, we use three groups, and agents do not change between groups. Agents are randomly assigned to a group at the beginning of their lifetime.
4.3 Network topology
To demonstrate that RaPTaR is appropriate in many domains, we consider agents being connected in different network structures (Anderson et al., 2013). The network topology affects which agents are connected to which other agents, and the consequences for trust and reputation models include limiting available interaction partners and witnesses. Our evaluation considers three common structures: the Kleinberg small-world network, where agents are grouped in hubs with few links connecting each hub; the Barabási-Albert scale-free network, which uses the preferential attachment property, often used to represent social networks, where agents have a higher probability of being attached to an agent who already has a higher degree; and a fully connected network. When we use the small-world or scale-free networks in our evaluation, they are populated with 100, 200 or 500 agents, but the fully connected network is only populated with 20 or 40 agents, as the increased number of edges means it is significantly less scalable and simulations become too long to run. We include results with a fully connected network because this represents a portion of the network, or a group, rather than the whole population. Additionally, a fully connected network represents a scenario where agents have sparse data because of the high degree but many possible interaction partners. Small-world networks in the experiments are generated with a clustering coefficient of 0.5.
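As an illustration, the three topologies can be generated with NetworkX as sketched below; the generator parameters are assumptions for the example, not necessarily our exact experimental settings.

```python
# Illustrative sketch of the three network topologies with NetworkX.
import networkx as nx

scale_free = nx.barabasi_albert_graph(100, m=2)   # preferential attachment
small_world = nx.Graph(                           # Kleinberg grid model
    nx.navigable_small_world_graph(10, p=1, q=1, r=2, dim=2))
fully_connected = nx.complete_graph(20)           # every pair linked
```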
Figure 2 RaPTaR more quickly adapts its trust estimates (EU) to reflect the true behaviour compared to other methods. Agents use BRS in a fully connected network with an exploration rate of 0.2
4.4 RaPTaR variables
The conceptual equivalence threshold, cet, is set to 0.1 in our evaluation. Trust models can have low accuracy, and so a value of cet as high as 0.1 helps to group together interactions with trustees whose outcomes were generated by the same behaviour but have a margin of discrepancy between their assessments. Additionally, this is low enough that any true behaviours that are different but are assessed to be the same concept are not too different. Previous experiments show that perfect accuracy for estimating agent behaviour is not important, but that the ranking of agents needs to be correct (Player & Griffiths, 2017). If $cet = 0.1$, this bins trust into 10 categories (assuming that trust is in the range [0,1]), providing enough granularity to rank behaviour from good to bad. The stable learning size, s, is set as low as 3 to encourage learning brief periods of behaviour which occur during transitions between longer periods of static behaviour. We do not consider this an input parameter to RaPTaR because we recommend the value to be small in any dynamic, noisy agent-based environment.
Table 3 Average utility per agent per time step in network topologies using different trust models with either a fixed window size (ws), forgetting factor ($\lambda$) or RaPTaR, averaged over 1000 rounds and 100 runs.
Figure 3 Quick adaptations to behaviour change lead to short-term large utility benefits which are substantial enough to be significant improvements to average utility over 100 runs
5 Results and evaluation
In this section, we present experimental results of RaPTaR applied to three trust and reputation algorithms: BRS, DRS and TRAVOS. The results demonstrate that RaPTaR is an effective alternative to sliding windows and forgetting factors when the dynamic nature of the environment is unknown. All results are averaged over 100 runs and are statistically significant with a p-value less than 0.001 using a paired t-test. The only exceptions to this are in Figures 2 and 3(a), where the results demonstrate how RaPTaR works in a single experiment. RaPTaR aims to improve an agent's ability to assess another agent's behaviour, and thereby select better partners for interactions. Differences in results obtained from using RaPTaR with different $\alpha$ values are not statistically significant, demonstrating there is little dependence on this parameter. To illustrate this, the results show the average utility agents receive for interactions. For the trust models BRS and DRS, we have put the best evaluation results in bold. Table 3 shows the average utility, or outcome, an agent receives across all rounds and runs. RaPTaR outperforms all sliding windows with various window sizes (ws) and forgetting factor values ($\lambda$), where the improvement varies between 2% and 5%. This average improvement demonstrates RaPTaR is robust and consistent: RaPTaR provides larger improvements in times of dynamic behaviour and performs equally well as existing techniques during periods of static behaviour, which reduces the average improvement. The extent of the improvement can vary depending on the network topology, which is discussed below. Agents' dynamic behaviour is patterned in all evaluations unless otherwise stated, where we show that RaPTaR is effective for random dynamic behaviour as well.
To visualize why and how RaPTaR performs well, we analyze a single run. In this example, we use a fully connected network to emphasize how agents behave when they have a lot of choice of partners and sparse data, which demonstrates the efficacy of the behaviour assessment model. We compare how BRS performs with RaPTaR against BRS with a sliding window. Figure 2 shows trustors' average trust estimate of agents from each group at every time period. We can see that, first, RaPTaR adjusts to changes as they occur and, second, RaPTaR makes more accurate assessments of all the groups, not just the best one. Figure 3(a) depicts the average outcome that agents receive in this experiment. At times 500–550 and 850–900, there are substantial improvements from using RaPTaR. This is not a constant improvement across all rounds because, during static periods of behaviour, the sliding window eventually tends towards accuracy. However, as soon as the dynamic behaviour of the groups causes the best group to change, RaPTaR's ability to adapt helps agents select a better partner sooner. This analysis can be verified by observing the accuracy of the trust estimates at those times in Figure 2. Additionally, RaPTaR can predict cyclical behaviour before the change has occurred, as well as react to it, while existing methods take a period of time to relearn behaviours even though they may have been seen before.
Figure 4 Varying $\alpha$ values in the K-S test of RaPTaR does not affect its performance
To test if these improvements occur for dynamic behaviours which change at different times and speeds across multiple runs, Figure 3(b) compares the average utility over 100 runs using the same experimental parameters as the example. RaPTaR shows a constant average improvement, demonstrating it is robust to dynamic behaviours.
Using different values of forgetting factor or fixed window size does not make these techniques more competitive with RaPTaR, as demonstrated in Table 3, where RaPTaR performs better for any value of $\alpha$. The values of window size and forgetting factor have been chosen based on existing literature, and the intuition that any smaller values are too small to learn accurately from, while larger values would prevent these models from adapting to newer circumstances. One reason these methods do not perform well is that a single forgetting factor or sliding window size will be inappropriate for adapting to every agent's dynamic behaviour.
Varying $\alpha$ has limited effect on RaPTaR's efficacy, as seen in both Table 3 and a visualization of the average utility agents receive over time for different $\alpha$ values in Figure 4(a). Further analysis in Figure 4(b) shows that lower values of $\alpha$, which require agents to have an extremely high level of confidence to believe there has been a behaviour change, result in more data being retained compared to a higher value of $\alpha$. Figure 4(b) shows the average window sizes over 100 runs; however, in each run, the adaptive windows monitoring each group will have varied according to the dynamic situation, which is why RaPTaR outperforms fixed window sizes.
Figure 5 shows that RaPTaR still performs well when behaviours are not repetitive. An example of non-repetitive, random behaviour was depicted in Figure 1(b). However, there is a drop in performance compared to when behaviour is cyclic. This shows that the predictive component of RaPTaR is effective, but that improvements could be made by learning when the predictive component is unsuccessful, to reduce any inaccurate bias it may introduce.
Figure 5 Average utility per agent given random dynamic behaviour in a scale-free network, where agents use BRS
Previous work has demonstrated how the accuracy of a trust model to estimate a partner's exact behaviour is not crucial, but that the trust model should be accurate enough to correctly rank the best agents (Player & Griffiths, 2017). If a group of agents is consistently the best, other techniques which forget at a constant rate will converge on the correct ranking of agents and perform equally compared to RaPTaR. However, with dynamic behaviours, it is unlikely that one group is consistently better than the others over time. Therefore, how consistently accurate the trust assessment is indicates how quickly the trust model adapts to changing situations. The measure of accuracy we use is the root mean squared error (RMSE) at each time point:
$$RMSE = \sqrt{\frac{\sum_{tr,te}\left(\tau_{tr,te} - b(t)_{te}\right)^2}{n}} \qquad (12)$$
where n is the total number of interactions of all agents in that round,
$\tau_{tr,te}$
is the trust estimate between agent tr and trustee te and
$b(t)_{te}$
is the trustee’s true behaviour at time t.
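For concreteness, this per-round error can be computed as in the sketch below, where `estimates` and `behaviours` are hypothetical arrays holding the $\tau_{tr,te}$ and $b(t)_{te}$ values of the $n$ interactions in a round.

```python
import numpy as np

def round_rmse(estimates, behaviours):
    """RMSE between trust estimates and true behaviours for one round."""
    estimates = np.asarray(estimates)
    behaviours = np.asarray(behaviours)
    return np.sqrt(np.mean((estimates - behaviours) ** 2))
```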
Figure 6 demonstrates that while sliding windows are sufficient in static circumstances, they are not effective with dynamic behaviours. RaPTaR can handle both static and dynamic circumstances by adapting as necessary. Additionally, using an adaptive window means that RaPTaR does not require any prior knowledge of behaviours. Agents in this evaluation, and in the results from here onwards, use DRS instead of BRS, so that interaction outcomes have more granularity. We show that RaPTaR achieves a similar increase in performance and can handle nominal outcomes as well as binary ones, which was a motivating factor for using the K-S test for behaviour change detection using interaction outcomes.
Table 4 Network topology mean degree and standard deviation.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200616071110110-0644:S0269888920000077:S0269888920000077_tab4.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200616071110110-0644:S0269888920000077:S0269888920000077_fig6.png?pub-status=live)
Figure 6 Error in small-world network, where agents use DRS and an exploration rate of 0.2
The network topology affects agents' neighbourhood size, limiting the number of available partners. Table 4 shows the mean degree of an agent (i.e. its neighbourhood size) in the respective network topologies and the standard deviation of that mean across all the agents in the network. The standard deviation of the degree in a small-world network is considerably lower than in a scale-free network, so almost all the agents in a small-world network have a selection of 5 to 7 agents to partner with and collect witness reports from. In a scale-free network, most agents have only one or two possible partners, while a small minority of agents have an extremely high degree. From Table 3, it can be seen that there is little improvement in a scale-free network because most agents are forced to interact with one of their few neighbours regardless of their behaviour (and therefore also regardless of how accurately they assess trust). A fully connected network best showcases RaPTaR, as agents have a wide variety of choice, so the assessment technique is what most affects an agent's average outcome. In a fully connected network, we can see that RaPTaR offers the highest improvement, regardless of the trust algorithm it is applied to.
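Comparable degree statistics can be generated with networkx as sketched below; the generator parameters are illustrative assumptions and not necessarily those used in our experiments.

```python
import networkx as nx
import numpy as np

n = 100  # illustrative network size
topologies = {
    "small-world": nx.watts_strogatz_graph(n, k=6, p=0.1),
    "scale-free": nx.barabasi_albert_graph(n, m=2),
    "fully connected": nx.complete_graph(n),
}
for name, g in topologies.items():
    degrees = [d for _, d in g.degree()]
    # Small-world degrees cluster tightly around k, whereas scale-free
    # degrees are heavily skewed, leaving most agents few neighbours.
    print(f"{name}: mean degree {np.mean(degrees):.1f}, "
          f"std {np.std(degrees):.1f}")
```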
In a dynamic environment, it is necessary to continually monitor for changes, otherwise agents' views of the world become outdated. However, agents face an exploration–exploitation trade-off. Exploration is the probability of interacting with a random partner. The necessary extent of exploration depends on how extreme the changes are. Each run of our experiments generates different dynamic behaviours for each group; therefore, runs can vary from extremely dynamic to relatively static, and there is no knowledge of this in advance. Figure 7 illustrates how different exploration rates affect RaPTaR, sliding windows and forgetting factors. The results presented are from a fully connected network because agents have a large choice of partners, giving them the opportunity to exploit the knowledge they gain from exploration.
When agents do not explore, as in Figure 7(a), all models are equally blind to changes in the environment and perform poorly. When all models use higher exploration levels, RaPTaR performs the best because it has the capability to learn about the environment and exploit the knowledge it gains from exploration. Once exploration reaches 30%, the forced interactions with so many suboptimal partners slightly decrease the average utility to below 0.6, without gaining any additional useful knowledge.
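This exploration mechanism amounts to ε-greedy partner selection, sketched minimally below; `trust_of` is a hypothetical stand-in for whichever trust assessment the agent uses.

```python
import random

def select_partner(neighbours, trust_of, exploration_rate):
    """With probability exploration_rate pick a random neighbour
    (explore); otherwise pick the most trusted one (exploit)."""
    if random.random() < exploration_rate:
        return random.choice(neighbours)
    return max(neighbours, key=trust_of)

# e.g. select_partner(neighbours, lambda a: trust_scores[a], 0.2)
```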
Table 5 Average utility per agent per time step in different network topologies when agents use TRAVOS with either a fixed window size (ws), forgetting factor ($\lambda$) or RaPTaR.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200616071110110-0644:S0269888920000077:S0269888920000077_tab5.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200616071110110-0644:S0269888920000077:S0269888920000077_fig7.png?pub-status=live)
Figure 7 Average utility per agent improves with a small amount of exploration, agents use DRS and are in a fully connected network
To demonstrate that RaPTaR can be applied to trust models with continuous representations of outcomes, we present results of agents using the TRAVOS trust algorithm in Table 5. One forgetting factor value, $\lambda=0.9$, appears to perform equally to RaPTaR; however, the differences between the averaged results of RaPTaR and a forgetting factor of 0.9 are not statistically significant. Additionally, not all values of $\lambda$ perform well, and selecting this value would require advance knowledge of the application, whereas all values of $\alpha$ are comparable. We present fewer results here because TRAVOS is a time-consuming trust algorithm due to its reputation collation method. TRAVOS also relies on a large amount of data, and on agents sharing many interaction partners, so that witnesses can provide many reports from which to statistically assess their accuracy as opinion providers. Due to these time constraints, we present a subset of results in Table 5 using fewer agents and fewer rounds.
6 Discussion and conclusion
In this paper, we described the necessity for more intelligent data selection techniques in trust and reputation assessment in MAS. We presented RaPTaR, a method that improves trust assessment by reacting to behaviour changes in groups of agents, providing trust and reputation algorithms with past experience data that is statistically assessed to be representative of an agent's current behaviour. RaPTaR can also learn from the past behaviour of a group to provide an a priori estimate of an agent's behaviour. Our method exploits repetitive behaviour of agents but is also shown to be effective when agent behaviour is dynamic but changes randomly.
Selfish agents cause biased data collection by only interacting with agents they have previously found to be good. In a static environment, this may mean they only identify and interact with slightly above average partners, but when agents have dynamic behaviour, it can also prevent trustors from observing improvements in the behaviour of other agents, ultimately causing suboptimal partner selection. If less interaction data is collected from a group of agents, RaPTaR's accuracy in assessing the behaviour of that group suffers. Therefore, repeat interactions with partners assessed to be suboptimal must be forced, either through the network structure, which limits available partners, or through exploration. Exploring to collect data sacrifices possible higher utility in the short term; however, it improves trust assessment in the long term. Exploration is necessary in a dynamic MAS (Carmel & Markovitch, Reference Carmel and Markovitch1999), and this remains true when agents have dynamic behaviour.
During a period of static behaviour, or of a static ranking of the best agents, sliding windows and forgetting factors have time to identify the best group, and they then perform as well as RaPTaR. Once dynamic behaviour causes a change in the ranking of agent behaviour, RaPTaR adapts quickly, potentially predicting the change, and offers significant improvements for the period it takes the other models to catch up. This difference can be substantial and is robust, as average results over 100 runs show improvements. The improvements RaPTaR offers over existing methods are statistically significant in a paired test using a p-value less than 0.001, though results are not statistically significantly different between runs of RaPTaR with different values of the input parameter $\alpha$; we have demonstrated throughout the results that there is little dependence on the selection of this input.
The computational complexity of RaPTaR scales with the size of an agent's history of interactions, which it must search through after every interaction to statistically detect change using the AdWin algorithm. The K-S test undertaken at every split of the window is done in linear time. Therefore, RaPTaR's time complexity is $O(W^2)$, where $W$ is the size of the window of instances. The amortized runtime can be reduced using the AdWin 2 algorithm, which uses an exponential histogram data structure to store the window, as opposed to the array described in this work for illustrative purposes. This algorithm does not require testing every possible split of $W$, reducing the number of times the K-S test is performed (Bifet & Gavaldà, 2009). The additional computation RaPTaR performs to learn behaviours when change is detected is constant. As $\alpha$ decreases, change is detected less frequently, and we saw in Figure 4(b) that this increases the window size. This means that the trust and reputation algorithm the agent is using requires more computation, because such algorithms scale with the number of records they must search through.
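A naive array-based sketch of this per-interaction check, which makes the $O(W^2)$ bound visible: every candidate split of the window is tested with a K-S test, and the stale prefix is dropped when a change is detected (illustrative only; the exponential-histogram variant avoids testing every split).

```python
from scipy.stats import ks_2samp

def shrink_window(window, alpha, min_part=5):
    """Test every split of the window; when the two sides differ at
    significance alpha, keep only the data after the change point."""
    for split in range(min_part, len(window) - min_part):
        _, p_value = ks_2samp(window[:split], window[split:])
        if p_value < alpha:        # change detected at this split
            return window[split:]  # drop the stale prefix
    return window                  # no change detected: retain everything
```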
6.1 Future work
RaPTaR's predictive component will not be accurate if behaviour changes are not cyclical. Our results show that RaPTaR's ability to detect and react to random dynamic behaviour is still useful; however, this could be improved by integrating error detection and handling to prevent overfitting to behaviour changes which are not recurring. This would make RaPTaR flexible in giving more or less precedence to the predictive component depending on how successful it is.
Exploration can give agents important information but can also be redundant and costly through interactions with suboptimal partners. RaPTaR might perform better if agents could adjust the exploration rate in real time by sensing the level of dynamic change in the environment, similar to WoLF's ability to learn faster when its performance decreases and more slowly while it is performing well (Bowling & Veloso, Reference Bowling and Veloso2001). Other options include a minimum trust threshold to veto interactions with agents below that level, or exploration guided by an algorithm such as simulated annealing.
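One purely illustrative way such a variable exploration rate could be realized, assuming a change-detection signal is available (the function and its bounds are hypothetical and not part of RaPTaR):

```python
def adapt_exploration(rate, change_detected, lo=0.05, hi=0.3, step=0.05):
    """Raise exploration while the environment appears dynamic and decay
    it while behaviour seems stable, in the spirit of WoLF."""
    if change_detected:
        return min(hi, rate + step)
    return max(lo, rate - step)
```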