Publications
2020
-
S.V. Albrecht, P. Stone, M.P. Wellman
Special Issue on Autonomous Agents Modelling Other Agents: Guest Editorial
Artificial Intelligence (AIJ), Vol. 285, 2020
Abstract | BibTex | Publisher | Special Issue
Abstract: Much research in artificial intelligence is concerned with enabling autonomous agents to reason about various aspects of other agents (such as their beliefs, goals, plans, or decisions) and to utilise such reasoning for effective interaction. This special issue contains new technical contributions addressing open problems in autonomous agents modelling other agents, as well as research perspectives about current developments, challenges, and future directions.
BibTex:
@article{albrecht2020special,
  title = {Special Issue on Autonomous Agents Modelling Other Agents: Guest Editorial},
  author = {Stefano V. Albrecht and Peter Stone and Michael P. Wellman},
  journal = {Artificial Intelligence},
  volume = {285},
  year = {2020},
  publisher = {Elsevier},
  url = {https://doi.org/10.1016/j.artint.2020.103292}
}
-
F. Christianos, L. Schäfer, S.V. Albrecht
Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Conference on Neural Information Processing Systems (NeurIPS), 2020
Abstract | BibTex | arXiv
Abstract: Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.
BibTex:
@inproceedings{christianos2020shared,
  title = {Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning},
  author = {Filippos Christianos and Lukas Schäfer and Stefano V. Albrecht},
  booktitle = {34th Conference on Neural Information Processing Systems},
  year = {2020}
}
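The shared-experience update lends itself to a compact illustration. Below is a minimal sketch, our reading of the abstract rather than the authors' released code, of a policy loss for agent i that combines its own on-policy gradient with importance-weighted gradients computed from the other agents' trajectories. It assumes discrete actions, that each entry of policies is a callable returning a torch.distributions.Categorical, that per-agent advantages are precomputed, and that lambda_ is the sharing weight.

    import torch

    def seac_policy_loss(agent_id, policies, obs, actions, advantages, lambda_=1.0):
        # obs/actions/advantages are lists indexed by agent: one batch per agent.
        pi = policies[agent_id]
        # Standard on-policy actor loss from the agent's own experience.
        logp_own = pi(obs[agent_id]).log_prob(actions[agent_id])
        loss = -(logp_own * advantages[agent_id]).mean()
        # Importance-weighted corrections from every other agent's experience.
        for k, pi_k in enumerate(policies):
            if k == agent_id:
                continue
            logp = pi(obs[k]).log_prob(actions[k])
            with torch.no_grad():
                rho = torch.exp(logp - pi_k(obs[k]).log_prob(actions[k]))
            loss = loss - lambda_ * (rho * logp * advantages[k]).mean()
        return loss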
-
A. Rahman, N. Höpner, F. Christianos, S.V. Albrecht
Open Ad Hoc Teamwork using Graph-based Policy Learning
arXiv, 2006.10412, 2020
Abstract | BibTex | arXiv
Abstract: Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with previously unknown teammates. Prior work in this area has focused on closed teams in which the number of agents is fixed. In this work, we consider open teams by allowing agents of varying types to enter and leave the team without prior notification. Our proposed solution builds on graph neural networks to learn scalable agent models and value decompositions under varying team sizes, which can be jointly trained with a reinforcement learning agent using discounted returns objectives. We demonstrate empirically that our approach results in agent policies which can robustly adapt to dynamic team composition, and is able to effectively generalize to larger teams than were seen during training.
BibTex:
@misc{rahman2020open,
  title = {Open Ad Hoc Teamwork using Graph-based Policy Learning},
  author = {Arrasy Rahman and Niklas Höpner and Filippos Christianos and Stefano V. Albrecht},
  year = {2020},
  eprint = {2006.10412},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
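To make the scalability idea concrete, here is a hedged sketch in plain PyTorch, our simplification of the graph-based model described above: a team encoder whose output has a fixed size regardless of how many teammates are currently present, via one round of mean-pooled message passing. All layer sizes and names are illustrative.

    import torch
    import torch.nn as nn

    class TeamEncoder(nn.Module):
        def __init__(self, obs_dim, hid=64):
            super().__init__()
            self.embed = nn.Linear(obs_dim, hid)
            self.msg = nn.Linear(2 * hid, hid)
            self.out = nn.Linear(hid, hid)

        def forward(self, teammate_feats):  # (n_teammates, obs_dim); n may vary
            h = torch.relu(self.embed(teammate_feats))
            pooled = h.mean(dim=0, keepdim=True).expand_as(h)  # team-wide message
            m = torch.relu(self.msg(torch.cat([h, pooled], dim=-1)))
            return self.out(m.mean(dim=0))  # fixed-size vector for any team size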
-
G. Papoudakis, F. Christianos, L. Schäfer, S.V. Albrecht
Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms
arXiv, 2006.07869, 2020
Abstract | BibTex | arXiv
Abstract: Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learners, centralised training with decentralised execution, and value decomposition) in a diverse range of multi-agent learning tasks. Our results show that (1) algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks; (2) independent learners often achieve equal or better performance than more complex algorithms; (3) tested algorithms struggle to solve multi-agent tasks with sparse rewards. We report detailed empirical data, including a reliability analysis, and provide insights into the limitations of the tested algorithms.
BibTex:
@misc{papoudakis2020comparative,
  title = {Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms},
  author = {Georgios Papoudakis and Filippos Christianos and Lukas Schäfer and Stefano V. Albrecht},
  year = {2020},
  eprint = {2006.07869},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
-
G. Papoudakis, F. Christianos, S.V. Albrecht
Local Information Opponent Modelling Using Variational Autoencoders
arXiv, 2006.09447, 2020
Abstract | BibTex | arXiv
Abstract: Modelling the behaviours of other agents (opponents) is essential for understanding how agents interact and making effective decisions. Existing methods for opponent modelling commonly assume knowledge of the local observations and chosen actions of the modelled opponents, which can significantly limit their applicability. We propose a new modelling technique based on variational autoencoders, which are trained to reconstruct the local actions and observations of the opponent based on embeddings which depend only on the local observations of the modelling agent (its observed world state, chosen actions, and received rewards). The embeddings are used to augment the modelling agent's decision policy which is trained via deep reinforcement learning; thus the policy does not require access to opponent observations. We provide a comprehensive evaluation and ablation study in diverse multi-agent tasks, showing that our method achieves comparable performance to an ideal baseline which has full access to the opponent's information, and significantly higher returns than a baseline method which does not use the learned embeddings.
BibTex:
@misc{papoudakis2020opponent,
  title = {Local Information Opponent Modelling Using Variational Autoencoders},
  author = {Georgios Papoudakis and Filippos Christianos and Stefano V. Albrecht},
  year = {2020},
  eprint = {2006.09447},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
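As an illustration of the architecture the abstract describes, the sketch below shows a VAE whose embedding depends only on the modelling agent's local trajectory and whose decoder is trained to reconstruct the opponent's observations and actions; the embedding can then be fed to the decision policy. The GRU encoder, mean-squared reconstruction, and layer sizes are our illustrative assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn

    class LocalOpponentVAE(nn.Module):
        def __init__(self, local_dim, opp_dim, z_dim=16):
            super().__init__()
            self.enc = nn.GRU(local_dim, 64, batch_first=True)
            self.mu = nn.Linear(64, z_dim)
            self.logvar = nn.Linear(64, z_dim)
            self.dec = nn.Linear(z_dim, opp_dim)  # reconstructs opponent obs+actions

        def forward(self, local_traj):            # (batch, time, local_dim)
            _, h = self.enc(local_traj)
            mu, logvar = self.mu(h[-1]), self.logvar(h[-1])
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterise
            return self.dec(z), mu, logvar        # mu also augments the policy input

    def elbo_loss(recon, target, mu, logvar):
        rec = ((recon - target) ** 2).mean()      # reconstruction term
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kld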
-
G. Papoudakis, S.V. Albrecht
Variational Autoencoders for Opponent Modeling in Multi-Agent Systems
AAAI-20 Workshop on Reinforcement Learning in Games, 2020
Abstract | BibTex | arXiv
Abstract: Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment. In this work, we are interested in controlling one agent in a multi-agent system and in learning to interact successfully with the other agents, which have fixed policies. Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system. By taking advantage of recent advances in unsupervised learning, we propose modeling opponents using variational autoencoders. Additionally, many existing methods in the literature assume that the opponent models have access to the opponent's observations and actions during both training and execution. To eliminate this assumption, we propose a modification that attempts to identify the underlying opponent model using only the local information of our agent, such as its observations, actions, and rewards. The experiments indicate that our opponent modeling methods achieve equal or greater episodic returns in reinforcement learning tasks compared to another modeling method.
BibTex:
@misc{papoudakis2020variational,
  title = {Variational Autoencoders for Opponent Modeling in Multi-Agent Systems},
  author = {Georgios Papoudakis and Stefano V. Albrecht},
  year = {2020},
  eprint = {2001.10829},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
-
I. Ahmed, J.P. Hanna, S.V. Albrecht
Quantum-Secure Authentication via Abstract Multi-Agent Interaction
arXiv, 2007.09327, 2020
Abstract | BibTex | arXiv
Abstract: Current methods for authentication based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach to authentication in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. The security of this approach rests upon the difficulty of learning the model parameters of interacting agents, a problem which we conjecture is also hard for quantum computing. We develop methods which enable a server agent to classify a client agent as either legitimate or adversarial based on their past interactions. Moreover, we use reinforcement learning techniques to train server policies which effectively probe the client's decisions to achieve more sample-efficient authentication, while making modelling attacks as difficult as possible via entropy-maximization principles. We empirically validate our methods for authenticating legitimate users while detecting different types of adversarial attacks.
BibTex:
@misc{ahmed2020quantumsecure,
  title = {Quantum-Secure Authentication via Abstract Multi-Agent Interaction},
  author = {Ibrahim Ahmed and Josiah P. Hanna and Stefano V. Albrecht},
  year = {2020},
  eprint = {2007.09327},
  archivePrefix = {arXiv},
  primaryClass = {cs.CR}
}
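A minimal sketch of the authentication decision as we read the abstract: the server scores the client's responses under the registered decision model (the shared secret) and accepts only if the log-likelihood clears a threshold. The function names and the thresholding rule are our illustrative assumptions; the paper additionally trains probing server policies with reinforcement learning.

    import numpy as np

    def authenticate(client_actions, contexts, legit_policy, threshold):
        # legit_policy(context) -> probability vector over actions (the secret model).
        loglik = 0.0
        for ctx, a in zip(contexts, client_actions):
            p = legit_policy(ctx)[a]
            loglik += np.log(max(p, 1e-12))
        return loglik >= threshold  # True: legitimate; False: flagged as adversarial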
-
H. Pulver, F. Eiras, L. Carozza, M. Hawasly, S.V. Albrecht, S. Ramamoorthy
PILOT: Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving
arXiv, 2011.00509, 2020
Abstract | BibTex | arXiv
Abstract: Achieving the right balance between planning quality, safety and runtime efficiency is a major challenge for autonomous driving research. Optimisation-based planners are typically capable of producing high-quality, safe plans, but at the cost of efficiency. We present PILOT, a two-stage planning framework comprising an imitation neural network and an efficient optimisation component that guarantees the satisfaction of requirements of safety and comfort. The neural network is trained to imitate an expensive-to-run optimisation-based planning system with the same objective as the efficient optimisation component of PILOT. We demonstrate in simulated autonomous driving experiments that the proposed framework achieves a significant reduction in runtime when compared to the optimisation-based expert it imitates, without sacrificing the planning quality.
BibTex:
@misc{pulver2020pilot,
  title = {PILOT: Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving},
  author = {Henry Pulver and Francisco Eiras and Ludovico Carozza and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
  year = {2020},
  eprint = {2011.00509},
  archivePrefix = {arXiv},
  primaryClass = {cs.RO}
}
-
S.V. Albrecht, C. Brewitt, J. Wilhelm, B. Gyevnar, F. Eiras, M. Dobre, S. Ramamoorthy
Interpretable Goal-based Prediction and Planning for Autonomous Driving
arXiv, 2002.02277, 2020
Abstract | BibTex | arXiv
Abstract: We propose an integrated prediction and planning system for autonomous driving which uses rational inverse planning to recognise the goals of other vehicles. Goal recognition informs a Monte Carlo Tree Search (MCTS) algorithm to plan optimal maneuvers for the ego vehicle. Inverse planning and MCTS utilise a shared set of defined maneuvers and macro actions to construct plans which are explainable by means of rationality principles. Evaluation in simulations of urban driving scenarios demonstrates the system's ability to robustly recognise the goals of other vehicles, enabling our vehicle to exploit non-trivial opportunities to significantly reduce driving times. In each scenario, we extract intuitive explanations for the predictions which justify the system's decisions.
BibTex:
@misc{albrecht2020integrating,
  title = {Interpretable Goal-based Prediction and Planning for Autonomous Driving},
  author = {Stefano V. Albrecht and Cillian Brewitt and John Wilhelm and Balint Gyevnar and Francisco Eiras and Mihai Dobre and Subramanian Ramamoorthy},
  year = {2020},
  eprint = {2002.02277},
  archivePrefix = {arXiv},
  primaryClass = {cs.RO}
}
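The rational inverse planning step admits a compact Bayesian reading: goals whose optimal plans are closer to the observed trajectory receive higher posterior mass. Below is a hedged sketch under our own assumptions (a cost function per goal and a softmax-rationality parameter beta); the paper's actual system works over defined maneuvers and macro actions.

    import numpy as np

    def goal_posterior(trajectory, goals, traj_cost, opt_cost, prior=None, beta=1.0):
        # P(g | trajectory) ∝ P(g) * exp(-beta * (cost of reaching g via the
        # observed trajectory - cost of an optimal plan to g)).
        prior = prior if prior is not None else np.full(len(goals), 1.0 / len(goals))
        scores = np.array([p * np.exp(-beta * (traj_cost(trajectory, g) - opt_cost(g)))
                           for g, p in zip(goals, prior)])
        return scores / scores.sum()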
-
F. Eiras, M. Hawasly, S.V. Albrecht, S. Ramamoorthy
Two-Stage Optimization-based Motion Planner for Safe Urban Driving
arXiv, 2002.02215, 2020
Abstract | BibTex | arXiv
Abstract: Recent road trials have shown that guaranteeing the safety of driving decisions is essential for the wider adoption of autonomous vehicle technology. One promising direction is to pose safety requirements as planning constraints in nonlinear, nonconvex optimization problems of motion synthesis. However, many implementations of this approach are limited by uncertain convergence and local optimality of the solutions achieved, affecting overall robustness. To improve upon these issues, we propose a novel two-stage optimization framework: in the first stage, we find a solution to a Mixed-Integer Linear Programming (MILP) formulation of the motion synthesis problem, the output of which initializes a second Nonlinear Programming (NLP) stage. The MILP stage enforces hard constraints of safety and road rule compliance generating a solution in the right subspace, while the NLP stage refines the solution within the safety bounds for feasibility and smoothness. We demonstrate the effectiveness of our framework via simulated experiments of complex urban driving scenarios, outperforming a state-of-the-art baseline in metrics of convergence, comfort and progress.
BibTex:
@misc{eiras2020twostage,
  title = {Two-Stage Optimization-based Motion Planner for Safe Urban Driving},
  author = {Francisco Eiras and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
  year = {2020},
  eprint = {2002.02215},
  archivePrefix = {arXiv},
  primaryClass = {cs.RO}
}
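To make the two-stage structure concrete, here is a toy sketch under heavy assumptions (a 1-D lateral-offset trajectory; pulp as the MILP solver and scipy for the NLP stage, which are our tool choices, not the paper's): the MILP chooses which side of an obstacle to pass via a big-M disjunction with hard clearance constraints, and its solution warm-starts a smooth nonlinear refinement.

    import numpy as np
    import pulp
    from scipy.optimize import minimize

    H, blocked = 10, range(3, 7)  # horizon; obstacle occupies |y| < 0.5 at t = 3..6
    M = 10                        # big-M constant for the disjunction
    prob = pulp.LpProblem("milp_stage", pulp.LpMinimize)
    y = [pulp.LpVariable(f"y{t}", lowBound=-2, upBound=2) for t in range(H)]
    u = [pulp.LpVariable(f"u{t}", lowBound=0) for t in range(H)]  # u_t >= |y_t|
    side = pulp.LpVariable("go_above", cat="Binary")
    prob += pulp.lpSum(u)         # objective: minimise total lateral deviation
    for t in range(H):
        prob += u[t] >= y[t]
        prob += u[t] >= -y[t]
    for t in blocked:             # hard safety: pass above OR below the obstacle
        prob += y[t] >= 0.5 - M * (1 - side)
        prob += y[t] <= -0.5 + M * side
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    warm = np.array([v.value() for v in y])   # feasible, coarse trajectory

    def smooth_cost(traj):        # stage 2: nonlinear refinement for comfort
        return np.sum(np.diff(traj, n=2) ** 2) + 0.01 * np.sum(traj ** 2)

    lo = [0.5 if (t in blocked and warm[t] > 0) else -2.0 for t in range(H)]
    hi = [-0.5 if (t in blocked and warm[t] < 0) else 2.0 for t in range(H)]
    refined = minimize(smooth_cost, warm, bounds=list(zip(lo, hi))).x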
2019
-
M. Wiatrak, S.V. Albrecht, A. Nystrom
Stabilizing Generative Adversarial Networks: A Survey
arXiv, 1910.00927, 2019
Abstract | BibTex | arXiv
Abstract: Generative Adversarial Networks (GANs) are a type of generative model which have received much attention due to their ability to model complex real-world data. Despite their recent successes, the process of training GANs remains challenging, suffering from instability problems such as non-convergence, vanishing or exploding gradients, and mode collapse. In recent years, a diverse set of approaches have been proposed which focus on stabilizing the GAN training procedure. The purpose of this survey is to provide a comprehensive overview of the GAN training stabilization methods which can be found in the literature. We discuss the advantages and disadvantages of each approach, offer a comparative summary, and conclude with a discussion of open problems.
BibTex:
@misc{wiatrak2019stabilizing,
  title = {Stabilizing Generative Adversarial Networks: A Survey},
  author = {Maciej Wiatrak and Stefano V. Albrecht and Andrew Nystrom},
  year = {2019},
  eprint = {1910.00927},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
-
G. Papoudakis, F. Christianos, A. Rahman, S.V. Albrecht
Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning
arXiv, 1906.04737, 2019
Abstract | BibTex | arXiv
Abstract: Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.
BibTex:
@misc{papoudakis2019dealing,
  title = {Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning},
  author = {Georgios Papoudakis and Filippos Christianos and Arrasy Rahman and Stefano V. Albrecht},
  year = {2019},
  eprint = {1906.04737},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
2018
-
S.V. Albrecht, P. Stone
Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems
Artificial Intelligence (AIJ), Vol. 258, pp. 66-95, 2018
Abstract | BibTex | arXiv | Publisher
Abstract: Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.
BibTex:
@article{albrecht2018modelling,
  title = {Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems},
  author = {Stefano V. Albrecht and Peter Stone},
  journal = {Artificial Intelligence},
  volume = {258},
  pages = {66--95},
  year = {2018},
  publisher = {Elsevier},
  note = {DOI: 10.1016/j.artint.2018.01.002}
}
-
C. Innes, A. Lascarides, S.V. Albrecht, S. Ramamoorthy, B. Rosman
Reasoning about Unforeseen Possibilities During Policy Learning
arXiv, 1801.03331, 2018
Abstract | BibTex | arXiv
Abstract: Methods for learning optimal policies in autonomous agents often assume that the way the domain is conceptualised—its possible states and actions and their causal structure—is known in advance and does not change during learning. This is an unrealistic assumption in many scenarios, because new evidence can reveal important information about what is possible, possibilities that the agent was not aware existed prior to learning. We present a model of an agent which both discovers and learns to exploit unforeseen possibilities using two sources of evidence: direct interaction with the world and communication with a domain expert. We use a combination of probabilistic and symbolic reasoning to estimate all components of the decision problem, including its set of random variables and their causal dependencies. Agent simulations show that the agent converges on optimal policies even when it starts out unaware of factors that are critical to behaving optimally.
BibTex:
@misc{innes2018reasoning,
  title = {Reasoning about Unforeseen Possibilities During Policy Learning},
  author = {Craig Innes and Alex Lascarides and Stefano V. Albrecht and Subramanian Ramamoorthy and Benjamin Rosman},
  year = {2018},
  eprint = {1801.03331},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI}
}
2017
-
S.V. Albrecht, S. Liemhetcharat, P. Stone
Special Issue on Multiagent Interaction without Prior Coordination: Guest Editorial
Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), Vol. 31(4), pp. 765-766, 2017
Abstract | BibTex | Publisher | MIPC Workshop Series
Abstract: This special issue of the Journal of Autonomous Agents and Multi-Agent Systems sought research articles on the emerging topic of multiagent interaction without prior coordination. Topics of interest included empirical and theoretical investigations of issues arising from assumptions of prior coordination, as well as solutions in the form of novel models and algorithms for effective multiagent interaction without prior coordination.
BibTex:
@article{albrecht2017special,
  title = {Special Issue on Multiagent Interaction without Prior Coordination: Guest Editorial},
  author = {Stefano V. Albrecht and Somchaya Liemhetcharat and Peter Stone},
  journal = {Autonomous Agents and Multi-Agent Systems},
  volume = {31},
  number = {4},
  pages = {765--766},
  year = {2017},
  publisher = {Springer},
  url = {http://dx.doi.org/10.1007/s10458-016-9358-0}
}
-
S.V. Albrecht, P. Stone
Reasoning about Hypothetical Agent Behaviours and their Parameters
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2017
Abstract | BibTex | arXiv
Abstract: Agents can achieve effective interaction with previously unknown other agents by maintaining beliefs over a set of hypothetical behaviours, or types, that these agents may have. A current limitation in this method is that it does not recognise parameters within type specifications, because types are viewed as blackbox mappings from interaction histories to probability distributions over actions. In this work, we propose a general method which allows an agent to reason about both the relative likelihood of types and the values of any bounded continuous parameters within types. The method maintains individual parameter estimates for each type and selectively updates the estimates for some types after each observation. We propose different methods for the selection of types and the estimation of parameter values. The proposed methods are evaluated in detailed experiments, showing that updating the parameter estimates of a single type after each observation can be sufficient to achieve good performance.
BibTex:
@inproceedings{albrecht2017reasoning,
  title = {Reasoning about Hypothetical Agent Behaviours and their Parameters},
  author = {Stefano V. Albrecht and Peter Stone},
  booktitle = {Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems},
  pages = {547--555},
  year = {2017}
}
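As a rough illustration of the method's two ingredients, the sketch below maintains a Bayesian posterior over types and, after each observation, refines the parameter estimate of a single selected type (here the most likely one, by a finite-difference likelihood ascent step). The selection and estimation schemes are our illustrative stand-ins for the variants the paper proposes; each types[j] is assumed to map a history and parameter vector to a probability distribution over actions.

    import numpy as np

    def update_beliefs(beliefs, theta, types, history, action, lr=0.1, eps=1e-4):
        # Bayesian update of the type posterior given the observed action.
        for j, type_j in enumerate(types):
            beliefs[j] *= type_j(history, theta[j])[action]
        beliefs /= beliefs.sum()
        # Selectively refine parameters of the currently most likely type only.
        j = int(np.argmax(beliefs))
        grad = np.zeros_like(theta[j])
        for d in range(len(theta[j])):
            up, dn = theta[j].copy(), theta[j].copy()
            up[d] += eps
            dn[d] -= eps
            grad[d] = (np.log(types[j](history, up)[action])
                       - np.log(types[j](history, dn)[action])) / (2 * eps)
        theta[j] = theta[j] + lr * grad
        return beliefs, theta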
-
S.V. Albrecht, S. Ramamoorthy
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks (Extended Abstract)
International Joint Conference on Artificial Intelligence (IJCAI), Journal Track, 2017
Abstract | BibTex | arXiv
Abstract: Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and uncertain observations. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF is evaluated in both synthetic processes and a simulated multi-robot warehouse, where it outperformed alternative filtering methods by exploiting passivity.
BibTex:
@inproceedings{albrecht2017causality,
  title = {Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks (Extended Abstract)},
  author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
  booktitle = {Proceedings of the 26th International Joint Conference on Artificial Intelligence},
  address = {Melbourne, Australia},
  month = {August},
  year = {2017}
}
2016
-
S.V. Albrecht, J.W. Crandall, S. Ramamoorthy
Belief and Truth in Hypothesised Behaviours
Artificial Intelligence (AIJ), Vol. 235, pp. 63-94, 2016
Abstract | BibTex | arXiv | Publisher
Abstract: There is a long history in game theory on the topic of Bayesian or “rational” learning, in which each player maintains beliefs over a set of alternative behaviours, or types, for the other players. This idea has gained increasing interest in the artificial intelligence (AI) community, where it is used as a method to control a single agent in a system composed of multiple agents with unknown behaviours. The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents. The game theory literature studies this idea primarily in the context of equilibrium attainment. In contrast, many AI applications have a focus on task completion and payoff maximisation. With this perspective in mind, we identify and address a spectrum of questions pertaining to belief and truth in hypothesised types. We formulate three basic ways to incorporate evidence into posterior beliefs and show when the resulting beliefs are correct, and when they may fail to be correct. Moreover, we demonstrate that prior beliefs can have a significant impact on our ability to maximise payoffs in the long-term, and that they can be computed automatically with consistent performance effects. Furthermore, we analyse the conditions under which we are able to complete our task optimally, despite inaccuracies in the hypothesised types. Finally, we show how the correctness of hypothesised types can be ascertained during the interaction via an automated statistical analysis.
BibTex:
@article{albrecht2016belief,
  title = {Belief and Truth in Hypothesised Behaviours},
  author = {Stefano V. Albrecht and Jacob W. Crandall and Subramanian Ramamoorthy},
  journal = {Artificial Intelligence},
  volume = {235},
  pages = {63--94},
  year = {2016},
  publisher = {Elsevier},
  note = {DOI: 10.1016/j.artint.2016.02.004}
}
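Two of the evidence-incorporation schemes studied in this line of work can be sketched directly (our shorthand, with likelihood_history[t][j] denoting the probability the j-th type assigns to the action observed at time t): the product posterior is the standard Bayesian update over the whole history, while the sum posterior averages the per-step likelihoods before weighting by the prior.

    import numpy as np

    def product_posterior(prior, likelihood_history):
        # Standard Bayesian update: prior times the product of step likelihoods.
        L = np.asarray(likelihood_history)
        post = np.asarray(prior) * np.prod(L, axis=0)
        return post / post.sum()

    def sum_posterior(prior, likelihood_history):
        # Averages step likelihoods; less aggressive at zeroing-out types.
        L = np.asarray(likelihood_history)
        post = np.asarray(prior) * np.mean(L, axis=0)
        return post / post.sum()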
-
S.V. Albrecht, S. Ramamoorthy
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks
Journal of Artificial Intelligence Research (JAIR), Vol. 55, pp. 1135-1178, 2016
Abstract | BibTex | arXiv | Publisher
Abstract: Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and noisy observations. This can be a hard problem in complex processes with large state spaces. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PSBF is faster than several alternative methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.
BibTex:
@article{albrecht2016causality,
  title = {Exploiting Causality for Selective Belief Filtering in Dynamic {B}ayesian Networks},
  author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
  journal = {Journal of Artificial Intelligence Research},
  volume = {55},
  pages = {1135--1178},
  year = {2016},
  publisher = {AI Access Foundation},
  note = {DOI: 10.1613/jair.5044}
}
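The selective-update idea can be illustrated in a few lines. In the sketch below (our simplification, not the paper's algorithm), the belief is a list of per-variable factors; a factor whose variable is passive under the last action skips the transition update and is only corrected by the observation. transition, likelihood and passive are assumed callables supplied by the model.

    import numpy as np

    def selective_filter(factors, transition, likelihood, action, obs, passive):
        # factors: list of per-variable distributions (1-D numpy arrays).
        # passive(i, action): True if variable i cannot change under `action`.
        new_factors = []
        for i, f in enumerate(factors):
            if passive(i, action):
                g = f.copy()                   # skip the expensive transition update
            else:
                g = transition(i, action) @ f  # predict: column-stochastic matrix
            g = g * likelihood(i, obs)         # correct with the observation
            new_factors.append(g / g.sum())
        return new_factors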
2015
-
S.V. Albrecht, S. Ramamoorthy
Are You Doing What I Think You Are Doing? Criticising Uncertain Agent Models
Conference on Uncertainty in Artificial Intelligence (UAI), 2015
Abstract | BibTex | arXiv
Abstract: The key for effective interaction in many multiagent applications is to reason explicitly about the behaviour of other agents, in the form of a hypothesised behaviour. While there exist several methods for the construction of a behavioural hypothesis, there is currently no universal theory which would allow an agent to contemplate the correctness of a hypothesis. In this work, we present a novel algorithm which decides this question in the form of a frequentist hypothesis test. The algorithm allows for multiple metrics in the construction of the test statistic and learns its distribution during the interaction process, with asymptotic correctness guarantees. We present results from a comprehensive set of experiments, demonstrating that the algorithm achieves high accuracy and scalability at low computational costs.
BibTex:
@inproceedings{albrecht2015criticising,
  title = {Are You Doing What I Think You Are Doing? Criticising Uncertain Agent Models},
  author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
  booktitle = {Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence},
  pages = {52--61},
  year = {2015}
}
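The frequentist test can be sketched in a minimal form (a single log-likelihood statistic; the paper supports multiple metrics and learns the statistic's distribution during the interaction): simulate play from the hypothesis to estimate the null distribution of the score, then reject the hypothesis if the observed score is too improbable under it.

    import numpy as np

    def criticise(hypothesis, histories, observed_actions, n_sim=1000, alpha=0.05):
        # hypothesis(history) -> probability vector over the agent's actions.
        def score(actions):
            return np.mean([np.log(hypothesis(h)[a] + 1e-12)
                            for h, a in zip(histories, actions)])
        observed = score(observed_actions)
        simulated = []
        for _ in range(n_sim):
            sims = [np.random.choice(len(hypothesis(h)), p=hypothesis(h))
                    for h in histories]
            simulated.append(score(sims))
        p_value = np.mean(np.array(simulated) <= observed)  # one-sided test
        return p_value >= alpha  # True: keep the hypothesis; False: reject it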
-
S.V. Albrecht, J.W. Crandall, S. Ramamoorthy
An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types
AAAI Conference on Artificial Intelligence (AAAI), 2015
Abstract | BibTex | arXiv | Appendix
Abstract: Many multiagent applications require an agent to learn quickly how to interact with previously unknown other agents. To address this problem, researchers have studied learning algorithms which compute posterior beliefs over a hypothesised set of policies, based on the observed actions of the other agents. The posterior belief is complemented by the prior belief, which specifies the subjective likelihood of policies before any actions are observed. In this paper, we present the first comprehensive empirical study on the practical impact of prior beliefs over policies in repeated interactions. We show that prior beliefs can have a significant impact on the long-term performance of such methods, and that the magnitude of the impact depends on the depth of the planning horizon. Moreover, our results demonstrate that automatic methods can be used to compute prior beliefs with consistent performance effects. This indicates that prior beliefs could be eliminated as a manual parameter and instead be computed automatically.
BibTex:
@inproceedings{albrecht2015empirical,
  title = {An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types},
  author = {Stefano V. Albrecht and Jacob W. Crandall and Subramanian Ramamoorthy},
  booktitle = {Proceedings of the 29th AAAI Conference on Artificial Intelligence},
  pages = {1988--1994},
  year = {2015}
}
-
S.V. Albrecht, J.W. Crandall, S. Ramamoorthy
E-HBA: Using Action Policies for Expert Advice and Agent Typification
Second Workshop on Multiagent Interaction without Prior Coordination (MIPC), 2015
Abstract | BibTex | arXiv | Appendix
Abstract: Past research has studied two approaches to utilise predefined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents. In this work, we bring these complementary views together in the form of a novel meta-algorithm, called Expert-HBA (E-HBA), which can be applied to any expert algorithm that considers the average (or total) payoff an expert has yielded in the past. E-HBA gradually mixes the past payoff with a predicted future payoff, which is computed using the type-based characterisation. We present results from a comprehensive set of repeated matrix games, comparing the performance of several well-known expert algorithms with and without the aid of E-HBA. Our results show that E-HBA has the potential to significantly improve the performance of expert algorithms.
BibTex:
@inproceedings{albrecht2015ehba,
  title = {{E-HBA}: Using Action Policies for Expert Advice and Agent Typification},
  author = {Stefano V. Albrecht and Jacob W. Crandall and Subramanian Ramamoorthy},
  booktitle = {Proceedings of the Second AAAI-Workshop on Multiagent Interaction without Prior Coordination},
  address = {Austin, Texas, USA},
  month = {January},
  year = {2015}
}
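The core mixing rule is simple to state. A hedged sketch follows (our naming; the paper derives the mixing weight from the interaction rather than taking it as a free parameter): each expert is valued by a blend of its empirical past payoff and the payoff predicted by the type-based model.

    import numpy as np

    def select_expert(past_payoffs, predicted_payoffs, confidence):
        # past_payoffs: empirical average payoff per expert; predicted_payoffs:
        # type-based estimate of each expert's future payoff; confidence in [0, 1].
        mixed = ((1 - confidence) * np.asarray(past_payoffs)
                 + confidence * np.asarray(predicted_payoffs))
        return int(np.argmax(mixed))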
2014 and earlier
-
S.V. Albrecht, S. Ramamoorthy
On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems
Conference on Uncertainty in Artificial Intelligence (UAI), 2014
Abstract | BibTex | arXiv | Appendix
Abstract: While many multiagent algorithms are designed for homogeneous systems (i.e. all agents are identical), there are important applications which require an agent to coordinate its actions without knowing a priori how the other agents behave. One method to make this problem feasible is to assume that the other agents draw their latent policy (or type) from a specific set, and that a domain expert could provide a specification of this set, albeit only a partially correct one. Algorithms have been proposed by several researchers to compute posterior beliefs over such policy libraries, which can then be used to determine optimal actions. In this paper, we provide theoretical guidance on two central design parameters of this method: Firstly, it is important that the user choose a posterior which can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen. We analyse convergence properties of two existing posterior formulations and propose a new posterior which can learn correlated distributions. Secondly, since the types are provided by an expert, they may be inaccurate in the sense that they do not predict the agents’ observed actions. We provide a novel characterisation of optimality which allows experts to use efficient model checking algorithms to verify optimality of types.
BibTex:
@inproceedings{albrecht2014convergence,
  title = {On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems},
  author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
  booktitle = {Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence},
  pages = {12--21},
  year = {2014}
}
-
S.V. Albrecht, S. Ramamoorthy
A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013
Abstract | BibTex | arXiv (full technical report) | Extended Abstract
Abstract: The ad hoc coordination problem is to design an autonomous agent which is able to achieve optimal flexibility and efficiency in a multiagent system with no mechanisms for prior coordination. We conceptualise this problem formally using a game-theoretic model, called the stochastic Bayesian game, in which the behaviour of a player is determined by its private information, or type. Based on this model, we derive a solution, called Harsanyi-Bellman Ad Hoc Coordination (HBA), which utilises the concept of Bayesian Nash equilibrium in a planning procedure to find optimal actions in the sense of Bellman optimal control. We evaluate HBA in a multiagent logistics domain called level-based foraging, showing that it achieves higher flexibility and efficiency than several alternative algorithms. We also report on a human-machine experiment at a public science exhibition in which the human participants played repeated Prisoner's Dilemma and Rock-Paper-Scissors against HBA and alternative algorithms, showing that HBA achieves equal efficiency and a significantly higher welfare and winning rate.
BibTex:
@inproceedings{albrecht2013game,
  title = {A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems},
  author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
  booktitle = {Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems},
  address = {St. Paul, Minnesota, USA},
  month = {May},
  year = {2013}
}
-
S.V. Albrecht, S. Ramamoorthy
Comparative Evaluation of Multiagent Learning Algorithms in a Diverse Set of Ad Hoc Team Problems
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012
Abstract | BibTex | arXiv
Abstract: This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogeneous, in the sense of utilizing different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in ad hoc team problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.
BibTex:
@inproceedings{albrecht2012comparative,
  title = {Comparative Evaluation of {MAL} Algorithms in a Diverse Set of Ad Hoc Team Problems},
  author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
  booktitle = {Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems},
  pages = {349--356},
  year = {2012}
}