Publications
Click on author names or tags to filter publications.
Selected filter tags (click to remove): Stefano-V.-Albrecht
2022
Shangmin Guo, Yi Ren, Kory Mathewson, Simon Kirby, Stefano V. Albrecht, Kenny Smith
Expressivity of Emergent Languages is a Trade-off between Contextual Complexity and Unpredictability
International Conference on Learning Representations (ICLR), 2022
Abstract | BibTex | arXiv | Code
ICLRmulti-agent-rlemergent-communication
Abstract:
Researchers are using deep learning models to explore the emergence of language in various language games, where simulated agents interact and develop an emergent language to solve a task. We focus on the factors which determine the expressivity of emergent languages, which reflects the amount of information about input spaces those languages are capable of encoding. We measure the expressivity of emergent languages based on their generalisation performance across different games, and demonstrate that the expressivity of emergent languages is a trade-off between the complexity and unpredictability of the context those languages are used in. Another novel contribution of this work is the discovery of message type collapse. We also show that using the contrastive loss proposed by Chen et al. (2020) can alleviate this problem, compared with the standard referential loss used by the existing works.
@inproceedings{guo2022expressivity,
title={Expressivity of Emergent Languages is a Trade-off between Contextual Complexity and Unpredictability},
author={Shangmin Guo and Yi Ren and Kory Mathewson and Simon Kirby and Stefano V. Albrecht and Kenny Smith},
booktitle={International Conference on Learning Representations (ICLR)},
year={2022}
}
Reuth Mirsky, Ignacio Carlucho, Arrasy Rahman, Elliot Fosong, William Macke, Mohan Sridharan, Peter Stone, Stefano V. Albrecht
A Survey of Ad Hoc Teamwork: Definitions, Methods, and Open Problems
arXiv:1801.03331, 2022
Abstract | BibTex | arXiv
surveyad-hoc-teamwork
Abstract:
Ad hoc teamwork is the well-established research problem of designing agents that can collaborate with new teammates without prior coordination. This survey makes a two-fold contribution. First, it provides a structured description of the different facets of the ad hoc teamwork problem. Second, it discusses the progress that has been made in the field so far, and identifies the immediate and long-term open problems that need to be addressed in the field of ad hoc teamwork.
@misc{mirsky2022survey,
title={A Survey of Ad Hoc Teamwork: Definitions, Methods, and Open Problems},
author={Reuth Mirsky and Ignacio Carlucho and Arrasy Rahman and Elliot Fosong and William Macke and Mohan Sridharan and Peter Stone and Stefano V. Albrecht},
year={2022},
eprint={2202.10450},
archivePrefix={arXiv},
primaryClass={cs.MA}
}
Morris Antonello, Mihai Dobre, Stefano V. Albrecht, John Redford, Subramanian Ramamoorthy
Flash: Fast and Light Motion Prediction for Autonomous Driving with Bayesian Inverse Planning and Learned Motion Profiles
arXiv:2203.08251, 2022
Abstract | BibTex | arXiv
autonomous-driving
Abstract:
Motion prediction of road users in traffic scenes is critical for autonomous driving systems that must take safe and robust decisions in complex dynamic environments. We present a novel motion prediction system for autonomous driving. Our system is based on the Bayesian inverse planning framework, which efficiently orchestrates map-based goal extraction, a classical control-based trajectory generator and an ensemble of light-weight neural networks specialised in motion profile prediction. In contrast to many alternative methods, this modularity helps isolate performance factors and better interpret results, without compromising performance. This system addresses multiple aspects of interest, namely multi-modality, motion profile uncertainty and trajectory physical feasibility. We report on several experiments with the popular highway dataset NGSIM, demonstrating state-of-the-art performance in terms of trajectory error. We also perform a detailed analysis of our system's components, along with experiments that stratify the data based on behaviours, such as change lane versus follow lane, to provide insights into the challenges in this domain. Finally, we present a qualitative analysis to show other benefits of our approach, such as the ability to interpret the outputs.
@misc{antonello2022flash,
title={Flash: Fast and Light Motion Prediction for Autonomous Driving with Bayesian Inverse Planning and Learned Motion Profiles},
author={Morris Antonello, Mihai Dobre, Stefano V. Albrecht, John Redford, Subramanian Ramamoorthy},
year={2022},
eprint={2203.08251},
archivePrefix={arXiv}
}
2021
Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks
Conference on Neural Information Processing Systems, Datasets and Benchmarks Track (NeurIPS), 2021
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlmulti-agent-rl
Abstract:
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we consistently evaluate and compare three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase [Samvelyan et al., 2019] to include additional algorithms and allow for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards.
@inproceedings{papoudakis2021benchmarking,
title={Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks},
author={Georgios Papoudakis and Filippos Christianos and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
year={2021},
url = {http://arxiv.org/abs/2006.07869},
openreview = {https://openreview.net/forum?id=cIrPX-Sn5n},
code = {https://github.com/uoe-agents/epymarl}
}
Georgios Papoudakis, Filippos Christianos, Stefano V. Albrecht
Agent Modelling under Partial Observability for Deep Reinforcement Learning
Conference on Neural Information Processing Systems (NeurIPS), 2021
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlagent-modelling
Abstract:
Modelling the behaviours of other agents is essential for understanding how agents interact and making effective decisions. Existing methods for agent modelling commonly assume knowledge of the local observations and chosen actions of the modelled agents during execution. To eliminate this assumption, we extract representations from the local information of the controlled agent using encoder-decoder architectures. Using the observations and actions of the modelled agents during training, our models learn to extract representations about the modelled agents conditioned only on the local observations of the controlled agent. The representations are used to augment the controlled agent's decision policy which is trained via deep reinforcement learning; thus, during execution, the policy does not require access to other agents' information. We provide a comprehensive evaluation and ablations studies in cooperative, competitive and mixed multi-agent environments, showing that our method achieves significantly higher returns than baseline methods which do not use the learned representations.
@inproceedings{papoudakis2021local,
title={Agent Modelling under Partial Observability for Deep Reinforcement Learning},
author={Georgios Papoudakis and Filippos Christianos and Stefano V. Albrecht},
booktitle = {Proceedings of the Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
Rujie Zhong, Josiah P. Hanna, Lukas Schäfer, Stefano V. Albrecht
Robust On-Policy Data Collection for Data-Efficient Policy Evaluation
NeurIPS Workshop on Offline Reinforcement Learning (NeurIPS), 2021
Abstract | BibTex | arXiv | Code
NeurIPSdeep-rlpolicy-evaluation
Abstract:
This paper considers how to complement offline reinforcement learning (RL) data with additional data collection for the task of policy evaluation. In policy evaluation, the task is to estimate the expected return of an evaluation policy on an environment of interest. Prior work on offline policy evaluation typically only considers a static dataset. We consider a setting where we can collect a small amount of additional data to combine with a potentially larger offline RL dataset. We show that simply running the evaluation policy – on-policy data collection – is sub-optimal for this setting. We then introduce two new data collection strategies for policy evaluation, both of which consider previously collected data when collecting future data so as to reduce distribution shift (or sampling error) in the entire dataset collected. Our empirical results show that compared to on-policy sampling, our strategies produce data with lower sampling error and generally lead to lower mean-squared error in policy evaluation for any total dataset size. We also show that these strategies can start from initial off-policy data, collect additional data, and then use both the initial and new data to produce low mean-squared error policy evaluation without using off-policy corrections.
@inproceedings{zhong2021robust,
title={Robust On-Policy Data Collection for Data-Efficient Policy Evaluation},
author={Rujie Zhong and Josiah P. Hanna and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL)},
year={2021}
}
Arrasy Rahman, Niklas Höpner, Filippos Christianos, Stefano V. Albrecht
Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning
International Conference on Machine Learning (ICML), 2021
Abstract | BibTex | arXiv | Video | Code
ICMLdeep-rlagent-modellingad-hoc-teamwork
Abstract:
Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with teammates without prior coordination mechanisms, including joint training. Prior work in this area has focused on closed teams in which the number of agents is fixed. In this work, we consider open teams by allowing agents with different fixed policies to enter and leave the environment without prior notification. Our solution builds on graph neural networks to learn agent models and joint-action value models under varying team compositions. We contribute a novel action-value computation that integrates the agent model and joint-action value model to produce action-value estimates. We empirically demonstrate that our approach successfully models the effects other agents have on the learner, leading to policies that robustly adapt to dynamic team compositions and significantly outperform several alternative methods.
@inproceedings{rahman2021open,
title={Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning},
author={Arrasy Rahman and Niklas H\"opner and Filippos Christianos and Stefano V. Albrecht},
booktitle={International Conference on Machine Learning (ICML)},
year={2021}
}
Filippos Christianos, Georgios Papoudakis, Arrasy Rahman, Stefano V. Albrecht
Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing
International Conference on Machine Learning (ICML), 2021
Abstract | BibTex | arXiv | Video | Code
ICMLdeep-rlmulti-agent-rl
Abstract:
Sharing parameters in multi-agent deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents. Parameter sharing between agents significantly decreases the number of trainable parameters, shortening training times to tractable levels, and has been linked to more efficient learning. However, having all agents share the same parameters can also have a detrimental effect on learning. We demonstrate the impact of parameter sharing methods on training speed and converged returns, establishing that when applied indiscriminately, their effectiveness is highly dependent on the environment. We propose a novel method to automatically identify agents which may benefit from sharing parameters by partitioning them based on their abilities and goals. Our approach combines the increased sample efficiency of parameter sharing with the representational capacity of multiple independent networks to reduce training time and increase final returns.
@inproceedings{christianos2021scaling,
title={Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing},
author={Filippos Christianos and Georgios Papoudakis and Arrasy Rahman and Stefano V. Albrecht},
booktitle={International Conference on Machine Learning (ICML)},
year={2021}
}
Lukas Schäfer, Filippos Christianos, Josiah Hanna, Stefano V. Albrecht
Decoupling Exploration and Exploitation in Reinforcement Learning
ICML Workshop on Unsupervised Reinforcement Learning (ICML), 2021
Abstract | BibTex | arXiv | Code
ICMLdeep-rlintrinsic-reward
Abstract:
Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL) which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. We show that DeRL is more robust to scaling and speed of decay of intrinsic rewards and converges to the same evaluation returns than intrinsically motivated baselines in fewer interactions.
@inproceedings{schaefer2021decoupling,
title={Decoupling Exploration and Exploitation in Reinforcement Learning},
author={Lukas Schäfer and Filippos Christianos and Josiah Hanna and Stefano V. Albrecht},
booktitle={ICML Workshop on Unsupervised Reinforcement Learning (URL)},
year={2021}
}
Stefano V. Albrecht, Cillian Brewitt, John Wilhelm, Balint Gyevnar, Francisco Eiras, Mihai Dobre, Subramanian Ramamoorthy
Interpretable Goal-based Prediction and Planning for Autonomous Driving
IEEE International Conference on Robotics and Automation (ICRA), 2021
Abstract | BibTex | arXiv | Video | Code
ICRAautonomous-drivinggoal-recognition
Abstract:
We propose an integrated prediction and planning system for autonomous driving which uses rational inverse planning to recognise the goals of other vehicles. Goal recognition informs a Monte Carlo Tree Search (MCTS) algorithm to plan optimal maneuvers for the ego vehicle. Inverse planning and MCTS utilise a shared set of defined maneuvers and macro actions to construct plans which are explainable by means of rationality principles. Evaluation in simulations of urban driving scenarios demonstrate the system's ability to robustly recognise the goals of other vehicles, enabling our vehicle to exploit non-trivial opportunities to significantly reduce driving times. In each scenario, we extract intuitive explanations for the predictions which justify the system's decisions.
@inproceedings{albrecht2020igp2,
title={Interpretable Goal-based Prediction and Planning for Autonomous Driving},
author={Stefano V. Albrecht and Cillian Brewitt and John Wilhelm and Balint Gyevnar and Francisco Eiras and Mihai Dobre and Subramanian Ramamoorthy},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2021}
}
Cillian Brewitt, Balint Gyevnar, Samuel Garcin, Stefano V. Albrecht
GRIT: Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
Abstract | BibTex | arXiv | Video | Code
IROSautonomous-drivinggoal-recognition
Abstract:
It is important for autonomous vehicles to have the ability to infer the goals of other vehicles (goal recognition), in order to safely interact with other vehicles and predict their future trajectories. This is a difficult problem, especially in urban environments with interactions between many vehicles. Goal recognition methods must be fast to run in real time and make accurate inferences. As autonomous driving is safety-critical, it is important to have methods which are human interpretable and for which safety can be formally verified. Existing goal recognition methods for autonomous vehicles fail to satisfy all four objectives of being fast, accurate, interpretable and verifiable. We propose Goal Recognition with Interpretable Trees (GRIT), a goal recognition system which achieves these objectives. GRIT makes use of decision trees trained on vehicle trajectory data. We evaluate GRIT on two datasets, showing that GRIT achieved fast inference speed and comparable accuracy to two deep learning baselines, a planning-based goal recognition method, and an ablation of GRIT. We show that the learned trees are human interpretable and demonstrate how properties of GRIT can be formally verified using a satisfiability modulo theories (SMT) solver.
@inproceedings{brewitt2021grit,
title={{GRIT:} Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving},
author={Cillian Brewitt and Balint Gyevnar and Samuel Garcin and Stefano V. Albrecht},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021}
}
Josiah P. Hanna, Arrasy Rahman, Elliot Fosong, Francisco Eiras, Mihai Dobre, John Redford, Subramanian Ramamoorthy, Stefano V. Albrecht
Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
Abstract | BibTex | arXiv
IROSautonomous-drivinggoal-recognition
Abstract:
Recognising the goals or intentions of observed vehicles is a key step towards predicting the long-term future behaviour of other agents in an autonomous driving scenario. When there are unseen obstacles or occluded vehicles in a scenario, goal recognition may be confounded by the effects of these unseen entities on the behaviour of observed vehicles. Existing prediction algorithms that assume rational behaviour with respect to inferred goals may fail to make accurate long-horizon predictions because they ignore the possibility that the behaviour is influenced by such unseen entities. We introduce the Goal and Occluded Factor Inference (GOFI) algorithm which bases inference on inverse-planning to jointly infer a probabilistic belief over goals and potential occluded factors. We then show how these beliefs can be integrated into Monte Carlo Tree Search (MCTS). We demonstrate that jointly inferring goals and occluded factors leads to more accurate beliefs with respect to the true world state and allows an agent to safely navigate several scenarios where other baselines take unsafe actions leading to collisions.
@inproceedings{hanna2021interpretable,
title={Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles},
author={Josiah P. Hanna and Arrasy Rahman and Elliot Fosong and Francisco Eiras and Mihai Dobre and John Redford and Subramanian Ramamoorthy and Stefano V. Albrecht},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021}
}
Henry Pulver, Francisco Eiras, Ludovico Carozza, Majd Hawasly, Stefano V. Albrecht, Subramanian Ramamoorthy
PILOT: Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
Abstract | BibTex | arXiv | Video
IROSautonomous-driving
Abstract:
Achieving a proper balance between planning quality, safety and efficiency is a major challenge for autonomous driving. Optimisation-based motion planners are capable of producing safe, smooth and comfortable plans, but often at the cost of runtime efficiency. On the other hand, naively deploying trajectories produced by efficient-to-run deep imitation learning approaches might risk compromising safety. In this paper, we present PILOT -- a planning framework that comprises an imitation neural network followed by an efficient optimiser that actively rectifies the network's plan, guaranteeing fulfilment of safety and comfort requirements. The objective of the efficient optimiser is the same as the objective of an expensive-to-run optimisation-based planning system that the neural network is trained offline to imitate. This efficient optimiser provides a key layer of online protection from learning failures or deficiency in out-of-distribution situations that might compromise safety or comfort. Using a state-of-the-art, runtime-intensive optimisation-based method as the expert, we demonstrate in simulated autonomous driving experiments in CARLA that PILOT achieves a seven-fold reduction in runtime when compared to the expert it imitates without sacrificing planning quality.
@inproceedings{pulver2020pilot,
title={{PILOT:} Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving},
author={Henry Pulver and Francisco Eiras and Ludovico Carozza and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021}
}
Francisco Eiras, Majd Hawasly, Stefano V. Albrecht, Subramanian Ramamoorthy
A Two-Stage Optimization-based Motion Planner for Safe Urban Driving
IEEE Transactions on Robotics (T-RO), 2021
Abstract | BibTex | arXiv | Publisher | Video
T-ROautonomous-driving
Abstract:
Recent road trials have shown that guaranteeing the safety of driving decisions is essential for the wider adoption of autonomous vehicle technology. One promising direction is to pose safety requirements as planning constraints in nonlinear, non-convex optimization problems of motion synthesis. However, many implementations of this approach are limited by uncertain convergence and local optimality of the solutions achieved, affecting overall robustness. To improve upon these issues, we propose a novel two-stage optimization framework: in the first stage, we find a solution to a Mixed-Integer Linear Programming (MILP) formulation of the motion synthesis problem, the output of which initializes a second Nonlinear Programming (NLP) stage. The MILP stage enforces hard constraints of safety and road rule compliance generating a solution in the right subspace, while the NLP stage refines the solution within the safety bounds for feasibility and smoothness. We demonstrate the effectiveness of our framework via simulated experiments of complex urban driving scenarios, outperforming a state-of-the-art baseline in metrics of convergence, comfort and progress.
@article{eiras2021twostage,
title = {A Two-Stage Optimization-based Motion Planner for Safe Urban Driving},
author = {Francisco Eiras and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
journal = {IEEE Transactions on Robotics},
volume = {38},
number = {2},
pages = {822--834},
year = {2022},
doi = {10.1109/TRO.2021.3088009}
}
Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, Stefano V. Albrecht
Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction
International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS), 2021
Abstract | BibTex | arXiv | Publisher | Code
PAAMSsecurityagent-modelling
Abstract:
Current methods for authentication and key agreement based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach based on artificial intelligence research in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. Authentication and key agreement are decided based on the agents' observed behaviors during the interaction. The security of this approach rests upon the difficulty of modeling the decisions of interacting agents from limited observations, a problem which we conjecture is also hard for quantum computing. We release PyAMI, a prototype authentication and key agreement system based on the proposed method. We empirically validate our method for authenticating legitimate users while detecting different types of adversarial attacks. Finally, we show how reinforcement learning techniques can be used to train server models which effectively probe a client's decisions to achieve more sample-efficient authentication.
@inproceedings{ahmed2021quantum,
title={Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction},
author={Ibrahim H. Ahmed and Josiah P. Hanna and Elliot Fosong and Stefano V. Albrecht},
booktitle={International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS)},
year={2021}
}
Shangmin Guo, Yi Ren, Kory Mathewson, Simon Kirby, Stefano V. Albrecht, Kenny Smith
Expressivity of Emergent Language is a Trade-off between Contextual Complexity and Unpredictability
arXiv:2106.03982, 2021
Abstract | BibTex | arXiv
multi-agent-rlemergent-communication
Abstract:
Researchers are now using deep learning models to explore the emergence of language in various language games, where simulated agents interact and develop an emergent language to solve a task. Although it is quite intuitive that different types of language games posing different communicative challenges might require emergent languages which encode different levels of information, there is no existing work exploring the expressivity of the emergent languages. In this work, we propose a definition of partial order between expressivity based on the generalisation performance across different language games. We also validate the hypothesis that expressivity of emergent languages is a trade-off between the complexity and unpredictability of the context those languages are used in. Our second novel contribution is introducing contrastive loss into the implementation of referential games. We show that using our contrastive loss alleviates the collapse of message types seen using standard referential loss functions.
@misc{guo2021expressivity,
title={Expressivity of Emergent Language is a Trade-off between Contextual Complexity and Unpredictability},
author={Shangmin Guo and Yi Ren and Kory Mathewson and Simon Kirby and Stefano V. Albrecht and Kenny Smith},
year={2021},
eprint={2106.03982},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Trevor McInroe, Lukas Schäfer, Stefano V. Albrecht
Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning
arXiv:2110.04935, 2021
Abstract | BibTex | arXiv | Code
deep-rl
Abstract:
Deep reinforcement learning (RL) agents that exist in high-dimensional state spaces, such as those composed of images, have interconnected learning burdens. Agents must learn an action-selection policy that completes their given task, which requires them to learn a representation of the state space that discerns between useful and useless information. The reward function is the only supervised feedback that RL agents receive, which causes a representation learning bottleneck that can manifest in poor sample efficiency. We present k-Step Latent (KSL), a new representation learning method that enforces temporal consistency of representations via a self-supervised auxiliary task wherein agents learn to recurrently predict action-conditioned representations of the state space. The state encoder learned by KSL produces low-dimensional representations that make optimization of the RL task more sample efficient. Altogether, KSL produces state-of-the-art results in both data efficiency and asymptotic performance in the popular PlaNet benchmark suite. Our analyses show that KSL produces encoders that generalize better to new tasks unseen during training, and its representations are more strongly tied to reward, are more invariant to perturbations in the state space, and move more smoothly through the temporal axis of the RL problem than other methods such as DrQ, RAD, CURL, and SAC-AE.
@misc{mcinroe2021learning,
title={Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning},
author={Trevor McInroe and Lukas Schäfer and Stefano V. Albrecht},
year={2021},
eprint={2110.04935},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
2020
Stefano V. Albrecht, Peter Stone, Michael P. Wellman
Special Issue on Autonomous Agents Modelling Other Agents: Guest Editorial
Artificial Intelligence (AIJ), 2020
Abstract | BibTex | Publisher | Special Issue
AIJagent-modelling
Abstract:
Much research in artificial intelligence is concerned with enabling autonomous agents to reason about various aspects of other agents (such as their beliefs, goals, plans, or decisions) and to utilise such reasoning for effective interaction. This special issue contains new technical contributions addressing open problems in autonomous agents modelling other agents, as well as research perspectives about current developments, challenges, and future directions.
@article{albrecht2020special,
title = {Special Issue on Autonomous Agents Modelling Other Agents: Guest Editorial},
author = {Stefano V. Albrecht and Peter Stone and Michael P. Wellman},
journal = {Artificial Intelligence},
volume = {285},
year = {2020},
publisher = {Elsevier},
url = {https://doi.org/10.1016/j.artint.2020.103292}
}
Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Conference on Neural Information Processing Systems (NeurIPS), 2020
Abstract | BibTex | arXiv
NeurIPSdeep-rlmulti-agent-rl
Abstract:
Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.
@inproceedings{christianos2020shared,
title={Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning},
author={Filippos Christianos and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle={34th Conference on Neural Information Processing Systems},
year={2020}
}
Georgios Papoudakis, Stefano V. Albrecht
Variational Autoencoders for Opponent Modeling in Multi-Agent Systems
AAAI Workshop on Reinforcement Learning in Games (AAAI), 2020
Abstract | BibTex | arXiv
AAAIdeep-rlagent-modelling
Abstract:
Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment. In this work, we are interested in controlling one agent in a multi-agent system and successfully learn to interact with the other agents that have fixed policies. Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system. By taking advantage of recent advances in unsupervised learning, we propose modeling opponents using variational autoencoders. Additionally, many existing methods in the literature assume that the opponent models have access to opponent's observations and actions during both training and execution. To eliminate this assumption, we propose a modification that attempts to identify the underlying opponent model using only local information of our agent, such as its observations, actions, and rewards. The experiments indicate that our opponent modeling methods achieve equal or greater episodic returns in reinforcement learning tasks against another modeling method.
@inproceedings{papoudakis2020variational,
title={Variational Autoencoders for Opponent Modeling in Multi-Agent Systems},
author={Georgios Papoudakis and Stefano V. Albrecht},
booktitle={AAAI Workshop on Reinforcement Learning in Games},
year={2020}
}
Arrasy Rahman, Niklas Höpner, Filippos Christianos, Stefano V. Albrecht
Open Ad Hoc Teamwork using Graph-based Policy Learning
arXiv:2006.10412, 2020
Abstract | BibTex | arXiv
deep-rlagent-modellingad-hoc-teamwork
Abstract:
Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with previously unknown teammates. Prior work in this area has focused on closed teams in which the number of agents is fixed. In this work, we consider open teams by allowing agents of varying types to enter and leave the team without prior notification. Our proposed solution builds on graph neural networks to learn scalable agent models and value decompositions under varying team sizes, which can be jointly trained with a reinforcement learning agent using discounted returns objectives. We demonstrate empirically that our approach results in agent policies which can robustly adapt to dynamic team composition, and is able to effectively generalize to larger teams than were seen during training.
@misc{rahman2020open,
title={Open Ad Hoc Teamwork using Graph-based Policy Learning},
author={Arrasy Rahman and Niklas H\"opner and Filippos Christianos and Stefano V. Albrecht},
year={2020},
eprint={2006.10412},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Georgios Papoudakis, Filippos Christianos , Lukas Schäfer, Stefano V. Albrecht
Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms
arXiv:2006.07869, 2020
Abstract | BibTex | arXiv
deep-rlmulti-agent-rl
Abstract:
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learners, centralised training with decentralised execution, and value decomposition) in a diverse range of multi-agent learning tasks. Our results show that (1) algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks; (2) independent learners often achieve equal or better performance than more complex algorithms; (3) tested algorithms struggle to solve multi-agent tasks with sparse rewards. We report detailed empirical data, including a reliability analysis, and provide insights into the limitations of the tested algorithms.
@misc{papoudakis2020comparative,
title={Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms},
author={Georgios Papoudakis and Filippos Christianos and Lukas Sch\"afer and Stefano V. Albrecht},
year={2020},
eprint={2006.07869},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Georgios Papoudakis, Filippos Christianos, Stefano V. Albrecht
Local Information Opponent Modelling Using Variational Autoencoders
arXiv:2006.09447, 2020
Abstract | BibTex | arXiv
deep-rlagent-modelling
Abstract:
Modelling the behaviours of other agents (opponents) is essential for understanding how agents interact and making effective decisions. Existing methods for opponent modelling commonly assume knowledge of the local observations and chosen actions of the modelled opponents, which can significantly limit their applicability. We propose a new modelling technique based on variational autoencoders, which are trained to reconstruct the local actions and observations of the opponent based on embeddings which depend only on the local observations of the modelling agent (its observed world state, chosen actions, and received rewards). The embeddings are used to augment the modelling agent's decision policy which is trained via deep reinforcement learning; thus the policy does not require access to opponent observations. We provide a comprehensive evaluation and ablation study in diverse multi-agent tasks, showing that our method achieves comparable performance to an ideal baseline which has full access to opponent's information, and significantly higher returns than a baseline method which does not use the learned embeddings.
@misc{papoudakis2020opponent,
title={Local Information Opponent Modelling Using Variational Autoencoders},
author={Georgios Papoudakis and Filippos Christianos and Stefano V. Albrecht},
year={2020},
eprint={2006.09447},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Ibrahim H. Ahmed, Josiah P. Hanna, Stefano V. Albrecht
Quantum-Secure Authentication via Abstract Multi-Agent Interaction
arXiv:2007.09327, 2020
Abstract | BibTex | arXiv
securityagent-modelling
Abstract:
Current methods for authentication based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach to authentication in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. The security of this approach rests upon the difficulty of learning the model parameters of interacting agents, a problem which we conjecture is also hard for quantum computing. We develop methods which enable a server agent to classify a client agent as either legitimate or adversarial based on their past interactions. Moreover, we use reinforcement learning techniques to train server policies which effectively probe the client's decisions to achieve more sample-efficient authentication, while making modelling attacks as difficult as possible via entropy-maximization principles. We empirically validate our methods for authenticating legitimate users while detecting different types of adversarial attacks.
@misc{ahmed2020quantumsecure,
title={Quantum-Secure Authentication via Abstract Multi-Agent Interaction},
author={Ibrahim H. Ahmed and Josiah P. Hanna and Stefano V. Albrecht},
year={2020},
eprint={2007.09327},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
Stefano V. Albrecht, Cillian Brewitt, John Wilhelm, Balint Gyevnar, Francisco Eiras, Mihai Dobre, Subramanian Ramamoorthy
Interpretable Goal-based Prediction and Planning for Autonomous Driving
arXiv:2002.02277, 2020
Abstract | BibTex | arXiv
autonomous-drivinggoal-recognition
Abstract:
We propose an integrated prediction and planning system for autonomous driving which uses rational inverse planning to recognise the goals of other vehicles. Goal recognition informs a Monte Carlo Tree Search (MCTS) algorithm to plan optimal maneuvers for the ego vehicle. Inverse planning and MCTS utilise a shared set of defined maneuvers and macro actions to construct plans which are explainable by means of rationality principles. Evaluation in simulations of urban driving scenarios demonstrate the system's ability to robustly recognise the goals of other vehicles, enabling our vehicle to exploit non-trivial opportunities to significantly reduce driving times. In each scenario, we extract intuitive explanations for the predictions which justify the system's decisions.
@misc{albrecht2020integrating,
title={Interpretable Goal-based Prediction and Planning for Autonomous Driving},
author={Stefano V. Albrecht and Cillian Brewitt and John Wilhelm and Balint Gyevnar and Francisco Eiras and Mihai Dobre and Subramanian Ramamoorthy},
year={2020},
eprint={2002.02277},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
Henry Pulver, Francisco Eiras, Ludovico Carozza, Majd Hawasly, Stefano V. Albrecht, Subramanian Ramamoorthy
PILOT: Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving
arXiv:2011.00509, 2020
Abstract | BibTex | arXiv
autonomous-driving
Abstract:
Achieving the right balance between planning quality, safety and runtime efficiency is a major challenge for autonomous driving research. Optimisation-based planners are typically capable of producing high-quality, safe plans, but at the cost of efficiency. We present PILOT, a two-stage planning framework comprising an imitation neural network and an efficient optimisation component that guarantees the satisfaction of requirements of safety and comfort. The neural network is trained to imitate an expensive-to-run optimisation-based planning system with the same objective as the efficient optimisation component of PILOT. We demonstrate in simulated autonomous driving experiments that the proposed framework achieves a significant reduction in runtime when compared to the optimisation-based expert it imitates, without sacrificing the planning quality.
@misc{pulver2020pilot,
title={{PILOT:} Efficient Planning by Imitation Learning and Optimisation for Safe Autonomous Driving},
author={Henry Pulver and Francisco Eiras and Ludovico Carozza and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
year={2020},
eprint={2011.00509},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
Francisco Eiras, Majd Hawasly, Stefano V. Albrecht, Subramanian Ramamoorthy
Two-Stage Optimization-based Motion Planner for Safe Urban Driving
arXiv:2002.02215, 2020
Abstract | BibTex | arXiv
autonomous-driving
Abstract:
Recent road trials have shown that guaranteeing the safety of driving decisions is essential for the wider adoption of autonomous vehicle technology. One promising direction is to pose safety requirements as planning constraints in nonlinear, nonconvex optimization problems of motion synthesis. However, many implementations of this approach are limited by uncertain convergence and local optimality of the solutions achieved, affecting overall robustness. To improve upon these issues, we propose a novel two-stage optimization framework: in the first stage, we find a solution to a Mixed-Integer Linear Programming (MILP) formulation of the motion synthesis problem, the output of which initializes a second Nonlinear Programming (NLP) stage. The MILP stage enforces hard constraints of safety and road rule compliance generating a solution in the right subspace, while the NLP stage refines the solution within the safety bounds for feasibility and smoothness. We demonstrate the effectiveness of our framework via simulated experiments of complex urban driving scenarios, outperforming a state-of-the-art baseline in metrics of convergence, comfort and progress.
@misc{eiras2020twostage,
title={Two-Stage Optimization-based Motion Planner for Safe Urban Driving},
author={Francisco Eiras and Majd Hawasly and Stefano V. Albrecht and Subramanian Ramamoorthy},
year={2020},
eprint={2002.02215},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
2019
Maciej Wiatrak, Stefano V. Albrecht, Andrew Nystrom
Stabilizing Generative Adversarial Networks: A Survey
arXiv:1910.00927, 2019
Abstract | BibTex | arXiv
surveysecuritygan
Abstract:
Generative Adversarial Networks (GANs) are a type of generative model which have received much attention due to their ability to model complex real-world data. Despite their recent successes, the process of training GANs remains challenging, suffering from instability problems such as non-convergence, vanishing or exploding gradients, and mode collapse. In recent years, a diverse set of approaches have been proposed which focus on stabilizing the GAN training procedure. The purpose of this survey is to provide a comprehensive overview of the GAN training stabilization methods which can be found in the literature. We discuss the advantages and disadvantages of each approach, offer a comparative summary, and conclude with a discussion of open problems.
@misc{wiatrak2019stabilizing,
title={Stabilizing Generative Adversarial Networks: A Survey},
author={Maciej Wiatrak and Stefano V. Albrecht and Andrew Nystrom},
year={2019},
eprint={1910.00927},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Georgios Papoudakis, Filippos Christianos, Arrasy Rahman, Stefano V. Albrecht
Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning
arXiv:1906.04737, 2019
Abstract | BibTex | arXiv
surveydeep-rlmulti-agent-rl
Abstract:
Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.
@misc{papoudakis2019dealing,
title={Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning},
author={Georgios Papoudakis and Filippos Christianos and Arrasy Rahman and Stefano V. Albrecht},
year={2019},
eprint={1906.04737},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
2018
Stefano V. Albrecht, Peter Stone
Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems
Artificial Intelligence (AIJ), 2018
Abstract | BibTex | arXiv | Publisher
AIJsurveyagent-modellinggoal-recognition
Abstract:
Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.
@article{ albrecht2018modelling,
title = {Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems},
author = {Stefano V. Albrecht and Peter Stone},
journal = {Artificial Intelligence},
volume = {258},
pages = {66--95},
year = {2018},
publisher = {Elsevier},
note = {DOI: 10.1016/j.artint.2018.01.002}
}
Craig Innes, Alex Lascarides, Stefano V. Albrecht, Subramanian Ramamoorthy, Benjamin Rosman
Reasoning about Unforeseen Possibilities During Policy Learning
arXiv:1801.03331, 2018
Abstract | BibTex | arXiv
causal
Abstract:
Methods for learning optimal policies in autonomous agents often assume that the way the domain is conceptualised - its possible states and actions and their causal structure - is known in advance and does not change during learning. This is an unrealistic assumption in many scenarios, because new evidence can reveal important information about what is possible, possibilities that the agent was not aware existed prior to learning. We present a model of an agent which both discovers and learns to exploit unforeseen possibilities using two sources of evidence: direct interaction with the world and communication with a domain expert. We use a combination of probabilistic and symbolic reasoning to estimate all components of the decision problem, including its set of random variables and their causal dependencies. Agent simulations show that the agent converges on optimal polices even when it starts out unaware of factors that are critical to behaving optimally.
@misc{innes2018reasoning,
title={Reasoning about Unforeseen Possibilities During Policy Learning},
author={Craig Innes and Alex Lascarides and Stefano V. Albrecht and Subramanian Ramamoorthy and Benjamin Rosman},
year={2018},
eprint={1801.03331},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
2017
Stefano V. Albrecht, Somchaya Liemhetcharat, Peter Stone
Special Issue on Multiagent Interaction without Prior Coordination: Guest Editorial
Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 2017
Abstract | BibTex | Publisher | MIPC Workshop Series
JAAMASad-hoc-teamwork
Abstract:
This special issue of the Journal of Autonomous Agents and Multi-Agent Systems sought research articles on the emerging topic of multiagent interaction without prior coordination. Topics of interest included empirical and theoretical investigations of issues arising from assumptions of prior coordination, as well as solutions in the form of novel models and algorithms for effective multiagent interaction without prior coordination.
@article{ albrecht2017special,
title = {Special Issue on Multiagent Interaction without Prior Coordination: Guest Editorial},
author = {Stefano V. Albrecht and Somchaya Liemhetcharat and Peter Stone},
journal = {Autonomous Agents and Multi-Agent Systems},
volume = {31},
issue = {4},
pages = {765--766},
year = {2017},
publisher = {Springer},
url = {http://dx.doi.org/10.1007/s10458-016-9358-0}
}
Stefano V. Albrecht, Peter Stone
Reasoning about Hypothetical Agent Behaviours and their Parameters
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2017
Abstract | BibTex | arXiv
AAMASad-hoc-teamworkagent-modelling
Abstract:
Agents can achieve effective interaction with previously unknown other agents by maintaining beliefs over a set of hypothetical behaviours, or types, that these agents may have. A current limitation in this method is that it does not recognise parameters within type specifications, because types are viewed as blackbox mappings from interaction histories to probability distributions over actions. In this work, we propose a general method which allows an agent to reason about both the relative likelihood of types and the values of any bounded continuous parameters within types. The method maintains individual parameter estimates for each type and selectively updates the estimates for some types after each observation. We propose different methods for the selection of types and the estimation of parameter values. The proposed methods are evaluated in detailed experiments, showing that updating the parameter estimates of a single type after each observation can be sufficient to achieve good performance.
@inproceedings{ albrecht2017reasoning,
title = {Reasoning about Hypothetical Agent Behaviours and their Parameters},
author = {Stefano V. Albrecht and Peter Stone},
booktitle = {Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems},
pages = {547--555},
year = {2017}
}
Stefano V. Albrecht, Subramanian Ramamoorthy
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks (Extended Abstract)
International Joint Conference on Artificial Intelligence (IJCAI), 2017
Abstract | BibTex | arXiv
IJCAIstate-estimationcausal
Abstract:
Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and uncertain observations. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF is evaluated in both synthetic processes and a simulated multi-robot warehouse, where it outperformed alternative filtering methods by exploiting passivity.
@inproceedings{ albrecht2017causality,
title = {Exploiting Causality for Selective Belief Filtering in Dynamic {B}ayesian Networks (Extended Abstract)},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 26th International Joint Conference on Artificial Intelligence},
address = {Melbourne, Australia},
month = {August},
year = {2017}
}
2016
Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy
Belief and Truth in Hypothesised Behaviours
Artificial Intelligence (AIJ), 2016
Abstract | BibTex | arXiv | Publisher
AIJad-hoc-teamworkagent-modelling
Abstract:
There is a long history in game theory on the topic of Bayesian or “rational” learning, in which each player maintains beliefs over a set of alternative behaviours, or types, for the other players. This idea has gained increasing interest in the artificial intelligence (AI) community, where it is used as a method to control a single agent in a system composed of multiple agents with unknown behaviours. The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents. The game theory literature studies this idea primarily in the context of equilibrium attainment. In contrast, many AI applications have a focus on task completion and payoff maximisation. With this perspective in mind, we identify and address a spectrum of questions pertaining to belief and truth in hypothesised types. We formulate three basic ways to incorporate evidence into posterior beliefs and show when the resulting beliefs are correct, and when they may fail to be correct. Moreover, we demonstrate that prior beliefs can have a significant impact on our ability to maximise payoffs in the long-term, and that they can be computed automatically with consistent performance effects. Furthermore, we analyse the conditions under which we are able complete our task optimally, despite inaccuracies in the hypothesised types. Finally, we show how the correctness of hypothesised types can be ascertained during the interaction via an automated statistical analysis.
@article{ albrecht2016belief,
title = {Belief and Truth in Hypothesised Behaviours},
author = {Stefano V. Albrecht and Jacob W. Crandall and Subramanian Ramamoorthy},
journal = {Artificial Intelligence},
volume = {235},
pages = {63--94},
year = {2016},
publisher = {Elsevier},
note = {DOI: 10.1016/j.artint.2016.02.004}
}
Stefano V. Albrecht, Subramanian Ramamoorthy
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks
Journal of Artificial Intelligence Research (JAIR), 2016
Abstract | BibTex | arXiv | Publisher
JAIRstate-estimationcausal
Abstract:
Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and noisy observations. This can be a hard problem in complex processes with large state spaces. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PSBF is faster than several alternative methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.
@article{ albrecht2016causality,
title = {Exploiting Causality for Selective Belief Filtering in Dynamic {B}ayesian Networks},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
journal = {Journal of Artificial Intelligence Research},
volume = {55},
pages = {1135--1178},
year = {2016},
publisher = {AI Access Foundation},
note = {DOI: 10.1613/jair.5044}
}
2015
Stefano V. Albrecht, Subramanian Ramamoorthy
Are You Doing What I Think You Are Doing? Criticising Uncertain Agent Models
Conference on Uncertainty in Artificial Intelligence (UAI), 2015
Abstract | BibTex | arXiv
UAIagent-modelling
Abstract:
The key for effective interaction in many multiagent applications is to reason explicitly about the behaviour of other agents, in the form of a hypothesised behaviour. While there exist several methods for the construction of a behavioural hypothesis, there is currently no universal theory which would allow an agent to contemplate the correctness of a hypothesis. In this work, we present a novel algorithm which decides this question in the form of a frequentist hypothesis test. The algorithm allows for multiple metrics in the construction of the test statistic and learns its distribution during the interaction process, with asymptotic correctness guarantees. We present results from a comprehensive set of experiments, demonstrating that the algorithm achieves high accuracy and scalability at low computational costs.
@inproceedings{ albrecht2015criticising,
title = {Are You Doing What I Think You Are Doing? Criticising Uncertain Agent Models},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence},
pages = {52--61},
year = {2015}
}
Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy
An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types
AAAI Conference on Artificial Intelligence (AAAI), 2015
Abstract | BibTex | arXiv | Appendix
AAAIagent-modelling
Abstract:
Many multiagent applications require an agent to learn quickly how to interact with previously unknown other agents. To address this problem, researchers have studied learning algorithms which compute posterior beliefs over a hypothesised set of policies, based on the observed actions of the other agents. The posterior belief is complemented by the prior belief, which specifies the subjective likelihood of policies before any actions are observed. In this paper, we present the first comprehensive empirical study on the practical impact of prior beliefs over policies in repeated interactions. We show that prior beliefs can have a significant impact on the long-term performance of such methods, and that the magnitude of the impact depends on the depth of the planning horizon. Moreover, our results demonstrate that automatic methods can be used to compute prior beliefs with consistent performance effects. This indicates that prior beliefs could be eliminated as a manual parameter and instead be computed automatically.
@inproceedings{ albrecht2015empirical,
title = {An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types},
author = {Stefano V. Albrecht and Jacob W. Crandall and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 29th AAAI Conference on Artificial Intelligence},
pages = {1988--1994},
year = {2015}
}
Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy
E-HBA: Using Action Policies for Expert Advice and Agent Typification
AAAI Workshop on Multiagent Interaction without Prior Coordination (AAAI), 2015
Abstract | BibTex | arXiv | Appendix
AAAIad-hoc-teamworkagent-modelling
Abstract:
Past research has studied two approaches to utilise predefined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents. In this work, we bring these complementary views together in the form of a novel meta-algorithm, called Expert-HBA (E-HBA), which can be applied to any expert algorithm that considers the average (or total) payoff an expert has yielded in the past. E-HBA gradually mixes the past payoff with a predicted future payoff, which is computed using the type-based characterisation. We present results from a comprehensive set of repeated matrix games, comparing the performance of several well-known expert algorithms with and without the aid of E-HBA. Our results show that E-HBA has the potential to significantly improve the performance of expert algorithms.
@inproceedings{ albrecht2015ehba,
title = {{E-HBA}: Using Action Policies for Expert Advice and Agent Typification},
author = {Stefano V. Albrecht and Jacob W. Crandall and Subramanian Ramamoorthy},
booktitle = {AAAI Workshop on Multiagent Interaction without Prior Coordination},
address = {Austin, Texas, USA},
month = {January},
year = {2015}
}
2014
Stefano V. Albrecht, Subramanian Ramamoorthy
On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems
Conference on Uncertainty in Artificial Intelligence (UAI), 2014
Abstract | BibTex | arXiv | Appendix
UAIagent-modelling
Abstract:
While many multiagent algorithms are designed for homogeneous systems (i.e. all agents are identical), there are important applications which require an agent to coordinate its actions without knowing a priori how the other agents behave. One method to make this problem feasible is to assume that the other agents draw their latent policy (or type) from a specific set, and that a domain expert could provide a specification of this set, albeit only a partially correct one. Algorithms have been proposed by several researchers to compute posterior beliefs over such policy libraries, which can then be used to determine optimal actions. In this paper, we provide theoretical guidance on two central design parameters of this method: Firstly, it is important that the user choose a posterior which can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen. We analyse convergence properties of two existing posterior formulations and propose a new posterior which can learn correlated distributions. Secondly, since the types are provided by an expert, they may be inaccurate in the sense that they do not predict the agents’ observed actions. We provide a novel characterisation of optimality which allows experts to use efficient model checking algorithms to verify optimality of types.
@inproceedings{ albrecht2014convergence,
title = {On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence},
pages = {12--21},
year = {2014}
}
2013
Stefano V. Albrecht, Subramanian Ramamoorthy
A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013
Abstract | BibTex | arXiv (full technical report) | Extended Abstract
AAMASad-hoc-teamworkagent-modelling
Abstract:
The ad hoc coordination problem is to design an autonomous agent which is able to achieve optimal flexibility and efficiency in a multiagent system with no mechanisms for prior coordination. We conceptualise this problem formally using a game-theoretic model, called the stochastic Bayesian game, in which the behaviour of a player is determined by its private information, or type. Based on this model, we derive a solution, called Harsanyi-Bellman Ad Hoc Coordination (HBA), which utilises the concept of Bayesian Nash equilibrium in a planning procedure to find optimal actions in the sense of Bellman optimal control. We evaluate HBA in a multiagent logistics domain called level-based foraging, showing that it achieves higher flexibility and efficiency than several alternative algorithms. We also report on a human-machine experiment at a public science exhibition in which the human participants played repeated Prisoner's Dilemma and Rock-Paper-Scissors against HBA and alternative algorithms, showing that HBA achieves equal efficiency and a significantly higher welfare and winning rate.
@inproceedings{ albrecht2013game,
title = {A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems},
address = {St. Paul, Minnesota, USA},
month = {May},
year = {2013}
}
2012
Stefano V. Albrecht, Subramanian Ramamoorthy
Comparative Evaluation of Multiagent Learning Algorithms in a Diverse Set of Ad Hoc Team Problems
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012
Abstract | BibTex | arXiv
AAMASmulti-agent-rlad-hoc-teamwork
Abstract:
This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogenous, in the sense of utilizing different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in ad hoc team problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.
@inproceedings{ albrecht2012comparative,
title = {Comparative Evaluation of {MAL} Algorithms in a Diverse Set of Ad Hoc Team Problems},
author = {Stefano V. Albrecht and Subramanian Ramamoorthy},
booktitle = {Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems},
pages = {349--356},
year = {2012}
}