Publications
For news about publications, follow us on X/Twitter:
Click on any author names or tags to filter publications.
All topic tags:
surveydeep-rlmulti-agent-rlagent-modellingad-hoc-teamworkautonomous-drivinggoal-recognitionexplainable-aicausalgeneralisationsecurityemergent-communicationiterated-learningintrinsic-rewardsimulatorstate-estimationdeep-learningtransfer-learning
Selected tags (click to remove):
ICML
2024
Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design
International Conference on Machine Learning, 2024
Abstract | BibTex | arXiv
ICMLdeep-rl
Abstract:
Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when they share characteristics with the environments they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the implicit regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which have more control over the data generation mechanism. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED). DRED generates levels using a generative model trained over an initial set of level parameters, reducing distributional shift, and achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods.
@inproceedings{garcin2024dred,
title={{DRED}: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design},
author={Samuel Garcin and James Doran and Shangmin Guo and Christopher G. Lucas and Stefano V. Albrecht},
year={2024},
booktitle={International Conference on Machine Learning (ICML)}
}
2021
Arrasy Rahman, Niklas Höpner, Filippos Christianos, Stefano V. Albrecht
Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning
International Conference on Machine Learning, 2021
Abstract | BibTex | arXiv | Video | Code
ICMLdeep-rlagent-modellingad-hoc-teamwork
Abstract:
Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with teammates without prior coordination mechanisms, including joint training. Prior work in this area has focused on closed teams in which the number of agents is fixed. In this work, we consider open teams by allowing agents with different fixed policies to enter and leave the environment without prior notification. Our solution builds on graph neural networks to learn agent models and joint-action value models under varying team compositions. We contribute a novel action-value computation that integrates the agent model and joint-action value model to produce action-value estimates. We empirically demonstrate that our approach successfully models the effects other agents have on the learner, leading to policies that robustly adapt to dynamic team compositions and significantly outperform several alternative methods.
@inproceedings{rahman2021open,
title={Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning},
author={Arrasy Rahman and Niklas H\"opner and Filippos Christianos and Stefano V. Albrecht},
booktitle={International Conference on Machine Learning (ICML)},
year={2021}
}
Filippos Christianos, Georgios Papoudakis, Arrasy Rahman, Stefano V. Albrecht
Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing
International Conference on Machine Learning, 2021
Abstract | BibTex | arXiv | Video | Code
ICMLdeep-rlmulti-agent-rl
Abstract:
Sharing parameters in multi-agent deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents. Parameter sharing between agents significantly decreases the number of trainable parameters, shortening training times to tractable levels, and has been linked to more efficient learning. However, having all agents share the same parameters can also have a detrimental effect on learning. We demonstrate the impact of parameter sharing methods on training speed and converged returns, establishing that when applied indiscriminately, their effectiveness is highly dependent on the environment. We propose a novel method to automatically identify agents which may benefit from sharing parameters by partitioning them based on their abilities and goals. Our approach combines the increased sample efficiency of parameter sharing with the representational capacity of multiple independent networks to reduce training time and increase final returns.
@inproceedings{christianos2021scaling,
title={Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing},
author={Filippos Christianos and Georgios Papoudakis and Arrasy Rahman and Stefano V. Albrecht},
booktitle={International Conference on Machine Learning (ICML)},
year={2021}
}
Lukas Schäfer, Filippos Christianos, Josiah Hanna, Stefano V. Albrecht
Decoupling Exploration and Exploitation in Reinforcement Learning
ICML Workshop on Unsupervised Reinforcement Learning, 2021
Abstract | BibTex | arXiv | Code
ICMLdeep-rlintrinsic-reward
Abstract:
Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL) which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. We show that DeRL is more robust to scaling and speed of decay of intrinsic rewards and converges to the same evaluation returns than intrinsically motivated baselines in fewer interactions.
@inproceedings{schaefer2021decoupling,
title={Decoupling Exploration and Exploitation in Reinforcement Learning},
author={Lukas Schäfer and Filippos Christianos and Josiah Hanna and Stefano V. Albrecht},
booktitle={ICML Workshop on Unsupervised Reinforcement Learning (URL)},
year={2021}
}