Publications
Filtered by author: Josiah P. Hanna
2022
Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna
Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
Conference on Neural Information Processing Systems, 2022
NeurIPS, deep-rl
Abstract:
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle distinction between on-policy data and on-policy sampling in the context of the RL sub-problem of policy evaluation. We observe that on-policy sampling may fail to match the expected distribution of on-policy data after observing only a finite number of trajectories and this failure hinders data-efficient policy evaluation. Towards improved data-efficiency, we show how non-i.i.d., off-policy sampling can produce data that more closely matches the expected on-policy data distribution and consequently increases the accuracy of the Monte Carlo estimator for policy evaluation. We introduce a method called Robust On-Policy Sampling and demonstrate theoretically and empirically that it produces data that converges faster to the expected on-policy distribution compared to on-policy sampling. Empirically, we show that this faster convergence leads to lower mean squared error policy value estimates.
@inproceedings{zhong2022datacollection,
title={Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning},
author={Rujie Zhong and Duohan Zhang and Lukas Sch\"afer and Stefano V. Albrecht and Josiah P. Hanna},
booktitle={Conference on Neural Information Processing Systems},
year={2022}
}
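The abstract's distinction between on-policy data and on-policy sampling can be seen in a toy single-state (bandit) setting. The minimal sketch below, assuming a hypothetical three-action target policy `pi` and hypothetical per-action rewards, draws actions i.i.d. from `pi`, measures sampling error as the total variation distance between the empirical action frequencies and `pi`, and computes the Monte Carlo value estimate. It only illustrates the problem being studied; it is not the paper's Robust On-Policy Sampling method, and all function names are made up for illustration.

```python
import random

def sampling_error(counts, pi):
    """Total variation distance between empirical action frequencies and the target policy."""
    n = sum(counts)
    return 0.5 * sum(abs(c / n - p) for c, p in zip(counts, pi))

def monte_carlo_value(actions, rewards):
    """Monte Carlo estimate of the policy value: the mean observed reward."""
    return sum(rewards[a] for a in actions) / len(actions)

def on_policy_sample(pi, n, rng):
    """Draw n actions i.i.d. from the target policy pi (plain on-policy sampling)."""
    return rng.choices(range(len(pi)), weights=pi, k=n)

if __name__ == "__main__":
    pi = [0.5, 0.3, 0.2]          # hypothetical target policy over three actions
    rewards = [1.0, 0.0, 2.0]     # hypothetical deterministic reward for each action
    true_value = sum(p * r for p, r in zip(pi, rewards))  # expected reward under pi
    rng = random.Random(0)
    actions = on_policy_sample(pi, 20, rng)
    counts = [actions.count(a) for a in range(len(pi))]
    print("sampling error:", sampling_error(counts, pi))
    print("estimate:", monte_carlo_value(actions, rewards), "true:", true_value)
```

With a finite sample the empirical frequencies generally differ from `pi`, and that mismatch feeds directly into the error of the Monte Carlo estimate even though every action was drawn on-policy.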
Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht
Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration
International Conference on Autonomous Agents and Multi-Agent Systems, 2022
AAMAS, deep-rl, intrinsic-reward
Abstract:
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that divergence constraint regularisers can successfully minimise instability caused by divergence of exploration and exploitation policies.
@inproceedings{schaefer2022derl,
title={Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration},
author={Lukas Sch\"afer and Filippos Christianos and Josiah P. Hanna and Stefano V. Albrecht},
booktitle={International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
year={2022}
}
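The decoupling idea in the abstract above can be sketched with two tabular Q-learners that learn from the same transitions: an exploration policy trained on the extrinsic reward plus a scaled intrinsic bonus, and an exploitation policy trained on the extrinsic reward alone. This is a minimal sketch assuming tabular Q-learning, a hypothetical count-based bonus `1/sqrt(N(s'))`, and hypothetical names (`DecoupledAgent`, `beta`); the paper's DeRL algorithms use deep RL and their own intrinsic rewards, and the divergence-constraint regularisers are not shown.

```python
import math
from collections import defaultdict

class DecoupledAgent:
    """Hypothetical tabular sketch: separate exploration and exploitation Q-tables."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.9, beta=1.0):
        self.alpha, self.gamma, self.beta = alpha, gamma, beta
        self.q_explore = defaultdict(lambda: [0.0] * n_actions)
        self.q_exploit = defaultdict(lambda: [0.0] * n_actions)
        self.visits = defaultdict(int)

    def intrinsic_reward(self, next_state):
        # Count-based novelty bonus (an assumption): decays as next_state is revisited.
        self.visits[next_state] += 1
        return 1.0 / math.sqrt(self.visits[next_state])

    def _q_update(self, q, s, a, r, s_next, done):
        # Standard one-step Q-learning update toward r + gamma * max_a' Q(s', a').
        target = r if done else r + self.gamma * max(q[s_next])
        q[s][a] += self.alpha * (target - q[s][a])

    def act_explore(self, s):
        # Greedy w.r.t. the exploration table (ties go to the lowest action index).
        qs = self.q_explore[s]
        return qs.index(max(qs))

    def act_exploit(self, s):
        # Evaluation uses this policy, which never sees intrinsic rewards.
        qs = self.q_exploit[s]
        return qs.index(max(qs))

    def update(self, s, a, r_ext, s_next, done):
        r_int = self.intrinsic_reward(s_next)
        # Exploration policy: extrinsic + scaled intrinsic reward (non-stationary).
        self._q_update(self.q_explore, s, a, r_ext + self.beta * r_int, s_next, done)
        # Exploitation policy: extrinsic reward only, learned off-policy from the same data.
        self._q_update(self.q_exploit, s, a, r_ext, s_next, done)

if __name__ == "__main__":
    agent = DecoupledAgent(n_actions=2)
    # One hypothetical transition with zero extrinsic reward into a novel state.
    agent.update(s=0, a=1, r_ext=0.0, s_next=1, done=False)
    print(agent.q_explore[0], agent.q_exploit[0])  # [0.0, 0.5] [0.0, 0.0]
```

In this sketch the intrinsic bonus only shapes the exploration table, so the policy being evaluated is unaffected by the bonus's scale or decay; the exploitation learner does, however, only see states the exploration policy visits, which is where distribution shift between the two policies comes in.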
2021
Rujie Zhong, Josiah P. Hanna, Lukas Schäfer, Stefano V. Albrecht
Robust On-Policy Data Collection for Data-Efficient Policy Evaluation
NeurIPS Workshop on Offline Reinforcement Learning, 2021
NeurIPS, deep-rl
Abstract:
This paper considers how to complement offline reinforcement learning (RL) data with additional data collection for the task of policy evaluation. In policy evaluation, the task is to estimate the expected return of an evaluation policy on an environment of interest. Prior work on offline policy evaluation typically only considers a static dataset. We consider a setting where we can collect a small amount of additional data to combine with a potentially larger offline RL dataset. We show that simply running the evaluation policy (on-policy data collection) is sub-optimal for this setting. We then introduce two new data collection strategies for policy evaluation, both of which consider previously collected data when collecting future data so as to reduce distribution shift (or sampling error) in the entire dataset collected. Our empirical results show that compared to on-policy sampling, our strategies produce data with lower sampling error and generally lead to lower mean-squared error in policy evaluation for any total dataset size. We also show that these strategies can start from initial off-policy data, collect additional data, and then use both the initial and new data to produce low mean-squared error policy evaluation without using off-policy corrections.
@inproceedings{zhong2021robust,
title={Robust On-Policy Data Collection for Data-Efficient Policy Evaluation},
author={Rujie Zhong and Josiah P. Hanna and Lukas Sch\"afer and Stefano V. Albrecht},
booktitle={NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL)},
year={2021}
}
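The central idea in the abstract above, choosing future actions based on the data already collected so that the whole dataset's action frequencies track the evaluation policy, can be sketched in a single-state setting. The rule below is a hypothetical, deterministic stand-in (pick the action whose empirical count falls furthest below its expected count under `pi`); it should not be read as either of the paper's two strategies, and `pi` and the initial off-policy counts are made-up values.

```python
def deficit_action(counts, pi):
    """Hypothetical rule: pick the action whose count lags furthest behind its expected count."""
    n_next = sum(counts) + 1
    deficits = [p * n_next - c for c, p in zip(counts, pi)]
    return deficits.index(max(deficits))

def collect(counts, pi, n_new):
    """Extend an existing dataset (summarised as per-action counts) with n_new actions."""
    counts = list(counts)
    for _ in range(n_new):
        counts[deficit_action(counts, pi)] += 1
    return counts

def sampling_error(counts, pi):
    """Total variation distance between empirical action frequencies and pi."""
    n = sum(counts)
    return 0.5 * sum(abs(c / n - p) for c, p in zip(counts, pi))

if __name__ == "__main__":
    pi = [0.5, 0.3, 0.2]   # hypothetical evaluation policy
    offline = [1, 4, 0]    # hypothetical initial off-policy data, skewed toward action 1
    combined = collect(offline, pi, 15)
    print(sampling_error(offline, pi), combined, sampling_error(combined, pi))
```

With these made-up numbers the collector compensates for the skewed offline data: the combined 20 actions end at counts `[10, 6, 4]`, exactly proportional to `pi`, so the combined dataset can be averaged directly without importance weighting.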
Josiah P. Hanna, Arrasy Rahman, Elliot Fosong, Francisco Eiras, Mihai Dobre, John Redford, Subramanian Ramamoorthy, Stefano V. Albrecht
Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021
IROS, autonomous-driving, goal-recognition, explainable-ai
Abstract:
Recognising the goals or intentions of observed vehicles is a key step towards predicting the long-term future behaviour of other agents in an autonomous driving scenario. When there are unseen obstacles or occluded vehicles in a scenario, goal recognition may be confounded by the effects of these unseen entities on the behaviour of observed vehicles. Existing prediction algorithms that assume rational behaviour with respect to inferred goals may fail to make accurate long-horizon predictions because they ignore the possibility that the behaviour is influenced by such unseen entities. We introduce the Goal and Occluded Factor Inference (GOFI) algorithm which bases inference on inverse-planning to jointly infer a probabilistic belief over goals and potential occluded factors. We then show how these beliefs can be integrated into Monte Carlo Tree Search (MCTS). We demonstrate that jointly inferring goals and occluded factors leads to more accurate beliefs with respect to the true world state and allows an agent to safely navigate several scenarios where other baselines take unsafe actions leading to collisions.
@inproceedings{hanna2021interpretable,
title={Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles},
author={Josiah P. Hanna and Arrasy Rahman and Elliot Fosong and Francisco Eiras and Mihai Dobre and John Redford and Subramanian Ramamoorthy and Stefano V. Albrecht},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021}
}
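Inverse-planning goal recognition is often formulated as Bayesian inference with a Boltzmann-rational likelihood: a hypothesis becomes more probable when the observed behaviour costs little more than the optimal plan under that hypothesis. The sketch below applies this to joint (goal, occluded factor) hypotheses, in the spirit of the joint inference described in the abstract above. The cost values, the rationality parameter `beta`, and the function names are hypothetical, and this is not presented as GOFI's exact likelihood.

```python
import math

def joint_posterior(prior, optimal_cost, observed_cost, beta=1.0):
    """Posterior over (goal, occluded_factor) hypotheses.

    prior:         {hypothesis: P(hypothesis)}
    optimal_cost:  {hypothesis: cost of the best plan to the goal under that hypothesis}
    observed_cost: {hypothesis: cost of the observed trajectory plus its best completion}
    """
    unnormalised = {
        h: prior[h] * math.exp(-beta * (observed_cost[h] - optimal_cost[h]))
        for h in prior
    }
    z = sum(unnormalised.values())
    return {h: w / z for h, w in unnormalised.items()}

def goal_marginal(posterior):
    """Sum out the occluded factor to get a belief over goals alone."""
    marginal = {}
    for (goal, _factor), p in posterior.items():
        marginal[goal] = marginal.get(goal, 0.0) + p
    return marginal

if __name__ == "__main__":
    hyps = [("exit_A", "none"), ("exit_A", "occluded_vehicle"), ("exit_B", "none")]
    prior = {h: 1 / 3 for h in hyps}
    # Hypothetical costs: an observed slowdown is near-optimal only if a vehicle is occluded.
    optimal = {hyps[0]: 10.0, hyps[1]: 12.0, hyps[2]: 11.0}
    observed = {hyps[0]: 13.0, hyps[1]: 12.5, hyps[2]: 14.0}
    posterior = joint_posterior(prior, optimal, observed)
    for h, p in posterior.items():
        print(h, round(p, 3))
    print(goal_marginal(posterior))
```

Keeping the joint belief, rather than only the goal marginal, is what lets a downstream planner in this sketch treat "an occluded vehicle is present" as a possibility to plan around, instead of explaining the slowdown away as irrational behaviour.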
Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, Stefano V. Albrecht
Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction
International Conference on Practical Applications of Agents and Multi-Agent Systems, 2021
PAAMS, security, agent-modelling
Abstract:
Current methods for authentication and key agreement based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach based on artificial intelligence research in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. Authentication and key agreement are decided based on the agents' observed behaviors during the interaction. The security of this approach rests upon the difficulty of modeling the decisions of interacting agents from limited observations, a problem which we conjecture is also hard for quantum computing. We release PyAMI, a prototype authentication and key agreement system based on the proposed method. We empirically validate our method for authenticating legitimate users while detecting different types of adversarial attacks. Finally, we show how reinforcement learning techniques can be used to train server models which effectively probe a client's decisions to achieve more sample-efficient authentication.
@inproceedings{ahmed2021quantum,
title={Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction},
author={Ibrahim H. Ahmed and Josiah P. Hanna and Elliot Fosong and Stefano V. Albrecht},
booktitle={International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS)},
year={2021}
}
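The authentication idea in the abstract above can be illustrated with a toy decision model: the server and the legitimate client share a private model (here, a hypothetical table of action probabilities per query), the server sends queries, and it accepts the client if the observed responses are much more likely under the shared model than under a uniform-random responder. This is a hypothetical sketch of likelihood-based classification only; it is not the PyAMI interface, it omits key agreement, and it makes no security claim.

```python
import math
import random

def respond(model, query, rng):
    """A client samples an action from its private decision model for this query."""
    probs = model[query]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def log_likelihood_ratio(model, transcript):
    """Sum over (query, action) of log P(action | query, model) - log P(action | uniform)."""
    total = 0.0
    for query, action in transcript:
        probs = model[query]
        p = probs[action]
        if p == 0.0:
            return float("-inf")  # the shared model could never produce this response
        total += math.log(p) - math.log(1.0 / len(probs))
    return total

def authenticate(model, transcript, threshold=0.0):
    """Accept the client if its responses favour the shared model over random guessing."""
    return log_likelihood_ratio(model, transcript) > threshold

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical private model: 4 queries, 3 possible actions each.
    shared = {q: [0.8, 0.1, 0.1] if q % 2 == 0 else [0.1, 0.1, 0.8] for q in range(4)}
    queries = [rng.randrange(4) for _ in range(30)]
    legit = [(q, respond(shared, q, rng)) for q in queries]
    attacker = [(q, rng.randrange(3)) for q in queries]  # guesses without the model
    print("legitimate accepted:", authenticate(shared, legit))
    print("attacker accepted:", authenticate(shared, attacker))
```

The threshold trades off false rejections against false acceptances; the abstract's point about sample efficiency corresponds, in this toy, to choosing queries whose response distributions separate the shared model from impostors as quickly as possible.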
2020
Ibrahim H. Ahmed, Josiah P. Hanna, Stefano V. Albrecht
Quantum-Secure Authentication via Abstract Multi-Agent Interaction
arXiv:2007.09327, 2020
security, agent-modelling
Abstract:
Current methods for authentication based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach to authentication in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. The security of this approach rests upon the difficulty of learning the model parameters of interacting agents, a problem which we conjecture is also hard for quantum computing. We develop methods which enable a server agent to classify a client agent as either legitimate or adversarial based on their past interactions. Moreover, we use reinforcement learning techniques to train server policies which effectively probe the client's decisions to achieve more sample-efficient authentication, while making modelling attacks as difficult as possible via entropy-maximization principles. We empirically validate our methods for authenticating legitimate users while detecting different types of adversarial attacks.
@misc{ahmed2020quantumsecure,
title={Quantum-Secure Authentication via Abstract Multi-Agent Interaction},
author={Ibrahim H. Ahmed and Josiah P. Hanna and Stefano V. Albrecht},
year={2020},
eprint={2007.09327},
archivePrefix={arXiv},
primaryClass={cs.CR}
}