Publications
2022
Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna
Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
Conference on Neural Information Processing Systems, 2022
Tags: NeurIPS, deep-rl
Abstract:
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle distinction between on-policy data and on-policy sampling in the context of the RL sub-problem of policy evaluation. We observe that on-policy sampling may fail to match the expected distribution of on-policy data after observing only a finite number of trajectories and this failure hinders data-efficient policy evaluation. Towards improved data-efficiency, we show how non-i.i.d., off-policy sampling can produce data that more closely matches the expected on-policy data distribution and consequently increases the accuracy of the Monte Carlo estimator for policy evaluation. We introduce a method called Robust On-Policy Sampling and demonstrate theoretically and empirically that it produces data that converges faster to the expected on-policy distribution compared to on-policy sampling. Empirically, we show that this faster convergence leads to lower mean squared error policy value estimates.
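The sampling-error effect described in the abstract can be illustrated with a toy example. The sketch below is not the Robust On-Policy Sampling algorithm from the paper; it is a minimal one-step (bandit) illustration under assumed values. The policy `pi`, the rewards, the sample size `n`, and the helper `undersampled_first` are all hypothetical. It compares i.i.d. sampling from the target policy with a simple non-i.i.d. rule that always takes the action whose empirical count lags furthest behind its expected count.

```python
import random

def mc_estimate(actions, rewards):
    """Monte Carlo value estimate: mean reward over the sampled actions."""
    return sum(rewards[a] for a in actions) / len(actions)

def iid_on_policy(pi, n, rng):
    """Draw n actions i.i.d. from the target policy pi (on-policy sampling)."""
    return rng.choices(range(len(pi)), weights=pi, k=n)

def undersampled_first(pi, n):
    """Hypothetical non-i.i.d. sampler (an assumption, not the paper's method):
    at step t, take the action with the largest gap between its expected
    count pi[i] * t and its empirical count so far."""
    counts = [0] * len(pi)
    actions = []
    for t in range(1, n + 1):
        a = max(range(len(pi)), key=lambda i: pi[i] * t - counts[i])
        counts[a] += 1
        actions.append(a)
    return actions

def mse(estimates, truth):
    """Mean squared error of a list of estimates against the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

# Assumed toy problem: three actions with deterministic rewards.
pi = [0.7, 0.2, 0.1]
rewards = [1.0, 0.0, 5.0]
true_value = sum(p * r for p, r in zip(pi, rewards))

rng = random.Random(0)
n = 20

# Repeat i.i.d. on-policy sampling many times to estimate its MSE.
iid_estimates = [mc_estimate(iid_on_policy(pi, n, rng), rewards)
                 for _ in range(2000)]
iid_mse = mse(iid_estimates, true_value)

# The non-i.i.d. sampler is deterministic, so a single run suffices.
balanced_actions = undersampled_first(pi, n)
balanced_se = (mc_estimate(balanced_actions, rewards) - true_value) ** 2

print("i.i.d. on-policy MSE:", iid_mse)
print("non-i.i.d. squared error:", balanced_se)
```

In this toy setting, with `n = 20` the non-i.i.d. sampler selects the three actions 14, 4, and 2 times, which matches the expected counts under `pi`, so its estimate has essentially no sampling error. The i.i.d. sampler only matches those proportions on average. The paper addresses the sequential RL setting, which this sketch does not cover.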
@inproceedings{zhong2022datacollection,
title={Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning},
author={Rujie Zhong and Duohan Zhang and Lukas Sch{\"a}fer and Stefano V. Albrecht and Josiah P. Hanna},
booktitle={Conference on Neural Information Processing Systems},
year={2022}
}