[1] Dosovitskiy A,Ros G,Codevilla F,et al.CARLA:An open urban driving simulator[C]∥Proc of the 1st Conference on Robot Learning,2017:1-16.
[2] Gleave A,Dennis M,Wild C,et al.Adversarial policies:Attacking deep reinforcement learning[J].arXiv:1905.10615,2019.
[3] Szegedy C,Zaremba W,Sutskever I,et al.Intriguing properties of neural networks[J].arXiv:1312.6199,2013.
[4] Kos J,Song D.Delving into adversarial attacks on deep policies[J].arXiv:1705.06452,2017.
[5] Brown N,Sandholm T.Superhuman AI for multiplayer poker[J].Science,2019,365(6456):885-890.
[6] Jain M,Korzhyk D,Vaněk O,et al.A double oracle algorithm for zero-sum security games on graphs[C]∥Proc of the 10th International Conference on Autonomous Agents and Multiagent Systems,2011:327-334.
[7] Heinrich J,Lanctot M,Silver D.Fictitious self-play in extensive-form games[C]∥Proc of the 32nd International Conference on Machine Learning,2015:805-813.
[8] Li S H,Wu Y,Cui X Y,et al.Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient[C]∥Proc of the 33rd AAAI Conference on Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence,2019:4213-4220.
[9] Lanctot M,Zambaldi V,Gruslys A,et al.A unified game-theoretic approach to multiagent reinforcement learning[C]∥Proc of the 31st International Conference on Neural Information Processing Systems,2017:4193-4206.
[10] Jaderberg M,Czarnecki W M,Dunning I,et al.Human-level performance in 3D multiplayer games with population-based reinforcement learning[J].Science,2019,364(6443):859-865.
[11] Fatima S S,Wooldridge M,Jennings N R.A linear approximation method for the Shapley value[J].Artificial Intelligence,2008,172(14):1673-1699.
[12] Hong Z W,Shann T Y,Su S Y,et al.Diversity-driven exploration strategy for deep reinforcement learning[C]∥Proc of the 32nd Conference on Neural Information Processing Systems,2018:10510-10521.
[13] Eysenbach B,Gupta A,Ibarz J,et al.Diversity is all you need:Learning skills without a reward function[J].arXiv:1802.06070,2018.
[14] Conti E,Madhavan V,Petroski Such F,et al.Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents[C]∥Proc of the 32nd International Conference on Neural Information Processing Systems,2018:5032-5043.
[15] Parker-Holder J,Pacchiano A,Choromanski K,et al.Effective diversity in population based reinforcement learning[C]∥Proc of the 34th International Conference on Neural Information Processing Systems,2020:18050-18062.
[16] Kumar S,Kumar A,Levine S,et al.One solution is not all you need:Few-shot extrapolation via structured MaxEnt RL[C]∥Proc of the 34th International Conference on Neural Information Processing Systems,2020:8198-8210.
[17] Gangwani T,Peng J,Zhou Y S.Harnessing distribution ratio estimators for learning agents with quality and diversity[C]∥Proc of the 4th Conference on Robot Learning,2020:2206-2215.
[18] Tang Z G,Yu C,Chen B Y,et al.Discovering diverse multi-agent strategic behavior via reward randomization[J].arXiv:2103.04564,2021.
[19] Li C H,Wang T H,Wu C T,et al.Celebrating diversity in shared multi-agent reinforcement learning[C]∥Proc of the 35th Conference on Neural Information Processing Systems,2021:3991-4002.
[20] McKee K R,Leibo J Z,Beattie C,et al.Quantifying the effects of environment and population diversity in multi-agent reinforcement learning[J].Autonomous Agents and Multi-Agent Systems,2022,36(1):Article 21.
[21] Masood M,Doshi-Velez F.Diversity-inducing policy gradient:Using maximum mean discrepancy to find a set of diverse policies[C]∥Proc of the 28th International Joint Conference on Artificial Intelligence,2019:5923-5929.
[22] Zhang Y B,Yu W H,Turk G.Learning novel policies for tasks[C]∥Proc of the 36th International Conference on Machine Learning,2019:7483-7492.
[23] Sun H,Peng Z H,Dai B,et al.Novel policy seeking with constrained optimization[J].arXiv:2005.10696,2020.
[24] Ghasemi M,Crafts E S,Zhao B,et al.Multiple plans are better than one:Diverse stochastic planning[C]∥Proc of the 31st International Conference on Automated Planning and Scheduling,2021:140-148.
[25] Zahavy T,O'Donoghue B,Barreto A,et al.Discovering diverse nearly optimal policies with successor features[J].arXiv:2106.00669,2021.
[26] Zhou Z H,Fu W,Zhang B L,et al.Continuously discovering novel strategies via reward-switching policy optimization[J].arXiv:2204.02246,2022.
[27] Cully A,Clune J,Tarapore D,et al.Robots that can adapt like animals[J].Nature,2015,521(7553):503-507.
[28] Bodnar C,Day B,Lió P.Proximal distilled evolutionary reinforcement learning[C]∥Proc of the 34th AAAI Conference on Artificial Intelligence,2020:3283-3290.
[29] Pugh J K,Soros L B,Stanley K O.Quality diversity:A new frontier for evolutionary computation[J].Frontiers in Robotics and AI,2016,3:Article 40.
[30] Colas C,Madhavan V,Huizinga J,et al.Scaling MAP-Elites to deep neuroevolution[C]∥Proc of the 2020 Genetic and Evolutionary Computation Conference,2020:67-75.
[31] Fontaine M C,Nikolaidis S.Differentiable quality diversity[C]∥Proc of the 35th Conference on Neural Information Processing Systems,2021:10040-10052.
[32] Nilsson O,Cully A.Policy gradient assisted MAP-Elites[C]∥Proc of the Genetic and Evolutionary Computation Conference,2021:866-875.
[33] Pierrot T,Macé V,Chalumeau F,et al.Diversity policy gradient for sample efficient quality-diversity optimization[C]∥Proc of the Genetic and Evolutionary Computation Conference,2022:1075-1083.
[34] Fu H B,Liu W M,Wu S,et al.Actor-critic policy optimization in a large-scale imperfect-information game[C]∥Proc of the 10th International Conference on Learning Representations,2022:1.
[35] Tjanaka B,Fontaine M C,Togelius J,et al.Approximating gradients for differentiable quality diversity in reinforcement learning[C]∥Proc of the Genetic and Evolutionary Computation Conference,2022:1102-1111.
[36] Phan T,Gabor T,Sedlmeier A,et al.Learning and testing resilience in cooperative multi-agent systems[C]∥Proc of the 19th International Conference on Autonomous Agents and Multiagent Systems,2020:1055-1063.
[37] Phan T,Belzner L,Gabor T,et al.Resilient multi-agent reinforcement learning with adversarial value decomposition[C]∥Proc of the 35th AAAI Conference on Artificial Intelligence,2021:11308-11316.
[38] Khadka S,Tumer K.Evolution-guided policy gradient in reinforcement learning[C]∥Proc of the 32nd Conference on Neural Information Processing Systems,2018:1196-1208.
[39] Mouret J B,Clune J.Illuminating search spaces by mapping elites[J].arXiv:1504.04909,2015.
[40] Schulman J,Wolski F,Dhariwal P,et al.Proximal policy optimization algorithms[J].arXiv:1707.06347,2017.
[41] Huang S Y,Ontañón S,Bamford C,et al.Gym-μRTS:Toward affordable full game real-time strategy games research with deep reinforcement learning[C]∥Proc of the 2021 IEEE Conference on Games,2021:1-8.