• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (10): 1757-1764.

• Computer Network and Information Security •

Resource allocation algorithm for distinguished services in vehicular networks based on multi-agent deep reinforcement learning

CAI Yu (蔡玉), GUAN Zheng (官铮), WANG Zeng-wen (王增文), WANG Xue (王学), YANG Zhi-jun (杨志军)

  1. (School of Information Science & Engineering, Yunnan University, Kunming 650500, China)

  • Received: 2024-01-03 Revised: 2024-03-06 Accepted: 2024-10-25 Online: 2024-10-25 Published: 2024-10-29
  • Supported by:
    Yunnan Provincial Applied Basic Research Program (202201AT070167); Yunnan Provincial Expert Workstation Project (202305AF150045); Scientific Research Fund of the Yunnan Provincial Department of Education (2023Y0246)

Abstract: The Internet of Vehicles (IoV) generates a massive number of network connections and highly differentiated data. A single agent struggles to collect channel state information and to perform service-differentiated resource allocation and link scheduling in dynamic scenarios. To address this, a service-differentiated resource allocation algorithm for IoV based on multi-agent deep reinforcement learning is proposed. The algorithm aims to maximize the packet delivery success rate of V2V links and the total capacity of V2I links, subject to minimizing the interference experienced by emergency service links. It uses deep reinforcement learning to optimize the spectrum allocation and power selection policy in a single-antenna vehicular network where multiple cellular users and device-to-device users coexist. Each agent is trained with a deep Q-network (DQN); the agents interact with the communication environment jointly and cooperate through a global reward function. Simulation results show that, in high-load scenarios and compared with a traditional random allocation algorithm, the proposed algorithm increases the total V2I throughput by 3.76 Mbps and improves the V2V packet delivery rate by 17.1%, while the interference experienced by emergency service links is 1.42 dB lower than that of ordinary links. This guarantees priority for emergency service links and effectively increases the total transmission capacity of the V2I and V2V links.

Key words: Internet of Vehicles, spectrum allocation, reinforcement learning, multi-agent, emergency services
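
The abstract describes each V2V agent as choosing a spectrum sub-band and a transmit power with a DQN, with cooperation driven by one global reward shared by all agents. The Python sketch below illustrates that structure only; it is not the authors' implementation. The number of sub-bands, the power levels, the reward weights, and all function names are hypothetical assumptions, since the abstract does not specify them, and the Q-values are passed in as plain lists rather than produced by a trained network.

```python
import random

# Hypothetical settings; the abstract gives no concrete values.
N_SUBBANDS = 4
POWER_LEVELS_DBM = [23, 15, 5, -100]  # -100 dBm stands in for "no transmission"


def decode_action(action):
    """Map a flat action index to a (sub-band, transmit power in dBm) pair."""
    subband, power_idx = divmod(action, len(POWER_LEVELS_DBM))
    return subband, POWER_LEVELS_DBM[power_idx]


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over one agent's Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def global_reward(v2i_capacities_mbps, v2v_delivered, emergency_interference_mw,
                  w_v2i=0.1, w_v2v=1.0, w_emg=0.5):
    """One scalar reward shared by every agent (assumed weighted-sum form).

    Rewards V2I sum capacity and the fraction of V2V payloads delivered,
    and penalizes the interference received on emergency service links.
    """
    v2i_term = w_v2i * sum(v2i_capacities_mbps)
    v2v_term = w_v2v * sum(v2v_delivered) / len(v2v_delivered)
    emg_term = w_emg * sum(emergency_interference_mw)
    return v2i_term + v2v_term - emg_term


if __name__ == "__main__":
    n_actions = N_SUBBANDS * len(POWER_LEVELS_DBM)
    # Placeholder Q-values for two agents; a trained DQN would supply these.
    q_tables = [[random.random() for _ in range(n_actions)] for _ in range(2)]
    actions = [select_action(q, epsilon=0.1) for q in q_tables]
    print([decode_action(a) for a in actions])
    print(global_reward([10.0, 20.0], [True, False], [0.2]))
```

Because every agent receives the same scalar from `global_reward`, an agent that raises its own V2V delivery at the cost of heavy interference on an emergency link lowers the reward for all agents, which is one simple way to encourage the cooperative behavior the abstract describes.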