Spatial Semantic Fusion Network for Autonomous Driving Visual Joint Perception Algorithm

  • Authors

    WANG Yue; CAO Jiale; SUN Xuebin; WANG Jian; PANG Yanwei

  • Organization

    School of Electrical and Information Engineering, Tianjin University
  • Abstract

    【Purposes】The visual joint perception system can fulfill multiple tasks such as traffic object detection, drivable area segmentation, and lane detection in autonomous driving scenes, and is therefore essential to autonomous driving; in practical applications, accuracy and speed must be appropriately balanced. The autonomous driving visual joint perception network YOLOP achieves strong real-time performance, but it suffers from feature conflicts between different scales of the feature pyramid network and from the loss of texture details during downsampling. To relieve these problems, a spatial semantic fusion network for autonomous driving visual joint perception (SSFJP) is proposed. Centering on spatial semantic embedding and fusion, it modifies the original semantic fusion network of YOLOP in two respects: feature enhancement and feature fusion. 【Methods】For feature enhancement, a bidirectional attention information strength module (BAISM) reduces the spatial information lost when generating multi-scale feature maps: it models the global contextual prior and the corresponding precise positional information along the two orthogonal horizontal and vertical dimensions, embedding channel-attention semantics into spatial details, which effectively highlights critical visual areas and improves the representation of texture details. For feature fusion, a multi-branch cascade feature fusion (MCFF) module relieves the mutual interference between spatially corresponding features at different levels: it enlarges the receptive field with atrous convolutions of different dilation rates and exponentially weighted pooling, cascades the fusion of spatial contextual semantics, and applies dynamic convolution to adaptively aggregate multi-scale scene features, making texture details and high-level semantics complementary. In addition, adaptive parameters are introduced into the weighting coefficients of the loss function to address the imbalanced training of the sub-tasks, effectively improving detection and segmentation performance. 【Findings】Experiments on the BDD100K dataset show that, compared with YOLOP, the proposed SSFJP model preserves real-time inference while improving the average accuracy of lane detection and object detection by 8.9% and 1.6%, respectively.
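The record carries no implementation detail for the BAISM module, so the following is a minimal PyTorch sketch of a coordinate-attention-style block matching the description: global context is pooled along the horizontal and vertical axes, encoded jointly, and used to re-weight every spatial position with channel attention. The class name, reduction ratio, and layer layout are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn


class BidirectionalAttention(nn.Module):
    """Sketch of a bidirectional (coordinate-style) attention block:
    pools global context along H and W separately, encodes the two
    directional vectors jointly, then re-weights each position."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # -> (B, C, 1, W)
        self.encode = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Global context along each orthogonal spatial dimension.
        ctx_h = self.pool_h(x)                      # (B, C, H, 1)
        ctx_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        ctx = self.encode(torch.cat([ctx_h, ctx_w], dim=2))
        ctx_h, ctx_w = torch.split(ctx, [h, w], dim=2)
        # Direction-wise attention maps, broadcast back over H x W.
        a_h = torch.sigmoid(self.attn_h(ctx_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(ctx_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w
```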
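In the same spirit, a sketch of a multi-branch context-fusion block like the described MCFF: parallel dilated 3x3 convolutions enlarge the receptive field at several rates, and a global pooling branch adds image-level context before a 1x1 fusion. Reading "exponentially weighted pooling" as spatial softmax pooling is an assumption, and the paper's dynamic-convolution aggregation is approximated here by a plain 1x1 convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiBranchAtrousFusion(nn.Module):
    """Sketch of a multi-branch cascade-style fusion block: dilated
    convolutions at several rates plus an exponentially weighted
    (softmax) pooling branch, fused by a 1x1 convolution."""

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.pool_proj = nn.Conv2d(in_ch, out_ch, 1)
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    @staticmethod
    def exp_weighted_pool(x: torch.Tensor) -> torch.Tensor:
        # Softmax over spatial positions: strong activations contribute
        # exponentially more to the pooled descriptor than weak ones.
        flat = x.flatten(2)                               # (B, C, H*W)
        weights = F.softmax(flat, dim=-1)
        return (flat * weights).sum(-1)[..., None, None]  # (B, C, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        outs = [F.relu(branch(x)) for branch in self.branches]
        pooled = self.pool_proj(self.exp_weighted_pool(x))
        outs.append(pooled.expand(-1, -1, h, w))          # broadcast context
        return self.fuse(torch.cat(outs, dim=1))
```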
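The adaptive loss weighting is likewise described only at a high level. One standard realization of learnable per-task weights is homoscedastic uncertainty weighting (Kendall et al.), sketched below as an assumption about the mechanism rather than the paper's exact scheme; in a YOLOP-style trainer the three losses would be detection, drivable-area segmentation, and lane detection.

```python
import torch
import torch.nn as nn


class AdaptiveTaskWeighting(nn.Module):
    """Learnable per-task loss weights (uncertainty-style): each task
    loss is scaled by a learned precision exp(-log_var), and the
    +log_var term keeps the weights from collapsing to zero."""

    def __init__(self, num_tasks: int = 3):
        super().__init__()
        # One log-variance per sub-task, learned jointly with the network.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses: "list[torch.Tensor]") -> torch.Tensor:
        total = torch.zeros((), device=self.log_vars.device)
        for loss, log_var in zip(losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total
```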
  • Keywords

    autonomous driving; visual joint perception; semantic fusion; bidirectional attention information strength; multi-task; multi-scale

  • Foundation

    Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0160400); National Natural Science Foundation of China (C0049120)
  • DOI
  • Citation

    WANG Yue, CAO Jiale, SUN Xuebin, et al. Spatial semantic fusion network for autonomous driving visual joint perception algorithm[J]. Journal of Taiyuan University of Technology, 2025, 56(2): 338-347.