Administrator: Shanxi Provincial Education Department
Sponsor: Taiyuan University of Technology
Publisher: Ed. Office of Journal of TYUT
Editor-in-Chief: SUN Hongbin
ISSN: 1007-9432
CN: 14-1220/N

A visual joint perception system performs multiple tasks in autonomous driving traffic scenes, such as traffic object detection, drivable area segmentation, and lane detection, and is essential to autonomous driving. In practical applications, accuracy and speed must be appropriately balanced. The autonomous driving visual joint perception network YOLOP achieves strong real-time performance; however, it ignores the conflicts between features of different scales in the feature pyramid network and the texture details lost during downsampling.
To mitigate these problems, a spatial semantic fusion network for autonomous driving visual joint perception (SSFJP) was proposed. It modifies the original semantic fusion network of YOLOP in two aspects, focusing on spatial semantic embedding and fusion. For feature enhancement, the bidirectional attention information strength module (BAISM) models a global contextual prior and the corresponding precise positional information along the horizontal and vertical dimensions, embedding channel-attention semantic information into the spatial details; this effectively highlights critical visual areas and improves the representation ability of
features' texture details. For feature fusion, the multi-branch cascade feature fusion (MCFF) module uses atrous convolutions with different rates and exponentially weighted pooling to fuse scene features of different scales, cascading the fusion of spatial-context semantic information; this relieves the mutual interference between features at corresponding spatial positions of different levels and makes texture details and high-level semantics complementary. Adaptive parameters were introduced as the weighting coefficients of the loss function to address the imbalanced training of the sub-tasks, effectively improving detection and segmentation performance. Experiments on the BDD100K dataset showed that the proposed multi-task joint perception model SSFJP maintains real-time detection while increasing the average accuracy of lane line detection and object detection by 8.9% and 1.6%, respectively, compared with YOLOP.