Title

Rich semantic extractor network for real-time semantic segmentation
Author

ZHAO Shan; TIAN Kaiwen; SUN Junding
Organization

School of Software, Henan Polytechnic University
School of Computer Science and Technology, Henan Polytechnic University
Abstract

Objectives Real-time semantic segmentation networks must stay shallow to meet inference-speed constraints, so the semantic feature information they extract is insufficient. The shallow depth also restricts the capability of the feature extraction network, reducing its robustness and adaptability.
Methods To address these problems, a rich semantic extractor network (RSENet) for real-time semantic segmentation was proposed. Firstly, to tackle insufficient semantic feature extraction, a rich semantic extractor (RSE) was introduced, consisting of a multi-scale global semantic extraction module (MGSEM) and a semantic fusion module (SFM). Secondly, MGSEM was used to extract rich multi-scale global semantics and enlarge the effective receptive field of the network, while SFM efficiently fused multi-scale local semantics with multi-scale global semantics, giving the network more comprehensive and richer semantic information. Finally, based on the characteristics of the detail branch and the semantic branch, a space reconstruction aggregation module (SRAM) was designed to model the contextual information of the detail features and enhance the feature representation, so that the two branches could be aggregated efficiently.
Results Comprehensive experiments were conducted on the Cityscapes and ADE20K datasets. The proposed RSENet achieved an mIoU of 75.6% at 76 frames/s on Cityscapes and an mIoU of 35.7% at 67 frames/s on ADE20K.
Conclusions The experimental results suggested that the proposed network was able to mine and accurately capture the semantic information in images of complex scenes. It also showed a strong balance between accuracy and speed, achieving high-precision semantic segmentation with fast inference. This efficiency made the network highly practical for real-world application scenarios.
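The accuracy figures above are reported as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix. It is not the authors' evaluation code; the function names `confusion_matrix` and `mean_iou` and the toy labels are hypothetical, and the convention of skipping classes that never appear is an assumption that may differ from the benchmark's official protocol.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c pixels predicted as other classes
        fp = sum(cm[r][c] for r in range(n)) - tp  # other-class pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # assumption: ignore classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with flattened per-pixel labels for two classes (hypothetical data).
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
score = mean_iou(cm)  # class 0 IoU = 1/2, class 1 IoU = 2/3
```

In practice the confusion matrix is accumulated over every pixel of the whole validation set before the per-class IoU values are averaged, rather than averaged image by image.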
KeyWords

semantic segmentation; multi-scale feature; vision Transformer; feature fusion
Foundation

National Natural Science Foundation of China (62276092)
DOI

10.16186/j.cnki.1673-9787.2023030005
Citation

ZHAO S, TIAN K W, SUN J D. Rich semantic extractor network for real-time semantic segmentation[J]. Journal of Henan Polytechnic University (Natural Science), 2024, 43(6): 146-155.
