Coal mine monitoring image semantic segmentation based on domain adaptation
YANG Xiao, CHEN Wei, REN Peng, YANG Wenjia, BI Fangming
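School of Computer Science and Technology, China University of Mining and Technology; Engineering Research Center of Mine Digitalization, Ministry of Education, China University of Mining and Technology; Key Laboratory of Wireless Sensor Network and Communication, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences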
Parsing monitoring images of complex coal mine scenes is an important safeguard for safe and efficient coal mine production. Semantic segmentation, a key technique in intelligent image analysis, assigns a category label to each pixel in an image. High-performance semantic segmentation models such as fully convolutional networks, the DeepLab series, and DFN depend on large numbers of pixel-level labels. Coal mine monitoring images, however, lack annotations, and targets of different categories with similar appearance are easily confused. To address these problems, a Dual Alignment Networks model is proposed. The model reduces the domain gap at both the feature level and the pixel level, so that a semantic segmentation model trained on a synthetic dataset can be transferred to real coal mine scenes to segment monitoring images. In the feature space, a feature-level domain adaptation network learns domain-invariant features, reducing the distribution difference between the feature representations of the two domains and achieving feature-level alignment. In the pixel space, a pixel-level domain adaptation network translates source-domain images into the style of target-domain images, reducing the domain shift caused by factors such as texture and illumination and achieving pixel-level alignment. The segmentation network is trained on source-domain images rendered in the coal mine style, so that it learns the illumination and texture characteristics of coal mine monitoring images and better distinguishes targets of different categories. A spatial attention module and a channel attention module are added to the discriminator to improve its discriminative ability: the channel attention module assigns a different weight to the features of each channel, and the spatial attention module uses a non-local operation to capture relationships between different positions. To verify its effectiveness, Dual Alignment Networks is compared with AdaptSegNet, DCAN, and CLAN on two typical domain adaptation tasks, GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes. The experimental results show that Dual Alignment Networks raises the mean Intersection over Union (mIoU) to 43.7% and 45.80% on these tasks. For the complex coal mine environment, the SYNTHIA-to-Coal Mine domain adaptation task is used. There, the mIoU of Dual Alignment Networks is 38.26%, which is 7.19%, 8.34%, and 5.56% higher than that of AdaptSegNet, DCAN, and CLAN, respectively. For coal mine monitoring images that lack annotations, Dual Alignment Networks reduces the domain gap between synthetic images and coal mine monitoring images and segments targets of different categories well.
coal mine image semantic segmentation;unsupervised domain adaptation;pixel level alignment;feature level alignment;attention mechanism
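The abstract reports results as mean Intersection over Union. As a reminder of what that metric measures, here is a minimal NumPy sketch of the standard definition: per-class IoU from a confusion matrix, averaged over classes. The function names, the `ignore_index` value of 255, and the choice to skip classes absent from both prediction and label are assumptions for illustration; the paper's own evaluation code is not shown in this excerpt.

```python
import numpy as np

def confusion_matrix(pred, label, num_classes, ignore_index=255):
    """Count pixels per (true class, predicted class) pair.

    pred, label: integer arrays of the same shape holding class ids.
    Pixels whose label equals ignore_index (assumed convention) are skipped.
    """
    mask = label != ignore_index
    # Encode each (label, pred) pair as one index, then histogram the indices.
    idx = num_classes * label[mask].astype(np.int64) + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(conf):
    """Mean IoU over classes that appear in the prediction or the label."""
    tp = np.diag(conf).astype(np.float64)
    # Union = predicted pixels + labelled pixels - intersection.
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())

# Toy 2x2 image with two classes.
label = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
conf = confusion_matrix(pred, label, num_classes=2)
miou = mean_iou(conf)  # class 0: 1/2, class 1: 2/3, mean: 7/12
```

In the toy case one class-0 pixel is mislabelled as class 1, so class 0 scores 1/2 and class 1 scores 2/3, giving a mean of 7/12.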
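1 Dual Alignment Networks model
1.1 Problem description
1.2 Dual Alignment Networks architecture
1.3 Feature-level domain adaptation
1.4 Pixel-level domain adaptation
1.5 Attention modules
2 Experimental results and analysis
2.1 Experimental datasets
2.2 Experimental environment
2.3 Results analysis
2.4 Ablation study
3 Conclusion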
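As an illustration of the two attention modules that the abstract describes for the discriminator (Section 1.5 in the outline), the following NumPy sketch shows one common way to realise each idea. It is not the authors' implementation: the squeeze-and-excitation style channel gating, the dot-product form of the non-local operation, the residual connection, and every name and shape below (`channel_attention`, `nonlocal_spatial_attention`, `w1`, `w2`, `w_q`, `w_k`, `w_v`) are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat, w1, w2):
    """Reweight each channel of feat (C, H, W) by a learned scalar in (0, 1).

    Assumed design: global average pooling, a two-layer bottleneck, sigmoid.
    """
    squeezed = feat.mean(axis=(1, 2))               # (C,) one summary per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)         # (C // r,) ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # (C,) sigmoid gates
    return feat * weights[:, None, None], weights

def nonlocal_spatial_attention(feat, w_q, w_k, w_v):
    """Non-local operation: every position attends to every other position."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                      # (C, N) with N = H * W
    q, k, v = w_q @ x, w_k @ x, w_v @ x             # (D, N), (D, N), (C, N)
    attn = softmax(q.T @ k, axis=-1)                # (N, N), row i weights all positions j
    out = v @ attn.T                                # (C, N): out[:, i] = sum_j attn[i, j] * v[:, j]
    return feat + out.reshape(c, h, w), attn        # residual connection (assumed)

# Random weights stand in for learned parameters.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 5))
ca_out, ch_weights = channel_attention(
    feat, 0.1 * rng.standard_normal((2, 8)), 0.1 * rng.standard_normal((8, 2)))
sa_out, attn = nonlocal_spatial_attention(
    feat,
    0.1 * rng.standard_normal((4, 8)),
    0.1 * rng.standard_normal((4, 8)),
    0.1 * rng.standard_normal((8, 8)))
```

Both functions return a feature map of the same shape as their input, so either could in principle be placed between discriminator layers; where the modules actually sit in the discriminator is not stated in this excerpt.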
Sponsored by: China Coal Research Institute Co., Ltd.; Academic Journal Working Committee of the China Coal Society