Detection and recognition of unsafe behaviors of underground coal miners based on deep learning
GUO Xiaoyuan; ZHU Meiqiang; TIAN Jun; ZHU Beibei
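CCTEG Changzhou Research Institute Co., Ltd.; Tiandi (Changzhou) Automation Co., Ltd.; School of Information and Control Engineering, China University of Mining and Technology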
To address multi-scale variation of underground targets, occlusion of moving targets, and targets that closely resemble their surroundings, a deep learning-based method was proposed for detecting and recognizing unsafe behaviors of underground coal miners. A top-down strategy was adopted, and a self-attention-based object detection model, YOLOv5s_swin, was constructed: a sliding window operation was introduced into the self-attention-based Transformer model to obtain Swin-Transformer, which was then used to improve the conventional YOLOv5s model, yielding YOLOv5s_swin. To handle the multi-scale variation of human detection boxes caused by the varying distance between underground personnel and surveillance cameras, a high-resolution feature extraction network was used to extract human body keypoints once personnel had been detected, and a spatiotemporal graph convolutional network (ST-GCN) was then used for behavior recognition. Experimental results showed that YOLOv5s_swin achieved a precision of 98.9%, 1.5% higher than YOLOv5s, at an inference speed of 102 frames per second (fps), which meets real-time detection requirements. The high-resolution feature extraction network accurately extracted keypoints of human targets at different scales, and HRNet_w48, which has more feature channels, outperformed HRNet_w32. Under complex industrial and mining conditions, the ST-GCN model achieved high accuracy and recall, classified miners' behaviors accurately, and ran at 31 fps, which meets underground monitoring requirements.
underground unsafe behavior recognition; object detection; deep learning; self-attention mechanism; YOLOv5s; high-resolution feature extraction network; spatiotemporal graph convolutional network
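The window operation that distinguishes Swin-Transformer from a plain Transformer can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`window_partition`, `window_self_attention`, `shifted_windows`), the window size, and the feature-map shape are hypothetical, attention uses the raw features as queries, keys, and values with no learned projections, and the attention mask that Swin-Transformer applies after the cyclic shift is omitted.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws * ws, C).
    """
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be multiples of ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_self_attention(windows):
    """Scaled dot-product self-attention computed independently per window.

    Assumption: Q = K = V = the input features (no learned weights).
    """
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ windows

def shifted_windows(x, ws):
    """Cyclically shift the map by half a window, then partition it.

    The shift lets tokens near old window borders attend to each other.
    """
    shifted = np.roll(x, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1))
    return window_partition(shifted, ws)

# Hypothetical 8 x 8 feature map with 16 channels and a 4 x 4 window.
feature_map = np.random.default_rng(0).normal(size=(8, 8, 16))
regular = window_self_attention(window_partition(feature_map, 4))
shifted = window_self_attention(shifted_windows(feature_map, 4))
print(regular.shape, shifted.shape)  # (4, 16, 16) (4, 16, 16)
```

Restricting attention to fixed-size windows keeps the cost linear in the number of pixels rather than quadratic, which is one plausible reason such a backbone can still run at real-time frame rates.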
Sponsors: China Coal Research Institute Co., Ltd.; Academic Journal Working Committee of the China Coal Society