Sponsors: China Coal Research Institute Co., Ltd.; Academic Journal Working Committee of the China Coal Society
Intelligent recognition of named entities of coal mine safety hidden danger based on ERNIE-BiGRU-CRF model

  • Author

    LIU Feixiang; LI Zequan; ZHAO Jialiang; LI Jing

  • Organization
    School of Mine Safety, North China Institute of Science and Technology
    School of Economics and Management, North China Institute of Science and Technology
    School of Energy and Mining, China University of Mining and Technology-Beijing
  • Abstract

    To fully mine the key knowledge contained in coal mine safety hidden-danger texts and to help safety managers in coal mining enterprises carry out hidden-danger investigation and remediation more effectively, a named entity recognition (NER) method based on a pre-trained language model is proposed. First, the entity categories for coal mine safety hidden dangers were defined, and 7 entity categories with 15 entity labels were constructed using the BIO tagging scheme. Next, the collected hidden-danger inspection data were preprocessed, and the relevant entities were manually annotated by experts in coal mine safety, yielding a standard named entity dataset of 1,500 coal mine safety hidden-danger records. Finally, the ERNIE pre-trained model was used to represent the hidden-danger text as word vectors, a BiGRU layer extracted contextual semantic features, and a CRF layer decoded the entity labels, completing the named entity recognition of coal mine safety hidden dangers. The experimental results show that the precision, recall, and F1 score of the ERNIE-BiGRU-CRF model on the sequence labeling task are 56.69%, 69.23%, and 62.34%, respectively, which are 6.85%, 13.74%, and 9.83% higher than those of the BiLSTM-CRF baseline model, and that the extracted entities differ little from the manual annotations. In addition, ablation experiments verified that the BiGRU layer better captures the contextual semantic dependencies of coal mine safety hidden-danger text and that the CRF layer further optimizes the label sequence.
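    The label count in the abstract follows from the BIO scheme: each of the 7 categories gets a B- (begin) and an I- (inside) tag, and all categories share one O (outside) tag, so 7 × 2 + 1 = 15. The sketch below assumes hypothetical placeholder category names ENT1 to ENT7, since the actual category names are not listed here, and converts span annotations of the form (start, exclusive end, category) into one BIO tag per token (per character for Chinese text).

```python
# Hypothetical placeholder names: the abstract says there are 7 entity
# categories but does not name them here.
CATEGORIES = ["ENT1", "ENT2", "ENT3", "ENT4", "ENT5", "ENT6", "ENT7"]


def build_bio_labels(categories):
    """Return the BIO label inventory: one shared 'O' plus B-/I- tags per category."""
    labels = ["O"]
    for cat in categories:
        labels += [f"B-{cat}", f"I-{cat}"]
    return labels


def spans_to_bio(n_tokens, spans):
    """Convert (start, end_exclusive, category) spans into one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, cat in spans:
        tags[start] = f"B-{cat}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{cat}"          # remaining tokens of the entity
    return tags


LABELS = build_bio_labels(CATEGORIES)
print(len(LABELS))  # 15 = 1 (O) + 7 categories x 2 (B, I)
print(spans_to_bio(6, [(0, 2, "ENT1"), (3, 6, "ENT4")]))
# ['B-ENT1', 'I-ENT1', 'O', 'B-ENT4', 'I-ENT4', 'I-ENT4']
```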
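    The BiGRU layer reads the ERNIE token vectors left to right and right to left and concatenates the two hidden states at each position, so every position's feature vector reflects both its left and right context. The following is a minimal pure-Python sketch using the standard GRU gate equations with a single bias per gate; the hidden size, the random weights, and the 8-dimensional stand-in vectors used in place of real ERNIE outputs are illustrative assumptions, not the paper's configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """One GRU cell with update gate z, reset gate r and candidate state n."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # Input weights W_g, recurrent weights U_g and bias b_g for each gate g.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_size for g in "zrn"}

    def step(self, x, h_prev):
        def pre(g):  # W_g x + U_g h_prev + b_g
            return [a + u + c for a, u, c in zip(matvec(self.W[g], x), matvec(self.U[g], h_prev), self.b[g])]

        z = [sigmoid(v) for v in pre("z")]
        r = [sigmoid(v) for v in pre("r")]
        # Candidate state: tanh(W_n x + r * (U_n h_prev) + b_n)
        wx, uh = matvec(self.W["n"], x), matvec(self.U["n"], h_prev)
        n = [math.tanh(a + ri * u + c) for a, ri, u, c in zip(wx, r, uh, self.b["n"])]
        # New state interpolates between candidate and previous state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h_prev)]


def run(cell, xs):
    """Run a cell over a sequence from a zero initial state; return every state."""
    h, states = [0.0] * cell.hidden_size, []
    for x in xs:
        h = cell.step(x, h)
        states.append(h)
    return states


def bigru(xs, fwd, bwd):
    """Concatenate forward states with backward states re-aligned to input order."""
    forward = run(fwd, xs)
    backward = run(bwd, xs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
xs = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]  # 5 tokens, stand-in vectors
fwd, bwd = GRUCell(8, 4, rng), GRUCell(8, 4, rng)
feats = bigru(xs, fwd, bwd)
print(len(feats), len(feats[0]))  # 5 8
```

    Each output vector has twice the hidden size because the forward and backward states are concatenated; in the full model these vectors would be projected to per-label emission scores for the CRF layer.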
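    A CRF layer scores a whole label sequence as the sum of per-position emission scores and label-to-label transition scores, and decoding selects the highest-scoring sequence with the Viterbi algorithm. In this sketch the transition scores are hand-set (a large penalty on O followed by I-X and on starting with I-X) purely to show how transitions rule out invalid BIO sequences; a trained CRF learns these scores, and training (which also needs the forward algorithm) is not shown.

```python
def viterbi_decode(emissions, transitions, labels, start=None):
    """
    emissions[t][j]: score of label j at position t (e.g. from the BiGRU layer).
    transitions[i][j]: score of label i being followed by label j.
    start[j]: optional score of the sequence starting with label j.
    Returns the label sequence with the highest total score.
    """
    n_labels = len(labels)
    start = start or [0.0] * n_labels
    score = [start[j] + emissions[0][j] for j in range(n_labels)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_labels):
            # Best previous label to transition into label j at position t.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backpointers):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return [labels[j] for j in path]


labels = ["O", "B-X", "I-X"]
NEG = -1e4  # illustrative penalty for invalid BIO moves
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-X is invalid
    [0.0, 0.0, 0.0],  # from B-X
    [0.0, 0.0, 0.0],  # from I-X
]
start = [0.0, 0.0, NEG]  # a sequence cannot start with I-X
emissions = [
    [2.0, 0.5, 0.1],
    [0.2, 1.0, 1.2],  # I-X slightly beats B-X locally
    [0.1, 0.3, 1.5],
]
print(viterbi_decode(emissions, transitions, labels, start))  # ['O', 'B-X', 'I-X']
```

    Taking the per-position argmax of the same emissions would give O, I-X, I-X, which is not a valid BIO sequence; the transition scores repair it, which is the kind of label-sequence optimization the abstract attributes to the CRF layer.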
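    For NER, precision, recall, and F1 are usually computed over entities rather than individual tokens. The abstract does not state the matching criterion, so the sketch below assumes the common exact-match convention: a predicted entity is correct only if its start, end, and category all equal those of a gold entity, with counts micro-averaged over all sentences. How a stray I- tag is handled is also an assumption here (it opens a new entity).

```python
def bio_to_spans(tags):
    """Extract (start, end_exclusive, category) entities from a BIO tag list.
    An I- tag that does not continue an entity of the same category is treated
    as starting a new entity (a lenient convention; stricter schemes drop it)."""
    spans = []
    start, cat = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing 'O' closes a final entity
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cat):
            if cat is not None:
                spans.append((start, i, cat))
            start, cat = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def prf(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall and F1 (exact span match)."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_to_spans(gold)), set(bio_to_spans(pred))
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-ENT1", "I-ENT1", "O", "B-ENT2"]]
pred = [["B-ENT1", "I-ENT1", "O", "O"]]
print(prf(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```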
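    The reported figures can be checked for internal consistency, since F1 is the harmonic mean 2PR / (P + R) of precision P and recall R. The check below recomputes the reported F1 and then, under the assumption (not stated in the abstract) that the gains over BiLSTM-CRF are absolute percentage-point differences, derives the implied baseline precision and recall and shows that their F1 agrees with 62.34 minus 9.83.

```python
def f1_score(p, r):
    """Harmonic mean of precision and recall (inputs in percent)."""
    return 2 * p * r / (p + r)


p, r = 56.69, 69.23
print(round(f1_score(p, r), 2))  # 62.34, matching the reported F1

# Assumption: the improvements are absolute percentage-point differences.
bp, br = p - 6.85, r - 13.74
print(round(bp, 2), round(br, 2))  # 49.84 55.49
print(round(f1_score(bp, br), 2), round(62.34 - 9.83, 2))  # 52.51 52.51
```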

  • Keywords

    coal mine safety hidden danger; ERNIE-BiGRU-CRF algorithm model; named entity recognition; information extraction

  • Foundation
    Fundamental Research Funds for the Central Universities (3142017107); Langfang Science and Technology Plan Project (2023029061)
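  • Citation
    LIU Feixiang, LI Zequan, ZHAO Jialiang, et al. Intelligent recognition of named entities of coal mine safety hidden danger based on ERNIE-BiGRU-CRF model [J]. Coal Engineering, 2024, 56(2): 206-212.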