Intelligent recognition of named entities of coal mine safety hidden danger based on ERNIE-BiGRU-CRF model
LIU Feixiang; LI Zequan; ZHAO Jialiang; LI Jing
To fully mine the key knowledge in coal mine safety hazard texts and help safety managers in coal mining enterprises carry out hazard investigation and remediation more effectively, a named entity recognition (NER) method based on a pre-trained language model is proposed. First, the entity categories of coal mine safety hazards are defined, and 7 entity categories with 15 entity labels are constructed using the BIO tagging scheme. Then, the collected coal mine hazard-inspection data are preprocessed, and the relevant entities are manually annotated by coal mine safety experts, yielding a standard NER dataset of 1,500 coal mine safety hazard records. Finally, the ERNIE pre-trained model produces word-vector representations of the hazard text, a BiGRU structure extracts contextual semantic features, and a CRF model decodes the entity labels, completing the NER pipeline for coal mine safety hazards. Experimental results show that the ERNIE-BiGRU-CRF model achieves a precision, recall and F1 score of 56.69%, 69.23% and 62.34%, respectively, on the sequence labeling task. These are 6.85, 13.74 and 9.83 percentage points higher than those of the BiLSTM-CRF baseline model, and the extracted entities differ little from the manual annotations. In addition, ablation experiments verify that the BiGRU layer better captures the contextual semantic dependencies in coal mine safety hazard texts and that the CRF layer further optimizes the label sequence.
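The label count in the abstract follows directly from the BIO scheme: each entity category gets a B- (begin) and an I- (inside) tag, and all non-entity tokens share a single O tag, so 7 categories give 7 × 2 + 1 = 15 labels. The sketch below illustrates this. The abstract does not name the 7 categories, so `CAT1` to `CAT7` are hypothetical placeholders, and the example sentence and span are likewise made up for illustration.

```python
# Hypothetical placeholder names: the paper's actual 7 entity categories
# are not listed in the abstract.
CATEGORIES = ["CAT1", "CAT2", "CAT3", "CAT4", "CAT5", "CAT6", "CAT7"]


def bio_label_set(categories):
    """Return the BIO tag inventory: a shared O tag plus B-/I- tags per category."""
    labels = ["O"]
    for cat in categories:
        labels.append(f"B-{cat}")
        labels.append(f"I-{cat}")
    return labels


def spans_to_bio(tokens, spans):
    """Convert (start, end_exclusive, category) entity spans into per-token BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, cat in spans:
        tags[start] = f"B-{cat}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{cat}"          # remaining tokens of the entity
    return tags


labels = bio_label_set(CATEGORIES)
print(len(labels))  # 15 = 7 categories x 2 (B-, I-) + 1 (O)

# Illustrative sentence and span only; not taken from the paper's dataset.
tokens = "the belt conveyor idler is damaged".split()
print(spans_to_bio(tokens, [(1, 4, "CAT1")]))
# ['O', 'B-CAT1', 'I-CAT1', 'I-CAT1', 'O', 'O']
```

Chinese NER corpora are often tagged per character rather than per word. The same function applies unchanged if `tokens` is a list of characters.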
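The CRF layer turns per-token label scores into a globally consistent label sequence. At inference time this is typically done with Viterbi decoding over emission scores (here assumed to come from a projection of the BiGRU outputs) plus learned label-to-label transition scores. The following is a minimal sketch of that standard decoding step, not the paper's implementation. The label name `HAZ` and all score values are hypothetical, and start/end transition scores, which many CRF implementations also learn, are omitted for brevity.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][j]:   score of label j at position t (e.g. from a BiGRU projection).
    transitions[i][j]: score of moving from label i to label j (learned by the CRF).
    """
    n_labels = len(labels)
    # scores[j] = best total score of any path ending in label j at the current position
    scores = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(n_labels):
            # best previous label i to transition into label j
            best_i = max(range(n_labels), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # trace back from the best final label
    path = [max(range(n_labels), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path]


NEG = -1e4  # effectively forbids a transition
labels = ["O", "B-HAZ", "I-HAZ"]  # hypothetical entity type
# rows = from-label, columns = to-label; O -> I-HAZ is invalid under BIO
transitions = [
    [0.0, 0.0, NEG],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 1.5],  # per-token argmax alone would pick I-HAZ here
    [0.0, 0.0, 2.0],
]
print(viterbi_decode(emissions, transitions, labels))  # ['O', 'B-HAZ', 'I-HAZ']
```

Picking the best label at each position independently would yield the invalid sequence `O, I-HAZ, I-HAZ`. The transition penalty lets the CRF recover a valid BIO sequence, which is the kind of label-sequence optimization the ablation study attributes to the CRF layer.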
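The reported scores can be checked for internal consistency, since F1 is the harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$. The sketch below assumes that the reported gains over BiLSTM-CRF are absolute percentage-point differences (the abstract does not say so explicitly) and derives the implied baseline scores. It only checks the arithmetic and does not reproduce the paper's entity-level evaluation procedure.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# Reported ERNIE-BiGRU-CRF scores (percent).
p, r = 56.69, 69.23
print(round(f1(p, r), 2))  # 62.34, matching the reported F1

# Assumption: the gains are absolute percentage-point differences.
gain_p, gain_r, gain_f1 = 6.85, 13.74, 9.83
base_p, base_r = round(p - gain_p, 2), round(r - gain_r, 2)
print(base_p, base_r)                # 49.84 55.49 (implied BiLSTM-CRF precision, recall)
print(round(f1(base_p, base_r), 2))  # 52.51 (F1 recomputed from the implied baseline)
print(round(f1(p, r) - gain_f1, 2))  # 52.51 (baseline F1 implied by the reported gain)
```

Both routes give the same baseline F1 of 52.51%, which supports reading the improvements as percentage points rather than relative gains.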
coal mine safety hidden danger text; ERNIE-BiGRU-CRF algorithm model; named entity recognition
Sponsors: China Coal Research Institute Co., Ltd.; Academic Journals Working Committee of the China Coal Society