Title

Construction of a mine accident knowledge graph based on Large Language Models
Authors

ZHANG Pengyang (张朋杨); SHENG Long (生龙); WANG Wei (王巍); WEI Zhongcheng (魏忠诚); ZHAO Jijun (赵继军)
Organization

School of Information and Electrical Engineering, Hebei University of Engineering
Hebei Provincial Key Laboratory of Security Information Perception and Processing, Hebei University of Engineering
Abstract

Existing methods for constructing knowledge graphs in the mining domain require large amounts of manually labeled, high-quality supervised data during the pre-training stage, which makes them labor-intensive and inefficient. Large Language Models (LLMs) can significantly improve the quality of information extraction with only a small amount of manually labeled, high-quality data, and they do so efficiently. However, combining an LLM with prompts gives rise to catastrophic forgetting. To address these problems, graph-structure information was embedded into the prompt template to form a Graph-Structured Prompt, and embedding this prompt in the LLM enabled high-quality construction of a mine accident knowledge graph. First, publicly available mine accident reports were collected from the Coal Mine Safety Production Network and preprocessed by correcting formatting and removing redundant information. Second, the LLM was used to mine the knowledge contained in the report texts, and K-means clustering was applied to the entities and inter-entity relations in those texts to complete the mine accident ontology. Third, a small amount of data was annotated according to the ontology and used for LLM learning and fine-tuning. Finally, the LLM with the embedded Graph-Structured Prompt was used for information extraction, instantiating entity-relation triples from which the mine accident knowledge graph was built. Experimental results showed that the LLM outperformed the Universal Information Extraction (UIE) model on both entity extraction and relation extraction, and that the LLM with the Graph-Structured Prompt achieved higher precision, recall, and F1 score than the LLM without it.
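The abstract gives neither the prompt template nor the evaluation code, so the following is only a minimal sketch under stated assumptions. It shows one way ontology edges could be serialized into a prompt, and how precision, recall, and F1 can be computed over extracted triples by exact set matching. The schema edges, relation names, example triples, and both function names (`build_graph_prompt`, `precision_recall_f1`) are hypothetical illustrations, not the authors' template, data, or results.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical ontology edges: (head entity type, relation, tail entity type).
SCHEMA_EDGES: List[Triple] = [
    ("Accident", "occurred_at", "Location"),
    ("Accident", "caused_by", "Cause"),
    ("Accident", "resulted_in", "Consequence"),
]


def build_graph_prompt(schema_edges: Iterable[Triple], report_text: str) -> str:
    """Serialize ontology edges into a prompt (illustrative template only)."""
    edge_lines = "\n".join(f"({h}) -[{r}]-> ({t})" for h, r, t in schema_edges)
    return (
        "Ontology graph (allowed entity types and relations):\n"
        f"{edge_lines}\n\n"
        "Extract (head, relation, tail) triples that follow the graph above.\n"
        f"Report text: {report_text}"
    )


def precision_recall_f1(
    predicted: Iterable[Triple], gold: Iterable[Triple]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over triples, using exact set matching."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example triples for a single report (not data from the paper).
gold = [
    ("gas explosion", "occurred_at", "working face"),
    ("gas explosion", "caused_by", "gas accumulation"),
    ("gas explosion", "resulted_in", "casualties"),
]
predicted = [
    ("gas explosion", "occurred_at", "working face"),
    ("gas explosion", "caused_by", "gas accumulation"),
    ("gas explosion", "caused_by", "roof collapse"),  # spurious triple
]

prompt = build_graph_prompt(SCHEMA_EDGES, "A gas explosion occurred at the working face.")
p, r, f = precision_recall_f1(predicted, gold)
print(prompt)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Exact matching on whole triples is a strict criterion. An evaluation that scores entities and relations separately, as the abstract's two tasks suggest, would apply the same function to sets of entities or relations instead.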
Keywords

mine accident; knowledge graph; Large Language Model; Graph-Structured Prompt; ontology construction; information extraction
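Foundation

National Natural Science Foundation of China (61802107); Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2020171); Hebei Provincial Science and Technology Program (22567624H).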
DOI

10.13272/j.issn.1671-251x.2024080031
Citation

ZHANG Pengyang, SHENG Long, WANG Wei, et al. Construction of a mine accident knowledge graph based on Large Language Models[J]. Journal of Mine Automation, 2025, 51(2): 76-83, 105.
Figures and tables
Figures (9) / Tables (2)
