Construction of a mine accident knowledge graph based on Large Language Models
ZHANG Pengyang, SHENG Long, WANG Wei, WEI Zhongcheng, ZHAO Jijun
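School of Information and Electrical Engineering, Hebei University of Engineering; Hebei Key Laboratory of Security and Protection Information Sensing and Processing, Hebei University of Engineering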
Existing methods for constructing knowledge graphs in the mining domain require large amounts of manually labeled, high-quality supervised data during the pre-training stage, which makes them labor-intensive and inefficient. Large Language Models (LLMs) can substantially improve the quality and efficiency of information extraction with only a small amount of manually labeled high-quality data; however, combining LLMs with prompts leads to catastrophic forgetting. To address these problems, graph-structured information was embedded into the prompt template, yielding a Graph-Structured Prompt, and embedding this prompt in the LLM enables high-quality, LLM-based construction of a mine accident knowledge graph. First, publicly available mine accident reports were collected from the Coal Mine Safety Production Network and preprocessed by correcting formatting and removing redundant information. Next, the LLM was used to mine the knowledge contained in the report texts, and the entities and inter-entity relations in those texts were grouped by K-means clustering to construct the mine accident ontology. Then, a small amount of data was annotated according to this ontology and used for LLM learning and fine-tuning. Finally, the LLM embedded with the Graph-Structured Prompt was used for information extraction, instantiating entity-relation triples to build the mine accident knowledge graph. Experimental results showed that LLMs outperformed the Universal Information Extraction (UIE) model on both entity extraction and relation extraction, and that the LLM embedded with the Graph-Structured Prompt achieved higher precision, recall, and F1 scores than the LLM without it.
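The abstract does not give the exact Graph-Structured Prompt template or the matching rule behind the reported precision, recall, and F1. The Python sketch below is only an illustration under stated assumptions: the ontology (entity types and relation names) is entirely hypothetical, "embedding graph structure" is approximated by serializing the ontology's typed relation edges into the prompt text, and extracted triples are scored by exact match against gold triples. The names `RELATION_SCHEMA`, `graph_structured_prompt`, and `prf1` are hypothetical and do not come from the paper.

```python
# Hypothetical ontology graph: nodes are entity types, edges are typed relations.
# The paper derives its ontology via the LLM plus K-means clustering; these
# names are placeholders for illustration only.
RELATION_SCHEMA: list[tuple[str, str, str]] = [
    ("Accident", "occurred_at", "Mine"),
    ("Accident", "caused_by", "Cause"),
    ("Accident", "resulted_in", "Casualty"),
]

Triple = tuple[str, str, str]


def graph_structured_prompt(text: str, schema: list[tuple[str, str, str]]) -> str:
    """Serialize the schema graph (type nodes + relation edges) into a prompt.

    Assumed format: one plausible way to put graph structure into a prompt,
    not the paper's actual template.
    """
    # Collect every entity type that appears as a head or tail node.
    nodes = sorted({t for head, _, tail in schema for t in (head, tail)})
    # Render each edge as "(Head) -[relation]-> (Tail)".
    edge_lines = [f"({head}) -[{rel}]-> ({tail})" for head, rel, tail in schema]
    lines = [
        "Entity types: " + ", ".join(nodes),
        "Allowed relations (schema graph):",
        *edge_lines,
        "Extract every (head, relation, tail) triple that conforms to this graph "
        "from the following report:",
        text,
    ]
    return "\n".join(lines)


def prf1(predicted: set[Triple], gold: set[Triple]) -> tuple[float, float, float]:
    """Exact-match triple precision, recall and F1 (an assumed scoring protocol)."""
    tp = len(predicted & gold)  # triples that match gold exactly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with gold triples `{("A1", "occurred_at", "M1"), ("A1", "caused_by", "gas explosion")}` and predictions `{("A1", "occurred_at", "M1"), ("A1", "resulted_in", "3 deaths")}`, `prf1` returns `(0.5, 0.5, 0.5)`. Exact matching is the strictest choice. Partial-credit schemes, such as matching only entity spans, would yield different numbers.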
Keywords: mine accident; knowledge graph; Large Language Model; Graph-Structured Prompt; ontology construction; information extraction
Sponsors: China Coal Research Institute Co., Ltd.; Academic Journals Working Committee of the China Coal Society