Construction exploration and application prospect of the large model in mining industry
WANG Haijun
Coal is the ballast stone of energy security. Against the current backdrop of accelerating development of the digital economy and active, steady progress toward the “dual carbon” goals, the coal industry urgently needs to deepen its digital transformation and intelligent construction. Introducing large model technology to empower coal industry applications, making full use of the industry’s massive body of knowledge and data, and thereby accelerating the industry’s digital development has become a focus of attention. This paper reviews the development status of general-purpose large model technology, describes the application status and results of large model technology in multiple fields, and introduces the key technologies of industry large models, including data processing (cleaning, balancing, augmentation, etc.), text tokenization, pre-training and fine-tuning, prompt optimization, vector embedding, alignment, and retrieval-augmented generation. It shows that industry large models inherit the “general” strengths of general-purpose large models while also being “specialized”, and that they play an important role in driving productivity innovation and industrial upgrading. The paper then analyzes in depth the challenges facing the application of large model technology in the coal industry, such as high R&D investment costs, the difficulty of collecting high-quality data, and the technical difficulty of multimodal data fusion. It summarizes in detail the construction path adopted by the SolStone Mine Large Model to address these challenges, and the phased results achieved so far, across six layers: the infrastructure layer; the data resource layer; the algorithm and model layer; the application service layer; the security, trustworthiness, and testing layer; and the industry ecosystem layer. Finally, it looks ahead to the production and technological changes that the development of large model technology will bring to the coal industry. It points out that the construction of large models for the mining industry should follow a path that combines open-source models with industry data, make full use of large models as tools to empower business scenarios, and build an application ecosystem that integrates “industry-academia-research-application”, so as to support the development of new quality productive forces in the mining industry.
large-scale pre-trained model; large model in the mining industry; SolStone Mine Large Model; retrieval-augmented generation; knowledge labeling system
Sponsors: China Coal Research Institute Co., Ltd.; Academic Journal Working Committee of the China Coal Society