Research on intelligent decision-making for group top-coal caving based on batch reinforcement learning
YANG Yi, LI Qingyuan, LI Huamin, LI Dongyin, YANG Yanlin, FEI Shumin
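School of Electrical Engineering and Automation, Henan Polytechnic University; Henan Provincial Key Laboratory of Intelligent Detection and Control of Coal Mine Equipment; School of Energy Science and Engineering, Henan Polytechnic University; Academic Publishing Center, Henan Polytechnic University; School of Automation, Southeast University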
In fully mechanized top-coal caving, the dynamic characteristics of top-coal fragmentation and migration are affected by numerous complex factors, so the mining environment and the coal caving process cannot be modeled accurately from pre-detection results. As a result, intelligent control of top-coal caving lacks a model basis, and the conflict between the top-coal recovery rate and the gangue rate of the caved coal cannot be substantially resolved by simple caving procedures. In this paper, an intelligent group-caving decision method based on batch reinforcement learning was proposed, in which the hydraulic support group was modeled as a multi-agent system and the optimal control of the top-coal recovery rate and gangue rate was formulated as an optimal decision problem of a Markov process. Within this multi-agent framework, reinforcement learning was used to achieve coordinated control of multiple caving windows. To this end, the top-coal occurrence state was taken as the state of the Markov process, a “top-coal occurrence state-caving window control” correlation mechanism was established, and the caving window control strategy was generated online. To enable the agents to fully learn this correlation mechanism, a batch Q-value update method was proposed that mitigates the “state jump” phenomenon occurring during state acquisition and further improves the agents' online learning efficiency. To verify the algorithm, a numerical simulation was carried out using the coal seam conditions of the No. 8222 working face of Tashan Coal Mine and the main technical parameters of its hydraulic supports: a three-dimensional simulation test platform of the coal caving process was built on Linux based on the discrete element method (DEM), and comparative experiments on sequential coal caving, segmented interval coal caving, and group intelligent coal caving were carried out on this platform.
The simulation results show that the proposed batch reinforcement learning decision method for coal caving can dynamically adjust the actions of the caving windows according to the top-coal occurrence state, effectively separate coal from gangue during group caving, and maximize the benefit of top-coal caving.
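The abstract does not give the exact batch update rule, state encoding, or reward, so the following is only a minimal Python sketch, under stated assumptions, of what a batch Q-value update for caving-window agents could look like. Assumptions (all hypothetical, not from the paper): each hydraulic support is an independent tabular Q-learning agent; the top-coal occurrence state above a window is discretized into three toy levels (`coal`, `mixed`, `gangue`); the reward (+1.0, +0.2, -1.0 for opening in those states) and the window dynamics are invented for illustration; and the "batch" rule collects one caving round of transitions, computes all TD targets from the pre-batch Q table, and moves each Q value toward the mean target of its state-action group. Because the agents learn independently, the inter-window coordination of the paper's group caving is not modeled here.

```python
import random
from collections import defaultdict

# Hypothetical discretised top-coal occurrence state above one caving window.
STATES = ("coal", "mixed", "gangue")
# Hypothetical action set for one window: 0 = keep closed, 1 = open.
ACTIONS = (0, 1)
# Toy reward for opening a window in each state (assumption, not from the paper).
OPEN_REWARD = {"coal": 1.0, "mixed": 0.2, "gangue": -1.0}


def step(state, action):
    """Toy window dynamics: opening releases material and moves the
    occurrence state one level toward gangue; closing changes nothing."""
    if action == 0:
        return 0.0, state
    nxt = STATES[min(STATES.index(state) + 1, len(STATES) - 1)]
    return OPEN_REWARD[state], nxt


class BatchQAgent:
    """Tabular Q-learning agent for one hydraulic support (caving window).

    Transitions are buffered and applied together by batch_update();
    the averaging rule is an illustrative assumption.
    """

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> Q value, default 0.0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.buffer = []  # (state, action, reward, next_state) awaiting update

    def best_action(self, state):
        # Greedy action; ties resolve to the first action (keep closed).
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def act(self, state):
        # Epsilon-greedy exploration.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return self.best_action(state)

    def record(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def batch_update(self):
        # 1) Compute every TD target from the Q table as it was before the
        #    batch, grouped by (state, action).
        targets = defaultdict(list)
        for s, a, r, s2 in self.buffer:
            best_next = max(self.q[(s2, b)] for b in ACTIONS)
            targets[(s, a)].append(r + self.gamma * best_next)
        # 2) Move each Q value toward the mean target of its group, so one
        #    abrupt observation is averaged with the rest of the round.
        for key, ts in targets.items():
            mean_target = sum(ts) / len(ts)
            self.q[key] += self.alpha * (mean_target - self.q[key])
        self.buffer.clear()


def train(n_windows=5, episodes=200, steps=6, seed=0):
    """Independent agents, one per window; one batch update per round."""
    agents = [BatchQAgent(seed=seed + i) for i in range(n_windows)]
    for _ in range(episodes):
        states = ["coal"] * n_windows  # every window starts under intact top coal
        for _ in range(steps):
            for i, agent in enumerate(agents):
                action = agent.act(states[i])
                reward, nxt = step(states[i], action)
                agent.record(states[i], action, reward, nxt)
                states[i] = nxt
        for agent in agents:
            agent.batch_update()
    return agents


if __name__ == "__main__":
    for i, agent in enumerate(train()):
        policy = {s: agent.best_action(s) for s in STATES}
        print(f"window {i}: {policy}")
```

Run as a script, the sketch prints each window's greedy policy over the three toy states; with these assumed rewards the learned policy opens the window under coal and keeps it closed once gangue arrives. Averaging targets over a round is just one simple reading of "batch Q-value update"; other batch schemes would fit the abstract's description equally well.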
top-coal caving; reinforcement learning; batch update; intelligent decision; group top-coal caving
Sponsors: China Coal Research Institute Co., Ltd.; Academic Journals Working Committee of China Coal Society