Prediction method of gas emission in working face based on feature selection and BO-GBDT
马文伟
MA Wenwei
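State Key Laboratory of Coal Mine Safety Technology, China Coal Technology and Engineering Group Shenyang Research Institute Co., Ltd.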
Many features influence gas emission from a working face. Dimensionality reduction methods such as principal component analysis save computational resources, but they alter the original feature structure of the dataset and lose some of the detail carried by the original features. To address this problem, a gradient boosting decision tree (GBDT) model for gas emission prediction was established. Five feature selection algorithms were used to filter the features of the dataset, and the goodness of fit, computation time, and prediction results of each feature combination in the GBDT model were analyzed. The wrapper method was identified as the best feature selection algorithm and, in light of field conditions, eight features were selected for gas emission prediction. The results showed that prediction accuracy and generalization are not proportional to the number of features; redundant or irrelevant features instead reduce the model's prediction accuracy. To further improve model accuracy, five hyperparameter optimization algorithms were applied to the GBDT model, and the prediction performance of the GBDT model under each resulting hyperparameter combination was compared. The results showed that the choice of optimization algorithm had little effect on the accuracy and generalization of the GBDT model. However, the hyperparameter combination found by Bayesian optimization (BO) based on the tree-structured Parzen estimator (TPE) gave the highest accuracy with a relatively short optimization time, so it offered the best optimization performance, and the BO-GBDT model was established on this basis. The dataset obtained after feature selection was divided into a training set and a test set, and the BO-GBDT model was used to predict gas emission in the working face and was compared with random forest, support vector machine, and neural network models. The BO-GBDT model showed higher accuracy and generalization, with an average relative error of 2.61%, which was 35.56%, 37.41%, and 32.03% lower than that of the random forest, support vector machine, and neural network models, respectively. The BO-GBDT model meets the requirements of field engineering applications and provides theoretical guidance for safe mine production.
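The average relative error quoted in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the definition below (mean of |predicted - actual| / |actual|, as a percentage) is an assumption, as is reading "lower than" as a relative reduction. The function names and the toy values are hypothetical and are not data from the paper.

```python
def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual|, returned as a percentage.

    Assumed definition; the paper may define the metric differently.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical gas emission values for illustration only.
actual = [4.0, 5.0, 8.0]
predicted = [4.2, 4.9, 8.4]
mre = mean_relative_error(actual, predicted)
print(f"mean relative error: {mre:.2f}%")
```

With these two helpers, a model's error can be compared against a baseline by calling `relative_reduction(baseline_error, model_error)` on their mean relative errors.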
gas emission prediction; feature selection; gradient boosting decision tree; Bayesian optimization; hyperparameter optimization; machine learning
Sponsored by: China Coal Research Institute Co., Ltd.; Academic Journal Working Committee of the China Coal Society