Towards the study on the geochemistry through machine learning
XU Na, HUANG Bin, LI Qiang, ZHU Wei, WANG Zhiwei, WANG Ru
Methods for determining the modes of occurrence of elements in coal include microprobe methods (electron, ion and X-ray probes), spectroscopic methods (such as X-ray absorption fine structure spectroscopy), mathematical statistics, float-sink tests and chemical methods (such as sequential chemical extraction). Commonly used traditional statistical methods include correlation analysis, cluster analysis, factor analysis and multiple discriminant analysis. Correlation analysis infers the modes of occurrence of elements from the correlations among ash yield, sulfur content, major-element concentrations and trace-element concentrations, for example by calculating the correlation coefficient between ash yield and the concentration of an element in coal. However, statistical methods have a number of problems when used to identify the modes of occurrence and sources of elements in coal; for example, correlations between elements are inconsistent between the whole-coal basis and the ash basis. Hierarchical clustering has similar problems: different hierarchical clustering algorithms lead to different inferred modes of occurrence. The development of machine learning offers solutions to these problems. For example, the asymmetric log-ratio transformation discussed in this paper effectively resolves the inconsistency of element-element and element-ash correlations between the whole-coal and ash bases. Among four common hierarchical clustering algorithms (single linkage, complete linkage, average linkage and centroid linkage), average linkage performs relatively better. In cluster analysis of both the original and the transformed data, a distance measure based on Pearson's correlation coefficient performs better than the Euclidean distance; data transformed to pivot coordinates give better results than the original data, and weighted symmetric pivot coordinates give better results than pivot coordinates. In addition, this paper discusses the advantages of machine learning in geochemical studies of critical metals and hazardous elements in coal. In two examples, predicting the critical value at which barium interferes with the critical metal europium using machine learning algorithms, and determining the radiation hazard threshold of uranium in Chinese coals using the CART algorithm, the machine learning results are more accurate than those of traditional methods.
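The clustering workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the element names and concentrations are invented for illustration, the function names are hypothetical, and the transformation used is the plain first pivot coordinate of each element rather than the weighted symmetric pivot coordinates the abstract reports as performing best. Each element is expressed as a scaled log-ratio against the geometric mean of the other elements in the same sample, elements are compared with the distance 1 - r (r being Pearson's correlation coefficient), and clusters are merged by average linkage.

```python
import math
from itertools import combinations

def first_pivot_coordinate(sample, k):
    """Scaled log-ratio of part k to the geometric mean of the other parts."""
    d = len(sample)
    rest = [math.log(v) for j, v in enumerate(sample) if j != k]
    return math.sqrt((d - 1) / d) * (math.log(sample[k]) - sum(rest) / (d - 1))

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    dist maps frozenset({a, b}) to the distance between items a and b.
    Returns the merge history as (cluster_1, cluster_2, distance) tuples.
    """
    def between(c1, c2):
        # Average of all pairwise distances between the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: between(*p))
        merges.append((c1, c2, between(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Invented concentrations (arbitrary units), one row per coal sample.
# Si is set to exactly twice Al so the expected grouping is obvious.
names = ["Al", "Si", "Fe", "Ca"]
samples = [
    [1.0, 2.0, 0.5, 0.3],
    [2.0, 4.0, 0.4, 0.9],
    [1.5, 3.0, 1.2, 0.2],
    [3.0, 6.0, 0.3, 0.6],
    [0.8, 1.6, 0.9, 1.1],
]

# One transformed series per element, then correlation-based distances.
coords = {n: [first_pivot_coordinate(s, k) for s in samples]
          for k, n in enumerate(names)}
dist = {frozenset((a, b)): 1.0 - pearson(coords[a], coords[b])
        for a, b in combinations(names, 2)}

merges = average_linkage(names, dist)
for c1, c2, d in merges:
    print(c1, "+", c2, "at distance", round(d, 3))
```

Because a log-ratio depends only on ratios between elements within a sample, multiplying every concentration in a sample by the same factor (as when converting from the whole-coal basis to the ash basis) leaves these coordinates unchanged. This is one way to see why log-ratio transformations can remove the basis dependence of element-element correlations. For real data sets, library routines such as SciPy's hierarchical clustering functions would normally replace the hand-written loop.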
machine learning; geochemistry; mathematical statistics; artificial intelligence
Sponsored by: China Coal Research Institute Co., Ltd.; Academic Journal Working Committee of the China Coal Society