Predicting the fuel performance of coal-based liquids using the ML-QSPR method
LI Wenying;WANG Xiangling;FAN Huanhuan;FAN Hongxia;FENG Jie
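Describing the molecular structure and predicting the properties of coal-based liquid mixtures, such as coal tar and direct coal liquefaction oil, is an important foundation for developing processes and technologies that upgrade coal-based liquids into high-value products. Because coal-based liquids are mixtures composed mainly of the elements C, H, O, N and S and contain a very large number of compounds with diverse aromatic ring structures, the RDKit toolkit in Python is used to construct molecular descriptors for the substances in coal-based liquids from the Simplified Molecular Input Line Entry System (SMILES) language. The descriptors cover elemental information, ring-count and ring-structure information, atom counts and molecular weight, for a total of 115 molecular descriptors. In contrast to manual information extraction, the constructed descriptors, which capture the structural fragments, molecular weight and atom-count information of coal-based liquid molecules, serve as the feature input variables for machine learning and are used to establish a molecular machine learning-quantitative structure-property relationship (ML-QSPR) method for predicting the fuel performance of coal-based liquids, achieving prediction of the lower heating value (LHV), liquid density (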
A comprehensive understanding of the composition and physicochemical properties of coal-based liquids, such as coal tar and direct coal liquefaction oil, supports the rapid development of multi-purpose, high-performance, high-value-added products and more efficient use of these oils. Knowing which components are desirable in a coal-based liquid mixture, and what their physical and chemical properties are, is also key to designing liquid fuels with specific target properties. The authors use the RDKit toolkit in Python to construct molecular descriptors for the substances in coal-based liquids from their Simplified Molecular Input Line Entry System (SMILES) representations. Coal-based liquids are composed mainly of the elements C, H, O, N and S and contain a large number of compounds with polycyclic aromatic structures, so the structural fragment descriptors are built primarily around the elemental composition and ring counts of polycyclic aromatic compounds. Atom-count and molecular-weight descriptors are added to the structural fragment descriptors, giving 115 molecular descriptors in total. Compared with traditional manual information extraction, the constructed descriptors can quickly extract the information contained in the large number of molecules present in coal-based liquids.
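The idea of turning a SMILES string into element and ring descriptors can be illustrated with a short Python sketch. This is a dependency-free stand-in and not the authors' RDKit workflow: it assumes a simplified SMILES subset and counts only heavy atoms per element, aromatic atoms and rings, far fewer than the 115 descriptors described. The function name `smiles_descriptors`, the dictionary keys and the three example molecules are illustrative assumptions. Hydrogen counts and molecular weight are omitted because implicit hydrogens require valence rules, which a cheminformatics toolkit such as RDKit handles.

```python
import re
from collections import Counter

# Tokens of a simplified SMILES subset (an assumption of this sketch):
# bracket atoms, Cl/Br, organic-subset atoms, and ring-closure labels.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOSPFI]|[bcnosp]|%\d{2}|\d")
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Za-z][a-z]?)")


def smiles_descriptors(smiles: str) -> dict:
    """Count heavy atoms per element, aromatic atoms and rings (hypothetical helper)."""
    elements = Counter()
    aromatic_atoms = 0
    ring_labels = 0
    for token in TOKEN.findall(smiles):
        if token[0].isdigit() or token[0] == "%":
            ring_labels += 1  # every ring-closure bond is written as two labels
            continue
        if token[0] == "[":
            match = BRACKET_SYMBOL.match(token)
            if match is None:
                raise ValueError(f"unsupported bracket atom {token!r}")
            symbol = match.group(1)
        else:
            symbol = token
        if symbol == "H":
            continue  # explicit hydrogens are not heavy atoms
        if symbol.islower():
            aromatic_atoms += 1  # lower-case symbols mark aromatic atoms
        elements[symbol.capitalize()] += 1
    if ring_labels % 2:
        raise ValueError(f"unpaired ring-closure label in {smiles!r}")
    return {
        "n_C": elements["C"],
        "n_N": elements["N"],
        "n_O": elements["O"],
        "n_S": elements["S"],
        "n_aromatic_atoms": aromatic_atoms,
        # Ring-closure bonds equal the number of independent rings.
        "n_rings": ring_labels // 2,
    }


# Illustrative molecules chosen for this sketch, not taken from the paper.
examples = {
    "phenol": "Oc1ccccc1",
    "naphthalene": "c1ccc2ccccc2c1",
    "dibenzothiophene": "c1ccc2c(c1)sc1ccccc12",
}
for name, smi in examples.items():
    print(name, smiles_descriptors(smi))
```

Each returned dictionary is one row of a feature table. A toolkit-based version would parse the molecule into a graph rather than scanning tokens, which is what makes descriptors such as molecular weight and ring-structure types available.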
The structural fragments, molecular weights and atom counts obtained from these descriptors are used as input features for machine learning (ML) to establish a machine learning-quantitative structure-property relationship (ML-QSPR) method for coal-based liquids, which achieves fast and accurate prediction of four properties, namely the lower heating value (LHV), the density of the liquid (
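The descriptors-in, property-out workflow can be sketched as a small regression example. The text above does not name the learning algorithm, so this sketch substitutes ordinary least squares purely as an assumption. The descriptor rows and the target values are synthetic: the targets are generated from an arbitrary linear rule, carry no physical meaning and are not data from the paper. The names `fit_linear` and `predict` are hypothetical helpers.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [list(x) + [1.0] for x in X]  # append a bias column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("descriptor matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, x):
    """Apply the fitted weights (last entry is the bias) to one descriptor row."""
    return sum(w * v for w, v in zip(weights, list(x) + [1.0]))


# Synthetic descriptor rows: (carbon count, oxygen count, ring count).
X = [(6, 1, 1), (10, 0, 2), (12, 0, 3), (7, 1, 1), (9, 2, 2), (14, 0, 3)]
# Synthetic targets from an arbitrary rule, in arbitrary units (NOT measurements).
y = [2.0 * c - 1.5 * o + 0.5 * r + 3.0 for c, o, r in X]

weights = fit_linear(X, y)
print("fitted weights:", [round(w, 6) for w in weights])
print("prediction for (8, 1, 1):", round(predict(weights, (8, 1, 1)), 6))
```

Because the synthetic targets are exactly linear in the descriptors, the fit recovers the generating rule. Real property data would not behave this way, which is one reason a QSPR study would typically compare several model types on held-out molecules.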
coal tar;liquids from direct coal liquefaction;coal structure;coal composition;molecular descriptors
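Sponsors: China Coal Research Institute Co., Ltd. (煤炭科学研究总院有限公司); Academic Journal Working Committee of the China Coal Society (中国煤炭学会学术期刊工作委员会)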