Recent Team Achievement: An Integrated Literature-Mining Method for Extracting Text and Table Information from Materials Science Publications
Our team has published a paper titled “A literature-mining method of integrating text and table extraction for materials science publications” in the international journal “Computational Materials Science” (impact factor: 3.3). The paper lists the School of Computer Engineering and Science, Shanghai University, as its first affiliation. Associate Professor Zhang Rui is the first author, Zhang Jiawang is the second author, and Associate Professor Han Yuexing is the corresponding author.
Scientific literature is an important vehicle for presenting research results. In this study, we propose a method for large-scale processing of materials science literature that extracts and analyzes both textual and tabular information. First, we propose a named entity recognition (NER) model for materials text that combines general-purpose dynamic word vectors with domain-specific static word vectors. Second, we present an efficient and accurate method for recognizing image-based tables and extracting information from them, specifically the material names, units, and components listed in composition tables. Finally, we use the components, processes, properties, and property changes extracted from both text and tables to predict changes in corrosion resistance, ductility, strength, and hardness with machine learning. Using stainless steel as the demonstration material, we mined 2.36 million entities and 7,970 compositions from 11,058 stainless steel publications and predicted these four types of performance changes. The proposed method enables large-scale knowledge extraction from materials science literature, and researchers can use the extracted results to support their efforts to improve material performance.
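To make the composition-table step more concrete, here is a minimal Python sketch of how material names, units, and components might be pulled out of a table once it has been recognized. It is not the authors' implementation. It assumes the image-based table has already been converted into a grid of cell strings (the image recognition step is not shown), that the first row is the header, and that the first column holds material names. Every name in it (`parse_composition_table`, `CompositionRecord`, `find_unit`, the element set, and the unit patterns) is hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical element vocabulary for stainless-steel composition tables.
ELEMENTS = {"C", "Si", "Mn", "P", "S", "Cr", "Ni", "Mo", "N", "Cu",
            "Ti", "Nb", "V", "Al", "Co", "W", "Fe"}
# Assumed unit markers, searched for in the caption and header cells.
UNIT_PATTERN = re.compile(r"\b(wt\.?\s?%|at\.?\s?%|mass\s?%)", re.IGNORECASE)
NUMBER = re.compile(r"\d+(?:\.\d+)?")


@dataclass
class CompositionRecord:
    material: str
    unit: str
    components: dict[str, float] = field(default_factory=dict)


def find_unit(caption: str, header: list[str]) -> str:
    """Return a normalized unit such as 'wt%' from the caption or header."""
    for text in [caption, *header]:
        match = UNIT_PATTERN.search(text)
        if match:
            return re.sub(r"[\s.]", "", match.group(1)).lower()
    return "unknown"


def parse_composition_table(caption: str, grid: list[list[str]]) -> list[CompositionRecord]:
    """Convert a recognized table grid into per-material composition records.

    Assumes row 0 is the header, column 0 holds material names, and the other
    header cells start with an element symbol, e.g. 'Cr' or 'Cr (wt.%)'.
    """
    header = [cell.strip() for cell in grid[0]]
    unit = find_unit(caption, header)

    # Map column index -> element symbol, skipping non-element columns.
    element_cols = {}
    for col, cell in enumerate(header[1:], start=1):
        match = re.match(r"([A-Z][a-z]?)(?![a-z])", cell)
        if match and match.group(1) in ELEMENTS:
            element_cols[col] = match.group(1)

    records = []
    for row in grid[1:]:
        record = CompositionRecord(material=row[0].strip(), unit=unit)
        balance_element = None
        for col, element in element_cols.items():
            value = row[col].strip()
            if value.lower().startswith("bal"):
                balance_element = element  # e.g. Fe reported as "Bal."
                continue
            value = value.split("±")[0]  # drop a tolerance such as "±0.3"
            numbers = [float(n) for n in NUMBER.findall(value)]
            if numbers:
                # Simplification: a range "16.0-18.0" becomes its midpoint and
                # a bound such as "≤0.03" is kept as the bound value.
                record.components[element] = sum(numbers) / len(numbers)
        # Fill the balance element only when the unit is a mass percentage.
        if balance_element and unit in {"wt%", "mass%"}:
            remainder = 100.0 - sum(record.components.values())
            record.components[balance_element] = round(remainder, 3)
        records.append(record)
    return records


# Usage with a made-up table, in the form a table recognizer might return it.
caption = "Table 1 Chemical composition of the tested steels (wt.%)"
grid = [
    ["Steel", "C", "Cr", "Ni", "Mo", "Fe"],
    ["304", "0.05", "18.2", "8.1", "-", "Bal."],
    ["316L", "≤0.03", "16.0-18.0", "10.5", "2.1", "Bal."],
]
for rec in parse_composition_table(caption, grid):
    print(rec.material, rec.unit, rec.components)
```

Treating a range as its midpoint and "Bal." as the remainder to 100% keeps the sketch short; a fuller pipeline would more likely store minimum and maximum values separately and check header cells against a complete list of element symbols.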