Congratulations to Xia Jinhua on graduating successfully!

Xia Jinhua received a Bachelor’s degree from Jiangsu University of Science and Technology. In 2020, Xia Jinhua began a professional master’s degree at the School of Computer Engineering and Science, Shanghai University. Under the guidance of Professor Han Yuexing, Xia Jinhua researched information mining methods for materials science literature and completed the following studies:

  1. Extraction of Numerical Chart Information: A literature mining method combining image and text analysis was proposed for extracting information from numerical charts together with their corresponding titles. The method involves several steps. First, YOLOv5s is used to extract individual numerical chart images from scientific literature, with an improved scientific literature image detection method employed to enhance accuracy. Next, the PDFMiner tool parses the textual content of the literature. The cosine similarity and Jaccard similarity between sentences are calculated to match each numerical chart with its textual title. The SciBERT model and a CRF layer are then applied to identify axis names in the titles. In addition, morphological operations and character recognition are used to extract the data values from the numerical chart images. Finally, the extracted axis names and data are integrated to obtain complete numerical chart information.

  2. Improved Recognition of Axis Names: To address the low accuracy of axis-name recognition in the method above, this study exploits the relationship between numerical chart images and text in scientific literature. The method first recognizes the label text on the numerical chart image and fills it into a sample template to generate unlabeled text data, which serves as data augmentation. In addition, text similarity matching is used to find the statements in the body of the literature that describe each numerical chart. These statements are concatenated with the title text to expand the textual context, which improves the vector representation of the input sentences and, in turn, the predictive performance of the model.
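The similarity-based matching mentioned in both studies can be illustrated with a minimal sketch. It is not the thesis implementation: the bag-of-words tokenization, the function names, and the choice to average the two scores are all assumptions made here for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words count vectors of two sentences."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    """Jaccard similarity between the token sets of two sentences."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def best_match(query, candidates):
    """Return the candidate with the highest mean of the two scores.

    Averaging the scores is an assumption of this sketch, not a detail
    taken from the thesis.
    """
    def score(sentence):
        return (cosine_similarity(query, sentence)
                + jaccard_similarity(query, sentence)) / 2
    return max(candidates, key=score)
```

For example, `best_match(text_near_chart, parsed_sentences)` would return the parsed sentence that shares the most vocabulary with the text found near a detected chart, where both argument names are hypothetical placeholders for the outputs of the detection and PDF parsing steps.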

After graduating, Xia Jinhua joined Hangzhou Guangli Microelectronics Company to work in software development. Throughout three years of graduate study at Shanghai University, Xia Jinhua studied diligently, steadily strengthening their professional knowledge and research presentation skills, and had the privilege of meeting many excellent mentors and friends. We hope that Xia Jinhua will stay true to their original aspirations and mission, overcome challenges, and forge ahead with determination.

Thesis: Research on Context-Aware Information Mining of Image and Text in Material Science Literature

Photo of Xia Jinhua