日本語 English | 中文 |
Website Home Page Self-Introduction Research Introduction Participated Projects Teaching Introduction Other Content Contact Map
2025.01 Recent achievements of the team - Automatic pipeline for information of curve graphs in papers based on deep learning
Our team published a paper "Automatic pipeline for information of curve graphs in papers based on deep learning" in the international journal 《Journal of the European Ceramic Society》 (IF : 3.1, CAS Zone 3). The School of Computer Engineering and Science of Shanghai University was the first department, Han Yuexing was the first author, Xia Jinhua was the second author, and Chen Qiaochuan was the corresponding author.
In materials science and biomedical fields. Current academic database tools mainly focus on mining textual information, ignoring the valuable information presented in graphs and charts. Extracting information from a large amount of literature can help researchers quickly grasp the current state of development. Literature is a carrier of many forms of data, and most researchers focus only on textual content. Especially, like graphs, they contain a lot of key numerical information that is not expressed in other data. In this paper, we propose a method for extracting information from graphs in the literature. With this method, the numerical values and axis entities of curved graphs can be extracted from both graphs and text. Firstly, the curved graphs are cut out from the literature using Yolov5s. Then, the exact title text corresponding to each curve graph is matched by manipulating Sentence-Bert. After obtaining the title text, the X-axis and Y-axis names in the curve graphs were extracted using SCI-Bert. At the same time, techniques such as Optical Character Recognition (OCR) were used to automatically parse the numerical data reflected on the graphs. In addition, a number of principles are used to improve performance. We validated each step using a dataset of 6042 articles from Elsevier. With our method, the accuracy of graph detection and title matching are 96.4% and 95.8% respectively, which are both better than the initial model, proving the effectiveness of our method. The accuracy of entity and numerical data extraction is 76.3% and 28.2%, respectively. The experimental results show that our method is able to achieve large-scale knowledge extraction of curve diagrams from the literature.

Essay: Automatic pipeline for information of curve graphs in papers based on deep learning

2024
2026