May 2026

Our team published the paper “Tiny object detection via implicit feature fusion and hybrid metric adaptive label assignment” in Knowledge-Based Systems (IF: 7.6, QSCI Zone 1 Top). The School of Computer Engineering and Science at Shanghai University is the first institution listed.

Tiny Object Detection (TOD) has broad applications in agricultural scenarios. Tiny objects occupy very few pixels, which limits feature extraction and fusion and challenges the label assignment strategies used in mainstream detection methods. To address these problems, this paper proposes IHANet, a tiny object detection network based on Implicit Feature Fusion (IFF) and Hybrid Adaptive Label Assignment (HALA), aiming to achieve high-precision tiny object detection.

Specifically, IFF leverages implicit neural representations to alleviate feature misalignment in multi-scale fusion by mapping feature maps from different pyramid levels to a unified size before fusion. By modeling feature maps as continuous representations, IFF enables effective fusion at arbitrary resolutions, preserving tiny-object details and reducing information loss. HALA combines Intersection over Union (IoU) with Receptive Field Distance (RFD), a metric better suited to tiny objects, and adopts an adaptive selection strategy to mine high-quality training samples. This optimizes label assignment and improves both training and detection performance. Extensive experiments on the AI-TOD, SODA-D, VisDrone, and AgriPest datasets show that IHANet achieves state-of-the-art performance across multiple TOD scenarios, reaching an AP of 29.1 on AI-TOD.
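To illustrate why an IoU-only criterion is hard on tiny objects, the sketch below computes IoU for a predicted box shifted by two pixels at two object sizes. It is a minimal illustration, not IHANet's HALA implementation: the `iou` and `shifted_iou` helpers, the box sizes, and the shift are assumptions chosen for the example, and RFD itself is not reproduced here.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a square ground-truth box and the same box shifted diagonally."""
    gt = (0, 0, size, size)
    pred = (shift, shift, size + shift, size + shift)
    return iou(gt, pred)


# Hypothetical sizes: an 8x8 "tiny" object and a 64x64 object, same 2-pixel shift.
tiny_iou = shifted_iou(size=8, shift=2)
large_iou = shifted_iou(size=64, shift=2)
print(f"tiny: {tiny_iou:.3f}, large: {large_iou:.3f}")
```

The same localization error costs the tiny box far more IoU than the large one, which is one reason a fixed IoU threshold can leave tiny objects with few positive samples.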

Paper: Tiny object detection via implicit feature fusion and hybrid metric adaptive label assignment

Code: https://github.com/han-yuexing/IHANet

徐天洋
Photo of Ruan Liheng

Name: Ruan Liheng

Affiliation: Shanghai University

Topic: Research and Application of Few-Shot Image Generation Methods Based on Feature Enhancement with Shape Space Theory

Advisor: Prof. Han Yuexing

April 2026

Our team published the paper “Deep learning-driven microstructure characterization and Vickers-hardness prediction of Mg-Gd alloys” in Journal of Magnesium and Alloys (QSCI Zone 1, JCR Q1). Taking high-strength Mg-Gd alloys as the research object, this paper focuses on quantitative modeling of the relationships among alloy processing, microstructure, and properties. It proposes a multimodal fusion framework based on image recognition and deep learning, enabling automated prediction of the Vickers hardness of Mg-Gd alloys.

In high-strength Mg-rare earth (Mg-RE) alloys, solution treatment and aging treatment significantly affect the microstructure and mechanical properties of the alloys. However, traditional experimental methods and physical modeling approaches still struggle to establish quantitative mappings among processing parameters, microstructural features, and property responses. To address this problem, this paper takes high-strength Mg-Gd alloys as a case study and constructs a quantitative analysis framework for “processing (solution and aging) - microstructure - properties”. Specifically, the mechanical properties of solution-treated Mg-Gd alloys are mainly influenced by Gd content, grain boundary characteristics, and the presence of second phases, while the properties of aged alloys are jointly affected by Gd content, aging parameters, and precipitate features.

To establish the above mapping relationships, this paper proposes a two-stage multimodal fusion framework that combines elemental composition, processing parameters, and microstructural features extracted from alloy micrographs to predict alloy hardness. The framework first uses deep learning methods to automatically extract key microstructural features, such as grain size, second phases, and precipitates, from alloy images under different states. These image features are then fused with composition and processing parameters to construct solution-treated and aged datasets, respectively. The solution-treated dataset is used to predict solution-treated hardness, while the aged dataset is used to predict the hardness increment caused by aging treatment. Experimental results show that the two prediction models achieve R² values of 0.90 and 0.89, respectively, demonstrating high prediction accuracy.
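The two-stage prediction and the R² metric described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the released MCVHPA code: the helper names `final_hardness` and `r2_score` and all hardness values are hypothetical, and the sketch assumes the final hardness is the solution-treated hardness plus the predicted aging increment.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def final_hardness(solution_hardness, aging_increment):
    """Combine the two stage outputs (assumed additive) into one hardness value."""
    return solution_hardness + aging_increment


# Made-up hardness values (HV) used only to exercise the metric.
measured = [70.0, 80.0, 90.0, 100.0]
predicted = [72.0, 79.0, 91.0, 98.0]
score = r2_score(measured, predicted)
print(f"R^2 = {score:.2f}")
```

An R² close to 1 means the residual error is small relative to the spread of the measured values, which is how the reported values of 0.90 and 0.89 should be read.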

Comparison with manual analysis results verifies that the proposed two-stage framework can automatically predict the final room-temperature hardness of Mg-Gd alloys, effectively reducing the cost of manual microstructure analysis.

Paper: Deep learning-driven microstructure characterization and Vickers-hardness prediction of Mg-Gd alloys

Code: https://github.com/han-yuexing/MCVHPA

王璐
March 2026

Our team published the paper “Scribble consistency match and pixel-level prototype contrastive calibration for weakly supervised medical segmentation” in Neurocomputing (IF: 6.5, QSCI Zone 2). The School of Computer Engineering and Science at Shanghai University is the first institution listed. To address the high cost of pixel-level annotation for medical images and the insufficient supervision provided by scribble annotations, this paper proposes FW2SS, a weakly supervised medical image segmentation framework.

Medical image segmentation is an important task in medical image analysis. It is mainly used to accurately separate organs, tissues, or lesion regions from images such as CT and MRI, providing auxiliary support for disease diagnosis, quantitative analysis, and clinical treatment. In recent years, deep learning has significantly improved segmentation performance, but it usually relies on large amounts of precise pixel-level annotations. Since medical image annotation is costly and requires professional expertise, weakly supervised medical image segmentation has become an active research topic.

FW2SS is based on a CNN-Transformer hybrid architecture, combining the local detail modeling capability of CNNs with the global structural perception capability of Transformers. The paper proposes a Scribble Consistency Match technique, which generates more reliable dense pseudo-labels through consistency learning between network perturbations and input perturbations, enabling the model to learn complete shape information from sparse scribble annotations. Meanwhile, the Pixel-level Prototype Contrastive Calibration technique is introduced to construct category prototypes using high-confidence pixels and enhance intra-class consistency and inter-class discriminability through contrastive learning, thereby improving segmentation performance in boundary and detail regions.
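The idea of building category prototypes from high-confidence pixels can be sketched as below. This is a simplified illustration, not the FW2SS implementation: the function name `class_prototypes`, the 0.9 confidence threshold, and the toy feature vectors are assumptions, and the contrastive loss that would use these prototypes is omitted.

```python
def class_prototypes(features, probs, threshold=0.9):
    """Average the feature vectors of pixels whose top class probability
    reaches `threshold`, giving one prototype vector per class.

    features: list of per-pixel feature vectors (lists of floats)
    probs:    list of per-pixel class-probability lists
    """
    sums, counts = {}, {}
    for feat, prob in zip(features, probs):
        cls = max(range(len(prob)), key=prob.__getitem__)
        if prob[cls] < threshold:
            continue  # low-confidence pixels do not contribute
        acc = sums.setdefault(cls, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[cls] = counts.get(cls, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


# Four toy pixels with 2-D features; the last one is too uncertain to be used.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
probs = [[0.95, 0.05], [0.92, 0.08], [0.03, 0.97], [0.60, 0.40]]
prototypes = class_prototypes(features, probs)
print(prototypes)
```

Filtering by confidence keeps uncertain pixels, which are often near boundaries, from pulling the prototypes toward the wrong class.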

Experiments on the ACDC and MSCMRseg datasets show that FW2SS achieves state-of-the-art performance under scribble supervision, with average Dice scores of 90.0% and 88.2%, respectively, significantly outperforming various existing weakly supervised medical image segmentation methods. This research reduces the cost of medical image annotation while improving segmentation accuracy, providing an effective technical solution for weakly supervised medical image analysis and intelligent clinical assistance.
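For reference, the Dice score used in the results above measures the overlap between a predicted mask and the ground truth. The sketch below uses flat binary masks and a hypothetical `dice_score` helper; the masks are made-up values, not data from ACDC or MSCMRseg.

```python
def dice_score(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks are treated as a perfect match (a common convention).
    return 2.0 * intersection / total if total else 1.0


pred_mask = [1, 1, 1, 0, 0]
true_mask = [0, 1, 1, 1, 0]
dice = dice_score(pred_mask, true_mask)
print(f"Dice = {dice:.3f}")
```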

Paper: Scribble consistency match and pixel-level prototype contrastive calibration for weakly supervised medical segmentation

Code: https://github.com/han-yuexing/FW2SS

李子铭
February 2026

Our team published the paper “A multi-task learning framework for integrated assessment in agricultural applications” in Information Sciences (IF: 6.8, QSCI Zone 2). The School of Computer Engineering and Science at Shanghai University is the first institution listed.

Automated assessment of fruits and vegetables is an important task in smart agriculture, quality control, and supply chain management. Traditional manual weighing and visual inspection are time-consuming, labor-intensive, and highly subjective, while most existing automated methods focus on a single task and struggle to perform comprehensive multi-attribute assessment within a unified framework. In addition, datasets with multi-attribute annotations for fruits and vegetables remain limited. To address these problems, this paper proposes a multi-task deep learning framework for agricultural applications, capable of simultaneously performing weight prediction, key phenotypic feature analysis, and quality grade classification from a single RGB image.

Specifically, this paper constructs FruVegSet (FVS), an integrated assessment dataset for fruits and vegetables, covering two types of agricultural products, cucumbers and bananas, and providing multi-attribute annotations including images, weight, key phenotypic features, and quality grades. In terms of model design, this paper adopts a ResNet18-based pre-classification module to identify the category of agricultural products and route input images to corresponding category-specific subnetworks. Then, task-related features are extracted through the weight branch and key phenotype branch, respectively. A feature pyramid network is introduced to enhance morphological feature representation, while a large-kernel attention fusion module and cross-attention mechanism are combined to enable information interaction between tasks. Finally, the model simultaneously predicts weight, analyzes key phenotypic features, and classifies quality grades to complete integrated assessment. Experimental results show that the proposed framework achieves favorable integrated assessment performance on both cucumber and banana data, outperforming single-task models and representative agricultural quality classification models.
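The routing step described above, in which a pre-classifier selects a category-specific subnetwork, can be sketched in a few lines. Everything here is a placeholder: `make_router`, the toy classifier, the two stand-in "subnetworks", and their coefficients are assumptions for illustration, not the paper's ResNet18 module or trained models.

```python
def make_router(classify, subnetworks):
    """Return a function that classifies an input, then runs the matching subnetwork."""
    def assess(image):
        category = classify(image)
        if category not in subnetworks:
            raise KeyError(f"no subnetwork registered for {category!r}")
        return category, subnetworks[category](image)
    return assess


# Stand-ins for trained models; inputs are dicts rather than real images.
def toy_classifier(image):
    return "banana" if image["curved"] else "cucumber"


def cucumber_net(image):
    return {"weight_g": 2.0 * image["length_mm"], "grade": "A"}


def banana_net(image):
    return {"weight_g": 1.5 * image["length_mm"], "grade": "B"}


assess = make_router(toy_classifier, {"cucumber": cucumber_net, "banana": banana_net})
result = assess({"curved": False, "length_mm": 150})
print(result)
```

Keeping the router separate from the subnetworks means a new product category only requires registering one more entry in the dictionary.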

Paper: A multi-task learning framework for integrated assessment in agricultural applications

葛嘉浩

Our team published the paper “A Dual-Domain Detection Transformer for Fine-Grained Weed Detection in Complex Agricultural Scenes” in Information Sciences (IF: 6.8, QSCI Zone 2 TOP). The School of Computer Engineering and Science at Shanghai University is the first institution listed.

Weed detection is a key technology in precision agriculture, intelligent weeding, and smart farmland management. However, in complex agricultural environments, existing detection methods are prone to false detections and missed detections due to factors such as the highly similar appearances of crops and weeds, severe object occlusion, complex background interference, and significant scale variations, making it difficult to meet practical application needs. To address these challenges, this paper proposes FS-DETR (Frequency-Spatial Detection Transformer), a dual-domain fusion detection Transformer framework that collaboratively models spatial-domain and frequency-domain information to achieve accurate fine-grained weed detection in complex agricultural scenes.

Specifically, this paper proposes a Hybrid Feature Fusion (HFF) module that integrates multi-scale spatial features with high-frequency information in the frequency domain, enhancing the representation of fine-grained texture features and edge information, thereby effectively alleviating detection difficulties caused by crop-weed overlap and complex background interference. Meanwhile, a Dual Domain Attention Mechanism (DDAM) is designed to adaptively fuse frequency-domain attention with deformable attention, fully exploiting spatial structural information and frequency-domain texture information during the encoding stage to improve feature extraction and target discrimination in complex agricultural environments. Furthermore, a Gaussian Distribution-based and Constraint-guided Label Assignment (GCLA) module is constructed to optimize the label matching process for weed and crop targets, improving the quality of supervision and detection accuracy during training.
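To make "high-frequency information in the frequency domain" concrete, the sketch below applies a high-pass filter to a 1-D signal with a plain discrete Fourier transform. It is a deliberately simplified illustration and not the HFF module: FS-DETR operates on 2-D feature maps, and the helper names `dft`, `idft`, and `high_pass`, the cutoff, and the step signal are assumptions made for the example.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def high_pass(signal, cutoff):
    """Zero every frequency bin below `cutoff` and transform back."""
    n = len(signal)
    spectrum = dft(signal)
    # Bins k and n - k carry the same frequency magnitude, so filter them together.
    filtered = [c if min(k, n - k) >= cutoff else 0 for k, c in enumerate(spectrum)]
    return idft(filtered)


# A step edge: removing the zero-frequency bin subtracts the mean and keeps the edge.
step = [1.0] * 4 + [5.0] * 4
edges = high_pass(step, cutoff=1)
print([round(v, 6) for v in edges])
```

Low-frequency bins encode smooth regions, while edges and fine texture live in the higher bins, which is why emphasizing them can help separate visually similar crops and weeds.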

Experimental results on three public agricultural weed datasets, WeedCrop, LincolnBeet, and MH-Weed16, show that FS-DETR achieves excellent performance. Specifically, FS-DETR obtains AP scores of 47.2%, 60.4%, and 32.5% on WeedCrop, LincolnBeet, and MH-Weed16, respectively, improving upon the baseline model by 1.4%, 1.0%, and 0.6%. In addition, on small-object weed detection tasks, FS-DETR improves over the current second-best methods by 1.2% and 0.2%. These results demonstrate strong fine-grained object detection capability and robustness in complex scenes, and offer a new technical solution for precision weed management in intelligent agriculture.

Paper: A Dual-Domain Detection Transformer for Fine-Grained Weed Detection in Complex Agricultural Scenes

Code: https://github.com/YanSun-github/FS-DETR

沈新宇
January 2026

Our team published the paper “PDDNet: An End-to-End Object Detection Framework for Real-World Plant Leaf Disease Diagnosis” in Expert Systems with Applications (IF: 7.5, QSCI Zone 1). The School of Computer Engineering and Science at Shanghai University is the first institution listed.

Plant leaf disease detection is an important task in smart agriculture, precision plant protection, and crop health management. However, in real-world agricultural scenarios, leaf lesions are often affected by complex natural backgrounds, multi-scale disease regions, lighting variations, and subtle visual differences between disease categories. As a result, existing detection methods still face challenges in localization accuracy, classification robustness, and cross-scene generalization. To address these challenges, this paper proposes PDDNet, an end-to-end plant leaf disease detection framework that integrates local lesion details with global contextual information through a cascaded encoder-decoder structure, thereby improving disease detection performance in real-world scenarios.

Specifically, this paper proposes an Enhanced Attention-based Multi-scale Aggregation (EAMA) module that strengthens the feature representation of lesion regions at different scales through collaborative modeling of spatial attention and channel attention. Meanwhile, a Prior-guided Self-Attention (PGSA) mechanism is introduced to incorporate position priors and IoU geometric relationships into attention computation, enabling the model to focus more effectively on lesion boundaries and morphological structures. Furthermore, this paper designs a Multi-task Feature Decoupling Module (MFDM), which separates classification features from localization features using task-specific dynamic masks, alleviating conflicts between classification and regression tasks. Experimental results on real-world datasets such as PlantDoc and Tomato Leaf Disease show that PDDNet achieves favorable detection performance in complex backgrounds, multi-scale lesion detection, and fine-grained category recognition tasks, providing reliable technical support for automated disease diagnosis in precision agriculture.

Paper: PDDNet: An End-to-End Object Detection Framework for Real-World Plant Leaf Disease Diagnosis

马唯一
Last updated: 2026-06-03