EAAI Journal 2026 Journal Article
A self-supervised visual auto-regressive framework for medical hyperspectral anomaly detection
- Lingqin Chen
- Jinzhuang Xu
- Yu Jiang
- Zhihao Zhan
- Xiaoli Yang
- Mingzhong Pan
- Chenglong Zhang
- Xuesen Xu
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Reconstructing the fine-grained geometry of a clothed human from a single-view image is a challenging task, particularly in accurately recovering complex shapes and generating clothing details. To address these limitations, we propose a novel approach named HumanPro, which estimates high-quality human normals via a generative model, and progressively deforms a parametric body into the final clothed human mesh guided by normals. First, we propose a geometry-aware latent diffusion model with a normal enhancer to estimate high-quality human normals from four views. Then, we propose a progressive mesh optimization consisting of shape-aware deformation alignment and global-to-patch detail refinement for human mesh reconstruction. The shape-aware deformation alignment applies image morphing to learn the shape-level gap of normals, addressing large-scale deformation of complex clothes. It can recover the overall silhouette of a clothed human, and serves as an initialization for the global-to-patch detail refinement. Our detail refinement combines global and patch-wise optimization strategies to iteratively produce the clothed human mesh by minimizing the pixel-level difference of normals. This approach effectively recovers fine-grained details while avoiding local minima. Extensive experiments demonstrate that HumanPro can deal with various challenging scenarios and outperforms state-of-the-art methods.
AAAI Conference 2026 Conference Paper
Effectively capturing co-occurrence signals, such as hand shapes, facial expressions, and body postures, is critical for semantic understanding in sign language recognition (SLR) and translation (SLT). Although skeleton data offer greater efficiency and robustness than RGB inputs, existing methods typically rely on pairwise graph structures, limiting their ability to model complex high-order interactions across body regions. To address this limitation, we propose HyperSign, a hierarchical hypergraph neural network that systematically captures high-order co-occurrence patterns among diverse body parts. The Co-occurrence Graph Perception Module jointly learns relational structures via three complementary pathways: (1) traditional graph convolutions for modeling physical joint connections, (2) dynamic geometric hypergraphs constructed via k-nearest neighbors to encode local spatial patterns, and (3) soft hypergraphs generated by learnable prototypes to reveal latent semantic associations. To further enhance structural modeling and semantic consistency, a Meta-Part Hypergraph Fusion Module abstracts feature streams from the hands, face, and body into unified hypergraph nodes, while leveraging empirically derived co-occurrence priors to model high-order cross-part dependencies. Moreover, an uncertainty-aware collaborative distillation mechanism guides the model to focus on critical body regions. Extensive experiments on standard SLR and SLT benchmarks (e.g., PHOENIX-2014, PHOENIX-2014T, and CSL-Daily) demonstrate that HyperSign not only outperforms existing skeleton-based approaches in both speed and accuracy but also achieves competitive or superior results compared to several state-of-the-art RGB-based methods across multiple evaluation metrics.
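The second pathway above, hypergraphs built from k-nearest neighbors, can be illustrated with a minimal sketch. It assumes one hyperedge per joint, containing that joint and its k closest joints by Euclidean distance; the function name `knn_hyperedges` and the toy coordinates are hypothetical and are not taken from the HyperSign paper.

```python
import math

def knn_hyperedges(points, k):
    """Build one hyperedge per point: the point plus its k nearest neighbours.

    Assumption for illustration only: Euclidean distance on raw coordinates,
    with ties broken by the lower index.
    """
    edges = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.append(sorted([i] + [j for _, j in dists[:k]]))
    return edges

# Toy 2D "joints": three close together and one far away.
joints = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(knn_hyperedges(joints, k=2))
```

Unlike a pairwise edge, each hyperedge here groups three joints at once, which is the sense in which a hypergraph can express higher-order relations.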
IROS Conference 2025 Conference Paper
Articulated object manipulation remains a critical challenge in robotics due to the complex kinematic constraints and the limited physical reasoning of existing methods. In this work, we introduce ArtGS, a novel framework that extends 3D Gaussian Splatting (3DGS) by integrating visual-physical modeling for articulated object understanding and interaction. ArtGS begins with multi-view RGB-D reconstruction, followed by reasoning with a vision-language model (VLM) to extract semantic and structural information, particularly the articulated bones. Through dynamic, differentiable 3DGS-based rendering, ArtGS optimizes the parameters of the articulated bones, ensuring physically consistent motion constraints and enhancing the manipulation policy. By leveraging dynamic Gaussian splatting, cross-embodiment adaptability, and closed-loop optimization, ArtGS establishes a new framework for efficient, scalable, and generalizable articulated object modeling and manipulation. Experiments conducted in both simulation and real-world environments demonstrate that ArtGS significantly outperforms previous methods in joint estimation accuracy and manipulation success rates across a variety of articulated objects. Additional images and videos are available on the project website: sites.google.com/view/artgs.
ICRA Conference 2025 Conference Paper
Modern orchards are planted in structured rows with distinct panel divisions to improve management. Accurate and efficient joint segmentation of point cloud from Panel to Tree and Branch (P2TB) is essential for robotic operations. However, most current segmentation methods focus on single-instance segmentation and depend on a sequence of deep networks to perform joint tasks. This strategy hinders the use of hierarchical information embedded in the data, leading to both error accumulation and increased costs for annotation and computation, which limits its scalability for real-world applications. In this study, we proposed a novel approach that incorporated a Real2Sim L-TreeGen for training data generation and a joint model (J-P2TB) designed for the P2TB task. The J-P2TB model, trained on the generated simulation dataset, was used for joint segmentation of real-world panel point clouds via zero-shot learning. Compared to representative methods, our model outperformed them in most segmentation metrics while using 40% fewer learnable parameters. This Sim2Real result highlighted the efficacy of L-TreeGen in model training and the performance of J-P2TB for joint segmentation, demonstrating its strong accuracy, efficiency, and generalizability for real-world applications. These improvements would not only greatly benefit the development of robots for automated orchard operations but also advance digital twin technology, enabling the facilitation of field robotics across various domains.
EAAI Journal 2025 Journal Article
IROS Conference 2024 Conference Paper
Robotic branch pruning, a rapidly growing field addressing labor shortages in agriculture, requires detailed perception of branch geometry and topology. However, point clouds obtained in agricultural settings often lack completeness, limiting pruning accuracy. This work addressed point cloud quality via a closed-loop approach, (Real2Sim)⁻¹. Leveraging a Real-to-Simulation (Real2Sim) data generation pipeline, we generated simulated 3D apple trees based on realistically characterized apple tree information without manual parameterization. These 3D trees were used to train a simulation-based deep model that jointly performs point cloud completion and skeletonization on real-world partial branches, without extra real-world training. The Sim2Real qualitative results showed the model’s remarkable capability for geometry reconstruction and topology prediction. Additionally, we quantitatively evaluated the Sim2Real performance by comparing branch-level trait characterization errors using raw incomplete data and the best complete data. The Mean Absolute Error (MAE) reduced by 75% and 8% for branch diameter and branch angle estimation, respectively, which indicates the effectiveness of the Real2Sim data in a zero-shot generalization setting. The characterization improvements contributed to the precision and efficacy of robotic branch pruning.
AIIM Journal 2024 Journal Article
AAAI Conference 2023 Short Paper
Integer programming problems (IPs) are challenging to solve efficiently because they are NP-hard, especially at large scale. To solve such IPs, large neighborhood search (LNS) starts from an initial feasible solution and iteratively improves it by searching a large neighborhood around the current solution. However, LNS easily falls into local optima and ignores the correlation between the variables to be optimized, leading to compromised performance. This paper presents a general adaptive constraint partition-based optimization framework (ACP) for large-scale IPs that can efficiently use any existing optimization solver as a subroutine. Specifically, ACP first randomly partitions the constraints into blocks, where the number of blocks is adaptively adjusted to avoid local optima. Then, ACP uses a subroutine solver to optimize the decision variables in a randomly selected block of constraints to enhance the variable correlation. ACP is compared with the LNS framework with different subroutine solvers on four IPs and a real-world IP. The experimental results demonstrate that, within a specified wall-clock time, ACP shows better performance than SCIP and Gurobi.
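The partition-and-optimize loop described above can be sketched on a toy binary program. This is a minimal illustration under stated assumptions, not the authors' implementation: the problem form (maximize c·x subject to Ax ≤ b with binary x), the brute-force stand-in for the subroutine solver, and the rule of using fewer blocks after a round without improvement are all hypothetical choices, as is the name `acp_sketch`.

```python
import itertools
import random

def acp_sketch(c, A, b, x0, num_blocks=4, rounds=20, seed=0):
    """Maximise c.x subject to A x <= b over binary x, from a feasible x0.

    Assumes at least one constraint. Toy sizes only: the stand-in
    subroutine enumerates all assignments of the freed variables.
    """
    rng = random.Random(seed)
    n, m = len(c), len(A)
    x = list(x0)

    def feasible(v):
        return all(sum(a * vi for a, vi in zip(row, v)) <= bi
                   for row, bi in zip(A, b))

    def value(v):
        return sum(ci * vi for ci, vi in zip(c, v))

    for _ in range(rounds):
        # Randomly partition the constraint indices into num_blocks blocks.
        order = list(range(m))
        rng.shuffle(order)
        blocks = [order[i::num_blocks] for i in range(num_blocks)]
        block = rng.choice([blk for blk in blocks if blk])
        # Free the variables appearing in the chosen block; fix the rest.
        free = sorted({j for i in block for j in range(n) if A[i][j] != 0})
        best, best_val = x, value(x)
        # Stand-in subroutine solver: brute force over the freed variables.
        for bits in itertools.product((0, 1), repeat=len(free)):
            cand = list(x)
            for j, bit in zip(free, bits):
                cand[j] = bit
            if feasible(cand) and value(cand) > best_val:
                best, best_val = cand, value(cand)
        if best is x:
            # Assumed adaptation rule: no improvement, so use fewer blocks,
            # which makes each block (and its neighbourhood) larger.
            num_blocks = max(1, num_blocks - 1)
        x = best
    return x

c = [3, 2, 4, 1]
A = [[2, 1, 0, 0], [0, 1, 3, 0], [1, 0, 0, 2], [0, 0, 2, 2]]
b = [2, 3, 2, 3]
print(acp_sketch(c, A, b, x0=[0, 0, 0, 0]))
```

In practice the brute-force step would be replaced by a call to an actual solver on the restricted sub-problem; the sketch only shows how constraint blocks determine which variables are re-optimized in each round.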
ICML Conference 2023 Conference Paper
The latest two-stage optimization framework based on graph neural network (GNN) and large neighborhood search (LNS) is the most popular framework in solving large-scale integer programs (IPs). However, the framework cannot effectively use the embedding spatial information in GNN and still highly relies on large-scale solvers in LNS, resulting in the scale of IP being limited by the ability of the current solver and performance bottlenecks. To handle these issues, this paper presents a GNN&GBDT-guided fast optimizing framework for large-scale IPs that only uses a small-scale optimizer to solve large-scale IPs efficiently. Specifically, the proposed framework can be divided into three stages: Multi-task GNN Embedding to generate the embedding space, GBDT Prediction to effectively use the embedding spatial information, and Neighborhood Optimization to solve large-scale problems fast using the small-scale optimizer. Extensive experiments show that the proposed framework can solve IPs with millions of scales and surpass SCIP and Gurobi in the specified wall-clock time using only a small-scale optimizer with 30% of the problem size. It also shows that the proposed framework can save 99% of running time in achieving the same solution quality as SCIP, which verifies the effectiveness and efficiency of the proposed framework in solving large-scale IPs.
AAAI Conference 2023 Short Paper
Graph convolutional neural network (GCN) based methods have achieved noticeable performance in solving mixed integer programming problems (MIPs). However, the generalization of existing work is limited due to the problem structure. This paper proposes a self-paced learning (SPL) based GCN network (SPGCN) with curriculum learning (CL) to make the most of the samples. SPGCN employs a GCN model to imitate the branching variable selection during the branch and bound process, while the training process is conducted in a self-paced fashion. Specifically, SPGCN contains a loss-based automatic difficulty measurer, where the training loss of the sample represents the difficulty level. In each iteration, a dynamic training dataset is constructed according to the difficulty level for GCN model training. Experiments on four NP-hard datasets verify that CL can lead to generalization improvement and convergence speedup in solving MIPs, where SPL performs better than predefined CL methods.
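The loss-based difficulty measurer can be illustrated with a minimal self-paced selection sketch. It assumes a common hard-threshold form of self-paced learning: a sample enters the training set once its loss is at most a threshold, and the threshold grows each iteration. The function names, the multiplicative growth rule, and the fixed loss values are hypothetical; in real training the losses would be recomputed after every model update.

```python
def self_paced_subset(losses, threshold):
    """Return indices of samples whose current loss is at most threshold."""
    return [i for i, loss in enumerate(losses) if loss <= threshold]

def self_paced_schedule(losses, start, growth, steps):
    """Yield the selected index set at each step as the threshold grows."""
    threshold = start
    for _ in range(steps):
        yield self_paced_subset(losses, threshold)
        threshold *= growth

# Fixed per-sample losses, used here only to show how the dataset expands.
losses = [0.2, 1.5, 0.7, 3.0]
for step, subset in enumerate(self_paced_schedule(losses, start=0.5, growth=2, steps=3)):
    print(step, subset)
```

Easy samples (low loss) are admitted first and harder ones join as the threshold rises, which is the "dynamic training dataset" idea in its simplest form.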
IROS Conference 2023 Conference Paper
Autonomous navigation is crucial for achieving the full automation of agricultural research and production management using agricultural robots. In this paper, we present a vision-based autonomous navigation approach for agricultural robots in trellised cropping systems, which stands out for its remarkable performance achieved entirely without human annotation. We propose a novel learning-based method that directly estimates the path traversability heatmap from an RGB-D image and subsequently converts it into a preferred traversal path. One key advantage of our approach lies in its capability to predict the robot's preferred path directly, allowing us to obtain training labels without manual annotation. Specifically, we propose an automatic annotation pipeline that leverages the robot's path recorded during data collection. Furthermore, we develop a full navigation framework by integrating our path detection model with row switching modules, enabling the robot to smoothly transition between crop rows within the vineyard. We conduct extensive field trials in three different vineyards to validate the performance of our autonomous navigation framework. The results demonstrate that our approach provides a cost-effective, accurate, and robust solution for vineyard navigation.
IROS Conference 2022 Conference Paper
The wheel-bipedal robot has the advantages of both wheeled robots and legged robots, but as a cost, it is more challenging to perform flexible movements in various surroundings while keeping it balanced. The inaccurate dynamics of the robot makes the balance problem even more intractable. To solve this problem, the robot Ollie is used as a testbed. The whole-body control (WBC) framework is adopted to enhance the dexterity of the robot with multiple degrees of freedom in the task space. Moreover, a learning-based adaptive technique is applied to assist the WBC such that the balance controller can be designed in the absence of the accurate dynamics. Physical experiments demonstrate that the robot can manage various actions, with the help of the combination of the WBC and the learning-based adaptive technique.
IROS Conference 2022 Conference Paper
The global grape and wine industry has been considerably impacted by diseases such as downy mildew (DM). Agricultural robots have demonstrated great potential to accurately and rapidly map DM infection for precision applications. Although the robots can autonomously acquire high-resolution images in the vineyard, data processing is mostly performed offline because of network infrastructure and onboard computing power constraints, limiting the use of agricultural robots for field operations. To address this issue, we developed a semantic segmentation model based on the modified DeepLabv3 network for near-real-time DM segmentation in high-resolution images. Compared with state-of-the-art real-time semantic segmentation models, the developed one achieved the best efficiency-accuracy balance on the DM dataset using embedded computing devices that can be easily integrated with commercial robotic platforms. A DM severity estimation pipeline based on the model also showed measurement accuracy and statistical power in differentiating fungicide treatments comparable to those of a pipeline based on offline semantic segmentation models. This enables the use of robotic perception systems for field operations.
NeurIPS Conference 2020 Conference Paper
To learn good joint policies for multi-agent collaboration with incomplete information remains a fundamental challenge. While for two-player zero-sum games, coordinate-ascent approaches (optimizing one agent's policy at a time, e.g., self-play) work with guarantees, in multi-agent cooperative settings they often converge to a sub-optimal Nash equilibrium. On the other hand, directly modeling joint policy changes in incomplete information games is nontrivial due to the complicated interplay of policies (e.g., upstream updates affect downstream state reachability). In this paper, we show that global changes of game values can be decomposed into policy changes localized at each information set, with a novel term named policy-change density. Based on this, we propose Joint Policy Search (JPS), which iteratively improves joint policies of collaborative agents in incomplete information games, without re-evaluating the entire game. On multiple collaborative tabular games, JPS is proven to never worsen performance and can improve solutions provided by unilateral approaches (e.g., CFR), outperforming algorithms designed for collaborative policy learning (e.g., BAD). Furthermore, for real-world games whose states are too many to enumerate, JPS has an online form that naturally links with gradient updates. We test it on Contract Bridge, a 4-player imperfect-information game where a team of 2 collaborates to compete against the other. In its bidding phase, players bid in turn to find a good contract through a limited information channel. Based on a strong baseline agent that bids competitive bridge purely through domain-agnostic self-play, JPS improves collaboration of team players and outperforms WBridge5, a championship-winning software, by +0.63 IMPs (International Matching Points) per board over 1000 games, substantially better than the previous SoTA (+0.41 IMPs/b against WBridge5). Note that +0.1 IMPs/b is regarded as a nontrivial improvement in Computer Bridge.
IJCAI Conference 2018 Conference Paper
In many classification applications, such as software defect prediction and medical diagnosis, the amount of data from different categories usually varies significantly. Under such circumstances, it is essential to propose a proper method to solve the imbalance issue among the data. However, most of the existing methods mainly focus on improving the performance of classifiers rather than searching for an appropriate way to find an effective data space for classification. In this paper, we propose a method named Iterative Metric Learning (IML) to explore the correlations among imbalanced data and construct an effective data space for classification. Given the imbalanced training data, it is important to select a subset of training samples for each testing sample. Thus, we aim to find a more stable neighborhood for the testing data using the iterative metric learning strategy. To evaluate the effectiveness of the proposed method, we have conducted experiments on two groups of datasets, i.e., the NASA Metrics Data Program (NASA) dataset and the UCI Machine Learning Repository (UCI) dataset. Experimental results and comparisons with state-of-the-art methods show that our proposed method performs better.
EAAI Journal 2014 Journal Article