Author name cluster

Fang Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

1 author row

AAAI Conference 2026 Conference Paper

VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness

Qimao Chen
Fang Li
Shaoqing Xu
Zhiyi Lai
Zixun Xie
Yuechen Luo
Shengyin Jiang
Hanbing Li

The safe deployment of autonomous driving (AD) systems is fundamentally hindered by the long-tail problem, where rare yet critical driving scenarios are severely underrepresented in real-world data. Existing solutions, including safety-critical scenario generation and closed-loop learning, often rely on rule-based heuristics, resampling methods, and generative models learned from offline datasets, limiting their ability to produce diverse and novel challenges. While recent works leverage Vision Language Models (VLMs) to produce scene descriptions that guide a separate, downstream model in generating hazardous trajectories for agents, such a two-stage framework constrains the generative potential of VLMs, as the diversity of the final trajectories is ultimately limited by the generalization ceiling of the downstream algorithm. To overcome these limitations, we introduce VILTA (VLM-In-the-Loop Trajectory Adversary), a novel framework that integrates a VLM into the closed-loop training of AD agents. Unlike prior works, VILTA actively participates in the training loop by comprehending the dynamic driving environment and strategically generating challenging scenarios through direct, fine-grained editing of surrounding agents' future trajectories. This direct-editing approach fully leverages the VLM's powerful generalization capabilities to create a diverse curriculum of plausible yet challenging scenarios that extend beyond the scope of traditional methods. We demonstrate that our approach substantially enhances the safety and robustness of the resulting AD policy, particularly in its ability to navigate critical long-tail events.

EAAI Journal 2025 Journal Article

A swarm intelligence framework in complex environments: Optimizing area coverage guidance and control

Jiahao Sun
Sen Han
Shifeng Ding
Lingxiao Yan
Fang Li
Li Zhou

With the development of artificial intelligence technology, deploying multiple Unmanned Surface Vehicles (multi-USVs) enhances efficiency and safety but introduces challenges including environmental disturbances, regulatory compliance (COLREGs), and collision avoidance. This study proposes an integrated framework addressing these through the Theta-Integrated Divide Areas Trajectory Planning (TDAP) algorithm for dynamic-constrained coverage planning, Nonlinear Model Predictive Control (NMPC) for robust trajectory tracking under wind/current variations, and a Dynamic Theta* (D-Theta*) algorithm with virtual obstacle-lines for COLREGs-compliant collision avoidance—including emergency evasion when target ships fail to act. Simulations demonstrate high coverage efficiency, precise trajectory tracking, and consistently safe navigation across diverse scenarios. The framework significantly improves multi-USV coordination and safety in complex environments, enabling reliable autonomous operations without direct human intervention.

AAAI Conference 2025 Conference Paper

GMAP: Generalized Manipulation of Articulated Objects in Robotic Using Pre-trained Model

Hongliang Zeng
Ping Zhang
Fang Li
QinPeng Yi
Tingyu Ye
Jiahua Wang

Perception and interaction with articulated objects present a unique challenge for service robots. Although recent research has emphasized understanding articulated shapes and affordance proposals, existing methods only address isolated aspects, failing to develop comprehensive strategies for robotic perception and manipulation of articulated objects. To bridge this gap, we propose GMAP, which systematically integrates the entire process from command to perception and manipulation. Specifically, we first perform precise part-level segmentation of the object and identify the geometric and kinematic parameters of articulated joints. Then, by evaluating point-level affordance proposals, we determine the interaction poses for the robot's end-effector. Finally, the robot's execution trajectory is dynamically computed by combining commands with joint parameters and interaction points. Additionally, a key innovation of GMAP is addressing the scarcity of annotated data. We designed a multi-scale point cloud feature extraction module and introduced pre-training and fine-tuning techniques, significantly enhancing the generalization capability of the perception model. Extensive experiments demonstrate that GMAP achieves state-of-the-art (SOTA) performance in both the perception and manipulation of articulated objects and adapts to real-world scenarios.

NeurIPS Conference 2025 Conference Paper

Learning Cocoercive Conservative Denoisers via Helmholtz Decomposition for Poisson Imaging Inverse Problems

Deliang Wei
Peng Chen
Haobo Xu
Jiale Yao
Fang Li
Tieyong Zeng

Plug-and-play (PnP) methods with deep denoisers have shown impressive results in imaging problems. They typically require strong convexity or smoothness of the fidelity term and a (residual) non-expansive denoiser for convergence. These assumptions, however, are violated in Poisson inverse problems, and non-expansiveness can hinder denoising performance. To address these challenges, we propose a cocoercive conservative (CoCo) denoiser, which may be (residual) expansive, leading to improved denoising performance. By leveraging the generalized Helmholtz decomposition, we introduce a novel training strategy that combines Hamiltonian regularization to promote conservativeness and spectral regularization to ensure cocoerciveness. We prove that CoCo denoiser is a proximal operator of a weakly convex function, enabling a restoration model with an implicit weakly convex prior. The global convergence of PnP methods to a stationary point of this restoration model is established. Extensive experimental results demonstrate that our approach outperforms closely related methods in both visual quality and quantitative metrics.

TMLR Journal 2025 Journal Article

MagicPose4D: Crafting Articulated Models with Appearance and Motion Control

Hao Zhang
Di Chang
Fang Li
Mohammad Soleymani
Narendra Ahuja

With the success of 2D and 3D visual generative models, there is growing interest in generating 4D content. Existing methods primarily rely on text prompts to produce 4D content, but they often fall short of accurately defining complex or rare motions. To address this limitation, we propose MagicPose4D, a novel framework for refined control over both appearance and motion in 4D generation. Unlike current 4D generation methods, MagicPose4D accepts monocular videos or mesh sequences as motion prompts, enabling precise and customizable motion control. MagicPose4D comprises two key modules: (i) Dual-Phase 4D Reconstruction Module which operates in two phases. The first phase focuses on capturing the model's shape using accurate 2D supervision and less accurate but geometrically informative 3D pseudo-supervision without imposing skeleton constraints. The second phase extracts the 3D motion (skeleton poses) using more accurate pseudo-3D supervision, obtained in the first phase, and introduces kinematic chain-based skeleton constraints to ensure physical plausibility. Additionally, we propose a Global-local Chamfer loss that aligns the overall distribution of predicted mesh vertices with the supervision while maintaining part-level alignment without extra annotations. (ii) Cross-category Motion Transfer Module leverages the extracted motion from the 4D reconstruction module and uses a kinematic-chain-based skeleton to achieve cross-category motion transfer. It ensures smooth transitions between frames through dynamic rigidity, facilitating robust generalization without additional training. Through extensive experiments, we demonstrate that MagicPose4D significantly improves the accuracy and consistency of 4D content generation, outperforming existing methods in various benchmarks.
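For readers unfamiliar with the Chamfer loss mentioned in the abstract above, the following is a minimal sketch of the standard symmetric Chamfer distance between two point sets. It is not the Global-local variant proposed in the paper; the function name `chamfer_distance` and the choice of squared Euclidean distance averaged per direction are assumptions made for illustration.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Illustrative only: uses squared Euclidean distance and averages each
    direction over its source set. This is the plain Chamfer distance,
    not the Global-local Chamfer loss from the paper.
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def squared_distance(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_sided(source, target):
        # For each source point, take the distance to its nearest target point.
        nearest = (min(squared_distance(p, q) for q in target) for p in source)
        return sum(nearest) / len(source)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)
```

Calling `chamfer_distance` on two identical point sets returns zero, and the value grows as predicted vertices drift from the supervision. The brute-force nearest-neighbour search is quadratic, so a spatial index would be needed for large meshes.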

IJCAI Conference 2025 Conference Paper

Pseudo-Label Reconstruction for Partial Multi-Label Learning

Yu Chen
Fang Li
Na Han
Guanbin Li
Hongbo Gao
Sixian Chan
Xiaozhao Fang

In Partial Multi-Label Learning (PML), each instance is associated with a candidate label set containing multiple relevant labels along with other false positive labels. Currently, most PML methods directly extract instance correlation from instance features while ignoring the candidate labels, which may contain more discriminative instance-related information. This paper argues that, with a well-designed model, more accurate instance correlation can be mined from the candidate labels to facilitate label disambiguation. To this end, we propose a novel PML method based on pseudo-label reconstruction (PML-PLR). Specifically, we first propose a novel orthogonal candidate label reconstruction method, which is jointly optimized with instance features to extract more consistent instance correlation. Then, we use the instance correlation as reconstruction coefficients to reconstruct pseudo-labels. Subsequently, through local manifold learning, the reconstructed pseudo-labels are leveraged to propagate the consistency relationship between labels and instances, thereby improving the accuracy of pseudo-labels. Extensive experiments and analyses demonstrate that the proposed PML-PLR outperforms state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes

Fang Li
Hao Zhang
Narendra Ahuja

Although COLMAP has long remained the predominant method for camera parameter optimization in static scenes, it is constrained by its lengthy runtime and reliance on ground truth (GT) motion masks for application to dynamic scenes. Many efforts have attempted to improve it by incorporating more priors as supervision, such as GT focal length, motion masks, 3D point clouds, camera poses, and metric depth, which are typically unavailable in casually captured RGB videos. In this paper, we propose a novel method, dubbed ROS-Cam, for more accurate and efficient camera parameter optimization in dynamic scenes, supervised solely by a single RGB video. Our method consists of three key components: (1) Patch-wise Tracking Filters, to establish robust and maximally sparse hinge-like relations across the RGB video. (2) Outlier-aware Joint Optimization, for efficient camera parameter optimization by adaptive down-weighting of moving outliers, without reliance on motion priors. (3) A Two-stage Optimization Strategy, to enhance stability and optimization speed by a trade-off between the Softplus limits and convex minima in losses. We visually and numerically evaluate our camera estimates. To further validate accuracy, we feed the camera estimates into a 4D reconstruction method and assess the resulting 3D scenes and the rendered 2D RGB and depth maps. We perform experiments on 4 real-world datasets (NeRF-DS, DAVIS, iPhone, and TUM-dynamics) and 1 synthetic dataset (MPI-Sintel), demonstrating that our method estimates camera parameters more efficiently and accurately with a single RGB video as the only supervision.

EAAI Journal 2024 Journal Article

A short-term forecasting for multi-factor time series with multiple linear trend fuzzy information granule and cross-association

Fang Li
Jingxian Ma
Xiyang Yang
Wei Deng

Multi-factor time series forecasting is of great significance in research and application, where the main tasks are capturing data characteristics and associations. For data characteristics, the multiple linear trend fuzzy information granule is developed on multi-factor time series. This kind of granule accurately describes the multi-linear-trend information within the data and offers high semantic and temporal interpretability. To distinguish the diverse trend information hidden in such granules, a fuzzy information granule clustering algorithm is proposed, yielding the multi-factor cluster label series. Notably, each cluster label represents a class of trend patterns. Leveraging the characterized trend information, two multi-factor fuzzy association rules are mined: the multi-factor cluster label association rule and the multi-factor cluster label cross-association rule, reflecting the association and cross-association in multi-factor time series, respectively. By combining the extracted data characteristics with the fuzzy association rules, a short-term forecasting model is designed. Compared with other models in forecasting five stock time series, this model achieves the smallest root mean squared error and mean absolute percentage error values, and the Wilcoxon signed-rank test values for the prediction comparisons are smaller than 0.05. The superiority of the novel forecasting model is demonstrated by its performance across these metrics and indicators.
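The two error metrics named in the abstract above can be computed as in the following minimal sketch. The function names `rmse` and `mape` and the sample values in the usage note are illustrative assumptions, not taken from the paper.

```python
import math


def _check(actual, predicted):
    # Both series must be non-empty and aligned in length.
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")


def rmse(actual, predicted):
    """Root mean squared error between two equal-length series."""
    _check(actual, predicted)
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so that case is rejected.
    """
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)
```

For example, with hypothetical actual prices `[100, 200]` and forecasts `[110, 180]`, each forecast is off by 10% of the actual value, so `mape` is about 10. Lower values of both metrics indicate a closer fit.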

AAAI Conference 2024 Conference Paper

CSL: Class-Agnostic Structure-Constrained Learning for Segmentation Including the Unseen

Hao Zhang
Fang Li
Lu Qi
Ming-Hsuan Yang
Narendra Ahuja

Addressing Out-Of-Distribution (OOD) Segmentation and Zero-Shot Semantic Segmentation (ZS3) is challenging, as both require segmenting unseen classes. Existing strategies adapt the class-agnostic Mask2Former (CA-M2F) to specific tasks. However, these methods cater to single tasks and demand training from scratch, and we demonstrate certain deficiencies in CA-M2F that affect performance. We propose Class-Agnostic Structure-Constrained Learning (CSL), a plug-in framework that can integrate with existing methods, thereby embedding structural constraints and achieving performance gains on tasks that include the unseen, specifically OOD, ZS3, and domain adaptation (DA). There are two schemes for CSL to integrate with existing methods: (1) by distilling knowledge from a base teacher network, enforcing constraints across the training and inference phases, or (2) by leveraging established models to obtain per-pixel distributions without retraining, appending constraints during the inference phase. Our soft assignment and mask split methodologies enhance OOD object segmentation. Empirical evaluations demonstrate that CSL boosts the performance of existing algorithms spanning OOD segmentation, ZS3, and DA segmentation, consistently surpassing the state of the art across all three tasks.

IJCAI Conference 2024 Conference Paper

MARS: Multimodal Active Robotic Sensing for Articulated Characterization

Hongliang Zeng
Ping Zhang
Chengjiong Wu
Jiahua Wang
Tingyu Ye
Fang Li

Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point clouds, a single-modal approach, often neglecting vital texture and lighting details and assuming ideal conditions such as optimal viewpoints, which are unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characterization. It features a multi-modal fusion module utilizing multi-scale RGB features to enhance point cloud features, coupled with reinforcement learning-based active sensing for autonomous optimization of observation viewpoints. In experiments conducted with various articulated object instances from the PartNet-Mobility dataset, our method outperformed current state-of-the-art methods in joint parameter estimation accuracy. Additionally, through active sensing, MARS further reduces errors, demonstrating enhanced efficiency in handling suboptimal viewpoints. Furthermore, our method effectively generalizes to real-world articulated objects, enhancing robot interactions. Code is available at https://github.com/robhlzeng/MARS.

AIIM Journal 2024 Journal Article

OphGLM: An ophthalmology large language-and-vision assistant

Zhuo Deng
Weihao Gao
Chucheng Chen
Zhiyuan Niu
Zheng Gong
Ruiheng Zhang
Zhenjie Cao
Fang Li

Vision computer-aided diagnostic methods have been used in early ophthalmic disease screening and diagnosis. However, the limited output formats of these methods lead to poor human–computer interaction and low clinical applicability. Thus, ophthalmic visual question answering is worth studying. Unfortunately, no practical solutions existed before Large Language Models (LLMs). In this paper, we investigate the ophthalmic visual diagnostic interaction problem. We construct an ophthalmology large language-and-vision assistant, OphGLM, consisting of an image encoder, a text encoder, a fusion module, and an LLM module. We establish a new Chinese ophthalmic fine-tuning dataset, FundusTuning-CN, including the fundus instruction and conversation sets. Based on FundusTuning-CN, we establish a novel LLM-tuning strategy to introduce visual model understanding and ophthalmic knowledge into LLMs at low cost and with high efficiency. Leveraging the pre-training of the image encoder, OphGLM demonstrates strong visual understanding and surpasses open-source visual language models in common fundus disease classification tasks. FundusTuning-CN enables OphGLM to surpass open-source medical LLMs in both ophthalmic knowledge and interactive capabilities. Our proposed OphGLM has the potential to revolutionize clinical applications in ophthalmology. The dataset, code, and models will be publicly available at https://github.com/ML-AILab/OphGLM.

EAAI Journal 2022 Journal Article

Incorporate long association into high-order fuzzy logical relationship based time series forecasting

Fang Li
Chen Liu
Xiyang Yang

In fuzzy logical relationship (FLR) based forecasting models, FLRs play a vital role. In each FLR of such models, the observations engaged in the premise and consequent are consecutive in time. The association (named short association in this paper) implied by the premise and consequent of such an FLR can reflect some of the properties and regularities hidden in a time series, which has been verified by successful applications. However, there exist other kinds of associations that cannot be described by short association. In this paper, we propose the concept of long association to describe these associations, and long-association FLRs to reflect them. For a long-association FLR, the observations engaged in the premise and consequent are not consecutive in time. This new kind of FLR is more plentiful and can overcome one deficiency of existing FLR-based forecasting models: at some prediction moments, it often happens that no FLR is available for forecasting. Starting from long-association FLRs, we construct trend long-association FLRs to reflect the trend long associations in time series and design a novel forecasting model. The advantages of the proposed (trend) long-association FLRs and the superiority of the proposed model are verified in experiments that compare it with other forecasting models.

YNICL Journal 2018 Journal Article

Voxel-based comparison of brain glucose metabolism between patients with Cushing's disease and healthy subjects

Shuai Liu
Yinyan Wang
Kaibin Xu
Fan Ping
Fang Li
Renzhi Wang
Xin Cheng

F]-fluorodeoxyglucose positron emission tomography (FDG PET), between 92 patients with CD and 118 normal subjects on a voxel-wise basis. Pearson correlation was performed to evaluate the association between cerebral FDG uptake and serum cortisol level in patients with CD. We demonstrated that certain brain regions in patients with CD showed significantly increased FDG uptake, including the basal ganglia, anteromedial temporal lobe, thalamus, precentral cortex, and cerebellum. The clusters that demonstrated significantly decreased uptake were mainly located in the medial and lateral frontal cortex, superior and inferior parietal lobule, medial occipital cortex, and insular cortex. The metabolic rate of the majority of these regions was found to be significantly correlated with the serum cortisol level. Our findings may help to explain the underlying mechanisms of cognitive impairment and psychiatric symptoms in patients exposed to excessive glucocorticoids and evaluate the efficacy of treatments during follow-up.

YNICL Journal 2016 Journal Article

Brain glucose metabolism is associated with hormone level in Cushing's disease: A voxel-based study using FDG-PET

Shuai Liu
Yinyan Wang
Kaibin Xu
Fan Ping
Renzhi Wang
Fang Li
Xin Cheng

Chronic exposure to elevated levels of glucocorticoids can exert a neurotoxic effect in patients, possibly manifesting as molecular imaging alterations. The aim of this study was to investigate the potential association between brain metabolism and elevated hormone level using (18)F-fluorodeoxyglucose positron emission tomography. We retrospectively enrolled 92 consecutive patients with a confirmed diagnosis of Cushing's disease. A voxel-based analysis was performed to investigate the association between cerebral (18)F-fluorodeoxyglucose uptake and serum cortisol level. Relatively impaired metabolism in specific brain regions was found to correlate with serum cortisol level. Specifically, notable correlations were found in the hippocampus, amygdala, and cerebellum, regions considered to be involved in the regulation and central action of glucocorticoids. Moreover, some hormone-associated regions were found in the frontal and occipital cortex, possibly mediating the cognitive changes seen in Cushing's disease. Our findings link patterns of perturbed brain metabolism to individual hormone levels, thus presenting a substrate for the cognitive disturbances seen in Cushing's disease patients, as well as in other conditions with abnormal cortisol levels.
