Arrow Research search

Author name cluster

Cheng Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

55 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Beyond Adapter Retrieval: Latent Geometry-Preserving Composition via Sparse Task Projection

  • Pengfei Jin
  • Peng Shu
  • Sifan Song
  • Sekeun Kim
  • Qing Xiao
  • Cheng Chen
  • Tianming Liu
  • Xiang Li

Recent advances in parameter-efficient transfer learning have demonstrated the utility of composing LoRA adapters from libraries of pretrained modules. However, most existing approaches rely on simple retrieval heuristics or uniform averaging, which overlook the latent structure of task relationships in representation space. We propose a new framework for adapter reuse that moves beyond retrieval, formulating adapter composition as a geometry-aware sparse reconstruction problem. Specifically, we represent each task by a latent prototype vector derived from the base model’s encoder and aim to approximate the target task prototype as a sparse linear combination of retrieved reference prototypes, under an L1-regularized optimization objective. The resulting combination weights are then used to blend the corresponding LoRA adapters, yielding a composite adapter tailored to the target task. This formulation not only preserves the local geometric structure of the task representation manifold, but also promotes interpretability and efficient reuse by selecting a minimal set of relevant adapters. We demonstrate the effectiveness of our approach across multiple domains—including medical image segmentation, medical report generation and image synthesis. Our results highlight the benefit of coupling retrieval with latent geometry-aware optimization for improved zero-shot generalization.
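As a rough illustration of the sparse reconstruction idea in this abstract, the sketch below solves an L1-regularized least-squares problem with ISTA (iterative soft-thresholding) and uses the resulting weights to blend flattened adapter vectors. The solver choice, the function names (`sparse_weights`, `blend`), the linear blending rule and the toy prototypes are all assumptions made for illustration; the abstract does not specify any of them.

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x| (shrinks x toward zero)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sparse_weights(target, refs, lam=0.05, iters=500):
    """Approximately minimise 0.5*||target - sum_j w_j*refs[j]||^2 + lam*||w||_1 with ISTA.

    `target` is a prototype vector and `refs` a list of reference prototypes
    (hypothetical inputs; the paper derives prototypes from the base model's encoder).
    """
    k, d = len(refs), len(target)
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant.
    lipschitz = sum(v * v for r in refs for v in r) or 1.0
    step = 1.0 / lipschitz
    w = [0.0] * k
    for _ in range(iters):
        # Residual of the current reconstruction.
        resid = [sum(w[j] * refs[j][i] for j in range(k)) - target[i] for i in range(d)]
        # Gradient of the smooth term, then a soft-thresholded gradient step.
        grad = [sum(refs[j][i] * resid[i] for i in range(d)) for j in range(k)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(k)]
    return w


def blend(adapters, weights):
    """Weighted sum of flattened adapter parameter vectors (an assumed blending rule)."""
    n = len(adapters[0])
    return [sum(wt * a[i] for wt, a in zip(weights, adapters)) for i in range(n)]


# Toy example: orthonormal reference prototypes, so each weight is the
# soft-thresholded target coordinate and the unused reference gets weight zero.
refs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = sparse_weights([0.8, 0.0, 0.3], refs, lam=0.05)
```

The L1 penalty is what drives irrelevant references to exactly zero weight, which matches the abstract's point about selecting a minimal set of adapters.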

AAAI Conference 2026 Conference Paper

DGP: A Dual-Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs

  • Yuan Li
  • Jun Hu
  • Bryan Hooi
  • Bingsheng He
  • Cheng Chen

Real-world fraud detection applications benefit from graph learning techniques that jointly exploit node features—often rich in textual data—and graph structural information. Recently, Graph-Enhanced LLMs have emerged as a promising graph learning approach that converts graph information into prompts, exploiting LLMs' ability to reason over both textual and structural information. Among them, text-only prompting, which converts graph information into prompts consisting solely of text tokens, offers a solution that relies only on LLM tuning without requiring additional graph-specific encoders. However, text-only prompting struggles on heterogeneous fraud-detection graphs: multi-hop relations expand exponentially with each additional hop, leading to rapidly growing neighborhoods associated with dense textual information. These neighborhoods may overwhelm the model with long, irrelevant content in the prompt and suppress key signals from the target node, thereby degrading performance. To address this challenge, we propose Dual Granularity Prompting (DGP), which mitigates information overload by preserving fine-grained textual details for the target node while summarizing neighbor information into coarse-grained text prompts. DGP introduces tailored summarization strategies for different data modalities—bi-level semantic abstraction for textual fields and statistical aggregation for numerical features—enabling effective compression of verbose neighbor content into concise, informative prompts. Experiments across public and industry datasets demonstrate that DGP operates within a manageable token budget while improving fraud detection performance by up to 6.8% (AUPRC) over state-of-the-art methods, showing the potential of Graph-Enhanced LLMs for fraud detection.
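The dual-granularity idea can be pictured with a small sketch: the target node's text is kept verbatim, while a numerical neighbor feature is compressed into a few summary statistics. The field names, the prompt layout and the helper functions (`summarize_numeric`, `build_prompt`) are hypothetical; the abstract does not describe DGP's actual prompt template or which statistics it uses.

```python
from statistics import mean


def summarize_numeric(neighbors, field):
    """Collapse one numerical field over all neighbors into a short text summary."""
    values = [n[field] for n in neighbors if field in n]
    if not values:
        return f"{field}: no data"
    return (
        f"{field}: count={len(values)}, mean={mean(values):.2f}, "
        f"min={min(values)}, max={max(values)}"
    )


def build_prompt(target_text, neighbors, numeric_fields):
    """Fine-grained text for the target node, coarse statistics for its neighbors."""
    lines = [f"Target node: {target_text}", "Neighbor summary:"]
    lines += [f"- {summarize_numeric(neighbors, f)}" for f in numeric_fields]
    return "\n".join(lines)


# Hypothetical neighborhood: two neighbors carry an "amount" feature, one does not.
neighbors = [{"amount": 10}, {"amount": 30}, {"other": 1}]
prompt = build_prompt("account opened last week, 3 chargebacks", neighbors, ["amount"])
```

However many neighbors there are, each numerical field contributes one line to the prompt, which is the kind of bounded token budget the abstract refers to.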

AAAI Conference 2026 Conference Paper

Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience

  • Zicheng Hu
  • Yuchen Wang
  • Cheng Chen

Decentralized cooperative multi-agent multi-armed bandits (DeCMA2B) considers how multiple agents collaborate in a decentralized multi-armed bandit setting. Though this problem has been extensively studied in previous work, most existing methods remain susceptible to various adversarial attacks. In this paper, we first study DeCMA2B with adversarial corruption, where an adversary can corrupt reward observations of all agents with a limited corruption budget. We propose a robust algorithm, called DeMABAR, which ensures that each agent’s individual regret suffers only an additive term proportional to the corruption budget. Then we consider a more realistic scenario where the adversary can only attack a small number of agents. Our theoretical analysis shows that the DeMABAR algorithm can also almost completely eliminate the influence of adversarial attacks and is inherently robust in the Byzantine setting, where an unknown fraction of the agents can be Byzantine, i.e., may arbitrarily select arms and communicate wrong information. We also conduct numerical experiments to illustrate the robustness and effectiveness of the proposed method.

AAAI Conference 2026 Conference Paper

Tackling Dual-stage Missing Modalities in Brain Tumor Segmentation via Robust Modality Reconstruction and Prompt-guided Modality Adaptation

  • Yunpeng Zhao
  • Cheng Chen
  • Qing You Pang
  • Yibing Fu
  • Quanzheng Li
  • Carol Tang
  • Beng Ti Ang
  • Yueming Jin

Addressing missing modalities is a critical challenge in multimodal brain tumor segmentation. Most existing approaches merely handle modality-incomplete inputs during inference, assuming a full set of modalities for all training samples. However, this unrealistic assumption limits the usage of abundant modality-incomplete data commonly observed in clinical practice. In this paper, we explore a more practical task of tackling missing modalities during both training and inference. We propose a universal model featuring robust modality reconstruction and prompt-guided modality adaptation. Our mask-reconstruction pre-training enables robust modality-invariant representation learning, during which we design a novel distribution approximation method that supervises the reconstruction of absent modalities without requiring full-modal training data. Afterwards, when adapting our model to the segmentation task, we introduce the complete-then-distill (CTD) paradigm, which first estimates missing modalities in training samples from the available ones, and then distills the knowledge from the reconstructed full-modal representations to enhance learning from modality-incomplete data. Moreover, we propose prompt-guided modality adaptation to personalize a subset of model parameters during CTD, enabling the model to adapt to each distinct modality input scenario by using prompts with rich visual-textual information. Extensive experiments on two brain tumor segmentation benchmarks show our method consistently surpasses previous state-of-the-art approaches under dual-stage missing modality settings across various missing ratios.

AAAI Conference 2026 Conference Paper

Unleashing the Power of Image-Tabular Self-Supervised Learning via Breaking Cross-Tabular Barriers

  • Yibing Fu
  • Yunpeng Zhao
  • Zhitao Zeng
  • Cheng Chen
  • Yueming Jin

Multi-modal learning integrating medical images and tabular data has significantly advanced clinical decision-making in recent years. Self-Supervised Learning (SSL) has emerged as a powerful paradigm for pretraining these models on large-scale unlabeled image-tabular data, aiming to learn discriminative representations. However, existing SSL methods for image-tabular representation learning are often confined to specific data cohorts, mainly due to their rigid tabular modeling mechanisms when modeling heterogeneous tabular data. This inter-tabular barrier hinders multi-modal SSL methods from effectively learning transferable medical knowledge shared across diverse cohorts. In this paper, we propose a novel SSL framework, namely CITab, designed to learn powerful multi-modal feature representations in a cross-tabular manner. We design the tabular modeling mechanism from a semantic-awareness perspective by integrating column headers as semantic cues, which facilitates transferable knowledge learning and scalability in utilizing multiple data sources for pretraining. Additionally, we propose a prototype-guided mixture-of-linear layer (P-MoLin) module for tabular feature specialization, empowering the model to effectively handle the heterogeneity of tabular data and explore the underlying medical concepts. We conduct comprehensive evaluations on an Alzheimer's disease diagnosis task across three publicly available data cohorts containing 4,461 subjects. Experimental results demonstrate that CITab outperforms state-of-the-art approaches, paving the way for effective and scalable cross-tabular multi-modal learning.

AAAI Conference 2026 Conference Paper

Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction

  • Cheng Chen
  • Hao Huang
  • Saurabh Bagchi

Collaborative perception enables connected vehicles to share information, overcoming occlusions and extending the limited sensing range inherent in single-agent (non-collaborative) systems. Existing vision-only methods for 3D semantic occupancy prediction commonly rely on dense 3D voxels, which incur high communication costs, or 2D planar features, which require accurate depth estimation or additional supervision, limiting their applicability to collaborative scenarios. To address these challenges, we propose the first approach leveraging sparse 3D semantic Gaussian splatting for collaborative 3D semantic occupancy prediction. By sharing and fusing intermediate Gaussian primitives, our method provides three benefits: a neighborhood-based cross-agent fusion that removes duplicates and suppresses noisy or inconsistent Gaussians; a joint encoding of geometry and semantics in each primitive, which reduces reliance on depth supervision and allows simple rigid alignment; and sparse, object-centric messages that preserve structural information while reducing communication volume. Extensive experiments demonstrate that our approach outperforms single-agent perception and baseline collaborative methods by +8.42 and +3.28 points in mIoU, and +5.11 and +22.41 points in IoU, respectively. When further reducing the number of transmitted Gaussians, our method still achieves a +1.9 improvement in mIoU, using only 34.6% communication volume, highlighting robust performance under limited communication budgets.

NeurIPS Conference 2025 Conference Paper

A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond

  • Zicheng Hu
  • Cheng Chen

We investigate various stochastic bandit problems in the presence of adversarial corruptions. A seminal work for this problem is the BARBAR algorithm (Gupta et al., 2019), which achieves both robustness and efficiency. However, it suffers from a regret of $O(KC)$, which does not match the lower bound of $\Omega(C)$, where $K$ denotes the number of arms and $C$ denotes the corruption level. In this paper, we first improve the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ to achieve an optimal regret bound up to a logarithmic factor. We also extend BARBAT to various settings, including multi-agent bandits, graph bandits, combinatorial semi-bandits and batched bandits. Compared with the Follow-the-Regularized-Leader framework, our methods are more amenable to parallelization, making them suitable for multi-agent and batched bandit settings, and they incur lower computational costs, particularly in semi-bandit problems. Numerical experiments verify the efficiency of the proposed methods.

ICML Conference 2025 Conference Paper

ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization

  • Wenhao Shen
  • Wanqi Yin
  • Xiaofeng Yang
  • Cheng Chen
  • Chaoyue Song
  • Zhongang Cai
  • Lei Yang 0045
  • Hao Wang 0094

Human mesh recovery (HMR) from a single image is inherently ill-posed due to depth ambiguity and occlusions. Probabilistic methods have tried to solve this by generating numerous plausible 3D human mesh predictions, but they often exhibit misalignment with 2D image observations and weak robustness to in-the-wild images. To address these issues, we propose ADHMR, a framework that Aligns a Diffusion-based HMR model in a preference optimization manner. First, we train a human mesh prediction assessment model, HMR-Scorer, capable of evaluating predictions even for in-the-wild images without 3D annotations. We then use HMR-Scorer to create a preference dataset, where each input image has a pair of winner and loser mesh predictions. This dataset is used to finetune the base model using direct preference optimization. Moreover, HMR-Scorer also helps improve existing HMR models by data cleaning, even with fewer training samples. Extensive experiments show that ADHMR outperforms current state-of-the-art methods. Code is available at: https://github.com/shenwenhao01/ADHMR.
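For readers unfamiliar with direct preference optimization, the sketch below computes the generic DPO loss for a single winner/loser pair from log-likelihoods under the trained model and a frozen reference model. It is the standard log-likelihood form of the objective, not necessarily what ADHMR uses: the abstract does not say how these quantities are obtained for a diffusion model, and the function name and `beta` default are assumptions.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Generic DPO loss for one preference pair.

    logp_w / logp_l: log-likelihood of the winner / loser under the model being tuned.
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.
    """
    # Scaled difference of the implicit rewards assigned to winner and loser.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A model identical to the reference has zero margin, so the loss is log(2).
baseline = dpo_loss(0.0, 0.0, 0.0, 0.0)
# Raising the winner's likelihood relative to the loser's lowers the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss only depends on how far the tuned model has moved from the reference on the winner relative to the loser, which is why no separate reward model is needed at finetuning time.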

JBHI Journal 2025 Journal Article

Automated Pediatric Delirium Recognition via Deep Learning-Powered Video Analysis

  • Jiarong Chen
  • Suqin Xia
  • Wenqi Shi
  • Yemin Gong
  • Yali Huang
  • Lixiang Gu
  • Xiaoyu Lin
  • Haibao Chen

Delirium is an acute, fluctuating state of consciousness disturbance characterized by cognitive alterations and perceptual disturbances. Pediatric delirium has a notably higher incidence rate than adult delirium, and it is time-consuming and labor-intensive for clinicians to analyze, requiring effective recognition approaches. Deep learning has shown potential for the extraction of robust representations and improvement of patient outcomes. In this study, 129 video samples labeled by professional clinicians were collected from multiple hospitals, including 74 non-delirium and 55 delirium labeled samples. An 18-layer deep spatiotemporal convolutional neural network is employed, in which two-dimensional and one-dimensional convolutional filters are applied to individual video frames to extract frame-level and inter-frame-level features, respectively. The entire architecture is pretrained on a large-scale video analysis dataset, and a three-layer fully connected classification head is integrated for the delirium recognition task. The proposed model was fine-tuned with a training dataset and evaluated on a testing dataset, exploring various models and strategies. The proposed algorithm demonstrated robust classification performance, achieving an accuracy of 0.8718, precision of 0.8711, recall of 0.8730, and F1-score of 0.8715, with approximately 31.54 million model parameters. These metric results validate the clinical applicability and technical reliability of the model under various training and testing strategies. In addition, the developed delirium classification model is deployed in a hospital system to enable intelligent video diagnosis. The independent test accuracy for 100 newly collected samples is 0.8800. Therefore, the proposed algorithm enables new methods for pediatric delirium recognition and cures.

JBHI Journal 2025 Journal Article

Development of a tongue image-based machine learning tool for the diagnosis of colorectal cancer: a prospective multicentre clinical cohort study

  • Xiaohe Sun
  • Letian Huang
  • Libo Qu
  • Cheng Chen
  • Xing Zeng
  • Zuojian Zhou
  • Hongyan Li
  • Jin Sun

Colorectal cancer (CRC) remains a persistent major global health burden, with traditional diagnostic methods like colonoscopy suffering from suboptimal patient compliance rates. This study develops an intelligent diagnostic model based on tongue images to assist in CRC diagnosis, leveraging the integrative potential of traditional tongue diagnosis and modern machine learning. Between June 2023 and July 2024, we collected and processed 1,389 tongue images from CRC patients and 1,543 from non-colorectal cancer (NCRC) participants. Our methodology combines innovative image segmentation using the Segment Anything Model (SAM) with Grounding DINO, extracts both hand-crafted features (color, texture, shape) and deep learning features via Swin-Transformer, and employs feature fusion and selection techniques. The diagnostic model achieves an accuracy of 87.93% (F1-score: 0.9072) in internal validation. In an independent external cohort of 119 CRC patients and 221 NCRC participants, it demonstrates 85.18% precision (recall: 85%, F1-score: 0.8507). This noninvasive, cost-effective approach demonstrates significant potential as a complementary screening tool for CRC, particularly in regions with limited access to conventional diagnostic resources.

JBHI Journal 2025 Journal Article

MediViSTA: Medical Video Segmentation Via Temporal Fusion SAM Adaptation for Echocardiography

  • Sekeun Kim
  • Pengfei Jin
  • Cheng Chen
  • Kyungsang Kim
  • Zhiliang Lyu
  • Hui Ren
  • Sunghwan Kim
  • Zhengliang Liu

Despite achieving impressive results in general-purpose semantic segmentation with strong generalization on natural images, the Segment Anything Model (SAM) has shown less precision and stability in medical image segmentation. In particular, the original SAM architecture is designed for 2D natural images and therefore does not support three-dimensional information, which is particularly important for medical imaging modalities that are often volumetric or video data. In this paper, we introduce MediViSTA, a parameter-efficient fine-tuning method designed to adapt the vision foundation model for medical video, with a specific focus on echocardiography segmentation. To achieve spatial adaptation, we propose a frequency feature fusion technique that injects spatial frequency information from a CNN branch. For temporal adaptation, we integrate temporal adapters within the transformer blocks of the image encoder. Using a fine-tuning strategy, only a small subset of pre-trained parameters is updated, allowing efficient adaptation to echocardiography data. The effectiveness of our method has been comprehensively evaluated on three datasets, comprising two public datasets and one multi-center in-house dataset. Our method consistently outperforms various state-of-the-art approaches without using any prompts. Furthermore, our model exhibits strong generalization capabilities on unseen datasets, surpassing the second-best approach by 2.15% in Dice and 0.09 in temporal consistency. The results demonstrate the potential of MediViSTA to significantly advance echocardiography video segmentation, offering improved accuracy and robustness in cardiac assessment applications.

JBHI Journal 2025 Journal Article

Multi-Organ Segmentation From Partially Labeled and Unaligned Multi-Modal MRI in Thyroid-Associated Orbitopathy

  • Cheng Chen
  • Min Deng
  • Yuan Zhong
  • Jinyue Cai
  • Karen Kar Wun Chan
  • Qi Dou
  • Kelvin Kam Lung Chong
  • Pheng-Ann Heng

Thyroid-associated orbitopathy (TAO) is a prevalent inflammatory autoimmune disorder, leading to orbital disfigurement and visual disability. Automatic comprehensive segmentation tailored for quantitative multi-modal MRI assessment of TAO holds enormous promise but is still lacking. In this paper, we propose a novel method, named cross-modal attentive self-training (CMAST), for the multi-organ segmentation in TAO using partially labeled and unaligned multi-modal MRI data. Our method first introduces a dedicatedly designed cross-modal pseudo label self-training scheme, which leverages self-training to refine the initial pseudo labels generated by cross-modal registration, so as to complete the label sets for comprehensive segmentation. With the obtained pseudo labels, we further devise a learnable attentive fusion module to aggregate multi-modal knowledge based on learned cross-modal feature attention, which relaxes the requirement of pixel-wise alignment across modalities. A prototypical contrastive learning loss is further incorporated to facilitate cross-modal feature alignment. We evaluate our method on a large clinical TAO cohort with 100 cases of multi-modal orbital MRI. The experimental results demonstrate the promising performance of our method in achieving comprehensive segmentation of TAO-affected organs on both T1 and T1c modalities, outperforming previous methods by a large margin. Our code is available at: https://github.com/cchen-cc/CMAST.

JBHI Journal 2025 Journal Article

TD-SAM: Temporal and Distance-Guided Adaptations of SAM for Accurate Surgical Instrument Segmentation

  • Cheng Xue
  • Shiyu Zhao
  • Danqiong Wang
  • Cheng Chen
  • Guanyu Yang
  • Yang Chen

Accurate automatic surgical instrument segmentation plays a crucial role in robot-assisted surgery, but analyzing surgical videos remains challenging due to factors such as rapid instrument movements, high inter-category similarity, and frequent object occlusions. Current surgical instrument segmentation models struggle to capture both inter-frame variations and intra-frame details in complex surgical scenarios. The Segment Anything Model (SAM) has shown significant potential in various segmentation tasks. However, it has not fully addressed the unique challenges posed by surgical videos. To tackle these issues, we propose a Temporal and Distance-Guided SAM model (TD-SAM) for accurate surgical instrument segmentation. Specifically, we introduce a dynamic cross-frame attention module that effectively captures temporal information across frames, allowing the model to track the dynamic changes of surgical instruments and their environment, thus improving segmentation accuracy. In addition, we present a distance-guided instance refinement module, which enhances the model's ability to distinguish between similar categories, mitigating the class ambiguity caused by inter-category similarity. Extensive experiments on the EndoVis18 and EndoVis17 datasets show that the proposed TD-SAM model outperforms existing models, achieving state-of-the-art performance without using any prompts.

ICLR Conference 2025 Conference Paper

Text-to-Image Rectified Flow as Plug-and-Play Priors

  • Xiaofeng Yang
  • Cheng Chen
  • Xulei Yang
  • Fayao Liu
  • Guosheng Lin

Large-scale diffusion models have achieved remarkable performance in generative tasks. Beyond their initial training applications, these models have proven their ability to function as versatile plug-and-play priors. For instance, 2D diffusion models can serve as loss functions to optimize 3D implicit models. Rectified Flow, a novel class of generative models, has demonstrated superior performance across various domains. Compared to diffusion-based methods, rectified flow approaches surpass them in terms of generation quality and efficiency. In this work, we present theoretical and experimental evidence demonstrating that rectified flow based methods offer similar functionalities to diffusion models — they can also serve as effective priors. Besides the generative capabilities of diffusion priors, motivated by the unique time-symmetry properties of rectified flow models, a variant of our method can additionally perform image inversion. Experimentally, our rectified flow based priors outperform their diffusion counterparts — the SDS and VSD losses — in text-to-3D generation. Our method also displays competitive performance in image inversion and editing. Code is available at: https://github.com/yangxiaofeng/rectified_flow_prior.

TMLR Journal 2025 Journal Article

Unlearning Misalignment for Personalized LLM Adaptation via Instance-Response-Dependent Discrepancies

  • Cheng Chen
  • Atsushi Nitanda
  • Ivor Tsang

While Large Language Models (LLMs) have revolutionized chatbot interactions, they often fall short of aligning responses with the nuanced preferences of individual users, a challenge rooted in the inherently subjective and proprietary nature of those preferences. Consequently, prompt-based learning, though effective in enhancing factual accuracy due to its emphasis on universal correctness, remains insufficient for achieving accurate personalised response alignment. Because user preferences vary widely across individuals and contexts, aligning responses requires a more personalized and context-aware approach. To address this limitation, we propose Consistent Marginalization (CM), a novel framework that aims to unlearn misalignment by constructing a personalised memory bank of instance-response-dependent discrepancies, built from a small set of user preference samples. This personalised memory bank equips LLMs with the ability to understand, recall, and adapt to individual preferences, enabling more consistent and personalized responses. Evaluated across a diverse range of domain-specific datasets and model architectures, CM yields notable improvements in response alignment and robustness. We believe Consistent Marginalization represents a valuable step toward enabling LLMs to become genuinely personable and adaptive conversational agents by understanding user preferences and generating responses that are better aligned with individual user expectations.

NeurIPS Conference 2024 Conference Paper

CoIN: A Benchmark of Continual Instruction Tuning for Multimodel Large Language Models

  • Cheng Chen
  • Junchen Zhu
  • Xu Luo
  • Heng T. Shen
  • Jingkuan Song
  • Lianli Gao

Instruction tuning demonstrates impressive performance in adapting Multimodal Large Language Models (MLLMs) to follow task instructions and improve generalization ability. By extending tuning across diverse tasks, MLLMs can further enhance their understanding of world knowledge and instruction intent. However, continual instruction tuning has been largely overlooked and there are no public benchmarks available. In this paper, we present CoIN, a comprehensive benchmark tailored for assessing the behavior of existing MLLMs under continual instruction tuning. CoIN comprises 10 meticulously crafted datasets spanning 8 tasks, ensuring diversity and serving as a robust evaluation framework to assess crucial aspects of continual instruction tuning, such as task order, instruction diversity and volume. Additionally, apart from traditional evaluation, we design another LLM-based metric to assess the knowledge preserved within MLLMs for reasoning. Following an in-depth evaluation of several MLLMs, we demonstrate that they still suffer from catastrophic forgetting, and that the failure in instruction alignment, rather than the forgetting of reasoning knowledge, is mainly responsible. To this end, we introduce MoELoRA, which is effective in retaining the previous instruction alignment.

ICLR Conference 2024 Conference Paper

Fast Updating Truncated SVD for Representation Learning with Sparse Matrices

  • Haoran Deng
  • Yang Yang 0009
  • Jiahe Li 0008
  • Cheng Chen
  • Weihao Jiang
  • Shiliang Pu

Updating truncated Singular Value Decomposition (SVD) has extensive applications in representation learning. The continuous evolution of massive-scale data matrices in practical scenarios highlights the importance of aligning SVD-based models with fast-paced updates. Recent methods for updating truncated SVD can be recognized as Rayleigh-Ritz projection procedures where their projection matrices are augmented based on the original singular vectors. However, the updating process in these methods densifies the update matrix and applies the projection to all singular vectors, resulting in inefficiency. This paper presents a novel method for dynamically approximating the truncated SVD of a sparse and temporally evolving matrix. The proposed method takes advantage of sparsity in the orthogonalization process of the augmented matrices and employs an extended decomposition to store projections in the column space of singular vectors independently. Numerical experimental results on updating truncated SVD for evolving sparse matrices show an order of magnitude improvement in the efficiency of our proposed method while maintaining precision compared to previous methods.

NeurIPS Conference 2024 Conference Paper

Few-Shot Diffusion Models Escape the Curse of Dimensionality

  • Ruofeng Yang
  • Bo Jiang
  • Cheng Chen
  • Ruinan Jin
  • Baoxiang Wang
  • Shuai Li

While diffusion models have demonstrated impressive performance, there is a growing need for generating samples tailored to specific user-defined concepts. The customized requirements promote the development of few-shot diffusion models, which use limited $n_{ta}$ target samples to fine-tune a pre-trained diffusion model trained on $n_s$ source samples. Despite the empirical success, no theoretical work specifically analyzes few-shot diffusion models. Moreover, the existing results for diffusion models without a fine-tuning phase cannot explain why few-shot models generate great samples due to the curse of dimensionality. In this work, we analyze few-shot diffusion models under a linear structure distribution with a latent dimension $d$. From the approximation perspective, we prove that few-shot models have a $\widetilde{O}(n_s^{-2/d}+n_{ta}^{-1/2})$ bound to approximate the target score function, which is better than $n_{ta}^{-2/d}$ results. From the optimization perspective, we consider a latent Gaussian special case and prove that the optimization problem has a closed-form minimizer. This means few-shot models can directly obtain an approximated minimizer without a complex optimization process. Furthermore, we also provide the accuracy bound $\widetilde{O}(1/n_{ta}+1/\sqrt{n_s})$ for the empirical solution, which still has better dependence on $n_{ta}$ compared to $n_s$. The results of the real-world experiments also show that the models obtained by only fine-tuning the encoder and decoder specific to the target distribution can produce novel images with the target feature, which supports our theoretical results.

AAAI Conference 2024 Conference Paper

Robustness Verification of Deep Reinforcement Learning Based Control Systems Using Reward Martingales

  • Dapeng Zhi
  • Peixin Wang
  • Cheng Chen
  • Min Zhang

Deep Reinforcement Learning (DRL) has gained prominence as an effective approach for control systems. However, its practical deployment is impeded by state perturbations that can severely impact system performance. Addressing this critical challenge requires robustness verification about system performance, which involves tackling two quantitative questions: (i) how to establish guaranteed bounds for expected cumulative rewards, and (ii) how to determine tail bounds for cumulative rewards. In this work, we present the first approach for robustness verification of DRL-based control systems by introducing reward martingales, which offer a rigorous mathematical foundation to characterize the impact of state perturbations on system performance in terms of cumulative rewards. Our verified results provide provably quantitative certificates for the two questions. We then show that reward martingales can be implemented and trained via neural networks, against different types of control policies. Experimental results demonstrate that our certified bounds tightly enclose simulation outcomes on various DRL-based control systems, indicating the effectiveness and generality of the proposed approach.

TMLR Journal 2023 Journal Article

Assisted Learning for Organizations with Limited Imbalanced Data

  • Cheng Chen
  • Jiaying Zhou
  • Jie Ding
  • Yi Zhou

In the era of big data, many big organizations are integrating machine learning into their work pipelines to facilitate data analysis. However, the performance of their trained models is often restricted by limited and imbalanced data available to them. In this work, we develop an assisted learning framework for assisting organizations to improve their learning performance. The organizations have sufficient computation resources but are subject to stringent data-sharing and collaboration policies. Their limited imbalanced data often cause biased inference and sub-optimal decision-making. In assisted learning, an organizational learner purchases assistance service from an external service provider and aims to enhance its model performance within only a few assistance rounds. We develop effective stochastic training algorithms for both assisted deep learning and assisted reinforcement learning. Different from existing distributed algorithms that need to frequently transmit gradients or models, our framework allows the learner to only occasionally share information with the service provider, but still obtain a model that achieves near-oracle performance as if all the data were centralized.

NeurIPS Conference 2023 Conference Paper

Block Broyden's Methods for Solving Nonlinear Equations

  • Chengchang Liu
  • Cheng Chen
  • Luo Luo
  • John C. S. Lui

This paper studies quasi-Newton methods for solving nonlinear equations. We propose block variants of both good and bad Broyden's methods, which enjoy explicit local superlinear convergence rates. Our block good Broyden's method has a faster condition-number-free convergence rate than existing Broyden's methods because it takes advantage of multiple rank modification on the Jacobian estimator. On the other hand, our block bad Broyden's method directly estimates the inverse of the Jacobian provably, which reduces the computational cost of the iteration. Our theoretical results provide some new insights on why good Broyden's method outperforms bad Broyden's method in most cases. The empirical results also demonstrate the superiority of our methods and validate our theoretical analysis.

NeurIPS Conference 2023 Conference Paper

Boosting Verification of Deep Reinforcement Learning via Piece-Wise Linear Decision Neural Networks

  • Jiaxu Tian
  • Dapeng Zhi
  • Si Liu
  • Peixin Wang
  • Cheng Chen
  • Min Zhang

Formally verifying deep reinforcement learning (DRL) systems suffers from both inaccurate verification results and limited scalability. The major obstacle lies in the large overestimation introduced inherently during training and then transforming the inexplicable decision-making models, i.e., deep neural networks (DNNs), into easy-to-verify models. In this paper, we propose an inverse transform-then-train approach, which first encodes a DNN into an equivalent set of efficiently and tightly verifiable linear control policies and then optimizes them via reinforcement learning. We accompany our inverse approach with a novel neural network model called piece-wise linear decision neural networks (PLDNNs), which are compatible with most existing DRL training algorithms with comparable performance against conventional DNNs. Our extensive experiments show that, compared to DNN-based DRL systems, PLDNN-based systems can be more efficiently and tightly verified with up to $438$ times speedup and a significant reduction in overestimation. In particular, even a complex $12$-dimensional DRL system is efficiently verified with up to 7 times deeper computation steps.

NeurIPS Conference 2023 Conference Paper

Uncertainty Estimation for Safety-critical Scene Segmentation via Fine-grained Reward Maximization

  • Hongzheng Yang
  • Cheng Chen
  • Yueyao CHEN
  • Scheppach Scheppach
  • Hon Chi Yip
  • DOU QI

Uncertainty estimation plays an important role for future reliable deployment of deep segmentation models in safety-critical scenarios such as medical applications. However, existing methods for uncertainty estimation have been limited by the lack of explicit guidance for calibrating the prediction risk and model confidence. In this work, we propose a novel fine-grained reward maximization (FGRM) framework, to address uncertainty estimation by directly utilizing an uncertainty metric related reward function with a reinforcement learning based model tuning algorithm. This would benefit the model uncertainty estimation with direct optimization guidance for model calibration. Specifically, our method designs a new uncertainty estimation reward function using the calibration metric, which is maximized to fine-tune an evidential learning pre-trained segmentation model for calibrating prediction risk. Importantly, we innovate an effective fine-grained parameter update scheme, which imposes fine-grained reward-weighting of each network parameter according to the parameter importance quantified by the Fisher information matrix. To the best of our knowledge, this is the first work exploring reward optimization for model uncertainty estimation in safety-critical vision tasks. The effectiveness of our method is demonstrated on two large safety-critical surgical scene segmentation datasets under two different uncertainty estimation settings. With real-time one forward pass at inference, our method outperforms state-of-the-art methods by a clear margin on all the calibration metrics of uncertainty estimation, while maintaining a high task accuracy for the segmentation results. Code is available at https://github.com/med-air/FGRM.

NeurIPS Conference 2022 Conference Paper

Finding Second-Order Stationary Points in Nonconvex-Strongly-Concave Minimax Optimization

  • Luo Luo
  • Yujun Li
  • Cheng Chen

We study the smooth minimax optimization problem $\min_{\bf x}\max_{\bf y} f({\bf x}, {\bf y})$, where $f$ is $\ell$-smooth, strongly-concave in ${\bf y}$ but possibly nonconvex in ${\bf x}$. Most existing works focus on finding the first-order stationary point of the function $f({\bf x}, {\bf y})$ or its primal function $P({\bf x})\triangleq \max_{\bf y} f({\bf x}, {\bf y})$, but few of them focus on achieving the second-order stationary point, which is essential to nonconvex problems. In this paper, we propose a novel approach for minimax optimization, called Minimax Cubic Newton (MCN), which could find an ${\mathcal O}\left(\varepsilon, \kappa^{1.5}\sqrt{\rho\varepsilon}\right)$-second-order stationary point of $P({\bf x})$ with calling ${\mathcal O}\left(\kappa^{1.5}\sqrt{\rho}\varepsilon^{-1.5}\right)$ times of second-order oracles and $\tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\right)$ times of first-order oracles, where $\kappa$ is the condition number and $\rho$ is the Lipschitz continuous constant for the Hessian of $f({\bf x}, {\bf y})$. In addition, we propose an inexact variant of MCN for high-dimensional problems to avoid calling the expensive second-order oracles. Instead, our method solves the cubic sub-problem inexactly via gradient descent and matrix Chebyshev expansion. This strategy still obtains the desired approximate second-order stationary point with high probability but only requires $\tilde{\mathcal O}\left(\kappa^{1.5}\ell\varepsilon^{-2}\right)$ Hessian-vector oracle calls and $\tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\right)$ first-order oracle calls. To the best of our knowledge, this is the first work that considers the non-asymptotic convergence behavior of finding second-order stationary points for minimax problems without the convex-concave assumptions.

AAAI Conference 2022 Conference Paper

Self-Supervised Graph Neural Networks via Diverse and Interactive Message Passing

  • Liang Yang
  • Cheng Chen
  • Weixun Li
  • Bingxin Niu
  • Junhua Gu
  • Chuan Wang
  • Dongxiao He
  • Yuanfang Guo

By interpreting Graph Neural Networks (GNNs) as message passing from the spatial perspective, their success is attributed to Laplacian smoothing. However, it also leads to a serious over-smoothing issue when many layers are stacked. Recently, many efforts have been made to overcome this issue in semi-supervised learning. Unfortunately, it is more serious in the unsupervised node representation learning task due to the lack of supervision information. Thus, most unsupervised or self-supervised GNNs employ a one-layer GCN as the encoder. Essentially, the over-smoothing issue is caused by the over-simplification of the existing message passing, which possesses two intrinsic limits: blind message and uniform passing. In this paper, a novel Diverse and Interactive Message Passing (DIMP) is proposed for self-supervised learning by overcoming these limits. Firstly, to prevent the message from blindness and make it interactive between two connected nodes, the message is determined by both of the connected nodes instead of the attributes of one node. Secondly, to prevent the passing from uniformness and make it diverse over different attribute channels, different propagation weights are assigned to different elements in the message. To this end, a natural implementation of the message in DIMP is the element-wise product of the representations of two connected nodes. From the perspective of numerical optimization, the proposed DIMP is equivalent to performing an overlapping community detection via expectation-maximization (EM). Both the objective function of the community detection and the convergence of the EM algorithm guarantee that DIMP can prevent the over-smoothing issue. Extensive evaluations on node-level and graph-level tasks demonstrate the superiority of DIMP in improving performance and overcoming the over-smoothing issue.

AAAI Conference 2022 Conference Paper

Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

  • Cheng Chen
  • Canzhe Zhao
  • Shuai Li

Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users’ click behaviors. Most recent works for this problem focus on the stochastic environment where the item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint especially designed for PBM. We prove the proposed algorithm simultaneously achieves O(log T) regret in the stochastic environment and O(m√(nT)) regret in the adversarial environment, where T is the number of rounds, n is the number of items and m is the number of positions. We also provide a lower bound of order Ω(m√(nT)) for adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. The experiments show that our algorithm could simultaneously learn in both stochastic and adversarial environments and is competitive compared to existing methods that are designed for a single environment.

AAAI Conference 2022 Conference Paper

Single-Domain Generalization in Medical Image Segmentation via Test-Time Adaptation from Shape Dictionary

  • Quande Liu
  • Cheng Chen
  • Qi Dou
  • Pheng-Ann Heng

Domain generalization typically requires data from multiple source domains for model learning. However, such a strong assumption may not always hold in practice, especially in the medical field, where data sharing is highly sensitive and sometimes prohibited due to privacy issues. This paper studies the important yet challenging single domain generalization problem, in which a model is learned under the worst-case scenario with only one source domain to directly generalize to different unseen target domains. We present a novel approach to address this problem in medical image segmentation, which extracts and integrates the semantic shape prior information of segmentation that is invariant across domains and can be well captured even from single domain data to facilitate segmentation under distribution shifts. Besides, a test-time adaptation strategy with dual-consistency regularization is further devised to promote dynamic incorporation of these shape priors under each unseen domain to improve model generalizability. Extensive experiments on two medical image segmentation tasks demonstrate the consistent improvements of our method across various unseen domains, as well as its superiority over state-of-the-art approaches in addressing domain generalization under the worst-case scenario.

AAAI Conference 2021 Conference Paper

C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling

  • Yutai Hou
  • Sanyuan Chen
  • Wanxiang Che
  • Cheng Chen
  • Ting Liu

Slot filling, a fundamental module of spoken language understanding, often suffers from insufficient quantity and diversity of training data. To remedy this, we propose a novel Cluster-to-Cluster generation framework for Data Augmentation (DA), named C2C-GenDA. It enlarges the training set by reconstructing existing utterances into alternative expressions while preserving their semantics. Different from previous DA works that reconstruct utterances one by one independently, C2C-GenDA jointly encodes multiple existing utterances of the same semantics and simultaneously decodes multiple unseen expressions. Jointly generating multiple new utterances makes it possible to consider the relations between generated instances and encourages diversity. Besides, encoding multiple existing utterances endows C2C with a wider view of existing expressions, helping to reduce generation that duplicates existing data. Experiments on ATIS and Snips datasets show that instances augmented by C2C-GenDA improve slot filling by 7.99 (11.9%↑) and 5.76 (13.6%↑) F-scores respectively, when there are only hundreds of training utterances. Code: https://github.com/Sanyuan-Chen/C2C-DA.

AAAI Conference 2021 Conference Paper

Revisiting Co-Occurring Directions: Sharper Analysis and Efficient Algorithm for Sparse Matrices

  • Luo Luo
  • Cheng Chen
  • Guangzeng Xie
  • Haishan Ye

We study the streaming model for approximate matrix multiplication (AMM). We are interested in the scenario that the algorithm can only take one pass over the data with limited memory. The state-of-the-art deterministic sketching algorithm for streaming AMM is the co-occurring directions (COD), which has much smaller approximation errors than randomized algorithms and outperforms other deterministic sketching methods empirically. In this paper, we provide a tighter error bound for COD whose leading term considers the potential approximate low-rank structure and the correlation of input matrices. We prove COD is space optimal with respect to our improved error bound. We also propose a variant of COD for sparse matrices with theoretical guarantees. The experiments on real-world sparse datasets show that the proposed algorithm is more efficient than baseline methods.

IJCAI Conference 2020 Conference Paper

Efficient and Robust High-Dimensional Linear Contextual Bandits

  • Cheng Chen
  • Luo Luo
  • Weinan Zhang
  • Yong Yu
  • Yijiang Lian

The linear contextual bandit problem is a sequential decision-making problem where an agent decides among sequential actions given their corresponding contexts. Since large-scale data sets are becoming more and more common, we study linear contextual bandits in high-dimensional situations. Recent works focus on employing matrix sketching methods to accelerate contextual bandits. However, the matrix approximation error will bring additional terms to the regret bound. In this paper we first propose a novel matrix sketching method called Spectral Compensation Frequent Directions (SCFD). Then we propose an efficient approach for contextual bandits by adopting SCFD to approximate the covariance matrices. By maintaining and manipulating sketched matrices, our method only needs O(md) space and O(md) updating time in each round, where d is the dimensionality of the data and m is the sketching size. Theoretical analysis reveals that our method has better regret bounds than previous methods in high-dimensional cases. Experimental results demonstrate the effectiveness of our algorithm and verify our theoretical guarantees.

NeurIPS Conference 2020 Conference Paper

Efficient Projection-free Algorithms for Saddle Point Problems

  • Cheng Chen
  • Luo Luo
  • Weinan Zhang
  • Yong Yu

The Frank-Wolfe algorithm is a classic method for constrained optimization problems. It has recently been popular in many machine learning applications because its projection-free property leads to more efficient iterations. In this paper, we study projection-free algorithms for convex-strongly-concave saddle point problems with complicated constraints. Our method combines Conditional Gradient Sliding with Mirror-Prox, and we show that it only requires $\tilde{\mathcal O}(1/\sqrt{\epsilon})$ gradient evaluations and $\tilde{\mathcal O}(1/\epsilon^2)$ linear optimizations in the batch setting. We also extend our method to the stochastic setting and propose the first stochastic projection-free algorithms for saddle point problems. Experimental results demonstrate the effectiveness of our algorithms and verify our theoretical guarantees.

IROS Conference 2020 Conference Paper

Hybrid fluidic actuation for a foam-based soft actuator

  • Jan Peters 0004
  • Bani Anvari
  • Cheng Chen
  • Zara Lim
  • Helge A. Wurdemann

Actuation means for soft robotic structures are manifold: despite actuation mechanisms such as tendon-driven manipulators or shape memory alloys, the majority of soft robotic actuators are fluidically actuated - either purely by positive or negative air pressure or by hydraulic actuation only. This paper presents the novel idea of employing hybrid fluidic - hydraulic and pneumatic - actuation for soft robotic systems. The concept and design of the hybrid actuation system as well as the fabrication of the soft actuator are presented: Polyvinyl Alcohol (PVA) foam is embedded inside a casted, reinforced silicone chamber. A hydraulic and pneumatic robotic syringe pump are connected to the base and top of the soft actuator. We found that a higher percentage of hydraulics resulted in a higher output force. Hydraulic actuation further is able to change displacements at a higher rate compared to pneumatic actuation. Changing between Hydraulic:Pneumatic (HP) ratios shows how stiffness properties of a soft actuator can be varied.

JMLR Journal 2019 Journal Article

Robust Frequent Directions with Application in Online Learning

  • Luo Luo
  • Cheng Chen
  • Zhihua Zhang
  • Wu-Jun Li
  • Tong Zhang

The frequent directions (FD) technique is a deterministic approach for online sketching that has many applications in machine learning. The conventional FD is a heuristic procedure that often outputs rank deficient matrices. To overcome the rank deficiency problem, we propose a new sketching strategy called robust frequent directions (RFD) by introducing a regularization term. RFD can be derived from an optimization problem. It updates the sketch matrix and the regularization term adaptively and jointly. RFD reduces the approximation error of FD without increasing the computational cost. We also apply RFD to online learning and propose an effective hyperparameter-free online Newton algorithm. We derive a regret bound for our online Newton algorithm based on RFD, which guarantees the robustness of the algorithm. The experimental studies demonstrate that the proposed method outperforms state-of-the-art second order online learning algorithms.

AAAI Conference 2019 Conference Paper

Synergistic Image and Feature Adaptation: Towards Cross-Modality Domain Adaptation for Medical Image Segmentation

  • Cheng Chen
  • Qi Dou
  • Hao Chen
  • Jing Qin
  • Pheng-Ann Heng

This paper presents a novel unsupervised domain adaptation framework, called Synergistic Image and Feature Adaptation (SIFA), to effectively tackle the problem of domain shift. Domain adaptation has become an important and hot topic in recent studies on deep learning, aiming to recover performance degradation when applying the neural networks to new testing domains. Our proposed SIFA is an elegant learning diagram which presents synergistic fusion of adaptations from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features towards the segmentation task. The feature encoder layers are shared by both perspectives to grasp their mutual benefits during the end-to-end learning procedure. Without using any annotation from the target domain, the learning of our unified model is guided by adversarial losses, with multiple discriminators employed from various aspects. We have extensively validated our method with a challenging application of cross-modality medical image segmentation of cardiac structures. Experimental results demonstrate that our SIFA model recovers the degraded performance from 17.2% to 73.0%, and outperforms the state-of-the-art methods by a significant margin.

IJCAI Conference 2018 Conference Paper

Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss

  • Qi Dou
  • Cheng Ouyang
  • Cheng Chen
  • Hao Chen
  • Pheng-Ann Heng

Convolutional networks (ConvNets) have achieved great successes in various challenging vision tasks. However, the performance of ConvNets would degrade when encountering the domain shift. The domain adaptation is more significant while challenging in the field of biomedical image analysis, where cross-modality data have largely different distributions. Given that annotating the medical data is especially expensive, the supervised transfer learning approaches are not quite optimal. In this paper, we propose an unsupervised domain adaptation framework with adversarial learning for cross-modality biomedical image segmentations. Specifically, our model is based on a dilated fully convolutional network for pixel-wise prediction. Moreover, we build a plug-and-play domain adaptation module (DAM) to map the target input to features which are aligned with source domain feature space. A domain critic module (DCM) is set up for discriminating the feature space of both domains. We optimize the DAM and DCM via an adversarial loss without using any target domain label. Our proposed method is validated by adapting a ConvNet trained with MRI images to unpaired CT data for cardiac structures segmentations, and achieved very promising results.

IJCAI Conference 2017 Conference Paper

Tag-Aware Personalized Recommendation Using a Hybrid Deep Model

  • Zhenghua Xu
  • Thomas Lukasiewicz
  • Cheng Chen
  • Yishu Miao
  • Xiangwu Meng

Recently, many efforts have been put into tag-aware personalized recommendation. However, due to uncontrolled vocabularies, social tags are usually redundant, sparse, and ambiguous. In this paper, we propose a deep neural network approach to solve this problem by mapping the tag-based user and item profiles to an abstract deep feature space, where the deep-semantic similarities between users and their target items (resp., irrelevant items) are maximized (resp., minimized). To ensure the scalability in practice, we further propose to improve this model's training efficiency by using hybrid deep learning and negative sampling. Experimental results show that our approach can significantly outperform the state-of-the-art baselines in tag-aware personalized recommendation (3.8 times better than the best baseline), and that using hybrid deep learning and negative sampling can dramatically enhance the model's training efficiency (hundreds of times quicker), while maintaining similar (and sometimes even better) training quality and recommendation performance.

IROS Conference 2015 Conference Paper

A Real-time relative probabilistic mapping algorithm for high-speed off-road autonomous driving

  • Cheng Chen
  • Yuqing He
  • Feng Gu 0004
  • Chunguang Bu
  • Jianda Han

Reliable mapping and hazard detection are prerequisites for autonomous navigation for unmanned ground vehicles. Because of the uncertainty and vibration induced by high-speed navigation and rugged terrain, the problem of mapping for high-speed off-road autonomous navigation has not been completely solved yet. A relative probabilistic mapping (RPM) algorithm is introduced to address the problem. Firstly, the relative probabilistic map is updated by Kalman filter and Gaussian Mixture algorithm based on the probabilistic exteroceptive measurements model. Then, terrain traversability is evaluated to identify obstacles in the map. Experiments on off-road high-speed autonomous vehicle, which suffers from severe vibration, with different sensor configurations are carried out to demonstrate the capability of the RPM algorithm.

ICRA Conference 2014 Conference Paper

Quartic Bézier curve based trajectory generation for autonomous vehicles with curvature and velocity constraints

  • Cheng Chen
  • Yuqing He
  • Chunguang Bu
  • Jianda Han
  • Xuebo Zhang

To generate local trajectory between initial states and target states for autonomous vehicles, a feasible trajectory generation algorithm based on quartic Bézier curve is proposed. The problem of trajectory generation is firstly separated into generating continuous and bounded curvature profile to shape the trajectory and generating linear velocity profile to execute the trajectory. The curvature profile generation is further converted to an optimization problem with only 3 parameters owing to the specific properties of quartic Bézier curve. Sequential quadratic programming is employed to find optimal solution with respect to specific objective function. To avoid sideslip and ensure velocity-continuity and acceleration limits, the framework of linear velocity profile generation is also proposed. A simple profile with constant acceleration is also provided as an example. Simulation results on lane keeping and changing and path following demonstrate the capability and the real-time performance of the proposed algorithm.

IS Journal 2011 Journal Article

Agent Recommendation for Agent-Based Urban-Transportation Systems

  • Cheng Chen
  • Shuang Shuang Li
  • Bo Chen
  • Ding Wen

Mobile-agent technology has been adopted in many transportation fields to take advantage of different agents to deal with dynamic changes and uncertainty in traffic environments. However, few research studies have been conducted in urban-transportation systems on deciding what kind of agents to use in coping with specific traffic states. With the increasing availability of control and service agents for agent-based urban-transportation systems, an agent recommendation system is necessary to manage and select those agents so original objectives can be fulfilled. In this article, the authors address issues related to the creation of such a platform.

IS Journal 2011 Journal Article

Cloud Computing for Agent-Based Urban Transportation Systems

  • ZhenJiang Li
  • Cheng Chen
  • Kai Wang

Agent-based traffic management systems can use the autonomy, mobility, and adaptability of mobile agents to deal with dynamic traffic environments. Cloud computing can help such systems cope with the large amounts of storage and computing resources required to effectively use traffic strategy agents and mass transport data. This article reviews the history of the development of traffic control and management systems within the evolving computing paradigm and shows the state of traffic control and management systems based on mobile multiagent technology. An intelligent transportation cloud could provide services such as decision support, a standard development environment for traffic management strategy, and so on. Moreover, the cloud can generate, store, manage, test, optimize, and use mobile traffic strategy agents to maximize the advantages of cloud computing and agent technology to effectively control and manage urban-traffic systems.

IROS Conference 2008 Conference Paper

Pattern recognition for loosely-coupled GPS/odometer fusion

  • Cheng Chen
  • Javier Ibañez-Guzmán
  • Olivier Le-Marchand

Conventionally GPS receivers and odometers are used in localization systems for ground vehicles/robots due to cost constraints. When these are deployed in urban conditions, multi-path and wheel slippage often result in large localization estimation errors. In this paper, pattern recognition techniques are employed to improve the localization estimates of a loosely coupled GPS-odometer solution. The presented method filters out from the fusion process false GPS estimates and uses extensively information on the vehicle ego-state. The approach comprises three phases. First, a detection algorithm is used to recognize likely false GPS estimates, which are then excluded from Kalman Filter updates. Second, we model the vehicle motion as a weighted sum of individual maneuvers. These are processed by a multiple model Kalman Filter to improve accuracy. Third, a maneuver recognition algorithm is used to select automatically the type of motion taken by the vehicle. The performance of our localization system has been evaluated in a quantitative manner by comparing it with a reference trajectory. This reference trajectory is estimated by a localization system based on high-grade GPS-IMU-odometer. Extensive trials were performed in different real traffic conditions; results have validated the approach and demonstrated tremendous potential.

ICRA Conference 2006 Conference Paper

Large-scale Loop-closing with Pictorial Matching

  • Cheng Chen
  • Han Wang 0001

This paper presents a mapping method that can accurately map a large environment with a single robot by visiting the environment only once, and the resulting map can provide a thorough 3D description of the environment in a predefined global coordinate frame. Our first contribution is to represent the map as a collection of submaps arranged in a deformable configuration, and to perform loop-closing by registering this submap configuration to an aerial image. The second contribution is to introduce the active contour technique to the SLAM domain, so that the registration is efficiently solved in an iterative energy minimization process. The constraints from robot mapping are modeled as forces trying to keep the submaps consistent with each other, while the pictorial matching is represented by forces guiding submaps to a globally consistent configuration. In the experiment, we demonstrate the proposed algorithm's capability to close a 1,890-meter loop with only one visit. The result is compared with ground truth, and high accuracy is observed.

IROS Conference 2005 Conference Paper

Appearance-based topological Bayesian inference for loop-closing detection in cross-country environment

  • Cheng Chen
  • Han Wang 0001

In this paper, an appearance-based environment modelling technique is presented. Based on this approach, probabilistic Bayesian inference can work together with a symbolic topological map to re-localize a mobile robot. One prominent advantage offered by this algorithm is that it can be applied to cross-country environments where no features or landmarks are available. Furthermore, loop-closing can be detected independently of the estimated map and vehicle location. High dimensional laser measurements are projected into a low dimensional space (mapspace) which describes the appearance of the environment. Since laser scans from the same region share a similar appearance, after the projection, they are expected to form a distinct cluster in the low dimensional space. This small cluster essentially encodes the appearance information of the specific region in the environment, and it can be approximated by a Gaussian distribution. This Gaussian model can serve as the 'joint' between the topological map structure and the probabilistic Bayesian inference. By employing such 'joints', the Bayesian inference in the metric level can be conveniently implemented on the topological level. Based on appearance, the proposed inference process is thus completely independent of local metric features. Extensive experiments were conducted using a tracked vehicle travelling in an open jungle environment. Results from live runs verified the feasibility of the proposed methods to detect loop-closing. The performance results are also given and thoroughly analyzed.

ICRA Conference 2004 Conference Paper

Vehicle Following with Obstacle Avoidance Capabilities in Natural Environments

  • Teck Chew Ng
  • Javier Ibañez-Guzmán
  • Jian Shen
  • Zhiming Gong
  • Han Wang 0001
  • Cheng Chen

A robust vehicle following system with obstacle avoidance capabilities for operation in natural environments is described in this paper. By combining a novel vehicle-tracking and detection algorithm with our path-planner for autonomous navigation, it was possible for a tracked logistics armoured ambulance carrier to follow a multi purpose vehicle in an equatorial jungle where few non-paved roads and markers exist. With this new approach, vehicle following performance is enhanced and vehicle safety ensured. Field trials performed in tropical jungle conditions have demonstrated the validity of the approach; results from the field works are included and discussed in this paper.

IROS Conference 2000 Conference Paper

e-service robot in home healthcare

  • Max Q.-H. Meng
  • Cheng Chen
  • Peter X. Liu
  • Ming Rao

As the population-aging problem is increasingly pressing on society, and the associated healthcare costs are taking up a growing percentage of the GNP, various inexpensive support systems for elderly people staying alone at home are in great demand. Fortunately, as the Internet continues to expand exponentially and access to the Internet becomes more prevalent in our daily life, home healthcare systems based on a teleoperated mobile robot platform via the Internet become feasible. In this paper, we describe a feasibility study and a basic platform design of a teleoperated home healthcare system via the Internet, which include a literature review of the state of the art in research on this topic, discussions on some open problems and challenges facing researchers in this area, and a basic platform design of a home healthcare system via the Internet currently under implementation in our research lab.