Arrow Research search

Author name cluster

Xin Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

79 papers
2 author rows

Possible papers

79

AAAI Conference 2026 Conference Paper

G-IR: Geometric Image Representation for Learning

  • Xin Chen
  • Qi Zhao
  • Wei Zeng
  • Zongben Xu

Images are generally represented by pixel intensities or color values, which are usually used as direct inputs for learning. This study innovatively proposes a geometric image representation method and recasts the general learning model (e.g., the autoencoder) in a diffeomorphic space. Based on the theory of geometric optimal transport and quasiconformal mapping, we equivalently transform the intensity representation into a shape representation. The image space becomes a diffeomorphic space, where any image can be uniquely represented as a Beltrami coefficient function defined on a uniform grid reference, and vice versa. This innovative geometric image representation (G-IR) captures the fine-grained structure inherent in the entire image, which differs from traditional feature extraction that focuses on the internal geometric objects of the image (such as boundaries and axes). The diffeomorphic property preserves structure in the generation process, which is essential in real-world physical applications. It can be assembled into existing pipelines as a plug-in, providing structure-preserving properties for the entire framework. Experiments on image restoration and interpolation validated the high efficiency, efficacy and applicability of the G-IR method, demonstrating its superior performance compared to common pixel-level image appearance representations.

AAAI Conference 2026 Conference Paper

Parallelizable Riemannian Alternating Direction Method of Multipliers for Non-convex Pose Graph Optimization

  • Xin Chen
  • Chunfeng Cui
  • Deren Han
  • Liqun Qi

Pose graph optimization (PGO) is fundamental to robot perception and navigation systems, serving as the mathematical backbone for solving simultaneous localization and mapping (SLAM). Existing solvers suffer from polynomial growth in computational complexity with graph size, hindering real-time deployment in large-scale scenarios. In this paper, by duplicating variables and introducing equality constraints, we reformulate the problem and propose a Parallelizable Riemannian Alternating Direction Method of Multipliers (PRADMM) to solve it efficiently. Compared with the state-of-the-art methods that usually exhibit polynomial time complexity growth with graph size, PRADMM enables efficient parallel computation across vertices regardless of graph size. Crucially, all subproblems admit closed-form solutions, ensuring PRADMM maintains exceptionally stable performance. Furthermore, by carefully exploiting the structures of the coefficient matrices in the constraints, we establish the global convergence of PRADMM under mild conditions, enabling larger relaxation step sizes within the interval (0,2). Extensive empirical validation on two synthetic datasets and multiple real-world 3D SLAM benchmarks confirms the superior computational performance of PRADMM.

AAAI Conference 2026 Conference Paper

PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis

  • Jiao Xu
  • Junwei Liu
  • Jiangwei Lao
  • Qi Zhu
  • Yunpeng Zhao
  • Congyun Jin
  • Shinan Liu
  • Zhihong Lu

Recent advances in medical multi-modal models focus on specialized image analysis like dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world clinical diagnostics, which involve heterogeneous inputs and require ongoing contextual understanding during patient-physician interactions. To bridge this gap, we introduce PulseMind, a new family of multi-modal diagnostic models that integrates a systematically curated dataset, a comprehensive evaluation benchmark, and a tailored training framework. Specifically, we first construct a diagnostic dataset, MediScope, which comprises 98,000 real-world multi-turn consultations and 601,500 medical images, spanning over 10 major clinical departments and more than 200 sub-specialties. Then, to better reflect the requirements of real-world clinical diagnosis, we develop the PulseMind Benchmark, a multi-turn diagnostic consultation benchmark with a four-dimensional evaluation protocol comprising proactiveness, accuracy, usefulness, and language quality. Finally, we design a training framework tailored for multi-modal clinical diagnostics, centered around a core component named Comparison-based Reinforcement Policy Optimization (CRPO). Compared to absolute score rewards, CRPO uses relative preference signals from multi-dimensional comparisons to provide stable and human-aligned training guidance. Extensive experiments demonstrate that PulseMind achieves competitive performance on both the diagnostic consultation benchmark and public medical benchmarks.

JBHI Journal 2026 Journal Article

Source-Resilient Joint Learning Framework for Preserving Stable Generalization on Diverse Ultrasonic Source Scenarios

  • Bin Huang
  • Zhong Liu
  • Ziyue Xu
  • Shing-Chow Chan
  • Huiying Wen
  • Chao HOU
  • Qicai Huang
  • Meiqin Jiang

Joint learning on diverse ultrasonic source scenarios presents a challenge in preserving stable generalization due to the combination of the heterogeneity of different sources and the inconsistency of joint learning features. Previous joint learning studies, which are not source-resilient frameworks, may not preserve stable generalization when trained on diverse source scenarios. Furthermore, the limited variations in single-source data and the interference from ultrasound imaging, which are common in ultrasonic source scenarios, further decrease generalization. To address these problems, we proposed a source-resilient joint learning framework consisting of three stages: 1) Source transforming, where our 1-to-N transformation unifies diverse source scenarios for source-resiliency. 2) Feature enhancing, where our feature enhancement modules form the source-resilient joint learning network, including a manifold-constraint normalization module (MCNM) that addresses heterogeneity by minimizing a manifold-based loss, a task-consistent attention module (TCAM) that shares multi-scale features via self-attention to address inconsistency, and an adaptive feature-shifting module (AFSM) that performs feature-level augmentation to overcome the limited variations in single-source data. 3) Ultrasonic style randomization, where our ultrasound-hybrid linear mapping (USmapping) cascades speckle randomization and mask-guided Monge-Kantorovitch linear mapping to address the interference of ultrasonic data. Our framework was evaluated on eight ultrasound datasets from various scanners at multiple centers and surpassed previous comparable studies in both segmentation (DSC $WAvg$ of 75.7%) and classification (AUROC $WAvg$ of 68.8%) tasks. Our framework has the potential to serve as a general framework for enhancing the performance of joint learning under diverse ultrasonic source scenarios.

IROS Conference 2025 Conference Paper

ChatBuilder: LLM-assisted Modular Robot Creation

  • Xin Chen
  • Xifeng Gao
  • Lifeng Zhu
  • Aiguo Song
  • Zherong Pan

Modular robotic structures simplify robot design and manufacturing by using standardized modules, enhancing flexibility and adaptability. However, the need for manual input in design and assembly limits their potential. Current methods to automate this process still require significant human effort and technical expertise. This paper introduces a novel approach that employs Large Language Models (LLMs) as intelligent agents to automate the creation of modular robotic structures. We decompose the modular robot creation task and develop two LLM-based agents to plan and assemble the modular robots from text prompts. By inputting a textual description, users can generate robot designs that are validated in both simulated and real-world environments. This method reduces the need for manual intervention and lowers the technical barrier to creating complex robotic systems.

AAAI Conference 2025 Conference Paper

CohEx: A Generalized Framework for Cohort Explanation

  • Fanyu Meng
  • Xin Liu
  • Zhaodan Kong
  • Xin Chen

eXplainable Artificial Intelligence (XAI) has garnered significant attention for enhancing transparency and trust in machine learning models. However, the scopes of most existing explanation techniques focus either on offering a holistic view of the explainee model (global explanation) or on individual instances (local explanation), while the middle ground, i.e., cohort-based explanation, is less explored. Cohort explanations offer insights into the explainee's behavior on a specific group or cohort of instances, enabling a deeper understanding of model decisions within a defined context. In this paper, we discuss the unique challenges and opportunities associated with measuring cohort explanations, define their desired properties, and create a generalized framework for generating cohort explanations based on supervised clustering.
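
As a rough sketch of what a cohort explanation looks like in practice, one can aggregate local (per-instance) attributions within each cohort; the helper name `cohort_explanations` is hypothetical, and the paper's actual framework derives the cohorts themselves via supervised clustering rather than taking a fixed grouping as given:

```python
import numpy as np

def cohort_explanations(attributions: np.ndarray, cohorts: np.ndarray) -> dict:
    """Aggregate local (per-instance) feature attributions into one
    explanation vector per cohort by averaging within each group."""
    return {int(c): attributions[cohorts == c].mean(axis=0)
            for c in np.unique(cohorts)}
```

In this view, a global explanation is the special case of a single cohort covering all instances, and a local explanation is a cohort of size one, which is the middle ground the abstract describes.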

ICML Conference 2025 Conference Paper

Disentangled Graph Spectral Domain Adaptation

  • Liang Yang 0002
  • Xin Chen
  • Jiaming Zhuo
  • Di Jin 0001
  • Chuan Wang 0002
  • Xiaochun Cao
  • Zhen Wang 0004
  • Yuanfang Guo

The distribution shifts and the scarcity of labels prevent graph learning methods, especially graph neural networks (GNNs), from generalizing across domains. Compared to Unsupervised Domain Adaptation (UDA) with embedding alignment, Unsupervised Graph Domain Adaptation (UGDA) becomes more challenging in light of the attribute and topology entanglement in the representation. Beyond embedding alignment, UGDA turns to topology alignment but is limited by the ability of the employed topology model and the estimation of pseudo labels. To alleviate this issue, this paper proposes Disentangled Graph Spectral Domain Adaptation (DGSDA), which disentangles attribute and topology alignments and directly aligns flexible graph spectral filters beyond topology. Specifically, Bernstein polynomial approximation, which mimics the behavior of the function to be approximated to a remarkable degree, is employed to capture complicated topology characteristics and avoid expensive eigenvalue decomposition. Theoretical analysis reveals the tight GDA bound of DGSDA and the rationality of polynomial coefficient regularization. Quantitative and qualitative experiments justify the superiority of the proposed DGSDA.

NeurIPS Conference 2025 Conference Paper

DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel

  • Alan J. X. Guo
  • Sihan Sun
  • Xiang Wei
  • Mengyi Wei
  • Xin Chen

With the emergence of new storage and communication methods, the insertion, deletion, and substitution (IDS) channel has attracted considerable attention. However, many topics on the IDS channel and the associated Levenshtein distance remain open, making the invention of a novel IDS-correcting code a hard task. Furthermore, current studies on single-IDS-correcting codes misalign with the requirements of applications, which necessitate the correction of multiple errors. Compromise solutions have involved shortening codewords to reduce the chance of multiple errors. However, the code rates of existing codes are poor at short lengths, diminishing the overall storage density. In this study, a novel method is introduced for designing high-code-rate single-IDS-correcting codewords through deep Levenshtein distance embedding. A deep learning model is utilized to project the sequences into embedding vectors that preserve the Levenshtein distances between the original sequences. This embedding space serves as a proxy for the complex Levenshtein domain, within which algorithms for codeword search and segment correction are developed. While the concept underpinning this approach is straightforward, it bypasses the mathematical challenges typically encountered in code design. The proposed method results in a code rate that outperforms existing combinatorial solutions, particularly for designing short-length codewords.
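
For reference, the Levenshtein distance that the embedding model is trained to preserve (counting insertions, deletions, and substitutions) is computable with the textbook dynamic program below; this is background code, not the paper's model:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))           # row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute ca -> cb
        prev = curr
    return prev[-1]
```

On the 4-ary channel the alphabet would be quaternary (e.g., {A, C, G, T} for DNA storage), but the recurrence itself is alphabet-agnostic.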

AAAI Conference 2025 Conference Paper

ERL-MPP: Evolutionary Reinforcement Learning with Multi-head Puzzle Perception for Solving Large-scale Jigsaw Puzzles of Eroded Gaps

  • Xingke Song
  • Xiaoying Yang
  • Chenglin Yao
  • Jianfeng Ren
  • Ruibin Bai
  • Xin Chen
  • Xudong Jiang

Solving jigsaw puzzles has been extensively studied. While most existing models focus on solving either small-scale puzzles or puzzles with no gap between fragments, solving large-scale puzzles with gaps presents distinctive challenges in both image understanding and combinatorial optimization. To tackle these challenges, we propose a framework of Evolutionary Reinforcement Learning with Multi-head Puzzle Perception (ERL-MPP) to derive a better set of swapping actions for solving the puzzles. Specifically, to tackle the challenges of perceiving the puzzle with gaps, a Multi-head Puzzle Perception Network (MPPN) with a shared encoder is designed, where multiple puzzlet heads comprehensively perceive the local assembly status, and a discriminator head provides a global assessment of the puzzle. To explore the large swapping action space efficiently, an Evolutionary Reinforcement Learning (EvoRL) agent is designed, where an actor recommends a set of suitable swapping actions from a large action space based on the perceived puzzle status, a critic updates the actor using the estimated rewards and the puzzle status, and an evaluator coupled with evolutionary strategies evolves the actions aligning with the historical assembly experience. The proposed ERL-MPP is comprehensively evaluated on the JPLEG-5 dataset with large gaps and the MIT dataset with large-scale puzzles. It significantly outperforms all state-of-the-art models on both datasets.

AAAI Conference 2025 Conference Paper

Exploring Enhanced Contextual Information for Video-Level Object Tracking

  • Ben Kang
  • Xin Chen
  • Simiao Lai
  • Yang Liu
  • Yi Liu
  • Dong Wang

Contextual information at the video level has become increasingly crucial for visual object tracking. However, existing methods typically use only a few tokens to convey this information, which can lead to information loss and limit their ability to fully capture the context. To address this issue, we propose a new video-level visual object tracking framework called MCITrack. It leverages Mamba's hidden states to continuously record and transmit extensive contextual information throughout the video stream, resulting in more robust object tracking. The core component of MCITrack is the Contextual Information Fusion module, which consists of a Mamba layer and a cross-attention layer. The Mamba layer stores historical contextual information, while the cross-attention layer integrates this information into the current visual features of each backbone block. This module enhances the model's ability to capture and utilize contextual information at multiple levels through deep integration with the backbone. Experiments demonstrate that MCITrack achieves competitive performance across numerous benchmarks. For instance, it achieves 76.6% AUC on LaSOT and 80.0% AO on GOT-10k, establishing new state-of-the-art performance.

JBHI Journal 2025 Journal Article

Fluid Intake Action Detection Based on Egocentric Videos and YOLOv8 Models

  • Xin Chen
  • Xinqi Bao
  • Ernest N. Kamavuako

Dehydration in older adults poses significant health risks, requiring effective monitoring solutions. This study addresses the challenge of detecting fluid intake accurately using a first-person, vision-based approach with wearable cameras and advanced object detection models. We developed a comprehensive dataset comprising 17 hours of drinking footage (∼3100 events) and 15 hours of non-drinking activities (∼3600 events) recorded as interference, from 36 participants, collected between October 2022 and January 2023 at King's College London. We include various container types and daily activities to enhance the model's robustness and generalizability. YOLOv8 models were used to detect drinking-related objects, and a mechanism was developed to analyse the size and position of the detection output to identify hand-container interactions and movements. The models achieved mAP@50 over 0.97 and F1-score over 0.95 in detecting drinking-related objects. Action detection testing results from video streams demonstrated an F1-score of 0.917, which dropped to 0.863 when interference activities were added. Additionally, the model detected the start of drinking activities with an average latency of 0.24 seconds and the end with 0.04 seconds, indicating high temporal accuracy. These results demonstrate the feasibility of egocentric, vision-based fluid-intake detection and its potential application in preventing dehydration. To our knowledge, this is the first vision-based dataset focusing on fluid-intake actions from a first-person viewpoint, offering a novel foundation for advancing hydration monitoring in older adults and various real-world contexts.
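
The interaction logic can be caricatured with bounding-box geometry: flag a candidate grasp when a detected hand and a detected container overlap sufficiently. This is only a sketch under assumed names (`iou`, `hand_holds_container`, and the threshold are all illustrative); the paper's mechanism also analyses detection size, position, and movement over time:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union else 0.0

def hand_holds_container(hand_box, container_box, thresh=0.1):
    """Flag a possible grasp when the two detections overlap enough."""
    return iou(hand_box, container_box) >= thresh
```

Stringing such frame-level flags into start/end events is what yields the action-level latencies reported above.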

ICML Conference 2025 Conference Paper

La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation

  • Kai Liu
  • Bowen Xu
  • Shaoyu Wu
  • Xin Chen
  • Hao Zhou
  • Yongliang Tao
  • Lulu Hu

Activation sparsity can reduce the computational overhead and memory transfers during the forward pass of Large Language Model (LLM) inference. Existing methods face limitations, either demanding time-consuming recovery training that hinders real-world adoption, or relying on empirical magnitude-based pruning, which causes fluctuating sparsity and unstable inference speed-up. This paper introduces LaRoSA (Layerwise Rotated Sparse Activation), a novel method for activation sparsification designed to improve LLM efficiency without requiring additional training or magnitude-based pruning. We leverage layerwise orthogonal rotations to transform input activations into rotated forms that are more suitable for sparsification. By employing a Top-K selection approach within the rotated activations, we achieve consistent model-level sparsity and reliable wall-clock time speed-up. LaRoSA is effective across various sizes and types of LLMs, demonstrating minimal performance degradation and robust inference acceleration. Specifically, for LLaMA2-7B at 40% sparsity, LaRoSA achieves a mere 0.17 perplexity gap with a consistent 1.30$\times$ wall-clock time speed-up, and reduces the accuracy gap in zero-shot tasks compared to the dense model to just 0.54%, while surpassing TEAL by 1.77% and CATS by 17.14%.
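
The core transform can be sketched in a few lines of numpy: rotate the activation into another orthogonal basis, keep the Top-K entries there, and rotate back. The rotation below is random purely for illustration; LaRoSA constructs its layerwise rotations differently:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3                                  # hidden size, kept entries

# Illustrative orthogonal rotation via QR (not LaRoSA's actual construction)
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

def rotated_topk(x: np.ndarray, k: int) -> np.ndarray:
    """Sparsify x in the rotated basis: exactly k coefficients survive there."""
    z = Q.T @ x                              # into the rotated basis
    keep = np.argsort(np.abs(z))[-k:]        # indices of the top-k magnitudes
    z_sparse = np.zeros_like(z)
    z_sparse[keep] = z[keep]
    return Q @ z_sparse                      # back to the original basis
```

Because exactly k coefficients survive per vector, the sparsity level is constant by construction, which is the property behind the consistent model-level sparsity and stable wall-clock speed-up the abstract highlights.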

ICML Conference 2025 Conference Paper

Learning Safety Constraints for Large Language Models

  • Xin Chen
  • Yarden As
  • Andreas Krause 0001

Large language models (LLMs) have emerged as powerful tools but pose significant safety risks through harmful outputs and vulnerability to adversarial attacks. We propose SaP (short for Safety Polytope), a geometric approach to LLM safety that learns and enforces multiple safety constraints directly in the model’s representation space. We develop a framework that identifies safe and unsafe regions via the polytope’s facets, enabling both detection and correction of unsafe outputs through geometric steering. Unlike existing approaches that modify model weights, SaP operates post-hoc in the representation space, preserving model capabilities while enforcing safety constraints. Experiments across multiple LLMs demonstrate that our method can effectively detect unethical inputs and reduce adversarial attack success rates while maintaining performance on standard tasks, highlighting the importance of having an explicit geometric model for safety. Analysis of the learned polytope facets reveals the emergence of specialization in detecting different semantic notions of safety, providing interpretable insights into how safety is captured in LLMs’ representation space.
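
In the simplest reading, a polytope is a set {h : Ah <= b} of linear facet constraints in representation space: detection checks which inequalities a representation violates, and steering moves it back inside. The sketch below illustrates that geometry with a plain cyclic projection; the helper names are hypothetical and this is not the paper's learned procedure:

```python
import numpy as np

def violated_facets(h: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Indices i of facets a_i . h <= b_i that the representation h violates."""
    return np.where(A @ h > b)[0]

def steer_to_polytope(h: np.ndarray, A: np.ndarray, b: np.ndarray,
                      iters: int = 100) -> np.ndarray:
    """Cyclic projection: repeatedly project h onto each violated facet."""
    h = h.astype(float).copy()
    for _ in range(iters):
        viol = violated_facets(h, A, b)
        if viol.size == 0:                       # already inside the polytope
            break
        for i in viol:
            a = A[i]
            h -= (a @ h - b[i]) / (a @ a) * a    # exact projection onto facet i
    return h
```

The appeal of the facet view is interpretability: each row of A is a direction in representation space, so a violated facet names which learned notion of safety was triggered.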

NeurIPS Conference 2025 Conference Paper

MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation

  • Yang Han
  • Pengyu Wang
  • Kai Yu
  • Xin Chen
  • Lu Chen

Mass spectrometry (MS) plays a critical role in molecular identification, significantly advancing scientific discovery. However, structure elucidation from MS data remains challenging due to the scarcity of annotated spectra. While large-scale pretraining has proven effective in addressing data scarcity in other domains, applying this paradigm to mass spectrometry is hindered by the complexity and heterogeneity of raw spectral signals. To address this, we propose MS-BART, a unified modeling framework that maps mass spectra and molecular structures into a shared token vocabulary, enabling cross-modal learning through large-scale pretraining on reliably computed fingerprint–molecule datasets. Multi-task pretraining objectives further enhance MS-BART's generalization by jointly optimizing denoising and translation tasks. The pretrained model is subsequently transferred to experimental spectra through finetuning on fingerprint predictions generated with MIST, a pre-trained spectral inference model, thereby enhancing robustness to real-world spectral variability. While finetuning alleviates the distributional difference, MS-BART still suffers from molecular hallucination and requires further alignment. We therefore introduce a chemical feedback mechanism that guides the model toward generating molecules closer to the reference structure. Extensive evaluations demonstrate that MS-BART achieves SOTA performance across 5/12 key metrics on MassSpecGym and NPLIB1 and is faster by one order of magnitude than competing diffusion-based methods, while comprehensive ablation studies systematically validate the model's effectiveness and robustness. We provide the data and code at https://github.com/OpenDFM/MS-BART.

AAAI Conference 2025 Conference Paper

SUTrack: Towards Simple and Unified Single Object Tracking

  • Xin Chen
  • Ben Kang
  • Wanting Geng
  • Jiawen Zhu
  • Yi Liu
  • Dong Wang
  • Huchuan Lu

In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, and RGB-Language tracking) into a unified model trained in a single session. Due to the distinct nature of the data, current methods typically design individual architectures and train separate models for each task. This fragmentation results in redundant training processes, repetitive technological innovations, and limited cross-modal knowledge sharing. In contrast, SUTrack demonstrates that a single model with a unified input representation can effectively handle various SOT tasks, eliminating the need for task-specific designs and separate training sessions. Additionally, we introduce a task-recognition training strategy and a soft token type embedding to further enhance SUTrack's performance with minimal overhead. Experiments show that SUTrack outperforms previous task-specific counterparts across 11 datasets spanning five SOT tasks. Moreover, we provide a range of models catering to edge devices as well as high-performance GPUs, striking a good trade-off between speed and accuracy. We hope SUTrack could serve as a strong foundation for further compelling research into unified tracking models.

AAAI Conference 2025 Conference Paper

Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking

  • Jiawen Zhu
  • Huayi Tang
  • Xin Chen
  • Xinying Wang
  • Dong Wang
  • Huchuan Lu

Efficient tracking has garnered attention for its ability to operate on resource-constrained platforms for real-world deployment beyond desktop GPUs. Current efficient trackers mainly follow precision-oriented trackers, adopting a one-stream framework with lightweight modules. However, blindly adhering to the one-stream paradigm may not be optimal, as incorporating template computation in every frame leads to redundancy, and pervasive semantic interaction between template and search region places stress on edge devices. In this work, we propose a novel asymmetric Siamese tracker named AsymTrack for efficient tracking. AsymTrack disentangles the template and search streams into separate branches, with the template computed only once during initialization to generate modulation signals. Building on this architecture, we devise an efficient template modulation mechanism to unidirectionally inject crucial cues into the search features, and design an object perception enhancement module that integrates abstract semantics and local details to overcome the limited representation capacity of lightweight trackers. Extensive experiments demonstrate that AsymTrack offers superior speed-precision trade-offs across different platforms compared with the current state of the art. For instance, AsymTrack-T achieves 60.8% AUC on LaSOT and 224/81/84 FPS on GPU/CPU/AGX, surpassing HiT-Tiny by 6.0% AUC with higher speeds.

NeurIPS Conference 2024 Conference Paper

3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection

  • Mingsheng Li
  • Jiakang Yuan
  • Sijin Chen
  • Lin Zhang
  • Anyu Zhu
  • Xin Chen
  • Tao Chen

Transformer-based architectures have been proven successful in detecting 3D objects from point clouds. However, the quadratic complexity of the attention mechanism struggles to encode rich information as point cloud resolution increases. Recently, state space models (SSMs) such as Mamba have gained great attention due to their linear complexity and long-sequence modeling ability for language understanding. To exploit the potential of Mamba for 3D scene-level perception, we propose, for the first time, 3DET-Mamba, a novel SSM-based model designed for indoor 3D object detection. Specifically, we divide the point cloud into different patches and use a lightweight yet effective Inner Mamba to capture local geometric information. To observe the scene from a global perspective, we introduce a novel Dual Mamba module that models the point cloud in terms of spatial distribution and continuity. Additionally, we design a Query-aware Mamba module that decodes context features into object sets under the guidance of learnable queries. Extensive experiments demonstrate that 3DET-Mamba surpasses the previous 3DETR on indoor 3D detection benchmarks such as ScanNet, improving AP25/AP50 from 65.0\%/47.0\% to 70.4\%/54.4\%, respectively.

NeurIPS Conference 2024 Conference Paper

Achieving Optimal Clustering in Gaussian Mixture Models with Anisotropic Covariance Structures

  • Xin Chen
  • Anderson Ye Zhang

We study clustering under anisotropic Gaussian Mixture Models (GMMs), where the covariance matrices of different clusters are unknown and not necessarily the identity matrix. We analyze two anisotropic scenarios: homogeneous, with identical covariance matrices, and heterogeneous, with distinct matrices per cluster. For these models, we derive minimax lower bounds that illustrate the critical influence of covariance structures on clustering accuracy. To solve the clustering problem, we consider a variant of Lloyd's algorithm, adapted to estimate and utilize covariance information iteratively. We prove that the adjusted algorithm not only achieves minimax optimality but also converges within a logarithmic number of iterations, thus bridging the gap between theoretical guarantees and practical efficiency.
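
A toy version of one covariance-adjusted Lloyd iteration might look as follows: re-estimate each cluster's mean and covariance, then reassign every point by Mahalanobis distance plus a log-determinant term (the hard-assignment analogue of the Gaussian log-likelihood). This is a hedged sketch of the heterogeneous case, with illustrative names, not the paper's exact algorithm:

```python
import numpy as np

def lloyd_aniso_step(X: np.ndarray, labels: np.ndarray, k: int,
                     reg: float = 1e-6) -> np.ndarray:
    """One Lloyd-style iteration under a heterogeneous anisotropic GMM."""
    n, d = X.shape
    scores = np.empty((n, k))
    for c in range(k):
        pts = X[labels == c]
        mu = pts.mean(axis=0)
        cov = np.cov(pts, rowvar=False) + reg * np.eye(d)  # regularized estimate
        prec = np.linalg.inv(cov)
        diff = X - mu
        # Mahalanobis distance plus log-det penalty: the per-cluster negative
        # Gaussian log-likelihood up to constants
        scores[:, c] = np.einsum('ni,ij,nj->n', diff, prec, diff) \
                       + np.linalg.slogdet(cov)[1]
    return scores.argmin(axis=1)
```

Iterating this step until the labels stabilize gives the Lloyd variant; the paper's result is that a suitably initialized version reaches minimax-optimal accuracy within a logarithmic number of such iterations.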

NeurIPS Conference 2024 Conference Paper

Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

  • Yimeng Zhang
  • Xin Chen
  • Jinghan Jia
  • Yihua Zhang
  • Chongyu Fan
  • Jiancheng Liu
  • Mingyi Hong
  • Ke Ding

Diffusion models (DMs) have achieved remarkable success in text-to-image generation, but they also pose safety risks, such as the potential generation of harmful content and copyright violations. The techniques of machine unlearning, also known as concept erasing, have been developed to address these risks. However, these techniques remain vulnerable to adversarial prompt attacks, which can prompt DMs post-unlearning to regenerate undesired images containing concepts (such as nudity) meant to be erased. This work aims to enhance the robustness of concept erasing by integrating the principle of adversarial training (AT) into machine unlearning, resulting in the robust unlearning framework referred to as AdvUnlearn. However, achieving this effectively and efficiently is highly nontrivial. First, we find that a straightforward implementation of AT compromises DMs’ image generation quality post-unlearning. To address this, we develop a utility-retaining regularization on an additional retain set, optimizing the trade-off between concept erasure robustness and model utility in AdvUnlearn. Moreover, we identify the text encoder as a more suitable module for robustification compared to UNet, ensuring unlearning effectiveness. Furthermore, the acquired text encoder can serve as a plug-and-play robust unlearner for various DM types. Empirically, we perform extensive experiments to demonstrate the robustness advantage of AdvUnlearn across various DM unlearning scenarios, including the erasure of nudity, objects, and style concepts. In addition to robustness, AdvUnlearn also achieves a balanced tradeoff with model utility. To our knowledge, this is the first work to systematically explore robust DM unlearning through AT, setting it apart from existing methods that overlook robustness in concept erasing. Codes are available at https://github.com/OPTML-Group/AdvUnlearn. Warning: This paper contains model outputs that may be offensive in nature.

UAI Conference 2024 Conference Paper

Gradient descent in matrix factorization: Understanding large initialization

  • Hengchao Chen
  • Xin Chen
  • Mohamad Elmasri
  • Qiang Sun

Gradient Descent (GD) has been proven effective in solving various matrix factorization problems. However, its optimization behavior with large initial values remains less understood. To address this gap, this paper presents a novel theoretical framework for examining the convergence trajectory of GD with a large initialization. The framework is grounded in signal-to-noise ratio concepts and inductive arguments. The results uncover an implicit incremental learning phenomenon in GD and offer a deeper understanding of its performance in large-initialization scenarios.
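
The setting is easy to reproduce: run plain gradient descent on the factorization loss 0.5 * ||X Y^T - A||_F^2 from an initialization whose scale dwarfs the target. The script below is a minimal sketch (sizes, scales, and step size are illustrative, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 20, 3
A = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))   # rank-r target matrix

# Deliberately large initialization: factor entries ~ N(0, 100)
X = 10.0 * rng.normal(size=(n, r))
Y = 10.0 * rng.normal(size=(n, r))

eta = 1e-5                       # small step keeps large-init GD stable
losses = []
for _ in range(2000):
    R = X @ Y.T - A              # residual
    losses.append(0.5 * np.linalg.norm(R) ** 2)
    # Simultaneous gradient step on both factors
    X, Y = X - eta * R @ Y, Y - eta * R.T @ X
```

Tracking `losses` (and the singular values of `X @ Y.T`) over training is one way to observe the behavior the paper analyzes: the large initial bulk must first decay before the signal components are fitted.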

JBHI Journal 2024 Journal Article

Improving Tumor Classification by Reusing Self-Predicted Segmentation of Medical Images as Guiding Knowledge

  • Xiaoyi Lin
  • Mingyu Wang
  • Fei Li
  • Ziyue Xu
  • Jia Chen
  • Xin Chen
  • Chenglang Yuan
  • Songxiong Wu

Differential diagnosis of tumors is important for computer-aided diagnosis. In computer-aided diagnosis systems, expert knowledge of lesion segmentation masks is limited as it is only used during preprocessing or as supervision to guide feature extraction. To improve the utilization of lesion segmentation masks, this study proposes a simple and effective multitask learning network that improves medical image classification using self-predicted segmentation as guiding knowledge; we call this network RS$^{2}$-net. In RS$^{2}$-net, the predicted segmentation probability map obtained from the initial segmentation inference is added to the original image to form a new input, which is then reinput to the network for the final classification inference. We validated the proposed RS$^{2}$-net using three datasets: the pNENs-Grade dataset, which tested the prediction of pancreatic neuroendocrine neoplasm grading; the HCC-MVI dataset, which tested the prediction of microvascular invasion of hepatocellular carcinoma; and the ISIC 2017 public skin lesion dataset. The experimental results indicate that the proposed strategy of reusing self-predicted segmentation is effective, and RS$^{2}$-net outperforms other popular networks and existing state-of-the-art studies. Interpretive analytics based on feature visualization demonstrates that the improved classification performance of our reuse strategy is due to the semantic information that can be acquired in advance in a shallow network.

IJCAI Conference 2024 Conference Paper

LocMoE: A Low-overhead MoE for Large Language Model Training

  • Jing Li
  • Zhijie Sun
  • Xuan He
  • Li Zeng
  • Yi Lin
  • Entong Li
  • Binfan Zheng
  • Rongqian Zhao

The Mixture-of-Experts (MoE) model is a widespread distributed and integrated learning method for large language models (LLMs), favored for its ability to sparsify and expand models efficiently. However, the performance of MoE is limited by load imbalance and the high latency of All-to-All communication, along with relatively redundant computation owing to large expert capacity. Load imbalance may result from existing routing policies that consistently tend to select certain experts. The frequent inter-node communication in the All-to-All procedure also significantly prolongs the training time. To alleviate these performance problems, we propose a novel routing strategy that combines load balance and locality by converting partial inter-node communication into intra-node communication. Notably, we elucidate that there is a minimum threshold for expert capacity, calculated through the maximal angular deviation between the gating weights of the experts and the assigned tokens. We port these modifications onto the PanGu-Σ model based on the MindSpore framework with multi-level routing and conduct experiments on Ascend clusters. The experimental results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared to classical routers, such as the hash router and switch router, without impacting model accuracy.
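The locality idea can be illustrated with a toy top-1 router; this is a hypothetical sketch of the general principle (bias routing toward same-node experts when gating scores are close), not LocMoE's exact rule, and the `bias` parameter is an assumption for illustration:

```python
import numpy as np

def locality_routing(scores, local_experts, bias=0.1):
    """Locality-preferring top-1 routing (illustrative sketch): adding a
    small bonus to experts on the same node converts part of the
    inter-node All-to-All traffic into intra-node traffic whenever the
    gating scores of local and remote experts are close."""
    biased = scores.copy()
    biased[:, local_experts] += bias
    return biased.argmax(axis=1)  # chosen expert index per token
```

A token with near-tied scores is nudged to a local expert, while a token with a clear remote winner still routes remotely.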

NeurIPS Conference 2024 Conference Paper

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

  • Sijin Chen
  • Xin Chen
  • Anqi Pang
  • Xianfang Zeng
  • Wei Cheng
  • Yijun Fu
  • Fukun Yin
  • Zhibin Wang

The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, and is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation process can be seamlessly treated as an auto-regressive problem. In this paper, we validate that Neural Coordinate Field (NeurCF), an explicit coordinate representation with implicit neural embeddings, is a simple yet effective representation for large-scale sequential mesh modeling. After that, we present MeshXL, a family of generative pre-trained auto-regressive models that addresses 3D mesh generation with modern large language model approaches. Extensive experiments show that MeshXL is able to generate high-quality 3D meshes, and can also serve as a foundation model for various downstream applications.
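The "mesh as sequence" step can be sketched as below; this is an illustrative serialization under an assumed sorted-face ordering, not the paper's exact ordering strategy:

```python
def mesh_to_sequence(vertices, faces):
    # Sketch of the sequencing idea: under a pre-defined face ordering,
    # a mesh flattens into one coordinate sequence that an
    # auto-regressive model can be trained on token by token.
    seq = []
    for face in sorted(faces):       # assumed deterministic ordering
        for v in face:
            seq.extend(vertices[v])  # emit the vertex coordinates
    return seq
```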

AAAI Conference 2024 Conference Paper

Plug-In Diffusion Model for Sequential Recommendation

  • Haokai Ma
  • Ruobing Xie
  • Lei Meng
  • Xin Chen
  • Xu Zhang
  • Leyu Lin
  • Zhanhui Kang

Pioneering efforts have verified the effectiveness of the diffusion models in exploring the informative uncertainty for recommendation. Considering the difference between recommendation and image synthesis tasks, existing methods have undertaken tailored refinements to the diffusion and reverse process. However, these approaches typically use the highest-score item in the corpus for user interest prediction, leading to the ignorance of the user's generalized preference contained within other items, thereby remaining constrained by the data sparsity issue. To address this issue, this paper presents a novel Plug-in Diffusion Model for Recommendation (PDRec) framework, which employs the diffusion model as a flexible plugin to jointly take full advantage of the diffusion-generated user preferences on all items. Specifically, PDRec first infers the users' dynamic preferences on all items via a time-interval diffusion model and proposes a Historical Behavior Reweighting (HBR) mechanism to identify the high-quality behaviors and suppress noisy behaviors. In addition to the observed items, PDRec proposes a Diffusion-based Positive Augmentation (DPA) strategy to leverage the top-ranked unobserved items as potential positive samples, bringing in informative and diverse soft signals to alleviate data sparsity. To alleviate the false negative sampling issue, PDRec employs Noise-free Negative Sampling (NNS) to select stable negative samples to ensure effective model optimization. Extensive experiments and analyses on four datasets have verified the superiority of the proposed PDRec over the state-of-the-art baselines and showcased the universality of PDRec as a flexible plugin for commonly used sequential encoders in different recommendation scenarios. The code is available at https://github.com/hulkima/PDRec.
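The reweighting idea behind HBR can be sketched in miniature; this is a hypothetical normalization over diffusion-inferred scores, not PDRec's actual mechanism, and `prefs` is an assumed item-to-score map:

```python
def reweight_behaviors(prefs, history):
    # HBR-style sketch: weight each observed behavior by its
    # diffusion-inferred preference score, normalized over the user's
    # history, so high-quality behaviors dominate and noisy ones are
    # suppressed during training.
    weights = {item: prefs[item] for item in history}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}
```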

AAAI Conference 2024 Conference Paper

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation

  • Yiying Yang
  • Fukun Yin
  • Wen Liu
  • Jiayuan Fan
  • Xin Chen
  • Gang Yu
  • Tao Chen

Recent advancements in implicit neural representations have contributed to high-fidelity surface reconstruction and photorealistic novel view synthesis. However, as the scene scale expands to the block or city level, existing methods encounter challenges because traditional sampling cannot cope with the cubically growing sampling space. To alleviate the dependence on filling the sampling space, we explore using multi-modal priors to help individual points obtain more global semantic information, and propose a prior-rich multi-modal implicit neural representation network, PM-INR, for outdoor unbounded large-scale scenes. The core of our method is the multi-modal prior extraction and cross-modal prior fusion modules. The former encodes codebooks from different modality inputs and extracts valuable priors, while the latter fuses priors to maintain view consistency and preserve the unique features among multi-modal priors. Finally, feature-rich cross-modal priors are injected into the sampling regions to allow each region to perceive global information without filling the sampling space. Extensive experiments have demonstrated the effectiveness and robustness of our method for outdoor unbounded large-scale scene novel view synthesis, outperforming state-of-the-art methods in terms of PSNR, SSIM, and LPIPS.

AAAI Conference 2024 Conference Paper

REGLO: Provable Neural Network Repair for Global Robustness Properties

  • Feisi Fu
  • Zhilu Wang
  • Weichao Zhou
  • Yixuan Wang
  • Jiameng Fan
  • Chao Huang
  • Qi Zhu
  • Xin Chen

We present REGLO, a novel methodology for repairing pretrained neural networks to satisfy global robustness and individual fairness properties. A neural network is said to be globally robust with respect to a given input region if and only if all the input points in the region are locally robust. This notion of global robustness also captures the notion of individual fairness as a special case. We prove that any counterexample to a global robustness property must exhibit a corresponding large gradient. For ReLU networks, this result allows us to efficiently identify the linear regions that violate a given global robustness property. By formulating and solving a suitable robust convex optimization problem, REGLO then computes a minimal weight change that will provably repair these violating linear regions.
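The gradient-based detection step can be illustrated for a tiny ReLU network; this is a simplified sketch of the paper's key observation (violating regions exhibit large gradients), with a two-layer network and a hypothetical norm bound standing in for the actual global robustness property:

```python
import numpy as np

def region_gradient(W1, W2, x):
    """For a two-layer ReLU net f(x) = W2 @ relu(W1 @ x), the gradient
    inside x's linear region is W2 @ diag(active) @ W1, where `active`
    is the ReLU activation pattern at x (illustrative sketch)."""
    active = (W1 @ x > 0).astype(float)
    return W2 @ (active[:, None] * W1)

def region_violates(W1, W2, x, bound):
    # REGLO-style flag (sketch): since any counterexample to a global
    # robustness property must exhibit a large gradient, linear regions
    # whose gradient norm exceeds the bound are repair candidates.
    return bool(np.linalg.norm(region_gradient(W1, W2, x)) > bound)
```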

JBHI Journal 2023 Journal Article

Automatic Diagnosis of Significant Liver Fibrosis From Ultrasound B-Mode Images Using a Handcrafted-Feature-Assisted Deep Convolutional Neural Network

  • Zhong Liu
  • Bin Huang
  • Huiying Wen
  • Zhicheng Lu
  • Qicai Huang
  • Meiqin Jiang
  • Changfeng Dong
  • Yingxia Liu

The accurate diagnosis of significant liver fibrosis ($\geq$F2) in patients with chronic liver disease (CLD) is critical, as $\geq$F2 is a crucial factor that should be considered in selecting an antiviral therapy for these patients. This article proposes a handcrafted-feature-assisted deep convolutional neural network (HFA-DCNN) that helps radiologists automatically and accurately diagnose significant liver fibrosis from ultrasound (US) brightness (B)-mode images. The HFA-DCNN model has three main branches: one for automatic region of interest (ROI) segmentation in the US images, another for attention deep feature learning from the segmented ROI, and the third for handcrafted feature extraction. The attention deep learning features and handcrafted features are fused in the back end of the model to enable more accurate diagnosis of significant liver fibrosis. The usefulness and effectiveness of the proposed model were validated on a dataset built upon 321 CLD patients with liver fibrosis stages confirmed by pathological evaluations. In a fivefold cross-validation (FFCV), the proposed model achieves accuracy, sensitivity, specificity, and area under the receiver-operating-characteristic (ROC) curve (AUC) values of 0.863 (95% confidence interval (CI) 0.820–0.899), 0.879 (95% CI 0.823–0.920), 0.872 (95% CI 0.800–0.925), and 0.925 (95% CI 0.891–0.952), which are significantly better than those obtained by the comparative methods. Given its excellent performance, the proposed HFA-DCNN model can serve as a promising tool for the noninvasive and accurate diagnosis of significant liver fibrosis in CLD patients.

NeurIPS Conference 2023 Conference Paper

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

  • Zibo Zhao
  • Wen Liu
  • Xin Chen
  • Xianfang Zeng
  • Rui Wang
  • Pei Cheng
  • Bin Fu
  • Tao Chen

We present a novel alignment-before-generation approach to tackle the challenging task of generating general 3D shapes based on 2D images or texts. Directly learning a conditional generative model from images or texts to 3D shapes is prone to producing inconsistent results with the conditions because 3D shapes have an additional dimension whose distribution significantly differs from that of 2D images and texts. To bridge the domain gap among the three modalities and facilitate multi-modal-conditioned 3D shape generation, we explore representing 3D shapes in a shape-image-text-aligned space. Our framework comprises two models: a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model (ASLDM). The former model encodes the 3D shapes into the shape latent space aligned to the image and text and reconstructs the fine-grained 3D neural fields corresponding to given shape embeddings via the transformer-based decoder. The latter model learns a probabilistic mapping function from the image or text space to the latent shape space. Our extensive experiments demonstrate that our proposed approach can generate higher-quality and more diverse 3D shapes that better semantically conform to the visual or textual conditional inputs, validating the effectiveness of the shape-image-text-aligned space for cross-modality 3D shape generation.

NeurIPS Conference 2023 Conference Paper

MotionGPT: Human Motion as a Foreign Language

  • Biao Jiang
  • Xin Chen
  • Wen Liu
  • Jingyi Yu
  • Gang Yu
  • Tao Chen

Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multimodal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ the discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performances on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.
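The tokenization step, turning continuous motion into a "motion vocabulary", can be sketched as nearest-codebook quantization; this is an illustrative simplification (the paper uses a learned discrete vector quantizer), with a toy codebook assumed for demonstration:

```python
import numpy as np

def motion_to_tokens(motion, codebook):
    # Discrete vector quantization (sketch): each motion frame maps to
    # the index of its nearest codebook entry, so a 3D motion sequence
    # becomes a token sequence suitable for unified language modeling.
    dists = np.linalg.norm(motion[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)
```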

TMLR Journal 2023 Journal Article

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

  • Stephen Casper
  • Xander Davies
  • Claudia Shi
  • Thomas Krendl Gilbert
  • Jérémy Scheurer
  • Javier Rando
  • Rachel Freedman
  • Tomek Korbak

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-layered approach to the development of safer AI systems.

NeurIPS Conference 2023 Conference Paper

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

  • Yuhan Ding
  • Fukun Yin
  • Jiayuan Fan
  • Hui Li
  • Xin Chen
  • Wen Liu
  • Chongshan Lu
  • Gang Yu

Recent advances in implicit neural representations have achieved impressive results by sampling and fusing individual points along sampling rays in the sampling space. However, due to the explosively growing sampling space, finely representing and synthesizing detailed textures remains a challenge for unbounded large-scale outdoor scenes. To alleviate the dilemma of using individual points to perceive the entire colossal space, we explore learning the surface distribution of the scene to provide structural priors and reduce the samplable space and propose a Point Diffusion implicit Function, PDF, for large-scale scene neural representation. The core of our method is a large-scale point cloud super-resolution diffusion module that enhances the sparse point cloud reconstructed from several training images into a dense point cloud as an explicit prior. Then in the rendering stage, only sampling points with prior points within the sampling radius are retained. That is, the sampling space is reduced from the unbounded space to the scene surface. Meanwhile, to fill in the background of the scene that cannot be provided by point clouds, the region sampling based on Mip-NeRF 360 is employed to model the background representation. Extensive experiments have demonstrated the effectiveness of our method for large-scale scene novel view synthesis, which outperforms relevant state-of-the-art baselines.

JBHI Journal 2023 Journal Article

Self-Supervised Tumor Segmentation With Sim2Real Adaptation

  • Xiaoman Zhang
  • Weidi Xie
  • Chaoqin Huang
  • Ya Zhang
  • Xin Chen
  • Qi Tian
  • Yanfeng Wang

This paper targets self-supervised tumor segmentation. We make the following contributions: (i) taking inspiration from the observation that tumors are often characterised independently of their contexts, we propose a novel proxy task, “layer-decomposition”, that closely matches the goal of the downstream task, and design a scalable pipeline for generating synthetic tumor data for pre-training; (ii) we propose a two-stage Sim2Real training regime for unsupervised tumor segmentation, where we first pre-train a model with simulated tumors, and then adopt a self-training strategy for downstream data adaptation; (iii) when evaluating on different tumor segmentation benchmarks, e.g., BraTS2018 for brain tumor segmentation and LiTS2017 for liver tumor segmentation, our approach achieves state-of-the-art segmentation performance under the unsupervised setting. When transferring the model for tumor segmentation under a low-annotation regime, the proposed approach also outperforms all existing self-supervised approaches; (iv) we conduct extensive ablation studies to analyse the critical components in data simulation, and validate the necessity of different proxy tasks. We demonstrate that, with sufficient texture randomization in simulation, a model trained on synthetic data can effortlessly generalise to datasets with real tumors.

ICML Conference 2023 Conference Paper

Sketched Ridgeless Linear Regression: The Role of Downsampling

  • Xin Chen
  • Yicheng Zeng
  • Siyue Yang
  • Qiang Sun 0007

Overparametrization often helps improve the generalization performance. This paper presents a dual view of overparametrization suggesting that downsampling may also help generalize. Focusing on the proportional regime $m\asymp n \asymp p$, where $m$ represents the sketching size, $n$ is the sample size, and $p$ is the feature dimensionality, we investigate two out-of-sample prediction risks of the sketched ridgeless least square estimator. Our findings challenge conventional beliefs by showing that downsampling does not always harm generalization but can actually improve it in certain cases. We identify the optimal sketching size that minimizes out-of-sample prediction risks and demonstrate that the optimally sketched estimator exhibits stabler risk curves, eliminating the peaks of those for the full-sample estimator. To facilitate practical implementation, we propose an empirical procedure to determine the optimal sketching size. Finally, we extend our analysis to cover central limit theorems and misspecified models. Numerical studies strongly support our theory.
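The estimator under study can be sketched concretely; this is a minimal illustration assuming a Gaussian sketching matrix (one common choice, not necessarily the paper's), where `m` is the sketching size and the minimum-norm solution is obtained via the pseudoinverse:

```python
import numpy as np

def sketched_ridgeless(X, y, m, seed=0):
    """Minimum-norm (ridgeless) least squares on row-sketched data
    (illustrative sketch of the estimator class studied here).
    A Gaussian sketch S of shape (m, n) downsamples the n rows to m."""
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(m, X.shape[0])) / np.sqrt(m)
    return np.linalg.pinv(S @ X) @ (S @ y)
```

In a noiseless, well-conditioned setting with m larger than the feature dimension p, the sketched estimator still recovers the true coefficients exactly.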

ECAI Conference 2023 Conference Paper

Specializing Small Language Models Towards Complex Style Transfer via Latent Attribute Pre-Training

  • Yongfeng Huang 0001
  • Xin Chen
  • Lin Zhang

In this work, we introduce the concept of complex text style transfer tasks, and constructed complex text datasets based on two widely applicable scenarios. Our dataset is the first large-scale data set of its kind, with 700 rephrased sentences and 1, 000 sentences from the game Genshin Impact. While large language models (LLM) have shown promise in complex text style transfer, they have drawbacks such as data privacy concerns, network instability, and high deployment costs. To address these issues, we explore the effectiveness of small models (less than T5-3B) with implicit style pre-training through contrastive learning. We also propose a method for automated evaluation of text generation quality based on alignment with human evaluations using ChatGPT. Finally, we compare our approach with existing methods and show that our model achieves state-of-art performances of few-shot text style transfer models.

AAAI Conference 2022 Conference Paper

Anisotropic Fourier Features for Neural Image-Based Rendering and Relighting

  • Huangjie Yu
  • Anpei Chen
  • Xin Chen
  • Lan Xu
  • Ziyu Shao
  • Jingyi Yu

Recent neural rendering techniques have greatly benefited image-based modeling and relighting tasks. They provide a continuous, compact, and parallelable representation by modeling the plenoptic function as multilayer perceptrons (MLPs). However, vanilla MLPs suffer from spectral biases on multidimensional datasets. Recent rescues based on isotropic Fourier features mapping mitigate the problem but still fall short of handling heterogeneity across different dimensions, causing imbalanced regression and visual artifacts such as excessive blurs. We present an anisotropic random Fourier features (RFF) mapping scheme to tackle spectral biases. We first analyze the influence of bandwidth from a different perspective: we show that the optimal bandwidth exhibits strong correlations with the frequency spectrum of the training data across various dimensions. We then introduce an anisotropic feature mapping scheme with multiple bandwidths to model the multidimensional signal characteristics. We further propose an efficient bandwidth searching scheme through iterative golden-section search that can significantly reduce the training overhead from polynomial to logarithmic time. Our anisotropic scheme directly applies to neural surface light-field rendering and image-based relighting. Comprehensive experiments show that our scheme can more faithfully model lighting conditions and object features as well as preserve fine texture details and smooth view transitions even when angular and spatial samples are highly imbalanced.
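The anisotropic mapping can be sketched directly; this is an illustrative implementation of per-dimension bandwidths in an RFF embedding (the frequency distribution and feature count are assumptions, not the paper's exact configuration):

```python
import numpy as np

def anisotropic_rff(x, bandwidths, n_feat=8, seed=0):
    """Anisotropic random Fourier features (sketch): frequencies are
    drawn with a separate bandwidth per input dimension, so e.g.
    spatial and angular coordinates can be embedded at different
    scales instead of sharing one isotropic bandwidth."""
    rng = np.random.default_rng(seed)
    B = rng.normal(size=(n_feat, x.shape[-1])) * np.asarray(bandwidths)
    proj = 2.0 * np.pi * x @ B.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)
```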

JBHI Journal 2022 Journal Article

Multiparametric Quantitative US Examination of Liver Fibrosis: A Feature-Engineering and Machine-Learning Based Analysis

  • Huiying Wen
  • Wei Zheng
  • Min Li
  • Qing Li
  • Qiang Liu
  • Jianhua Zhou
  • Zhong Liu
  • Xin Chen

Quantitative ultrasound (QUS), which attempts to extract quantitative features from the US radiofrequency (RF) or envelope data for tissue characterization, is becoming a promising technique for noninvasive assessments of liver fibrosis. However, the number of feature variables examined and finally used in the existing QUS methods is typically small, limiting the diagnostic performance. Therefore, this paper devises a new multiparametric QUS (MP-QUS) method which enables the extraction of a large number of feature variables from US RF signals and allows for the use of feature-engineering and machine-learning based algorithms for liver fibrosis assessment. In the MP-QUS, eighty-four feature variables were extracted from multiple QUS parametric maps derived from the RF signals and the envelope data. Afterwards, feature reduction and selection were performed in turn to remove the feature redundancy and identify the best combination of features in the reduced feature set. Finally, a variety of machine-learning algorithms were tested for fibrosis classification with the selected features, based on the results of which the optimal classifier was established. The performance of the proposed MP-QUS method for staging liver fibrosis was evaluated on an animal model, with histologic examination as the reference standard. The mean accuracy, sensitivity, specificity and area under the receiver-operating-characteristic curve achieved by MP-QUS are respectively 83.38%, 86.04%, 80.82%, and 0.891 for recognizing significant liver fibrosis, and 85.50%, 88.92%, 85.24%, and 0.924 for diagnosing liver cirrhosis. The proposed MP-QUS method paves a way for its future extension to assess liver fibrosis in human subjects.

IJCAI Conference 2022 Conference Paper

Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization

  • Yanyu Li
  • Pu Zhao
  • Geng Yuan
  • Xue Lin
  • Yanzhi Wang
  • Xin Chen

Neural architecture search (NAS) and network pruning are widely studied efficient AI techniques, but not yet perfect. NAS performs exhaustive candidate architecture search, incurring tremendous search cost. Though (structured) pruning can simply shrink model dimension, it remains unclear how to decide the per-layer sparsity automatically and optimally. In this work, we revisit the problem of layer-width optimization and propose Pruning-as-Search (PaS), an end-to-end channel pruning method to search out desired sub-network automatically and efficiently. Specifically, we add a depth-wise binary convolution to learn pruning policies directly through gradient descent. By combining the structural reparameterization and PaS, we successfully searched out a new family of VGG-like and lightweight networks, which enable the flexibility of arbitrary width with respect to each layer instead of each stage. Experimental results show that our proposed architecture outperforms prior arts by around 1.0% top-1 accuracy under similar inference speed on ImageNet-1000 classification task. Furthermore, we demonstrate the effectiveness of our width search on complex tasks including instance segmentation and image translation. Code and models are released.

JBHI Journal 2021 Journal Article

2D and 3D CT Radiomic Features Performance Comparison in Characterization of Gastric Cancer: A Multi-Center Study

  • Lingwei Meng
  • Di Dong
  • Xin Chen
  • Mengjie Fang
  • Rongpin Wang
  • Jing Li
  • Zaiyi Liu
  • Jie Tian

Objective: Radiomics, an emerging tool for medical image analysis, is potentially useful for precisely characterizing gastric cancer (GC). Whether to use one-slice 2D annotation or whole-volume 3D annotation remains a long-standing debate, especially for heterogeneous GC. We comprehensively compared the representation and discrimination capacity of 2D and 3D radiomic features regarding GC, via three tasks ($T^{LNM}$, lymph node metastasis prediction; $T^{LVI}$, lymphovascular invasion prediction; $T^{pT}$, classification of pT4 versus other pT stages). Methods: 539 GC patients from four centers were retrospectively enrolled and divided into training and validation cohorts. Radiomic features were extracted from 2D or 3D regions of interest (ROIs) annotated by radiologists. Feature selection and model construction procedures were customized for each combination of the two modalities (2D or 3D) and three tasks. Subsequently, six machine learning models ($Model_{2D}^{LNM}$, $Model_{3D}^{LNM}$; $Model_{2D}^{LVI}$, $Model_{3D}^{LVI}$; $Model_{2D}^{pT}$, $Model_{3D}^{pT}$) were derived and evaluated to reflect the modalities' performance in characterizing GC. Furthermore, we performed an auxiliary experiment to assess the modalities' performance under different resampling spacings. Results: Across the three tasks, the yielded areas under the curve (AUCs) were: $Model_{2D}^{LNM}$, 0.712 (95% confidence interval, 0.613–0.811) versus $Model_{3D}^{LNM}$, 0.680 (0.584–0.775); $Model_{2D}^{LVI}$, 0.677 (0.595–0.761) versus $Model_{3D}^{LVI}$, 0.615 (0.528–0.703); $Model_{2D}^{pT}$, 0.840 (0.779–0.901) versus $Model_{3D}^{pT}$, 0.813 (0.747–0.879). Moreover, the auxiliary experiment indicated that the 2D models are statistically advantageous over the 3D models across different resampling spacings. Conclusion: Models constructed with 2D radiomic features showed performance comparable to those constructed with 3D features in characterizing GC. Significance: Our work indicates that time-saving 2D annotation is the better choice in GC and provides a reference for further radiomics-based research.

JBHI Journal 2021 Journal Article

Accurate and Feasible Deep Learning Based Semi-Automatic Segmentation in CT for Radiomics Analysis in Pancreatic Neuroendocrine Neoplasms

  • Bingsheng Huang
  • Xiaoyi Lin
  • Jingxian Shen
  • Xin Chen
  • Jia Chen
  • Zi-Ping Li
  • Mingyu Wang
  • Chenglang Yuan

Current clinical practice or radiomics studies of pancreatic neuroendocrine neoplasms (pNENs) require manual delineation of the lesions in computed tomography (CT) images, which is time-consuming and subjective. We used a semi-automatic deep learning (DL) method for segmentation of pNENs and verified its feasibility in radiomics analysis. This retrospective study included two datasets: Dataset 1, contrast-enhanced CT images (CECT) of 80 and 18 patients respectively collected from two centers; and Dataset 2, CECT of 56 and 16 patients respectively from two centers. A DL-based semi-automatic segmentation model was developed and validated with Dataset 1 and Dataset 2, and the segmentation results were used for radiomics analysis from which the performance was compared against that based on manual segmentation. The mean Dice similarity coefficient of the trained segmentation model was 81.8% and 74.8% for external validation with Dataset 1 and Dataset 2 respectively. Four classifiers frequently used in radiomics studies were trained and tested with a leave-one-out cross-validation strategy. For pathological grading prediction with Dataset 1, the area under the receiver operating characteristic curve (AUC) with semi-automatic segmentation was up to 0.76 and 0.87 respectively for internal and external validation. For the recurrence study with Dataset 2, the AUC with semi-automatic segmentation was up to 0.78. All these AUCs were not statistically different from the corresponding results based on manual segmentation. Our study showed that DL-based semi-automatic segmentation is accurate and feasible for the radiomics analysis in pNENs.

NeurIPS Conference 2021 Conference Paper

An Empirical Investigation of Representation Learning for Imitation

  • Cynthia Chen
  • Xin Chen
  • Sam Toyer
  • Cody Wild
  • Scott Emmons
  • Ian Fischer
  • Kuang-Huei Lee
  • Neel Alex

Imitation learning often needs a large demonstration set in order to handle the full range of situations that an agent might find itself in during deployment. However, collecting expert demonstrations can be expensive. Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data. Our Empirical Investigation of Representation Learning for Imitation (EIRLI) investigates whether similar benefits apply to imitation learning. We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites. In the settings we evaluate, we find that existing algorithms for image-based representation learning provide limited value relative to a well-tuned baseline with image augmentations. To explain this result, we investigate differences between imitation learning and other settings where representation learning has provided significant benefit, such as image classification. Finally, we release a well-documented codebase which both replicates our findings and provides a modular framework for creating new representation learning algorithms out of reusable components.

AAAI Conference 2021 Conference Paper

Correlation-Aware Heuristic Search for Intelligent Virtual Machine Provisioning in Cloud Systems

  • Chuan Luo
  • Bo Qiao
  • Wenqian Xing
  • Xin Chen
  • Pu Zhao
  • Chao Du
  • Randolph Yao
  • Hongyu Zhang

The optimization of resource is crucial for the operation of public cloud systems such as Microsoft Azure, as well as servers dedicated to the workloads of large customers such as Microsoft 365. Those optimization tasks often need to take unknown parameters into consideration and can be formulated as Prediction+Optimization problems. This paper proposes a new Prediction+Optimization method named Correlation-Aware Heuristic Search (CAHS) that is capable of accounting for the uncertainty in unknown parameters and delivering effective solutions to difficult optimization problems. We apply this method to solving the predictive virtual machine (VM) provisioning (PreVMP) problem, where the VM provisioning plans are optimized based on the predicted demands of different VM types, to ensure rapid provisions upon customers’ requests and to pursue high resource utilization. Unlike the current state-of-the-art PreVMP approaches that assume independence among the demands for different VM types, CAHS incorporates demand correlation when conducting prediction and optimization in a novel and effective way. Our experiments on two public benchmarks and one industrial benchmark demonstrate that CAHS can achieve better performance than its nine state-of-the-art competitors. CAHS has been successfully deployed in Microsoft Azure and significantly improved its performance. The main ideas of CAHS have also been leveraged to improve the efficiency and the reliability of the cloud services provided by Microsoft 365.

IJCAI Conference 2021 Conference Paper

Few-shot Neural Human Performance Rendering from Sparse RGBD Videos

  • Anqi Pang
  • Xin Chen
  • Haimin Luo
  • Minye Wu
  • Jingyi Yu
  • Lan Xu

Recent neural rendering approaches for human activities achieve remarkable view synthesis results, but still rely on dense input views or dense training with all the capture frames, leading to deployment difficulty and inefficient training overhead. Moreover, existing approaches become ill-posed if the input is both spatially and temporally sparse. To fill this gap, in this paper we propose a few-shot neural human rendering approach (FNHR) from only sparse RGBD inputs, which exploits the temporal and spatial redundancy to generate photo-realistic free-view output of human activities. Our FNHR is trained only on the key-frames which expand the motion manifold in the input sequences. We introduce a two-branch neural blending to combine the neural point render and classical graphics texturing pipeline, which integrates reliable observations over sparse key-frames. Furthermore, we adopt a patch-based adversarial training process to make use of the local redundancy and avoid over-fitting to the key-frames, which generates fine-detailed rendering results. Extensive experiments demonstrate the effectiveness of our approach to generate high-quality free view-point results for challenging human performances under the sparse setting.

NeurIPS Conference 2021 Conference Paper

FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark

  • Mingjie Li
  • Wenjia Cai
  • Rui Liu
  • Yuetian Weng
  • Xiaoyun Zhao
  • Cong Wang
  • Xin Chen
  • Zhong Liu

The automatic generation of long and coherent medical reports given medical images (e.g., Chest X-ray and Fundus Fluorescein Angiography (FFA)) has great potential to support clinical practice. Researchers have explored advanced methods from computer vision and natural language processing to incorporate medical domain knowledge for the generation of readable medical reports. However, existing medical report generation (MRG) benchmarks lack both explainable annotations and reliable evaluation tools, hindering the current research advances from two aspects: firstly, existing methods can only predict reports without accurate explanation, undermining the trustworthiness of the diagnostic methods; secondly, the comparison among the predicted reports from different MRG methods is unreliable using the evaluation metrics of natural-language generation (NLG). To address these issues, in this paper, we propose an explainable and reliable MRG benchmark based on FFA Images and Reports (FFA-IR). Specifically, FFA-IR is large, with 10,790 reports along with 1,048,584 FFA images from clinical practice; it includes explainable annotations, based on a schema of 46 categories of lesions; and it is bilingual, providing both English and Chinese reports for each case. Besides using the widely used NLG metrics, we propose a set of nine human evaluation criteria to evaluate the generated reports. We envision FFA-IR as a testbed for explainable and reliable medical report generation. We also hope that it can broadly accelerate medical imaging research and facilitate interaction between the fields of medical imaging, computer vision, and natural language processing.

AAAI Conference 2021 Conference Paper

Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks

  • Xin Chen
  • Lingxi Xie
  • Jun Wu
  • Longhui Wei
  • Yuhui Xu
  • Qi Tian

Neural architecture search has attracted wide attention in both academia and industry. To accelerate it, researchers proposed weight-sharing methods which first train a super-network to reuse computation among different operators, from which exponentially many sub-networks can be sampled and efficiently evaluated. These methods enjoy great advantages in terms of computational costs, but the sampled sub-networks are not guaranteed to be estimated precisely unless an individual training process is taken. This paper attributes such inaccuracy to the inevitable mismatch between assembled network layers, so that there is a random error term added to each estimation. We alleviate this issue by training a graph convolutional network to fit the performance of sampled sub-networks so that the impact of random errors becomes minimal. With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates, which consequently leads to better performance of the final architecture. In addition, our approach also enjoys the flexibility of being used under different hardware constraints, since the graph convolutional network has provided an efficient lookup table of the performance of architectures in the entire search space.

NeurIPS Conference 2021 Conference Paper

On the Bias-Variance-Cost Tradeoff of Stochastic Optimization

  • Yifan Hu
  • Xin Chen
  • Niao He

We consider stochastic optimization when one only has access to biased stochastic oracles of the objective, and obtaining stochastic gradients with low biases comes at high costs. This setting captures a variety of optimization paradigms widely used in machine learning, such as conditional stochastic optimization, bilevel optimization, and distributionally robust optimization. We examine a family of multi-level Monte Carlo (MLMC) gradient methods that exploit a delicate trade-off among the bias, the variance, and the oracle cost. We provide a systematic study of their convergence and total computational complexity for strongly convex, convex, and nonconvex objectives, and demonstrate their superiority over the naive biased stochastic gradient method. Moreover, when applied to conditional stochastic optimization, the MLMC gradient methods significantly improve the best-known sample complexity in the literature.

NeurIPS Conference 2020 Conference Paper

Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning

  • Yifan Hu
  • Siqi Zhang
  • Xin Chen
  • Niao He

Conditional stochastic optimization covers a variety of applications ranging from invariant learning and causal inference to meta-learning. However, constructing unbiased gradient estimators for such problems is challenging due to the composition structure. As an alternative, we propose a biased stochastic gradient descent (BSGD) algorithm and study the bias-variance tradeoff under different structural assumptions. We establish the sample complexities of BSGD for strongly convex, convex, and weakly convex objectives under smooth and non-smooth conditions. Our lower bound analysis shows that the sample complexities of BSGD cannot be improved for general convex objectives and nonconvex objectives except for smooth nonconvex objectives with Lipschitz continuous gradient estimator. For this special setting, we propose an accelerated algorithm called biased SpiderBoost (BSpiderBoost) that matches the lower bound complexity. We further conduct numerical experiments on invariant logistic regression and model-agnostic meta-learning to illustrate the performance of BSGD and BSpiderBoost.

NeurIPS Conference 2020 Conference Paper

Graph Stochastic Neural Networks for Semi-supervised Learning

  • Haibo Wang
  • Chuan Zhou
  • Xin Chen
  • Jia Wu
  • Shirui Pan
  • Jilong Wang

Graph Neural Networks (GNNs) have achieved remarkable performance in the task of semi-supervised node classification. However, most existing models learn a deterministic classification function, which lacks sufficient flexibility to explore better choices in the presence of various kinds of imperfect observed data, such as scarce labeled nodes and noisy graph structure. To address the rigidity and inflexibility of deterministic classification functions, this paper proposes a novel framework named Graph Stochastic Neural Networks (GSNN), which aims to model the uncertainty of the classification function by simultaneously learning a family of functions, i.e., a stochastic function. Specifically, we introduce a learnable graph neural network coupled with a high-dimensional latent variable to model the distribution of the classification function, and further adopt amortised variational inference to approximate the intractable joint posterior for missing labels and the latent variable. By maximizing the lower-bound of the likelihood for observed node labels, the instantiated models can be trained in an end-to-end manner effectively. Extensive experiments on three real-world datasets show that GSNN achieves substantial performance gain in different scenarios compared with state-of-the-art baselines.

IJCAI Conference 2020 Conference Paper

Intelligent Virtual Machine Provisioning in Cloud Computing

  • Chuan Luo
  • Bo Qiao
  • Xin Chen
  • Pu Zhao
  • Randolph Yao
  • Hongyu Zhang
  • Wei Wu
  • Andrew Zhou

Virtual machine (VM) provisioning is a common and critical problem in cloud computing. In industrial cloud platforms, a huge number of VMs are provisioned per day. Due to the complexity and resource constraints, provisioning needs to be carefully optimized to make cloud platforms utilize resources effectively. Moreover, in practice, provisioning a VM from scratch requires a fairly long time, which would degrade the customer experience. Hence, it is advisable to provision VMs in advance of upcoming demands. In this work, we formulate this practical scenario as the predictive VM provisioning (PreVMP) problem, where upcoming demands are unknown and need to be predicted in advance, and then the VM provisioning plan is optimized based on the predicted demands. Further, we propose Uncertainty-Aware Heuristic Search (UAHS) for solving the PreVMP problem. UAHS first models the prediction uncertainty, and then utilizes the prediction uncertainty in optimization. Moreover, UAHS leverages Bayesian optimization to couple prediction and optimization to improve its practical performance. Extensive experiments show that UAHS performs much better than state-of-the-art competitors on two public datasets and an industrial dataset. UAHS has been successfully applied in Microsoft Azure and brought practical benefits in real-world applications.

IROS Conference 2020 Conference Paper

Path Planning Under MIMO Network Constraints for Throughput Enhancement in Multi-robot Data Aggregation Tasks

  • Alexandra Pogue
  • Samer S. Hanna
  • Andy Nichols
  • Xin Chen
  • Danijela Cabric
  • Ankur Mehta

Under line-of-sight (LOS) network conditions, multi-input multi-output (MIMO) wireless communications can increase the channel capacity between a team of robots and a multi-antenna array at a stationary base station. This increased capacity can result in greater data throughput, shortening the time necessary to complete channel-limited data aggregation tasks. To take advantage of this higher capacity channel, the robots in the team must be positioned to maximize complex channel orthogonality between each robot and receiver antenna. Using geometrically motivated assumptions, we derive transmitter spacing rules that can easily be added on to existing path plans to improve backhaul throughput for data offloading from the robot team, with minimal impact on other system objectives. We demonstrate the effectiveness of the approach, both in ideal channels and in realistic channels outside the domain of our simplifying assumptions, with numerical examples of robot-coordinated path plans in two example environments, achieving up to a 42% improvement in task completion times.

NeurIPS Conference 2019 Conference Paper

Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis

  • Yingying Li
  • Xin Chen
  • Na Li

This paper studies the online optimal control problem with time-varying convex stage costs for a time-invariant linear dynamical system, where a finite lookahead window of accurate predictions of the stage costs is available at each time. We design online algorithms, Receding Horizon Gradient-based Control (RHGC), that utilize the predictions through finite steps of gradient computations. We study the algorithm performance measured by dynamic regret: the online performance minus the optimal performance in hindsight. It is shown that the dynamic regret of RHGC decays exponentially with the size of the lookahead window. In addition, we provide a fundamental limit of the dynamic regret for any online algorithm by considering linear quadratic tracking problems. The regret upper bound of one RHGC method almost reaches the fundamental limit, demonstrating the effectiveness of the algorithm. Finally, we numerically test our algorithms on both linear and nonlinear systems to show the effectiveness and generality of RHGC.

TAAS Journal 2017 Journal Article

Integrating Reinforcement Learning with Multi-Agent Techniques for Adaptive Service Composition

  • Hongbign Wang
  • Xin Chen
  • Qin Wu
  • Qi Yu
  • Xingguo Hu
  • Zibin Zheng
  • Athman Bouguettaya

Service-oriented architecture is a widely used software engineering paradigm to cope with complexity and dynamics in enterprise applications. Service composition, which provides a cost-effective way to implement software systems, has attracted significant attention from both industry and research communities. As online services may keep evolving over time and thus lead to a highly dynamic environment, service composition must be self-adaptive to tackle uninformed behavior during the evolution of services. In addition, service composition should also maintain high efficiency for large-scale services, which are common for enterprise applications. This article presents a new model for large-scale adaptive service composition based on multi-agent reinforcement learning. The model integrates reinforcement learning and game theory, where the former is to achieve adaptation in a highly dynamic environment and the latter is to enable agents to work for a common task (i.e., composition). In particular, we propose a multi-agent Q-learning algorithm for service composition, which is expected to achieve better performance when compared with the single-agent Q-learning method and multi-agent SARSA (State-Action-Reward-State-Action) method. Our experimental results demonstrate the effectiveness and efficiency of our approach.

IJCAI Conference 2017 Conference Paper

Switched Linear Multi-Robot Navigation Using Hierarchical Model Predictive Control

  • Chao Huang
  • Xin Chen
  • Yifan Zhang
  • Shengchao Qin
  • Yifeng Zeng
  • Xuandong Li

Multi-robot navigation control in the absence of a reference trajectory is rather challenging, as it is expected to ensure stability and feasibility while still offering fast computation of control decisions. The intrinsic high complexity of switched linear dynamical robots makes the problem even more challenging. In this paper, we propose a novel HMPC-based method to address the navigation problem of multiple robots with switched linear dynamics. We develop a new technique to compute the reachable sets of switched linear systems and use them to enable the parallel computation of control parameters. We present theoretical results on the stability, feasibility and complexity of the proposed approach, and demonstrate its empirical performance advantage over other approaches.

IJCAI Conference 2016 Conference Paper

Hierarchical Model Predictive Control for Multi-Robot Navigation

  • Chao Huang
  • Xin Chen
  • Yifan Zhang
  • Shengchao Qin
  • Yifeng Zeng
  • Xuandong Li

Ensuring stability is the most important requirement for the navigation control of multi-robot systems with no reference trajectory. The popular heuristic-search methods cannot provide theoretical guarantees on stability. In this paper, we propose a Hierarchical Model Predictive Control scheme that employs reachable sets to decouple the navigation problem of linear dynamical multi-robot systems. The proposed control scheme guarantees stability and feasibility, and is more efficient and viable than other Model Predictive Control schemes, as evidenced by our simulation results.

ICRA Conference 2010 Conference Paper

Wearable accelerometer based extendable activity recognition system

  • Jie Yang 0002
  • Shuangquan Wang
  • Ningjiang Chen
  • Xin Chen
  • Pengfei Shi

Recognizing the human activities of daily living (ADL) is an important research issue in the pervasive environment. Activity recognition is treated as a classification problem, and a multi-class classifier is often used. Though the multi-class classifier can obtain high classification accuracy, it cannot detect noise activities or unknown activities, and the system has no extendable recognition capability. In this paper, we propose a recognition system which can recognize known activities and detect unknown activities simultaneously. For each known activity, a one-class classification model is built, and the combined one-class classification models are used to judge whether a test sample belongs to the known activities. For the known samples, the multi-class classifier is used to recognize their types. For the continuous unknown samples, based on a segmentation algorithm, training samples of new activities are extracted and added into the recognition system to extend the system's recognition capability.

IROS Conference 2006 Conference Paper

On-Line Vibration Source Detection of Running Trains Based on Acceleration Measurement

  • Chengyou Wang
  • Qiugen Xiao
  • Hua Liang
  • Xin Chen
  • Xuanping Cai
  • Yun-Hui Liu 0001

To ensure the safety of railway operation, it is important to regularly check railway conditions such as deformation of the rails. To monitor rail deformation, this paper presents a method for detecting vibration sources of a running train on-line by measuring accelerations, which include the train bogie's lateral acceleration and the crossbeam's lateral and vertical accelerations. A series of detection algorithms, including peak-peak value entropy comparison and weighted correlation coefficient comparison, are proposed in the method, according to the different characteristics of vibrations from the train itself and from rail deformation. To eliminate the vibration due to the train itself, the algorithm employs the peak-peak value entropy comparison. To identify the order of the vibrations between crossbeam and bogie, a weighted correlation coefficient is applied. Finally, the weight center and maximum position are used for detection. The algorithms were implemented on a passenger train using an ARM processor, and real experiments were conducted on the train on the railway between Shenyang and Dalian in China. The experiments demonstrated that the proposed method can produce satisfactory results.

NeurIPS Conference 2005 Conference Paper

The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search

  • Gregory Zelinsky
  • Wei Zhang
  • Bing Yu
  • Xin Chen
  • Dimitris Samaras

To investigate how top-down (TD) and bottom-up (BU) information is weighted in the guidance of human search behavior, we manipulated the proportions of BU and TD components in a saliency-based model. The model is biologically plausible and implements an artificial retina and a neuronal population code. The BU component is based on feature-contrast. The TD component is defined by a feature-template match to a stored target representation. We compared the model's behavior at different mixtures of TD and BU components to the eye movement behavior of human observers performing the identical search task. We found that a purely TD model provides a much closer match to human behavior than any mixture model using BU information. Only when biological constraints are removed (e.g., eliminating the retina) did a BU/TD mixture model begin to approximate human behavior.