Arrow Research search

Author name cluster

Xiang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

57 papers
2 author rows

Possible papers

57

JBHI Journal 2026 Journal Article

DF-DiffVSR: Deformable Field-Driven Diffusion Model for Inter-Slice Continuity Enhancement in Medical Volume Super-Resolution

  • Can Wang
  • Min Liu
  • Qinghao Liu
  • Yuehao Zhu
  • Xiang Chen
  • Licheng Liu
  • Yaonan Wang
  • Erik Meijering

Medical volumetric imaging is crucial for precise diagnosis, but equipment and acquisition constraints often yield anisotropic resolution, which hampers the detection of small lesions and 3D visualization. While volumetric super-resolution methods can mitigate this issue, existing techniques suffer from limited receptive fields, failing to fully exploit inter-slice correlations and resulting in compromised inter-slice continuity. To address this limitation, we propose DF-DiffVSR, a novel deformable field-enhanced diffusion model for medical volume super-resolution. The proposed method integrates optical-flow principles with diffusion models through a Deformable Field Extraction (DFE) module, which explicitly learns inter-slice motion information to enhance structural continuity in the through-plane direction. Furthermore, we design a Multiscale Large Kernel Convolution (MLKC) module that employs striped convolutions with varying kernel sizes to expand the receptive field and capture global anatomical context. Evaluated on the RPLHR-CT and IXI-T2 datasets, DF-DiffVSR achieves state-of-the-art (SOTA) performance, surpassing the second-best method by 0.732 dB and 0.214 dB in PSNR, respectively, demonstrating superior capabilities in preserving inter-slice continuity and recovering fine-grained details.
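The reported gains are in PSNR (peak signal-to-noise ratio). As a reference for reading those numbers, the standard definition can be sketched as follows; the `data_range` parameter and the toy arrays are illustrative assumptions, not details from the paper:

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images/volumes."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10((data_range ** 2) / mse)

# A 0.732 dB gain corresponds to shrinking the reconstruction MSE by a
# factor of 10 ** (0.732 / 10) ≈ 1.18 relative to the runner-up method.
```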

AAAI Conference 2026 Conference Paper

Dual-Kernel Graph Community Contrastive Learning

  • Xiang Chen
  • Kun Yue
  • Wenjie Liu
  • Zhenyu Zhang
  • Liang Duan

Graph Contrastive Learning (GCL) has emerged as a powerful paradigm for training Graph Neural Networks (GNNs) in the absence of task-specific labels. However, its scalability on large-scale graphs is hindered by the intensive message-passing mechanism of GNNs and the quadratic computational complexity of the contrastive loss over positive and negative node pairs. To address these issues, we propose an efficient GCL framework that transforms the input graph into a compact network of interconnected node sets while preserving structural information across communities. We first introduce a kernelized graph community contrastive loss with linear complexity, enabling effective information transfer among node sets to capture hierarchical structural information of the graph. We then incorporate a knowledge distillation technique into the decoupled GNN architecture to accelerate inference while maintaining strong generalization performance. Extensive experiments on sixteen real-world datasets of varying scales demonstrate that our method outperforms state-of-the-art GCL baselines in both effectiveness and scalability.

AAAI Conference 2026 Conference Paper

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness

  • Jiaxing Zhao
  • Boyuan Sun
  • Xiang Chen
  • Xihan Wei

Facial expression captioning has found widespread application across various domains. Recently, the emergence of video Multimodal Large Language Models (MLLMs) has shown promise in general video understanding tasks. However, describing facial expressions within videos poses two major challenges for these models: (1) the lack of adequate datasets and benchmarks, and (2) the limited visual token capacity of video MLLMs. To address these issues, this paper introduces a new instruction-following dataset tailored for dynamic facial expression captioning. The dataset comprises 5,033 high-quality video clips annotated manually, containing over 700,000 tokens. Its purpose is to improve the capability of video MLLMs to discern subtle facial nuances. Furthermore, we propose FaceTrack-MM, which leverages a limited number of tokens to encode the main character's face. This model demonstrates superior performance in tracking faces and focusing on the facial expressions of the main characters, even in intricate multi-person scenarios. Additionally, we introduce a novel evaluation metric combining event extraction, relation classification, and the longest common subsequence (LCS) algorithm to assess the content consistency and temporal sequence consistency of generated text. Moreover, we present FECBench, a benchmark designed to assess the performance of existing video MLLMs in this specific task.
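The abstract does not spell out how the metric's three components are combined, but its LCS core, which rewards events that appear in the correct temporal order, can be sketched with the classic dynamic program below; the event lists and the normalization are hypothetical illustrations:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # matched event extends the LCS
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Hypothetical events extracted from a generated vs. a reference caption:
generated = ["smile", "frown", "laugh", "nod"]
reference = ["smile", "laugh", "nod"]
# Order-aware overlap: extra or reordered events lower the score.
score = lcs_length(generated, reference) / len(reference)
```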

AAAI Conference 2026 Conference Paper

MultiMedBench: A Scenario-Aware Benchmark for Evaluating Knowledge Editing in Medical VQA

  • Shengtao Wen
  • Haodong Chen
  • Yadong Wang
  • Zhongying Pan
  • Xiang Chen
  • Yu Tian
  • Bo Qian
  • Dong Liang

Knowledge editing (KE) provides a scalable approach for updating factual knowledge in large language models without full retraining. While previous studies have demonstrated effectiveness in general domains and medical QA tasks, little attention has been paid to KE in multimodal medical scenarios. Unlike text-only settings, medical KE demands integrating updated knowledge with visual reasoning to support safe and interpretable clinical decisions. To address this gap, we propose MultiMedBench, the first benchmark tailored to evaluating KE in clinical multimodal tasks. Our framework spans both understanding and reasoning task types, defines a three-dimensional metric suite (reliability, generality, and locality), and supports cross-paradigm comparisons across general and domain-specific models. We conduct extensive experiments under single-editing and lifelong-editing settings. Results suggest that current methods struggle with generalization and long-tail reasoning, particularly in complex clinical workflows. We further present an efficiency analysis (e.g., edit latency, memory footprint), revealing practical trade-offs in real-world deployment across KE paradigms. Overall, MultiMedBench not only reveals the limitations of current approaches but also provides a solid foundation for developing clinically robust knowledge editing techniques in the future.

AAAI Conference 2026 Conference Paper

Potent but Stealthy: Rethink Profile Pollution Against Sequential Recommendation via Bi-Level Constrained Reinforcement Paradigm

  • Jiajie Su
  • Zihan Nan
  • Yunshan Ma
  • Xiaobo Xia
  • XiaoHua Feng
  • Weiming Liu
  • Xiang Chen
  • Xiaolin Zheng

Sequential Recommenders, which exploit dynamic user intents through interaction sequences, are vulnerable to adversarial attacks. While existing attacks primarily rely on data poisoning, they require large-scale user access or fake profiles and thus lack practicality. In this paper, we focus on the Profile Pollution Attack (PPA), which subtly contaminates partial user interactions to induce targeted mispredictions. Previous PPA methods suffer from two limitations, i.e., i) over-reliance on sequence horizon impact restricts fine-grained perturbations on item transitions, and ii) holistic modifications cause detectable distribution shifts. To address these challenges, we propose CREAT, a constrained-reinforcement-driven attack that synergizes a bi-level optimization framework with multi-reward reinforcement learning to balance adversarial efficacy and stealthiness. We first develop a Pattern Balanced Rewarding Policy, which integrates pattern inversion rewards to invert critical patterns and distribution consistency rewards to minimize detectable shifts via unbalanced co-optimal transport. Then we employ a Constrained Group Relative Reinforcement Learning paradigm, enabling step-wise perturbations through dynamic barrier constraints and group-shared experience replay, achieving targeted pollution with minimal detectability. Extensive experiments demonstrate the effectiveness of CREAT.

AAAI Conference 2026 Conference Paper

Reflect Then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion

  • Dong Zhao
  • Yadong Wang
  • Xiang Chen
  • Chenxi Wang
  • Hongliang Dai
  • Chuanxing Geng
  • Shengzhong Zhang
  • Shao-Yuan Li

Large Language Models (LLMs) show remarkable potential for few-shot information extraction (IE), yet their performance is highly sensitive to the choice of in-context examples. Conventional selection strategies often fail to provide informative guidance, as they overlook a key source of model fallibility: confusion stemming not just from semantic content, but also from the generation of well-structured formats required by IE tasks. To address this, we introduce Active Prompting for Information Extraction (APIE), a novel active prompting framework guided by a principle we term introspective confusion. Our method empowers an LLM to assess its own confusion through a dual-component uncertainty metric that uniquely quantifies both Format Uncertainty (difficulty in generating correct syntax) and Content Uncertainty (inconsistency in extracted semantics). By ranking unlabeled data with this comprehensive score, our framework actively selects the most challenging and informative samples to serve as few-shot exemplars. Extensive experiments on four benchmarks show that our approach consistently outperforms strong baselines, yielding significant improvements in both extraction accuracy and robustness. Our work highlights the critical importance of a fine-grained, dual-level view of model uncertainty when it comes to building effective and reliable structured generation systems.

AAAI Conference 2026 Conference Paper

Rethinking Rainy 3D Scene Reconstruction via Perspective Transforming and Brightness Tuning

  • Qianfeng Yang
  • Xiang Chen
  • Pengpeng Li
  • Qiyuan Guan
  • Guiyue Jin
  • Jiyu Jin

Rain degrades the visual quality of multi-view images, which are essential for 3D scene reconstruction, resulting in inaccurate and incomplete reconstruction results. Existing datasets often overlook two critical characteristics of real rainy 3D scenes: the viewpoint-dependent variation in the appearance of rain streaks caused by their projection onto 2D images, and the reduction in ambient brightness resulting from cloud coverage during rainfall. To improve data realism, we construct a new dataset named OmniRain3D that incorporates perspective heterogeneity and brightness dynamicity, enabling more faithful simulation of rain degradation in 3D scenes. Based on this dataset, we propose an end-to-end reconstruction framework named REVR-GSNet (Rain Elimination and Visibility Recovery for 3D Gaussian Splatting). Specifically, REVR-GSNet integrates recursive brightness enhancement, Gaussian primitive optimization, and GS-guided rain elimination into a unified architecture through joint alternating optimization, achieving high-fidelity reconstruction of clean 3D scenes from rain-degraded inputs. Extensive experiments show the effectiveness of our dataset and method. Our dataset and method provide a foundation for future research on multi-view image deraining and rainy 3D scene reconstruction.

AAAI Conference 2026 Conference Paper

VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use

  • Zhehao Zhang
  • Ryan A. Rossi
  • Tong Yu
  • Franck Dernoncourt
  • Ruiyi Zhang
  • Jiuxiang Gu
  • Sungchul Kim
  • Xiang Chen

While vision-language models (VLMs) have demonstrated remarkable performance across various tasks combining textual and visual information, they continue to struggle with fine-grained visual perception tasks that require detailed pixel-level analysis. Effectively eliciting comprehensive reasoning from VLMs on such intricate visual elements remains an open challenge. In this paper, we present VipAct, an agent framework that enhances VLMs by integrating multi-agent collaboration and vision expert models, enabling more precise visual understanding and comprehensive reasoning. VipAct consists of an orchestrator agent, which manages task requirement analysis, planning, and coordination, along with specialized agents that handle specific tasks such as image captioning and vision expert models that provide high-precision perceptual information. This multi-agent approach allows VLMs to better perform fine-grained visual perception tasks by synergizing planning, reasoning, and tool use. We evaluate VipAct on benchmarks featuring a diverse set of visual perception tasks, with experimental results demonstrating significant performance improvements over state-of-the-art baselines across all tasks. Furthermore, comprehensive ablation studies reveal the critical role of multi-agent collaboration in eliciting more detailed System-2 reasoning and highlight the importance of image input for task planning. Additionally, our error analysis identifies patterns of VLMs' inherent limitations in visual perception, providing insights into potential future improvements. VipAct offers a flexible and extensible framework, paving the way for more advanced visual perception systems across various real-world applications.

AAAI Conference 2025 Conference Paper

AIA: Autoregression-Based Injection Attacks Against Text2SQL Models

  • Deyin Li
  • Xiang Ling
  • Changjiang Li
  • Xiang Chen
  • Chunming Wu

To facilitate understanding of users' diverse queries against the back-end databases in web applications, researchers have introduced Text-to-SQL (Text2SQL) models that can generate well-structured SQL queries from users' query texts in natural language. As the Text2SQL model decouples user queries from the back-end databases, it inherently mitigates the SQL injection risk posed by inserting users' input into pre-written SQL queries. However, what security risks Text2SQL models may pose to web applications remains an open question. In this paper, we present a new attack framework, named Autoregression-based Injection Attacks (AIA), to evaluate the security risks of Text2SQL models. In particular, AIA makes target models generate attack payloads by constructing specific inputs and adjusting the input auto-regressively. Our evaluation demonstrates that AIA can cause Text2SQL models to generate target output from adversarial inputs with success rates of over 70% in most scenarios. The generated adversarial inputs exhibit a degree of transferability across target Text2SQL models. Additionally, practical experiments show that AIA can make Text2SQL models extract user lists from databases and even delete data in databases directly.

TMLR Journal 2025 Journal Article

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

  • Leitian Tao
  • Xiang Chen
  • Tong Yu
  • Tung Mai
  • Ryan A. Rossi
  • Yixuan Li
  • Saayan Mitra

Large Language Models (LLMs) have revolutionized code generation but require significant resources and tend to over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs is a cost-effective alternative, yet standard supervised approaches rely solely on correct examples, overlooking valuable insights from failures. We introduce CodeLutra, a new framework that leverages both correct and incorrect code attempts. Instead of purely instructing with correct solutions, CodeLutra uses iterative preference-based refinement, comparing successful and failed outputs to better approximate desired results. This process narrows the performance gap with state-of-the-art, larger models, without requiring massive datasets or auxiliary models. For example, on a challenging data science coding task, using only 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's level. By capitalizing on both successes and mistakes, CodeLutra offers a scalable, efficient path to high-quality code generation, making smaller open-source models more competitive with leading closed-source alternatives.
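The abstract does not name the exact preference objective; a DPO-style loss is one common way to compare successful and failed outputs, sketched here under that assumption (the log-probabilities and the `beta` temperature are illustrative, not values from the paper):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Preference loss: push the policy to rank the correct (chosen)
    code attempt above the failed (rejected) one, relative to a
    frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy already prefers the correct attempt more strongly than
# the reference model does, the margin is positive and the loss is small.
```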

AAAI Conference 2025 Conference Paper

Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video

  • Junkai Fan
  • Kun Wang
  • Zhiqiang Yan
  • Xiang Chen
  • Shangbing Gao
  • Jun Li
  • Jian Yang

In this paper, we study the challenging problem of simultaneously removing haze and estimating depth from real monocular hazy videos. These tasks are inherently complementary: enhanced depth estimation improves dehazing via the atmospheric scattering model (ASM), while superior dehazing contributes to more accurate depth estimation through the brightness consistency constraint (BCC). To tackle these intertwined tasks, we propose a novel depth-centric learning framework that integrates the ASM model with the BCC constraint. Our key idea is that both ASM and BCC rely on a shared depth estimation network. This network simultaneously exploits adjacent dehazed frames to enhance depth estimation via BCC and uses the refined depth cues to more effectively remove haze through ASM. Additionally, we leverage a non-aligned clear video and its estimated depth to independently regularize the dehazing and depth estimation networks. This is achieved by designing two discriminator networks: D_MFIR enhances high-frequency details in dehazed videos, and D_MDR reduces the occurrence of black holes in low-texture regions. Extensive experiments demonstrate that the proposed method outperforms current state-of-the-art techniques in both video dehazing and depth estimation tasks, especially in real-world hazy scenes.

AAAI Conference 2025 Conference Paper

DeRainGS: Gaussian Splatting for Enhanced Scene Reconstruction in Rainy Environments

  • Shuhong Liu
  • Xiang Chen
  • Hongming Chen
  • Quanfeng Xu
  • Mingrui Li

Reconstruction under adverse rainy conditions poses significant challenges due to reduced visibility and the distortion of visual perception. These conditions can severely impair the quality of geometric maps, which is essential for applications ranging from autonomous planning to environmental monitoring. In response to these challenges, this study introduces the novel task of 3D Reconstruction in Rainy Environments (3DRRE), specifically designed to address the complexities of reconstructing 3D scenes under rainy conditions. To benchmark this task, we construct the HydroViews dataset that comprises a diverse collection of both synthesized and real-world scene images characterized by various intensities of rain streaks and raindrops. Furthermore, we propose DeRainGS, the first 3DGS method tailored for reconstruction in adverse rainy environments. Extensive experiments across a wide range of rain scenarios demonstrate that our method delivers state-of-the-art performance, remarkably outperforming existing occlusion-free methods by a large margin.

JMLR Journal 2025 Journal Article

Dynamic Bayesian Learning for Spatiotemporal Mechanistic Models

  • Sudipto Banerjee
  • Xiang Chen
  • Ian Frankenburg
  • Daniel Zhou

We develop an approach for Bayesian learning of spatiotemporal dynamical mechanistic models. Such learning consists of statistical emulation of the mechanistic system that can efficiently interpolate the output of the system from arbitrary inputs. The emulated learner can then be used to train the system from noisy data by melding information from observed data with the emulated mechanistic system. This joint melding employs hierarchical state-space models with Gaussian process regression. Assuming the dynamical system is controlled by a finite collection of inputs, Gaussian process regression learns the effect of these parameters through a number of training runs, driving the stochastic innovations of the spatiotemporal state-space component. This enables efficient modeling of the dynamics over space and time. This article details exact inference with analytically accessible posterior distributions in hierarchical matrix-variate Normal and Wishart models for designing the emulator. This step obviates expensive iterative algorithms such as Markov chain Monte Carlo or variational approximations. We also show how the approach extends to large-scale emulation by designing a dynamic Bayesian transfer learning framework. Inference on mechanistic model parameters proceeds using Markov chain Monte Carlo as a post-emulation step, with the emulator serving as a regression component. We demonstrate this framework by solving inverse problems arising in the analysis of nonlinear ordinary and partial differential equations and, in addition, by applying it to a black-box computer model generating spatiotemporal dynamics across a graphical model.
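As a rough illustration of the emulation idea (a plain Gaussian-process regressor, not the paper's matrix-variate Normal/Wishart construction), interpolating a simulator's output from a few training runs can be sketched as follows; the toy simulator `sin(x)`, kernel choice, and hyperparameters are all assumptions:

```python
import numpy as np

def rbf(X1, X2, ls=1.0, var=1.0):
    """Squared-exponential (RBF) covariance between two input sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Posterior mean/variance of a zero-mean GP emulator at test inputs Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))   # training covariance + jitter
    Ks = rbf(X, Xs)                          # train/test cross-covariance
    alpha = np.linalg.solve(K, y)
    mean = Ks.T @ alpha
    cov = rbf(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)

# Train on four runs of a hypothetical simulator f(x) = sin(x),
# then interpolate its output at an unseen input x = 1.5.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.sin(X).ravel()
mean, var = gp_posterior(X, y, np.array([[1.5]]))
```

Once trained, the emulator stands in for the expensive mechanistic solver inside downstream inference, which is the role it plays in the framework described above.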

AAAI Conference 2025 System Paper

ECLAIR: Enhanced Clarification for Interactive Responses in an Enterprise AI Assistant

  • John Murzaku
  • Zifan Liu
  • Vaishnavi Muppala
  • Md Mehrab Tanjim
  • Xiang Chen
  • Yunyao Li

Large language models (LLMs) have shown remarkable progress in understanding and generating natural language across various applications. However, they often struggle with resolving ambiguities in real-world, enterprise-level interactions, where context and domain-specific knowledge play a crucial role. In this demonstration, we introduce ECLAIR (Enhanced CLArification for Interactive Responses), a multi-agent framework for interactive disambiguation. ECLAIR enhances ambiguous user query clarification through an interactive process where custom agents are defined, ambiguity reasoning is conducted by the agents, clarification questions are generated, and user feedback is leveraged to refine the final response. When tested on real-world customer data, ECLAIR demonstrates significant improvements in clarification question generation compared to standard few-shot methods.

UAI Conference 2025 Conference Paper

Improving Graph Contrastive Learning with Community Structure

  • Xiang Chen
  • Kun Yue
  • Liang Duan
  • Lixing Yu

Graph contrastive learning (GCL) has demonstrated remarkable success in training graph neural networks (GNNs) by distinguishing positive and negative node pairs without human labeling. However, existing GCL methods often suffer from two limitations: the repetitive message-passing mechanism in GNNs and the quadratic computational complexity of exhaustive node pair sampling in the loss function. To address these issues, we propose an efficient and effective GCL framework that leverages community structure rather than relying on the intricate node-to-node adjacency information. Inspired by the concept of sparse low-rank approximation of graph diffusion matrices, our model delivers node messages to the corresponding communities instead of individual neighbors. By exploiting community structures, our method significantly improves GCL efficiency by reducing the number of node pairs needed for contrastive loss calculation. Furthermore, we theoretically prove that our model effectively captures essential structure information for downstream tasks. Extensive experiments conducted on real-world datasets illustrate that our method not only achieves state-of-the-art performance but also substantially reduces time and memory consumption compared with other GCL methods. Our code is available at https://github.com/chenx-hi/IGCL-CS.

TMLR Journal 2025 Journal Article

Interactive Large Language Models for Reliable Answering under Incomplete Context

  • Jing-Cheng Pang
  • Heng-Bo Fan
  • Pengyuan Wang
  • Jia-Hao Xiao
  • Nan Tang
  • Si-Hang Yang
  • Chengxing Jia
  • Ming-Kun Xie

The rise of large language models (LLMs) has revolutionized the way humans interact with artificial intelligence systems. However, their reliability in sensitive applications, such as personal consultations or clinical decision-making, remains limited. A critical shortfall lies in LLMs' inherent lack of interactivity: these models generate responses even when essential context or domain-specific knowledge is absent, risking inaccurate or misleading outputs. A potential approach to mitigate this issue is to enable LLMs to pose clarifying questions, thereby uncovering the missing information required to provide accurate responses. However, previous methods tend to greedily prompt LLMs to ask questions, which burdens the user with potentially irrelevant questions and makes the system less flexible. In this paper, we introduce the LaMSeI (Language Model with Selective Interaction) method, which enhances LLMs' ability to judge when interaction is necessary under ambiguous or incomplete contexts. The motivation of LaMSeI is to measure the LLM's uncertainty about the user query and to interact with the user only when the uncertainty is high. Additionally, we incorporate active learning techniques to select the most informative questions from question candidates, effectively uncovering the missing context. Our empirical studies across various challenging question answering benchmarks, where LLMs are posed queries with incomplete context, demonstrate the effectiveness of LaMSeI. The method improves answer accuracy from 31.9% to 50.9%, outperforming other leading question-answering frameworks. Moreover, in experiments involving human participants, LaMSeI consistently generates answers superior or comparable to baselines in more than 82% of the cases. Finally, we verify the performance of LaMSeI on various LLMs, such as LLAMA2, LLAMA3, Vicuna and GPT-3.5, highlighting its capability to improve interactive language models.

IJCAI Conference 2025 Conference Paper

K-Buffers: A Plug-in Method for Enhancing Neural Fields with Multiple Buffers

  • Haofan Ren
  • Zunjie Zhu
  • Xiang Chen
  • Ming Lu
  • Rongfeng Lu
  • Chenggang Yan

Neural fields are now the central focus of research in 3D vision and computer graphics. Existing methods mainly focus on various scene representations, such as neural points and 3D Gaussians. However, few works have studied the rendering process to enhance neural fields. In this work, we propose a plug-in method named K-Buffers that leverages multiple buffers to improve rendering performance. Our method first renders K buffers from scene representations and constructs K pixel-wise feature maps. Then, we introduce a K-Feature Fusion Network (KFN) to merge the K pixel-wise feature maps. Finally, we adopt a feature decoder to generate the rendered image. We also introduce an acceleration strategy to improve rendering speed and quality. We apply our method to well-known radiance field baselines, including neural point fields and 3D Gaussian Splatting (3DGS). Extensive experiments demonstrate that our method effectively enhances the rendering performance of neural point fields and 3DGS.

AAAI Conference 2025 Conference Paper

Numerical Pruning for Efficient Autoregressive Models

  • Xuan Shen
  • Zhao Song
  • Yufa Zhou
  • Bo Chen
  • Jing Liu
  • Ruiyi Zhang
  • Ryan A. Rossi
  • Hao Tan

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high computational costs due to their substantial model size. This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning to improve the model efficiency while preserving performance for both language and image generation tasks. Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively. Besides, we further propose another compensation algorithm to recover the pruned model for better performance. To verify the effectiveness of our method, we provide both theoretical support and extensive experiments. Our experiments show that our method achieves state-of-the-art performance with reduced memory usage and faster generation speeds on GPUs.

TMLR Journal 2025 Journal Article

Personalization of Large Language Models: A Survey

  • Zhehao Zhang
  • Ryan A. Rossi
  • Branislav Kveton
  • Yijia Shao
  • Diyi Yang
  • Hamed Zamani
  • Franck Dernoncourt
  • Joe Barrow

Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. In this work, we bridge the gap between these two separate main directions for the first time by introducing a taxonomy for personalized LLM usage and summarizing the key differences and challenges. We provide a formalization of the foundations of personalized LLMs that consolidates and expands notions of personalization of LLMs, defining and discussing novel facets of personalization, usage, and desiderata of personalized LLMs. We then unify the literature across these diverse fields and usage scenarios by proposing systematic taxonomies for the granularity of personalization, personalization techniques, datasets, evaluation methods, and applications of personalized LLMs. Finally, we highlight challenges and important open problems that remain to be addressed. By unifying and surveying recent research using the proposed taxonomies, we aim to provide a clear guide to the existing literature and different facets of personalization in LLMs, empowering both researchers and practitioners.

NeurIPS Conference 2025 Conference Paper

Rethinking Nighttime Image Deraining via Learnable Color Space Transformation

  • Qiyuan Guan
  • Xiang Chen
  • Guiyue Jin
  • Jiyu Jin
  • Shumin Fan
  • Tianyu Song
  • Jinshan Pan

Compared to daytime image deraining, nighttime image deraining poses significant challenges due to the inherent complexities of nighttime scenarios and the lack of high-quality datasets that accurately represent the coupling effect between rain and illumination. In this paper, we rethink the task of nighttime image deraining and contribute a new high-quality benchmark, HQ-NightRain, which offers higher harmony and realism compared to existing datasets. In addition, we develop an effective Color Space Transformation Network (CST-Net) for better removing complex rain from nighttime scenes. Specifically, we propose a learnable color space converter (CSC) to better facilitate rain removal in the Y channel, as nighttime rain is more pronounced in the Y channel compared to the RGB color space. To capture illumination information for guiding nighttime deraining, implicit illumination guidance is introduced, enabling the learned features to improve the model's robustness in complex scenarios. Extensive experiments show the value of our dataset and the effectiveness of our method. The source code and datasets are available at https://github.com/guanqiyuan/CST-Net.

NeurIPS Conference 2024 Conference Paper

Agent Planning with World Knowledge Model

  • Shuofei Qiao
  • Runnan Fang
  • Ningyu Zhang
  • Yuqi Zhu
  • Xiang Chen
  • Shumin Deng
  • Yong Jiang
  • Pengjun Xie

Recent endeavors towards directly using large language models (LLMs) as agent models to execute interactive planning tasks have shown commendable results. Despite their achievements, however, they still struggle with blind trial-and-error in global planning and generating hallucinatory actions in local planning due to their poor understanding of the "real" physical world. Imitating the human mental world knowledge model, which provides global prior knowledge before a task and maintains local dynamic knowledge during it, we introduce in this paper a parametric World Knowledge Model (WKM) to facilitate agent planning. Concretely, we steer the agent model to self-synthesize knowledge from both expert and sampled trajectories. Then we develop WKM, providing prior task knowledge to guide the global planning and dynamic state knowledge to assist the local planning. Experimental results on three real-world simulated datasets with Mistral-7B, Gemma-7B, and Llama-3-8B demonstrate that our method can achieve superior performance compared to various strong baselines. Besides, our analysis illustrates that WKM can effectively alleviate the blind trial-and-error and hallucinatory action issues, providing strong support for the agent's understanding of the world. Other interesting findings include: 1) our instance-level task knowledge can generalize better to unseen tasks, 2) a weak WKM can guide strong agent model planning, and 3) unified WKM training has promising potential for further development.

IJCAI Conference 2024 Conference Paper

Continual Multimodal Knowledge Graph Construction

  • Xiang Chen
  • Jingtian Zhang
  • Xiaohan Wang
  • Ningyu Zhang
  • Tongtong Wu
  • Yuxiang Wang
  • Yongheng Wang
  • Huajun Chen

Current Multimodal Knowledge Graph Construction (MKGC) models struggle with the real-world dynamism of continuously emerging entities and relations, often succumbing to catastrophic forgetting, the loss of previously acquired knowledge. This study introduces benchmarks aimed at fostering the development of the continual MKGC domain. We further introduce the MSPT framework, designed to surmount the shortcomings of existing MKGC approaches during multimedia data processing. MSPT harmonizes the retention of learned knowledge (stability) and the integration of new data (plasticity), outperforming current continual learning and multimodal methods. Our results confirm MSPT's superior performance in evolving knowledge environments, showcasing its capacity to navigate the balance between stability and plasticity.

IJCAI Conference 2024 Conference Paper

FactCHD: Benchmarking Fact-Conflicting Hallucination Detection

  • Xiang Chen
  • Duanzheng Song
  • Honghao Gui
  • Chenxi Wang
  • Ningyu Zhang
  • Yong Jiang
  • Fei Huang
  • Chengfei Lyu

Despite their impressive generative capabilities, LLMs are hindered by fact-conflicting hallucinations in real-world applications. The accurate identification of hallucinations in texts generated by LLMs, especially in complex inferential scenarios, is a relatively unexplored area. To address this gap, we present FactCHD, a dedicated benchmark designed for the detection of fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. A distinctive element of FactCHD is its integration of fact-based evidence chains, significantly enhancing the depth of evaluating the detectors' explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce TRUTH-TRIANGULATOR which synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2, aiming to yield more credible detection through the amalgamation of predictive results and evidence.

NeurIPS Conference 2024 Conference Paper

Infinite-Dimensional Feature Interaction

  • Chenhui Xu
  • Fuxun Yu
  • Maoliang Li
  • Zihao Zheng
  • Zirui Xu
  • Jinjun Xiong
  • Xiang Chen

Past neural network design has largely focused on the feature representation space dimension and its capacity scaling (e.g., width, depth), but overlooked scaling of the feature interaction space. Recent advancements have shifted focus towards element-wise multiplication to facilitate a higher-dimensional feature interaction space for better information transformation. Despite this progress, multiplications predominantly capture low-order interactions, thus remaining confined to a finite-dimensional interaction space. To transcend this limitation, classic kernel methods emerge as a promising solution to engage features in an infinite-dimensional space. We introduce InfiNet, a model architecture that enables feature interaction within an infinite-dimensional space created by the RBF kernel. Our experiments reveal that InfiNet achieves new state-of-the-art results, owing to its capability to leverage infinite-dimensional interactions, significantly enhancing model performance.
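
The kernel idea the abstract invokes can be illustrated with a minimal, self-contained sketch (plain Python, not the paper's InfiNet code): the RBF kernel evaluates an inner product in an infinite-dimensional feature space, whereas element-wise multiplication only realizes a finite, low-order interaction.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: an inner product in an infinite-dimensional feature space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def elementwise_interaction(x, y):
    """Element-wise multiplication: a finite, low-order feature interaction."""
    return [a * b for a, b in zip(x, y)]

# Identical inputs: the kernel similarity is exactly 1.0 ...
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
# ... and it decays smoothly as the inputs move apart.
print(rbf_kernel([1.0, 2.0], [1.0, 3.0]))
```

The `gamma` value here is an arbitrary choice for illustration; the point is only the contrast between the two interaction mechanisms.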

ECAI Conference 2024 Conference Paper

Learning A Closed-Loop Bidirectional Scale-Recurrent Network for Image Deraining

  • Peizhou Huang
  • Zixuan Zhong
  • Pengjie Wang 0007
  • Xiangyu Wang
  • Xiang Chen

Recent years have witnessed significant advances in image deraining tasks due to the emergence of numerous effective Transformers and multi-layer perceptron (MLP) models. However, these models still rely on unidirectional information flow and fail to fully exploit the potentially useful information from multiple image scales, thus limiting the robustness of the models in complex rainy scenes. To this end, we develop an effective closed-loop bidirectional scale-recurrent network (called CBS-Net) for image deraining, which organically integrates both Transformer and MLP models to jointly explore multi-scale rain representations. Specifically, we introduce a sparse Transformer block within the intra-scale branch to adaptively capture the most useful content-aware features. Furthermore, we construct a dimensional MLP block within the inter-scale branch to dynamically modulate spatial-aware features from different scales. To ensure more accurate bidirectional estimations in our scale-recurrent network, a simple yet effective feedback propagation block is embedded to perform coarse-to-fine and fine-to-coarse information communication. Extensive experimental results show that our approach achieves state-of-the-art performance on multiple benchmark datasets, demonstrating its effectiveness and scalability.

IJCAI Conference 2024 Conference Paper

Learning a Spiking Neural Network for Efficient Image Deraining

  • Tianyu Song
  • Guiyue Jin
  • Pengpeng Li
  • Kui Jiang
  • Xiang Chen
  • Jiyu Jin

Recently, spiking neural networks (SNNs) have demonstrated substantial potential in computer vision tasks. In this paper, we present an Efficient Spiking Deraining Network, called ESDNet. Our work is motivated by the observation that rain pixel values will lead to a more pronounced intensity of spike signals in SNNs. However, directly applying deep SNNs to the image deraining task still remains a significant challenge. This is attributed to the information loss and training difficulties that arise from discrete binary activation and complex spatiotemporal dynamics. To this end, we develop a spiking residual block to convert the input into spike signals, then adaptively optimize the membrane potential by introducing attention weights to adjust spike responses in a data-driven manner, alleviating information loss caused by discrete binary activation. In this way, our ESDNet can effectively detect and analyze the characteristics of rain streaks by learning their fluctuations. This also enables better guidance for the deraining process and facilitates high-quality image reconstruction. Instead of relying on the ANN-SNN conversion strategy, we introduce a gradient proxy strategy to directly train the model, overcoming the challenge of training. Experimental results show that our approach gains comparable performance against ANN-based methods while reducing energy consumption by 54%. The source code is available at https://github.com/MingTian99/ESDNet.
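
The gradient proxy (surrogate gradient) strategy mentioned in the abstract is a standard device for training SNNs directly; a generic sketch, not ESDNet's actual implementation, might look like this:

```python
import math

def spike(v, threshold=1.0):
    """Hard threshold: the discrete binary activation of a spiking neuron."""
    return 1.0 if v >= threshold else 0.0

def surrogate_grad(v, threshold=1.0, alpha=4.0):
    """Smooth stand-in used during backpropagation in place of the true
    derivative of spike(), which is zero almost everywhere."""
    s = 1.0 / (1.0 + math.exp(-alpha * (v - threshold)))
    return alpha * s * (1.0 - s)

def lif_step(v, x, decay=0.9, threshold=1.0):
    """One leaky integrate-and-fire step: decay the membrane potential,
    add the input current, spike and reset if the threshold is crossed."""
    v = decay * v + x
    out = spike(v, threshold)
    v = v * (1.0 - out)  # reset the membrane potential after a spike
    return v, out
```

In the forward pass `spike` is applied as-is; in the backward pass its zero-almost-everywhere derivative is replaced by `surrogate_grad`, which is what makes direct gradient-based training possible without ANN-SNN conversion.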

AAAI Conference 2024 Conference Paper

Rethinking Multi-Scale Representations in Deep Deraining Transformer

  • Hongming Chen
  • Xiang Chen
  • Jiyang Lu
  • Yufeng Li

Existing Transformer-based image deraining methods depend mostly on fixed single-input single-output U-Net architecture. In fact, this not only neglects the potentially explicit information from multiple image scales, but also lacks the capability of exploring the complementary implicit information across different scales. In this work, we rethink the multi-scale representations and design an effective multi-input multi-output framework that constructs intra- and inter-scale hierarchical modulation to better facilitate rain removal and help image restoration. We observe that rain levels reduce dramatically in coarser image scales, thus proposing to restore rain-free results from the coarsest scale to the finest scale in image pyramid inputs, which also alleviates the difficulty of model learning. Specifically, we integrate a sparsity-compensated Transformer block and a frequency-enhanced convolutional block into a coupled representation module, in order to jointly learn the intra-scale content-aware features. To facilitate representations learned at different scales to communicate with each other, we leverage a gated fusion module to adaptively aggregate the inter-scale spatial-aware features, which are rich in correlated information of rain appearances, leading to high-quality results. Extensive experiments demonstrate that our model achieves consistent gains on five benchmarks.

AAAI Conference 2024 Conference Paper

Structural Entropy Based Graph Structure Learning for Node Classification

  • Liang Duan
  • Xiang Chen
  • Wenjie Liu
  • Daliang Liu
  • Kun Yue
  • Angsheng Li

As one of the most common tasks in graph data analysis, node classification is frequently solved by using graph structure learning (GSL) techniques to optimize graph structures and learn suitable graph neural networks. Most of the existing GSL methods focus on fusing different structural features (basic views) extracted from the graph, but very little graph semantics, like hierarchical communities, has been incorporated. Thus, they might be insufficient when dealing with graphs containing noise from real-world complex systems. To address this issue, we propose a novel and effective GSL framework for node classification based on structural information theory. Specifically, we first prove that an encoding tree with minimal structural entropy can contain sufficient information for node classification and eliminate redundant noise via the graph's hierarchical abstraction. Then, we provide an efficient algorithm for constructing the encoding tree to enhance the basic views. Combining the community influence deduced from the encoding tree and the prediction confidence of each view, we further fuse the enhanced views to generate the optimal structure. Finally, we conduct extensive experiments on a variety of datasets. The results demonstrate that our method outperforms the state-of-the-art competitors in effectiveness and robustness.

NeurIPS Conference 2024 Conference Paper

Towards Universal Mesh Movement Networks

  • Mingrui Zhang
  • Chunyang Wang
  • Stephan Kramer
  • Joseph G. Wallwork
  • Siyi Li
  • Jiancheng Liu
  • Xiang Chen
  • Matthew D. Piggott

Solving complex Partial Differential Equations (PDEs) accurately and efficiently is an essential and challenging problem in all scientific and engineering disciplines. Mesh movement methods provide the capability to improve the accuracy of the numerical solution without increasing the overall mesh degree of freedom count. Conventional sophisticated mesh movement methods are extremely expensive and struggle to handle scenarios with complex boundary geometries. Meanwhile, existing learning-based methods require re-training from scratch given a different PDE type or boundary geometry, which limits their applicability, and also often suffer from robustness issues in the form of inverted elements. In this paper, we introduce the Universal Mesh Movement Network (UM2N), which -- once trained -- can be applied in a non-intrusive, zero-shot manner to move meshes with different size distributions and structures, for solvers applicable to different PDE types and boundary geometries. UM2N consists of a Graph Transformer (GT) encoder for extracting features and a Graph Attention Network (GAT) based decoder for moving the mesh. We evaluate our method on advection and Navier-Stokes based examples, as well as a real-world tsunami simulation case. Our method outperforms existing learning-based mesh movement methods in terms of the benchmarks described above. In comparison to the conventional sophisticated Monge-Ampère PDE-solver based method, our approach not only significantly accelerates mesh movement, but also proves effective in scenarios where the conventional method fails. Our project page can be found at https://erizmr.github.io/UM2N/.

AAAI Conference 2023 Conference Paper

Hybrid CNN-Transformer Feature Fusion for Single Image Deraining

  • Xiang Chen
  • Jinshan Pan
  • Jiyang Lu
  • Zhentao Fan
  • Hao Li

Since rain streaks exhibit diverse geometric appearances and irregular overlapped phenomena, these complex characteristics challenge the design of an effective single image deraining model. To this end, rich local-global information representations are increasingly indispensable for better satisfying rain removal. In this paper, we propose a lightweight Hybrid CNN-Transformer Feature Fusion Network (dubbed as HCT-FFN) in a stage-by-stage progressive manner, which can harmonize these two architectures to help image restoration by leveraging their individual learning strengths. Specifically, we stack a sequence of the degradation-aware mixture of experts (DaMoE) modules in the CNN-based stage, where appropriate local experts adaptively enable the model to emphasize spatially-varying rain distribution features. As for the Transformer-based stage, a background-aware vision Transformer (BaViT) module is employed to complement spatially-long feature dependencies of images, so as to achieve global texture recovery while preserving the required structure. Considering the indeterminate knowledge discrepancy among CNN features and Transformer features, we introduce an interactive fusion branch at adjacent stages to further facilitate the reconstruction of high-quality deraining results. Extensive evaluations show the effectiveness and extensibility of our developed HCT-FFN. The source code is available at https://github.com/cschenxiang/HCT-FFN.

AAAI Conference 2023 Short Paper

On Analyzing the Role of Image for Visual-Enhanced Relation Extraction (Student Abstract)

  • Lei Li
  • Xiang Chen
  • Shuofei Qiao
  • Feiyu Xiong
  • Huajun Chen
  • Ningyu Zhang

Multimodal relation extraction is an essential task for knowledge graph construction. In this paper, we take an in-depth empirical analysis that indicates the inaccurate information in the visual scene graph leads to poor modal alignment weights, further degrading performance. Moreover, the visual shuffle experiments illustrate that the current approaches may not take full advantage of visual information. Based on the above observation, we further propose a strong baseline with an implicit fine-grained multimodal alignment based on Transformer for multimodal relation extraction. Experimental results demonstrate the better performance of our method. Codes are available at https://github.com/zjunlp/DeepKE/tree/main/example/re/multimodal.

IJCAI Conference 2023 Conference Paper

One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER

  • Xiang Chen
  • Lei Li
  • Shuofei Qiao
  • Ningyu Zhang
  • Chuanqi Tan
  • Yong Jiang
  • Fei Huang
  • Huajun Chen

Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks.

NeurIPS Conference 2023 Conference Paper

Online PCA in Converging Self-consistent Field Equations

  • Xihan Li
  • Xiang Chen
  • Rasul Tutunov
  • Haitham Bou Ammar
  • Lei Wang
  • Jun Wang

The Self-consistent Field (SCF) equation is a type of nonlinear eigenvalue problem in which the matrix to be eigen-decomposed is a function of its own eigenvectors. It is of great significance in computational science for its connection to the Schrödinger equation. Traditional fixed-point iteration methods for solving such equations suffer from non-convergence issues. In this work, we present a novel perspective on such SCF equations as a principal component analysis (PCA) for non-stationary time series, in which a distribution and its own top principal components are mutually updated over time, and the equilibrium state of the model corresponds to the solution of the SCF equations. Under this new perspective, online PCA techniques can be brought in to drive the model towards the equilibrium state, acting as a new set of tools for converging the SCF equations. With several numerical adaptations, we then develop a new algorithm for converging the SCF equation, and demonstrate its high convergence capacity with experiments on both synthesized and real electronic structure scenarios.
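
The self-referential structure of an SCF equation, and why updating the eigenvector gradually rather than all at once helps convergence, can be shown on a toy 2x2 problem (an illustrative sketch, not the paper's algorithm; the damping here is a simple average, not online PCA):

```python
def power_top_eigvec(M, iters=200):
    """Top eigenvector of a 2x2 matrix by power iteration."""
    v = [1.0, 0.0]
    for _ in range(iters):
        w = [M[0][0] * v[0] + M[0][1] * v[1],
             M[1][0] * v[0] + M[1][1] * v[1]]
        n = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = [w[0] / n, w[1] / n]
    return v

def H(v, c=0.3):
    """Toy SCF operator: the matrix to diagonalize depends on its own
    eigenvector v, which is what makes the problem nonlinear."""
    A = [[2.0, 0.5], [0.5, 1.0]]
    return [[A[i][j] + c * v[i] * v[j] for j in range(2)] for i in range(2)]

# Damped fixed-point iteration: instead of replacing v outright (plain
# fixed-point iteration, which can oscillate), mix the new eigenvector
# with the old one and renormalize.
v = [1.0, 0.0]
for _ in range(100):
    v_new = power_top_eigvec(H(v))
    mixed = [0.5 * a + 0.5 * b for a, b in zip(v, v_new)]
    n = (mixed[0] ** 2 + mixed[1] ** 2) ** 0.5
    v = [mixed[0] / n, mixed[1] / n]
# At self-consistency, v is (up to sign) the top eigenvector of H(v).
```

The matrix `A` and coupling `c` are made-up values chosen so the toy problem converges; real SCF problems are far larger and far less forgiving.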

AAAI Conference 2023 Conference Paper

Self-Supervised Interest Transfer Network via Prototypical Contrastive Learning for Recommendation

  • Guoqiang Sun
  • Yibin Shen
  • Sijin Zhou
  • Xiang Chen
  • Hongyan Liu
  • Chunming Wu
  • Chenyi Lei
  • Xianhui Wei

Cross-domain recommendation has attracted increasing attention from industry and academia recently. However, most existing methods do not exploit the interest invariance between domains, which would yield sub-optimal solutions. In this paper, we propose a cross-domain recommendation method: Self-supervised Interest Transfer Network (SITN), which can effectively transfer invariant knowledge between domains via prototypical contrastive learning. Specifically, we perform two levels of cross-domain contrastive learning: 1) instance-to-instance contrastive learning, 2) instance-to-cluster contrastive learning. Not only that, we also take into account users' multi-granularity and multi-view interests. With this paradigm, SITN can explicitly learn the invariant knowledge of interest clusters between domains and accurately capture users' intents and preferences. We conducted extensive experiments on a public dataset and a large-scale industrial dataset collected from one of the world's leading e-commerce corporations. The experimental results indicate that SITN achieves significant improvements over state-of-the-art recommendation methods. Additionally, SITN has been deployed on a micro-video recommendation platform, and the online A/B testing results further demonstrate its practical value. Supplement is available at: https://github.com/fanqieCoffee/SITN-Supplement.

JBHI Journal 2023 Journal Article

TW-Net: Transformer Weighted Network for Neonatal Brain MRI Segmentation

  • Shengjie Zhang
  • Bohan Ren
  • Ziqi Yu
  • Haibo Yang
  • Xiaoyang Han
  • Xiang Chen
  • Yuan Zhou
  • Dinggang Shen

Accurate neonatal brain MRI segmentation is valuable for investigating brain growth patterns and tracking the progression of neurodevelopmental disorders. However, it is a challenging task to use intensity-based methods to segment neonatal brain structures because of the small contrast differences between brain regions caused by the inherent myelination process. Although convolutional neural networks offer the potential to segment brain structures in an intensity-independent manner, they suffer from a lack of in-plane long-range dependency, which is essential for the segmentation. To solve this problem, we propose a novel Transformer-Weighted network (TW-Net) to incorporate in-plane long-range dependency information. TW-Net employs a conventional encoder-decoder architecture with a Transformer module in the middle. The Transformer module uses a rotate-and-flip layer to better calculate the similarity between two patches in a slice, leveraging similar patterns of geometrical and texture features within brain structures. In addition, a deep supervision module and squeeze-and-excitation blocks are introduced to incorporate boundary information of brain structures. TW-Net outperforms state-of-the-art deep learning algorithms for multiple-label tasks in 2D and 2.5D configurations on two independent public datasets, demonstrating that TW-Net is a promising method for neonatal brain MRI segmentation.

JBHI Journal 2022 Journal Article

An Effective Photoplethysmography Heart Rate Estimation Framework Integrating Two-Level Denoising Method and Heart Rate Tracking Algorithm Guided by Finite State Machine

  • Jingbin Guo
  • Xiang Chen
  • Jiaqi Zhao
  • Xu Zhang
  • Xun Chen

In order to achieve accurate heart rate (HR) estimation in complex scenes, this paper presents an effective photoplethysmography (PPG) HR estimation framework integrating a two-level denoising method and an HR tracking algorithm guided by a finite state machine (FSM). Aiming at solving the problems of low signal-to-noise ratio and co-frequency (the noise frequency is close to the HR frequency) caused by motion artifacts, the two-level denoising method, consisting of cascaded adaptive filtering and differential denoising guided by the FSM, is designed to remove motion-related noises in PPG signals. To solve the problem of HR tracking errors caused by poor wrist contact, the FSM-guided HR tracking algorithm is proposed to obtain global optimization capability. The results of HR estimation experiments conducted on the IEEE Signal Processing Cup database and the WeData database we created show that the proposed framework can effectively cope with the problems of low signal-to-noise ratio and co-frequency. Even if tracking errors occur due to poor wristband contact, the proposed FSM-guided HR tracking algorithm can correct them in time when the HR component appears again. The average absolute errors of HR estimation on the two databases are 1.76 BPM (beats per minute) and 2.77 BPM, respectively, which is more accurate than other algorithms.
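
The FSM-guided tracking idea can be sketched as a small state machine; the states and transition conditions below are hypothetical, not the ones used in the paper:

```python
def next_state(state, hr_peak_found, contact_ok):
    """One FSM transition for HR tracking (illustrative states only)."""
    if state == "TRACKING":
        return "TRACKING" if contact_ok else "LOST"
    if state == "LOST":
        # Stay lost until wrist contact returns and an HR peak reappears.
        return "RECOVERING" if (contact_ok and hr_peak_found) else "LOST"
    if state == "RECOVERING":
        # Confirm the recovered HR component before resuming tracking.
        return "TRACKING" if hr_peak_found else "LOST"
    raise ValueError(f"unknown state: {state}")

# Simulated windows: contact drops out, then the HR component reappears.
state = "TRACKING"
for peak, contact in [(True, True), (True, False), (False, False),
                      (True, True), (True, True)]:
    state = next_state(state, peak, contact)
```

The value of the FSM is exactly what the abstract describes: a dropout does not corrupt the tracker permanently, because the machine re-enters tracking once the HR component reappears.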

NeurIPS Conference 2022 Conference Paper

Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning

  • Xiang Chen
  • Lei Li
  • Ningyu Zhang
  • Xiaozhuan Liang
  • Shumin Deng
  • Chuanqi Tan
  • Fei Huang
  • Luo Si

Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance, but they still follow a parametric learning paradigm, in which forgetting and rote memorization can lead to unstable generalization. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt with the motivation of decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances and implements a retrieval mechanism during the process of input, training and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings. Besides, we further illustrate that our proposed RetroPrompt can yield better generalization abilities with new datasets. Detailed analysis of memorization indeed reveals that RetroPrompt can reduce the reliance of language models on memorization, thus improving generalization for downstream tasks. Code is available at https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt.

IJCAI Conference 2022 Conference Paper

Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data

  • Tian Li
  • Xiang Chen
  • Zhen Dong
  • Kurt Keutzer
  • Shanghang Zhang

Domain-adaptive text classification is a challenging problem for large-scale pretrained language models because they often require expensive additional labeled data to adapt to new domains. Existing works usually fail to leverage the implicit relationships among words across domains. In this paper, we propose a novel method, called Domain Adaptation with Structured Knowledge (DASK), to enhance domain adaptation by exploiting word-level semantic relationships. DASK first builds a knowledge graph to capture the relationship between pivot terms (domain-independent words) and non-pivot terms in the target domain. Then, during training, DASK injects pivot-related knowledge graph information into source domain texts. For the downstream task, these knowledge-injected texts are fed into a BERT variant capable of processing knowledge-injected textual data. Thanks to the knowledge injection, our model learns domain-invariant features for non-pivots according to their relationships with pivots. DASK ensures that the pivots have domain-invariant behaviors by dynamically inferring the polarity scores of candidate pivots during training with pseudo-labels. We validate DASK on a wide range of cross-domain sentiment classification tasks and observe up to 2.9% absolute performance improvement over baselines for 20 different domain pairs. Code is available at https://github.com/hikaru-nara/DASK.

NeurIPS Conference 2022 Conference Paper

M2N: Mesh Movement Networks for PDE Solvers

  • Wenbin Song
  • Mingrui Zhang
  • Joseph G Wallwork
  • Junpeng Gao
  • Zheng Tian
  • Fanglei Sun
  • Matthew Piggott
  • Junqing Chen

Numerical Partial Differential Equation (PDE) solvers often require discretizing the physical domain by using a mesh. Mesh movement methods provide the capability to improve the accuracy of the numerical solution without introducing extra computational burden to the PDE solver, by increasing mesh resolution where the solution is not well-resolved, whilst reducing unnecessary resolution elsewhere. However, sophisticated mesh movement methods, such as the Monge-Ampère method, generally require the solution of auxiliary equations. These solutions can be extremely expensive to compute when the mesh needs to be adapted frequently. In this paper, we propose, to the best of our knowledge, the first learning-based end-to-end mesh movement framework for PDE solvers. Key requirements of learning-based mesh movement methods are: alleviating mesh tangling, boundary consistency, and generalization to meshes with different resolutions. To achieve these goals, we introduce the neural spline model and the graph attention network (GAT) into our models respectively. While the Neural-Spline based model provides more flexibility for large mesh deformation, the GAT based model can handle domains with more complicated shapes and is better at performing delicate local deformation. We validate our methods on stationary and time-dependent, linear and non-linear equations, as well as regularly and irregularly shaped domains. Compared to the traditional Monge-Ampère method, our approach can greatly accelerate the mesh adaptation process by three to four orders of magnitude, whilst achieving comparable numerical error reduction.

NeurIPS Conference 2022 Conference Paper

Relation-Constrained Decoding for Text Generation

  • Xiang Chen
  • Zhixian Yang
  • Xiaojun Wan

The dominant paradigm for neural text generation nowadays is seq2seq learning with large-scale pretrained language models. However, it is usually difficult to manually constrain the generation process of these models. Prior studies have introduced Lexically Constrained Decoding (LCD) to ensure the presence of pre-specified words or phrases in the output. However, simply applying lexical constraints has no guarantee of the grammatical or semantic relations between words. Thus, more elaborate constraints are needed. To this end, we first propose a new constrained decoding scenario named Relation-Constrained Decoding (RCD), which requires the model's output to contain several given word pairs with respect to the given relations between them. For this scenario, we present a novel plug-and-play decoding algorithm named RElation-guided probability Surgery and bEam ALlocation (RESEAL), which can handle different categories of relations, e.g., syntactical relations or factual relations. Moreover, RESEAL can adaptively "reseal" the relations to form a high-quality sentence, which can be applied to the inference stage of any autoregressive text generation model. To evaluate our method, we first construct an RCD benchmark based on dependency relations from treebanks with annotated dependencies. Experimental results demonstrate that our approach can achieve better preservation of the input dependency relations compared to previous methods. To further illustrate the effectiveness of RESEAL, we apply our method to three downstream tasks: sentence summarization, fact-based text editing, and data-to-text generation. We observe an improvement in generation quality. The source code is available at https://github.com/CasparSwift/RESEAL.

JMLR Journal 2021 Journal Article

COKE: Communication-Censored Decentralized Kernel Learning

  • Ping Xu
  • Yue Wang
  • Xiang Chen
  • Zhi Tian

This paper studies the decentralized optimization and learning problem where multiple interconnected agents aim to learn an optimal decision function defined over a reproducing kernel Hilbert space by jointly minimizing a global objective function, with access to their own locally observed dataset. As a non-parametric approach, kernel learning faces a major challenge in distributed implementation: the decision variables of local objective functions are data-dependent and thus cannot be optimized under the decentralized consensus framework without any raw data exchange among agents. To circumvent this major challenge, we leverage the random feature (RF) approximation approach to enable consensus on the function modeled in the RF space by data-independent parameters across different agents. We then design an iterative algorithm, termed DKLA, for fast-convergent implementation via ADMM. Based on DKLA, we further develop a communication-censored kernel learning (COKE) algorithm that reduces the communication load of DKLA by preventing an agent from transmitting at every iteration unless its local updates are deemed informative. Theoretical results in terms of linear convergence guarantee and generalization performance analysis of DKLA and COKE are provided. Comprehensive tests on both synthetic and real datasets are conducted to verify the communication efficiency and learning effectiveness of COKE.
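
The random feature (RF) approximation that enables consensus here replaces the data-dependent kernel expansion with data-independent random features. A minimal random Fourier feature sketch for the Gaussian kernel (illustrative only, not the DKLA/COKE implementation):

```python
import math
import random

random.seed(0)

def rff_features(x, omegas, phases):
    """Random Fourier features z(x) with E[z(x)·z(y)] ≈ exp(-||x - y||² / 2)."""
    D = len(omegas)
    scale = math.sqrt(2.0 / D)
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(omegas, phases)]

d, D = 2, 4000  # input dimension, number of random features
# Data-independent parameters: every agent can draw identical features from
# a shared seed, which is what makes consensus in the RF space possible.
omegas = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]
phases = [random.uniform(0.0, 2.0 * math.pi) for _ in range(D)]

x, y = [0.3, -0.1], [0.5, 0.2]
zx, zy = rff_features(x, omegas, phases), rff_features(y, omegas, phases)
approx = sum(a * b for a, b in zip(zx, zy))          # RF inner product
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)  # true kernel
```

With D = 4000 features the inner product tracks the exact Gaussian kernel value closely; the approximation error shrinks on the order of 1/√D.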

IJCAI Conference 2021 Conference Paper

Document-level Relation Extraction as Semantic Segmentation

  • Ningyu Zhang
  • Xiang Chen
  • Xin Xie
  • Shumin Deng
  • Chuanqi Tan
  • Mosha Chen
  • Fei Huang
  • Luo Si

Document-level relation extraction aims to extract relations among multiple entity pairs from a document. Previously proposed graph-based or transformer-based models utilize the entities independently, regardless of global information among relational triples. This paper approaches the problem by predicting an entity-level relation matrix to capture local and global information, parallel to the semantic segmentation task in computer vision. Herein, we propose a Document U-shaped Network for document-level relation extraction. Specifically, we leverage an encoder module to capture the context information of entities and a U-shaped segmentation module over the image-style feature map to capture global interdependency among triples. Experimental results show that our approach can obtain state-of-the-art performance on three benchmark datasets DocRED, CDR, and GDA.

JBHI Journal 2021 Journal Article

Hand Gesture Recognition based on Surface Electromyography using Convolutional Neural Network with Transfer Learning Method

  • Xiang Chen
  • Yu Li
  • Ruochen Hu
  • Xu Zhang
  • Xun Chen

This paper presents an effective transfer learning (TL) strategy for the realization of surface electromyography (sEMG)-based gesture recognition with high generalization and low training burden. To realize the idea of taking a well-trained model as the feature extractor of the target networks, 30 hand gestures involving various states of the finger joints, elbow joint, and wrist joint are selected to compose the source task, and a convolutional neural network (CNN)-based source network is designed and trained as the general gesture EMG feature extraction network. Then, two types of target networks, in the forms of CNN-only and CNN+LSTM (long short-term memory) respectively, are designed with the same CNN architecture as the feature extraction network. Finally, gesture recognition experiments on three different target gesture datasets are carried out under TL and non-TL strategies respectively. The experimental results verify the validity of the proposed TL strategy in improving hand gesture recognition accuracy and reducing training burden. For both the CNN-only and the CNN+LSTM target networks, on three target datasets covering new users, new gestures, and a different collection scheme, the proposed TL strategy improves the recognition accuracy by 10%-38%, reduces the training time by a factor of tens, and maintains recognition accuracy above 90% when only 2 repetitions of each gesture are used to fine-tune the parameters of the target networks. The proposed TL strategy has important application value for promoting the development of myoelectric control systems.
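The TL recipe described above, keeping a trained feature extractor fixed and fine-tuning only a small head on a few target repetitions, can be sketched in miniature. Everything here is a stand-in: the "pretrained" extractor is a frozen random projection, the data are synthetic, and only the linear head is trained by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
W_frozen = rng.standard_normal((16, 8))   # stand-in for pretrained weights

def extract(x):
    """Frozen feature extractor: reused as-is, never updated."""
    return np.tanh(x @ W_frozen)

# Tiny target task: labels that are linearly separable in feature space.
X = rng.standard_normal((40, 16))
feats = extract(X)
w_true = rng.standard_normal(8)
y = (feats @ w_true > 0).astype(float)

# Fine-tune only the head (logistic regression by plain gradient descent).
w = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))
    w -= 0.5 * feats.T @ (p - y) / len(y)
acc = ((feats @ w > 0) == (y > 0.5)).mean()
```

The point of the design is that the expensive part (the extractor) is trained once on the large source task, so the target task needs only a few repetitions to fit the small head.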

JBHI Journal 2020 Journal Article

Exploration of Chinese Sign Language Recognition Using Wearable Sensors Based on Deep Belief Net

  • Yi Yu
  • Xiang Chen
  • Shuai Cao
  • Xu Zhang
  • Xun Chen

In this paper, a deep belief net (DBN) was applied to the field of wearable-sensor-based Chinese sign language (CSL) recognition. Eight subjects were involved in the study, and all of the subjects finished a five-day experiment performing CSL on a target word set consisting of 150 CSL subwords. During the experiment, surface electromyography (sEMG), accelerometer (ACC), and gyroscope (GYRO) signals were collected from the participants. In order to obtain the optimal structure of the network, three different sensor fusion strategies, including data-level fusion, feature-level fusion, and decision-level fusion, were explored. In addition, for the feature-level fusion strategy, two different feature sources (hand-crafted features and network-generated features) and two different network structures (a fully-connected net and a DBN) were also compared. The results showed that feature-level fusion achieved the best recognition accuracy among the three fusion strategies, and feature-level fusion with network-generated features and a DBN achieved the best recognition accuracy overall. The best recognition accuracy realized in this study was 95.1% for the user-dependent test and 88.2% for the user-independent test. The significance of the study is that it applied deep learning to the field of wearable-sensor-based CSL recognition, and to our knowledge it is the first study comparing human-engineered features with network-generated features in this field. The results shed light on the use of network-generated features in sensor fusion and CSL recognition.
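The three fusion strategies differ mainly in where the modalities are combined. A schematic sketch of feature-level fusion (concatenating modality features before one classifier) versus decision-level fusion (averaging per-modality class posteriors); the feature dimensions and the random stand-in posteriors are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-window feature vectors from each sensor modality.
semg_feat = rng.normal(size=(100, 8))   # e.g. per-channel sEMG features
acc_feat = rng.normal(size=(100, 6))    # e.g. per-axis ACC features
gyro_feat = rng.normal(size=(100, 6))   # e.g. per-axis GYRO features

# Feature-level fusion: one combined vector feeds a single classifier.
fused = np.concatenate([semg_feat, acc_feat, gyro_feat], axis=1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Decision-level fusion: average class posteriors of three per-modality
# classifiers (stubbed here with random logits over 150 subword classes).
posteriors = [softmax(rng.normal(size=(100, 150))) for _ in range(3)]
decision = np.mean(posteriors, axis=0).argmax(axis=1)
```

Data-level fusion, the third strategy, would instead stack the raw synchronized signals before any feature extraction.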

JBHI Journal 2020 Journal Article

miRTMC: A miRNA Target Prediction Method Based on Matrix Completion Algorithm

  • Hui Jiang
  • Mengyun Yang
  • Xiang Chen
  • Min Li
  • Yaohang Li
  • Jianxin Wang

microRNAs (miRNAs) are small non-coding RNAs which modulate the stability of gene targets and their rates of translation into proteins at the transcriptional and post-transcriptional levels. miRNA dysfunctions can lead to human diseases because of dysregulation of their targets. Correct miRNA target prediction will lead to a better understanding of the mechanisms of human diseases and provide hints on curing them. In recent years, computational miRNA target prediction methods have been proposed according to the interaction rules between miRNAs and targets. However, these methods suffer from high false positive rates due to the complicated relationship between miRNAs and their targets. The rapidly growing number of experimentally validated miRNA targets enables predicting miRNA targets with high precision via accurate data analysis. Taking advantage of these known miRNA targets, a novel recommendation system model (miRTMC) for miRNA target prediction is established using a new matrix completion algorithm. In miRTMC, a heterogeneous network is constructed by integrating the miRNA similarity network, the gene similarity network, and the miRNA-gene interaction network. Our assumption is that the latent factors determining whether a gene is the target of a miRNA are highly correlated, i.e., the adjacency matrix of the heterogeneous network is low-rank, which is then completed by using a nuclear norm regularized linear least squares model under non-negative constraints. The alternating direction method of multipliers (ADMM) is adopted to numerically solve the matrix completion problem. Our results show that miRTMC outperforms the competing methods in terms of various evaluation metrics. Our software package is available at https://github.com/hjiangcsu/miRTMC.
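The core computation, completing a low-rank matrix under a nuclear-norm penalty with non-negativity, can be sketched without the paper's full ADMM machinery. Below is a simplified soft-impute-style iteration (singular value thresholding followed by a non-negative projection) on a synthetic low-rank matrix; the matrix size, rank, observation rate, and penalty are illustrative, not miRTMC's:

```python
import numpy as np

def complete(M_obs, mask, lam=0.05, iters=500):
    """Soft-impute style iteration for nuclear-norm-regularized matrix
    completion with non-negativity: refill unobserved entries with the
    current estimate, shrink singular values, clip negatives."""
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        filled = mask * M_obs + (1.0 - mask) * X
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                # singular value thresholding
        X = np.maximum(U @ (s[:, None] * Vt), 0.0)  # non-negative projection
    return X

rng = np.random.default_rng(0)
A = rng.random((30, 2)) @ rng.random((2, 30))      # non-negative, rank-2 truth
mask = (rng.random(A.shape) < 0.6).astype(float)   # ~60% of entries observed
X = complete(mask * A, mask)
rel_err = np.linalg.norm(X - A) / np.linalg.norm(A)
```

In miRTMC the matrix being completed is the adjacency matrix of the heterogeneous miRNA-gene network, and the optimization is solved with ADMM rather than this simplified iteration.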

AAAI Conference 2020 Conference Paper

PSENet: Psoriasis Severity Evaluation Network

  • Yi Li
  • Zhe Wu
  • Shuang Zhao
  • Xian Wu
  • Yehong Kuang
  • YangTian Yan
  • Shen Ge
  • Kai Wang

Psoriasis is a chronic skin disease which affects hundreds of millions of people around the world. This disease cannot be fully cured and requires lifelong care. If the deterioration of psoriasis is not detected and properly treated in time, it can cause serious complications or even become life-threatening. Therefore, a quantitative measurement that can track psoriasis severity is necessary. Currently, PASI (Psoriasis Area and Severity Index) is the most frequently used measurement in clinical practice. However, PASI has the following disadvantages: (1) Time consuming: calculating PASI usually takes more than 30 minutes, which poses a heavy burden on dermatologists; and (2) Inconsistency: due to the complexity of PASI calculation, different or even the same dermatologist could give different scores for the same case. To overcome these drawbacks, we propose PSENet, which applies deep neural networks to estimate psoriasis severity based on skin lesion images. Different from typical deep learning frameworks for image processing, PSENet has the following characteristics: (1) PSENet introduces a score refine module which is able to capture the visual features of skin at both coarse and fine-grained granularities; (2) PSENet uses a siamese structure in training and accepts pairwise inputs, which reduces the dependency on large amounts of training data; and (3) PSENet can not only estimate severity but also locate the skin lesion regions in the input image. To train and evaluate PSENet, we worked with professional dermatologists from a top hospital and spent years building a gold-standard dataset. The experimental results show that PSENet can achieve a mean absolute error of 2.21 and an accuracy of 77.87% in pair comparison, outperforming baseline methods. Overall, PSENet not only relieves dermatologists from the dull PASI calculation but also enables patients to track psoriasis severity in a much more convenient manner.

IJCAI Conference 2019 Conference Paper

Interpreting and Evaluating Neural Network Robustness

  • Fuxun Yu
  • Zhuwei Qin
  • Chenchen Liu
  • Liang Zhao
  • Yanzhi Wang
  • Xiang Chen

Recently, adversarial deception has become one of the most significant threats to deep neural networks. However, compared to the extensive research into new designs of adversarial attacks and defenses, neural networks' intrinsic robustness property has not been thoroughly investigated. This work aims to qualitatively interpret the adversarial attack and defense mechanisms through loss visualization, and to establish a quantitative metric to evaluate a model's intrinsic robustness. The proposed robustness metric identifies the upper bound of a model's prediction divergence in the given domain and thus indicates whether the model can maintain a stable prediction. With extensive experiments, our metric demonstrates several advantages over conventional testing-accuracy-based robustness estimation: (1) it provides a uniform evaluation for models with different structures and parameter scales; (2) it outperforms conventional accuracy-based robustness evaluation and provides a more reliable evaluation that is invariant to different test settings; and (3) it can be generated quickly without considerable testing cost.
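The metric's central quantity, the maximum prediction divergence within a neighborhood of an input, can be approximated by sampling. A toy Monte-Carlo sketch for a linear-softmax model; the model, the epsilon-ball, and the L1 divergence are illustrative stand-ins, not the paper's exact metric:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sampled_divergence(W, x, eps, n=500, rng=None):
    """Monte-Carlo lower estimate of max_{||d|| <= eps} ||f(x+d) - f(x)||_1
    for a linear-softmax model f(x) = softmax(W x)."""
    if rng is None:
        rng = np.random.default_rng(0)
    p0 = softmax(W @ x)
    worst = 0.0
    for _ in range(n):
        d = rng.normal(size=x.shape)
        d *= eps / np.linalg.norm(d)        # sample on the eps-sphere
        worst = max(worst, np.abs(softmax(W @ (x + d)) - p0).sum())
    return worst

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
small = sampled_divergence(W, x, eps=0.01)   # tight neighborhood
large = sampled_divergence(W, x, eps=1.0)    # loose neighborhood
```

A model whose worst-case divergence stays small over the domain of interest maintains a stable prediction in the sense described above; note that sampling gives a lower estimate, whereas the paper's metric targets the upper bound.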

JBHI Journal 2017 Journal Article

Chinese Sign Language Recognition Based on an Optimized Tree-Structure Framework

  • Xidong Yang
  • Xiang Chen
  • Xiang Cao
  • Shengjing Wei
  • Xu Zhang

Chinese Sign Language (CSL) subword recognition based on surface electromyography (sEMG), accelerometer (ACC), and gyroscope (GYRO) sensors was explored in this paper. In order to fuse the information of these three kinds of sensors effectively, the classification abilities of sEMG, ACC, GYRO, and their combinations on three common sign components (one or two handed, hand orientation, and hand amplitude) were evaluated first, and then an optimized tree-structure classification framework was proposed for CSL subword recognition. Eight subjects participated in this study, and recognition experiments under different testing conditions were implemented on a target set consisting of 150 CSL subwords. The proposed optimized tree-structure classification framework based on sEMG, ACC, and GYRO obtained the best performance among seven different testing conditions with a single sensor, paired-sensor fusion, and three-sensor fusion, and overall recognition accuracies of 94.31% and 87.02% were obtained for the 150 CSL subwords in a user-specific test and a user-independent test, respectively. Our study lays a basis for the implementation of large-vocabulary sign language recognition systems based on sEMG, ACC, and GYRO sensors.
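The tree-structure idea is a routing one: a cheap sign-component classifier first narrows the candidate set, and only the matching subword classifier is consulted. A schematic sketch with stub classifiers; the component rule, the stubs, and the class split are hypothetical, not the paper's trained models:

```python
import numpy as np

rng = np.random.default_rng(0)

def component_classifier(feat):
    """Stub: pretend the first feature encodes one- vs two-handedness."""
    return "two_handed" if feat[0] > 0 else "one_handed"

# Stub subword classifiers, each restricted to its own half of the
# 150-subword label space (0..74 vs 75..149).
subword_classifiers = {
    "one_handed": lambda feat: int(np.abs(feat).argmax()),
    "two_handed": lambda feat: 75 + int(np.abs(feat).argmax()),
}

def classify(feat):
    """Route through the tree: component first, then subword."""
    branch = component_classifier(feat)
    return subword_classifiers[branch](feat)

feats = rng.standard_normal((10, 4))
preds = [classify(f) for f in feats]
```

The benefit is that each leaf classifier faces a smaller, more homogeneous subword set than a flat 150-way classifier would.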

JBHI Journal 2017 Journal Article

EMG-Torque Relation in Chronic Stroke: A Novel EMG Complexity Representation With a Linear Electrode Array

  • Xu Zhang
  • Dongqing Wang
  • Zaiyang Yu
  • Xiang Chen
  • Sheng Li
  • Ping Zhou

This study examines the electromyogram (EMG)-torque relation for chronic stroke survivors using a novel EMG complexity representation. Ten stroke subjects performed a series of submaximal isometric elbow flexion tasks using their affected and contralateral arms, while a 20-channel linear electrode array was used to record surface EMG from the biceps brachii muscles. The sample entropy (SampEn) of the surface EMG signals was calculated with both global and local tolerance schemes. A regression analysis was performed between the SampEn of each channel's surface EMG and the elbow flexion torque. It was found that a linear regression can be used to describe the relation between surface EMG SampEn and torque well. Each channel's root mean square (RMS) amplitude of the surface EMG signal at the different torque levels was computed to determine the channel with the highest EMG amplitude. The slope of the regression (observed from the channel with the highest EMG amplitude) was smaller on the impaired side than on the nonimpaired side in 8 of the 10 subjects, regardless of the tolerance scheme (global or local) and the range of torques (full or matched range) used for comparison. The surface EMG signals from the channels above the estimated muscle innervation zones demonstrated significantly lower levels of complexity compared with the channels between the innervation zones and muscle tendons. The study provides a novel point of view on the EMG-torque relation in the complexity domain and reveals its alterations post stroke, which are associated with complex neural and muscular changes. The slope difference between channels with regard to innervation zones also confirms the relevance of electrode position in surface EMG analysis.
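Sample entropy, the complexity measure regressed against torque here, can be computed directly from its definition. A compact sketch; the template length m = 2 and the global tolerance r = 0.2 times the standard deviation are common defaults, not necessarily the paper's exact settings:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r): negative log of the conditional probability that
    template vectors matching for m points (Chebyshev distance <= r)
    also match for m + 1 points; self-matches are excluded."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()  # common "global" tolerance choice
    def match_count(mm):
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        dist = np.abs(templ[:, None, :] - templ[None, :, :]).max(axis=2)
        return (dist <= r).sum() - len(templ)  # drop diagonal self-matches
    B = match_count(m)
    A = match_count(m + 1)
    return -np.log(A / B)

rng = np.random.default_rng(0)
t = np.arange(500)
regular = np.sin(2 * np.pi * t / 25)   # highly predictable signal
noise = rng.standard_normal(500)       # unpredictable signal
se_regular = sample_entropy(regular)
se_noise = sample_entropy(noise)
```

A regular signal yields a low SampEn and an unpredictable one a high SampEn, which is the sense in which SampEn measures surface EMG complexity.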

JBHI Journal 2016 Journal Article

A Prototype of Reflection Pulse Oximeter Designed for Mobile Healthcare

  • Zhiyuan Lu
  • Xiang Chen
  • Zhongfei Dong
  • Zhangyan Zhao
  • Xu Zhang

This paper introduces a pulse oximeter prototype designed for mobile healthcare. In this prototype, a reflection pulse oximeter is embedded into the back cover of a smart handheld device to offer convenient measurement of both heart rate (HR) and SpO2 (estimation of arterial oxygen saturation) for home or mobile applications. Novel and miniaturized circuit modules including a chopper network and a filtering amplifier were designed to overcome the influence of ambient light and the interferences caused by embedding the sensor into a flat cover. A method based on adaptive trough detection for improved HR and SpO2 estimation is proposed, with appropriate simplification for its implementation on mobile devices. A fast and effective photoplethysmogram validation scheme is also proposed. Clinical experiments have been carried out to calibrate and test our oximeter. Our prototype oximeter can achieve performance comparable to a clinical oximeter, with no significant difference revealed by paired t-tests (p = 0.182 for SpO2 measurement and p = 0.496 for HR measurement). The design of this pulse oximeter will facilitate fast and convenient measurement of SpO2 for mobile healthcare.
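The HR estimate rests on detecting pulse troughs in the photoplethysmogram. A simplified sketch on a synthetic PPG: troughs are taken as local minima below an adaptive level, separated by a refractory period, and HR follows from the mean inter-trough interval. The percentile level, refractory period, and synthetic waveform are illustrative, not the paper's algorithm:

```python
import numpy as np

fs = 100                                   # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
# Synthetic PPG: 1.2 Hz pulse rate (72 bpm) plus mild noise.
ppg = -np.cos(2 * np.pi * 1.2 * t) + 0.05 * rng.standard_normal(t.size)

def detect_troughs(sig, fs, refractory=0.3, percentile=25):
    """Troughs = local minima below an adaptive percentile level,
    separated by at least `refractory` seconds."""
    level = np.percentile(sig, percentile)
    troughs, last = [], -np.inf
    for i in range(1, len(sig) - 1):
        is_min = sig[i] < sig[i - 1] and sig[i] <= sig[i + 1]
        if is_min and sig[i] < level and (i - last) / fs >= refractory:
            troughs.append(i)
            last = i
    return np.array(troughs)

troughs = detect_troughs(ppg, fs)
hr_bpm = 60.0 * fs / np.diff(troughs).mean()
```

SpO2 estimation additionally needs the red and infrared channels; this sketch covers only the trough-based timing that both estimates build on.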

JBHI Journal 2013 Journal Article

A Framework for Daily Activity Monitoring and Fall Detection Based on Surface Electromyography and Accelerometer Signals

  • Juan Cheng
  • Xiang Chen
  • Minfen Shen

As an essential branch of context awareness, activity awareness, especially daily activity monitoring and fall detection, is important in healthcare for the elderly and patients with chronic diseases. In this paper, a framework for activity awareness using surface electromyography and accelerometer (ACC) signals is proposed. First, histogram negative entropy was employed to determine the start- and end-points of static and dynamic active segments. Then, the angle of each ACC axis was calculated to indicate body posture, which helped sort dynamic activities into two categories, dynamic gait activities and dynamic transition activities, by judging whether the pre- and post-postures are both standing. Next, the dynamic gait activities were identified by double-stream hidden Markov models. The dynamic transition activities were then distinguished into normal transition activities and falls by the resultant ACC amplitude. Finally, a continuous daily activity monitoring and fall detection scheme was performed with a recognition accuracy of over 98%, demonstrating excellent fall detection performance and the feasibility of the proposed method for daily activity awareness.
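The final fall-versus-transition decision above hinges on the resultant ACC amplitude, the magnitude of the 3-axis acceleration vector. A minimal sketch with an illustrative threshold rule; the threshold factor and the synthetic data are assumptions, not the paper's calibrated values:

```python
import numpy as np

def resultant(acc):
    """Resultant amplitude of an N x 3 accelerometer stream."""
    return np.sqrt((acc ** 2).sum(axis=1))

def looks_like_fall(acc, g=9.8, factor=3.0):
    """Flag a dynamic transition as a fall when the peak resultant
    amplitude exceeds `factor` times gravity (illustrative rule only)."""
    return bool(resultant(acc).max() > factor * g)

rng = np.random.default_rng(0)
# Standing still: gravity on one axis plus mild sensor noise.
still = np.tile([0.0, 0.0, 9.8], (100, 1)) + 0.1 * rng.standard_normal((100, 3))
fall = still.copy()
fall[50] = [25.0, 20.0, 30.0]   # brief high-impact spike
flag_still = looks_like_fall(still)
flag_fall = looks_like_fall(fall)
```

In the full framework this check runs only on segments already classified as dynamic transitions, which is what keeps ordinary gait from triggering it.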

JBHI Journal 2013 Journal Article

Multiscale Entropy Analysis of Different Spontaneous Motor Unit Discharge Patterns

  • Xu Zhang
  • Xiang Chen
  • P. E. Barkhaus
  • Ping Zhou

This study explores a novel application of multiscale entropy (MSE) analysis for characterizing different patterns of spontaneous electromyogram (EMG) signals, including sporadic, tonic, and repetitive spontaneous motor unit discharges, and normal surface EMG baseline. Two algorithms for MSE analysis, namely the standard MSE and the intrinsic mode entropy (IMEn) (based on the recently developed multivariate empirical mode decomposition method), were applied to the different patterns of spontaneous EMG. Significant differences were observed at multiple scales of the standard MSE and IMEn analyses (p < 0.001) for any two of the spontaneous EMG patterns, while such significance may not be observed from single-scale entropy analysis. Compared to the standard MSE, the IMEn analysis allows a relatively small number of scales to be used to discern entropy differences among the various patterns of spontaneous EMG signals. The findings from this study contribute to our understanding of the nonlinear dynamic properties of different spontaneous EMG patterns, which may be related to spinal motoneuron or motor unit health.
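The "multiscale" part of standard MSE is a simple coarse-graining step: at scale tau, the series is replaced by the means of non-overlapping windows of length tau, and an entropy measure (typically sample entropy) is recomputed at each scale. A minimal sketch of the coarse-graining alone:

```python
import numpy as np

def coarse_grain(x, scale):
    """Coarse-grained series for MSE: y_j is the mean of the j-th
    non-overlapping window of length `scale`."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

x = np.arange(12, dtype=float)
cg3 = coarse_grain(x, 3)   # means of [0,1,2], [3,4,5], [6,7,8], [9,10,11]
```

An MSE curve is then the entropy of coarse_grain(x, tau) for tau = 1, 2, ...; IMEn instead computes entropy on intrinsic mode functions obtained by empirical mode decomposition, which is why it can get by with fewer scales.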

ICRA Conference 2011 Conference Paper

Dual mode predictive control for ultrafast piezoelectric nanopositioning stages

  • Hai-Tao Zhang
  • Xiang Chen
  • Zhiyong Chen 0001

Precision control of piezoelectric motor nanopositioning stages is widely used in a variety of nano-manufacturing equipment. However, due to the hysteresis nonlinearity with input saturation, it is challenging to design an ultrafast output-feedback controller with a large region of closed-loop stability. To address this problem, we developed a dual-mode nonlinear model predictive control (NMPC) method, in which an optimal input profile found by solving an open-loop optimal control problem drives the nonlinear system state into the terminal invariant set; afterwards, a linear output-feedback controller steers the state to the origin asymptotically. In contrast to the classical output-feedback controller, the settling time is effectively decreased and the closed-loop stable region is substantially increased by the proposed NMPC with almost no loss of nanopositioning accuracy. Finally, the feasibility and superiority of the proposed switching control method are examined by extensive experiments on a Physik Instrumente P-563.3CL triple-axis nanopositioning stage.
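The dual-mode structure, an optimization-driven phase outside a terminal invariant set and a linear feedback inside it, can be illustrated on a heavily simplified scalar system with input saturation. This is not the paper's controller or plant model; the dynamics, gains, and set bound are illustrative:

```python
import numpy as np

# Scalar plant x_{k+1} = a*x_k + b*sat(u_k) with saturated input.
a, b, u_max = 1.1, 1.0, 1.0
k_fb = 0.9                 # |a - b*k_fb| = 0.2 < 1: stabilizing feedback
x_term = u_max / k_fb      # inside this set the feedback never saturates

def sat(u):
    return float(np.clip(u, -u_max, u_max))

def dual_mode_step(x):
    if abs(x) > x_term:
        # Mode 1: saturated one-step-optimal input pushing toward the
        # origin, standing in for the open-loop optimal control problem.
        u = sat(-a * x / b)
    else:
        # Mode 2: linear feedback inside the terminal invariant set.
        u = -k_fb * x
    return a * x + b * sat(u)

x = 3.0
traj = [x]
for _ in range(30):
    x = dual_mode_step(x)
    traj.append(x)
```

The switch matters because the saturated phase enlarges the region from which the state can be driven into the terminal set, while the linear phase guarantees asymptotic convergence once inside it.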