Author name cluster

Yang Xiang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers

2 author rows

AAAI Conference 2026 Conference Paper

CCFQA: A Benchmark for Cross-Lingual and Cross-Modal Speech and Text Factuality Evaluation

Yexing Du
Kaiyuan Liu
Youcheng Pan
Zheng Chu
Bo Yang
Xiaocheng Feng
Ming Liu
Yang Xiang

As Large Language Models (LLMs) are increasingly popularized in the multilingual world, ensuring hallucination-free factuality becomes markedly crucial. However, existing benchmarks for evaluating the reliability of Multimodal Large Language Models (MLLMs) predominantly focus on textual or visual modalities with a primary emphasis on English, which creates a gap in evaluation when processing multilingual input, especially in speech. To bridge this gap, we propose a novel Cross-lingual and Cross-modal Factuality benchmark (CCFQA). Specifically, the CCFQA benchmark contains parallel speech-text factual questions across 8 languages, designed to systematically evaluate MLLMs' cross-lingual and cross-modal factuality capabilities. Our experimental results demonstrate that current MLLMs still face substantial challenges on the CCFQA benchmark. Furthermore, we propose a few-shot transfer learning strategy that effectively transfers the Question Answering (QA) capabilities of LLMs in English to multilingual Spoken Question Answering (SQA) tasks, achieving competitive performance with GPT-4o-mini-Audio using just 5-shot training. We release CCFQA as a foundational research resource to promote the development of MLLMs with more robust and reliable speech understanding capabilities.

AAAI Conference 2026 Conference Paper

The Visual Prism: Refracting Images into Parallel Multilingual Descriptions with Structured Visual Guidance

Chengpeng Fu
Xiaocheng Feng
Yichong Huang
Wenshuai Huo
Baohang Li
Yang Xiang
Ting Liu

Parallel corpora, as the foundation of machine translation, remain crucial even in the era of large language models (LLMs) for pre-training and fine-tuning. However, annotating parallel corpora is extremely costly, as it requires annotators to be proficient in multiple languages. To reduce this cost, prior work has explored image-pivoted corpus synthesis, generating multilingual captions for the same image as pseudo-parallel data. Unfortunately, these pseudo corpora suffer from the serious issue of multilingual focus divergence, i.e., the model attending to distinct aspects of the image when generating captions in different languages. To address this problem, we propose a method called PRISMS (Parallel Refracting ImageS into Multilingual descriptions with Structured visual guidance), which leverages semantic graphs as structured visual guidance to unify the focus of multilingual captions. To ensure adherence to this guidance, we introduce two key techniques: supervised fine-tuning using self-generated instructional data, and reinforcement learning with a reward signal based on semantic graph consistency. Experimental results on five languages show that our PRISMS significantly improves the image-pivot parallel corpora synthesis, enabling LLMs to achieve translation performance comparable to that of models trained on manually annotated corpora.

AAAI Conference 2026 Conference Paper

TweezeEdit: Consistent and Efficient Image Editing with Path Regularization

Jianda Mao
Kaibo Wang
Yang Xiang
Kani Chen

Recent progress in training-free image editing has enabled existing text-to-image diffusion models to be directly adapted into text-guided image editors without additional training. However, existing methods often over-align with target prompts while inadequately preserving source image semantics. These approaches generate target images explicitly or implicitly from the inversion noise of the source images, termed the inversion anchors. We identify this strategy as suboptimal for semantic preservation and inefficient due to elongated editing paths. We propose TweezeEdit, a tuning- and inversion-free framework for consistent and efficient image editing. Our method addresses these limitations by regularizing the entire denoising path rather than relying solely on the inversion anchors, ensuring source semantic retention and shortening editing paths. Guided by gradient-driven regularization, we efficiently inject target prompt semantics along a direct path using a consistency model. Extensive experiments demonstrate TweezeEdit's superior performance in semantic preservation and target alignment, outperforming existing methods. Remarkably, it requires only 12 steps (1.6 seconds per edit), underscoring its potential for real-time applications. The appendix is available in the extended version.

AAAI Conference 2026 Conference Paper

TWINFUZZ: Dual-Model Fuzzing for Robustness Generalization in Deep Learning

Enze Dai
Wentao Mo
Kun Hu
Xiaogang Zhu
Xi Xiao
Sheng Wen
Shaohua Wang
Yang Xiang

Deep learning (DL) models are increasingly deployed in safety-critical applications such as face recognition, autonomous driving, and medical diagnosis. Despite their impressive accuracy, they remain vulnerable to adversarial examples - subtle perturbations that can cause incorrect predictions, i.e., the robustness issues. While adversarial training improves robustness against known attacks, it often fails to generalize to unseen or stronger threats, revealing a critical gap in robustness generalization. In this work, we propose a dual-model fuzzing framework to enhance generalized robustness in DL models. Central to our method is a lightweight metric, the Lagrangian Information Bottleneck (LIB), which guides entropy-based mutation toward semantically meaningful and high-risk regions of the input space. The executor uses a resistant model and a more error-prone vulnerable model; their prediction consistency forms the basis of agreement mining, a label-free oracle for isolating decision-boundary samples. To ensure fuzzing effectiveness, we further introduce a task-driven seed selection strategy (e.g., SSIM for vision) that filters out low-quality inputs. We implement a prototype, TWINFUZZ, and evaluate it on six benchmark datasets and nine DL models. Compared with state-of-the-art testing approaches, TWINFUZZ achieves superior improvements in both training-specific and generalized robustness.

JBHI Journal 2025 Journal Article

A Lightweight Privacy-Preserving Federated Learning Framework for Heterogeneity-Resilient Skin Cancer Diagnosis

Junyu Lin
Jiageng Chen
Jichao Xiong
Dian Jiao
Weizhong Zhao
Yang Xiang

Machine Learning (ML) demonstrates dermatologist-level accuracy in skin cancer diagnosis, yet its practical adoption is constrained by data silos and privacy issues. While Federated Learning (FL) addresses these limitations, it remains susceptible to data heterogeneity and gradient leakage attacks. To overcome these challenges, we introduce a privacy-preserving FL framework tailored for encrypted dermoscopic image analysis. Our proposed framework integrates a Fully Homomorphic Encryption (FHE)-enabled variant of Stochastic Controlled Averaging (SCA), enhancing model convergence with Non-IID data. To further minimize computational and communication overhead, we develop a layer-wise Packed FHE (PFHE) approach that improves the efficiency of encrypted model aggregation. Moreover, we design a lightweight, FHE-Friendly Deep Neural Network (DNN) optimized for encrypted inference. This architecture incorporates a DO-EncConv module specifically engineered to balance inference efficiency and precision within FHE computational constraints. Experimental results on the HAM10000 and ISIC2019 datasets confirm the effectiveness of our proposed framework, demonstrating F1-Score improvements of 2.2% and 4.0%, respectively, over baseline FL approaches. Additionally, our method achieves communication overhead reductions of 94.85% and 93.48%, while encrypted inference is performed in approximately 17.8 seconds per sample, with less than 2% accuracy degradation compared to centralized plaintext models. These outcomes underscore the framework's practicality and effectiveness for secure, scalable clinical deployment.

IJCAI Conference 2025 Conference Paper

A Survey on the Feedback Mechanism of LLM-based AI Agents

Zhipeng Liu
Xuefeng Bai
Kehai Chen
Xinyang Chen
Xiucheng Li
Yang Xiang
Jin Liu
Hong-Dong Li

Large language models (LLMs) are increasingly being adopted to develop general-purpose AI agents. However, it remains challenging for these LLM-based AI agents to efficiently learn from feedback and iteratively optimize their strategies. To address this challenge, tremendous efforts have been dedicated to designing diverse feedback mechanisms for LLM-based AI agents. To provide a comprehensive overview of this rapidly evolving field, this paper presents a systematic review of these studies, offering a holistic perspective on the feedback mechanisms in LLM-based AI agents. We begin by discussing the construction of LLM-based AI agents, introducing a generalized framework that encapsulates much of the existing work. Next, we delve into the exploration of feedback mechanisms, categorizing them into four distinct types: internal feedback, external feedback, multi-agent feedback, and human feedback. Additionally, we provide an overview of evaluation protocols and benchmarks specifically tailored for LLM-based AI agents. Finally, we highlight the significant challenges and identify potential directions for future studies. The relevant papers are summarized and will be consistently updated at https://github.com/kevinson7515/Agents-Feedback-Mechanisms.

NeurIPS Conference 2025 Conference Paper

Exploring the Translation Mechanism of Large Language Models

Hongbin Zhang
Kehai Chen
Xuefeng Bai
Xiucheng Li
Yang Xiang
Min Zhang

While large language models (LLMs) demonstrate remarkable success in multilingual translation, their internal core translation mechanisms, even at the fundamental word level, remain insufficiently understood. To address this critical gap, this work introduces a systematic framework for interpreting the mechanism behind LLM translation from the perspective of computational components. This paper first proposes subspace-intervened path patching for precise, fine-grained causal analysis, enabling the detection of components crucial to translation tasks and subsequently characterizing their behavioral patterns in human-interpretable terms. Comprehensive experiments reveal that translation is predominantly driven by a sparse subset of components: specialized attention heads serve critical roles in extracting source language, translation indicators, and positional features, which are then integrated and processed by specific multi-layer perceptrons (MLPs) into intermediary English-centric latent representations before ultimately yielding the final translation. The significance of these findings is underscored by the empirical demonstration that targeted fine-tuning of a minimal parameter subset (<5%) enhances translation performance while preserving general capabilities. This result further indicates that these crucial components generalize effectively to sentence-level translation and are instrumental in elucidating more intricate translation tasks.

JBHI Journal 2025 Journal Article

Hypercomplex Graph Neural Network: Towards Deep Intersection of Multi-Modal Brain Networks

Yanwu Yang
Chenfei Ye
Guoqing Cai
Kunru Song
Jintao Zhang
Yang Xiang
Ting Ma

The multi-modal neuroimage study has provided insights into understanding the heteromodal relationships between brain network organization and behavioral phenotypes. Integrating data from various modalities facilitates the characterization of the interplay among anatomical, functional, and physiological brain alterations or developments. Graph Neural Networks (GNNs) have recently become popular in analyzing and fusing multi-modal, graph-structured brain networks. However, effectively learning complementary representations from other modalities remains a significant challenge due to the sophisticated and heterogeneous inter-modal dependencies. Furthermore, most existing studies often focus on specific modalities (e.g., only fMRI and DTI), which limits their scalability to other types of brain networks. To overcome these limitations, we propose a HyperComplex Graph Neural Network (HC-GNN) that models multi-modal networks as hypercomplex tensor graphs. In our approach, HC-GNN is conceptualized as a dynamic spatial graph, where the attentively learned inter-modal associations are represented as the adjacency matrix. HC-GNN leverages hypercomplex operations for inter-modal intersections through cross-embedding and cross-aggregation, enriching the deep coupling of multi-modal representations. We conduct a statistical analysis on the saliency maps to associate disease biomarkers. Extensive experiments on three datasets demonstrate the superior classification performance of our method and its strong scalability to various types of modalities. Our work presents a powerful paradigm for the study of multi-modal brain networks.

EAAI Journal 2025 Journal Article

Reconstructing three-dimensional conductivity distribution of in-situ maize ears using frequency-enhanced residual encoder-decoder network

Hai-Ying Zheng
Bing-Zhou Chen
Yang Xiang
Ke-Lei Xia
Yu-Bin Lin
Jin-Hang Liu
Jia-Jia Zhang
Liu-Deng Zhang

Monitoring water content distribution in in-situ maize ears is crucial for agricultural cultivation and crop research, with impedance dynamics correlated to water content changes. However, the combined effects of complex maize ear structure, unstable contact impedance, and diverse environmental noise significantly reduce the quality of measurement signals, posing major challenges for reconstructing three-dimensional (3D) conductivity distribution. Therefore, we propose a 3D conductivity absolute reconstruction model based on a spectrum-enhanced residual encoder-decoder (FRED-Net), enabling stable monitoring of impedance changes in in-situ maize ears in a greenhouse. FRED-Net combines residual blocks and encoder-decoder architectures, utilizing skip connections to preserve low-level features, effectively addressing gradient vanishing and information loss while maintaining low computational complexity. To cope with greenhouse environmental interference, FRED-Net incorporates an information fusion method based on spectral characteristics and volatility features, combined with a noiseless dataset training mode to enrich model input variability. Results from standard water tank simulation experiments indicate that FRED-Net exhibits ideal reconstruction accuracy and noise robustness compared to existing 3D absolute reconstruction algorithms, with a structural similarity index of 0.9970, root mean square error of 0.0019, relative error of 0.0199, and coefficient of determination of 0.9610. Imaging experiments on in-situ maize ears in the greenhouse successfully captured the dynamic characteristics of conductivity changes with water content during growth and the significant differences between tissue structures, providing an effective absolute imaging method for non-invasive visualization and measurement of the physiological state of maize ears during the growing period.

NeurIPS Conference 2025 Conference Paper

Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations

Kaibo Wang
Jianda Mao
Tong Wu
Yang Xiang

Classifier-Free Guidance (CFG) is an essential component of text-to-image diffusion models, and understanding and advancing its operational mechanisms remains a central focus of research. Existing approaches stem from divergent theoretical interpretations, thereby limiting the design space and obscuring key design choices. To address this, we propose a unified perspective that reframes conditional guidance as fixed point iterations, seeking to identify a golden path where latents produce consistent outputs under both conditional and unconditional generation. We demonstrate that CFG and its variants constitute a special case of single-step short-interval iteration, which is theoretically proven to exhibit inefficiency. To this end, we introduce Foresight Guidance (FSG), which prioritizes solving longer-interval subproblems in early diffusion stages with increased iterations. Extensive experiments across diverse datasets and model architectures validate the superiority of FSG over state-of-the-art methods in both image quality and computational efficiency. Our work offers novel perspectives for conditional guidance and unlocks the potential of adaptive design.
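
For orientation, the standard classifier-free guidance combination that the abstract builds on can be sketched as below. This is a minimal illustration of plain CFG only, not of Foresight Guidance or the paper's fixed point iterations; the function name `cfg_combine` and the guidance scale `w` are assumptions made for this sketch.

```python
import numpy as np


def cfg_combine(eps_uncond, eps_cond, w):
    """Plain CFG: start from the unconditional noise prediction and
    extrapolate toward the conditional one by guidance scale w.

    eps_uncond, eps_cond: arrays of the same shape (hypothetical model outputs).
    w: guidance scale (w = 0 gives unconditional, w = 1 gives conditional).
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    return eps_uncond + w * (eps_cond - eps_uncond)
```

When the conditional and unconditional predictions agree, the combined prediction no longer depends on `w`, which loosely corresponds to the consistency condition the abstract describes for its golden path.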

NeurIPS Conference 2024 Conference Paper

DiffHammer: Rethinking the Robustness of Diffusion-Based Adversarial Purification

Kaibo Wang
Xiaowen Fu
Yuxuan Han
Yang Xiang

Diffusion-based purification has demonstrated impressive robustness as an adversarial defense. However, concerns exist about whether this robustness arises from insufficient evaluation. Our research shows that EOT-based attacks face gradient dilemmas due to global gradient averaging, resulting in ineffective evaluations. Additionally, 1-evaluation underestimates resubmit risks in stochastic defenses. To address these issues, we propose an effective and efficient attack named DiffHammer. This method bypasses the gradient dilemma through selective attacks on vulnerable purifications, incorporating N-evaluation into loops and using gradient grafting for comprehensive and efficient evaluations. Our experiments validate that DiffHammer achieves effective results within 10-30 iterations, outperforming other methods. This calls into question the reliability of diffusion-based purification after mitigating the gradient dilemma and scrutinizing its resubmit risk.

NeurIPS Conference 2024 Conference Paper

Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

Yichong Huang
Xiaocheng Feng
Baohang Li
Yang Xiang
Hui Wang
Ting Liu
Bing Qin

Large language models (LLMs) exhibit complementary strengths in various tasks, motivating the research of LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine all candidate answers, posing a great challenge to the generalization on unseen data distributions. Besides, prior methods use textual responses as communication media, ignoring the valuable information in the internal representations. In this work, we propose a training-free ensemble framework DeePEn, fusing the informative probability distributions yielded by different LLMs at each decoding step. Unfortunately, the vocabulary discrepancy between heterogeneous LLMs makes directly averaging the distributions infeasible due to token misalignment. To address this challenge, DeePEn maps the probability distribution of each model from its own probability space to a universal relative space based on the relative representation theory, and performs aggregation. Next, we devise a search-based inverse transformation to transform the aggregated result back to the probability space of one of the ensembling LLMs (main model), in order to determine the next token. We conduct extensive experiments on ensembles of different numbers of LLMs, ensembles of LLMs with different architectures, and ensembles between the LLM and the specialist model. Experimental results show that (i) DeePEn achieves consistent improvements across six benchmarks covering subject examination, reasoning, and knowledge, (ii) a well-performing specialist model can benefit from a less effective LLM through distribution fusion, and (iii) DeePEn has complementary strengths with other ensemble methods such as voting.

AAAI Conference 2024 Conference Paper

FG-EmoTalk: Talking Head Video Generation with Fine-Grained Controllable Facial Expressions

Zhaoxu Sun
Yuze Xuan
Fang Liu
Yang Xiang

Although deep generative models have greatly improved one-shot video-driven talking head generation, few studies address fine-grained controllable facial expression editing, which is crucial for practical applications. Existing methods rely on a fixed set of predefined discrete emotion labels or simply copy expressions from input videos. This is limiting as expressions are complex, and methods using only emotion labels cannot generate fine-grained, accurate or mixed expressions. Generating talking head video with precise expressions is also difficult using 3D model-based approaches, as 3DMM only models facial movements and tends to produce deviations. In this paper, we propose a novel framework enabling fine-grained facial expression editing in talking face generation. Our goal is to achieve expression control by manipulating the intensities of individual facial Action Units (AUs) or groups. First, compared with existing methods which decouple the face into pose and expression, we propose a disentanglement scheme to isolate three components from the human face, namely, appearance, pose, and expression. Second, we propose to use input AUs to control muscle group intensities in the generated face, and integrate the AU features with the disentangled expression latent code. Finally, we present a self-supervised training strategy with well-designed constraints. Experiments show our method achieves fine-grained expression control, produces high-quality talking head videos and outperforms baseline methods.

ECAI Conference 2024 Conference Paper

Model Provenance via Model DNA

Xin Mu
Yu Wang
Yehong Zhang
Jiaqi Zhang
Hui Wang
Yang Xiang
Yue Yu

Understanding the life cycle of the machine learning (ML) model is an intriguing area of research (e.g., understanding where the model comes from, how it is trained, and how it is used). Our focus is on a novel problem within this domain, namely Model Provenance (MP). MP concerns the relationship between a target model and its pre-training model and aims to determine whether a source model serves as the provenance for a target model. In this paper, we formulate this new challenge as a learning problem, supplementing our exploration with empirical discussions on its connections to existing works. Following that, we introduce “Model DNA”, an interesting concept encoding the model’s training data and input-output information to create a compact machine-learning model representation. Capitalizing on this model DNA, we establish an efficient framework consisting of three key components: DNA generation, DNA similarity loss, and a provenance classifier, aimed at identifying model provenance. We conduct evaluations on both computer vision and natural language processing tasks using various models, datasets, and scenarios to demonstrate the effectiveness of our approach.

AIIM Journal 2024 Journal Article

TSOANet: Time-Sensitive Orthogonal Attention Network for medical event prediction

Hao Chen
Junjie Zhang
Yang Xiang
Shengye Lu
Buzhou Tang

Medical Event Prediction (MEP) based on Electronic Medical Records (EMR) is an essential and valuable task for healthcare. For a patient, information in the EMR can be organized into a structured sequence, consisting of multiple visits each with details about visit time and various types of medical events. As the time intervals between neighboring visits are irregular and the medical events at different visits can vary significantly, MEP based on EMR is still challenging. Many studies have been proposed to model the irregular time intervals, relations among different types of medical events within each visit and relations among medical events across visits, and reported exciting results. However, most of these studies focus on two out of the three aspects mentioned above, with only a few addressing all the three aspects simultaneously. In this study, we propose a novel network, the Time-Sensitive Orthogonal Attention Network (TSOANet), which can fully utilize the irregular time intervals, relations among different types of medical events within and across visits. In particular, we design two key components: (1) Time-Sensitive Block, used to model the time intervals at both local and global levels to determine the impact of each visit in EMR; (2) Orthogonal Attention Block, used to model relations among different types of medical events within each visit and across visits in two axes, that is, event axis and time axis. Extensive experiments on two public real-world EMR datasets demonstrate that TSOANet outperforms the state-of-the-art models for various prediction tasks, thereby verifying the effectiveness of our approach. The source code of TSOANet is released at https://github.com/chh13502/TSOANet.

AAAI Conference 2024 Conference Paper

ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-Order Optimization

Shuoran Jiang
Qingcai Chen
Youcheng Pan
Yang Xiang
Yukang Lin
Xiangping Wu
Chuanyi Liu
Xiaobao Song

Lowering the memory requirement in full-parameter training on large models has become a hot research area. MeZO fine-tunes large language models (LLMs) using only forward passes in a zeroth-order SGD optimizer (ZO-SGD), demonstrating excellent performance with the same GPU memory usage as inference. However, the simulated perturbation stochastic approximation for gradient estimate in MeZO leads to severe oscillations and incurs a substantial time overhead. Moreover, without momentum regularization, MeZO shows severe over-fitting problems. Lastly, the perturbation-irrelevant momentum on ZO-SGD does not improve the convergence rate. This study proposes ZO-AdaMU to resolve the above problems by adapting the simulated perturbation with momentum in its stochastic approximation. Unlike existing adaptive momentum methods, we relocate momentum on simulated perturbation in stochastic gradient approximation. Our convergence analysis and experiments prove this is a better way to improve convergence stability and rate in ZO-SGD. Extensive experiments demonstrate that ZO-AdaMU yields better generalization for LLM fine-tuning across various NLP tasks than MeZO and its momentum variants.
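
As background for the abstract above, the plain two-point zeroth-order estimator that forward-pass-only ZO-SGD relies on can be sketched as follows. This illustrates the baseline idea only; it is not ZO-AdaMU's momentum-adapted perturbation and not MeZO's memory-efficient in-place implementation. The names `zo_sgd_step` and `quadratic_loss`, the toy loss, and all hyperparameter values are assumptions chosen for this sketch.

```python
import numpy as np


def zo_sgd_step(loss_fn, theta, lr=0.05, eps=1e-3, rng=None):
    """One zeroth-order SGD step using two loss evaluations and no backward pass."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample a random Gaussian perturbation direction with the shape of the parameters.
    z = rng.standard_normal(theta.shape)
    # Two forward evaluations at symmetrically perturbed parameters.
    loss_plus = loss_fn(theta + eps * z)
    loss_minus = loss_fn(theta - eps * z)
    # Finite-difference estimate of the directional derivative along z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)
    # Step against the estimated gradient, which points along z.
    return theta - lr * projected_grad * z


def quadratic_loss(theta):
    """Toy objective standing in for a model's training loss."""
    return float(np.sum(theta ** 2))


rng = np.random.default_rng(0)
theta = np.ones(4)
initial_loss = quadratic_loss(theta)
for _ in range(200):
    theta = zo_sgd_step(quadratic_loss, theta, lr=0.05, rng=rng)
final_loss = quadratic_loss(theta)
```

Each step costs two loss evaluations and stores no gradients, which is why this family of methods can run with roughly the memory footprint of inference.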

EAAI Journal 2023 Journal Article

Active diversification of head-class features in bilateral-expert models for enhanced tail-class optimization in long-tailed classification

Jianting Chen
Ling Ding
Yunxiao Yang
Yang Xiang

Training deep learning models on long-tailed datasets is a challenging task since the classification performance of tail classes with fewer samples is always unsatisfactory. Currently, many long-tailed methods have achieved success. However, some methods always improve tail-class performance at the expense of head-class performance due to limited model capability. To address this issue, we propose a novel algorithm-level method inspired by information theory to balance the information space of each class and boost tail-class performance while minimizing head-class sacrifice. Our method involves actively eliminating the redundant feature information of head classes to save space for tail classes during training. Specifically, we use a bilateral-expert model and design a duplicate information disentanglement (DID) module that can extract duplicate and redundant information from bilateral-expert features. This allows us to develop a head diversity loss to decrease the extracted duplicate and redundant information of head classes and a tail distillation loss to increase the label information of tail classes. The joint result of these two losses allows our model to fully leverage the information space for improved tail-class performance without compromising head-class performance. The effectiveness and practicability of our method are verified by five datasets with long-tailed distributions for visual recognition or fault diagnosis tasks. Experimental results demonstrate that our method outperforms currently available mainstream methods, which we attribute to the effectiveness of our proposed DID module and the incorporation of two long-tailed losses.

JBHI Journal 2023 Journal Article

Multimodal Data Matters: Language Model Pre-Training Over Structured and Unstructured Electronic Health Records

Sicen Liu
Xiaolong Wang
Yongshuai Hou
Ge Li
Hui Wang
Hui Xu
Yang Xiang
Buzhou Tang

As two important textual modalities in electronic health records (EHR), both structured data (clinical codes) and unstructured data (clinical narratives) have recently been increasingly applied to the healthcare domain. Most existing EHR-oriented studies, however, either focus on a particular modality or integrate data from different modalities in a straightforward manner, which usually treats structured and unstructured data as two independent sources of information about patient admission and ignores the intrinsic interactions between them. In fact, the two modalities are documented during the same encounter where structured data inform the documentation of unstructured data and vice versa. In this paper, we propose a Medical Multimodal Pre-trained Language Model, named MedM-PLM, to learn enhanced EHR representations over structured and unstructured data and explore the interaction of two modalities. In MedM-PLM, two Transformer-based neural network components are first adopted to learn representative characteristics from each modality. A cross-modal module is then introduced to model their interactions. We pre-trained MedM-PLM on the MIMIC-III dataset and verified the effectiveness of the model on three downstream clinical tasks, i.e., medication recommendation, 30-day readmission prediction and ICD coding. Extensive experiments demonstrate the power of MedM-PLM compared with state-of-the-art methods. Further analyses and visualizations show the robustness of our model, which could potentially provide more comprehensive interpretations for clinical decision-making.

NeurIPS Conference 2023 Conference Paper

Nearly Optimal VC-Dimension and Pseudo-Dimension Bounds for Deep Neural Network Derivatives

Yahong Yang
Haizhao Yang
Yang Xiang

This paper addresses the problem of nearly optimal Vapnik-Chervonenkis dimension (VC-dimension) and pseudo-dimension estimations of the derivative functions of deep neural networks (DNNs). Two important applications of these estimations include: 1) Establishing a nearly tight approximation result of DNNs in the Sobolev space; 2) Characterizing the generalization error of machine learning methods with loss functions involving function derivatives. This theoretical investigation fills the gap of learning error estimations for a wide range of physics-informed machine learning models and applications including generative models, solving partial differential equations, operator learning, network compression, distillation, regularization, etc.

JBHI Journal 2023 Journal Article

SHAPE: A Sample-Adaptive Hierarchical Prediction Network for Medication Recommendation

Sicen Liu
Xiaolong Wang
Jingcheng Du
Yongshuai Hou
Xianbing Zhao
Hui Xu
Hui Wang
Yang Xiang

Effective medication recommendation for patients with complex multimorbidity conditions is a critical yet challenging task in healthcare. Most existing works predict medications based on longitudinal records, assuming that the encoding format of intra-visit medical events is serialized and that the information transmission patterns for learning longitudinal sequence data are stable. However, the following conditions may have been ignored: 1) A more compact encoder for the relationships within intra-visit medical events is urgently needed; 2) Strategies for learning accurate representations of the variable longitudinal sequences of patients are different. In this article, we propose a novel Sample-adaptive Hierarchical medicAtion Prediction nEtwork, termed SHAPE, to tackle the above challenges in the medication recommendation task. Specifically, we design a compact intra-visit set encoder to encode the relationship in the medical event for obtaining visit-level representation and then develop an inter-visit longitudinal encoder to learn the patient-level longitudinal representation efficiently. To endow the model with the capability of modeling the variable visit length, we introduce a soft curriculum learning method to assign the difficulty of each sample automatically by the visit length. Extensive experiments on a benchmark dataset verify the superiority of our model compared with several state-of-the-art baselines.

Details DOI

AIIM Journal 2022 Journal Article

CATNet: Cross-event attention-based time-aware network for medical event prediction

Sicen Liu
Xiaolong Wang
Yang Xiang
Hui Xu
Hui Wang
Buzhou Tang

Medical event prediction (MEP) is a fundamental task in the healthcare domain, which needs to predict medical events, including medications, diagnosis codes, laboratory tests, procedures, outcomes, and so on, according to historical medical records of patients. Many researchers have tried to build MEP models to overcome the challenges caused by the heterogeneous and irregular temporal characteristics of EHR data. However, most of them consider the heterogeneous and temporal medical events separately and ignore the correlations among different types of medical events, especially relations between heterogeneous historical medical events and target medical events. In this paper, we propose a novel neural network based on an attention mechanism called Cross-event Attention-based Time-aware Network (CATNet) for MEP. It is a time-aware, event-aware and task-adaptive method with the following advantages: 1) modeling heterogeneous information and temporal information in a unified way and considering irregular temporal characteristics locally and globally respectively, 2) taking full advantage of correlations among different types of events via cross-event attention. Experiments on two public datasets (MIMIC-III and eICU) show CATNet outperforms other state-of-the-art methods on various MEP tasks. The source code of CATNet is released at https://github.com/sherry6247/CATNet.git.

Details DOI

IJCAI Conference 2022 Conference Paper

Enhancing Entity Representations with Prompt Learning for Biomedical Entity Linking

Tiantian Zhu
Yang Qin
Qingcai Chen
Baotian Hu
Yang Xiang

Biomedical entity linking aims to map mentions in biomedical text to standardized concepts or entities in a curated knowledge base (KB) such as Unified Medical Language System (UMLS). The latest research tends to solve this problem in a unified framework solely based on surface form matching between mentions and entities. Specifically, these methods focus on addressing the variety challenge of the heterogeneous naming of biomedical concepts. Yet, the ambiguity challenge that the same word under different contexts may refer to distinct entities is usually ignored. To address this challenge, we propose a two-stage linking algorithm to enhance the entity representations based on prompt learning. The first stage includes a coarser-grained retrieval from a representation space defined by a bi-encoder that independently embeds the mention and entity’s surface forms. Unlike previous one-model-fits-all systems, each candidate is then re-ranked with a finer-grained encoder based on prompt-tuning that utilizes the contextual information. Extensive experiments show that our model achieves promising performance improvements compared with several state-of-the-art techniques on the largest biomedical public dataset MedMentions and the NCBI disease corpus. We also observe by cases that the proposed prompt-tuning strategy is effective in solving both the variety and ambiguity challenges in the linking task.

PDF Details DOI

AAMAS Conference 2021 Conference Paper

Constructing Junction Tree Agent Organization with Privacy

Yang Xiang
Abdulrahman Alshememry

Several frameworks for decentralized reasoning assume a junction tree agent organization (JT-org). JT-org construction involves 3 related tasks on existence recognition, construction, and environment re-decomposition, where re-decomposition incurs loss of JT-org linked privacy, including privacy on agent, topology, private and shared variables. We propose a novel algorithm DAER that accomplishes all 3 tasks distributively. For Tasks 1 and 2, DAER incurs no loss of JT-org linked privacy. For Task 3, it incurs significantly less privacy loss than existing JT-org construction methods. Its performance is formally analyzed and empirically evaluated.

PDF

YNIMG Journal 2021 Journal Article

The mutuality of social emotions: How the victim's reactive attitude influences the transgressor's emotional responses

Xiaoxue Gao
Hongbo Yu
Lu Peng
Xiaoliang Gong
Yang Xiang
Changjun Jiang
Xiaolin Zhou

Would a transgressor feel more or less guilty after receiving the victim's forgiving or blaming attitude? Everyday intuitions and empirical evidence are mixed in this regard, leaving how interpersonal attitudes shape the transgressor's reactive social emotions an open question. We combined a social interactive game with multivariate pattern analysis of fMRI data to address this question. Participants played an interactive game in an fMRI scanner where their incorrect responses could cause either high or low pain stimulation to an anonymous co-player. Following incorrect responses, participants were presented with the co-player's (i.e., the victim's) attitude towards the harm (Blame, Forgive, or Neutral). Behaviorally, the victim's attitude and the severity of harm interactively modulated the transgressor's social emotions, with expectation violation serving as a mediator. While unexpected forgiveness following severe harm amplified the participants' guilt, unexpected blame following minor harm reduced the participants' guilt and increased their anger. This role of expectation violation was supported by multivariate pattern analysis of fMRI, revealing a shared neural representation in ventral striatum in the processing of victim's attitude-induced guilt and anger. Moreover, we identified a neural re-appraisal process of guilt in the transgressor, with the involvement of an area related to self-conscious processing (i.e., perigenual anterior cingulate cortex) before knowing the victim's attitude transitioning to the involvement of an other-regarding related area (i.e., temporoparietal junction) after knowing the victim's attitude. These findings uncover the neurocognitive bases underlying the transgressor's social emotional responses, and highlight the importance of the mutuality of social emotions.

Details DOI

JAAMAS Journal 2020 Journal Article

Privacy sensitive environment re-decomposition for junction tree agent organization construction

Yang Xiang
Abdulrahman Alshememry

A number of frameworks for decentralized probabilistic reasoning, constraint reasoning, and decision theoretic reasoning assume a junction tree agent organization (JT-org). A natural decomposition of agent environment may not admit a JT-org. Hence, JT-org construction involves three related tasks: (1) Recognize whether a JT-org exists for a given environment decomposition. (2) When JT-orgs exist, construct one. (3) When no JT-org exists, revise the environment decomposition so that one exists and then construct it. Task 3 requires re-decomposition of the environment. However, re-decomposition incurs loss of JT-org linked privacy, including agent privacy, topology privacy, privacy on private variables, and privacy on shared variables. We propose a novel algorithm suite Distributed Agent Environment Re-decomposition (DAER) that accomplishes all three tasks distributively. For Tasks 1 and 2, DAER incurs no loss of JT-org linked privacy. For Task 3, it incurs significantly less privacy loss than existing JT-org construction methods. Performance of DAER is formally analyzed and empirically evaluated.

Details DOI

IJCAI Conference 2019 Conference Paper

A Quantitative Analysis Platform for PD-L1 Immunohistochemistry based on Point-level Supervision Model

Haibo Mi
Kele Xu
Yang Xiang
Yulin He
Dawei Feng
Huaimin Wang
Chun Wu
Yanming Song

Recently, deep learning has witnessed dramatic progress in the medical image analysis field. In the precise treatment of cancer immunotherapy, the quantitative analysis of PD-L1 immunohistochemistry is of great importance. It is quite common that pathologists manually quantify the cell nuclei. This process is very time-consuming and error-prone. In this paper, we describe the development of a platform for PD-L1 pathological image quantitative analysis using deep learning approaches. As point-level annotations can provide a rough estimate of the object locations and classifications, this platform adopts a point-level supervision model to classify, localize, and count the PD-L1 cell nuclei. Presently, this platform has achieved an accurate quantitative analysis of PD-L1 for two types of carcinoma, and it is deployed in one of the first-class hospitals in China.

PDF Details

JAAMAS Journal 2015 Journal Article

Privacy preserving existence recognition and construction of hypertree agent organization

Yang Xiang
Kamala Srinivasan

Decentralized probabilistic reasoning, constraint reasoning, and decision theoretic reasoning are some essential tasks of cooperative multiagent systems. Several frameworks for these tasks organize agents into a junction tree (JT). We show that existing techniques for JT existence recognition and construction leak information on private variables, shared variables, agent identities and adjacency, that can potentially be protected. We present a scheme to quantify these privacy losses. We develop two novel algorithms, for JT existence recognition and for JT construction when one exists, that provide a strong guarantee of agent privacy. Our experimental comparison shows that the proposed algorithms outperform existing techniques, one of them having the lowest privacy loss and the other having no privacy loss, while being more efficient than most alternatives.

Details DOI

AAAI Conference 2012 Conference Paper

Towards Automated Choreographing of Web Services Using Planning

Guobing Zou
Yixin Chen
You Xu
Ruoyun Huang
Yang Xiang

For Web service composition, choreography has recently received great attention and demonstrated a few key advantages over orchestration such as distributed control, fairness, data efficiency, and scalability. Automated design of choreography plans, especially distributed plans for multiple roles, is more complex and has not been studied before. Existing work requires manual generation assisted by model checking. In this paper, we propose a novel planning-based approach that can automatically convert a given composition task to a distributed choreography specification. Although planning has been used for orchestration, it is difficult to use planning for choreography, as it involves decentralized control, concurrent workflows, and contingency. We propose a few novel techniques, including compilation of contingencies, dependency graph analysis, and communication control, to handle these characteristics using planning. We theoretically show the correctness of this approach and empirically evaluate its practicability.

PDF Details

AIJ Journal 2010 Journal Article

Book review

Yang Xiang

Details DOI

MFCS Conference 2009 Conference Paper

How to Use Spanning Trees to Navigate in Graphs

Feodor F. Dragan
Yang Xiang

In this paper, we investigate three strategies of how to use a spanning tree T of a graph G to navigate in G, i.e., to move from a current vertex x towards a destination vertex y via a path that is close to optimal. In each strategy, each vertex v has full knowledge of its neighborhood $N_G[v]$ in G (or, k-neighborhood $D_k(v, G)$, where k is a small integer) and uses a small piece of global information from spanning tree T (e.g., distance or ancestry information in T), available locally at v, to navigate in G. We investigate advantages and limitations of these strategies on particular families of graphs such as graphs with locally connected spanning trees, graphs with bounded length of largest induced cycle, graphs with bounded tree-length, graphs with bounded hyperbolicity. For most of these families of graphs, the ancestry information from a BFS-tree guarantees short enough routing paths. In many cases, the obtained results are optimal up to a constant factor.

Details

AAMAS Conference 2006 Conference Paper

Agent Interface Enhancement For Efficient Multiagent Probabilistic Inference

Yang Xiang
Kun Zhang

NMR Workshop 2004 Conference Paper

Probabilistic reasoning in dynamic multiagent systems

Xiangdong An
Yang Xiang
Nick Cercone

Probabilistic reasoning with multiply sectioned Bayesian networks (MSBNs) has been successfully applied in static domains under the cooperative multiagent paradigm. Probabilistic reasoning in dynamic domains under the same paradigm involves several issues. This paper proposes an approach to address these issues. Intuitively, observation of the current state plays a more important role in reasoning about the current state than remote historic information. Based on this intuition, we model the entire domain for a period of time as an MSBN and then reason about the state of the dynamic domain period by period exactly. In reasoning about the state of a suspected entity, we compute and observe an observable Markov boundary of the entity. This makes observation more efficient and relevant. In MSBNs, an observable Markov boundary of a node may span across all Bayesian subnets. We present an algorithm for cooperative computation of an observable Markov boundary of a set of nodes in MSBNs without revealing subnet structures. Preliminary experiments show the approach works well on our simulated multiagent dynamic domains.

Details

UAI Conference 1997 Conference Paper

Learning Belief Networks in Domains with Recursively Embedded Pseudo Independent Submodels

Jun Hu
Yang Xiang

Details

UAI Conference 1996 Conference Paper

Critical Remarks on Single Link Search in Learning Belief Networks

Yang Xiang
S. K. Michael Wong
Nick Cercone

In learning belief networks, the single link lookahead search is widely adopted to reduce the search space. We show that there exists a class of probabilistic domain models which displays a special pattern of dependency. We analyze the behavior of several learning algorithms using different scoring metrics such as the entropy, conditional independence, minimal description length and Bayesian metrics. We demonstrate that single link lookahead search procedures (employed in these algorithms) cannot learn these models correctly. Thus, when the underlying domain model actually belongs to this class, the use of a single link search procedure will result in learning of an incorrect model. This may lead to inference errors when the model is used. Our analysis suggests that if the prior knowledge about a domain does not rule out the possible existence of these models, a multi-link lookahead search or other heuristics should be used for the learning process.

Details

UAI Conference 1995 Conference Paper

A Method for Implementing a Probabilistic Model as a Relational Database

S. K. Michael Wong
Cory J. Butz
Yang Xiang

This paper discusses a method for implementing a probabilistic inference system based on an extended relational data model. This model provides a unified approach for a variety of applications such as dynamic programming, solving sparse linear equations, and constraint propagation. In this framework, the probability model is represented as a generalized relational database. Subsequent probabilistic requests can be processed as standard relational queries. Conventional database management systems can be easily adopted for implementing such an approximate reasoning system.

Details

AIIM Journal 1993 Journal Article

Multiply sectioned Bayesian networks for neuromuscular diagnosis

Yang Xiang
B. Pant
A. Eisen
M.P. Beddoes
D. Poole

A prototype neuromuscular diagnostic system (PAINULIM) that diagnoses painful or impaired upper limbs has been developed based on Bayesian networks. This paper presents nonmathematically the major knowledge representation issues that arose in the development of PAINULIM. Motivated by the computational overhead of large application domains, and the desire to provide a user with an interface that gives a focused display of a subdomain of current interest, we built PAINULIM using the idea of multiply sectioned Bayesian networks. A preliminary evaluation of PAINULIM with 76 patients has demonstrated good clinical performance.

Details DOI