Arrow Research search

Author name cluster

Xinyu Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

24

AAAI Conference 2026 Conference Paper

Fair Domain Generalization: An Information-Theoretic View

  • Tangzheng Lian
  • Guanyu Hu
  • Dimitrios Kollias
  • Xinyu Yang
  • Oya Celiktutan

Domain generalization (DG) and algorithmic fairness are two key challenges in machine learning. However, most DG methods focus solely on minimizing expected risk in the unseen target domain, without considering algorithmic fairness. Conversely, fairness methods typically do not account for domain shifts, so the fairness achieved during training may not generalize to unseen test domains. In this work, we bridge these gaps by studying the problem of Fair Domain Generalization (FairDG), which aims to minimize both expected risk and fairness violations in unseen target domains. We derive novel mutual information-based upper bounds for expected risk and fairness violations in multi-class classification tasks with multi-group sensitive attributes. These bounds provide key insights for algorithm design from an information-theoretic perspective. Guided by these insights, we propose a practical method that solves the FairDG problem through Pareto optimization. Experiments on real-world vision and language datasets show that our method achieves superior utility–fairness trade-offs compared to existing approaches.
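The Pareto optimization step mentioned in the abstract can be pictured with a tiny sketch: given candidate models scored by (expected risk, fairness violation), keep only the non-dominated ones. This is an illustrative fragment, not the paper's method; the function and the numbers are invented.

```python
# Hypothetical sketch: selecting Pareto-optimal (risk, fairness-violation) trade-offs.
# Lower is better in both coordinates; nothing here comes from the paper itself.

def pareto_front(points):
    """Return the points not strictly improved upon by any other point."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return front

candidates = [(0.30, 0.10), (0.25, 0.20), (0.40, 0.05), (0.35, 0.15)]
print(pareto_front(candidates))  # (0.35, 0.15) is dominated by (0.30, 0.10)
```

A multi-objective trainer would then pick among the surviving trade-offs according to the desired utility–fairness balance.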

AAAI Conference 2026 Conference Paper

RetouchAgent: Towards Interactive and Explainable Image Retouching with MLLM Agents

  • Shuo Zhang
  • Xinyu Yang

Although deep learning-based image retouching has made significant progress, its inherent subjectivity renders current black-box methods limited in interactivity and explainability. Among existing efforts, parameter-controlled methods aim to improve interactivity, but often suffer from ambiguous semantics and lack support for natural language control. Reinforcement learning–based explainability methods are constrained by low-dimensional and limited action spaces, which result in suboptimal performance. To address the above issues, we propose RetouchAgent, a novel framework that leverages collaboration among multiple MLLM agents for image retouching. Our method consists of the following key steps: (1) Retrieval: By constructing a multimodal retouching database, we enable an ICL sample retrieval mechanism guided by retouching intent. (2) Engine: Leveraging the vision-language understanding capabilities of MLLM, a carefully designed prompting strategy, and a dedicated operation library, we enable precise and controllable image retouching. (3) Reflection: We evaluate each retouching interaction and optimize the retouching process for progressive result refinement. Finally, through multiple rounds of collaboration among MLLM agents, RetouchAgent achieves state-of-the-art performance in quantitative and qualitative evaluations.

AAAI Conference 2026 Conference Paper

Skill Path: Unveiling Language Skills from Circuit Graphs

  • Hang Chen
  • Xinyu Yang
  • Jiaying Zhu
  • Wenya Wang

Circuit graph discovery has emerged as a fundamental approach to elucidating the skill mechanisms of language models. Despite the output faithfulness of circuit graphs, they suffer from atomic ablation, which causes the loss of causal dependencies between connected components. In addition, their discovery process, designed to preserve output faithfulness, inadvertently captures extraneous effects other than an isolated target skill. To alleviate these challenges, we introduce skill paths, which offer a more refined and compact representation by isolating individual skills within a linear chain of components. To enable skill path extraction from circuit graphs, we propose a three-step framework consisting of decomposition, pruning, and post-hoc causal mediation. In particular, we offer a complete linear decomposition of the transformer model, which leads to a disentangled computation graph. After pruning, we further adopt causal analysis techniques, including counterfactuals and interventions, to extract the final skill paths from the circuit graph. To underscore the significance of skill paths, we investigate three generic language skills—Previous Token Skill, Induction Skill, and In-Context Learning Skill—using our framework. Experiments support two crucial properties of these skills, namely stratification and inclusiveness.

AAAI Conference 2025 Conference Paper

Graph Structure Learning for Spatial-Temporal Imputation: Adapting to Node and Feature Scales

  • Xinyu Yang
  • Yu Sun
  • Xinyang Chen
  • Ying Zhang
  • Xiaojie Yuan

Spatial-temporal data collected across different geographic locations often suffer from missing values, posing challenges to data analysis. Existing methods primarily leverage fixed spatial graphs to impute missing values, which implicitly assume that the spatial relationship is roughly the same for all features across different locations. However, they may overlook the different spatial relationships of diverse features recorded by sensors in different locations. To address this, we introduce the multi-scale Graph Structure Learning framework for spatial-temporal Imputation (GSLI) that dynamically adapts to the heterogeneous spatial correlations. Our framework encompasses node-scale graph structure learning to cater to the distinct global spatial correlations of different features, and feature-scale graph structure learning to unveil common spatial correlation across features within all stations. Integrated with prominence modeling, our framework emphasizes nodes and features with greater significance in the imputation process. Furthermore, GSLI incorporates cross-feature and cross-temporal representation learning to capture spatial-temporal dependencies. Evaluated on six real incomplete spatial-temporal datasets, GSLI showcases the improvement in data imputation and downstream applications.
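As a rough illustration of graph-based imputation in general (not GSLI's learned multi-scale structure), one can infer edge weights from node-embedding similarity and fill a missing reading with a neighbor-weighted average; all names and values below are made up.

```python
# Hedged toy: similarity-derived adjacency + neighbor-weighted imputation.
# GSLI learns node-scale and feature-scale graphs; this fixed dot-product
# similarity is only a stand-in for that learned structure.
import numpy as np

def impute(readings, embeddings, missing):
    sim = embeddings @ embeddings.T      # stand-in for a learned adjacency
    np.fill_diagonal(sim, 0.0)           # a node cannot impute itself
    w = sim[missing] / sim[missing].sum()
    return float(w @ readings)

readings = np.array([10.0, 12.0, 30.0])  # reading at node 2 is treated as missing
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(impute(readings, emb, missing=2))
```

Node 2's only similar neighbor is node 1, so the filled value tracks node 1's reading.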

NeurIPS Conference 2025 Conference Paper

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

  • Xinyu Yang
  • Yuwei An
  • Hongyi Liu
  • Tianqi Chen
  • Beidi Chen

Autoregressive Large Language Models (AR-LLMs) frequently exhibit implicit parallelism in sequential generation. Inspired by this, we introduce Multiverse, a new generative model enabling natively parallel generation. Multiverse internalizes a MapReduce paradigm, generating automatically through three stages: (i) a Map stage for adaptive task decomposition, (ii) a Process stage for parallel subtask execution, and (iii) a Reduce stage for lossless result synthesis. Next, we build a real-world Multiverse reasoning model with co-design of data, algorithm, and system, enabling rapid and seamless transfer from frontier AR-LLMs. For data creation, we develop Multiverse Curator, an automated LLM-assisted pipeline that transforms sequential reasoning chains into structured training data, avoiding costly human annotations. Algorithmically, we design Multiverse Attention to separate parallel reasoning steps while keeping compatibility with causal attention for efficient training. On the systems side, we implement Multiverse Engine to support parallel inference. It features a dedicated interpreter that dynamically switches between sequential and parallel generation, triggered directly by the model. After a 3-hour fine-tuning with 1K examples, our Multiverse-32B stands as the only open-sourced non-AR model achieving performance on par with leading AR-LLMs of the same scale, evidenced by AIME24 & 25 scores of 54% and 46%, respectively. Moreover, our budget control experiments show that Multiverse-32B exhibits superior scaling, outperforming AR-LLMs by 1.87% on average using the same context length. Such scaling further leads to practical efficiency gains, achieving up to 2x speedup across varying batch sizes. We have open-sourced the entire Multiverse ecosystem, including data, model weights, serving system, supporting tools, as well as data curation prompts and detailed training and evaluation recipes.
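The Map/Process/Reduce control flow described above can be sketched in a few lines; here the task decomposition is hard-coded rather than model-driven, so this shows only the shape of the idea, not the Multiverse system.

```python
# Illustrative sketch of a Map -> Process -> Reduce pipeline. In Multiverse the
# model itself decides the decomposition; here it is a fixed split for clarity.
from concurrent.futures import ThreadPoolExecutor

def map_stage(problem):
    # Adaptive task decomposition (here: one independent subtask per item).
    return [("square", n) for n in problem]

def process_stage(subtasks):
    # Parallel subtask execution.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda task: task[1] ** 2, subtasks))

def reduce_stage(results):
    # Lossless result synthesis (here: a simple aggregation).
    return sum(results)

print(reduce_stage(process_stage(map_stage([1, 2, 3]))))  # 1 + 4 + 9 = 14
```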

TMLR Journal 2025 Journal Article

Reliable and Responsible Foundation Models

  • Xinyu Yang
  • Junlin Han
  • Rishi Bommasani
  • Jinqi Luo
  • Wenjie Qu
  • Wangchunshu Zhou
  • Adel Bibi
  • Xiyao Wang

Foundation models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), Image Generative Models (i.e., Text-to-Image Models and Image-Editing Models), and Video Generative Models, have become essential tools with broad applications across various domains such as law, medicine, education, finance, and beyond. As these models see increasing real-world deployment, ensuring their reliability and responsibility has become critical for academia, industry, and government. This survey addresses the reliable and responsible development of foundation models. We explore critical issues, including bias and fairness, security and privacy, uncertainty, explainability, and distribution shift. Our research also covers model limitations, such as hallucinations, as well as methods like alignment and Artificial Intelligence-Generated Content (AIGC) detection. For each area, we review the current state of the field and outline concrete future research directions. Additionally, we discuss the intersections between these areas, highlighting their connections and shared challenges. We hope our survey fosters the development of foundation models that are not only powerful but also ethical, trustworthy, reliable, and socially responsible.

NeurIPS Conference 2025 Conference Paper

Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates

  • Hang Chen
  • Jiaying Zhu
  • Xinyu Yang
  • Wenya Wang

Circuit discovery has gradually become one of the prominent methods for mechanistic interpretability, and research on circuit completeness has also garnered increasing attention. Methods of circuit discovery that do not guarantee completeness not only result in circuits that are not fixed across different runs but also cause key mechanisms to be omitted. The nature of incompleteness arises from the presence of OR gates within the circuit, which are often only partially detected in standard circuit discovery methods. To this end, we systematically introduce three types of logic gates: AND, OR, and ADDER gates, and decompose the circuit into combinations of these logical gates. Through the concept of these gates, we derive the minimum requirements necessary to achieve faithfulness and completeness. Furthermore, we propose a framework that combines noising-based and denoising-based interventions, which can be easily integrated into existing circuit discovery methods without significantly increasing computational complexity. This framework is capable of fully identifying the logic gates and distinguishing them within the circuit. In addition to the extensive experimental validation of the framework's ability to restore the faithfulness, completeness, and sparsity of circuits, using this framework, we uncover fundamental properties of the three logic gates, such as their proportions and contributions to the output, and explore how they behave among the functionalities of language models.
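The OR-gate failure mode the abstract describes is easy to reproduce on a toy circuit: noising one input of an OR at a time shows no effect, while denoising reveals that each input is sufficient, which is why combining both intervention types matters. The toy below is invented purely for illustration.

```python
# Toy illustration of why OR gates break completeness in ablation-based circuit
# discovery. Component names and the 0/1 activations are made up.

def output(a, b):
    return a or b  # an OR-gate subcircuit: either input alone suffices

# Noising: ablate one component at a time from the clean run (a=1, b=1).
noising_effect = {name: output(1, 1) - out
                  for name, out in [("a", output(0, 1)), ("b", output(1, 0))]}

# Denoising: restore one component at a time into the corrupted run (a=0, b=0).
denoising_effect = {name: out - output(0, 0)
                    for name, out in [("a", output(1, 0)), ("b", output(0, 1))]}

print(noising_effect)    # both inputs look unimportant under noising alone
print(denoising_effect)  # denoising shows each input is individually sufficient
```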

IJCAI Conference 2025 Conference Paper

SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation

  • Zhaoxi Mu
  • Xinyu Yang
  • Gang Wang

While contemporary speech separation technologies adeptly process lengthy mixed audio waveforms, they are frequently challenged by the intricacies of real-world environments, including noisy and reverberant settings, which can result in artifacts or distortions in the separated speech. To overcome these limitations, we introduce SepALM, a pioneering approach that employs audio language models (ALMs) to rectify and re-synthesize speech within the text domain following preliminary separation. SepALM comprises four core components: a separator, a corrector, a synthesizer, and an aligner. By integrating an ALM-based end-to-end error correction mechanism, we mitigate the risk of error accumulation and circumvent the optimization hurdles typically encountered in conventional methods that amalgamate automatic speech recognition (ASR) with large language models (LLMs). Additionally, we have developed Chain-of-Thought (CoT) prompting and knowledge distillation techniques to facilitate the reasoning and training processes of the ALM. Our experiments substantiate that SepALM not only elevates the precision of speech separation but also markedly bolsters adaptability in novel acoustic environments.

NeurIPS Conference 2025 Conference Paper

Tracking and Understanding Object Transformations

  • Yihong Sun
  • Xinyu Yang
  • Jennifer Sun
  • Bharath Hariharan

Real-world objects frequently undergo state transformations. From an apple being cut into pieces to a butterfly emerging from its cocoon, tracking through these changes is important for understanding real-world objects and dynamics. However, existing methods often lose track of the target object after transformation, due to significant changes in object appearance. To address this limitation, we introduce the task of Track Any State: tracking objects through transformations while detecting and describing state changes, accompanied by a new benchmark dataset, VOST-TAS. To tackle this problem, we present TubeletGraph, a zero-shot system that recovers missing objects after transformation and maps out how object states are evolving over time. TubeletGraph first identifies potentially overlooked tracks, and determines whether they should be integrated based on semantic and proximity priors. Then, it reasons about the added tracks and generates a state graph describing each observed transformation. TubeletGraph achieves state-of-the-art tracking performance under transformations, while demonstrating deeper understanding of object transformations and promising capabilities in temporal grounding and semantic reasoning for complex object transformations. Code, additional results, and the benchmark dataset are available at https://tubelet-graph.github.io.

NeurIPS Conference 2024 Conference Paper

Frequency-aware Generative Models for Multivariate Time Series Imputation

  • Xinyu Yang
  • Yu Sun
  • Xiaojie Yuan
  • Xinyang Chen

Missing data in multivariate time series are common issues that can affect analysis and downstream applications. Although multivariate time series data generally consist of trend, seasonal, and residual terms, existing works mainly focus on optimizing the modeling of the first two terms. However, we find that the residual term is more crucial for accurate imputation, since it reflects the diverse changes in the data and accounts for the largest share of imputation error. Therefore, in this study, we introduce frequency-domain information and design Frequency-aware Generative Models for Multivariate Time Series Imputation (FGTI). Specifically, FGTI employs a high-frequency filter to boost the residual term imputation, supplemented by a dominant-frequency filter for the trend and seasonal imputation. A cross-domain representation learning module then fuses frequency-domain insights with deep representations. Experiments over various datasets with real-world missing values show that FGTI achieves superior performance in both data imputation and downstream applications.
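The dominant-frequency/high-frequency split can be illustrated with a simple FFT mask. The cutoff and the series below are invented; FGTI's actual filters are components of a learned generative model, not a fixed mask.

```python
# Hedged sketch: split a series into a dominant-frequency part (trend/seasonal)
# and a high-frequency residual via an FFT mask. The cutoff is illustrative.
import numpy as np

def split_frequencies(x, keep_low=2):
    spec = np.fft.rfft(x)
    low = spec.copy()
    low[keep_low:] = 0            # keep only the lowest-frequency bins
    high = spec - low             # everything else is residual content
    return np.fft.irfft(low, n=len(x)), np.fft.irfft(high, n=len(x))

t = np.linspace(0, 1, 64, endpoint=False)
x = np.sin(2 * np.pi * t) + 0.1 * np.sin(2 * np.pi * 20 * t)  # slow + fast component
low, high = split_frequencies(x)
```

By linearity of the FFT, the two parts sum back to the original series exactly, so nothing is lost by modeling them separately.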

NeurIPS Conference 2024 Conference Paper

IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization

  • Xiaochen Ma
  • Xuekang Zhu
  • Lei Su
  • Bo Du
  • Zhuohang Jiang
  • Bingkui Tong
  • Zeyu Lei
  • Xinyu Yang

A comprehensive benchmark is yet to be established in the Image Manipulation Detection & Localization (IMDL) field. The absence of such a benchmark leads to insufficient and misleading model evaluations, severely undermining the development of this field. However, the scarcity of open-sourced baseline models and inconsistent training and evaluation protocols make conducting rigorous experiments and faithful comparisons among IMDL models challenging. To address these challenges, we introduce IMDL-BenCo, the first comprehensive IMDL benchmark and modular codebase. IMDL-BenCo: i) decomposes the IMDL framework into standardized, reusable components and revises the model construction pipeline, improving coding efficiency and customization flexibility; ii) fully implements or incorporates training code for state-of-the-art models to establish a comprehensive IMDL benchmark; and iii) conducts deep analysis based on the established benchmark and codebase, offering new insights into IMDL model architecture, dataset characteristics, and evaluation standards. Specifically, IMDL-BenCo includes common processing algorithms, 8 state-of-the-art IMDL models (1 of which is reproduced from scratch), 2 sets of standard training and evaluation protocols, 15 GPU-accelerated evaluation metrics, and 3 kinds of robustness evaluation. This benchmark and codebase represent a significant leap forward in calibrating the current progress in the IMDL field and inspiring future breakthroughs. Code is available at: https://github.com/scu-zjz/IMDLBenCo

AAAI Conference 2024 Conference Paper

MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion

  • Shulei Ji
  • Xinyu Yang

Generating music with emotion is an important task in automatic music generation, in which emotion is evoked through a variety of musical elements (such as pitch and duration) that change over time and collaborate with each other. However, prior research on deep learning-based emotional music generation has rarely explored the contribution of different musical elements to emotions, let alone the deliberate manipulation of these elements to alter the emotion of music, which is not conducive to fine-grained element-level control over emotions. To address this gap, we present a novel approach employing musical element-based regularization in the latent space to disentangle distinct elements, investigate their roles in distinguishing emotions, and further manipulate elements to alter musical emotions. Specifically, we propose a novel VQ-VAE-based model named MusER. MusER incorporates a regularization loss to enforce the correspondence between the musical element sequences and the specific dimensions of latent variable sequences, providing a new solution for disentangling discrete sequences. Taking advantage of the disentangled latent vectors, a two-level decoding strategy that includes multiple decoders attending to latent vectors with different semantics is devised to better predict the elements. By visualizing latent space, we conclude that MusER yields a disentangled and interpretable latent space and gain insights into the contribution of distinct elements to the emotional dimensions (i.e., arousal and valence). Experimental results demonstrate that MusER outperforms the state-of-the-art models for generating emotional music in both objective and subjective evaluation. Besides, we rearrange music through element transfer and attempt to alter the emotion of music by transferring emotion-distinguishable elements.

NeurIPS Conference 2024 Conference Paper

S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity

  • Xinyu Yang
  • Jixuan Leng
  • Geyang Guo
  • Jiawei Zhao
  • Ryumei Nakada
  • Linjun Zhang
  • Huaxiu Yao
  • Beidi Chen

Current PEFT methods for LLMs can achieve high quality, efficient training, or scalable serving, but not all three simultaneously. To address this limitation, we investigate sparse fine-tuning and observe a remarkable improvement in generalization ability. Utilizing this key insight, we propose a family of Structured Sparse Fine-Tuning (S$^2$FT) methods for LLMs, which concurrently achieve state-of-the-art fine-tuning performance, training efficiency, and inference scalability. S$^2$FT accomplishes this by "selecting sparsely and computing densely". Based on the coupled structures in LLMs, S$^2$FT selects a few attention heads and channels in the MHA and FFN modules for each Transformer block, respectively. Next, it co-permutes the weight matrices on both sides of all coupled structures to connect the selected subsets in each layer into a dense submatrix. Finally, S$^2$FT performs in-place gradient updates on all selected submatrices. Through theoretical analyses and empirical results, our method prevents forgetting while simplifying optimization, delivers SOTA performance on both commonsense and arithmetic reasoning with 4.6% and 1.3% average improvements compared to LoRA, and surpasses full FT by 11.5% when generalizing to various domains after instruction tuning. Using our partial back-propagation algorithm, S$^2$FT saves training memory up to 3$\times$ and improves latency by 1.5-2.7$\times$ compared to full FT, while achieving an average 10% improvement over LoRA on both metrics. We further demonstrate that the weight updates in S$^2$FT can be decoupled into adapters, enabling effective fusion, fast switching, and efficient parallelism when serving multiple fine-tuned models.
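"Selecting sparsely and computing densely" can be sketched as an in-place gradient step restricted to a chosen subset of rows. The row selection below is arbitrary, whereas S$^2$FT selects coupled attention heads and FFN channels and co-permutes weights so the selected pieces form a dense submatrix.

```python
# Minimal caricature of structured sparse fine-tuning: update only selected rows
# of a weight matrix in place. The selection heuristic here is invented.
import numpy as np

def sparse_update(W, grad, rows, lr=0.1):
    # In-place gradient step restricted to the selected rows; within the
    # selection the computation is dense, which keeps hardware efficiency.
    W[rows] -= lr * grad[rows]
    return W

W = np.ones((4, 3))
grad = np.full((4, 3), 0.5)
sparse_update(W, grad, rows=[0, 2])
print(W)  # rows 0 and 2 are updated; rows 1 and 3 stay frozen
```

Because only the selected rows ever change, the difference from the pretrained weights is itself a small structured matrix, which is what allows the updates to be decoupled into adapters.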

AAAI Conference 2024 Conference Paper

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

  • Zhaoxi Mu
  • Xinyu Yang
  • Sining Sun
  • Qing Yang

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the reference speech, which are irrelevant to speaker identity, can lead to speaker confusion within the speech extraction network. To overcome this challenge, we propose a self-supervised disentangled representation learning method. Our approach tackles this issue through a two-phase process, utilizing a reference speech encoding network and a global information disentanglement network to gradually disentangle the speaker identity information from other irrelevant factors. We exclusively employ the disentangled speaker identity information to guide the speech extraction network. Moreover, we introduce the adaptive modulation Transformer to ensure that the acoustic representation of the mixed signal remains undisturbed by the speaker embeddings. This component incorporates speaker embeddings as conditional information, facilitating natural and efficient guidance for the speech extraction network. Experimental results substantiate the effectiveness of our meticulously crafted approach, showcasing a substantial reduction in the likelihood of speaker confusion.

IJCAI Conference 2024 Conference Paper

Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction

  • Zhaoxi Mu
  • Xinyu Yang

The integration of visual cues has revitalized the performance of the target speech extraction task, elevating it to the forefront of the field. Nevertheless, this multi-modal learning paradigm often encounters the challenge of modality imbalance. In audio-visual target speech extraction tasks, the audio modality tends to dominate, potentially overshadowing the importance of visual guidance. To tackle this issue, we propose AVSepChain, drawing inspiration from the speech chain concept. Our approach partitions the audio-visual target speech extraction task into two stages: speech perception and speech production. In the speech perception stage, audio serves as the dominant modality, while visual information acts as the conditional modality. Conversely, in the speech production stage, the roles are reversed. This transformation of modality status aims to alleviate the problem of modality imbalance. Additionally, we introduce a contrastive semantic matching loss to ensure that the semantic information conveyed by the generated speech aligns with the semantic information conveyed by lip movements during the speech production stage. Through extensive experiments conducted on multiple benchmark datasets for audio-visual target speech extraction, we showcase the superior performance achieved by our proposed method.

IS Journal 2023 Journal Article

ECCVideo: A Scalable Edge Cloud Collaborative Video Analysis System

  • Qing Han
  • Xuebin Ren
  • Peng Zhao
  • Yimeng Wang
  • Luhui Wang
  • Cong Zhao
  • Xinyu Yang

Video analysis drives a wide range of applications in the fields of public safety, autonomous vehicles, etc., with great potential to impact society. Traditional cloud-based approaches are not applicable because of prohibitive bandwidth consumption and high response latency, while purely edge-based video analysis suffers from large computation delays, given the restricted computing capacity of edge servers. Therefore, in this article, we focus on low-latency edge-cloud collaborative video analytic applications (ECCVApps) by making full use of resources at both the edge and cloud. Particularly, we present an edge-cloud collaborative video analysis system called ECCVideo, to support the unified management of heterogeneous servers and facilitate the development and deployment of large-scale ECCVApps. Under ECCVideo, we design the application architecture of ECCVApps, including the presentation paradigm, transparent communication services, and full lifecycle management. To validate the proposed system, a real-time object detection application is deployed on the ECCVideo prototype.

TMLR Journal 2023 Journal Article

Multi-Domain Long-Tailed Learning by Augmenting Disentangled Representations

  • Xinyu Yang
  • Huaxiu Yao
  • Allan Zhou
  • Chelsea Finn

There is an inescapable long-tailed class-imbalance issue in many real-world classification problems. Current methods for addressing this problem only consider scenarios where all examples come from the same distribution. However, in many cases, there are multiple domains with distinct class imbalance. We study this multi-domain long-tailed learning problem and aim to produce a model that generalizes well across all classes and domains. Towards that goal, we introduce TALLY, a method that addresses this multi-domain long-tailed learning problem. Built upon a proposed selective balanced sampling strategy, TALLY achieves this by mixing the semantic representation of one example with the domain-associated nuisances of another, producing a new representation for use as data augmentation. To improve the disentanglement of semantic representations, TALLY further utilizes a domain-invariant class prototype that averages out domain-specific effects. We evaluate TALLY on several benchmarks and real-world datasets and find that it consistently outperforms other state-of-the-art methods in both subpopulation and domain shift.
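The augmentation idea, mixing one example's semantic content with another's domain-associated nuisances, can be caricatured with a fixed split of the representation vector. TALLY learns this disentanglement rather than assuming it, so the split below is purely illustrative.

```python
# Hypothetical sketch of representation mixing for augmentation. The assumption
# that the first d dimensions are "semantic" is invented for this toy.
import numpy as np

def mix_representations(z_a, z_b, d):
    # Semantic content from example a, domain nuisances from example b.
    return np.concatenate([z_a[:d], z_b[d:]])

z_a = np.array([1.0, 2.0, 3.0, 4.0])  # e.g., a tail-class example
z_b = np.array([9.0, 9.0, 9.0, 9.0])  # e.g., an example from another domain
print(mix_representations(z_a, z_b, d=2))  # [1. 2. 9. 9.]
```

The mixed vector keeps the label-relevant content of the first example while borrowing the other domain's nuisances, which is what makes it usable as extra training data for rare class–domain pairs.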

IROS Conference 2023 Conference Paper

Object-Oriented Option Framework for Robotics Manipulation in Clutter

  • Jing-Cheng Pang
  • Si-Hang Yang
  • Xiong-Hui Chen
  • Xinyu Yang
  • Yang Yu 0001
  • Mas Ma
  • Ziqi Guo
  • Howard Yang

Domestic service robots are becoming increasingly popular due to their ability to help people with household tasks. These robots often encounter the challenge of manipulating objects in cluttered environments (MoC), which is difficult due to the complexity of effective planning and control. Previous solutions involved designing specific action primitives and planning paradigms. However, pre-coded action primitives can limit the agility and task-solving scope of robots. In this paper, we propose a general approach for MoC called the Object-Oriented Option Framework (O3F), which uses the option framework (OF) to learn planning and control. The standard OF discovers options from scratch based on reinforcement learning, which can lead to collapsed options and hurt learning. To address this limitation, O3F introduces the concept of an object-oriented option space for OF, which focuses specifically on object movement and overcomes the challenges associated with collapsed options. Based on this, we train an object-oriented option planner to determine the option to execute and a universal object-oriented option executor to complete the option. Simulation experiments on the Ginger XR1 robot and robot arm show that O3F is generally applicable to various types of robots and manipulation tasks. Furthermore, O3F achieves success rates of 72.4% and 90% in grasping and object collecting tasks, respectively, significantly outperforming baseline methods.

JAIR Journal 2023 Journal Article

Stackelberg Security Games with Contagious Attacks on a Network: Reallocation to the Rescue

  • Rufan Bai
  • Haoxing Lin
  • Xinyu Yang
  • Xiaowei Wu
  • Minming Li
  • Weijia Jia

In the classic network security games, the defender distributes defending resources to the nodes of the network, and the attacker attacks a node, with the objective of maximizing the damage caused. In this paper, we consider the network defending problem against contagious attacks, e.g., where the attack at a node u spreads to the neighbors of u and can cause damage at multiple nodes. Existing works that study shared resources assume that the resource allocated to a node can be shared or duplicated between neighboring nodes. However, in the real world, sharing resources naturally leads to a decrease in the defending power of the source node, especially when defending against contagious attacks. Therefore, we study the model in which resources allocated to a node can only be transferred to its neighboring nodes, which we refer to as a reallocation process. We show that computing an optimal defending strategy is NP-hard even for some very special cases. For positive results, we give a mixed integer linear program formulation for the problem and a bi-criteria approximation algorithm. Our experimental results demonstrate that the allocation and reallocation strategies our algorithm computes perform well in terms of minimizing the damage due to contagious attacks.

IJCAI Conference 2022 Conference Paper

Mixed Strategies for Security Games with General Defending Requirements

  • Rufan Bai
  • Haoxing Lin
  • Xinyu Yang
  • Xiaowei Wu
  • Minming Li
  • Weijia Jia

The Stackelberg security game is played between a defender and an attacker, where the defender needs to allocate a limited amount of resources to multiple targets in order to minimize the loss due to adversarial attack by the attacker. While allowing targets to have different values, classic settings often assume uniform requirements to defend the targets. This enables existing results that study mixed strategies (randomized allocation algorithms) to adopt a compact representation of the mixed strategies. In this work, we initiate the study of mixed strategies for security games in which the targets can have different defending requirements. In contrast to the case of uniform defending requirements, for which an optimal mixed strategy can be computed efficiently, we show that computing the optimal mixed strategy is NP-hard in the general defending requirements setting. However, we show that strong upper and lower bounds on the optimal mixed-strategy defending result can be derived. We propose an efficient close-to-optimal Patching algorithm that computes mixed strategies using only a few pure strategies. We also study the setting when the game is played on a network and resource sharing is enabled between neighboring targets. Our experimental results demonstrate the effectiveness of our algorithm on several large real-world datasets.
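A mixed strategy here is a distribution over pure allocations. As a hedged toy (not the paper's Patching algorithm), one can evaluate such a strategy by computing each target's expected damage when the attacker best-responds; all names and numbers are invented.

```python
# Toy evaluation of a mixed defending strategy with per-target defending
# requirements: a target is defended under a pure allocation only if it
# receives at least its requirement.

def attacker_best_response(mixed, values, req):
    # mixed: list of (probability, allocation) pairs; req[t] is target t's requirement.
    damage = []
    for t, v in enumerate(values):
        p_undefended = sum(p for p, alloc in mixed if alloc[t] < req[t])
        damage.append(p_undefended * v)
    best = max(range(len(values)), key=lambda t: damage[t])
    return best, damage[best]

mixed = [(0.5, [2, 0]), (0.5, [0, 2])]  # two pure allocations, equally likely
values = [10.0, 6.0]                    # target values
req = [1, 2]                            # heterogeneous defending requirements
print(attacker_best_response(mixed, values, req))  # target 0, expected damage 5.0
```

Target 0 is uncovered half the time, so a rational attacker hits it for an expected damage of 5.0; uniform-requirement models would miss this asymmetry.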

JBHI Journal 2021 Journal Article

DBAN: Adversarial Network With Multi-Scale Features for Cardiac MRI Segmentation

  • Xinyu Yang
  • Yuan Zhang
  • Benny Lo
  • Dongrui Wu
  • Hongen Liao
  • Yuan-Ting Zhang

With the development of medical artificial intelligence, automatic magnetic resonance image (MRI) segmentation methods are quite desirable. Inspired by the power of deep neural networks, a novel deep adversarial network, the dilated block adversarial network (DBAN), is proposed to perform left ventricle, right ventricle, and myocardium segmentation in short-axis cardiac MRI. DBAN contains a segmentor along with a discriminator. In the segmentor, the dilated block (DB) is proposed to capture and aggregate multi-scale features. The segmentor produces segmentation probability maps, while the discriminator differentiates the segmentation probability map from the ground truth at the pixel level. In addition, confidence probability maps generated by the discriminator can guide the segmentor to modify segmentation probability maps. Extensive experiments demonstrate that DBAN has achieved state-of-the-art performance on the ACDC dataset. Quantitative analyses indicate that cardiac function indices from DBAN are similar to those from clinical experts. Therefore, DBAN can be a potential candidate for short-axis cardiac MRI segmentation in clinical applications.

AAAI Conference 2021 Conference Paper

Defending against Contagious Attacks on a Network with Resource Reallocation

  • Rufan Bai
  • Haoxing Lin
  • Xinyu Yang
  • Xiaowei Wu
  • Minming Li
  • Weijia Jia

In classic network security games, the defender distributes defending resources to the nodes of the network, and the attacker attacks a node with the objective of maximizing the damage caused. Existing models assume that an attack at node u causes damage only at u. However, in many real-world security scenarios, an attack at a node u spreads to the neighbors of u and can cause damage at multiple nodes, e.g., in the outbreak of a virus. In this paper, we consider the network defending problem against contagious attacks. Existing works that study shared resources assume that the resource allocated to a node can be shared or duplicated between neighboring nodes. However, in the real world, sharing a resource naturally decreases the defending power of the source node, especially when defending against contagious attacks. To this end, we study a model in which resources allocated to a node can only be transferred to its neighboring nodes, which we refer to as a reallocation process. We show that this more general model is difficult in two respects: (1) even for a fixed allocation of resources, computing the optimal reallocation is NP-hard; (2) when reallocation is not allowed, computing the optimal allocation (against contagious attacks) is also NP-hard. On the positive side, we give a mixed integer linear program formulation for the problem and a bi-criteria approximation algorithm. Our experimental results demonstrate that the allocation and reallocation strategies computed by our algorithm perform well in minimizing the damage due to contagious attacks.
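The contagious-attack objective can be sketched on a toy graph (hypothetical node values and integer defenses; a brute-force enumeration for illustration, not the paper's MILP or approximation algorithm): an attack at u hits u's closed neighborhood, each hit node absorbs up to its allocated defense, and the defender minimizes the best attacker response.

```python
from itertools import product

def damage(attack_node, alloc, values, adj):
    """Damage of a contagious attack at `attack_node`: the attack also
    hits its neighbors; each node absorbs up to its allocated defense."""
    hit = {attack_node, *adj[attack_node]}
    return sum(max(0, values[v] - alloc[v]) for v in hit)

def best_attack(alloc, values, adj):
    """The attacker best-responds by picking the most damaging node."""
    return max(damage(u, alloc, values, adj) for u in values)

def brute_force_allocation(values, adj, budget):
    """Enumerate all integer allocations of `budget` units (exponential
    time, consistent with the NP-hardness shown for the general problem)."""
    nodes = list(values)
    best_val, best_alloc = float("inf"), None
    for split in product(range(budget + 1), repeat=len(nodes)):
        if sum(split) != budget:
            continue
        alloc = dict(zip(nodes, split))
        d = best_attack(alloc, values, adj)
        if d < best_val:
            best_val, best_alloc = d, alloc
    return best_val, best_alloc

# Path graph a - b - c: attacking the middle node b hits all three nodes,
# so with 3 defense units the damage can never drop below 2+3+2-3 = 4.
values = {"a": 2, "b": 3, "c": 2}
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
val, alloc = brute_force_allocation(values, adj, budget=3)
```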

IJCAI Conference 2019 Conference Paper

On Privacy Protection of Latent Dirichlet Allocation Model Training

  • Fangyuan Zhao
  • Xuebin Ren
  • Shusen Yang
  • Xinyu Yang

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovering the hidden semantic structure of text datasets, and it plays a fundamental role in many machine learning applications. However, like many other machine learning algorithms, the process of training an LDA model may leak sensitive information from the training datasets and bring significant privacy risks. To mitigate the privacy issues in LDA, we focus on privacy-preserving algorithms for LDA model training in this paper. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. We then propose a locally private LDA training algorithm on crowdsourced data to provide local differential privacy for individual data contributors. Experimental results on real-world datasets demonstrate the effectiveness of the proposed algorithms.
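The local-differential-privacy guarantee mentioned in the abstract can be illustrated with the classic randomized-response mechanism for a single binary attribute (a generic textbook sketch, not the paper's LDA training algorithm): each contributor perturbs their own bit before it leaves their device, and the aggregator debiases the noisy reports.

```python
import math
import random

def randomized_response(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (e^eps + 1),
    otherwise flip it; this satisfies eps-local differential privacy."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if rng.random() < p else 1 - bit

def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true mean from the perturbed reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return (sum(reports) / len(reports) - (1 - p)) / (2 * p - 1)
```

With a small `epsilon` each individual report is nearly uninformative, yet the population mean remains recoverable from enough reports; applying such per-contributor perturbation to word or topic counts is the general flavor of locally private training.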