Author name cluster

Yuxuan Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers

2 author rows

AAAI Conference 2026 Conference Paper

Causality-inspired Federated Learning for Dynamic Spatio-Temporal Graphs

Yuxuan Liu
Wenchao Xu
Haozhao Wang
Zhiming He
Zhaofeng Shi
Chongyang Xu
Peichao Wang
Boyuan Zhang

Federated Graph Learning (FGL) has emerged as a powerful paradigm for decentralized training of graph neural networks while preserving data privacy. However, existing FGL methods are predominantly designed for static graphs and rely on parameter averaging or distribution alignment, which implicitly assume that all features are equally transferable across clients, overlooking both the spatial and temporal heterogeneity and the presence of client-specific knowledge in real-world graphs. In this work, we identify that such assumptions create a vicious cycle of spurious representation entanglement, client-specific interference, and negative transfer, degrading generalization performance in Federated Learning over Dynamic Spatio-Temporal Graphs (FSTG). To address this issue, we propose a novel causality-inspired framework named SC-FSGL, which explicitly decouples transferable causal knowledge from client-specific noise through representation-level interventions. Specifically, we introduce a Conditional Separation Module that simulates soft interventions through client-conditioned masks, enabling the disentanglement of invariant spatio-temporal causal factors from spurious signals and mitigating representation entanglement caused by client heterogeneity. In addition, we propose a Causal Codebook that clusters causal prototypes and aligns local representations via contrastive learning, promoting cross-client consistency and facilitating knowledge sharing across diverse spatio-temporal patterns. Experiments on five diverse, heterogeneous Spatio-Temporal Graph (STG) datasets show that SC-FSGL outperforms state-of-the-art methods.


TMLR Journal 2026 Journal Article

VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction

Yadi Cao
Yuxuan Liu
Liu Yang
Rose Yu
Hayden Schaeffer
Stanley Osher

In-Context Operator Networks (ICONs) have demonstrated the ability to learn operators across diverse partial differential equations using few-shot, in-context learning. However, existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose Vision In-Context Operator Networks (VICON), which integrate vision transformer architectures to efficiently process 2D data through patch-wise operations while preserving ICON's adaptability to multi-physics systems and varying timesteps. Evaluated across three fluid dynamics benchmarks, VICON significantly outperforms state-of-the-art baselines DPOT and MPP, reducing the average last-step rollout error by 37.9% compared to DPOT and 44.7% compared to MPP, while requiring only 72.5% and 34.8% of their respective inference times. VICON naturally supports flexible rollout strategies with varying timestep strides, enabling immediate deployment in imperfect measurement systems where sampling frequencies may differ or frames might be dropped (common challenges in real-world settings), without requiring retraining or interpolation. In these realistic scenarios, VICON exhibits remarkable robustness, experiencing only 24.41% relative performance degradation compared to 71.37%-74.49% degradation in baseline methods, demonstrating its versatility for deployment in realistic applications. Our scripts for processing datasets and code are publicly available at https://github.com/Eydcao/VICON.
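The efficiency argument in this abstract (one token per spatial point versus one token per patch) can be illustrated with a small sketch. This is not VICON's implementation: the function name `split_into_patches`, the 8x8 grid, and the patch size of 4 are all assumptions chosen for illustration.

```python
def split_into_patches(grid, patch):
    """Split a 2D grid (list of rows) into non-overlapping patch x patch blocks.

    Each block is flattened row by row into one list, standing in for a single
    patch token. Hypothetical helper, not taken from the VICON code base.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % patch or cols % patch:
        raise ValueError("grid dimensions must be divisible by the patch size")
    patches = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            block = [grid[r + i][c + j] for i in range(patch) for j in range(patch)]
            patches.append(block)
    return patches


# Assumed toy field: an 8x8 grid whose entries are their own flat index.
grid = [[r * 8 + c for c in range(8)] for r in range(8)]

point_tokens = len(grid) * len(grid[0])   # one token per spatial point
patch_tokens = split_into_patches(grid, 4)  # one token per 4x4 patch

print(point_tokens, len(patch_tokens))
```

With these assumed sizes the sequence length drops by a factor of the patch area, which is the kind of saving that matters for attention layers whose cost grows with the number of tokens.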


AAAI Conference 2025 Conference Paper

BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking

Yuxuan Liu
Hongda Sun
Wenya Guo
Xinyan Xiao
Cunli Mao
Zhengtao Yu
Rui Yan

Complex claim fact-checking plays a crucial role in disinformation detection. However, existing fact-checking methods struggle with claim vagueness, specifically in effectively handling latent information and complex relations within claims. Moreover, evidence redundancy, where non-essential information complicates the verification process, remains a significant issue. To tackle these limitations, we propose Bilateral Defusing Verification (BiDeV), a novel fact-checking workflow framework integrating multiple role-played LLMs to mimic the human-expert fact-checking process. BiDeV consists of two main modules: Vagueness Defusing identifies latent information and resolves complex relations to simplify the claim, and Redundancy Defusing eliminates redundant content to enhance the evidence quality. Extensive experimental results on two widely used challenging fact-checking benchmarks (Hover and Feverous-s) demonstrate that BiDeV achieves the best performance under both gold and open settings. This highlights the effectiveness of BiDeV in handling complex claims and ensuring precise fact-checking.


EAAI Journal 2025 Journal Article

Hypergraph-driven soft semantics flexible learning for visible–infrared person re-identification

Jiacheng Zhu
Hongwei Ge
Yuxuan Liu
Chunguo Wu
Jiulin Fan

Visible–infrared person re-identification (VI-ReID) aims to match the images of the person across different modalities. The main challenge in the engineering application of VI-ReID lies in the considerable modality gap between visible and infrared images. Existing methods primarily focus on learning low-order hard semantics to reduce the modality gap, such as specific body parts (e.g., arms, legs, and hands), which are modality-specific and sensitive to cross-modality variations, leading to difficult modality alignment. However, high-order soft semantics contain more modality-invariant features that can be flexibly learned by the model to better align the cross-modality features and reduce the modality gap. To better learn soft semantics, we propose a novel, hypergraph-driven soft semantics flexible learning network (HSFLNet), which extracts hierarchical semantics and flexibly explores the relationships among the extracted soft semantics by using hypergraph neural networks (HGNNs). Specifically, first, we propose a soft semantics mining (SSM) module to capture and flexibly fuse hierarchical features, which integrates low- and high-level semantics across channel and spatial dimensions to align modalities. Second, a hypergraph-driven soft semantics flexible learning (HSFL) module is designed, which employs HGNNs to explore the relationships among soft semantics. These semantics are adaptively learned through gating mechanisms to capture modality-invariant features, thereby reducing the modality gap. We evaluate the performance of HSFLNet on SYSU-MM01, RegDB, and low-light cross-modality (LLCM) datasets, achieving a Rank-1/mean average precision of 75.09%/72.11%, 95.82%/92.42%, and 65.42%/68.55%, respectively. Experimental results demonstrate that HSFLNet outperforms state-of-the-art VI-ReID methods. Our code is available at https://github.com/Jiacheng813/HSFLNet.


ICRA Conference 2025 Conference Paper

Self-Deformable Magnetic Miniature Robot for Traction Assistance in Endoscopic Submucosal Dissection

Bolan Zhang
Toshiro Yamanaka
Tengo Shu
Yuxuan Liu
Fumihito Arai

Between 1999 and 2020, gastrointestinal cancers were responsible for over three million deaths, emphasizing the critical role of minimally invasive surgical techniques like Endoscopic Submucosal Dissection (ESD) in managing such life-threatening conditions. ESD, which dissects the connective tissue between the mucosal and muscular layers using an electrosurgical knife connected to an endoscope, requires a constant traction force to stabilize tissues and expose underlying anatomical structures. This paper introduces a miniature magnetic flexible robot, actuated by a permanent magnet on a robotic manipulator, designed to enhance ESD by providing traction forces consistently on lesions. The robot was fabricated by casting magnetic silicone composites, and its safe deployment through the endoscope instrument channel was successfully demonstrated, avoiding tissue contact. Experiments in a rubber intestine model validated the feasibility of providing constant traction and 2 DOF orientation control via the robot, allowing real-time fine-tuning of the force direction. This reduces the difficulty and improves the precision and safety of ESD. This research presents a practical method for achieving stable force output in medical miniature robots, particularly in gastrointestinal procedures.


AAAI Conference 2024 Conference Paper

Advancing Video Synchronization with Fractional Frame Analysis: Introducing a Novel Dataset and Model

Yuxuan Liu
Haizhou Ai
Junliang Xing
Xuri Li
Xiaoyi Wang
Pin Tao

Multiple views play a vital role in 3D pose estimation tasks. Ideally, multi-view 3D pose estimation tasks should directly utilize naturally collected videos for pose estimation. However, due to the constraints of video synchronization, existing methods often use expensive hardware devices to synchronize the initiation of cameras, which restricts most 3D pose collection scenarios to indoor settings. Some recent works learn deep neural networks to align desynchronized datasets derived from synchronized cameras and can only produce frame-level accuracy. For fractional frame video synchronization, this work proposes an Inter-Frame and Intra-Frame Desynchronized Dataset (IFID), which labels fractional time intervals between two video clips. IFID is the first dataset that annotates inter-frame and intra-frame intervals, with a total of 382,500 video clips annotated, making it the largest dataset to date. We also develop a novel model based on the Transformer architecture, named InSynFormer, for synchronizing inter-frame and intra-frame. Extensive experimental evaluations demonstrate its promising performance. The dataset and source code of the model are available at https://github.com/yuxuan-cser/InSynFormer.


AAAI Conference 2024 Conference Paper

Modeling Adaptive Inter-Task Feature Interactions via Sentiment-Aware Contrastive Learning for Joint Aspect-Sentiment Prediction

Wei Chen
Yuxuan Liu
Zhao Zhang
Fuzhen Zhuang
Jiang Zhong

Aspect prediction (AP) and sentiment prediction (SP) are representative applications in fine-grained sentiment analysis. They can be considered as sequential tasks, where AP identifies mentioned aspects in a sentence, and SP infers fine-grained sentiments for these aspects. Recent models perform the aspect-sentiment prediction in a joint manner, but heavily rely on the feature interactions of aspect and sentiment. One drawback is that they ignore that the correlation strength between aspect features and sentiment features varies across different sentences, and employing a fixed feature interaction strategy may limit effective knowledge transfer across tasks. To tackle this issue, in this paper, we propose an Adaptive Inter-task Feature Interaction framework, AIFI, for joint aspect-sentiment prediction. Specifically, we introduce a novel contrast-based alignment method based on contrastive learning. Our approach considers the AP-specific and SP-specific representations of a given sentence as a positive pair, while the representation of another random sentence serves as a negative example. Moreover, we propose an inter-task feature correlation network to predict the contrast strength, which is determined by the temperature coefficient in the InfoNCE loss. This dynamic correlation adjustment enhances the model's ability to capture proper feature interactions more efficiently. Experimental results on three datasets validate the effectiveness of our approach.
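The role of the temperature coefficient mentioned in this abstract can be sketched with a generic InfoNCE loss for one positive pair and a set of negatives. This is a minimal illustration, not the AIFI implementation: cosine similarity, the function names, and the toy vectors are assumptions, and the temperature is passed in as a plain number rather than predicted by a correlation network.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature):
    """Generic InfoNCE loss: -log softmax of the positive logit.

    Similarities are divided by the temperature before the softmax, so a
    smaller temperature sharpens the contrast between positive and negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Assumed toy vectors: stand-ins for the AP-specific and SP-specific
# representations of one sentence, plus one unrelated sentence.
ap_repr = [1.0, 0.0]
sp_repr = [1.0, 0.0]
other = [0.0, 1.0]

loss_sharp = info_nce(ap_repr, sp_repr, [other], temperature=0.1)
loss_soft = info_nce(ap_repr, sp_repr, [other], temperature=1.0)
print(loss_sharp, loss_soft)
```

In a setup like the one the abstract describes, the `temperature` argument would come from a learned per-sentence predictor instead of being fixed by hand.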


NeurIPS Conference 2024 Conference Paper

ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

Jingyuan Zhu
Shiyu Li
Yuxuan Liu
Jian Yuan
Ping Huang
Jiulong Shan
Huimin Ma

Modern diffusion-based image generative models have made significant progress and become promising to enrich training data for the object detection task. However, the generation quality and the controllability for complex scenes containing multi-class objects and dense objects with occlusions remain limited. This paper presents ODGEN, a novel method to generate high-quality images conditioned on bounding boxes, thereby facilitating data synthesis for object detection. Given a domain-specific object detection dataset, we first fine-tune a pre-trained diffusion model on both cropped foreground objects and entire images to fit target distributions. Then we propose to control the diffusion model using synthesized visual prompts with spatial constraints and object-wise textual descriptions. ODGEN exhibits robustness in handling complex scenes and specific domains. Further, we design a dataset synthesis pipeline to evaluate ODGEN on 7 domain-specific benchmarks to demonstrate its effectiveness. Adding training data generated by ODGEN improves up to 25.3% mAP@.50:.95 with object detectors like YOLOv5 and YOLOv7, outperforming prior controllable generative methods. In addition, we design an evaluation protocol based on COCO-2014 to validate ODGEN in general domains and observe an advantage up to 5.6% in mAP@.50:.95 against existing methods.


AAAI Conference 2024 Conference Paper

Text Diffusion with Reinforced Conditioning

Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun

Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio. Due to their adaptiveness in iterative refinement, they provide a strong potential for achieving better non-autoregressive sequence generation. However, existing text diffusion models still fall short in their performance due to a challenge in handling the discreteness of language. This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling. Motivated by our findings, we propose a novel Text Diffusion model called TReC, which mitigates the degradation with Reinforced Conditioning and the misalignment by Time-Aware Variance Scaling. Our extensive experiments demonstrate the competitiveness of TReC against autoregressive, non-autoregressive, and diffusion baselines. Moreover, qualitative analysis shows its advanced ability to fully utilize the diffusion process in refining samples.


ICLR Conference 2023 Conference Paper

Fairness-aware Contrastive Learning with Partially Annotated Sensitive Attributes

Fengda Zhang
Kun Kuang 0001
Long Chen 0016
Yuxuan Liu
Chao Wu 0001
Jun Xiao 0001

Learning high-quality representation is important and essential for visual recognition. Unfortunately, traditional representation learning suffers from fairness issues since the model may learn information of sensitive attributes. Recently, a series of studies have been proposed to improve fairness by explicitly decorrelating target labels and sensitive attributes. Most of these methods, however, rely on the assumption that fully annotated labels on target variable and sensitive attributes are available, which is unrealistic due to the expensive annotation cost. In this paper, we investigate a novel and practical problem of Fair Unsupervised Representation Learning with Partially annotated Sensitive labels (FURL-PS). FURL-PS has two key challenges: 1) how to make full use of the samples that are not annotated with sensitive attributes; 2) how to eliminate bias in the dataset without target labels. To address these challenges, we propose a general Fairness-aware Contrastive Learning (FairCL) framework consisting of two stages. Firstly, we generate contrastive sample pairs, which share the same visual information apart from sensitive attributes, for each instance in the original dataset. In this way, we construct a balanced and unbiased dataset. Then, we execute fair contrastive learning by closing the distance between representations of contrastive sample pairs. Besides, we also propose an unsupervised way to balance the utility and fairness of learned representations by feature reweighting. Extensive experimental results illustrate the effectiveness of our method in terms of fairness and utility, even with very limited sensitive attributes and serious data bias.


AAAI Conference 2023 Conference Paper

Learning Instrumental Variable from Data Fusion for Treatment Effect Estimation

Anpeng Wu
Kun Kuang
Ruoxuan Xiong
Minqin Zhu
Yuxuan Liu
Bo Li
Furui Liu
Zhihua Wang

The advent of the big data era brought new opportunities and challenges to draw treatment effect in data fusion, that is, a mixed dataset collected from multiple sources (each source with an independent treatment assignment mechanism). Due to possibly omitted source labels and unmeasured confounders, traditional methods cannot estimate individual treatment assignment probability and infer treatment effect effectively. Therefore, we propose to reconstruct the source label and model it as a Group Instrumental Variable (GIV) to implement IV-based Regression for treatment effect estimation. In this paper, we conceptualize this line of thought and develop a unified framework (Meta-EM) to (1) map the raw data into a representation space to construct Linear Mixed Models for the assigned treatment variable; (2) estimate the distribution differences and model the GIV for the different treatment assignment mechanisms; and (3) adopt an alternating training strategy to iteratively optimize the representations and the joint distribution to model GIV for IV regression. Empirical results demonstrate the advantages of our Meta-EM compared with state-of-the-art methods. The project page with the code and the Supplementary materials is available at https://github.com/causal-machine-learning-lab/meta-em.


NeurIPS Conference 2018 Conference Paper

Meta-Reinforcement Learning of Structured Exploration Strategies

Abhishek Gupta
Russell Mendonca
Yuxuan Liu
Pieter Abbeel
Sergey Levine

Exploration is a fundamental challenge in reinforcement learning (RL). Many current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we study how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm – model agnostic exploration with structured noise (MAESN) – to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.
