EAAI Journal 2026 Journal Article
A solid-spherical neural operator for residual stress inversion of components with varying geometries
- Zhiwei Zhao
- Changqing Liu
- Yi Yang
- Yingguang Li
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Existing theoretical work on Bayes-optimal fair classifiers usually considers a single (binary) sensitive feature. In practice, individuals are often defined by multiple sensitive features. In this paper, we characterize the Bayes-optimal fair classifier for multiple sensitive features under general approximate fairness measures, including *mean difference* (MD) and *mean ratio* (MR). We show that these approximate measures for existing group fairness notions, including Demographic Parity, Equal Opportunity, Predictive Equality, and Accuracy Parity, are linear transformations of selection rates for specific groups defined by both labels and sensitive features. We then show that Bayes-optimal fair classifiers for multiple sensitive features under both MD and MR reduce to instance-dependent thresholding rules that rely on a weighted sum of these group membership probabilities. Our framework applies to both attribute-aware and attribute-blind settings and can accommodate composite fairness notions like Equalized Odds. Building on this, we propose two practical algorithms for Bayes-optimal fair classification via in-processing and post-processing. We show empirically that our methods compare favorably to existing methods.
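The instance-dependent thresholding form admits a compact schematic rendering. The display below is an illustrative LaTeX sketch under assumed notation (η for the class posterior, w_a for per-group weights), not the paper's exact statement:

```latex
% Schematic thresholding rule for Bayes-optimal fair classification
% (illustrative notation; the exact weights and offset come from the
% MD/MR constraints in the paper, and are not reproduced here).
\[
  \hat{Y}(x) \;=\; \mathbb{1}\!\left[\,
    \eta(x) \;\ge\; \tfrac{1}{2} \;+\; \sum_{a \in \mathcal{A}} w_a \,
    \Pr(A = a \mid X = x) \,\right],
  \qquad \eta(x) = \Pr(Y = 1 \mid X = x),
\]
% where \mathcal{A} indexes groups defined by labels and sensitive
% features, and each w_a is calibrated so the chosen fairness measure
% (MD or MR) is satisfied.
```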
AAAI Conference 2026 Conference Paper
Retrieving molecular structures from tandem mass spectra is a crucial step in rapid compound identification. Existing retrieval methods, such as traditional mass spectral library matching, suffer from limited spectral library coverage, while recent cross-modal representation learning frameworks often encounter modality misalignment, resulting in suboptimal retrieval accuracy and generalization. To address these limitations, we propose GLMR, a Generative Language Model-based Retrieval framework that mitigates the cross-modal misalignment through a two-stage process. In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum. In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures, which are then used to re-rank the candidates based on molecular similarity. Experiments on both MassSpecGym and the proposed MassRET-20k dataset demonstrate that GLMR significantly outperforms existing methods, achieving over 40% improvement in top-1 accuracy and exhibiting strong generalizability.
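The re-ranking step is straightforward to prototype. Below is a minimal sketch, assuming candidate and generated structures arrive as SMILES strings; the function names and the generative model are hypothetical stand-ins, and only the RDKit calls are real library API:

```python
# Hedged sketch of re-ranking retrieval candidates by molecular similarity
# to structures produced by a generative model. Helper names are
# hypothetical; the RDKit fingerprint/Tanimoto calls are real.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_fp(smiles: str):
    """Radius-2 Morgan fingerprint (2048 bits) for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048) if mol else None

def rerank_candidates(candidates: list[str], generated: list[str]) -> list[str]:
    """Re-rank candidates by their best Tanimoto similarity to any
    structure produced by the generative stage."""
    gen_fps = [fp for fp in (morgan_fp(s) for s in generated) if fp is not None]
    def score(smiles: str) -> float:
        fp = morgan_fp(smiles)
        if fp is None or not gen_fps:
            return 0.0
        return max(DataStructs.TanimotoSimilarity(fp, g) for g in gen_fps)
    return sorted(candidates, key=score, reverse=True)

# Toy usage with placeholder SMILES:
print(rerank_candidates(["CCO", "c1ccccc1"], ["CCO"]))
```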
AAAI Conference 2026 Conference Paper
Dual-lens video inpainting aims to simultaneously restore missing or corrupted contents in videos captured by each lens of binocular systems. Although preliminary explorations have been conducted, existing methods still face two key challenges: limited exploitation of long-range reference information and inadequate modeling of inter-lens consistency in non-standard binocular systems. In this paper, we propose a novel dual-lens video inpainting framework named DLVINet, which addresses these challenges with two core components. Firstly, we develop a sparse spatial-temporal transformer (SSTT) that effectively utilizes the information from distant frames to complete the video contents of each lens individually. By employing sparse spatial-temporal attention with a channel selection mechanism, SSTT not only restores missing regions, but also avoids introducing redundant or irrelevant information. Furthermore, SSTT introduces a multi-scale feed-forward network to enrich the multi-scale representation of completed features. Secondly, we design a cross-lens texture transformer (CLTT) to model inter-lens consistency. By interacting with corresponding features between lenses under the guidance of cross-attention, CLTT captures global inter-lens correspondences. Such a design enables effective cross-view information modeling without being constrained by horizontal parallax, which is particularly critical for non-standard binocular systems. Extensive experiments demonstrate the effectiveness of our DLVINet.
JBHI Journal 2026 Journal Article
Steady-state visual evoked potential-based brain-computer interfaces (SSVEP-BCIs) hold significant promise for enabling high-speed human-computer interaction in real-world scenarios. However, existing frequency-domain decoding methods treat frequency spectrum features (the real and imaginary spectrum features) as a single feature without considering their unique spatial and spectral characteristics, resulting in insufficiently generalizable features and limited classification accuracy in cross-subject scenarios. To address this issue, we propose a Dual-Branch Attention-Based Frequency Domain Network (DB-AFDNet) to independently decode real and imaginary spectral components, aiming to acquire more discriminative and generalizable features for cross-subject applications. Specifically, we construct inter-branch attention similarity constraints to encourage the two branches to have similar attention properties, promoting learning of consensus characteristics across the two branches. Furthermore, we propose intra-branch orthogonality constraints to extract branch-specific discriminative features that generalize well. Experimental studies on two public datasets, the Benchmark and Beta datasets, demonstrate that DB-AFDNet outperforms state-of-the-art methods in cross-subject classification, achieving relative improvements of 1.36% and 1.45%, respectively.
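The two constraints can be sketched directly as auxiliary losses. Below is a hedged PyTorch illustration with assumed tensor shapes and loss forms, not DB-AFDNet's exact definitions:

```python
# Hedged sketch: an inter-branch similarity term pulls the real- and
# imaginary-branch attention maps together, and an intra-branch
# orthogonality term decorrelates each branch's feature channels.
# Shapes and the loss weighting are illustrative assumptions.
import torch
import torch.nn.functional as F

def attention_similarity_loss(attn_real: torch.Tensor,
                              attn_imag: torch.Tensor) -> torch.Tensor:
    """Encourage the two branches to attend to similar locations."""
    return F.mse_loss(attn_real, attn_imag)

def orthogonality_loss(feats: torch.Tensor) -> torch.Tensor:
    """Penalize correlation between feature channels within one branch.
    feats: (batch, channels, dim)."""
    f = F.normalize(feats, dim=-1)
    gram = f @ f.transpose(1, 2)                      # (batch, C, C)
    eye = torch.eye(gram.size(-1), device=gram.device)
    return ((gram - eye) ** 2).mean()

# Example with random tensors standing in for branch outputs:
attn_r, attn_i = torch.rand(8, 4, 64, 64), torch.rand(8, 4, 64, 64)
feat_r = torch.randn(8, 16, 128)
loss = attention_similarity_loss(attn_r, attn_i) + 0.1 * orthogonality_loss(feat_r)
```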
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
LiDAR point cloud is essential for autonomous vehicles, but motion distortions from dynamic objects degrade the data quality. While previous work has considered distortions caused by ego motion, distortions caused by other moving objects remain largely overlooked, leading to errors in object shape and position. This distortion is particularly pronounced in high-speed environments such as highways and in multi-LiDAR configurations, a common setup for heavy vehicles. To address this challenge, we introduce HiMo, a pipeline that repurposes scene flow estimation for non-ego motion compensation, correcting the representation of dynamic objects in point clouds. We further propose SeFlow++, a real-time scene flow estimator that achieves state-of-the-art performance on both scene flow and motion compensation. We validate HiMo through extensive experiments on Argoverse 2, ZOD and a newly collected real-world dataset featuring highway driving and multi-LiDAR-equipped heavy vehicles.
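The core compensation step admits a compact sketch. Below is a minimal numpy illustration, assuming per-point capture timestamps and a per-sweep scene-flow vector; the field layout and timing convention are assumptions, not HiMo's actual interface:

```python
# Hedged sketch of non-ego motion compensation: given a per-point scene-flow
# estimate over the sweep duration, each point is advanced to the common
# reference timestamp in proportion to the time remaining in the sweep.
import numpy as np

def compensate(points: np.ndarray, flow: np.ndarray,
               timestamps: np.ndarray, t_ref: float,
               sweep_duration: float = 0.1) -> np.ndarray:
    """points: (N, 3) xyz; flow: (N, 3) displacement over one sweep;
    timestamps: (N,) capture time of each point; t_ref: target time."""
    frac = (t_ref - timestamps) / sweep_duration   # per-point time fraction
    return points + flow * frac[:, None]

# Toy usage: points captured early in the sweep move further along the flow.
pts = np.zeros((4, 3))
flw = np.tile([1.0, 0.0, 0.0], (4, 1))
ts = np.array([0.0, 0.025, 0.05, 0.075])
print(compensate(pts, flw, ts, t_ref=0.1))
```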
AAAI Conference 2026 Conference Paper
This work presents Insert Anything, a unified framework for reference-based image insertion that seamlessly integrates objects from reference images into target scenes under flexible, user-specified control guidance. Instead of training separate models for individual tasks, our approach is trained once on our new AnyInsertion dataset, the first open-source large-scale dataset specifically designed for reference image–based image editing (136K prompt-image pairs covering diverse tasks such as person, object, and garment insertion), and effortlessly generalizes to a wide range of insertion scenarios. Such a challenging setting requires capturing both identity features and fine-grained details, while allowing versatile local adaptations in style, color, and texture. To this end, we propose to leverage the multimodal attention of the Diffusion Transformer (DiT) to support both mask- and text-guided editing. Furthermore, we introduce an in-context editing mechanism that treats the reference image as contextual information, employing two prompting strategies to harmonize the inserted elements with the target scene while faithfully preserving their distinctive features. Extensive experiments on AnyInsertion, DreamBooth, and VTON-HD benchmarks demonstrate that our method consistently outperforms existing alternatives, underscoring its great potential in real-world applications such as creative content generation, virtual try-on, and scene composition.
AAAI Conference 2026 Conference Paper
Text-to-music generation technology is progressing rapidly, creating new opportunities for musical composition and editing. However, existing music editing methods often fail to preserve the source music's temporal structure, including melody and rhythm, when altering particular attributes like instrument, genre, and mood. To address this challenge, this paper conducts an in-depth probing analysis on attention maps within AudioLDM 2, a diffusion-based model commonly used as the backbone for existing music editing methods. We reveal a key finding: cross-attention maps encompass details regarding distinct musical characteristics, and interventions on these maps frequently result in ineffective modifications. In contrast, self-attention maps are essential for preserving the temporal structure of the source music during its conversion into the target music. Building upon this understanding, we present Melodia, a training-free technique that selectively manipulates self-attention maps in particular layers during the denoising process and leverages an attention repository to store source music information, achieving accurate modification of musical characteristics while preserving the original structure without requiring textual descriptions of the source music. Additionally, we propose two novel metrics to better evaluate music editing methods. Both objective and subjective experiments demonstrate that our approach achieves superior results in terms of textual adherence and structural integrity across various datasets. This research enhances comprehension of internal mechanisms within music generation models and provides improved control for music creation.
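The attention-repository mechanism can be sketched compactly. The snippet below is a hedged PyTorch-style illustration, assuming self-attention maps can be captured per (layer, timestep) during the source pass and re-injected during target denoising; the hook names are hypothetical, not AudioLDM 2 internals:

```python
# Hedged sketch of an "attention repository": cache self-attention maps from
# a source-music denoising pass, then blend them back in at matching
# (layer, timestep) keys when denoising the edited target.
import torch

class AttentionRepository:
    def __init__(self):
        self.store: dict[tuple[str, int], torch.Tensor] = {}

    def save(self, layer: str, timestep: int, attn: torch.Tensor) -> None:
        """Record a self-attention map from the source denoising pass."""
        self.store[(layer, timestep)] = attn.detach()

    def inject(self, layer: str, timestep: int,
               attn: torch.Tensor, weight: float = 1.0) -> torch.Tensor:
        """Blend the stored source map into the target pass; fall back to
        the target's own map when no source entry exists."""
        src = self.store.get((layer, timestep))
        if src is None:
            return attn
        return weight * src + (1.0 - weight) * attn

# Toy usage with a hypothetical layer name:
repo = AttentionRepository()
repo.save("mid_block.attn1", 500, torch.rand(1, 8, 256, 256))
edited = repo.inject("mid_block.attn1", 500, torch.rand(1, 8, 256, 256))
```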
AAAI Conference 2026 Conference Paper
We explore the oscillatory behavior observed in inversion methods applied to large-scale flow models, including text-to-image and text-to-video. By employing an augmented fixed-point-inspired iterative approach to invert real-world images, we observe that the solution does not achieve convergence, instead oscillating between distinct clusters. Through experiments on synthetic data, text-to-image, and text-to-video models, we demonstrate that these oscillating clusters exhibit notable semantic coherence. We offer theoretical insights, showing that this behavior arises from oscillatory dynamics in flow models. Building on this understanding, we introduce a simple and fast distribution transfer technique that facilitates training-free image and video editing/enhancement. Furthermore, we provide quantitative results demonstrating the effectiveness of our method on tasks such as image enhancement, editing, and reconstruction. Notably, our approach enables the transformation of image-only enhancers and editors into lightweight, video-capable tools without additional training, highlighting its practical versatility and impact.
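To make the analyzed iteration concrete, here is a minimal sketch of fixed-point inversion for one Euler step of a flow model, with a toy linear velocity field chosen so the iterates visibly cycle instead of converging. This illustrates the phenomenon only; it is not the paper's augmented method:

```python
# Illustrative fixed-point inversion of one Euler step x_next = x + dt*v(x,t):
# iterate z <- x_next - dt * v(z, t) and record the trajectory so the
# oscillating clusters can be inspected. `v` stands in for a trained model.
import torch

def invert_euler_step(x_next: torch.Tensor, v, t: float, dt: float,
                      n_iters: int = 12) -> list[torch.Tensor]:
    z = x_next.clone()
    history = []
    for _ in range(n_iters):
        z = x_next - dt * v(z, t)
        history.append(z.clone())
    return history

# Toy velocity field: a rotation-like linear map; with dt = 0.5 the
# iteration cycles with period 4 around the true fixed point rather
# than converging to it.
A = torch.tensor([[0.0, -2.0], [2.0, 0.0]])
v = lambda x, t: x @ A.T
traj = invert_euler_step(torch.tensor([[1.0, 0.0]]), v, t=0.5, dt=0.5)
print(torch.stack(traj).squeeze())
```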
AIIM Journal 2026 Journal Article
JBHI Journal 2026 Journal Article
Online adaptation is a promising technique for achieving calibration-free recognition in user-friendly brain-computer interfaces (BCIs) but remains underexplored for steady-state visual evoked potential (SSVEP) recognition. In our previous work on online multi-stimulus canonical correlation analysis (OMSCCA), we introduced a state-of-the-art scheme for the online adaptation of SSVEP spatial filters. Despite its effectiveness, this approach cannot be directly extended to other advanced spatial filtering methods, which seriously limits the broader development of calibration-free algorithms. To address this limitation, we propose a unified online adaptation framework for correlation analysis (CA)-based spatial filtering methods, encompassing both spatial filter computation and utilization. Specifically, we extend the least-squares (LS) unified framework, originally designed for full calibration with large amounts of training data, to the online adaptation scenario without any pre-calibration, thereby enabling continuous updates of spatial filters. Moreover, to make full use of the spatial filters, we introduce a cross-stimulus transfer method for online adaptation of the common impulse response and generation of user-specific templates for all stimuli using limited online unlabeled data. Finally, leveraging the proposed unified framework, we adapt three advanced spatial filtering methods from their calibration-based counterparts to online adaptation paradigms and validate their performance through simulation studies. Our results demonstrate the framework's effectiveness in promoting the development of zero-calibration SSVEP-based BCIs. Compared to OMSCCA, the proposed online adaptation methods can improve recognition performance by more than 12%. This work provides a generalizable approach for transforming existing calibration-based methods into adaptive, user-friendly solutions for practical BCI applications.
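As a purely generic illustration of the kind of sample-by-sample least-squares refresh the abstract describes, the sketch below implements textbook recursive least squares (RLS); the paper's actual unified CA-based update differs:

```python
# Textbook recursive-least-squares (RLS) update, shown only as a hedged
# illustration of refreshing an LS-based spatial filter online without
# pre-calibration. Not the paper's exact update rule.
import numpy as np

class RLSFilter:
    def __init__(self, n_channels: int, forgetting: float = 0.99):
        self.w = np.zeros(n_channels)           # spatial filter weights
        self.P = np.eye(n_channels) * 1e3       # inverse correlation matrix
        self.lam = forgetting

    def update(self, x: np.ndarray, d: float) -> float:
        """x: one multichannel sample; d: desired (reference) signal value."""
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)            # gain vector
        err = d - self.w @ x
        self.w += k * err
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err

# Toy usage with a synthetic reference signal:
flt = RLSFilter(n_channels=8)
for _ in range(100):
    x = np.random.randn(8)
    flt.update(x, d=x.sum())
```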
NeurIPS Conference 2025 Conference Paper
Inverse design aims to design the input variables of a physical system to optimize a specified objective function, typically formulated as a search or optimization problem. However, in 3D domains, the design space grows exponentially, rendering exhaustive grid-based searches infeasible. Recent advances in deep learning have accelerated inverse design by providing powerful generative priors and differentiable surrogate models. Nevertheless, current methods tend to approximate the 3D design space using 2D projections or fine-tune existing 3D shapes. These approaches sacrifice volumetric detail and constrain design exploration, preventing true 3D design from scratch. In this paper, we propose a 3D Inverse Design (3DID) framework that directly navigates the 3D design space by coupling a continuous latent representation with a physics-aware optimization strategy. We first learn a unified physics–geometry embedding that compactly captures shape and physical field data in a continuous latent space. Then, we introduce a two-stage strategy to perform physics-aware optimization. In the first stage, a gradient-guided diffusion sampler explores the global latent manifold. In the second stage, an objective-driven, topology-preserving refinement further sculpts each candidate toward the target objective. This enables 3DID to generate high-fidelity 3D geometries, outperforming existing methods in both solution quality and design versatility.
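The first-stage sampler can be sketched as gradient guidance in latent space. Below is a hedged illustration; `denoise_step` and `surrogate` are hypothetical stand-ins for the paper's diffusion sampler and physics-aware objective:

```python
# Hedged sketch of gradient-guided latent exploration: at each denoising
# step, nudge the latent along the gradient of a differentiable surrogate
# objective evaluated on the current latent.
import torch

def guided_sample(z: torch.Tensor, denoise_step, surrogate,
                  n_steps: int = 50, guidance_scale: float = 0.1) -> torch.Tensor:
    for t in reversed(range(n_steps)):
        z = denoise_step(z, t)                 # ordinary diffusion update
        with torch.enable_grad():
            z_req = z.detach().requires_grad_(True)
            objective = surrogate(z_req)       # e.g. predicted drag or mass
            grad = torch.autograd.grad(objective, z_req)[0]
        z = z - guidance_scale * grad          # steer toward the target
    return z

# Toy usage: a quadratic surrogate pulls the latent toward the origin.
z0 = torch.randn(1, 16)
out = guided_sample(z0, lambda z, t: 0.98 * z, lambda z: (z ** 2).sum())
```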
EAAI Journal 2025 Journal Article
EAAI Journal 2025 Journal Article
EAAI Journal 2025 Journal Article
YNICL Journal 2025 Journal Article
AAAI Conference 2025 Conference Paper
Human motion generative models have enabled promising applications, but the ability of text-to-motion (T2M) models to produce realistic motions raises security concerns if exploited maliciously. Despite growing interest in T2M, limited research has focused on safeguarding these models against adversarial attacks, and existing work on text-to-image models proves insufficient for the unique motion domain. In this paper, we propose ALERT-Motion, an autonomous framework that leverages large language models (LLMs) to generate targeted adversarial attacks against black-box T2M models. Unlike prior methods that modify prompts through predefined rules, ALERT-Motion uses LLMs' knowledge of human motion to autonomously generate subtle yet powerful adversarial text descriptions. It comprises two key modules: an adaptive dispatching module that constructs an LLM-based agent to iteratively refine and search for adversarial prompts; and a multimodal information contrastive module that extracts semantically relevant motion information to guide the agent's search. Through this LLM-driven approach, ALERT-Motion produces adversarial prompts that query victim models into producing outputs closely matching targeted motions, while avoiding obvious perturbations. Evaluations across popular T2M models demonstrate ALERT-Motion's superiority over previous methods, achieving higher attack success rates with stealthier adversarial prompts. This pioneering work on T2M adversarial attacks highlights the urgency of developing defensive measures as motion generation technology advances, urging further research into safe and responsible deployment.
AAAI Conference 2025 Conference Paper
Reconstructing perceived images from human brain activity forms a crucial link between human and machine learning through Brain-Computer Interfaces. Early methods primarily focused on training separate models for each individual to account for individual variability in brain activity, overlooking valuable cross-subject commonalities. Recent advancements have explored multisubject methods, but these approaches face significant challenges, particularly in data privacy and effectively managing individual variability. To overcome these challenges, we introduce BrainGuard, a privacy-preserving collaborative training framework designed to enhance image reconstruction from multisubject fMRI data while safeguarding individual privacy. BrainGuard employs a collaborative global-local architecture where personalized models are trained on each subject's data and operate in conjunction with a shared commonality model that captures and leverages cross-subject patterns. This architecture eliminates the need to aggregate fMRI data across subjects, thereby ensuring privacy preservation. To tackle the complexity of fMRI data, BrainGuard integrates a hybrid synchronization strategy, enabling individual models to dynamically incorporate parameters from the global model. By establishing a secure and collaborative training environment, BrainGuard not only protects sensitive brain activity data but also improves the accuracy of image reconstructions. Extensive experiments demonstrate that BrainGuard sets a new benchmark in both high-level and low-level metrics, advancing the state-of-the-art in brain decoding through its innovative design.
NeurIPS Conference 2025 Conference Paper
The limited availability of high-quality training data poses a major obstacle in data-driven PDE solving, where expensive data collection and resolution constraints severely impact the ability of neural operator networks to learn and generalize the underlying physical system. To address this challenge, we propose DeltaPhi, a novel learning framework that transforms the PDE solving task from learning direct input-output mappings to learning the residuals between similar physical states, a fundamentally different approach to neural operator learning. This reformulation provides implicit data augmentation by exploiting the inherent stability of physical systems, where closer initial states lead to closer evolution trajectories. DeltaPhi is architecture-agnostic and can be seamlessly integrated with existing neural operators to enhance their performance. Extensive experiments demonstrate consistent and significant improvements across diverse physical systems including regular and irregular domains, different neural architectures, multiple training data amounts, and cross-resolution scenarios, confirming its effectiveness as a general enhancement for neural operators in data-limited PDE solving.
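The residual reformulation is easy to prototype around any operator network. Below is a minimal sketch assuming nearest-neighbour retrieval from a training bank and a simple concatenation interface; both are illustrative choices, not DeltaPhi's prescribed design:

```python
# Hedged sketch of residual operator learning: instead of mapping input `a`
# directly to solution `u`, predict the residual between `u` and the known
# solution of the most similar training sample.
import torch

def nearest_reference(a: torch.Tensor, bank_a: torch.Tensor,
                      bank_u: torch.Tensor):
    """Return the training input closest to `a` together with its solution."""
    dists = ((bank_a - a) ** 2).flatten(1).sum(dim=1)
    i = int(dists.argmin())
    return bank_a[i], bank_u[i]

def residual_predict(operator, a, bank_a, bank_u):
    """u_hat = u_ref + operator(a, a_ref, u_ref)."""
    a_ref, u_ref = nearest_reference(a, bank_a, bank_u)
    delta = operator(torch.cat([a, a_ref, u_ref], dim=0).unsqueeze(0))
    return u_ref + delta.squeeze(0)

# Toy usage with a linear "operator" over 1D fields of length 64:
bank_a, bank_u = torch.randn(10, 64), torch.randn(10, 64)
op = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(192, 64))
u_hat = residual_predict(op, torch.randn(64), bank_a, bank_u)
```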
IJCAI Conference 2025 Conference Paper
Video inpainting aims to fill the missing regions in a video with spatial-temporally coherent contents. Existing methods usually treat the missing contents as a whole and adopt a hybrid objective containing a reconstruction loss and an adversarial loss to train the model. However, these two losses focus on content at different frequencies, and simply combining them may cause inter-frequency conflicts, leading the trained model to generate compromised results. Inspired by the common corrupted-painting restoration process of “drawing a draft first and then revising the details later”, this paper proposes a Drafting-and-Revision Completion Network (DRCN) for video inpainting. Specifically, we first design a Drafting Network that utilizes temporal information to complete the low-frequency semantic structure at low resolution. Then, a Revision Network is developed to hallucinate high-frequency details at high resolution by using the output of the Drafting Network. In this way, the adversarial loss and reconstruction loss can be applied to high and low frequencies respectively, effectively mitigating inter-frequency conflicts. Furthermore, the Revision Network can be stacked in a pyramid manner to generate higher-resolution details, which provides a feasible solution for high-resolution video inpainting. Experiments show that DRCN achieves improvements of 7.43% and 12.64% in E_warp and LPIPS, and can handle higher-resolution videos on limited GPU memory.
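The frequency-separated objective can be summarized in a few lines. The sketch below is a hedged illustration of the loss split, with placeholder tensors and an assumed loss weighting rather than DRCN's actual training recipe:

```python
# Hedged sketch: the reconstruction loss supervises the low-resolution
# draft (low-frequency structure), while the adversarial loss is applied
# only to the high-resolution revision (high-frequency detail).
import torch
import torch.nn.functional as F

def drcn_style_losses(draft_lr, target_lr, disc_logits_fake,
                      adv_weight: float = 0.01):
    """draft_lr/target_lr: low-res frames; disc_logits_fake: discriminator
    logits for the revised high-res frames."""
    recon = F.l1_loss(draft_lr, target_lr)              # low-frequency term
    adv = F.binary_cross_entropy_with_logits(           # high-frequency term
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return recon + adv_weight * adv

loss = drcn_style_losses(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64),
                         torch.randn(2, 1))
```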
AAMAS Conference 2025 Conference Paper
Model checking offers a powerful approach to ensuring safety and reliability in autonomous systems. However, existing model-checking approaches for agent programming languages (APLs) face challenges in equivalent semantic mapping, efficient model generation, and integration with high-performance model checkers. We present a computation tree logic (CTL) model-checking framework for vGOAL, where both the interpreter and model-checking framework share the same state update implementations. Our framework establishes semantically equivalent models of vGOAL programs, implements efficient state space generation, and integrates with the NuSMV model checker. Through a case study of an autonomous logistics system with up to three robots, we demonstrate significant improvements in model-checking efficiency, enabling verification of complex autonomous systems.
NeurIPS Conference 2025 Conference Paper
Instruction-based image editing enables precise modifications via natural language prompts, but existing methods face a precision-efficiency tradeoff: fine-tuning demands massive datasets (>10M) and computational resources, while training-free approaches suffer from weak instruction comprehension. We address this by proposing ICEdit, which leverages the inherent comprehension and generation abilities of large-scale Diffusion Transformers (DiTs) through three key innovations: (1) an in-context editing paradigm without architectural modifications; (2) minimal parameter-efficient fine-tuning for quality improvement; (3) Early Filter Inference-Time Scaling, which uses VLMs to select high-quality noise samples for efficiency. Experiments show that ICEdit achieves state-of-the-art editing performance with only 0.1% of the training data and 1% of the trainable parameters compared to previous methods. Our approach establishes a new paradigm for balancing precision and efficiency in instructional image editing.
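Early Filter Inference-Time Scaling can be sketched as a probe-then-commit loop. The snippet below is an illustration under assumed interfaces; `edit_model` and `vlm_score` are hypothetical stand-ins, not ICEdit's API:

```python
# Hedged sketch of early-filter inference-time scaling: run a few denoising
# steps for several noise seeds, score the partial results with a VLM, and
# spend the full step budget only on the winning seed.
import torch

def early_filter_edit(edit_model, vlm_score, image, instruction,
                      n_seeds: int = 4, probe_steps: int = 5,
                      full_steps: int = 50):
    seeds = [torch.Generator().manual_seed(s) for s in range(n_seeds)]
    previews = [edit_model(image, instruction, steps=probe_steps, generator=g)
                for g in seeds]
    scores = [vlm_score(p, instruction) for p in previews]
    best = max(range(n_seeds), key=lambda i: scores[i])
    g = torch.Generator().manual_seed(best)    # re-create the winning seed
    return edit_model(image, instruction, steps=full_steps, generator=g)

# Toy usage with trivial stand-ins for the editor and the VLM scorer:
dummy_edit = lambda img, txt, steps, generator: img + 0.01 * steps
dummy_score = lambda img, txt: float(img.mean())
out = early_filter_edit(dummy_edit, dummy_score, torch.zeros(3, 8, 8), "brighten")
```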
NeurIPS Conference 2025 Conference Paper
Long-form video understanding poses a significant challenge for video large language models (VideoLLMs) due to prohibitively high computational and memory demands. In this paper, we propose FlexSelect, a flexible and efficient token selection strategy for processing long videos. FlexSelect identifies and retains the most semantically relevant content by leveraging cross-modal attention patterns from a reference transformer layer. It comprises two key components: (1) a training-free token ranking pipeline that leverages faithful cross-modal attention weights to estimate each video token's importance, and (2) a rank-supervised lightweight selector that is trained to replicate these rankings and filter redundant tokens. This generic approach can be seamlessly integrated into various VideoLLM architectures, such as LLaVA-Video, InternVL, and Qwen-VL, serving as a plug-and-play module to extend their temporal context length. Empirically, FlexSelect delivers strong gains across multiple long-video benchmarks, including VideoMME, MLVU, LongVB, and LVBench. Moreover, it achieves significant speed-ups (e.g., up to 9× on a LLaVA-Video-7B model), highlighting FlexSelect's promise for efficient long-form video understanding. Project page: https://flexselect.github.io
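The training-free ranking stage reduces to reading attention at one reference layer. Below is a hedged sketch with assumed shapes and mean-pooling over text queries; the actual layer choice and pooling in FlexSelect may differ:

```python
# Hedged sketch of cross-modal token ranking: compute the attention that
# text query tokens pay to each video token at a reference layer, pool over
# the text queries, and keep the top-k video tokens.
import torch

def rank_video_tokens(q_text: torch.Tensor, k_video: torch.Tensor,
                      keep: int) -> torch.Tensor:
    """q_text: (n_text, d) query states; k_video: (n_video, d) key states.
    Returns indices of the `keep` most text-relevant video tokens."""
    d = q_text.size(-1)
    attn = torch.softmax(q_text @ k_video.T / d ** 0.5, dim=-1)  # (n_text, n_video)
    importance = attn.mean(dim=0)              # pool over text queries
    return importance.topk(keep).indices

# Toy usage: keep 256 of 1024 video tokens for 12 text query tokens.
idx = rank_video_tokens(torch.randn(12, 64), torch.randn(1024, 64), keep=256)
```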
AAAI Conference 2025 Conference Paper
Diffusion models have revitalized the image generation domain, playing crucial roles in both academic research and artistic expression. With the emergence of new diffusion models, assessing the performance of text-to-image models has become increasingly important. Current metrics focus on directly matching the input text with the generated image, but due to cross-modal information asymmetry, this leads to unreliable or incomplete assessment results. Motivated by this, we introduce the Image Regeneration task in this study to assess text-to-image models by tasking the T2I model with generating an image according to the reference image. We use GPT4V to bridge the gap between the reference image and the text input for the T2I model, allowing T2I models to understand image content. This evaluation process is simplified, as comparisons between the generated image and the reference image are straightforward. Two regeneration datasets, one content-diverse and one style-diverse, are introduced to evaluate the leading diffusion models currently available. Additionally, we present the ImageRepainter framework to enhance the quality of generated images by improving content comprehension via MLLM-guided iterative generation and revision. Our comprehensive experiments have showcased the effectiveness of this framework in assessing the generative capabilities of models. By leveraging MLLMs, we have demonstrated that a robust T2I model can produce images more closely resembling the reference image.
NeurIPS Conference 2025 Conference Paper
Legal Judgment Prediction (LJP) seeks to predict case outcomes given available case information, offering practical value for both legal professionals and laypersons. However, a key limitation of existing LJP models is their limited adaptability to statutory revisions. Current SOTA models are neither designed nor evaluated for statutory revisions. To bridge this gap, we introduce LawShift, a benchmark dataset for evaluating LJP under statutory revisions. Covering 31 fine-grained change types, LawShift enables systematic assessment of SOTA models' ability to handle legal changes. We evaluate five representative SOTA models on LawShift, uncovering significant limitations in their response to legal updates. Our findings show that model architecture plays a critical role in adaptability, offering actionable insights and guiding future research on LJP in dynamic legal contexts.
AAAI Conference 2025 Conference Paper
With Large Language Model (LLM) agents taking on more evaluation responsibilities in decision-making, it is essential to recognize their possible biases to guarantee fair and trustworthy AI-supported decisions. This study is the first to thoroughly examine choice-supportive bias in LLM agents, a cognitive bias that is known to impact human decision-making and evaluation. We conduct experiments across 19 open- and closed-source LLMs in up to five scenarios, employing both memory-based and evaluation-based tasks adapted and redesigned from human cognitive studies. Our findings show that LLM agents may exhibit biased attribution or evaluation that supports their initial choices, and such bias may persist even when contextual hallucination is not observable. Key findings show that bias manifestation can differ greatly depending on prompt construction and context preservation, and the bias may be mitigated in larger models. Significantly, we observe that the bias increases when the agents perceive they are in control. Our extensive study involving 284 well-educated humans shows that, despite bias, certain LLM agents can still perform better than humans in similar evaluation tasks. This research contributes to the growing area of AI psychology, and the findings underscore the importance of addressing cognitive biases in LLM agent systems, with wide-ranging implications spanning from improving AI-assisted decision-making to advancing AI safety and ethics.
YNIMG Journal 2025 Journal Article
IJCAI Conference 2025 Conference Paper
Shadow removal aims to restore the image content in shadowed regions. While deep learning-based methods have shown promising results, they still face key challenges: 1) uncontrolled removal of all shadows, or 2) controllable removal that relies heavily on precise shadow region masks. To address these issues, we introduce a novel paradigm: prompt-aware controllable shadow removal. Unlike existing approaches, our paradigm allows for targeted shadow removal from specific subjects based on user prompts (e.g., dots, lines, or subject masks). This approach eliminates the need for shadow annotations and offers flexible, user-controlled shadow removal. Specifically, we propose an end-to-end learnable model, the Prompt-Aware Controllable Shadow Removal Network (PACSRNet). PACSRNet consists of two key modules: a prompt-aware module that generates shadow masks for the specified subject based on the user prompt, and a shadow removal module that uses the shadow prior from the first module to restore the content in the shadowed areas. Additionally, we enhance the shadow removal module by incorporating feature information from the prompt-aware module through a linear operation, providing prompt-guided support for shadow removal. Recognizing that existing shadow removal datasets lack diverse user prompts, we contribute a new dataset specifically designed for prompt-based controllable shadow removal. Extensive experimental results demonstrate the effectiveness and superiority of PACSRNet.
IJCAI Conference 2025 Conference Paper
Nowadays, numerous online platforms can be described as multi-modal heterogeneous networks (MMHNs), such as Douban's movie networks and Amazon's product review networks. Accurately categorizing nodes within these networks is crucial for analyzing the corresponding entities, which requires effective representation learning on nodes. However, existing multi-modal fusion methods often adopt either early fusion strategies which may lose the unique characteristics of individual modalities, or late fusion approaches overlooking the cross-modal guidance in GNN-based information propagation. In this paper, we propose a novel model for node classification in MMHNs, named Heterogeneous Graph Neural Network with Inter-Modal Attention (HGNN-IMA). It learns node representations by capturing the mutual influence of multiple modalities during the information propagation process, within the framework of heterogeneous graph transformer. Specifically, a nested inter-modal attention mechanism is integrated into the inter-node attention to achieve adaptive multi-modal fusion, and modality alignment is also taken into account to encourage the propagation among nodes with consistent similarities across all modalities. Moreover, an attention loss is augmented to mitigate the impact of missing modalities. Extensive experiments validate the superiority of the model in the node classification task, providing an innovative view to handle multi-modal data, especially when accompanied by network structures. The full version including the Appendix is available at http://arxiv.org/abs/2505.07895.
IROS Conference 2025 Conference Paper
Reconstructing dynamic roads from roadside traffic surveillance cameras is crucial for smart cities and digital twin applications. While the latest monocular depth estimation methods demonstrate strong performance, they exhibit instability in roadside scenarios. Existing reconstruction approaches for autonomous driving scenes predominantly adopt vehicle-mounted perspectives, accumulating vehicle point clouds from per-frame depth maps using 3D bounding boxes. These point clouds are used to initialize the center positions and colors of 3D Gaussians to improve reconstruction performance. However, the compressed depth discrepancy between vehicles and road surfaces in roadside views leads to model confusion between vehicle and background depth estimations. To address these challenges, we propose a robust reconstruction framework based on a single fixed RGB traffic camera. Differing from conventional frame-wise depth prediction followed by 3D box-based accumulation, our method processes masked vehicle foreground sequences through existing models, directly predicting complete vehicle point clouds via local feature matching and global alignment while iteratively refining 3D boxes to enhance reconstruction quality. Leveraging the explicit nature of 3D Gaussians for scene editing, we introduce simple yet effective road constraints to mitigate penetration artifacts during scene manipulation. Extensive evaluations on the TUMTraf-V2X and RCooper datasets under monocular roadside settings validate the effectiveness of our approach.
NeurIPS Conference 2025 Conference Paper
Diffusion Transformers (DiT) are renowned for their impressive generative performance; however, they are significantly constrained by considerable computational costs due to the quadratic complexity in self-attention and the extensive sampling steps required. While advancements have been made in expediting the sampling process, the underlying architectural inefficiencies within DiT remain underexplored. We introduce SparseDiT, a novel framework that implements token sparsification across spatial and temporal dimensions to enhance computational efficiency while preserving generative quality. Spatially, SparseDiT employs a tri-segment architecture that allocates token density based on feature requirements at each layer: Poolingformer in the bottom layers for efficient global feature extraction, Sparse-Dense Token Modules (SDTM) in the middle layers to balance global context with local detail, and dense tokens in the top layers to refine high-frequency details. Temporally, SparseDiT dynamically modulates token density across denoising stages, progressively increasing token count as finer details emerge in later timesteps. This synergy between SparseDiT's spatially adaptive architecture and its temporal pruning strategy enables a unified framework that balances efficiency and fidelity throughout the generation process. Our experiments demonstrate SparseDiT's effectiveness, achieving a 55% reduction in FLOPs and a 175% improvement in inference speed on DiT-XL with a similar FID score on 512×512 ImageNet, a 56% reduction in FLOPs across video generation datasets, and a 69% improvement in inference speed on PixArt-α on the text-to-image generation task with a 0.24 FID score decrease. SparseDiT provides a scalable solution for high-quality diffusion-based generation compatible with sampling optimization techniques. Code is available at https://github.com/changsn/SparseDiT.
NeurIPS Conference 2025 Conference Paper
Text-to-video generative models convert textual prompts into dynamic visual content, offering wide-ranging applications in film production, gaming, and education. However, their real-world performance often falls short of user expectations. One key reason is that these models have not been trained on videos related to some topics users want to create. In this paper, we propose VideoUFO, the first Video dataset specifically curated to align with Users' FOcus in real-world scenarios. Beyond this, our VideoUFO also features: (1) minimal (0.29%) overlap with existing video datasets, and (2) videos searched exclusively via YouTube's official API under the Creative Commons license. These two attributes provide future researchers with greater freedom to broaden their training sources. The VideoUFO comprises over 1.09 million video clips, each paired with both a brief and a detailed caption (description). Specifically, through clustering, we first identify 1,291 user-focused topics from the million-scale real text-to-video prompt dataset, VidProM. Then, we use these topics to retrieve videos from YouTube, split the retrieved videos into clips, and generate both brief and detailed captions for each clip. After verifying the clips with specified topics, we are left with about 1.09 million video clips. Our experiments reveal that (1) the 16 current text-to-video models we test do not achieve consistent performance across all user-focused topics; and (2) a simple model trained on VideoUFO outperforms others on the worst-performing topics. The dataset and code are publicly available at https://huggingface.co/datasets/WenhaoWang/VideoUFO and https://github.com/WangWenhao0716/BenchUFO under the CC BY 4.0 License.
AAAI Conference 2025 Conference Paper
Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global visual features from Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for visual-semantic interactions. Due to the limited receptive fields of CNNs and the quadratic complexity of ViTs, however, these visual backbones achieve suboptimal visual-semantic interactions. In this paper, motivated by the visual state space model (i.e., Vision Mamba), which is capable of capturing long-range dependencies and modeling complex visual dynamics, we propose a parameter-efficient ZSL framework called ZeroMamba to advance ZSL. Our ZeroMamba comprises three key components: Semantic-aware Local Projection (SLP), Global Representation Learning (GRL), and Semantic Fusion (SeF). Specifically, SLP integrates semantic embeddings to map visual features to local semantic-related representations, while GRL encourages the model to learn global semantic representations. SeF combines these two semantic representations to enhance the discriminability of semantic features. We incorporate these designs into Vision Mamba, forming an end-to-end ZSL framework. As a result, the learned semantic representations are better suited for classification. Through extensive experiments on four prominent ZSL benchmarks, ZeroMamba demonstrates superior performance, significantly outperforming the state-of-the-art (i.e., CNN-based and ViT-based) methods under both conventional ZSL (CZSL) and generalized ZSL (GZSL) settings.
NeurIPS Conference 2024 Conference Paper
Federated Learning (FL) is commonly used to collaboratively train models with privacy preservation. In this paper, we find that the popular diffusion models have introduced a new vulnerability to FL, which brings serious privacy threats. Despite stringent data management measures, attackers can steal massive private data from local clients through multiple Trojans, which control generative behaviors with multiple triggers. We refer to this new task as DataStealing and demonstrate that the attacker can achieve it based on our proposed Combinatorial Triggers (ComboTs) in a vanilla FL system. However, advanced distance-based FL defenses are still effective in filtering the malicious update according to the distances between local updates. Hence, we propose an Adaptive Scale Critical Parameters (AdaSCP) attack to circumvent the defenses and seamlessly incorporate malicious updates into the global model. Specifically, AdaSCP evaluates the importance of parameters with the gradients in dominant timesteps of the diffusion model. Subsequently, it adaptively seeks the optimal scale factor and magnifies critical parameter updates before uploading them to the server. As a result, the malicious update becomes similar to the benign update, making it difficult for distance-based defenses to identify. Extensive experiments reveal the risk of leaking thousands of images in training diffusion models with FL. Moreover, these experiments demonstrate the effectiveness of AdaSCP in defeating advanced distance-based defenses. We hope this work will attract more attention from the FL community to the critical privacy security issues of diffusion models. Code: https://github.com/yuangan/DataStealing.
AIIM Journal 2024 Journal Article
AAAI Conference 2024 Conference Paper
Text-video retrieval is a critical multi-modal task to find the most relevant video for a text query. Although pretrained models like CLIP have demonstrated impressive potential in this area, the rising cost of fully finetuning these models due to increasing model size continues to pose a problem. To address this challenge, prompt tuning has emerged as an alternative. However, existing works still face two problems when adapting pretrained image-text models to downstream video-text tasks: (1) The visual encoder could only encode frame-level features and failed to extract global-level general video information. (2) Equipping the visual and text encoder with separated prompts failed to mitigate the visual-text modality gap. To this end, we propose DGL, a cross-modal Dynamic prompt tuning method with Global-Local video attention. In contrast to previous prompt tuning methods, we employ the shared latent space to generate local-level text and frame prompts that encourage inter-modal interaction. Furthermore, we propose modeling video in a global-local attention mechanism to capture global video information from the perspective of prompt tuning. Extensive experiments reveal that when only 0.67% parameters are tuned, our cross-modal prompt tuning strategy DGL outperforms or is comparable to fully finetuning methods on MSR-VTT, VATEX, LSMDC, and ActivityNet datasets. Code will be available at https://github.com/knightyxp/DGL.
NeurIPS Conference 2024 Conference Paper
Recovering the foreground color and opacity/alpha matte from a single image (i.e., image matting) is a challenging and ill-posed problem where data priors play a critical role in achieving precise results. Traditional methods generally predict the alpha matte and then extract the foreground through post-processing, often failing to produce high-fidelity foreground color. This failure stems from the models' difficulty in learning robust color predictions from limited matting datasets. To address this, we explore the potential of leveraging vision priors embedded in pre-trained latent diffusion models (LDM) for estimating foreground RGBA values in challenging scenarios and rare objects. We introduce Drip, a novel approach for image matting that harnesses the rich prior knowledge of LDM models. Our method incorporates a switcher and a cross-domain attention mechanism to extend the original LDM for joint prediction of the foreground color and opacity. This setup facilitates mutual information exchange and ensures high consistency across both modalities. To mitigate the inherent reconstruction errors of the LDM's VAE decoder, we propose a latent transparency decoder to align the RGBA prediction with the input image, thereby reducing discrepancies. Comprehensive experimental results demonstrate that our approach achieves state-of-the-art performance in foreground and alpha predictions and shows remarkable generalizability across various benchmarks.
NeurIPS Conference 2024 Conference Paper
Generalizable 3D Gaussian splatting (3DGS) can reconstruct new scenes from sparse-view observations in a feed-forward inference manner, eliminating the need for scene-specific retraining required in conventional 3DGS. However, existing methods rely heavily on epipolar priors, which can be unreliable in complex real-world scenes, particularly in non-overlapping and occluded regions. In this paper, we propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints. To enhance multiview feature extraction with 3D perception, we employ a self-supervised Vision Transformer (ViT) with cross-view completion pre-training on large-scale datasets. Additionally, we introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. Our eFreeSplat represents a new paradigm for generalizable novel view synthesis. We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality.
NeurIPS Conference 2024 Conference Paper
Video diffusion models have made substantial progress in various video generation applications. However, training models for long video generation tasks require significant computational and data resources, posing a challenge to developing long video diffusion models. This paper investigates a straightforward and training-free approach to extend an existing short video diffusion model (e.g., pre-trained on 16-frame videos) for consistent long video generation (e.g., 128 frames). Our preliminary observation has found that directly applying the short video diffusion model to generate long videos can lead to severe video quality degradation. Further investigation reveals that this degradation is primarily due to the distortion of high-frequency components in long videos, characterized by a decrease in spatial high-frequency components and an increase in temporal high-frequency components. Motivated by this, we propose a novel solution named FreeLong to balance the frequency distribution of long video features during the denoising process. FreeLong blends the low-frequency components of global video features, which encapsulate the entire video sequence, with the high-frequency components of local video features that focus on shorter subsequences of frames. This approach maintains global consistency while incorporating diverse and high-quality spatiotemporal details from local videos, enhancing both the consistency and fidelity of long video generation. We evaluated FreeLong on multiple base video diffusion models and observed significant improvements. Additionally, our method supports coherent multi-prompt generation, ensuring both visual coherence and seamless transitions between scenes. Our project page is at: https://yulu.net.cn/freelong.
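The frequency blending at the heart of FreeLong can be sketched with a 3D FFT over time and space. The snippet below is an illustrative implementation under an assumed hard cutoff mask; FreeLong's actual filter design may differ:

```python
# Hedged sketch of frequency blending: take the low-frequency band from
# globally-attended features and the high-frequency band from
# locally-attended features via an FFT over (time, height, width).
import torch

def blend_frequencies(global_feat: torch.Tensor, local_feat: torch.Tensor,
                      cutoff: float = 0.25) -> torch.Tensor:
    """Both inputs: (frames, channels, H, W). Returns blended features."""
    Fg = torch.fft.fftn(global_feat.float(), dim=(0, 2, 3))
    Fl = torch.fft.fftn(local_feat.float(), dim=(0, 2, 3))
    t = torch.fft.fftfreq(global_feat.size(0)).abs()
    h = torch.fft.fftfreq(global_feat.size(2)).abs()
    w = torch.fft.fftfreq(global_feat.size(3)).abs()
    low = ((t[:, None, None, None] < cutoff)
           & (h[None, None, :, None] < cutoff)
           & (w[None, None, None, :] < cutoff))
    blended = torch.where(low, Fg, Fl)     # low freq: global; rest: local
    return torch.fft.ifftn(blended, dim=(0, 2, 3)).real

out = blend_frequencies(torch.randn(16, 4, 32, 32), torch.randn(16, 4, 32, 32))
```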
NeurIPS Conference 2024 Conference Paper
Prevalent human-object interaction (HOI) detection approaches typically leverage large-scale visual-linguistic models to help recognize events involving humans and objects. Though promising, models trained via contrastive learning on text-image pairs often neglect mid/low-level visual cues and struggle at compositional reasoning. In response, we introduce DIFFUSIONHOI, a new HOI detector shedding light on text-to-image diffusion models. Unlike the aforementioned models, diffusion models excel in discerning mid/low-level visual concepts as generative models, and possess strong compositionality to handle novel concepts expressed in text inputs. Considering diffusion models usually emphasize instance objects, we first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space. These learned relation embeddings then serve as textual prompts to steer diffusion models to generate images that depict specific interactions, and to extract HOI-relevant cues from images without heavy finetuning. Benefiting from the above, DIFFUSIONHOI achieves SOTA performance on three datasets under both regular and zero-shot setups.
NeurIPS Conference 2024 Conference Paper
Images produced by diffusion models are increasingly popular in digital artwork and visual marketing. However, such generated images might replicate content from existing ones and pose the challenge of content originality. Existing Image Copy Detection (ICD) models, though accurate in detecting hand-crafted replicas, overlook the challenge from diffusion models. This motivates us to introduce ICDiff, the first ICD specialized for diffusion models. To this end, we construct a Diffusion-Replication (D-Rep) dataset and correspondingly propose a novel deep embedding method. D-Rep uses a state-of-the-art diffusion model (Stable Diffusion V1.5) to generate 40,000 image-replica pairs, which are manually annotated into 6 replication levels ranging from 0 (no replication) to 5 (total replication). Our method, PDF-Embedding, transforms the replication level of each image-replica pair into a probability density function (PDF) as the supervision signal. The intuition is that the probability of neighboring replication levels should be continuous and smooth. Experimental results show that PDF-Embedding surpasses protocol-driven methods and non-PDF choices on the D-Rep test set. Moreover, by utilizing PDF-Embedding, we find that the replication ratios of well-known diffusion models against an open-source gallery range from 10% to 20%. The project is publicly available at https://icdiff.github.io/.
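The PDF supervision signal can be sketched in a few lines. Below is a hedged illustration that smooths each level label with a Gaussian and trains with a KL objective; the exact smoothing function in PDF-Embedding is the paper's own construction:

```python
# Hedged sketch of PDF supervision: each annotated replication level is
# smoothed into a discrete distribution over the six levels, and the model
# is trained to match it with a KL objective.
import torch
import torch.nn.functional as F

def level_to_pdf(level: int, n_levels: int = 6, sigma: float = 0.75) -> torch.Tensor:
    """Turn a hard level label into a smooth distribution over levels."""
    grid = torch.arange(n_levels, dtype=torch.float32)
    logits = -((grid - level) ** 2) / (2 * sigma ** 2)   # Gaussian smoothing
    return torch.softmax(logits, dim=0)

def pdf_loss(pred_logits: torch.Tensor, levels: torch.Tensor) -> torch.Tensor:
    """pred_logits: (batch, 6); levels: (batch,) integer annotations."""
    target = torch.stack([level_to_pdf(int(l)) for l in levels])
    return F.kl_div(F.log_softmax(pred_logits, dim=-1), target,
                    reduction="batchmean")

# Toy usage over a batch of four annotated pairs:
loss = pdf_loss(torch.randn(4, 6), torch.tensor([0, 2, 5, 3]))
```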
AAAI Conference 2024 Conference Paper
3D decision-critical tasks urgently require research on explanations to ensure system reliability and transparency. Extensive explanatory research has been conducted on 2D images, but there is a lack in the 3D field. Furthermore, the existing explanations for 3D models are post-hoc and can be misleading, as they separate explanations from the original model. To address these issues, we propose an ad-hoc interpretable classifier for 3D point clouds (i.e., Interpretable3D). As an intuitive case-based classifier, Interpretable3D can provide reliable ad-hoc explanations without any embarrassing nuances. It allows users to understand how queries are embedded within past observations in prototype sets. Interpretable3D has two iterative training steps: 1) updating one prototype with the mean of the embeddings within the same sub-class in Prototype Estimation, and 2) penalizing or rewarding the estimated prototypes in Prototype Optimization. The mean of embeddings has a clear statistical meaning, i.e., class sub-centers. Moreover, we update prototypes with their most similar observations in the last few epochs. Finally, Interpretable3D classifies new samples according to prototypes. We evaluate the performance of Interpretable3D on four popular point cloud models: DGCNN, PointNet2, PointMLP, and PointNeXt. Our Interpretable3D demonstrates comparable or superior performance compared to softmax-based black-box models in the tasks of 3D shape classification and part segmentation. Our code is released at: github.com/FengZicai/Interpretable3D.
NeurIPS Conference 2024 Conference Paper
Current vision models typically maintain a fixed correspondence between their representation structure and image space. Each layer comprises a set of tokens arranged “on-the-grid,” which biases patches or tokens to encode information at a specific spatio(-temporal) location. In this work we present Moving Off-the-Grid (MooG), a self-supervised video representation model that offers an alternative approach, allowing tokens to move “off-the-grid” to better enable them to represent scene elements consistently, even as they move across the image plane through time. By using a combination of cross-attention and positional embeddings we disentangle the representation structure and image structure. We find that a simple self-supervised objective, next frame prediction, trained on video data, results in a set of latent tokens which bind to specific scene structures and track them as they move. We demonstrate the usefulness of MooG's learned representation both qualitatively and quantitatively by training readouts on top of the learned representation on a variety of downstream tasks. We show that MooG can provide a strong foundation for different vision tasks when compared to “on-the-grid” baselines.
EAAI Journal 2024 Journal Article
IJCAI Conference 2024 Conference Paper
In this paper, we reveal the two sides of data augmentation: enhancements in closed-set recognition correlate with a significant decrease in open-set recognition. Through empirical investigation, we find that multi-sample-based augmentations contribute to reducing feature discrimination, thereby diminishing open-set criteria. Although knowledge distillation could impair the feature via imitation, the mixed feature with ambiguous semantics hinders the distillation. To this end, we propose an asymmetric distillation framework that feeds the teacher model extra raw data to enlarge the teacher's benefit. Moreover, a joint mutual-information loss and a selective relabeling strategy are utilized to alleviate the influence of hard mixed samples. Our method successfully mitigates the decline in open-set recognition and outperforms SOTAs by 2%-3% AUROC on the Tiny-ImageNet dataset, and experiments on the large-scale ImageNet-21K dataset demonstrate the generalization of our method.
AAAI Conference 2024 Conference Paper
Video-language pre-training models have recently achieved remarkable results on various multi-modal downstream tasks. However, most of these models rely on contrastive learning or masked modeling to align global features across modalities, neglecting the local associations between video frames and text tokens. This limits the model's ability to perform fine-grained matching and generalization, especially for tasks that involve selecting segments in long videos based on query texts. To address this issue, we propose a novel stitching-and-matching pre-text task for video-language pre-training that encourages fine-grained interactions between modalities. Our task involves stitching video frames or sentences into longer sequences and predicting the positions of cross-modal queries in the stitched sequences. The individual frame and sentence representations are thus aligned via the stitching-and-matching strategy, encouraging fine-grained interactions between videos and texts. We conduct extensive experiments on various benchmarks covering text-to-video retrieval, video question answering, video captioning, and moment retrieval. Our results demonstrate that the proposed method significantly improves the generalization capacity of video-text pre-training models.
NeurIPS Conference 2024 Conference Paper
We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP-2D) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS, three-dimensional point tracking has none. To this end, leveraging existing footage, we build a new benchmark for 3D point tracking featuring 4,000+ real-world videos, composed of three different data sources spanning a variety of object types, motion patterns, and indoor and outdoor environments. To measure performance on the TAP-3D task, we formulate a collection of metrics that extend the Jaccard-based metric used in TAP-2D to handle the complexities of ambiguous depth scales across models, occlusions, and multi-track spatio-temporal smoothness. We manually verify a large sample of trajectories to ensure correct video annotations, and assess the current state of the TAP-3D task by constructing competitive baselines using existing tracking models. We anticipate this benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
EAAI Journal 2024 Journal Article
NeurIPS Conference 2024 Conference Paper
Recent advancements in image understanding have benefited from the extensive use of web image-text pairs. However, video understanding remains a challenge despite the availability of substantial web video-text data. This difficulty primarily arises from the inherent complexity of videos and the inefficient language supervision in recent web-collected video-text datasets. In this paper, we introduce Text-Only Pre-Alignment (TOPA), a novel approach to extend large language models (LLMs) for video understanding, without the need for pre-training on real video data. Specifically, we first employ an advanced LLM to automatically generate Textual Videos comprising continuous textual frames, along with corresponding annotations to simulate real video-text data. Then, these annotated textual videos are used to pre-align a language-only LLM with the video modality. To bridge the gap between textual and real videos, we employ the CLIP model as the feature extractor to align image and text modalities. During text-only pre-alignment, the continuous textual frames, encoded as a sequence of CLIP text features, are analogous to continuous CLIP image features, thus aligning the LLM with real video representation. Extensive experiments, including zero-shot evaluation and finetuning on various video understanding tasks, demonstrate that TOPA is an effective and efficient framework for aligning video content with LLMs. In particular, without training on any video data, the TOPA-Llama2-13B model achieves a Top-1 accuracy of 51.0% on the challenging long-form video understanding benchmark, Egoschema. This performance surpasses previous video-text pre-training approaches and proves competitive with recent GPT-3.5-based video agents.
NeurIPS Conference 2024 Conference Paper
The arrival of Sora marks a new era for text-to-video diffusion models, bringing significant advancements in video generation and potential applications. However, Sora, along with other text-to-video diffusion models, is highly reliant on prompts, and there is no publicly available dataset that features a study of text-to-video prompts. In this paper, we introduce VidProM, the first large-scale dataset comprising 1.67 million unique text-to-Video Prompts from real users. Additionally, this dataset includes 6.69 million videos generated by four state-of-the-art diffusion models, alongside some related data. We initially discuss the curation of this large-scale dataset, a process that is both time-consuming and costly. Subsequently, we underscore the need for a new prompt dataset specifically designed for text-to-video generation by illustrating how VidProM differs from DiffusionDB, a large-scale prompt-gallery dataset for image generation. Our extensive and diverse dataset also opens up many exciting new research areas. For instance, we suggest exploring text-to-video prompt engineering, efficient video generation, and video copy detection for diffusion models to develop better, more efficient, and safer models. The project (including the collected dataset VidProM and related code) is publicly available at https://vidprom.github.io under the CC-BY-NC 4.0 License.
NeurIPS Conference 2024 Conference Paper
Vision-language navigation (VLN) requires an agent to execute actions following human instructions. Existing VLN models are optimized through expert demonstrations by supervised behavioural cloning or incorporating manual reward engineering. While straightforward, these efforts overlook the accumulation of errors in the Markov decision process, and struggle to match the distribution of the expert policy. Going beyond this, we propose an Energy-based Navigation Policy (ENP) to model the joint state-action distribution using an energy-based model. At each step, low energy values correspond to the state-action pairs that the expert is most likely to perform, and vice versa. Theoretically, the optimization objective is equivalent to minimizing the forward divergence between the occupancy measure of the expert and ours. Consequently, ENP learns to globally align with the expert policy by maximizing the likelihood of the actions and modeling the dynamics of the navigation states in a collaborative manner. With a variety of VLN architectures, ENP achieves promising performances on R2R, REVERIE, RxR, and R2R-CE, unleashing the power of existing VLN models.
NeurIPS Conference 2024 Conference Paper
Visual imitation learning (VIL) provides an efficient and intuitive strategy for robotic systems to acquire novel skills. Recent advancements in Vision Language Models (VLMs) have demonstrated remarkable vision and language reasoning capabilities for VIL tasks. Despite this progress, current VIL methods naively employ VLMs to learn high-level plans from human videos and rely on pre-defined motion primitives for executing physical interactions, which remains a major bottleneck. In this work, we present VLMimic, a novel paradigm that harnesses VLMs to learn skills directly at the fine-grained action level, given only a limited number of human videos. Specifically, VLMimic first grounds object-centric movements from human videos, and learns skills using hierarchical constraint representations, facilitating the derivation of fine-grained skills from limited human videos. These skills are refined and updated through an iterative comparison strategy, enabling efficient adaptation to unseen environments. Extensive experiments show that VLMimic, using only 5 human videos, yields significant improvements of over 27% and 21% in RLBench and real-world manipulation tasks, and surpasses baselines by more than 37% in long-horizon tasks. Code and videos are available on our anonymous homepage.
AAAI Conference 2023 Conference Paper
Image copy detection (ICD) aims to determine whether a query image is an edited copy of any image from a reference set. Currently, there are very limited public benchmarks for ICD, and all of them overlook a critical challenge in real-world applications, i.e., the distraction from hard negative queries. Specifically, some queries are not edited copies but are inherently similar to some reference images. These hard negative queries are easily falsely recognized as edited copies, significantly compromising ICD accuracy. This observation motivates us to build the first ICD benchmark featuring this characteristic. Based on existing ICD datasets, this paper constructs a new dataset by additionally adding 100,000 and 24,252 hard negative pairs into the training and test set, respectively. Moreover, this paper further reveals a unique difficulty in solving the hard negative problem in ICD, i.e., a fundamental conflict between current metric learning and ICD: metric learning adopts a symmetric distance while editing a copy is an asymmetric (unidirectional) process. For example, a partial crop is close to its holistic reference image and is an edited copy of it, while the latter cannot be an edited copy of the former (even though the distance is equally small). This insight results in an Asymmetrical-Similarity Learning (ASL) method, which allows the similarity in the two directions (the query ↔ the reference image) to differ from each other. Experimental results show that ASL outperforms state-of-the-art methods by a clear margin, confirming that solving the symmetric-asymmetric conflict is critical for ICD. The NDEC dataset and code are available at https://github.com/WangWenhao0716/ASL.
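The asymmetric-similarity idea can be sketched with two role-specific projection heads; this is a hedged simplification, not the exact ASL formulation:

```python
import torch
import torch.nn as nn

class AsymmetricSimilarity(nn.Module):
    """Two role-specific heads make s(a -> b) and s(b -> a) independent
    quantities, so a crop can score high as a copy of its source while
    the source scores low as a copy of the crop."""
    def __init__(self, dim):
        super().__init__()
        self.as_query = nn.Linear(dim, dim, bias=False)  # query-role projection
        self.as_ref = nn.Linear(dim, dim, bias=False)    # reference-role projection

    def forward(self, a, b):
        # s(a -> b): is `a` an edited copy of `b`?
        return (self.as_query(a) * self.as_ref(b)).sum(-1)
```

With such a module, model(crop, source) can be trained toward a high copy score while model(source, crop) stays low, which a symmetric distance cannot express.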
JBHI Journal 2023 Journal Article
Recent research on emotion recognition suggests that deep network-based adversarial learning can address the cross-subject problem of emotion recognition. This study constructed a hearing-impaired electroencephalography (EEG) emotion dataset containing three emotions (positive, neutral, and negative) in 15 subjects. An emotional domain adversarial neural network (EDANN) was employed to identify hearing-impaired subjects' emotions by learning hidden emotion information between the labeled and unlabeled data. For the input data, we propose a spatial filter matrix to reduce overfitting to the training data. A feature extraction network, 3DLSTM-ConvNET, was used to extract comprehensive emotional information from the time, frequency, and spatial dimensions. Moreover, an emotion local domain discriminator and an emotion film group local domain discriminator were added to reduce the distribution distance between the same kinds of emotions and different film groups, respectively. According to the experimental results, the average subject-dependent accuracy is 0.984 (STD: 0.011), and the average subject-independent accuracy is 0.679 (STD: 0.140). In addition, by analyzing the discriminative characteristics, we found that the brain regions associated with emotion recognition in the hearing-impaired are distributed across wider areas of the parietal and occipital lobes, which may be caused by visual processing.
AAAI Conference 2023 Conference Paper
Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGEs to infer inductively. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework, AnKGE, to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity, relation, and triple levels. In AnKGE, we train an analogy function for each level of analogical inference, which takes the original element embedding from a well-trained KGE model as input and outputs the analogical object embedding. In order to combine the inductive inference capability of the original KGE model with the analogical inference capability introduced by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights in the score function for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
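The score interpolation fits in a few lines; this hedged simplification reduces the adaptive weight to a single learnable scalar passed through a sigmoid:

```python
import torch

def ankge_style_score(base_score, analogy_score, alpha_logit):
    """Interpolate the base KGE score with the analogy score. `alpha_logit`
    stands in for the adaptive weight; a sigmoid keeps it in (0, 1)."""
    alpha = torch.sigmoid(alpha_logit)
    return (1 - alpha) * base_score + alpha * analogy_score
```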
NeurIPS Conference 2023 Conference Paper
This paper reveals a characteristic of DEtection Transformer (DETR) that negatively impacts its training efficacy, i.e., the cross-attention and self-attention layers in the DETR decoder have contrary impacts on the object queries (though both impacts are important). Specifically, we observe that the cross-attention tends to gather multiple queries around the same object, while the self-attention disperses these queries far away. To improve the training efficacy, we propose a Divide-And-Conquer DETR (DAC-DETR) that divides the cross-attention out from this contrary for better conquering. During training, DAC-DETR employs an auxiliary decoder that focuses on learning the cross-attention layers. The auxiliary decoder, while sharing all the other parameters, has NO self-attention layers and employs one-to-many label assignment to improve the gathering effect. Experiments show that DAC-DETR brings remarkable improvement over popular DETRs. For example, under the 12-epoch training scheme on MS-COCO, DAC-DETR improves Deformable DETR (ResNet-50) by +3.4 AP and achieves 50.9 (ResNet-50) / 58.1 AP (Swin-Large) based on some popular methods (i.e., DINO and an IoU-related loss). Our code will be made available at https://github.com/huzhengdongcs/DAC-DETR.
AAAI Conference 2023 Short Paper
Demographic biases and social stereotypes are common in pretrained language models (PLMs), and fine-tuning in downstream applications can also produce new biases or amplify the original ones. Existing works separate debiasing from the fine-tuning procedure, which results in a gap between intrinsic bias and application bias. In this work, we propose a debiasing framework, CauDebias, to eliminate both biases; it directly combines debiasing with fine-tuning and can be applied to any PLM in downstream tasks. We distinguish the bias-relevant (non-causal factors) and label-relevant (causal factors) parts of sentences from a causal invariance perspective. Specifically, we perform interventions on non-causal factors in different demographic groups, and then devise an invariant risk minimization loss to trade off bias mitigation against task accuracy. Experimental results on three downstream tasks show that our CauDebias can remarkably reduce biases in PLMs while minimizing the impact on downstream tasks.
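The abstract describes the invariant risk minimization loss only at a high level; as one plausible reference point (an assumption, not the paper's exact loss), the generic IRMv1 penalty of Arjovsky et al. penalizes how much each group's risk could still be reduced by rescaling the classifier:

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, labels):
    """Generic IRMv1 penalty: the squared gradient of the risk w.r.t. a
    fixed classifier scale w=1. A small penalty means the classifier is
    already (near-)optimal for this group/environment."""
    w = torch.ones(1, device=logits.device, requires_grad=True)
    risk = F.binary_cross_entropy_with_logits(logits * w, labels)
    (grad,) = torch.autograd.grad(risk, [w], create_graph=True)
    return (grad ** 2).sum()
```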
AAAI Conference 2023 Short Paper
In financial economics, studies have shown that the textual content in the earnings conference call transcript has predictive power for a firm's future risk. However, the conference call transcript is very long and contains diverse non-relevant content, which poses challenges for the text-based risk forecast. This study investigates the structural dependency within a conference call transcript by explicitly modeling the dialogue between managers and analysts. Specifically, we utilize TextRank to extract information and exploit the semantic correlation within a discussion using hypergraph learning. This novel design can improve the transcript representation performance and reduce the risk of forecast errors. Experimental results on a large-scale dataset show that our approach can significantly improve prediction performance compared to state-of-the-art text-based models.
NeurIPS Conference 2023 Conference Paper
Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing. Current methods exhibit limitations in performance, largely attributable to their dependence on insufficient 2D image features and inconsistent query methods. Owing to this, we present the Global-correlated 3D-decoupling Transformer for clothed Avatar reconstruction (GTA), a novel transformer-based architecture that reconstructs clothed human avatars from monocular images. Our approach leverages transformer architectures by utilizing a Vision Transformer model as an encoder for capturing global-correlated image features. Subsequently, our innovative 3D-decoupling decoder employs cross-attention to decouple tri-plane features, using learnable embeddings as queries for cross-plane generation. To effectively enhance feature fusion with the tri-plane 3D feature and human body prior, we propose a hybrid prior fusion strategy combining spatial and prior-enhanced queries, leveraging the benefits of spatial localization and human body prior knowledge. Comprehensive experiments on the CAPE and THuman2.0 datasets illustrate that our method outperforms state-of-the-art approaches in both geometry and texture reconstruction, exhibiting high robustness to challenging poses and loose clothing, and producing higher-resolution textures. Codes are available at https://github.com/River-Zhang/GTA.
NeurIPS Conference 2023 Conference Paper
Learning fine-grained embeddings from coarse labels is a challenging task due to limited label granularity supervision, i.e., lacking the detailed distinctions required for fine-grained tasks. The task becomes even more demanding when attempting few-shot fine-grained recognition, which holds practical significance in various applications. To address these challenges, we propose a novel method that embeds visual embeddings into a hyperbolic space and enhances their discriminative ability with a hierarchical cosine margins manner. Specifically, the hyperbolic space offers distinct advantages, including the ability to capture hierarchical relationships and increased expressive power, which favors modeling fine-grained objects. Based on the hyperbolic space, we further enforce relatively large/small similarity margins between coarse/fine classes, respectively, yielding the so-called hierarchical cosine margins manner. While enforcing similarity margins in the regular Euclidean space has become popular for deep embedding learning, applying it to the hyperbolic space is non-trivial and validating the benefit for coarse-to-fine generalization is valuable. Extensive experiments conducted on five benchmark datasets showcase the effectiveness of our proposed method, yielding state-of-the-art results surpassing competing methods.
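A Euclidean sketch of the hierarchical cosine margins (the paper applies the idea in hyperbolic space, so this is only an approximate illustration): negatives under a different coarse parent receive the larger margin, negatives sharing the true class's coarse parent the smaller one.

```python
import torch
import torch.nn.functional as F

def hierarchical_margin_loss(feats, proxies, labels, coarse_of,
                             m_fine=0.1, m_coarse=0.3, scale=16.0):
    """Cosine logits with per-negative margins: negatives under a different
    coarse parent get m_coarse, negatives under the same coarse parent get
    m_fine, and the true class gets no margin."""
    cos = F.normalize(feats) @ F.normalize(proxies).t()        # (B, C)
    same_coarse = coarse_of[labels].unsqueeze(1).eq(coarse_of.unsqueeze(0))
    margins = torch.where(same_coarse,
                          torch.full_like(cos, m_fine),
                          torch.full_like(cos, m_coarse))
    margins.scatter_(1, labels.unsqueeze(1), 0.0)
    return F.cross_entropy(scale * (cos + margins), labels)
```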
NeurIPS Conference 2023 Conference Paper
The interaction decoder utilized in prevalent Transformer-based HOI detectors typically accepts pre-composed human-object pairs as inputs. Though achieving remarkable performance, such a paradigm lacks feasibility and cannot explore novel combinations over entities during decoding. We present LogicHOI, a new HOI detector that leverages neural-logic reasoning and Transformer to infer feasible interactions between entities. Specifically, we modify the self-attention mechanism in the vanilla Transformer, enabling it to reason over the ⟨human, action, object⟩ triplet and constitute novel interactions. Meanwhile, such a reasoning process is guided by two crucial properties for understanding HOI: affordances (the potential actions an object can facilitate) and proxemics (the spatial relations between humans and objects). We formulate these two properties in first-order logic and ground them into continuous space to constrain the learning process of our approach, leading to improved performance and zero-shot generalization capabilities. We evaluate LogicHOI on V-COCO and HICO-DET under both normal and zero-shot setups, achieving significant improvements over existing methods.
EAAI Journal 2023 Journal Article
AAAI Conference 2023 Conference Paper
Neural Radiance Fields (NeRF) methods have proved effective as compact, high-quality and versatile representations for 3D scenes, and enable downstream tasks such as editing, retrieval, navigation, etc. Various neural architectures are vying for the core structure of NeRF, including the plain Multi-Layer Perceptron (MLP), sparse tensors, low-rank tensors, hashtables and their compositions. Each of these representations has its particular set of trade-offs. For example, the hashtable-based representations admit faster training and rendering but their lack of clear geometric meaning hampers downstream tasks like spatial-relation-aware editing. In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions. PVD consequently empowers downstream applications to optimally adapt the neural representations for the task at hand in a post hoc fashion. The conversions are fast, as distillation is progressively performed on different levels of volume representations, from shallower to deeper. We also employ special treatment of density to deal with its specific numerical instability problem. Empirical evidence is presented to validate our method on the NeRF-Synthetic, LLFF and TanksAndTemples datasets. For example, with PVD, an MLP-based NeRF model can be distilled from a hashtable-based Instant-NGP model at a 10~20X faster speed than training the original NeRF from scratch, while achieving a superior level of synthesis quality. Code is available at https://github.com/megvii-research/AAAI2023-PVD.
NeurIPS Conference 2023 Conference Paper
We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g., Flamingo, BEiT-3, or GPT-4). Compared to existing benchmarks that focus on computational tasks (e.g., classification, detection or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities, to provide a comprehensive and efficient evaluation tool. The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime. For these purposes, the Perception Test introduces 11.6k real-world videos, 23s average length, designed to show perceptually interesting situations, filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels (multiple-choice and grounded video question-answers, object and point tracks, temporal action and sound segments), enabling both language and non-language evaluations. The fine-tuning and validation splits of the benchmark are publicly available (CC-BY license), in addition to a challenge server with a held-out test split. Human baseline results compared to state-of-the-art video QA models show a significant gap in performance (91.4% vs 45.8%), suggesting that there is significant room for improvement in multimodal video understanding. Dataset, baselines code, and challenge server are available at https://github.com/deepmind/perception_test.
NeurIPS Conference 2023 Conference Paper
Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks. Inspired by the advancements of the GPT, we present PointGPT, a novel approach that extends the concept of GPT to point clouds, addressing the challenges associated with disorder properties, low information density, and task gaps. Specifically, a point cloud auto-regressive generation task is proposed to pre-train transformer models. Our method partitions the input point cloud into multiple point patches and arranges them in an ordered sequence based on their spatial proximity. Then, an extractor-generator based transformer decoder, with a dual masking strategy, learns latent representations conditioned on the preceding point patches, aiming to predict the next one in an auto-regressive manner. To explore scalability and enhance performance, a larger pre-training dataset is collected. Additionally, a subsequent post-pre-training stage is introduced, incorporating a labeled hybrid dataset. Our scalable approach allows for learning high-capacity models that generalize well, achieving state-of-the-art performance on various downstream tasks. In particular, our approach achieves classification accuracies of 94.9% on the ModelNet40 dataset and 93.4% on the ScanObjectNN dataset, outperforming all other transformer models. Furthermore, our method also attains new state-of-the-art accuracies on all four few-shot learning benchmarks. Codes are available at https://github.com/CGuangyan-BIT/PointGPT.
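The ordering step can be illustrated with a simple space-filling heuristic; a Morton (Z-order) key is one plausible way to arrange patch centers so that sequence neighbors are spatially close, though the abstract does not commit to this exact scheme:

```python
import numpy as np

def order_patches_by_proximity(centers):
    """Sort patch centers by a Morton (Z-order) key on a 10-bit voxel grid
    so that sequence neighbours tend to be spatial neighbours."""
    span = np.ptp(centers, axis=0) + 1e-9
    grid = ((centers - centers.min(0)) / span * 1023).astype(np.uint32)

    def morton(x, y, z):
        key = 0
        for i in range(10):  # interleave one bit per axis per round
            key |= int(x >> i & 1) << (3 * i + 2)
            key |= int(y >> i & 1) << (3 * i + 1)
            key |= int(z >> i & 1) << (3 * i)
        return key

    return np.argsort([morton(*g) for g in grid])
```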
EAAI Journal 2023 Journal Article
IJCAI Conference 2023 Conference Paper
Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement. Recent diffusion models show realistic and detailed image generation through a sequence of denoising refinements and motivate us to introduce them to low-light image enhancement for recovering realistic details. However, we found two problems when doing this, i.e., 1) diffusion models keep constant resolution in one reverse process, which limits the speed; 2) diffusion models sometimes result in global degradation (e.g., RGB shift). To address the above problems, this paper proposes a Pyramid Diffusion model (PyDiff) for low-light image enhancement. PyDiff uses a novel pyramid diffusion method to perform sampling in a pyramid resolution style (i.e., progressively increasing resolution in one reverse process). Pyramid diffusion makes PyDiff much faster than vanilla diffusion models and introduces no performance degradation. Furthermore, PyDiff uses a global corrector to alleviate the global degradation that may occur in the reverse process, significantly improving the performance and making the training of diffusion models easier with little additional computational consumption. Extensive experiments on popular benchmarks show that PyDiff achieves superior performance and efficiency. Moreover, PyDiff can generalize well to unseen noise and illumination distributions. Code and supplementary materials are available at https://github.com/limuloo/PyDIff.git.
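A hedged sketch of the pyramid-style reverse process, assuming a hypothetical denoise_step(x, t) that performs one reverse diffusion update: the working resolution grows between stages instead of staying constant.

```python
import torch.nn.functional as F

def pyramid_reverse(denoise_step, x_T, sizes, steps_per_stage=10):
    """Reverse diffusion whose working resolution grows between stages.
    `denoise_step(x, t)` is a hypothetical one-step reverse update."""
    x = x_T
    t = steps_per_stage * len(sizes)
    for size in sizes:  # e.g. [(64, 64), (128, 128), (256, 256)]
        x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
        for _ in range(steps_per_stage):
            t -= 1
            x = denoise_step(x, t)
    return x
```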
AAAI Conference 2023 Conference Paper
This paper proposes a Semi-Attention Partition (SAP) method to learn well-aligned part features for occluded person re-identification (re-ID). Currently, the mainstream methods employ either external semantic partition or attention-based partition, and the latter manner is usually better than the former one. Under this background, this paper explores a potential that the weak semantic partition can be a good teacher for the strong attention-based partition. In other words, the attention-based student can substantially surpass its noisy semantic-based teacher, contradicting the common sense that the student usually achieves inferior (or comparable) accuracy. A key to this effect is: the proposed SAP encourages the attention-based partition of the (transformer) student to be partially consistent with the semantic-based teacher partition through knowledge distillation, yielding the so-called semi-attention. Such partial consistency allows the student to have both consistency and reasonable conflict with the noisy teacher. More specifically, on the one hand, the attention is guided by the semantic partition from the teacher. On the other hand, the attention mechanism itself still has some degree of freedom to comply with the inherent similarity between different patches, thus gaining resistance against noisy supervision. Moreover, we integrate a battery of well-engineered designs into SAP to reinforce their cooperation (e.g., multiple forms of teacher-student consistency), as well as to promote reasonable conflict (e.g., mutual absorbing partition refinement and a supervision signal dropout strategy). Experimental results confirm that the transformer student achieves substantial improvement after this semi-attention learning scheme, and produces new state-of-the-art accuracy on several standard re-ID benchmarks.
AAAI Conference 2023 Conference Paper
Stroke extraction of Chinese characters plays an important role in the field of character recognition and generation. Most existing character stroke extraction methods focus on image morphological features. These methods usually lead to errors in cross-stroke extraction and stroke matching because they rarely use stroke semantics and prior information. In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration. This method consists of three parts: image registration-based stroke registration that establishes the rough registration of the reference strokes and the target as prior information; image semantic segmentation-based stroke segmentation that preliminarily separates target strokes into seven categories; and high-precision extraction of single strokes. In the stroke registration, we propose a structure deformable image registration network to achieve structure-deformable transformation while maintaining the stable morphology of single strokes for character images with complex structures. In order to verify the effectiveness of the method, we construct two datasets, respectively, for calligraphy characters and regular handwriting characters. The experimental results show that our method strongly outperforms the baselines. Code is available at https://github.com/MengLi-l1/StrokeExtraction.
NeurIPS Conference 2023 Conference Paper
This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Different from prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits descendant-class discrimination. We think it well imitates human visual recognition, i.e., humans may use the ancestor class as a prompt to draw focus on the subtle differences among descendant classes. We model this prompting mechanism into a Transformer with Hierarchical Prompting (TransHP). TransHP consists of three steps: 1) learning a set of prompt tokens to represent the coarse (ancestor) classes, 2) on-the-fly predicting the coarse class of the input image at an intermediate block, and 3) injecting the prompt token of the predicted coarse class into the intermediate feature. Though the parameters of TransHP remain the same for all input images, the injected coarse-class prompt conditions (modifies) the subsequent feature extraction and encourages a dynamic focus on relatively subtle differences among the descendant classes. Extensive experiments show that TransHP improves image classification in accuracy (e.g., improving ViT-B/16 by +2.83% ImageNet classification accuracy), training data efficiency (e.g., +12.69% improvement under 10% ImageNet training data), and model explainability. Moreover, TransHP also performs favorably against prior HIC methods, showing that TransHP well exploits the hierarchical information. The code is available at: https://github.com/WangWenhao0716/TransHP.
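The three steps can be sketched as a small module placed at an intermediate block; the concatenation below is one plausible injection scheme (names hypothetical), with the coarse head trained by its own classification loss:

```python
import torch
import torch.nn as nn

class CoarsePromptInjection(nn.Module):
    """At an intermediate block: predict the coarse class from the CLS
    token, then append that class's learned prompt token so subsequent
    blocks are conditioned on the ancestor class."""
    def __init__(self, dim, num_coarse):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_coarse, dim) * 0.02)
        self.coarse_head = nn.Linear(dim, num_coarse)

    def forward(self, tokens):                       # tokens: (B, N, D)
        coarse_logits = self.coarse_head(tokens[:, 0])
        prompt = self.prompts[coarse_logits.argmax(-1)]        # (B, D)
        tokens = torch.cat([tokens, prompt.unsqueeze(1)], dim=1)
        return tokens, coarse_logits   # coarse_logits also get a CE loss
```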
AAMAS Conference 2023 Conference Paper
Autonomous systems have the potential to significantly boost the productivity of our society. However, safety concerns are the primary impediment to the widespread use of autonomous systems. Safe decision-making for autonomous systems is a crucial step toward developing safe autonomous systems. My Ph.D. topic focuses on a formal approach to efficiently generating verifiable safe decision-making for autonomous systems. I have designed and implemented a three-stage formal approach to addressing the issue, and I have validated my approach with a real-world autonomous logistic system consisting of three autonomous mobile robots. This paper summarizes my current work and outlines my future work.
IJCAI Conference 2023 Conference Paper
In this paper, we introduce semi-supervised video object segmentation (VOS) to panoptic wild scenes and present a large-scale benchmark as well as a baseline method for it. Previous benchmarks for VOS with sparse annotations are not sufficient to train or evaluate a model that needs to process all possible objects in real-world scenarios. Our new benchmark (VIPOSeg) contains exhaustive object annotations and covers various real-world object categories which are carefully divided into subsets of thing/stuff and seen/unseen classes for comprehensive evaluation. Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which associates multiple objects by panoptic identification in a pyramid architecture on multiple scales. Experimental results show that VIPOSeg can not only boost the performance of VOS models by panoptic training but also evaluate them comprehensively in panoptic scenes. Previous methods for classic VOS still need to improve in performance and efficiency when dealing with panoptic scenes, while our PAOT achieves SOTA performance with good efficiency on VIPOSeg and previous VOS benchmarks. PAOT also ranks 1st in the VOT2022 challenge. Our dataset and code are available at https://github.com/yoxu515/VIPOSeg-Benchmark.
NeurIPS Conference 2022 Conference Paper
This paper focuses on developing a more effective method of hierarchical propagation for semi-supervised Video Object Segmentation (VOS). Based on vision transformers, the recently-developed Associating Objects with Transformers (AOT) approach introduces hierarchical propagation into VOS and has shown promising results. The hierarchical propagation can gradually propagate information from past frames to the current frame and transfer the current frame feature from object-agnostic to object-specific. However, the increase of object-specific information will inevitably lead to the loss of object-agnostic visual information in deep propagation layers. To solve such a problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach. Firstly, DeAOT decouples the hierarchical propagation of object-agnostic and object-specific embeddings by handling them in two independent branches. Secondly, to compensate for the additional computation from dual-branch propagation, we propose an efficient module for constructing hierarchical propagation, i.e., the Gated Propagation Module, which is carefully designed with single-head attention. Extensive experiments show that DeAOT significantly outperforms AOT in both accuracy and efficiency. On YouTube-VOS, DeAOT can achieve 86.0% at 22.4fps and 82.0% at 53.4fps. Without test-time augmentations, we achieve new state-of-the-art performance on four benchmarks, i.e., YouTube-VOS (86.2%), DAVIS 2017 (86.2%), DAVIS 2016 (92.9%), and VOT 2020 (0.622 EAO). Project page: https://github.com/z-x-yang/AOT.
AAAI Conference 2022 Conference Paper
Clustering is important for domain adaptive person re-identification (re-ID). A majority of unsupervised domain adaptation (UDA) methods conduct clustering on the target domain and then use the generated pseudo labels for adaptive training. Albeit important, the clustering pipeline adopted by the current literature is quite standard and lacks consideration of two characteristics of re-ID, i.e., 1) a single person has varied feature distributions across multiple cameras, and 2) a person's occurrences in the same camera are usually temporally continuous. We argue that the multi-camera distribution hinders clustering because it enlarges the intra-class distances. In contrast, the temporal continuity prior is beneficial, because it offers a clue for distinguishing look-alike persons (who are temporally far away from each other). These two insights motivate us to propose a novel Divide-And-Regroup Clustering (DARC) pipeline for re-ID UDA. Specifically, DARC divides the unlabeled data into multiple camera-specific groups and conducts local clustering within each camera. Afterwards, it regroups those local clusters potentially belonging to the same person into a unity. Through this divide-and-regroup pipeline, DARC avoids directly clustering across multiple cameras and focuses on the feature distribution within each individual camera. Moreover, during the local clustering, DARC uses the temporal continuity prior to distinguish look-alike persons and thus reduces false positive pseudo labels. Consequently, DARC effectively reduces clustering errors and improves UDA. Importantly, experimental results show that DARC is compatible with many pseudo-label-based UDA methods and brings general improvements. Based on a recent UDA method, DARC advances the state of the art (e.g., 85.1% mAP on MSMT-to-Market and 83.1% mAP on PersonX-to-Market).
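A toy rendition of the divide-and-regroup pipeline using DBSCAN (the clustering algorithm and thresholds are assumptions, not the paper's choices): cluster within each camera, then regroup local clusters across cameras by clustering their centroids.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def divide_and_regroup(feats, cam_ids, local_eps=0.5, merge_eps=0.6):
    """Cluster within each camera, then regroup local clusters across
    cameras by clustering their centroids."""
    centroids, members = [], []
    for cam in np.unique(cam_ids):
        idx = np.where(cam_ids == cam)[0]
        local = DBSCAN(eps=local_eps, min_samples=2).fit_predict(feats[idx])
        for c in set(local) - {-1}:                  # skip DBSCAN noise
            sel = idx[local == c]
            centroids.append(feats[sel].mean(0))
            members.append(sel)
    merged = DBSCAN(eps=merge_eps, min_samples=1).fit_predict(np.stack(centroids))
    labels = -np.ones(len(feats), dtype=int)         # -1 = unclustered
    for m, sel in zip(merged, members):
        labels[sel] = m
    return labels
```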
NeurIPS Conference 2022 Conference Paper
Few-shot segmentation (FSS) aims at performing semantic segmentation on novel classes given a few annotated support samples. With a rethink of recent advances, we find that the current FSS framework has deviated far from the supervised segmentation framework: Given the deep features, FSS methods typically use an intricate decoder to perform sophisticated pixel-wise matching, while the supervised segmentation methods use a simple linear classification head. Due to the intricacy of the decoder and its matching pipeline, it is not easy to follow such an FSS framework. This paper revives the straightforward framework of "feature extractor + linear classification head" and proposes a novel Feature-Proxy Transformer (FPTrans) method, in which the "proxy" is the vector representing a semantic class in the linear classification head. FPTrans has two keypoints for learning discriminative features and representative proxies: 1) To better utilize the limited support samples, the feature extractor makes the query interact with the support features from bottom to top layers using a novel prompting strategy. 2) FPTrans uses multiple local background proxies (instead of a single one) because the background is not homogeneous and may contain some novel foreground regions. These two keypoints are easily integrated into the vision transformer backbone with the prompting mechanism in the transformer. Given the learned features and proxies, FPTrans directly compares their cosine similarity for segmentation. Although the framework is straightforward, we show that FPTrans achieves competitive FSS accuracy on par with state-of-the-art decoder-based methods.
NeurIPS Conference 2022 Conference Paper
Prevalent semantic segmentation solutions are, in essence, a dense discriminative classifier of p(class|pixel feature). Though straightforward, this de facto paradigm neglects the underlying data distribution p(pixel feature|class), and struggles to identify out-of-distribution data. Going beyond this, we propose GMMSeg, a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class). For each class, GMMSeg builds Gaussian Mixture Models (GMMs) via Expectation-Maximization (EM), so as to capture class-conditional densities. Meanwhile, the deep dense representation is end-to-end trained in a discriminative manner, i.e., maximizing p(class|pixel feature). This endows GMMSeg with the strengths of both generative and discriminative models. With a variety of segmentation architectures and backbones, GMMSeg outperforms the discriminative counterparts on three closed-set datasets. More impressively, without any modification, GMMSeg even performs well on open-world datasets. We believe this work brings fundamental insights into the related fields.
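A minimal offline sketch of the generative classifier (GMMSeg itself fits the GMMs online with EM inside end-to-end training, so this is only illustrative): one GaussianMixture per class, with prediction by maximum class-conditional log-likelihood plus a log prior.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class GenerativePixelClassifier:
    """One GMM per class over pixel features; predict by maximum
    class-conditional log-likelihood plus a log class prior."""
    def __init__(self, n_components=3):
        self.n_components = n_components
        self.gmms, self.log_priors = {}, {}

    def fit(self, feats, labels):
        for c in np.unique(labels):
            self.gmms[c] = GaussianMixture(self.n_components).fit(feats[labels == c])
            self.log_priors[c] = np.log((labels == c).mean())

    def predict(self, feats):
        classes = sorted(self.gmms)
        scores = np.stack([self.gmms[c].score_samples(feats) + self.log_priors[c]
                           for c in classes], axis=1)
        return np.asarray(classes)[scores.argmax(1)]
```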
JBHI Journal 2022 Journal Article
With the development of sensor technology and learning algorithms, multimodal emotion recognition has attracted widespread attention. Many existing studies on emotion recognition mainly focus on normal-hearing people. However, due to hearing loss, deaf people cannot express emotions with words and may have a greater need for emotion recognition. In this paper, a deep belief network (DBN) was utilized to classify three categories of emotion through electroencephalograph (EEG) signals and facial expressions. Signals from 15 deaf subjects were recorded while they watched emotional movie clips. Our system uses a 1-s window without overlap to segment the EEG signals in five frequency bands, and then the differential entropy (DE) feature is extracted. The DE features of EEG and facial expression images serve as the multimodal input for subject-dependent emotion recognition. To avoid feature redundancy, we select the top 12 EEG electrode channels (FP2, FP1, FT7, FPZ, F7, T8, F8, CB2, CB1, FT8, T7, TP8) in the gamma band and 30 facial expression features (the areas around the eyes and eyebrows) according to the largest weight values. The results show that the classification accuracy is 99.92% with feature selection in deaf emotion recognition. Moreover, investigations of brain activity reveal that deaf brain activity changes mainly in the beta and gamma bands, and that the brain regions affected by emotions are mainly distributed in the prefrontal and outer temporal lobes.
AAAI Conference 2022 Conference Paper
For a monocular camera-based navigation system, if we could effectively explore scene geometric cues from RGB images, the geometry information will significantly facilitate the efficiency of the navigation system. Motivated by this, we propose a highly efficient point-goal navigation framework, dubbed Geo-Nav. In a nutshell, Geo-Nav consists of two parts: a visual perception part and a navigation part. In the visual perception part, we firstly propose a Self-supervised Depth Estimation network (SDE) specially tailored for the monocular camera-based navigation agent. SDE learns a mapping from an RGB input image to its corresponding depth image by exploring scene geometric constraints in a self-consistency manner. Then, in order to achieve a representative visual representation from the RGB inputs and learned depth images, we propose a Cross-modality Pyramid Fusion module (CPF). Concretely, CPF computes a patch-wise cross-modality correlation between different modal features and exploits the correlation to fuse and enhance features at each scale. Thanks to the patch-wise nature of CPF, we can fuse feature maps at high resolution, allowing the visual network to perceive more image details. In the navigation part, the extracted visual representations are fed to a navigation policy network to learn how to map the visual representations to agent actions effectively. Extensive experiments on the Gibson benchmark demonstrate that Geo-Nav outperforms the state-of-the-art in terms of efficiency and effectiveness.
NeurIPS Conference 2022 Conference Paper
Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of the video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model, TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
JBHI Journal 2021 Journal Article
Transfer function analysis (TFA) is extensively used to assess human physiological functions. However, extracting parameters from TFA is not usually optimized for detecting impaired function. In this study, we propose to use data-driven approaches to improve the performance of TFA in assessing blood flow control in the brain (dynamic cerebral autoregulation, dCA). Data were collected from two distinct groups of subjects deemed to have normal and impaired dCA. Continuous arterial blood pressure (ABP) and cerebral blood flow velocity (CBFV) were simultaneously recorded for approximately 10 mins in 82 subjects (including 41 healthy controls) to give 328 labeled samples of the TFA variables. The recordings were further divided into 4,294 short data segments to generate 17,176 unlabeled samples of the TFA variables. We optimized TFA post-processing with a generic semi-supervised learning strategy and a novel semi-supervised stacked ensemble learning (SSEL) strategy for classification into normal and impaired dCA. The generic strategy led to a performance with no significant difference to that of the conventional dCA analysis methods, whereas the proposed new strategy boosted the performance of TFA to an accuracy of 93.3%. To our knowledge, this is the best dCA discrimination performance obtained to date and the first attempt at optimizing TFA through machine learning techniques. Equivalent methods can potentially also be applied to assessing a wide spectrum of other human physiological functions.
NeurIPS Conference 2021 Conference Paper
This paper investigates how to realize better and more efficient embedding learning to tackle the semi-supervised video object segmentation under challenging multi-object scenarios. The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming multiple times computing resources. To solve the problem, we propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly. In detail, AOT employs an identification mechanism to associate multiple targets into the same high-dimensional embedding space. Thus, we can simultaneously process multiple objects' matching and segmentation decoding as efficiently as processing a single object. For sufficiently modeling multi-object association, a Long Short-Term Transformer is designed for constructing hierarchical matching and propagation. We conduct extensive experiments on both multi-object and single-object benchmarks to examine AOT variant networks with different complexities. Particularly, our R50-AOT-L outperforms all the state-of-the-art competitors on three popular benchmarks, i.e., YouTube-VOS (84.1% J&F), DAVIS 2017 (84.9%), and DAVIS 2016 (91.1%), while keeping more than 3X faster multi-object run-time. Meanwhile, our AOT-T can maintain real-time multi-object speed on the above benchmarks. Based on AOT, we ranked 1st in the 3rd Large-scale VOS Challenge.
AAAI Conference 2021 Conference Paper
The goal of person search is to localize and match query persons from scene images. For high efficiency, one-step methods have been developed to jointly handle the pedestrian detection and identification sub-tasks using a single network. There are two major challenges in the current one-step approaches. One is the mutual interference between the optimization objectives of multiple sub-tasks. The other is the sub-optimal identification feature learning caused by small batch size when end-to-end training. To overcome these problems, we propose a decoupled and memory-reinforced network (DMRNet). Specifically, to reconcile the conflicts of multiple objectives, we simplify the standard tightly coupled pipelines and establish a deeply decoupled multi-task learning framework. Further, we build a memory-reinforced mechanism to boost the identification feature learning. By queuing the identification features of recently accessed instances into a memory bank, the mechanism augments the similarity pair construction for pairwise metric learning. For better encoding consistency of the stored features, a slow-moving average of the network is applied for extracting these features. In this way, the dual networks reinforce each other and converge to robust solution states. Experimentally, the proposed method obtains 93.2% and 46.9% mAP on CUHK-SYSU and PRW datasets, which exceeds all the existing one-step methods.
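The memory-reinforced mechanism can be sketched as a FIFO feature queue (a generic memory-bank pattern, not the exact DMRNet module): stored identification features enlarge the similarity-pair pool beyond the current mini-batch.

```python
import torch

class FeatureMemory:
    """FIFO bank of recent identification features: enlarges the pool of
    positives/negatives for pairwise metric learning beyond the batch."""
    def __init__(self, size, dim):
        self.bank = torch.zeros(size, dim)
        self.ids = torch.full((size,), -1, dtype=torch.long)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, feats, ids):
        for f, i in zip(feats, ids):
            self.bank[self.ptr], self.ids[self.ptr] = f, i
            self.ptr = (self.ptr + 1) % len(self.bank)

    def pairs(self, feats, ids):
        valid = self.ids >= 0
        sim = feats @ self.bank[valid].t()           # (batch, stored)
        pos_mask = ids.unsqueeze(1) == self.ids[valid].unsqueeze(0)
        return sim, pos_mask
```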
NeurIPS Conference 2021 Conference Paper
Few-shot segmentation aims to train a segmentation model that can fast adapt to novel classes with few exemplars. The conventional training paradigm is to learn to make predictions on query images conditioned on the features from support images. Previous methods only utilized the semantic-level prototypes of support images as the conditional information. These methods cannot utilize all pixel-wise support information for the query predictions, which is however critical for the segmentation task. In this paper, we focus on utilizing pixel-wise relationships between support and target images to facilitate the few-shot semantic segmentation task. We design a novel Cycle-Consistent Transformer (CyCTR) module to aggregate pixel-wise support features into query ones. CyCTR performs cross-attention between features from different images, i.e., support and query images. We observe that there may exist unexpected irrelevant pixel-level support features. Directly performing cross-attention may aggregate these features from support to query and bias the query features. Thus, we propose using a novel cycle-consistent attention mechanism to filter out possible harmful support features and encourage query features to attend to the most informative pixels from support images. Experiments on all few-shot segmentation benchmarks demonstrate that our proposed CyCTR leads to remarkable improvement compared to previous state-of-the-art methods. Specifically, on Pascal-5^i and COCO-20^i datasets, we achieve 66.6% and 45.6% mIoU for 5-shot segmentation, outperforming previous state-of-the-art by 4.6% and 7.1% respectively.
AAAI Conference 2021 Conference Paper
Legal Judgment Prediction (LJP) is a key problem in legal artificial intelligence, which aims to predict a law case's judgment based on a given text describing the facts of the case. Most previous works treat LJP as a text classification task and generally adopt deep neural network (DNN) based methods to solve it. However, existing DNN-based models are data-hungry, and it is hard to explain which legal knowledge they rely on to make a prediction. Thus, injecting legal knowledge into neural networks to interpret the model and improve performance remains a significant problem. In this paper, we propose to represent declarative legal knowledge as a set of first-order logic rules and integrate these logic rules into a co-attention network-based model explicitly. The use of logic rules enhances neural networks with direct logical reasoning capabilities and makes the model more interpretable. We take the private loan scenario as a case study and demonstrate the effectiveness of the proposed method through comprehensive experiments and analyses conducted on the collected dataset.
AAAI Conference 2021 Conference Paper
Existing image segmentation networks mainly leverage large-scale labeled datasets to attain high accuracy. However, labeling medical images is very expensive since it requires sophisticated expert knowledge. Thus, it is more desirable to employ only a few labeled data in pursuing high segmentation performance. In this paper, we develop a data augmentation method for one-shot brain magnetic resonance imaging (MRI) image segmentation which exploits only one labeled MRI image (named atlas) and a few unlabeled images. In particular, we propose to learn the probability distributions of deformations (including shapes and intensities) of different unlabeled MRI images with respect to the atlas via 3D variational autoencoders (VAEs). In this manner, our method is able to exploit the learned distributions of image deformations to generate new authentic brain MRI images, and the number of generated samples will be sufficient to train a deep segmentation network. Furthermore, we introduce a new standard segmentation benchmark to evaluate the generalization performance of a segmentation network through a cross-dataset setting (collected from different sources). Extensive experiments demonstrate that our method outperforms the state-of-the-art one-shot medical segmentation methods. Our code has been released at https://github.com/dyh127/Modeling-the-Probabilistic-Distribution-of-Unlabeled-Data.
NeurIPS Conference 2021 Conference Paper
The burst of applications empowered by massive data has aroused unprecedented privacy concerns in the AI society. Currently, data confidentiality protection is a core issue in deep model training. Federated Learning (FL), which enables privacy-preserving training across multiple silos, has gained rising popularity for its parameter-only communication. However, previous works have shown that FL suffers a significant performance drop if the data distributions are heterogeneous among different clients, especially when the clients have cross-domain characteristics, such as traffic, aerial, and indoor imagery. To address this challenging problem, we propose a novel idea, PartialFed, which loads a subset of the global model's parameters rather than the entire model used in most previous works. We first validate our algorithm with manually decided loading strategies inspired by various expert priors, named PartialFed-Fix. Then we develop PartialFed-Adaptive, which automatically selects a personalized loading strategy for each client. The superiority of our algorithm is proved by demonstrating new state-of-the-art results on cross-domain federated classification and detection. In particular, solely by initializing a small fraction of layers locally, we improve the performance of FedAvg on Office-Home and UODB by 4.88% and 2.65%, respectively. Further studies show that the adaptive strategy performs significantly better on domains with large deviation, e.g., improving AP50 by 4.03% and 4.89% on aerial and medical image detection compared to FedAvg.
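The core loading idea fits in a few lines; which keys to load is exactly the strategy that PartialFed-Fix fixes by hand and PartialFed-Adaptive learns per client (the helper below is a hypothetical simplification):

```python
import torch

def load_partial(client_model, global_state, load_keys):
    """Overwrite only the chosen subset of the client's parameters with
    the global ones; everything else stays personalized."""
    own = client_model.state_dict()
    own.update({k: v for k, v in global_state.items() if k in load_keys})
    client_model.load_state_dict(own)
```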
YNIMG Journal 2021 Journal Article
AAAI Conference 2020 Conference Paper
This paper focuses on energy-model-based structured output prediction. Though inheriting the benefits of energy-based models in handling sophisticated cases, previous deep energy-based methods suffered from the substantial computation cost introduced by the enormous number of gradient steps in the inference process. To boost the efficiency and accuracy of energy-based models on structured output prediction, we propose a novel method analogous to the adversarial learning framework. Specifically, in our proposed framework, the generator consists of an inference network while the discriminator is comprised of an energy network. The two sub-modules, i.e., the inference network and the energy network, can benefit each other mutually during the whole computation process. On the one hand, our modified inference network can boost efficiency by predicting good initializations and reducing the search space for the inference process; on the other hand, inheriting the benefits of the energy network, the energy module in our network can evaluate the quality of the output generated by the inference network and correspondingly provide a resourceful guide for the training of the inference network. In the ideal case, the adversarial learning strategy ensures that the two sub-modules reach an equilibrium state after enough steps. We conduct extensive experiments to verify the effectiveness and efficiency of our proposed method.
NeurIPS Conference 2020 Conference Paper
We aim at the problem of One-Shot Unsupervised Domain Adaptation. Unlike traditional Unsupervised Domain Adaptation, it assumes that only one unlabeled target sample is available when learning to adapt. This setting is realistic but more challenging: conventional adaptation approaches are prone to failure due to the scarcity of unlabeled target data. To this end, we propose a novel Adversarial Style Mining (ASM) approach, which combines a style transfer module and a task-specific module in an adversarial manner. Specifically, the style transfer module iteratively searches for harder stylized images around the one-shot target sample according to the current learning state, leading the task model to explore potential styles that are difficult to solve in the almost unseen target domain, thus boosting adaptation performance in a data-scarce scenario. The adversarial learning framework makes the style transfer module and the task-specific module benefit each other during the competition. Extensive experiments on both cross-domain classification and segmentation benchmarks verify that ASM achieves state-of-the-art adaptation performance under the challenging one-shot setting.
NeurIPS Conference 2020 Conference Paper
Zero-shot semantic segmentation aims to recognize the semantics of pixels from unseen categories with zero training samples. Previous practice [1] proposed to train the classifiers for unseen categories using visual features generated from semantic word embeddings. However, the generator is merely learned on the seen categories while no constraint is applied to the unseen categories, leading to poor generalization ability. In this work, we propose a Consistent Structural Relation Learning (CSRL) approach to constrain the generation of unseen visual features by exploiting the structural relations between seen and unseen categories. We observe that different categories usually exhibit similar relations in both the semantic word embedding space and the visual feature space. This observation motivates us to harness the similarity of category-level relations in the semantic word embedding space to learn a better visual feature generator. Concretely, by exploring pair-wise and list-wise structures, we constrain the relations of generated visual features to be consistent with their counterparts in the semantic word embedding space. In this way, the relations between seen and unseen categories are transferred to implicitly constrain the generator to produce relation-consistent unseen visual features. We conduct extensive experiments on the Pascal-VOC and Pascal-Context benchmarks. The proposed CSRL significantly outperforms existing state-of-the-art methods by a large margin, yielding improvements of ~7-12% on Pascal-VOC and ~2-5% on Pascal-Context.
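A sketch of the pair-wise part of the relation-consistency constraint, under the assumption that relations are measured by cosine similarity (the list-wise term and the exact distance are not specified by the abstract):

```python
import torch
import torch.nn.functional as F

def pairwise_relation_consistency(word_embs, gen_feats):
    """Match the cosine-similarity structure of generated visual features
    to that of the semantic word embeddings over the same categories."""
    rel_sem = F.normalize(word_embs) @ F.normalize(word_embs).t()
    rel_vis = F.normalize(gen_feats) @ F.normalize(gen_feats).t()
    return F.mse_loss(rel_vis, rel_sem)
```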
AAAI Conference 2020 Conference Paper
Actor and action video segmentation with language queries aims to segment out the objects referred to by the expression in the video. This process requires comprehensive language reasoning and fine-grained video understanding. Previous methods mainly leverage dynamic convolutional networks to match visual and semantic representations. However, dynamic convolution neglects spatial context when processing each region in the frame and thus struggles to segment similar objects in complex scenarios. To address this limitation, we construct a context modulated dynamic convolutional network. Specifically, we propose a context modulated dynamic convolutional operation in the proposed framework. The kernels for a specific region are generated from both the language sentences and the surrounding context features. Moreover, we devise a temporal encoder to incorporate motion into the visual features to further match the query descriptions. Extensive experiments on two benchmark datasets, Actor-Action Dataset Sentences (A2D Sentences) and J-HMDB Sentences, demonstrate that our proposed approach notably outperforms state-of-the-art methods.
IJCAI Conference 2020 Conference Paper
Dataless text classification has attracted increasing attention recently. It only needs a few seed words per category to classify documents, which is much cheaper than supervised text classification that requires massive labeling effort. However, most existing models focus on long texts but achieve unsatisfactory performance on short texts, which have become increasingly popular on the Internet. In this paper, we first propose a novel model named Seeded Biterm Topic Model (SeedBTM), extending BTM to solve the problem of dataless short text classification with seed words. It takes advantage of both word co-occurrence information in the topic model and category-word similarity from widely used word embeddings as prior topic-in-set knowledge. Moreover, with the same approach, we also propose the Seeded Twitter Biterm Topic Model (SeedTBTM), which extends Twitter-BTM and utilizes additional user information to achieve higher classification accuracy. Experimental results on five real short-text datasets show that our models outperform state-of-the-art methods, and perform especially well when the categories are overlapping and interrelated.
AAAI Conference 2020 Conference Paper
This work focuses on extremely low-light image enhancement, which aims to improve image brightness and reveal hidden information in darkened areas. Recently, image enhancement approaches have made impressive progress. However, existing methods still suffer from three main problems: (1) low-light images are usually high-contrast, and existing methods may fail to recover image details in extremely dark or bright areas; (2) current methods cannot precisely correct the color of low-light images; (3) when object edges are unclear, the pixel-wise loss may treat pixels of different objects equally and produce blurry images. In this paper, we propose a two-stage method called Edge-Enhanced Multi-Exposure Fusion Network (EEMEFN) to enhance extremely low-light images. In the first stage, we employ a multi-exposure fusion module to address the high-contrast and color-bias issues. We synthesize a set of images with different exposure times from a single image and construct an accurate normal-light image by combining well-exposed areas under different illumination conditions. Thus, it can produce realistic initial images with correct color from extremely noisy and low-light inputs. In the second stage, we introduce an edge enhancement module to refine the initial images with the help of edge information, so our method can reconstruct high-quality images with sharp edges while minimizing the pixel-wise loss. Experiments on the See-in-the-Dark dataset indicate that our EEMEFN approach achieves state-of-the-art performance.
AAAI Conference 2020 Conference Paper
Typical video classification methods often divide a video into short clips, run inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for Spatio-TEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high-quality representations from expensive models to capture subtle motion information and lightweight representations from cheap models to cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate the mixture of different representations. Compared with existing approaches, FASTER can reduce the FLOPs by over 10× while maintaining state-of-the-art accuracy across popular datasets, such as Kinetics, UCF-101 and HMDB-51.
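To make the aggregation idea concrete, here is a minimal sketch of mixing clip-level features of different complexities with a plain GRU; FAST-GRU itself has a more specific design, and every dimension and name below is illustrative:

```python
# Recurrent aggregation of a sequence of clip features (some from an expensive
# model, some from a cheap one, projected to a shared dimension beforehand).
import torch
import torch.nn as nn

class ClipAggregator(nn.Module):
    def __init__(self, dim=512, num_classes=400):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, clip_feats):
        # clip_feats: (B, T, dim), e.g. one expensive clip followed by cheap ones
        _, h = self.gru(clip_feats)
        return self.fc(h[-1])  # video-level prediction from the final state

agg = ClipAggregator()
logits = agg(torch.randn(2, 8, 512))  # 2 videos, 8 clips each
```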
TIST Journal 2020 Journal Article
The effective training of supervised Person Re-identification (Re-ID) models requires sufficient pairwise labeled data. However, when annotation resources are limited, it is difficult to collect pairwise labeled data. We consider a challenging and practical problem called Early Active Learning, which applies to the early stage of experiments, when no pre-labeled samples are available as references for human annotators. Previous early active learning methods suffer from two limitations for Re-ID. First, these instance-based algorithms select instances rather than pairs, which can result in missing optimal pairs for Re-ID. Second, most of these methods only consider the representativeness of instances, which can result in selecting less diverse and less informative pairs. To overcome these limitations, we propose a novel pair-based active learning method for Re-ID. Our algorithm selects pairs instead of instances from the entire dataset for annotation. Besides representativeness, we further take into account uncertainty and diversity in terms of pairwise relations. Therefore, our algorithm can produce the most representative, informative, and diverse pairs for Re-ID data annotation. Extensive experimental results on five benchmark Re-ID datasets demonstrate the superiority of the proposed pair-based early active learning algorithm.
AAAI Conference 2020 Conference Paper
This paper focuses on the problem of person tube (a sequence of bounding boxes that encloses a person in a video) retrieval using a natural language query. Unlike images in person re-identification (re-ID) or person search, a person tube contains abundant action information in addition to appearance. We exploit 2D and 3D residual networks (ResNets) to extract the appearance and action representations, respectively. To transform tubes and descriptions into a shared latent space where data from the two modalities can be compared directly, we propose a Multi-Scale Structure Preservation (MSSP) approach. MSSP evenly splits a person tube into several element-tubes, whose features are extracted by the two ResNets. Any number of consecutive element-tubes forms a sub-tube. MSSP considers the following constraints for sub-tubes and descriptions in the shared space. 1) Bidirectional ranking: matching sub-tubes (resp. descriptions) should rank higher than incorrect ones for each description (resp. sub-tube). 2) External structure preservation: sub-tubes (resp. descriptions) from different persons should stay away from each other. 3) Internal structure preservation: sub-tubes (resp. descriptions) from the same person should be close to each other. Experimental results on person tube retrieval via language description and two other related tasks demonstrate the efficacy of MSSP.
NeurIPS Conference 2020 Conference Paper
Domain adaptive semantic segmentation aims to train a model that performs satisfactory pixel-level predictions on the target domain with only out-of-domain (source) annotations. The conventional solution is to minimize the discrepancy between source and target to enable effective knowledge transfer. Previous domain discrepancy minimization methods are mainly based on adversarial training. They tend to consider the domain discrepancy globally, which ignores pixel-wise relationships and is less discriminative. In this paper, we propose to build pixel-level cycle associations between source and target pixel pairs and contrastively strengthen their connections to diminish the domain gap and make the features more discriminative. To the best of our knowledge, this is a new perspective for tackling this challenging task. Experimental results on two representative domain adaptation benchmarks, i.e., GTAV $\rightarrow$ Cityscapes and SYNTHIA $\rightarrow$ Cityscapes, verify the effectiveness of our proposed method and demonstrate that it performs favorably against previous state-of-the-art methods. Our method can be trained end-to-end in one stage and introduces no additional parameters; it is expected to serve as a general framework and help ease future research in domain adaptive semantic segmentation. Code is available at https://github.com/kgl-prml/Pixel-Level-Cycle-Association.
AAAI Conference 2020 Conference Paper
In this paper, we introduce Random Erasing, a new data augmentation method for training the convolutional neural network (CNN). In training, Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated with most of the CNN-based recognition models. Albeit simple, Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification. Code is available at: https://github.com/zhunzhong07/Random-Erasing.
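The procedure is simple enough to sketch directly. Below is a compact NumPy version following the steps the abstract describes, assuming a uint8 image; the hyper-parameter values are illustrative:

```python
# Random Erasing sketch: pick a rectangle of random area/aspect ratio and
# overwrite it with random pixel values, leaving the rest of the image intact.
import numpy as np

def random_erasing(img, p=0.5, area_range=(0.02, 0.4), aspect_range=(0.3, 3.33)):
    """img: (H, W, C) uint8 array; returns a copy with one rectangle randomized."""
    if np.random.rand() > p:
        return img
    h, w = img.shape[:2]
    out = img.copy()
    for _ in range(100):  # retry until a sampled box fits inside the image
        area = np.random.uniform(*area_range) * h * w
        aspect = np.random.uniform(*aspect_range)
        eh = int(round(np.sqrt(area * aspect)))
        ew = int(round(np.sqrt(area / aspect)))
        if 0 < eh < h and 0 < ew < w:
            top = np.random.randint(0, h - eh)
            left = np.random.randint(0, w - ew)
            out[top:top + eh, left:left + ew] = np.random.randint(
                0, 256, size=(eh, ew) + img.shape[2:], dtype=img.dtype)
            break
    return out

augmented = random_erasing(np.zeros((224, 224, 3), dtype=np.uint8))
```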
JMLR Journal 2020 Journal Article
Co-training is a well-known semi-supervised learning approach that trains classifiers on two or more different views and exchanges pseudo-labels of unlabeled instances in an iterative way. During the co-training process, pseudo-labels of unlabeled instances are very likely to be false, especially in the initial training rounds, yet the standard co-training algorithm adopts a 'draw without replacement' strategy and does not remove these wrongly labeled instances from later training stages. Besides, most traditional co-training approaches are implemented for two-view cases, and their extensions to multi-view scenarios are not intuitive. These issues not only degrade their performance and narrow their range of application but also hamper their fundamental theory. Moreover, there is no optimization model that explains the objective a co-training process manages to optimize. To address these issues, we design a unified self-paced multi-view co-training (SPamCo) framework that draws unlabeled instances with replacement. Two specified co-regularization terms are formulated to develop different strategies for selecting pseudo-labeled instances during training. Both forms share the same optimization strategy, which is consistent with the iteration process in co-training and can be naturally extended to multi-view scenarios. A distributed optimization strategy is also introduced to train the classifier of each view in parallel to further improve the efficiency of the algorithm. Furthermore, the SPamCo algorithm is proved to be PAC-learnable, supporting its theoretical soundness. Experiments conducted on synthetic, text categorization, person re-identification, image recognition and object detection datasets substantiate the superiority of the proposed method.
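A schematic two-view version of the draw-with-replacement idea, with off-the-shelf logistic regressions standing in for the view classifiers (the real SPamCo optimizes specific co-regularization terms; this sketch only re-draws the pseudo-labeled pool from scratch each round):

```python
# Two-view co-training where the pseudo-labeled pool is rebuilt every round
# (draw WITH replacement), so early mistakes are not locked in.
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1, X2, y, labeled, unlabeled, rounds=5, add_per_round=20):
    views = [LogisticRegression(max_iter=1000), LogisticRegression(max_iter=1000)]
    pseudo = {}  # index -> pseudo-label, re-drawn from scratch each round
    for _ in range(rounds):
        idx = np.concatenate([labeled, np.array(sorted(pseudo), dtype=int)])
        lab = np.concatenate([y[labeled],
                              np.array([pseudo[i] for i in sorted(pseudo)])])
        for model, X in zip(views, (X1, X2)):
            model.fit(X[idx], lab)
        # rebuild the pseudo pool from the current averaged confidences
        conf = (views[0].predict_proba(X1[unlabeled]) +
                views[1].predict_proba(X2[unlabeled])) / 2
        best = np.argsort(conf.max(axis=1))[::-1][:add_per_round]
        pseudo = {int(unlabeled[i]): int(conf[i].argmax()) for i in best}
    return views

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(100, 8)), rng.normal(size=(100, 8))
y = (X1[:, 0] + X2[:, 0] > 0).astype(int)
labeled = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]])
unlabeled = np.setdiff1d(np.arange(100), labeled)
views = co_train(X1, X2, y, labeled, unlabeled)
```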
AAAI Conference 2020 Conference Paper
Egocentric video recognition is a natural testbed for diverse interaction reasoning. Due to the large action vocabulary in egocentric video datasets, recent studies usually adopt a two-branch structure for action recognition, i.e., one branch for verb classification and the other for noun classification. However, the correlation between the verb and noun branches has been largely ignored. Besides, the two branches fail to exploit local features due to the absence of a position-aware attention mechanism. In this paper, we propose a novel Symbiotic Attention framework leveraging Privileged information (SAP) for egocentric video recognition. Finer position-aware object detection features can facilitate understanding of the actor's interaction with the object. We introduce these features into action recognition and regard them as privileged information. Our framework enables mutual communication among the verb branch, the noun branch, and the privileged information. This communication process not only injects local details into global features, but also exploits implicit guidance about the spatio-temporal position of an ongoing action. We introduce a novel symbiotic attention (SA) mechanism to enable effective communication. It first normalizes the detection-guided features on one branch to underline the action-relevant information from the other branch. SA adaptively enhances the interactions among the three sources. To further catalyze this communication, spatial relations are uncovered for the selection of the most action-relevant information, identifying the most valuable and discriminative features for classification. We validate the effectiveness of SAP quantitatively and qualitatively. Notably, it achieves the state-of-the-art on two large-scale egocentric video datasets.
IJCAI Conference 2020 Conference Paper
This work focuses on the unsupervised scene adaptation problem of learning from both labeled source data and unlabeled target data. Existing approaches focus on minimizing the inter-domain gap between the source and target domains. However, the intra-domain knowledge and inherent uncertainty learned by the network are under-explored. In this paper, we propose an orthogonal method, called memory regularization in vivo, to exploit the intra-domain knowledge and regularize the model training. Specifically, we treat the segmentation model itself as the memory module and reduce the discrepancy between its two classifiers, i.e., the primary classifier and the auxiliary classifier, to reduce prediction inconsistency. Without extra parameters, the proposed method is complementary to most existing domain adaptation methods and can generally improve their performance. Albeit simple, we verify the effectiveness of memory regularization on two synthetic-to-real benchmarks: GTA5 → Cityscapes and SYNTHIA → Cityscapes, yielding +11.1% and +11.3% mIoU improvement over the baseline model, respectively. Besides, a similar +12.0% mIoU improvement is observed on the cross-city benchmark: Cityscapes → Oxford RobotCar.
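A minimal sketch of the regularizer's core: penalizing the prediction discrepancy between the primary and auxiliary classifier heads, here with a symmetric KL term (the exact form used in the paper may differ):

```python
# Consistency penalty between two classifier heads of the same network,
# computed on (possibly unlabeled) inputs; adds no extra parameters.
import torch
import torch.nn.functional as F

def memory_regularization(logits_primary, logits_auxiliary):
    p = F.log_softmax(logits_primary, dim=1)
    q = F.log_softmax(logits_auxiliary, dim=1)
    kl_pq = F.kl_div(q, p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(p, q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

# for segmentation, (B, C, H, W) logits would be flattened to (B*H*W, C) first
reg = memory_regularization(torch.randn(8, 19), torch.randn(8, 19))
```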
AAAI Conference 2019 Conference Paper
Most person re-identification (re-ID) approaches are based on supervised learning, which requires intensive manual annotation for training data. However, it is not only resource-intensive to acquire identity annotations but also impractical to label large-scale real-world data. To relieve this problem, we propose a bottom-up clustering (BUC) approach to jointly optimize a convolutional neural network (CNN) and the relationships among the individual samples. Our algorithm considers two fundamental facts in the re-ID task, i.e., diversity across different identities and similarity within the same identity. Specifically, our algorithm starts by regarding each individual sample as a distinct identity, which maximizes the diversity over identities. It then gradually groups similar samples into one identity, which increases the similarity within each identity. We utilize a diversity regularization term in the bottom-up clustering procedure to balance the data volume of each cluster. Finally, the model achieves an effective trade-off between diversity and similarity. We conduct extensive experiments on large-scale image and video re-ID datasets, including Market-1501, DukeMTMC-reID, MARS and DukeMTMC-VideoReID. The experimental results demonstrate that our algorithm is not only superior to state-of-the-art unsupervised re-ID approaches, but also performs favorably against competing transfer learning and semi-supervised learning methods.
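One bottom-up merging step can be sketched as follows; the size-based penalty is an illustrative stand-in for the paper's diversity regularization term, not the exact criterion:

```python
# Start from singleton clusters and merge the closest pair of clusters,
# penalizing merges that would create overly large clusters.
import numpy as np

def merge_once(feats, labels, lam=0.5):
    ids = np.unique(labels)
    cents = np.stack([feats[labels == i].mean(axis=0) for i in ids])
    sizes = np.array([(labels == i).sum() for i in ids])
    best, best_d = None, np.inf
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            d = np.linalg.norm(cents[a] - cents[b]) + lam * (sizes[a] + sizes[b])
            if d < best_d:
                best, best_d = (ids[a], ids[b]), d
    labels = labels.copy()
    labels[labels == best[1]] = best[0]  # merge the two closest clusters
    return labels

labels = merge_once(np.random.randn(50, 16), np.arange(50))  # 50 singletons -> 49
```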
ICRA Conference 2019 Conference Paper
The remarkable ability of water striders to move on the water surface has attracted many scholars. In particular, their flexible driving mechanism enables the driving legs to conform to the deformation of the water surface, which effectively improves the striders' floating ability and stability. However, existing research on water striders has not produced a flexibly driven robot prototype of this kind. This paper proposes a new water strider robot that can walk on the water surface based on a flexible driving mechanism. The robot's driving legs are made of flexible materials and follow ellipse-like spatial trajectories, like those of water striders, through a limit pin-linkage mechanism. Based on a micro-element cantilever method, the flexible driving effect was analyzed for different elastic moduli and diameters. The analysis shows that the flexible legs can row at a higher frequency before puncturing the water surface and achieve greater work per period compared with rigid legs. Finally, skating experiments with the robot under different stiffnesses and rowing frequencies were carried out. The results verify that the limit frequency of the flexible driving legs and the maximum moving speed of the robot are about 41.3% and 36.2% higher, respectively, than those with rigid legs. Moreover, a similarity analysis of hydrodynamic characteristic constants reveals that the locomotion of the flexibly driven robot is more analogous to biological water striders than that of the rigid one.
AAAI Conference 2019 Conference Paper
This paper proposes a novel algorithm to solve the pose estimation problem from 2D/3D line correspondences, known as the Perspective-n-Line (PnL) problem. It is widely known that minimizing a geometric distance generally yields more accurate results than minimizing an algebraic distance. However, the rational form of the line reprojection distance yields a complicated cost function, which makes solving the first-order optimality conditions infeasible. Furthermore, iterative algorithms based on the reprojection distance are time-consuming for large-scale problems. In contrast to previous works that minimize a cost function based on an algebraic distance that may not approximate the reprojection distance of the line, we design two simple algebraic distances to gradually approximate the reprojection distance. This speeds up the computation and maintains the robustness of the geometric distance. The two algebraic distances result in two polynomial cost functions, which can be solved efficiently. We directly solve the first-order optimality conditions of the first problem with a novel hidden variable method. This algorithm exploits the specific structure of the resulting polynomial system and is therefore more stable than general Gröbner basis polynomial solvers. Then, we minimize the second polynomial cost function by damped Newton iteration, starting from the solution of the first cost function. Experimental results show that the first step of our algorithm is already superior to state-of-the-art algorithms in terms of accuracy and applicability, and faster than algorithms based on Gröbner basis polynomial solvers. The second step yields results comparable to those from minimizing the reprojection distance, but is much more efficient. Owing to its speed, our algorithm is applicable to real-time applications.
AAAI Conference 2019 Conference Paper
In this paper, we propose a new online feature selection algorithm for streaming data. We focus on two problems that remain unaddressed in the literature. First, most existing online feature selection algorithms merely utilize the first-order information of the data streams, despite the fact that second-order information captures correlations between features and significantly improves performance. Second, most online feature selection algorithms rest on a balanced-data presumption, which does not hold in many real-world applications. For example, in fraud detection, positive examples are much rarer than negative examples because most cases are not fraud. The balanced assumption biases the selected features towards the majority class and fails to detect the fraud cases. We propose an Adaptive Sparse Confidence-Weighted (ASCW) algorithm to solve these two problems. We first introduce an $\ell_0$-norm constraint into second-order confidence-weighted (CW) learning for feature selection. Then the original loss is substituted with a cost-sensitive loss function to address the imbalanced data issue. Furthermore, our algorithm maintains multiple sparse CW learners with corresponding cost vectors to dynamically select an optimal cost. We extend the theory of sparse CW learning and analyze its performance behavior in terms of the F-measure. Empirical studies show superior performance over state-of-the-art online learning methods in the online-batch setting.
NeurIPS Conference 2019 Conference Paper
Visual commonsense reasoning (VCR) has been introduced to boost research on cognition-level visual understanding, i.e., a thorough understanding of correlated details of the scene plus an inference with related commonsense knowledge. Recent studies in neuroscience suggest that brain function or cognition can be described as a global and dynamic integration of local neuronal connectivity, which is context-sensitive to specific cognition tasks. Inspired by this idea, towards VCR, we propose a connective cognition network (CCN) to dynamically reorganize the visual neuron connectivity that is contextualized by the meaning of questions and answers. Concretely, we first develop visual neuron connectivity to fully model correlations of visual content. Then, a contextualization process is introduced to fuse the sentence representation with that of visual neurons. Finally, based on the output of contextualized connectivity, we propose directional connectivity to infer answers or rationales. Experimental results on the VCR dataset demonstrate the effectiveness of our method. In particular, in $Q \to AR$ mode, our method is around 4% higher than the state-of-the-art method.
AAAI Conference 2019 Conference Paper
Predicting future frames in videos has become a promising research direction for both the computer vision and robot learning communities. The core of this problem involves capturing moving objects and predicting their future motion. While object capture specifies which objects are moving in videos, motion prediction describes their future dynamics. Motivated by this analysis, we propose a Cubic Long Short-Term Memory (CubicLSTM) unit for video prediction. CubicLSTM consists of three branches, i.e., a spatial branch for capturing moving objects, a temporal branch for processing motions, and an output branch that combines the first two to generate predicted frames. Stacking multiple CubicLSTM units along the spatial and output branches, and then evolving along the temporal branch, forms a cubic recurrent neural network (CubicRNN). Experiments show that CubicRNN produces more accurate video predictions than prior methods on both synthetic and real-world datasets.
AAMAS Conference 2019 Conference Paper
Truth inference, a method that resolves conflicts among multi-agent data, has been widely studied in the field of AI. Most existing truth inference methods use iterative approaches to achieve high accuracy but are inefficient at inferring object truths over data streams. The methods developed for streaming data can achieve high efficiency but suffer from low accuracy. In this paper, we propose a novel truth inference method, Dynamic Source Weight Computation truth inference (DSWC), that can work with a wide range of iterative truth inference methods to dynamically compute source weights over data streams. Specifically, we use a Taylor expansion to analyze the unit error of object truths inferred by source weights computed at a previous timestamp. If the current source weights are predicted to keep the error under a threshold, we reuse the previously computed source weights to approximate object truths at the present timestamp, avoiding the expensive source weight computation step. Compared with existing work, the proposed method is more effective in predicting source weights and can be applied to a wider range of applications. Experimental results based on four real-world datasets demonstrate that DSWC is both accurate and efficient for truth inference over data streams.
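The weight-reuse idea can be sketched as follows for real-valued claims; the drift test below is only a stand-in for DSWC's Taylor-expansion error bound, and the threshold is illustrative:

```python
# Streamed truth inference that reuses stale source weights and recomputes
# them only when the (stand-in) error estimate exceeds a threshold.
import numpy as np

rng = np.random.default_rng(0)
stream_batches = [rng.random((5, 20)) for _ in range(10)]  # (sources, objects)

def weighted_truth(claims, weights):
    return weights @ claims / weights.sum()  # weighted average per object

def update_weights(claims, truths):
    err = np.abs(claims - truths).mean(axis=1)  # per-source deviation
    return 1.0 / (err + 1e-6)                   # reliable sources weigh more

weights = np.ones(stream_batches[0].shape[0])
for claims in stream_batches:
    truths = weighted_truth(claims, weights)     # infer with stale weights first
    drift = np.abs(claims - truths).mean()       # stand-in for the Taylor bound
    if drift > 0.3:                              # recompute only when needed
        weights = update_weights(claims, truths)
```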
IJCAI Conference 2019 Conference Paper
Majorization-Minimization (MM) algorithms optimize an objective function by iteratively minimizing a majorizing surrogate and offer an attractively fast convergence rate for convex problems. However, their convergence behavior for non-convex problems remains unclear. In this paper, we propose a novel MM surrogate function that relaxes the requirement of strictly upper bounding the objective to bounding it in expectation. With this generalized surrogate conception, we develop a new optimization algorithm, termed SPI-MM, that leverages the recently proposed SPIDER estimator for more efficient non-convex optimization. We prove that for finite-sum problems, the SPI-MM algorithm converges to a stationary point with deterministic and lower stochastic gradient complexity. To the best of our knowledge, this work gives the first non-asymptotic convergence analysis for MM-like algorithms in general non-convex optimization. Extensive empirical studies on non-convex logistic regression and sparse PCA demonstrate the efficiency of the proposed algorithm and validate our theoretical results.
IROS Conference 2019 Conference Paper
We propose a fast and accurate 3D reconstruction system that takes a sequence of RGB-D frames and produces a globally consistent camera trajectory and dense 3D geometry. We redesign core modules of a state-of-the-art offline reconstruction pipeline to maximally exploit the power of the GPU. We introduce GPU-accelerated core modules that include RGB-D odometry, geometric feature extraction and matching, point cloud registration, volumetric integration, and mesh extraction. As a result, while reproducing the results of the high-fidelity offline reconstruction system, our system runs more than 10 times faster on average. Nearly 10 Hz can be achieved in medium-size indoor scenes, making our offline system comparable even to online Simultaneous Localization and Mapping (SLAM) systems in terms of speed. Experimental results show that our system produces more accurate results than several state-of-the-art online systems. The system is open source at https://github.com/theNded/Open3D.
AAMAS Conference 2019 Conference Paper
This paper addresses the challenge of truth inference in crowdsourcing applications. We propose a generative method that jointly models tasks' difficulties, workers' abilities and guessing behavior to estimate the truths of crowdsourced tasks, which leads to a more accurate estimation of workers' abilities and tasks' truths. Experiments demonstrate that the proposed method is more effective at estimating the truths of crowdsourced tasks than state-of-the-art methods.
NeurIPS Conference 2019 Conference Paper
Network pruning reduces the computation costs of an over-parameterized network without performance damage. Prevailing pruning algorithms pre-define the width and depth of the pruned networks and then transfer parameters from the unpruned network to the pruned networks. To break the structure limitation of the pruned networks, we propose to apply neural architecture search to search directly for a network with flexible channel and layer sizes. The number of channels/layers is learned by minimizing the loss of the pruned networks. The feature map of the pruned network is an aggregation of K feature map fragments (generated by K networks of different sizes), which are sampled based on a probability distribution. The loss can be back-propagated not only to the network weights, but also to the parameterized distribution, to explicitly tune the size of the channels/layers. Specifically, we apply channel-wise interpolation to keep feature maps with different channel sizes aligned in the aggregation procedure. The maximum-probability size in each distribution serves as the width and depth of the pruned network, whose parameters are learned by knowledge transfer, e.g., knowledge distillation, from the original networks. Experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate the effectiveness of our new perspective on network pruning compared to traditional network pruning algorithms. Various searching and knowledge transfer approaches are evaluated to show the effectiveness of the two components. Code is at: https://github.com/D-X-Y/NAS-Projects
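Channel-wise interpolation, the step that keeps differently sized fragments aligned during aggregation, can be sketched by treating the channel axis as a 1-D signal; the fragment sizes and sampling probabilities below are illustrative:

```python
# Align feature maps of different channel counts by linearly interpolating
# along the channel dimension, then mix them with sampling probabilities.
import torch
import torch.nn.functional as F

def align_channels(feat, target_c):
    # feat: (B, C, H, W) -> (B, target_c, H, W), interpolating over C
    b, c, h, w = feat.shape
    x = feat.permute(0, 2, 3, 1).reshape(b * h * w, 1, c)  # channels as 1-D signal
    x = F.interpolate(x, size=target_c, mode="linear", align_corners=False)
    return x.reshape(b, h, w, target_c).permute(0, 3, 1, 2)

frags = [torch.randn(2, c, 8, 8) for c in (32, 48, 64)]  # K sampled fragments
probs = torch.tensor([0.2, 0.3, 0.5])                    # sampling distribution
mixed = sum(p * align_channels(f, 64) for p, f in zip(probs, frags))
```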
ICRA Conference 2019 Conference Paper
Achieving high surface reconstruction accuracy in dense mapping has been a desirable target for both the robotics and vision communities. In the robotics literature, simultaneous localization and mapping (SLAM) systems use RGB-D cameras to reconstruct a dense map of the environment. They leverage the depth input to provide accurate local pose estimation and a locally consistent model. However, drift in the pose tracking over time leads to misalignments and artifacts. On the other hand, offline computer vision methods, such as the pipeline that combines structure-from-motion (SfM) and multi-view stereo (MVS), estimate the camera poses by performing batch optimization. These methods achieve global consistency but suffer from heavy computational loads. We propose a novel approach that integrates both methods to achieve locally and globally consistent reconstruction. First, we estimate poses of keyframes in the offline SfM pipeline to provide strong global constraints at relatively low cost. Afterwards, we compute odometry between frames driven by off-the-shelf SLAM systems with high local accuracy. We fuse the two pose estimates using factor graph optimization to generate accurate camera poses for dense reconstruction. Experiments on real-world and synthetic datasets demonstrate that our approach produces more accurate models than existing dense SLAM systems, while achieving a significant speedup with respect to state-of-the-art SfM-MVS pipelines.
IJCAI Conference 2019 Conference Paper
Video captioning aims at generating a proper sentence to describe the video content. As a video often includes rich visual content and semantic details, different people may be interested in different views, so the generated sentence often fails to meet ad hoc expectations. In this paper, we make a new attempt: we launch a round of interaction between a human and a captioning agent. After generating an initial caption, the agent asks for a short prompt from the human as a clue to their expectation. Then, based on the prompt, the agent generates a more accurate caption. We name this process a new task of video interactive captioning (ViCap). Taking a video and an initial caption as input, we devise the ViCap agent, which consists of a video encoder, an initial caption encoder, and a refined caption generator. We show that ViCap can be trained in a fully supervised way (with ground truth) or a weakly supervised way (with only prompts). For the evaluation of ViCap, we first extend MSRVTT with interaction ground truth. Experimental results not only show that the prompts can help generate more accurate captions, but also demonstrate the good performance of the proposed method.
IJCAI Conference 2018 Conference Paper
Stochastic momentum methods have been widely adopted for training deep neural networks. However, the theoretical analysis of their convergence on the training objective and of their generalization error for prediction is still under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method and stochastic momentum methods, including two famous variants, i.e., the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive convergence rates of the gradient norm for the non-convex optimization problem and analyze the generalization performance through the uniform stability approach. In particular, the convergence analysis of the training objective shows that SHB and SNAG have no advantage over SG, whereas the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve the generalization performance. These theoretical insights verify the common wisdom and are also corroborated by our empirical analysis on deep learning.
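One common way to write a unified update covering SG, SHB and SNAG is the interpolation below (the paper's exact parameterization may differ; the function name, quadratic objective and step sizes are illustrative):

```python
# Unified stochastic momentum update: beta=0 recovers SG, s=0 recovers the
# heavy-ball method (SHB), and s=1 recovers Nesterov's variant (SNAG).
import numpy as np

def unified_momentum(grad, x0, alpha=0.1, beta=0.9, s=1.0, steps=100):
    x = x0.copy()
    ys_prev = x0.copy()                 # y^s_0
    for _ in range(steps):
        g = grad(x)
        y = x - alpha * g               # y_{t+1}: plain gradient step
        ys = x - s * alpha * g          # y^s_{t+1}: interpolated step
        x = y + beta * (ys - ys_prev)   # unified momentum correction
        ys_prev = ys
    return x

grad = lambda x: 2 * x                  # gradient of f(x) = ||x||^2
print(unified_momentum(grad, np.ones(3), s=0.0))  # SHB-style run
```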
AAAI Conference 2018 Conference Paper
Clustering is an effective technique in data mining to generate groups of interest. Among the various clustering approaches, the families of k-means and min-cut algorithms are the most popular due to their simplicity and efficacy. The classical k-means algorithm partitions data points into subsets by iteratively updating the cluster centers and the associated data points. By contrast, min-cut algorithms construct a weighted undirected graph and partition its vertices into two sets. However, existing clustering algorithms tend to cluster a minority of the data points into a subset, which should be avoided when the target dataset is balanced. To achieve more accurate clustering for balanced datasets, we propose to impose an exclusive lasso on k-means and min-cut to regulate the balance degree of the clustering results. By optimizing objective functions built atop the exclusive lasso, we can make the clustering result as balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms over state-of-the-art clustering algorithms.
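A toy illustration of the balancing effect: a k-means assignment step whose cost grows with the current cluster size, mimicking the role the exclusive lasso plays in the objective (not the authors' exact formulation):

```python
# Greedy, size-penalized k-means: each point pays its squared distance plus
# a penalty proportional to how full the candidate cluster already is.
import numpy as np

def balanced_kmeans(X, k, lam=1.0, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        sizes = np.zeros(k)
        for i in rng.permutation(len(X)):          # size-penalized assignment
            cost = ((X[i] - centers) ** 2).sum(axis=1) + lam * sizes
            labels[i] = cost.argmin()
            sizes[labels[i]] += 1
        for c in range(k):                          # standard center update
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

labels, _ = balanced_kmeans(np.random.randn(200, 2), k=4)
```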
TIST Journal 2018 Journal Article
Learning from very few samples is a challenge for machine learning tasks, such as text and image classification. Performance on such tasks can be enhanced via the transfer of helpful knowledge from related domains, which is referred to as transfer learning. In previous transfer learning work, instance transfer learning algorithms mostly focus on selecting source domain instances similar to the target domain instances for transfer. However, the selected instances usually do not directly contribute to the learning performance in the target domain. Hypothesis transfer learning algorithms focus on model/parameter-level transfer. They treat the source hypotheses as well-trained and transfer their knowledge in terms of parameters to learn the target hypothesis. Such algorithms directly optimize the target hypothesis by observable performance improvements. However, they fail to consider that instances contributing to the source hypotheses may be harmful to the target hypothesis, as analyzed in instance transfer learning. To relieve these problems, we propose a novel transfer learning algorithm that follows an analogical strategy. In particular, the proposed algorithm first learns a revised source hypothesis using only instances that contribute to the target hypothesis. Then, it transfers both the revised source hypothesis and the target hypothesis (trained with only a few samples) to learn an analogical hypothesis. We denote our algorithm Analogical Transfer Learning. Extensive experiments on one synthetic dataset and three real-world benchmark datasets demonstrate the superior performance of the proposed algorithm.
YNICL Journal 2018 Journal Article
JMLR Journal 2018 Journal Article
Robust PCA is a widely used statistical procedure to recover an underlying low-rank matrix from grossly corrupted observations. This work treats robust PCA as a non-convex optimization problem on the manifold of low-rank matrices and proposes two algorithms based on manifold optimization. It is shown that, with a properly designed initialization, the proposed algorithms are guaranteed to converge to the underlying low-rank matrix linearly. Compared with previous work based on the factorization of low-rank matrices (Yi et al., 2016), the proposed algorithms theoretically reduce the dependence on the condition number of the underlying low-rank matrix. Simulations and real data examples confirm the competitive performance of our method.
AAAI Conference 2018 Conference Paper
Person re-identification (re-ID) tasks aim to identify the same person in multiple images captured from non-overlapping camera views. Most previous re-ID studies have attempted to solve this problem through either representation learning or metric learning, or by combining both techniques. Representation learning relies on the latent factors or attributes of the data. In most of these works, the dimensionality of the factors/attributes has to be manually determined for each new dataset, so this approach is not robust. Metric learning optimizes a metric across the dataset to measure similarity according to distance. However, choosing the optimal method for computing these distances is data dependent, and learning the appropriate metric relies on a sufficient number of pair-wise labels. To overcome these limitations, we propose a novel algorithm for person re-ID, called semi-supervised Bayesian attribute learning. We introduce an Indian Buffet Process to identify the priors of the latent attributes. The dimensionality of the attribute factors is then automatically determined by nonparametric Bayesian learning. Meanwhile, unlike traditional distance metric learning, we propose a re-identification probability distribution to describe how likely it is that a pair of images contains the same person, relying solely on the latent attributes of both images. Moreover, unknown pair-wise labels can be estimated from known ones, making this a robust approach for semi-supervised learning. Extensive experiments demonstrate the superior performance of our algorithm over several state-of-the-art algorithms on small-scale datasets and comparable performance on large-scale re-ID datasets.
IJCAI Conference 2018 Conference Paper
This paper proposes a Soft Filter Pruning (SFP) method to accelerate the inference of deep Convolutional Neural Networks (CNNs). Specifically, SFP allows the pruned filters to be updated when training the model after pruning. SFP has two advantages over previous works: (1) Larger model capacity. Updating previously pruned filters gives our approach a larger optimization space than fixing the filters to zero, so the network trained by our method has a larger capacity to learn from the training data. (2) Less dependence on the pre-trained model. The larger capacity enables SFP to train from scratch and prune the model simultaneously. In contrast, previous filter pruning methods must be conducted on the basis of a pre-trained model to guarantee their performance. Empirically, SFP from scratch outperforms previous filter pruning methods. Moreover, our approach is effective for many advanced CNN architectures. Notably, on ILSVRC-2012, SFP reduces more than 42% of the FLOPs on ResNet-101 with even a 0.2% top-5 accuracy improvement, advancing the state-of-the-art. Code is publicly available on GitHub: https://github.com/he-y/soft-filter-pruning
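The core step is easy to sketch: after each training epoch, zero the filters with the smallest L2 norms but leave them trainable, so later gradient updates can revive them. The pruning rate below is illustrative:

```python
# Soft pruning of a conv layer: filters are zeroed, not removed, and remain
# part of the parameter tensor so subsequent training can update them again.
import torch
import torch.nn as nn

@torch.no_grad()
def soft_prune_conv(conv: nn.Conv2d, prune_rate: float = 0.3):
    norms = conv.weight.flatten(1).norm(p=2, dim=1)  # one L2 norm per filter
    n_prune = int(prune_rate * norms.numel())
    idx = norms.argsort()[:n_prune]                  # smallest-norm filters
    conv.weight[idx] = 0.0
    if conv.bias is not None:
        conv.bias[idx] = 0.0

conv = nn.Conv2d(64, 128, 3)
soft_prune_conv(conv)   # call once per epoch during training
```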
YNICL Journal 2018 Journal Article
IJCAI Conference 2018 Conference Paper
Recognizing human actions in video clips has been an important topic in computer vision. Sufficient labeled data is one of the prerequisites for the good performance of action recognition algorithms. However, while abundant videos can be collected from the Internet, categorizing each video clip is tedious and time-consuming. Active learning is one way to alleviate the labeling labor by allowing the classifier to choose the most informative unlabeled instances for manual annotation. Among various active learning algorithms, uncertainty sampling is arguably the most widely used strategy. Conventional uncertainty sampling strategies, such as entropy-based methods, are usually evaluated under accuracy. However, in action recognition the acknowledged evaluation metric is Average Precision (AP), defined as the area under the precision-recall curve, which has largely been ignored by the active learning community. In this paper, we propose a novel uncertainty sampling algorithm for action recognition using expected AP. We conduct experiments on three real-world action recognition datasets and show that our algorithm outperforms other uncertainty-based active learning algorithms.
IJCAI Conference 2018 Conference Paper
We aim to significantly reduce the computational cost of classifying temporally untrimmed videos while retaining similar accuracy. Existing video classification methods sample frames with a predefined frequency over the entire video. In contrast, we propose an end-to-end deep reinforcement learning approach that enables an agent to classify videos by watching only a very small portion of the frames, much as humans do. We make two main contributions. First, information is not equally distributed across video frames over time. An agent needs to watch more carefully when a clip is informative and skip frames that are redundant or irrelevant. The proposed approach enables the agent to adapt its sampling rate to the video content and skip most of the frames without loss of information. Second, the number of frames an agent should watch to reach a confident decision varies greatly from one video to another. We incorporate an adaptive stop network that measures a confidence score and generates a timely trigger to stop the agent watching videos, which improves efficiency without loss of accuracy. Our approach reduces the computational cost significantly on the large-scale YouTube-8M dataset, while the accuracy remains the same.
AAAI Conference 2017 Conference Paper
A challenge in mining large-scale streaming data overlooked by most existing studies on online learning is the skewed distribution of examples over different classes. Many previous works have considered cost-sensitive approaches in an online setting for streaming data, where fixed costs are assigned to different classes, or ad hoc costs are adapted based on the distribution of data received so far. However, they do not necessarily achieve optimal performance on measures suited for imbalanced data, such as F-measure, area under the ROC curve (AUROC), and area under the precision-recall curve (AUPRC). This work proposes a general framework for online learning with imbalanced streaming data, where examples arrive sequentially and models are updated accordingly on the fly. By simultaneously learning multiple classifiers with different cost vectors, the proposed method can be adapted to different target measures for imbalanced data, including F-measure, AUROC and AUPRC. Moreover, we present a rigorous theoretical justification of the proposed framework for F-measure maximization. Our empirical studies demonstrate the competitive, if not better, performance of the proposed method compared to previous cost-sensitive and resampling-based online learning algorithms and those designed for optimizing specific measures.
AAAI Conference 2017 Conference Paper
Traditional topic models with maximum likelihood estimation inevitably suffer from the conditional independence of words given the document's topic distribution. In this paper, we follow the generative procedure of topic models and learn the topic-word and topic distributions by directly approximating the word-document co-occurrence matrix with matrix decomposition techniques. These methods include: (1) approximating the normalized document-word conditional distribution with the document probability matrix and word probability matrix based on probabilistic non-negative matrix factorization (NMF); (2) since standard NMF is well known to be non-robust to noise and outliers, extending the probabilistic NMF of the topic model to robust versions using $\ell_{2,1}$-norm and capped $\ell_{2,1}$-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of the factors in topic models and simultaneously makes the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are exploited to solve the corresponding non-smooth and non-convex problems. Experimental results over several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.
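Method (1) can be approximated with off-the-shelf NMF by factorizing a document-word matrix and normalizing the factors into probability distributions; the robust variants would swap in the $\ell_{2,1}$-type losses. A small sketch with stand-in data:

```python
# Recover document-topic and topic-word distributions by factorizing a
# doc-word matrix with plain NMF and renormalizing the factors.
import numpy as np
from sklearn.decomposition import NMF

X = np.random.rand(100, 500)                  # stand-in doc-word count matrix
model = NMF(n_components=10, init="nndsvda", max_iter=400, random_state=0)
W = model.fit_transform(X)                    # (docs, topics)
H = model.components_                         # (topics, words)

doc_topic = W / (W.sum(axis=1, keepdims=True) + 1e-12)   # rows sum to 1
topic_word = H / (H.sum(axis=1, keepdims=True) + 1e-12)
```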
YNICL Journal 2017 Journal Article
AAAI Conference 2016 Conference Paper
Vast quantities of videos are now being captured at astonishing rates, but the majority of these are not labelled. To cope with such data, we consider the task of content-based activity recognition in videos without any manually labelled examples, also known as zero-shot video recognition. To achieve this, videos are represented in terms of detected visual concepts, which are then scored as relevant or irrelevant according to their similarity with a given textual query. In this paper, we propose a more robust approach to scoring concepts in order to alleviate many of the brittleness and low-precision problems of previous work. We jointly consider semantic relatedness, visual reliability, and discriminative power. To handle noise and non-linearities in the ranking scores of the selected concepts, we propose a novel pairwise order matrix approach for score aggregation. Extensive experiments on the large-scale TRECVID Multimedia Event Detection data show the superiority of our approach.
AAAI Conference 2016 Conference Paper
In this paper, we focus on automatically detecting events in unconstrained videos without the use of any visual training exemplars. In principle, zero-shot learning makes it possible to train an event detection model based on the assumption that events (e.g., birthday party) can be described by multiple mid-level semantic concepts (e.g., "blowing candle", "birthday cake"). Towards this goal, we first pre-train a bundle of concept classifiers using data from other sources. Then we evaluate the semantic correlation of each concept w.r.t. the event of interest and select the relevant concept classifiers, which are applied on all test videos to obtain multiple prediction score vectors. While most existing systems combine the predictions of the concept classifiers with fixed weights, we propose to learn the optimal weights of the concept classifiers for each test video by exploring a set of freely available online videos with free-form text descriptions of their content. To validate the effectiveness of the proposed approach, we conducted extensive experiments on the latest TRECVID MEDTest 2014, MEDTest 2013 and CCV datasets. The experimental results confirm the superiority of the proposed approach.
IJCAI Conference 2016 Conference Paper
Topic modeling has become a ubiquitous topic analysis tool for text exploration. Most existing work on topic modeling focuses on fitting topic models to input data while ignoring an important usability issue closely related to the end-user experience: stability. In this study, we investigate the stability problem in topic modeling. We first report on experiments conducted to quantify the severity of the problem. We then propose a new learning framework that mitigates the problem by explicitly incorporating topic stability constraints into model training. We also perform a user study to demonstrate the advantages of the proposed method.
AAAI Conference 2016 Conference Paper
Semi-supervised learning is proposed to exploit both labeled and unlabeled data. However, as the scale of data in real-world applications increases significantly, conventional semi-supervised algorithms usually incur massive computational cost and cannot be applied to large-scale datasets. In addition, label noise is usually present in practical applications due to human annotation, which very likely results in remarkable degradation of performance in semi-supervised methods. To address these two challenges, in this paper we propose an efficient RObust Semi-Supervised Ensemble Learning (ROSSEL) method, which generates pseudo-labels for unlabeled data using a set of weak annotators and combines them to approximate the ground-truth labels to assist semi-supervised learning. We formulate the weighted combination process as a multiple label kernel learning (MLKL) problem, which can be solved efficiently. Compared with other semi-supervised learning algorithms, the proposed method has linear time complexity. Extensive experiments on five benchmark datasets demonstrate the superior effectiveness, efficiency and robustness of the proposed algorithm.
AAAI Conference 2015 Conference Paper
Spectral clustering is a fundamental technique in the field of data mining and information processing. Most existing spectral clustering algorithms integrate dimensionality reduction into the clustering process, assisted by manifold learning in the original space. However, the manifold in the reduced-dimensional subspace is likely to exhibit altered properties compared with the original space. Thus, applying manifold information obtained from the original space to the clustering process in a low-dimensional subspace is prone to inferior performance. To address this issue, we propose a novel convex algorithm that mines the manifold structure in the low-dimensional subspace. In addition, our unified learning process makes the manifold learning particularly tailored to the clustering. Compared with other related methods, the proposed algorithm yields more structured clustering results. To validate the efficacy of the proposed algorithm, we perform extensive experiments on several benchmark datasets in comparison with state-of-the-art clustering approaches. The experimental results demonstrate that the proposed algorithm has quite promising clustering performance.
AAAI Conference 2015 Conference Paper
Complex event detection is a retrieval task with the goal of finding videos of a particular event in a large-scale unconstrained Internet video archive, given example videos and text descriptions. Different multimodal fusion schemes of low-level and high-level features have been extensively investigated and evaluated for the complex event detection task. However, how to effectively select high-level, semantically meaningful concepts from a large pool to assist complex event detection is rarely studied in the literature. In this paper, we propose two novel strategies to automatically select semantically meaningful concepts for the event detection task, based on both the event-kit text descriptions and the concepts' high-level feature descriptions. Moreover, we introduce a novel event-oriented dictionary representation based on the selected semantic concepts. Towards this goal, we leverage training samples of selected concepts from the Semantic Indexing (SIN) dataset, with a pool of 346 concepts, in a novel supervised multi-task dictionary learning framework. Extensive experimental results on the TRECVID Multimedia Event Detection (MED) dataset demonstrate the efficacy of our proposed method.
AAAI Conference 2015 Conference Paper
Automatically recognizing a large number of action categories from videos is of significant importance for video understanding. Most existing works have focused on designing more discriminative feature representations and have achieved promising results when positive samples are plentiful. However, very limited effort has been spent on recognizing a novel action without any positive exemplars, which is often the case in real settings due to the large number of action classes and the dramatic variations in users' queries. To address this issue, we propose to perform action recognition when no positive exemplars of a class are provided, which is often known as zero-shot learning. Different from other zero-shot learning approaches, which exploit attributes as an intermediate layer for knowledge transfer, our main contribution is SIR, which directly leverages the semantic inter-class relationships between known and unknown actions, followed by label transfer learning. The inter-class semantic relationships are automatically measured by continuous word vectors, learned by the skip-gram model on a large-scale text corpus. Extensive experiments on the UCF101 dataset validate the superiority of our method over fully supervised approaches that use few positive exemplars.
IJCAI Conference 2015 Conference Paper
Recent advances in imaging and multimedia technologies have paved the way for automatic analysis of visual art. Despite notable attempts, extracting relevant patterns from paintings is still a challenging task. Different painters, born in different periods and places, have been influenced by different schools of arts. However, each individual artist also has a unique signature, which is hard to detect with algorithms and objective features. In this paper we propose a novel dictionary learning approach to automatically uncover the artistic style from paintings. Specifically, we present a multi-task learning algorithm to learn a style-specific dictionary representation. Intuitively, our approach, by automatically decoupling style-specific and artist-specific patterns, is expected to be more accurate for retrieval and recognition tasks than generic methods. To demonstrate the effectiveness of our approach, we introduce the DART dataset, containing more than 1.5K images of paintings representative of different styles. Our extensive experimental evaluation shows that our approach significantly outperforms state-of-the-art methods.
IJCAI Conference 2015 Conference Paper
The user ratings in recommendation systems are usually ordinal discrete values. To give more accurate predictions of such rating data, maximum margin matrix factorization (M$^3$F) was proposed. Existing M$^3$F algorithms, however, either have massive computational cost or require expensive model selection procedures to determine the number of latent factors (i.e., the rank of the matrix to be recovered), making them less practical for large-scale datasets. To address these two challenges, in this paper we formulate M$^3$F with a known number of latent factors as a Riemannian optimization problem on a fixed-rank matrix manifold and present a block-wise nonlinear Riemannian conjugate gradient method to solve it efficiently. We then apply a simple and efficient active subspace search scheme to automatically detect the number of latent factors. Empirical studies on both synthetic datasets and large real-world datasets demonstrate the superior efficiency and effectiveness of the proposed method.
IJCAI Conference 2015 Conference Paper
We focus on detecting complex events in unconstrained Internet videos. While most existing works rely on the abundance of labeled training data, we consider a more difficult zero-shot setting where no training data is supplied. We first pre-train a number of concept classifiers using data from other sources. Then we evaluate the semantic correlation of each concept w.r.t. the event of interest. After further refinement to take prediction inaccuracy and discriminative power into account, we apply the discovered concept classifiers on all test videos and obtain multiple score vectors. These distinct score vectors are converted into pairwise comparison matrices and the nuclear norm rank aggregation framework is adopted to seek consensus. To address the challenging optimization formulation, we propose an efficient, highly scalable algorithm that is an order of magnitude faster than existing alternatives. Experiments on recent TRECVID datasets verify the superiority of the proposed approach.
AAAI Conference 2015 Conference Paper
We consider the problem of embedding entities and relations of knowledge bases into low-dimensional continuous vector spaces (distributed representations). Unlike most existing approaches, which are primarily efficient for modelling pairwise relations between entities, we attempt to explicitly model both pairwise relations and long-range interactions between entities, by interpreting them as linear operators on the low-dimensional embeddings of the entities. In this paper, we therefore introduce path ranking to capture the long-range interactions of a knowledge graph while preserving its pairwise relations; we call this structured embedding via pairwise relations and long-range interactions (SePLi). Compared with state-of-the-art models, SePLi achieves better embedding performance.
AAAI Conference 2014 Conference Paper
The explosive growth of multimedia data has brought the challenge of how to efficiently browse, retrieve and organize such data. Under this circumstance, different approaches have been proposed to facilitate multimedia analysis. Several semi-supervised feature selection algorithms have been proposed to exploit both labeled and unlabeled data. However, they are implemented based on graphs and therefore cannot handle large-scale datasets. How to conduct semi-supervised feature selection on large-scale datasets has become a challenging research problem. Moreover, existing multi-label feature selection algorithms rely on eigen-decomposition with a heavy computational burden, which further prevents current feature selection algorithms from being applied to big data. In this paper, we propose a novel convex semi-supervised multi-label feature selection algorithm that can be applied to large-scale datasets. We evaluate the performance of the proposed algorithm on five benchmark datasets and compare the results with state-of-the-art supervised and semi-supervised feature selection algorithms, as well as a baseline using all features. The experimental results demonstrate that our proposed algorithm consistently achieves superior performance.
AAAI Conference 2014 Conference Paper
Active surveillance is a desirable way to prevent the spread of infectious diseases in that it aims to discover individual incidences in a timely manner through active searching for patients. In practice, however, active surveillance is difficult to implement, especially when the monitoring space is large but available resources are limited. It is therefore extremely important for public health authorities to know how to distribute their very sparse resources to high-priority regions so as to maximize the outcomes of active surveillance. In this paper, we raise the problem of active surveillance planning and provide an effective method to address it by modeling and mining spatiotemporal patterns of infection risks from heterogeneous data sources. Taking malaria as an example, we perform an empirical study on real-world data to validate our method and present our new findings.
IJCAI Conference 2013 Conference Paper
Supervised feature selection determines feature relevance by evaluating each feature's correlation with the classes. Joint minimization of a classifier's loss function and an ℓ2,1-norm regularization has been shown to be effective for feature selection. However, the feature subsets learned from different classifiers' loss functions may differ, and little effort has been made to improve feature selection by ensembling different classifiers' criteria and taking advantage of their complementarity. Furthermore, when only a few labeled data per class are available, overfitting becomes a potential problem and the performance of each classifier is restrained. In this paper, we add a joint ℓ2,1-norm on multiple feature selection matrices to combine different classifiers' loss functions into a joint optimization framework. This co-regularization term plays a twofold role: it enhances the effect of regularization for each criterion and uncovers common irrelevant features. The problem of overfitting can thus be alleviated, improving the performance of feature selection. Extensive experiments on different data types demonstrate the effectiveness of our algorithm.
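One plausible way to write down such a co-regularized objective (notation ours, not the paper's exact formulation): each classifier c keeps its own loss L_c and selection matrix W_c, while a joint ℓ2,1 term on the column-wise concatenation couples them, so rows (features) judged irrelevant by all criteria are driven to zero together:

```latex
% Hedged sketch (notation ours): C classifiers coupled by a joint l2,1 norm.
\min_{W_1, \dots, W_C}\;
  \sum_{c=1}^{C} \Big( L_c(W_c) + \lambda\, \| W_c \|_{2,1} \Big)
  \; + \; \gamma\, \big\| \left[\, W_1, W_2, \dots, W_C \,\right] \big\|_{2,1} .
```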
IJCAI Conference 2013 Conference Paper
Tensors are increasingly common in several areas such as data mining, computer graphics, and computer vision. Tensor clustering is a fundamental tool for data analysis and pattern discovery. However, real-world datasets usually contain outlying data points, which reduce clustering performance. This motivates us to develop a tensor clustering algorithm that is robust to outliers. In this paper, we propose a Robust Tensor Clustering (RTC) algorithm. RTC first finds a lower-rank approximation of the original tensor data using an L1-norm optimization function. Because the L1 norm does not exaggerate the effect of outliers as the L2 norm does, minimizing the L1-norm approximation function makes RTC robust to outliers. We then compute the HOSVD decomposition of this approximate tensor to obtain the final clustering results. Unlike traditional algorithms that solve the approximation problem with a greedy strategy, we utilize a non-greedy strategy to obtain a better solution. Experiments demonstrate that RTC outperforms state-of-the-art algorithms and is more robust to outliers.
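For reference, the second stage (truncated HOSVD on the robustified tensor) can be sketched in a few lines; this is only the standard L2 HOSVD, and deliberately omits the paper's L1 pre-approximation and non-greedy solver:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: factor matrices from the leading left singular
    vectors of each unfolding, then the core by multilinear projection."""
    U = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
         for m, r in enumerate(ranks)]
    core = T
    for m, Um in enumerate(U):
        # Contract mode m of the core with Um^T, keeping axes in place.
        core = np.moveaxis(
            np.tensordot(Um.T, np.moveaxis(core, m, 0), axes=1), 0, m)
    return core, U

# Toy usage on a 3-way tensor; RTC would first replace T by its
# outlier-robust L1-norm low-rank approximation before this step.
T = np.random.default_rng(0).standard_normal((6, 5, 4))
core, U = hosvd(T, ranks=(3, 3, 2))
print(core.shape, [u.shape for u in U])   # (3, 3, 2) [(6, 3), (5, 3), (4, 2)]
```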
AAAI Conference 2013 Conference Paper
Non-negative tensor factorization (NTF) has attracted great attention in the machine learning community. In this paper, we extend traditional non-negative tensor factorization into a supervised discriminative decomposition, referred to as Supervised Non-negative Tensor Factorization with Maximum-Margin Constraint (SNTFM²). SNTFM² formulates the optimal discriminative factorization of non-negative tensorial data as a coupled least-squares optimization problem via a maximum-margin method. As a result, SNTFM² not only faithfully approximates the tensorial data by additive combinations of the basis, but also attains strong generalization power for discriminative analysis (in particular, for classification in this paper). The experimental results show the superiority of the proposed model over state-of-the-art techniques on both toy and real-world data sets.
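One hedged reading of such a coupled least-squares/max-margin objective (notation ours; the bracket denotes a non-negative multilinear reconstruction, and u_i is sample i's row in the factor used as features):

```latex
% Hedged sketch (notation ours): reconstruction + SVM-style hinge loss
% on the factor rows u_i that serve as sample features.
\min_{\substack{U^{(1)},\dots,U^{(N)} \ge 0 \\ \mathbf{w},\, b}}\;
  \big\| \mathcal{X} - [\![\, U^{(1)}, \dots, U^{(N)} \,]\!] \big\|_F^2
  + C \sum_i \max\!\big( 0,\; 1 - y_i\,(\mathbf{w}^{\top}\mathbf{u}_i + b) \big)
  + \tfrac{\lambda}{2}\, \|\mathbf{w}\|_2^2 .
```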
IJCAI Conference 2013 Conference Paper
In this paper, we propose a new classification framework for image matrices. The approach is realized by learning two groups of classification vectors, one for each dimension of the image matrices. One novelty is that we utilize compound regression models in the learning process, which endows the algorithm with increased degrees of freedom. On top of that, we extend the two-dimensional classification method to a semi-supervised classifier that leverages both labeled and unlabeled data. A fast iterative solution is then proposed to optimize the objective function. The proposed method is evaluated on several different applications. The experimental results show that our method outperforms several existing classification approaches. In addition, we observe that our method attains respectable classification performance even when only a few labeled training samples are provided. This advantage is especially desirable for real-world problems, since precisely annotated images are scarce.
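A hedged sketch of the decision function that "two groups of classification vectors" suggests (our notation, one plausible reading): left vectors u_k act on image rows and right vectors v_k on columns, giving a bilinear classifier that never vectorizes the image matrix X:

```latex
% Hedged sketch (notation ours): K pairs of classification vectors.
f(X) = \operatorname{sign}\!\left( \sum_{k=1}^{K}
  \mathbf{u}_k^{\top}\, X\, \mathbf{v}_k + b \right),
\qquad X \in \mathbb{R}^{m \times n},\;
  \mathbf{u}_k \in \mathbb{R}^{m},\; \mathbf{v}_k \in \mathbb{R}^{n}.
```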
AAAI Conference 2012 Conference Paper
In this paper, a new unsupervised learning algorithm, namely Nonnegative Discriminative Feature Selection (NDFS), is proposed. To exploit discriminative information in unsupervised scenarios, we perform spectral clustering to learn the cluster labels of the input samples, during which feature selection is performed simultaneously. The joint learning of the cluster labels and the feature selection matrix enables NDFS to select the most discriminative features. To learn more accurate cluster labels, a nonnegative constraint is explicitly imposed on the class indicators. To reduce redundant or even noisy features, an ℓ2,1-norm minimization constraint is added to the objective function, which guarantees that the feature selection matrix is sparse in rows. Our algorithm exploits discriminative information and feature correlation simultaneously to select a better feature subset. A simple yet efficient iterative algorithm is designed to optimize the proposed objective function. Experimental results on different real-world datasets demonstrate the encouraging performance of our algorithm over the state of the art.
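A sketch of the joint objective this describes (our notation; L is the graph Laplacian of the sample affinity graph, F the scaled cluster indicator matrix, and W the feature selection matrix whose row norms rank the features):

```latex
% Hedged sketch (notation ours): spectral clustering term + regression
% from features to cluster indicators + row-sparsity on W.
\min_{F,\, W}\;
  \operatorname{Tr}\!\left( F^{\top} L F \right)
  + \alpha \left( \big\| X^{\top} W - F \big\|_F^2
  + \beta\, \| W \|_{2,1} \right)
\quad \text{s.t.} \quad F^{\top} F = I,\; F \ge 0 .
```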
IJCAI Conference 2011 Conference Paper
In the past few years, sentiment analysis and opinion mining have become popular and important tasks. These studies all assume that their opinion resources are real and trustworthy. However, they may encounter the problem of faked opinions, i.e., opinion spam. In this paper, we study this issue in the context of our product review mining system. On product review sites, people may write faked reviews, called review spam, to promote their products or defame their competitors' products. It is important to identify and filter out such review spam. Previous work focuses only on heuristic rules, such as helpfulness voting or rating deviation, which limits the performance of this task. In this paper, we exploit machine learning methods to identify review spam. To this end, we manually build a spam collection from our crawled reviews. We first analyze the effect of various features in spam identification. We also observe that review spammers consistently write spam. This provides another view for identifying review spam: we can determine whether the author of a review is a spammer. Based on this observation, we employ a two-view semi-supervised method, co-training, to exploit the large amount of unlabeled data. The experimental results show that our proposed method is effective: our machine learning methods achieve significant improvements over the heuristic baselines.
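A minimal sketch of a two-view co-training loop (illustrative only: the view split, classifier, and label encoding are our assumptions, not the paper's exact features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(Xa, Xb, y, rounds=10, k=5):
    """Two-view co-training sketch.
    Xa: review-content features, Xb: reviewer-behaviour features
    (an illustrative view split). y: 1 = spam, 0 = ham, -1 = unlabeled."""
    y = y.copy()
    for _ in range(rounds):
        for X_view in (Xa, Xb):
            labeled = y != -1
            unlabeled = np.flatnonzero(y == -1)
            if unlabeled.size == 0:
                return y
            clf = LogisticRegression(max_iter=1000).fit(
                X_view[labeled], y[labeled])
            proba = clf.predict_proba(X_view[unlabeled])
            # This view labels its k most confident unlabeled reviews;
            # the other view consumes those labels on the next pass.
            picks = np.argsort(-proba.max(axis=1))[:k]
            y[unlabeled[picks]] = clf.classes_[proba[picks].argmax(axis=1)]
    return y
```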
AAAI Conference 2011 Conference Paper
Clustering is a fundamental research topic in the field of data mining. Optimizing the objective functions of clustering algorithms, e.g., normalized cut and k-means, is an NP-hard optimization problem. Existing algorithms usually relax the elements of the cluster indicator matrix from discrete values to continuous ones. Eigenvalue decomposition is then performed to obtain a relaxed continuous solution, which must be discretized. The main problem is that the signs of the relaxed continuous solution are mixed; such results may deviate severely from the true solution, making it a nontrivial task to recover the cluster labels. To address this problem, we impose an explicit nonnegative constraint during the relaxation for a more accurate solution. In addition, we introduce a discriminative regularization into the objective to avoid overfitting. A new iterative approach is proposed to optimize the objective. We show that the algorithm is a general one that naturally leads to other extensions. Experiments demonstrate the effectiveness of our algorithm.
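A hedged sketch of the constrained relaxation (notation ours; A is the sample affinity matrix and R(F) stands for the added discriminative regularizer). The key point is that a nonnegative F with orthonormal columns can have at most one positive entry per row, so cluster labels can be read off directly without a separate discretization step:

```latex
\max_{F}\; \operatorname{Tr}\!\left( F^{\top} A F \right) - \lambda\, R(F)
\quad \text{s.t.} \quad F^{\top} F = I,\; F \ge 0 .
```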
AAAI Conference 2010 Conference Paper
Over the past few years, a large family of manifold learning algorithms has been proposed and applied to various applications. While designing new manifold learning algorithms has attracted much research attention, fewer efforts have been devoted to out-of-sample extrapolation of learned manifolds. In this paper, we propose a novel manifold learning algorithm, namely Local and Global Regressive Mapping (LGRM), which employs local regression models to grasp the manifold structure. We additionally impose a global regression term as regularization to learn a model for out-of-sample data extrapolation. Based on this algorithm, we propose a new manifold learning framework that can be applied to any manifold learning algorithm to simultaneously learn the low-dimensional embedding of the training data and a model that provides an explicit mapping of out-of-sample data to the learned manifold. Experiments demonstrate that the proposed framework uncovers the manifold structure precisely and can be freely applied to unseen data.
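A hedged sketch of how such a global regression term enables out-of-sample extrapolation (notation ours): alongside a local-regression manifold loss over neighbourhoods N(x_i), an explicit linear map is co-trained with the embedding Y, so an unseen point x* is embedded as y* = Wᵀx* + b without re-solving the embedding:

```latex
\min_{Y,\, W,\, b}\;
  \sum_i \ell_{\mathrm{local}}\big( Y,\, \mathcal{N}(x_i) \big)
  \; + \; \gamma \sum_i \big\| W^{\top} x_i + b - y_i \big\|_2^2
  \; + \; \mu\, \| W \|_F^2 .
```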