Arrow Research search

Author name cluster

Xiaoyan Gu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers (6)

AAAI 2026 · Conference Paper

Value-Aligned Prompt Moderation via Zero-Shot Agentic Rewriting for Safe Image Generation

  • Xin Zhao
  • Xiaojun Chen
  • Bingshan Liu
  • Zeyao Liu
  • Zhendong Zhao
  • Xiaoyan Gu

Generative vision-language models like Stable Diffusion demonstrate remarkable capabilities in creative media synthesis, but they also pose substantial risks of producing unsafe, offensive, or culturally inappropriate content when prompted adversarially. Current defenses struggle to align outputs with human values without sacrificing generation quality or incurring high costs. To address these challenges, we introduce VALOR (Value-Aligned LLM-Overseen Rewriter), a modular, zero-shot agentic framework for safer and more helpful text-to-image generation. VALOR integrates layered prompt analysis with human-aligned value reasoning: a multi-level NSFW detector filters lexical and semantic risks; a cultural value alignment module identifies violations of social norms, legality, and representational ethics; and an intention disambiguator detects subtle or indirect unsafe implications. When unsafe content is detected, prompts are selectively rewritten by a large language model under dynamic, role-specific instructions designed to preserve user intent while enforcing alignment. If the generated image still fails a safety check, VALOR optionally performs a stylistic regeneration to steer the output toward a safer visual domain without altering core semantics. Experiments across adversarial, ambiguous, and value-sensitive prompts show that VALOR significantly reduces unsafe outputs by up to 100.00% while preserving prompt usefulness and creativity. These results highlight VALOR as a scalable and effective approach for deploying safe, aligned, and helpful image generation systems in open-world settings.
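The layered flow the abstract describes (detect risk, rewrite only when needed, re-check before generation) can be sketched as a tiny pipeline. Everything below is an illustrative stand-in, not VALOR's implementation: the lexicon, function names, and the `[redacted]` rewrite rule are all assumptions; in the real system the detector is multi-level and the rewriter is an LLM under role-specific instructions.

```python
# Hypothetical sketch of a detect -> rewrite -> re-check moderation pipeline.
NSFW_LEXICON = {"gore", "nude"}  # toy lexical risk list (assumption)

def detect_lexical_risk(prompt: str) -> set:
    """Multi-level detection reduced to a single lexical pass."""
    return {w for w in prompt.lower().split() if w in NSFW_LEXICON}

def rewrite_prompt(prompt: str, flagged: set) -> str:
    """Stand-in for the LLM rewriter: neutralize flagged terms while
    leaving the rest of the user's intent untouched."""
    return " ".join("[redacted]" if w.lower() in flagged else w
                    for w in prompt.split())

def moderate(prompt: str) -> str:
    flagged = detect_lexical_risk(prompt)
    if not flagged:
        return prompt                          # safe: pass through unchanged
    safe = rewrite_prompt(prompt, flagged)
    assert not detect_lexical_risk(safe)       # final safety re-check
    return safe
```

The key structural point, which carries over to the full system, is that safe prompts are never touched: rewriting is selective and gated by the detector.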

AAAI 2025 · Conference Paper

Capture Global Feature Statistics for One-Shot Federated Learning

  • Zenghao Guan
  • Yucan Zhou
  • Xiaoyan Gu

Traditional Federated Learning (FL) necessitates numerous rounds of communication between the server and clients, posing significant challenges including high communication costs, connection-drop risks, and susceptibility to privacy attacks. One-shot FL has become a compelling learning paradigm for overcoming these drawbacks by training a global server model in a single communication round. However, existing one-shot FL methods incur expensive computation on the server or clients and cannot handle non-IID (not independent and identically distributed) data stably and effectively. To address these challenges, this paper proposes FedCGS, a novel Federated learning algorithm that Captures Global feature Statistics by leveraging pre-trained models. With global feature statistics, we achieve training-free and heterogeneity-resistant one-shot FL. Furthermore, we extend the approach to the personalization scenario, where clients need only one extra communication round with the server to download the global statistics. Extensive experimental results demonstrate the effectiveness of our methods across diverse data-heterogeneity settings.
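The one-shot idea of uploading compact per-client statistics that the server combines exactly can be illustrated in a few lines. The particular statistics below (count, sum, sum of squares, enough to recover the global mean and variance) are an assumption for illustration, not FedCGS's actual design, which builds on pre-trained feature extractors.

```python
import numpy as np

def client_stats(features: np.ndarray):
    """One upload per client: sufficient statistics for mean and variance."""
    return len(features), features.sum(axis=0), (features ** 2).sum(axis=0)

def server_aggregate(all_stats):
    """Single communication round: combine per-client statistics exactly,
    with no further training and no raw data leaving the clients."""
    n = sum(s[0] for s in all_stats)
    total = sum(s[1] for s in all_stats)
    sq = sum(s[2] for s in all_stats)
    mean = total / n
    var = sq / n - mean ** 2
    return mean, var
```

Because sums compose additively, the server recovers the same statistics it would get from the pooled data, regardless of how unevenly (non-IID) the samples are split across clients.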

AAAI 2025 · Conference Paper

Know2Vec: A Black-Box Proxy for Neural Network Retrieval

  • Zhuoyi Shang
  • Yanwei Liu
  • Jinxia Liu
  • Xiaoyan Gu
  • Ying Ding
  • Xiangyang Ji

For general users, training a neural network from scratch is usually challenging and labor-intensive. Fortunately, neural network zoos let them find a well-performing model for direct use or for fine-tuning in their local environments. Although current model retrieval solutions attempt to convert neural network models into vectors to avoid the multiple costly inference passes otherwise required for model selection, choosing a suitable model remains difficult due to inaccurate vectorization and biased correlation alignment between the query dataset and the models. From the perspective of knowledge consistency, i.e., whether the knowledge possessed by a model can meet the needs of the query task, we propose a model retrieval scheme, named Know2Vec, that acts as a black-box retrieval proxy for a model zoo. Know2Vec first accesses models through a black-box interface in advance, capturing vital decision knowledge while preserving their privacy. It then employs an effective encoding technique to transform this knowledge into precise model vectors, and maps the user's query task to a knowledge vector by probing the semantic relationships within the query samples. The proxy enforces knowledge consistency between the query vector and the model vectors within their alignment space, which is optimized through supervised learning with diverse loss functions, and at inference time it identifies the most suitable model for a given task. Extensive experiments show that Know2Vec achieves superior retrieval accuracy against state-of-the-art methods across diverse neural network retrieval tasks.
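Once models and the query task live in a shared alignment space, the inference step reduces to a nearest-vector search. The sketch below assumes cosine similarity as the alignment measure purely for illustration; the paper's learned alignment space and loss functions are not reproduced here.

```python
import numpy as np

def retrieve(query_vec: np.ndarray, model_vecs: np.ndarray) -> int:
    """Return the index of the model vector most aligned with the query
    vector, using cosine similarity (an illustrative assumption)."""
    q = query_vec / np.linalg.norm(query_vec)
    m = model_vecs / np.linalg.norm(model_vecs, axis=1, keepdims=True)
    return int(np.argmax(m @ q))
```

The expensive part happens offline (vectorizing every model once via the black-box probe); a query then costs one matrix-vector product instead of running the query data through every model in the zoo.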

NeurIPS 2025 · Conference Paper

Statistics Caching Test-Time Adaptation for Vision-Language Models

  • Zenghao Guan
  • Yucan Zhou
  • Wu Liu
  • Xiaoyan Gu

Test-time adaptation (TTA) for Vision-Language Models (VLMs) aims to enhance performance on unseen test data. However, existing methods struggle to achieve robust and continuous knowledge accumulation during test time. To address this, we propose Statistics Caching test-time Adaptation (SCA), a novel cache-based approach. Unlike traditional feature-caching methods prone to forgetting, SCA continuously accumulates task-specific knowledge from all encountered test samples. By formulating the reuse of past features as a least squares problem, SCA avoids storing raw features and instead maintains compact, incrementally updated feature statistics. This design enables efficient online adaptation without the limitations of fixed-size caches, ensuring that the accumulated knowledge grows persistently over time. Furthermore, we introduce adaptive strategies that leverage the VLM's prediction uncertainty to reduce the impact of noisy pseudo-labels and dynamically balance multiple prediction sources, leading to more robust and reliable performance. Extensive experiments demonstrate that SCA achieves compelling performance while maintaining competitive computational efficiency.
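The core trick the abstract describes, reusing all past features as a least-squares problem without storing them, can be sketched with the classic sufficient-statistics formulation: keep only the accumulated Gram matrix and cross-term and re-solve on demand. The class name, symbols, and ridge term below are illustrative assumptions, not SCA's exact formulation.

```python
import numpy as np

class StatsCache:
    """Incremental least squares from cached statistics: past samples are
    folded into X^T X and X^T y, so raw features are never retained."""

    def __init__(self, dim: int, ridge: float = 1e-6):
        self.A = ridge * np.eye(dim)   # accumulates X^T X (+ ridge for stability)
        self.b = np.zeros(dim)         # accumulates X^T y

    def update(self, x: np.ndarray, y: float):
        """O(d^2) per-sample update, independent of how many samples
        have been seen; there is no fixed-size cache to overflow."""
        self.A += np.outer(x, x)
        self.b += y * x

    def solve(self) -> np.ndarray:
        """Current least-squares weights from the cached statistics."""
        return np.linalg.solve(self.A, self.b)
```

Because the statistics only ever accumulate, knowledge from every encountered test sample persists, which is exactly the forgetting-resistance property the method targets.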

JBHI 2023 · Journal Article

Improving the Quality of Fetal Heart Ultrasound Imaging With Multihead Enhanced Self-Attention and Contrastive Learning

  • Yingying Zhang
  • Haogang Zhu
  • Jian Cheng
  • Jingyi Wang
  • Xiaoyan Gu
  • Jiancheng Han
  • Ye Zhang
  • Ying Zhao

Fetal congenital heart disease (FCHD) is a common, serious birth defect affecting ∼1% of newborns annually. Fetal echocardiography is the most effective and important technique for prenatal FCHD diagnosis. The prerequisites for accurate ultrasound FCHD diagnosis are accurate view recognition and high-quality diagnostic view extraction. However, these manual clinical procedures have drawbacks such as varying technical capabilities and inefficiency. Therefore, the automatic identification of high-quality multiview fetal heart scan images is highly desirable to improve the efficiency and accuracy of prenatal FCHD diagnosis. Here, we present a framework for multiview fetal heart ultrasound image recognition and quality assessment that comprises two parts: a multiview classification and localization network (MCLN) and an improved contrastive learning network (ICLN). In the MCLN, a multihead enhanced self-attention mechanism is applied to construct the classification network and identify six accurate and interpretable views of the fetal heart. In the ICLN, anatomical structure standardization and image clarity are considered. With contrastive learning, the absolute loss, feature relative loss, and predicted-value relative loss are combined to achieve favorable quality assessment results. Experiments show that the MCLN outperforms other state-of-the-art networks by 1.52–13.61% in F1 score across six standard view recognition tasks, and the ICLN is comparable to expert cardiologists in the quality assessment of fetal heart ultrasound images, reaching 97% within 2 points on a test set for the four-chamber view task. Thus, our architecture offers great potential for helping cardiologists improve quality control of fetal echocardiographic images in clinical practice.

AAAI 2019 · Conference Paper

Community Focusing: Yet Another Query-Dependent Community Detection

  • Zhuo Wang
  • Weiping Wang
  • Chaokun Wang
  • Xiaoyan Gu
  • Bo Li
  • Dan Meng

As a major kind of query-dependent community detection, community search finds a densely connected subgraph containing a set of query nodes. Because density is the primary consideration, most community search methods often return a dense subgraph with many vertices that are far from, and thus only weakly related to, the query nodes. Motivated by this, a new problem called community focusing (CF) is studied: finding a community whose members are both close and densely connected to the query nodes. A distance-sensitive dense subgraph structure called β-attention-core is proposed to remove vertices loosely connected to or far from the query nodes, and a combinational density is designed to guarantee the density of a subgraph. CF is then formalized as finding the subgraph with the largest combinational density among the β-attention-core subgraphs that contain the query nodes with the largest β. Effective methods are devised for CF, and a speed-up strategy is developed to make them scalable to large networks. Extensive experimental results on real and synthetic networks demonstrate the performance of our methods.
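The two constraints CF imposes, closeness to the query nodes and dense internal connectivity, can be illustrated with a simplified stand-in for the β-attention-core: a BFS distance filter followed by k-core-style peeling. The hop bound `max_dist` and degree bound `k` below are assumptions for illustration; the paper's actual structure and combinational density are more refined.

```python
from collections import deque

def focus(adj: dict, queries: set, max_dist: int, k: int) -> set:
    """Toy distance-bounded dense core around the query nodes."""
    # Step 1: BFS from the query nodes, keeping vertices within max_dist hops.
    dist = {q: 0 for q in queries}
    frontier = deque(queries)
    while frontier:
        u = frontier.popleft()
        if dist[u] == max_dist:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    keep = set(dist)
    # Step 2: peel vertices whose degree inside `keep` drops below k,
    # never removing the query nodes themselves.
    changed = True
    while changed:
        changed = False
        for u in list(keep):
            if u in queries:
                continue
            if sum(1 for v in adj[u] if v in keep) < k:
                keep.remove(u)
                changed = True
    return keep
```

On a triangle {a, b, c} attached to a dangling chain a-d-e-f, querying a with `max_dist=2, k=2` peels away the chain and keeps only the dense, nearby triangle, which is the behavior CF asks for.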