Arrow Research

Author name cluster

Ning Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers (15)

AAAI Conference 2026 Conference Paper

EMODIS: A Benchmark for Context-Dependent Emoji Disambiguation in Large Language Models

  • Jiacheng Huang
  • Ning Yu
  • Xiaoyin Yi

Large language models (LLMs) are increasingly deployed in real-world communication settings, yet their ability to resolve context-dependent ambiguity remains underexplored. In this work, we present EMODIS, a new benchmark for evaluating LLMs' capacity to interpret ambiguous emoji expressions under minimal but contrastive textual contexts. Each instance in EMODIS comprises an ambiguous sentence containing an emoji, two distinct disambiguating contexts that lead to divergent interpretations, and a specific question that requires contextual reasoning. We evaluate both open-source and API-based LLMs, and find that even the strongest models frequently fail to distinguish meanings when only subtle contextual cues are present. Further analysis reveals systematic biases toward dominant interpretations and limited sensitivity to pragmatic contrast. EMODIS provides a rigorous testbed for assessing contextual disambiguation, and highlights the gap in semantic reasoning between humans and LLMs.
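As a rough illustration of the instance structure described in the abstract, the sketch below models one EMODIS-style item and the pair of contrastive prompts an evaluator would build from it. All field names and the prompt template are hypothetical, not the benchmark's actual schema.

```python
from dataclasses import dataclass

@dataclass
class EmodisInstance:
    """One EMODIS-style item: an ambiguous emoji sentence, two
    contrastive contexts, and a question probing the interpretation.
    Field names are illustrative, not the benchmark's schema."""
    ambiguous_sentence: str  # sentence containing the emoji
    context_a: str           # context pushing interpretation A
    context_b: str           # context pushing interpretation B
    question: str            # asks which meaning the emoji carries

def build_prompts(item: EmodisInstance) -> list[str]:
    # A model resolves the item only if its answers correctly diverge
    # across the two contexts; identical answers signal context-blindness.
    return [
        f"{item.context_a}\n{item.ambiguous_sentence}\n{item.question}",
        f"{item.context_b}\n{item.ambiguous_sentence}\n{item.question}",
    ]
```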

EAAI Journal 2026 Journal Article

Situation-aware deep reinforcement learning for unmanned aerial vehicle swarm navigation in dynamic multi-obstacle environments

  • Hongwei Zhao
  • Ning Yu
  • Yanyan Zhao
  • Juan Feng
  • Yu Wang

Autonomous navigation for unmanned aerial vehicle (UAV) swarms in dynamic, obstacle-rich environments remains a significant challenge due to unpredictable obstacle movements and complex swarm coordination requirements. Current approaches exhibit critical limitations: traditional algorithms suffer from high computational complexity and poor adaptability, while reinforcement learning (RL) methods often converge to local optima and require extensive training. To address these issues, this study proposes a hybrid AI framework integrating Situation Awareness Path Optimization (SAPO) with an enhanced Deep Deterministic Policy Gradient (DDPG), termed SAPO-DDPG. The SAPO module employs a channel-attention-enhanced CNN–LSTM network for real-time spatiotemporal danger perception, directly tackling environmental unpredictability. Concurrently, the enhanced DDPG incorporates adaptive multi-objective reward formulation and prioritized experience replay to mitigate local optima convergence and accelerate training. Evaluated on realistic simulation datasets, the proposed method consistently outperforms the state-of-the-art TSEB-DDPG baseline. It achieves shorter path lengths (individual: 2.1%, swarm: 4.2% reduction) while guaranteeing near-zero collision navigation, demonstrating robust autonomous operation for UAV swarms in complex, dynamic, and cluttered environments.
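For readers who want the gist of the perception module, here is a minimal sketch of a channel-attention CNN-LSTM danger-perception head in PyTorch. The squeeze-and-excitation-style gating, layer sizes, and output head are assumptions; the paper's actual SAPO architecture may differ.

```python
import torch
import torch.nn as nn

class DangerPerception(nn.Module):
    """Sketch: CNN features per frame, channel-attention gating,
    then an LSTM over time to score spatiotemporal danger."""
    def __init__(self, in_ch=4, feat=32, hidden=64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, feat, 3, padding=1)
        # Channel attention: global pool -> bottleneck MLP -> sigmoid gate.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat, feat // 4), nn.ReLU(),
            nn.Linear(feat // 4, feat), nn.Sigmoid(),
        )
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.danger = nn.Linear(hidden, 1)  # per-step danger score

    def forward(self, frames):  # frames: (B, T, C, H, W)
        B, T, C, H, W = frames.shape
        x = self.conv(frames.reshape(B * T, C, H, W)).relu()
        w = self.gate(x)                    # (B*T, feat) channel weights
        x = x * w[:, :, None, None]         # re-weight feature channels
        x = x.mean(dim=(2, 3)).reshape(B, T, -1)  # spatial pooling
        h, _ = self.lstm(x)
        return self.danger(h).squeeze(-1)   # (B, T) danger over time
```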

JBHI Journal 2025 Journal Article

Hierarchical Graph Representation Learning With Multi-Granularity Features for Anti-Cancer Drug Response Prediction

  • Wei Peng
  • Jiangzhen Lin
  • Wei Dai
  • Ning Yu
  • Jianxin Wang

Patients with the same type of cancer often respond differently to identical drug treatments due to unique genomic traits. Accurately predicting a patient's response to a drug is crucial in guiding treatment decisions, alleviating patient suffering, and improving cancer prognosis. Current computational methods utilize deep learning models trained on extensive drug screening data to predict anti-cancer drug responses based on features of cell lines and drugs. However, the interaction between cell lines and drugs is a complex biological process involving interactions across various levels, from internal cellular and drug structures to the external interactions among different molecules. To address this complexity, we propose a novel Hierarchical graph representation Learning with Multi-Granularity features (HLMG) algorithm for predicting anti-cancer drug responses. The HLMG algorithm combines features at two granularities: the overall gene expression and pathway substructures of cell lines, and the overall molecular fingerprints and substructures of drugs. Subsequently, it constructs a heterogeneous graph including cell lines, drugs, known cell line-drug responses, and the associations between similar cell lines and similar drugs. Through a graph convolutional network model, HLMG learns the final cell line and drug representations by aggregating the features of their multi-level neighbors in the heterogeneous graph. The multi-level neighbors consist of the node itself, directly related drugs/cell lines, and indirectly related similar drugs/cell lines. Finally, a linear correlation coefficient decoder is employed to reconstruct the cell line-drug correlation matrix to predict anti-cancer drug responses. Our model was tested on the Genomics of Drug Sensitivity in Cancer (GDSC) and the Cancer Cell Line Encyclopedia (CCLE) databases. Results indicate that HLMG outperforms other state-of-the-art methods in accurately predicting anti-cancer drug responses.
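The final decoding step lends itself to a compact sketch: given learned embeddings, a bilinear (linear-correlation) decoder scores every cell line-drug pair to reconstruct the response matrix. The embedding dimension and sigmoid squashing below are assumptions, not the paper's exact decoder.

```python
import torch
import torch.nn as nn

class LinearCorrelationDecoder(nn.Module):
    """Sketch of a bilinear decoder in the spirit of HLMG's matrix
    reconstruction step (dimensions and squashing are assumed)."""
    def __init__(self, dim=128):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, dim) * dim ** -0.5)

    def forward(self, cell_emb, drug_emb):
        # cell_emb: (n_cells, dim), drug_emb: (n_drugs, dim).
        # Returns an (n_cells, n_drugs) score matrix that is trained
        # against the known cell line-drug responses.
        return torch.sigmoid(cell_emb @ self.W @ drug_emb.T)
```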

JBHI Journal 2025 Journal Article

Predicting Clinical Anticancer Drug Response of Patients by Using Domain Alignment and Prototypical Learning

  • Wei Peng
  • Chuyue Chen
  • Wei Dai
  • Ning Yu
  • Jianxin Wang

Anticancer drug response prediction is crucial in developing personalized treatment plans for cancer patients. However, high-quality patient anticancer drug response data are scarce, and because cell line data and patient data have different distributions, models trained solely on cell line data perform poorly. Some existing methods predict anticancer drug response by transferring knowledge from the cell line domain to the patient domain using transfer learning. However, the robustness of these classifiers is affected by anomalies in the cell line data, and they do not utilize the knowledge in the unlabeled target domain data. To this end, we propose a model called DAPL to predict patient responses to anticancer drugs. The model extracts domain-invariant features from cell lines and patients by constructing multiple VAEs and extracts drug features using GNNs. These features are then combined for prototypical learning to train a classifier, resulting in better predictions of patient anticancer drug response. We used the cell line datasets CCLE and GDSC as source domains and the patient datasets TCGA and PDTC as target domains in our experiments. The results indicate that DAPL shows excellent performance in predicting patient anticancer drug response compared to other state-of-the-art methods.
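The prototypical-learning step can be pictured with a generic sketch (this is the textbook recipe, not necessarily DAPL's exact classifier): summarize each response class by the mean of its domain-invariant embeddings, then assign a query patient to the nearest prototype.

```python
import torch

def prototype_predict(train_z, train_y, test_z, n_classes=2):
    """Generic prototypical classification: class prototypes are
    embedding means; queries go to the nearest prototype."""
    protos = torch.stack([train_z[train_y == c].mean(0)
                          for c in range(n_classes)])  # (C, d)
    dists = torch.cdist(test_z, protos)                # (N, C)
    return dists.argmin(dim=1)  # predicted response class per patient
```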

TMLR Journal 2025 Journal Article

Survey of Video Diffusion Models: Foundations, Implementations, and Applications

  • Yimu Wang
  • Xuye Liu
  • Wei Pang
  • Li Ma
  • Shuai Yuan
  • Paul Debevec
  • Ning Yu

Recent advances in diffusion models have revolutionized video generation, offering superior temporal consistency and visual quality compared to traditional generative adversarial network-based approaches. While this emerging field shows tremendous promise in applications, it faces significant challenges in motion consistency, computational efficiency, and ethical considerations. This survey provides a comprehensive review of diffusion-based video generation, examining its evolution, technical foundations, and practical applications. We present a systematic taxonomy of current methodologies, analyze architectural innovations and optimization strategies, and investigate applications across low-level vision tasks such as denoising and super-resolution. Additionally, we explore the synergies between diffusion-based video generation and related domains, including video representation learning, question answering, and retrieval. Compared to the existing surveys (Lei et al., 2024a;b; Melnik et al., 2024; Cao et al., 2023; Xing et al., 2024c), which focus on specific aspects of video generation, such as human video synthesis (Lei et al., 2024a) or long-form content generation (Lei et al., 2024b), our work provides a broader, more updated, and more fine-grained perspective on diffusion-based approaches, with a special section for evaluation metrics, industry solutions, and training engineering techniques in video generation. This survey serves as a foundational resource for researchers and practitioners working at the intersection of diffusion models and video generation, providing insights into both the theoretical frameworks and practical implementations that drive this rapidly evolving field.

AAAI Conference 2025 Conference Paper

Text2Data: Low-Resource Data Generation with Textual Control

  • Shiyu Wang
  • Yihao Feng
  • Tian Lan
  • Ning Yu
  • Yu Bai
  • Ran Xu
  • Huan Wang
  • Caiming Xiong

Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines. Recognizing the importance of this interface, the machine learning community is investing considerable effort in generating data that is semantically coherent with textual instructions. While strides have been made in text-to-data generation spanning image editing, audio synthesis, video creation, and beyond, low-resource areas characterized by expensive annotations or complex data structures, such as molecules, motion dynamics, and time series, often lack textual labels. This deficiency impedes supervised learning, thereby constraining the application of advanced generative models for text-to-data tasks. In response to these challenges in the low-resource scenario, we propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model. Subsequently, it undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting. Comprehensive experiments demonstrate that Text2Data is able to achieve enhanced performance regarding controllability across various modalities, including molecules, motions and time series, when compared to existing baselines.
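The constraint-based objective can be read as a penalized optimization: minimize the text-control loss while keeping the model's original (unconditional) loss from drifting upward. The sketch below is one plausible relaxation with a fixed multiplier; Text2Data's actual formulation and multiplier update are not spelled out here.

```python
import torch

def constrained_finetune_loss(control_loss, pretrain_loss, budget, lam):
    """Sketch: penalize only the part of the pretraining loss that
    exceeds `budget`, limiting catastrophic forgetting while the
    controllability term is optimized. `lam` is a fixed
    Lagrange-style multiplier (an assumption)."""
    violation = torch.clamp(pretrain_loss - budget, min=0.0)
    return control_loss + lam * violation
```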

NeurIPS Conference 2024 Conference Paper

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

  • Yuancheng Xu
  • Jiarui Yao
  • Manli Shu
  • Yanchao Sun
  • Zichu Wu
  • Ning Yu
  • Tom Goldstein
  • Furong Huang

Vision-Language Models (VLMs) excel in generating textual responses from visual inputs, but their versatility raises security concerns. This study takes the first step in exposing VLMs’ susceptibility to data poisoning attacks that can manipulate responses to innocuous, everyday prompts. We introduce Shadowcast, a stealthy data poisoning attack where poison samples are visually indistinguishable from benign images with matching texts. Shadowcast demonstrates effectiveness in two attack types. The first is a traditional Label Attack, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The second is a novel Persuasion Attack, leveraging VLMs’ text generation capabilities to craft persuasive and seemingly rational narratives for misinformation, such as portraying junk food as healthy. We show that Shadowcast effectively achieves the attacker’s intentions using as few as 50 poison samples. Crucially, the poisoned samples demonstrate transferability across different VLM architectures, posing a significant concern in black-box settings. Moreover, Shadowcast remains potent under realistic conditions involving various text prompts, training data augmentation, and image compression techniques. This work reveals how poisoned VLMs can disseminate convincing yet deceptive misinformation to everyday, benign users, emphasizing the importance of data integrity for responsible VLM deployments. Our code is available at: https://github.com/umd-huang-lab/VLM-Poisoning.
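A common recipe for visually stealthy poisons, and one way to picture the attack described above, is feature matching inside a small perturbation ball: nudge a benign image so its visual features move toward a target concept while the pixels barely change. The abstract does not give Shadowcast's exact procedure, so treat this as a generic sketch.

```python
import torch

def craft_poison(base_img, target_img, encoder, eps=8 / 255,
                 steps=100, lr=0.01):
    """Generic feature-matching poison: optimize a bounded
    perturbation so encoder(base + delta) approaches the target
    concept's features. `encoder` is any differentiable visual
    feature extractor (a placeholder here)."""
    delta = torch.zeros_like(base_img, requires_grad=True)
    with torch.no_grad():
        target_feat = encoder(target_img)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = (encoder(base_img + delta) - target_feat).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():  # keep the change visually imperceptible
            delta.clamp_(-eps, eps)
            delta.copy_((base_img + delta).clamp(0, 1) - base_img)
    return (base_img + delta).detach()  # paired with the target's text
```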

NeurIPS Conference 2024 Conference Paper

T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition

  • Chen Yeh
  • You-Ming Chang
  • Wei-Chen Chiu
  • Ning Yu

While widespread access to the Internet and the rapid advancement of generative models boost people's creativity and productivity, the risk of encountering inappropriate or harmful content also increases. To address this issue, researchers have combined several harmful-content datasets with machine learning methods to detect harmful concepts. However, existing harmful datasets are curated around a narrow range of harmful objects and only cover real harmful content sources. This restricts the generalizability of methods based on such datasets and leads to potential misjudgments in certain cases. Therefore, we propose a comprehensive and extensive harmful dataset, VHD11K, consisting of 10,000 images and 1,000 videos, crawled from the Internet and generated by 4 generative models, across a total of 10 harmful categories covering a full spectrum of harmful concepts with non-trivial definitions. We also propose a novel annotation framework that formulates the annotation process as a multi-agent Visual Question Answering (VQA) task, having 3 different VLMs "debate" whether the given image/video is harmful, and incorporating an in-context learning strategy in the debating process. This ensures that the VLMs consider the context of the given image/video and both sides of the argument thoroughly before making decisions, further reducing the likelihood of misjudgments in edge cases. Evaluation and experimental results demonstrate that (1) annotations from our framework align closely with those from humans, ensuring the reliability of VHD11K; (2) our full-spectrum harmful dataset exposes the inability of existing harmful content detection methods to detect extensive harmful contents and improves the performance of existing harmfulness recognition methods; (3) our dataset outperforms the baseline dataset, SMID, as evidenced by the larger improvements it yields in harmfulness recognition methods. The entire dataset is publicly available at: https://huggingface.co/datasets/denny3388/VHD11K
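The debate-style annotation loop might look like the sketch below: each VLM sees the media plus the other agents' latest arguments and revises its verdict, and a majority vote decides the label. The `judge.ask`/`judge.verdict` interface, the prompt, and the round count are all hypothetical; the paper's exact protocol and aggregation rule may differ.

```python
def debate_annotation(media, judges, rounds=3):
    """Sketch of multi-agent VQA debate: iterative argument exchange
    among VLM judges, then a majority vote on harmfulness.
    `judges` are hypothetical VLM wrapper objects."""
    arguments = ["" for _ in judges]
    for _ in range(rounds):
        # Each judge revises its argument after reading the others'.
        arguments = [
            judge.ask(media,
                      question="Is this content harmful? Explain.",
                      context="\n".join(a for j, a in enumerate(arguments)
                                        if j != i))
            for i, judge in enumerate(judges)
        ]
    votes = [judge.verdict(a) for judge, a in zip(judges, arguments)]
    return sum(votes) > len(votes) / 2  # True => labeled harmful
```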

IJCAI Conference 2023 Conference Paper

Detecting Adversarial Faces Using Only Real Face Self-Perturbations

  • Qian Wang
  • Yongqin Xian
  • Hefei Ling
  • Jinyuan Zhang
  • Xiaorui Lin
  • Ping Li
  • Jiazhong Chen
  • Ning Yu

Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, posing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods, especially GAN-based attacks with completely different noise patterns, circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend against newly emerging attacks that are unseen to defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems and to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. Experiments conducted on the LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.
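To make the self-perturbation idea concrete, here is a sketch that turns real faces into pseudo adv-faces with simple noise patterns. The three patterns below (dense Gaussian noise, a local patch, a low-frequency wave) are illustrative stand-ins; the paper's three heuristically designed patterns are not reproduced here.

```python
import torch

def self_perturb(face, kind):
    """Generate a pseudo adv-face from a real face tensor (C, H, W)
    in [0, 1]; assumes H, W >= 112. Patterns are illustrative only."""
    if kind == "gaussian":                  # dense additive noise
        noisy = face + 0.03 * torch.randn_like(face)
    elif kind == "patch":                   # localized perturbation
        noisy = face.clone()
        noisy[..., 80:112, 80:112] += 0.2 * torch.randn(32, 32)
    else:                                   # low-frequency wave
        h = torch.arange(face.shape[-2]).float()
        noisy = face + 0.05 * torch.sin(h / 4)[None, :, None]
    return noisy.clamp(0, 1)  # labeled "attack" when training the detector
```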

IJCAI Conference 2023 Conference Paper

Learning Prototype Classifiers for Long-Tailed Recognition

  • Saurabh Sharma
  • Yongqin Xian
  • Ning Yu
  • Ambuj Singh

The problem of long-tailed recognition (LTR) has received attention in recent years due to the fundamental power-law distribution of objects in the real world. Most recent works in LTR use softmax classifiers that are biased in that they correlate classifier norm with the amount of training data for a given class. In this work, we show that learning prototype classifiers addresses the biased softmax problem in LTR. Prototype classifiers can deliver promising results simply using Nearest-Class-Mean (NCM), a special case where prototypes are empirical centroids. We go one step further and propose to jointly learn prototypes by using distances to prototypes in representation space as the logit scores for classification. Further, we theoretically analyze the properties of Euclidean-distance-based prototype classifiers that lead to stable gradient-based optimization which is robust to outliers. To enable independent distance scales along each channel, we enhance prototype classifiers by learning channel-dependent temperature parameters. Our analysis shows that prototypes learned by prototype classifiers are better separated than empirical centroids. Results on four LTR benchmarks show that the prototype classifier outperforms or is comparable to state-of-the-art methods. Our code is made available at https://github.com/saurabhsharma1993/prototype-classifier-ltr.
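The abstract's classifier admits a direct sketch: logits are negative Euclidean distances to learned prototypes, with a learned per-channel temperature providing independent distance scales. Initialization details below are assumptions.

```python
import torch
import torch.nn as nn

class PrototypeClassifier(nn.Module):
    """Distance-to-prototype logits with channel-dependent
    temperatures, per the description above."""
    def __init__(self, n_classes, dim):
        super().__init__()
        self.protos = nn.Parameter(torch.randn(n_classes, dim) * 0.01)
        self.log_temp = nn.Parameter(torch.zeros(dim))  # per channel

    def forward(self, z):                     # z: (batch, dim)
        diff = (z[:, None, :] - self.protos[None]) \
               * torch.exp(-self.log_temp)    # channel-wise scaling
        return -diff.pow(2).sum(-1)           # (batch, n_classes) logits
```

Training could then proceed with ordinary cross-entropy on these logits, so prototypes and temperatures are learned jointly with the backbone (the paper's exact loss is not stated in the abstract).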

NeurIPS Conference 2023 Conference Paper

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

  • Can Qin
  • Shu Zhang
  • Ning Yu
  • Yihao Feng
  • Xinyi Yang
  • Yingbo Zhou
  • Huan Wang
  • Juan Carlos Niebles

Machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such controls, which can accommodate various visual conditions in a single unified model, remains an unaddressed challenge. In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a single framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts guide the style and context. To equip UniControl with the capacity to handle diverse visual conditions, we augment pretrained text-to-image diffusion models and introduce a task-aware HyperNet to modulate the diffusion models, enabling adaptation to different C2I tasks simultaneously. Trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities with unseen visual conditions. Experimental results show that UniControl often surpasses the performance of single-task-controlled methods of comparable model sizes. This control versatility positions UniControl as a significant advancement in the realm of controllable visual generation.
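One way to picture the task-aware HyperNet is as a small network that maps a task embedding to per-channel scale/shift parameters modulating the diffusion backbone's features. Everything in the sketch (sizes, the modulation form, where it hooks in) is an assumption rather than UniControl's published design.

```python
import torch
import torch.nn as nn

class TaskHyperNet(nn.Module):
    """Sketch: task id -> embedding -> per-channel FiLM-style
    scale/shift applied to backbone feature maps."""
    def __init__(self, n_tasks, feat_ch, task_dim=64):
        super().__init__()
        self.task_emb = nn.Embedding(n_tasks, task_dim)
        self.to_mod = nn.Linear(task_dim, 2 * feat_ch)

    def forward(self, features, task_id):  # features: (B, C, H, W)
        scale, shift = self.to_mod(self.task_emb(task_id)).chunk(2, -1)
        return features * (1 + scale[:, :, None, None]) \
               + shift[:, :, None, None]
```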

IJCAI Conference 2021 Conference Paper

Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis

  • Yang He
  • Ning Yu
  • Margret Keuper
  • Mario Fritz

The rapid advances in deep generative models over the past years have led to highly realistic media, known as deepfakes, that are often indistinguishable from real content to human eyes. These advances make assessing the authenticity of visual data increasingly difficult and pose a misinformation threat to the trustworthiness of visual content in general. Although recent work has shown strong detection accuracy on such deepfakes, the success largely relies on identifying frequency artifacts in the generated images, which will not yield a sustainable detection approach as generative models continue evolving and closing the gap to real images. To overcome this issue, we propose a novel fake-detection approach designed to re-synthesize testing images and extract visual cues for detection. The re-synthesis procedure is flexible, allowing us to incorporate a series of visual tasks: we adopt super-resolution, denoising, and colorization as the re-synthesis tasks. We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios involving multiple generators over the CelebA-HQ, FFHQ, and LSUN datasets. Source code is available at https://github.com/SSAW14/BeyondtheSpectrum.
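The detection pipeline admits a short sketch following the abstract's description: re-synthesize the test image through several restoration tasks and use the residuals as cues for a downstream real/fake classifier. The model interfaces are placeholders, and reducing each residual to a scalar is a simplification of whatever cue extraction the paper actually uses.

```python
import torch

def resynthesis_cues(image, sr_model, denoiser, colorizer):
    """Sketch: re-synthesize via pretrained restoration models and
    summarize residuals; real images tend to leave smaller,
    differently structured residuals than generated ones."""
    cues = []
    for task in (sr_model, denoiser, colorizer):
        resynth = task(image)                        # same-size output assumed
        cues.append((image - resynth).abs().mean())  # residual magnitude
    return torch.stack(cues)  # input to a downstream fake/real classifier
```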

TIST Journal 2019 Journal Article

Reconstruction of Hidden Representation for Robust Feature Extraction

  • Zeng Yu
  • Tianrui Li
  • Ning Yu
  • Yi Pan
  • Hongmei Chen
  • Bing Liu

This article aims to develop a new and robust approach to feature representation. Motivated by the success of Auto-Encoders, we first theoretically analyze and summarize the general properties of all algorithms that are based on traditional Auto-Encoders: (1) The reconstruction error of the input cannot be lower than a lower bound, which can be viewed as a guiding principle for reconstructing the input. Additionally, when the input is corrupted with noises, the reconstruction error of the corrupted input also cannot be lower than a lower bound. (2) The reconstruction of a hidden representation achieving its ideal situation is the necessary condition for the reconstruction of the input to reach the ideal state. (3) Minimizing the Frobenius norm of the Jacobian matrix of the hidden representation has a deficiency and may result in a much worse local optimum value. We believe that minimizing the reconstruction error of the hidden representation is more robust than minimizing the Frobenius norm of the Jacobian matrix of the hidden representation. Based on the above analysis, we propose a new model termed Double Denoising Auto-Encoders (DDAEs), which uses corruption and reconstruction on both the input and the hidden representation. We demonstrate that the proposed model is highly flexible and extensible and has a potentially better capability to learn invariant and robust feature representations. We also show that our model is more robust than Denoising Auto-Encoders (DAEs) for dealing with noises or inessential features. Furthermore, we detail how to train DDAEs with two different pretraining methods by optimizing the objective function in a combined and separate manner, respectively. Comparative experiments illustrate that the proposed model is significantly better for representation learning than the state-of-the-art models.
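Property (2) motivates an objective with two denoising terms, which a short sketch can make concrete: reconstruct the input from its corrupted version and also reconstruct the hidden representation from its corrupted version. The weighting, corruption scheme, and use of a detached target below are assumptions, not the paper's exact training setup.

```python
import torch

def ddae_loss(x, encoder, decoder, rec_head, corrupt, alpha=1.0):
    """Sketch of a DDAE-style objective: denoise the input AND the
    hidden representation. `corrupt` is any noise function; `rec_head`
    reconstructs hidden codes (both placeholders)."""
    h = encoder(x)                        # clean hidden representation
    x_rec = decoder(encoder(corrupt(x)))  # denoise-reconstruct the input
    h_rec = rec_head(corrupt(h))          # denoise-reconstruct the code
    input_term = (x - x_rec).pow(2).mean()
    hidden_term = (h.detach() - h_rec).pow(2).mean()  # detached target: an assumption
    return input_term + alpha * hidden_term
```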

IROS Conference 2016 Conference Paper

Automated pick-up of carbon nanotubes inside a scanning electron microscope

  • Yana Guo
  • Qing Shi
  • Zhan Yang 0002
  • Huaping Wang
  • Ning Yu
  • Lining Sun
  • Qiang Huang 0002
  • Toshio Fukuda

It is of great importance to pick up a single carbon nanotube (CNT) from a bulk of CNTs for nanodevice fabrication. In this study, we propose a nanorobotic manipulation system that enables automated pick-up of CNTs based on visual feedback. We utilize histogram normalization for automatic binarization, which clearly distinguishes CNTs from the substrate and other impurities under varying image brightness. Furthermore, we develop the gradient orientation inversion (GOI) algorithm to recognize the CNT tip and the atomic force microscopy (AFM) cantilever. Taking full advantage of the geometrical characteristics of the CNT and the AFM cantilever, GOI proves to be quite robust. We design a segment detection method (SDM) to separate the AFM cantilever and the CNT, while contact detection between them is achieved by analyzing the variation in straightness. Preliminary experimental results imply that our method shows high promise for the realistic fabrication of nanodevices.
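The binarization step has a natural minimal sketch: normalize SEM intensities to a fixed range so a single global threshold separates CNTs from the substrate regardless of image brightness. The percentile stretch and threshold value are assumptions standing in for the paper's histogram normalization.

```python
import numpy as np

def binarize_sem(frame, thresh=0.5):
    """Sketch: robust intensity stretch, then one global threshold.
    `frame` is a 2D grayscale SEM image array."""
    f = frame.astype(np.float32)
    lo, hi = np.percentile(f, (1, 99))  # robust intensity range
    norm = np.clip((f - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return norm > thresh                # boolean CNT/background mask
```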

IROS Conference 2015 Conference Paper

Automated bubble-based assembly of cell-laden microgels into vascular-like microtubes

  • Xiaoming Liu 0007
  • Qing Shi
  • Huaping Wang
  • Tao Sun 0001
  • Ning Yu
  • Qiang Huang 0002
  • Toshio Fukuda

Fabrication of artificial blood vessels in micro scale significantly benefits the regeneration of functional human vascular networks. In this paper, we develop an efficient multi-microrobotic system with an innovative motorized sample holder (MSH) and two manipulators. Air is injected into the solution through a glass pipette fixed on one manipulator to create bubbles. These bubbles conduct a regular rising movement, which is utilized to assemble the 2D ring-shaped microgels fabricated in a simple micro fluidic device. With this novel bubble-based method and the robotic system, we achieve the automation of the assembly. A 1. 2 mm long vascular-like microtube with an outer diameter of 200 µm is fabricated. The whole process of the bubble-based assembly is visually observed and analyzed with side view. Key parameters are characterized to improve the assembly. Results show that the automated bubble-based assembly success rate is 100% and average time cost of assembling every microgel is as low as 3. 25s.