Author name cluster

Yan Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers

2 author rows

AAAI Conference 2026 Conference Paper

Effective Robotic Cloth Grasping Through Suppressing False Discoveries

Xingyu Zhu
Zhiwen Tu
Yan Wu
Shan Luo
Hechang Chen
Yixing Gao

Enabling robots to grasp disorganized cloth for efficient storage is valuable in robot-assisted room organization. Diverse deformations of cloth and the stacking of multiple items limit grasping-pose estimation that relies on annotations. This necessitates segmenting each cloth item in an unsupervised manner before estimating the grasping position. However, existing segmentation methods primarily focus on improving metrics such as Intersection-over-Union and Pixel Accuracy, which cannot effectively measure the segmentation errors of the cloth area and thus lead to failed grasping-position estimation. To address this challenge, we use False Discovery Rate (FDR) as a novel measure of segmentation errors and analyze its impact on grasping success. Our preliminary study reveals a negative correlation between segmentation FDR and grasping success rate, highlighting the need for more reliable segmentation in cluttered cloth scenarios. Therefore, we propose an unsupervised cloth segmentation network based on feature distance-weighted constraints, designed to reduce the false discovery rate in cloth area perception without requiring expensive pixel-level manual annotations. Additionally, to estimate the grasping position on the perceived cloth area, we introduce a strategy based on cloth surface wrinkle analysis, which operates without the need for annotations or training. By integrating the proposed segmentation network and grasping strategy, we develop a robotic system capable of sequentially grasping cluttered cloth from a table. Extensive real-world robotic experiments demonstrate the effectiveness of our approach, outperforming multiple baseline methods in segmentation FDR and grasping success rate.
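The abstract's point that Intersection-over-Union can hide false discoveries can be illustrated with a small sketch. This is not the paper's code: the helper names and the flattened toy masks are assumptions made for illustration, and FDR is taken in its standard form, FP / (TP + FP).

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives for flat binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def false_discovery_rate(pred, truth):
    """FDR = FP / (TP + FP): the share of predicted cloth pixels that are not cloth."""
    tp, fp, _ = confusion_counts(pred, truth)
    return fp / (tp + fp) if (tp + fp) else 0.0


def iou(pred, truth):
    """Intersection-over-Union = TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


# Hypothetical flattened masks: 6 cloth pixels followed by 6 background pixels.
truth = [1] * 6 + [0] * 6
over_segmented = [1] * 9 + [0] * 3   # covers all cloth but spills onto background
under_segmented = [1] * 4 + [0] * 8  # misses some cloth but never leaves it

for name, pred in [("over", over_segmented), ("under", under_segmented)]:
    print(name, round(iou(pred, truth), 3), round(false_discovery_rate(pred, truth), 3))
```

Both toy predictions have the same IoU, yet only the over-segmented one contains pixels that lie off the cloth, which is the kind of error a grasp planned on the predicted area would be exposed to.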

YNIMG Journal 2026 Journal Article

Neural signatures of context-dependent trust: How strategic interaction and social value orientation shape prosocial decisions

Yan Wu
Davood Bayat
Fatemeh Zahra Shahraki Pour
Frank Krueger

Trust is a cornerstone of human cooperation, yet it unfolds differently depending on who we interact with and under what circumstances. Understanding how the brain integrates social traits and contextual demands is key to explaining why we sometimes choose to trust and sometimes refrain. Although prior studies have identified neural regions involved in trust, it remains unclear how dispositional factors, such as Social Value Orientation (SVO), interact with the strategic context of the decision. Few experiments have jointly manipulated both partner characteristics and game structure, limiting our understanding of how context-dependent trust is represented in the brain. This study examined how partner SVO and game context jointly shape trust behavior and its neural correlates. Thirty-one adults completed a multi-game fMRI paradigm, acting as trustors in the Trust Game (TG), which involves reciprocity, and the Tripled Dictator Game (TDG), assessing altruism. Partner SVOs ranged from aggressive to altruistic. Behaviorally, participants transferred more with prosocial partners, particularly in the TG, and this tendency was strongest among individuals who were themselves more prosocial, indicating that personal dispositions amplify sensitivity to cooperative partners in strategic contexts. Neurally, activity in the precuneus/posterior cingulate cortex increased with higher amounts sent during the TG, and regions involved in social cognition, such as the right angular gyrus, reflected how the brain distinguishes cooperative from selfish partners depending on whether trust requires reciprocity. In conclusion, strategic context dynamically modulates both the behavioral expression and neural representation of trust, showing how social preferences and situational demands jointly guide prosocial decision-making.

AAAI Conference 2025 Conference Paper

DiMSOD: A Diffusion-Based Framework for Multi-Modal Salient Object Detection

Shuo Zhang
Jiaming Huang
Wenbing Tang
Yan Wu
Terrence Hu
Xiaogang Xu
Jing Liu

Multi-modal salient object detection (SOD) through the integration of additional data such as depth or thermal information has become a significant task in computer vision during recent years. Traditionally, the challenges of identifying salient objects in RGB, RGB-D (Depth), and RGB-T (Thermal) images are tackled separately. However, without intricate cross-modal fusion strategies, such approaches struggle to effectively integrate multi-modal information, often resulting in poorly defined object edges or overconfident inaccurate predictions. Recent studies have shown that designing a unified end-to-end framework to handle all three types of SOD tasks simultaneously is both necessary and difficult. To address this need, we propose a novel approach that treats multi-modal SOD as a conditional mask generation task utilizing diffusion models. We introduce DiMSOD, which enables the concurrent use of local (depth maps, thermal maps) and global controls (original images) within a unified model for progressive denoising and refined prediction. DiMSOD is efficient, only requiring fine-tuning of our newly introduced modules on the existing stable diffusion, which not only reduces the fine-tuning cost, making it more viable for practical use, but also enhances the integration of multi-modal conditional controls. Specifically, we have developed modules including SOD-ControlNet, Feature Adaptive Network (FAN), and Feature Injection Attention Network (FIAN) to enhance the model's performance. Extensive experiments demonstrate that DiMSOD efficiently detects salient objects across RGB, RGB-D, and RGB-T datasets, achieving superior performance compared to previous well-established methods.

ICML Conference 2025 Conference Paper

LEMoN: Label Error Detection using Multimodal Neighbors

Haoran Zhang 0003
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Marzyeh Ghassemi

Large repositories of image-caption pairs are essential for the development of vision-language models. However, these datasets are often extracted from noisy data scraped from the web, and contain many mislabeled instances. In order to improve the reliability of downstream models, it is important to identify and filter images with incorrect captions. However, beyond filtering based on image-caption embedding similarity, no prior works have proposed other methods to filter noisy multimodal data, or concretely assessed the impact of noisy captioning data on downstream training. In this work, we propose, theoretically justify, and empirically validate LEMoN, a method to identify label errors in image-caption datasets. Our method leverages the multimodal neighborhood of image-caption pairs in the latent space of contrastively pretrained multimodal models to automatically identify label errors. Through empirical evaluations across eight datasets and twelve baselines, we find that LEMoN outperforms the baselines by over 3% in label error detection, and that training on datasets filtered using our method improves downstream captioning performance by more than 2 BLEU points over noisy training.

JBHI Journal 2025 Journal Article

NPENN: A Noise Perturbation Ensemble Neural Network for Microbiome Disease Phenotype Prediction

Zhen Cui
Yan Wu
Qin-Hu Zhang
Si-Guo Wang
Zhen-Hao Guo

With advances in microbiomics, the crucial role of microbes in disease progression is increasingly recognized. However, predicting disease phenotypes using microbiome data remains challenging due to data complexity, heterogeneity, and limited model generalization. Current methods often depend on specific datasets and are vulnerable to adversarial attacks. To address these issues, this paper introduces a novel Noise Perturbation Ensemble Neural Network model (NPENN), which combines noise mechanisms with Gradient Boosting (GB) techniques for robust neural network ensemble learning. NPENN, validated on multiple microbiome datasets, shows superior accuracy and generalization compared to traditional methods, effectively handling data complexity and variability. This approach enhances model robustness and feature learning by integrating GB prior knowledge. Additionally, the study explores microbial community roles in various diseases, providing insights into disease mechanisms and potential biomarkers for personalized precision diagnosis and treatment strategies.

NeurIPS Conference 2025 Conference Paper

PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis

Yan Wu
Esther Wershof
Sebastian Schmon
Marcel Nassar
Błażej Osiński
Ridvan Eksi
Zichao Yan
Rory Stark

We introduce a comprehensive framework for modeling single cell transcriptomic responses to perturbations, aimed at standardizing benchmarking in this rapidly evolving field. Our approach includes a modular and user-friendly model development and evaluation platform, a collection of diverse perturbational datasets, and a set of metrics designed to fairly compare models and dissect their performance. Through extensive evaluation of both published and baseline models across diverse datasets, we highlight the limitations of widely used models, such as mode collapse. We also demonstrate the importance of rank metrics which complement traditional model fit measures, such as RMSE, for validating model effectiveness. Notably, our results show that while no single model architecture clearly outperforms others, simpler architectures are generally competitive and scale well with larger datasets. Overall, this benchmarking exercise sets new standards for model evaluation, supports robust model development, and furthers the use of these models to simulate genetic and chemical screens for therapeutic discovery.

EAAI Journal 2024 Journal Article

Prediction of the transient emission characteristics from diesel engine using temporal convolutional networks

Jianxiong Liao
Jie Hu
Peng Chen
Lei Zhu
Yan Wu
Zhizhou Cai
Hanming Wu
Maoxuan Wang

To predict the transient emission characteristics of diesel engines accurately and quickly, this paper presents a novel prediction model based on temporal convolutional networks (TCN) that incorporates dilated convolutions and residual connections. First, 1800 samples from the World Harmonized Transient Cycle (WHTC) were employed to train and validate the model. A Random Forest algorithm was used to select the six most important variables as inputs to reduce the data dimensionality. Then the effect of model hyperparameters on the prediction performance was discussed, and the optimal hyperparameter combination was obtained by a particle swarm optimization (PSO) algorithm. The optimized TCN model achieved a coefficient of determination (R²) above 0.972 on the training dataset and 0.941 on the validation dataset. The root mean squared error (RMSE) and the mean absolute error (MAE) were relatively low. Finally, measured data from the World Harmonized Steady Cycle (WHSC) were used to test the model, and the average R² value of 0.936 demonstrated that the TCN model has excellent robustness and generalization. Moreover, a comparative investigation between the TCN model and other advanced algorithms, including BP, GBRT, XGBoost, RNN, LSTM and Transformer, was also conducted. The results showed that the TCN model not only achieves higher accuracy but also requires less computing time. This demonstrates that it is a promising method for predicting the emission characteristics of diesel engines.
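The three fit measures named in this abstract (coefficient of determination, RMSE, MAE) can be computed as in the sketch below. It uses the standard textbook definitions; the function name and the sample values are made up for illustration and are not data from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired measured and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical measured vs. predicted values (not taken from the paper).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

R² compares the residual error against the variance of the measured signal, so it is unitless, whereas RMSE and MAE are expressed in the units of the predicted quantity.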

NeurIPS Conference 2023 Conference Paper

AbDiffuser: full-atom generation of in-vitro functioning antibodies

Karolis Martinkus
Jan Ludwiczak
Wei-Ching Liang
Julien Lafrance-Vanasse
Isidro Hotzel
Arvind Rajpal
Yan Wu
Kyunghyun Cho

We introduce AbDiffuser, an equivariant and physics-informed diffusion model for the joint generation of antibody 3D structures and sequences. AbDiffuser is built on top of a new representation of protein structure, relies on a novel architecture for aligned proteins, and utilizes strong diffusion priors to improve the denoising process. Our approach improves protein diffusion by taking advantage of domain knowledge and physics-based constraints; handles sequence-length changes; and reduces memory complexity by an order of magnitude, enabling backbone and side chain generation. We validate AbDiffuser in silico and in vitro. Numerical experiments showcase the ability of AbDiffuser to generate antibodies that closely track the sequence and structural properties of a reference set. Laboratory experiments confirm that all 16 HER2 antibodies discovered were expressed at high levels and that 57.1% of the selected designs were tight binders.

IROS Conference 2023 Conference Paper

Learning Deep Sensorimotor Policies for Vision-Based Autonomous Drone Racing

Jiawei Fu 0003
Yunlong Song
Yan Wu
Fisher Yu 0001
Davide Scaramuzza 0001

The development of effective vision-based algorithms has been a significant challenge in achieving autonomous drones, which promise to offer immense potential for many real-world applications. This paper investigates learning deep sensorimotor policies for vision-based drone racing, which is a particularly demanding setting for testing the limits of an algorithm. Our method combines feature representation learning to extract task-relevant feature representations from high-dimensional image inputs with a learning-by-cheating framework to train a deep sensorimotor policy for vision-based drone racing. This approach eliminates the need for globally-consistent state estimation, trajectory planning, and handcrafted control design, allowing the policy to directly infer control commands from raw images, similar to human pilots. We conduct experiments using a realistic simulator and show that our vision-based policy can achieve state-of-the-art racing performance while being robust against unseen visual disturbances. Our study suggests that consistent feature embeddings are essential for achieving robust control performance in the presence of visual disturbances. The key to acquiring consistent feature embeddings is utilizing contrastive learning along with data augmentation. Video: https://youtu.be/AX_fcnW9yqE

TMLR Journal 2023 Journal Article

On a continuous time model of gradient descent dynamics and instability in deep learning

Mihaela Rosca
Yan Wu
Chongli Qin
Benoit Dherin

The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent, however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent, we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian, the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability, we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.

ICLR Conference 2021 Conference Paper

Kanerva++: Extending the Kanerva Machine With Differentiable, Locally Block Allocated Latent Memory

Jason Ramapuram
Yan Wu
Alexandros Kalousis

Episodic and semantic memory are critical components of the human memory model. The theory of complementary learning systems (McClelland et al., 1995) suggests that the compressed representation produced by a serial event (episodic memory) is later restructured to build a more generalized form of reusable knowledge (semantic memory). In this work, we develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory via a hierarchical latent variable model. We take inspiration from traditional heap allocation and extend the idea of locally contiguous memory to the Kanerva Machine, enabling a novel differentiable block allocated latent memory. In contrast to the Kanerva Machine, we simplify the process of memory writing by treating it as a fully feed forward deterministic process, relying on the stochasticity of the read key distribution to disperse information within the memory. We demonstrate that this allocation scheme improves performance in memory conditional image generation, resulting in new state-of-the-art conditional likelihood values on binarized MNIST (≤41.58 nats/image), binarized Omniglot (≤66.24 nats/image), as well as presenting competitive performance on CIFAR10, DMLab Mazes, Celeb-A and ImageNet32×32.

AAAI Conference 2021 Conference Paper

Neural Architecture Search as Sparse Supernet

Yan Wu
Aoming Liu
Zhiwu Huang
Siwei Zhang
Luc Van Gool

This paper aims at enlarging the problem of Neural Architecture Search (NAS) from Single-Path and Multi-Path Search to automated Mixed-Path Search. In particular, we model the NAS problem as a sparse supernet using a new continuous architecture representation with a mixture of sparsity constraints. The sparse supernet enables us to automatically achieve sparsely-mixed paths upon a compact set of nodes. To optimize the proposed sparse supernet, we exploit a hierarchical accelerated proximal gradient algorithm within a bi-level optimization framework. Extensive experiments on Convolutional Neural Network and Recurrent Neural Network search demonstrate that the proposed method is capable of searching for compact, general and powerful neural architectures.

IJCAI Conference 2021 Conference Paper

Neural Architecture Search of SPD Manifold Networks

Rhea Sanjay Sukthanker
Zhiwu Huang
Suryansh Kumar
Erik Goron Endsjo
Yan Wu
Luc Van Gool

In this paper, we propose a new neural architecture search (NAS) problem of Symmetric Positive Definite (SPD) manifold networks, aiming to automate the design of SPD neural architectures. To address this problem, we first introduce a geometrically rich and diverse SPD neural architecture search space for an efficient SPD cell design. Further, we model our new NAS problem with a one-shot training process of a single supernet. Based on the supernet modeling, we exploit a differentiable NAS algorithm on our relaxed continuous search space for SPD neural architecture search. Statistical evaluation of our method on drone, action, and emotion recognition tasks mostly provides better results than the state-of-the-art SPD networks and traditional NAS algorithms. Empirical results show that our algorithm excels in discovering better-performing SPD network designs and provides models that are more than three times lighter than those searched by state-of-the-art NAS algorithms.

AAAI Conference 2021 System Paper

TAILOR: Teaching with Active and Incremental Learning for Object Registration

Qianli Xu
Nicolas Gauthier
Wenyu Liang
Fen Fang
Hui Li Tan
Ying Sun
Yan Wu
Liyuan Li

When deploying a robot to a new task, one often has to train it to detect novel objects, which is time-consuming and labor-intensive. We present TAILOR, a method and system for object registration with active and incremental learning. When instructed by a human teacher to register an object, TAILOR is able to automatically select viewpoints to capture informative images by actively exploring viewpoints, and employs a fast incremental learning algorithm to learn new objects without potential forgetting of previously learned objects. We demonstrate the effectiveness of our method with a KUKA robot to learn novel objects used in a real-world gearbox assembly task through natural interactions.

TIST Journal 2020 Journal Article

Superpixel Region Merging Based on Deep Network for Medical Image Segmentation

Hui Liu
Haiou Wang
Yan Wu
Lei Xing

Automatic and accurate semantic segmentation of pathological structures in medical images is challenging because of noisy disturbance, deformable shapes of pathology, and low contrast between soft tissues. Classical superpixel-based classification algorithms suffer from edge leakage due to complexity and heterogeneity inherent in medical images. Therefore, we propose a deep U-Net with superpixel region merging processing incorporated for edge enhancement to facilitate and optimize segmentation. Our approach combines three innovations: (1) unlike conventional deep learning-based image segmentation, the segmentation evolves from superpixel region merging via U-Net training, which captures rich semantic information in addition to gray-level similarity; (2) a bilateral filtering module was adopted at the beginning of the network to eliminate external noise and enhance soft tissue contrast at edges of pathology; and (3) a normalization layer was inserted after the convolutional layer at each feature scale, to prevent overfitting and increase the sensitivity to model parameters. This model was validated on lung CT, brain MR, and coronary CT datasets. Different superpixel methods and cross validation show the effectiveness of this architecture. The hyperparameter settings were empirically explored to achieve a good trade-off between performance and efficiency, where a four-layer network achieves the best result in precision, recall, F-measure, and running speed. It was demonstrated that our method outperformed state-of-the-art networks, including FCN-16s, SegNet, PSPNet, DeepLabv3, and traditional U-Net, both quantitatively and qualitatively. Source code for the complete method is available at https://github.com/Leahnawho/Superpixel-network.

NeurIPS Conference 2020 Conference Paper

Training Generative Adversarial Networks by Solving Ordinary Differential Equations

Chongli Qin
Yan Wu
Jost Tobias Springenberg
Andy Brock
Jeff Donahue
Timothy Lillicrap
Pushmeet Kohli

The instability of Generative Adversarial Network (GAN) training has frequently been attributed to gradient descent. Consequently, recent methods have aimed to tailor the models and training procedures to stabilise the discrete updates. In contrast, we study the continuous-time dynamics induced by GAN training. Both theory and toy experiments suggest that these dynamics are in fact surprisingly stable. From this perspective, we hypothesise that instabilities in training GANs arise from the integration error in discretising the continuous dynamics. We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training when combined with a regulariser that controls the integration error. Our approach represents a radical departure from previous methods which typically use adaptive optimisation and stabilisation techniques that constrain the functional space (e.g. Spectral Normalisation). Evaluation on CIFAR-10 and ImageNet shows that our method outperforms several strong baselines, demonstrating its efficacy.
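The hypothesis that instability comes from integration error can be illustrated on a toy two-player game rather than a GAN. The sketch below is not the paper's method and omits its regulariser: it assumes the bilinear objective f(x, y) = x * y, with x descending and y ascending, whose continuous-time dynamics are a pure rotation that keeps the distance to the equilibrium constant. All function names are hypothetical.

```python
import math


def field(x, y):
    """Continuous-time dynamics for f(x, y) = x * y: dx/dt = -df/dx, dy/dt = +df/dy."""
    return -y, x


def euler_step(x, y, h):
    """One explicit Euler step, equivalent to simultaneous gradient descent/ascent."""
    dx, dy = field(x, y)
    return x + h * dx, y + h * dy


def rk4_step(x, y, h):
    """One classical fourth-order Runge-Kutta step on the same vector field."""
    k1 = field(x, y)
    k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = field(x + h * k3[0], y + h * k3[1])
    return (
        x + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        y + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
    )


def radius_after(step, h=0.1, n=1000):
    """Distance from the equilibrium (0, 0) after n steps, starting at (1, 0)."""
    x, y = 1.0, 0.0
    for _ in range(n):
        x, y = step(x, y, h)
    return math.hypot(x, y)


print("Euler radius:", radius_after(euler_step))
print("RK4 radius:  ", radius_after(rk4_step))
```

With the same step size, the Euler iterates spiral away from the equilibrium while the RK4 iterates stay close to the unit circle traced by the exact flow, so the divergence in this toy case is an artifact of the discretisation rather than of the underlying dynamics.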

NeurIPS Conference 2019 Conference Paper

Shaping Belief States with Generative Environment Models for RL

Karol Gregor
Danilo Jimenez Rezende
Frederic Besse
Yan Wu
Hamza Merzic
Aaron van den Oord

When agents interact with a complex environment, they must form and maintain beliefs about the relevant aspects of that environment. We propose a way to efficiently train expressive generative models in complex environments. We show that a predictive algorithm with an expressive generative model can form stable belief-states in visually rich and dynamic 3D environments. More precisely, we show that the learned representation captures the layout of the environment as well as the position and orientation of the agent. Our experiments show that the model substantially improves data-efficiency on a number of reinforcement learning (RL) tasks compared with strong model-free baseline agents. We find that predicting multiple steps into the future (overshooting), in combination with an expressive generative model, is critical for stable representations to emerge. In practice, using expressive generative models in RL is computationally expensive and we propose a scheme to reduce this computational burden, allowing us to build agents that are competitive with model-free baselines.

NeurIPS Conference 2018 Conference Paper

Learning Attractor Dynamics for Generative Memory

Yan Wu
Gregory Wayne
Karol Gregor
Timothy Lillicrap

A central challenge faced by memory systems is the robust retrieval of a stored pattern in the presence of interference due to other stored patterns and noise. A theoretically well-founded solution to robust retrieval is given by attractor dynamics, which iteratively cleans up patterns during recall. However, incorporating attractor dynamics into modern deep learning systems poses difficulties: attractor basins are characterised by vanishing gradients, which are known to make training neural networks difficult. In this work, we exploit recent advances in variational inference and avoid the vanishing gradient problem by training a generative distributed memory with a variational lower-bound-based Lyapunov function. The model is minimalistic with surprisingly few parameters. Experiments show that it converges to correct patterns upon iterative retrieval and achieves competitive performance as both a memory model and a generative model.

ICRA Conference 1996 Conference Paper

A randomized roadmap method for path and manipulation planning

Nancy M. Amato
Yan Wu

This paper presents a new randomized roadmap method for motion planning for many DOF robots that can be used to obtain high quality roadmaps even when C-space is crowded. The main novelty in the authors' approach is that roadmap candidate points are chosen on C-obstacle surfaces. As a consequence, the roadmap is likely to contain difficult paths, such as those traversing long, narrow passages in C-space. The approach can be used for both collision-free path planning and for manipulation planning of contact tasks. Experimental results with a planar articulated 6 DOF robot show that, after preprocessing, difficult path planning operations can often be carried out in less than a second.
