Author name cluster

Chenyu You

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

2 author rows

AAAI Conference 2026 Conference Paper

CoMA: Compositional Human Motion Generation with Multi-modal Agents

Shanlin Sun
Jiaqi Xu
Gabriel de Araujo
Shenghan Zhou
Hanwen Zhang
Ziheng Huang
Chenyu You
Xiaohui Xie

3D human motion generation has seen substantial advancement in recent years. While state-of-the-art approaches have improved performance significantly, they still struggle with complex and detailed motions unseen in training data, largely due to the scarcity of motion datasets and the prohibitive cost of generating new training examples. To address these challenges, we introduce CoMA, an agent-based solution for complex human motion generation, editing, and comprehension. CoMA leverages multiple collaborative agents powered by large language and vision models, alongside a mask transformer-based motion generator featuring body part-specific encoders and codebooks for fine-grained control. Our framework enables generation of both short and long motion sequences with detailed instructions, text-guided motion editing, and self-correction for improved quality. Evaluations on the HumanML3D dataset demonstrate competitive performance against state-of-the-art methods. Additionally, we create a set of context-rich, compositional, and long text prompts, where user studies show our method significantly outperforms existing approaches.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Towards Robust, Reliable, and Generalized Medical AI

Chenyu You

Artificial intelligence is rapidly transforming biomedical imaging, from tumor segmentation to multimodal diagnosis, but reliable deployment in hospitals remains a pressing challenge. Models often struggle with limited labeled data, fail to generalize across scanners and institutions, and lack theoretical guarantees needed for safety-critical decisions. In this talk, I will present our research on overcoming these challenges. I will discuss label-efficient and class-imbalanced learning methods that make effective use of large-scale unlabeled data, theoretical frameworks that embed statistical principles into deep learning for stability and reliability, and medical foundation models that provide universal anatomical priors with lightweight refinement for diverse clinical settings. Together, these advances move biomedical AI beyond benchmark performance toward systems that are robust, generalizable, and clinically actionable.

PDF Details DOI

ICML Conference 2025 Conference Paper

Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Tiansheng Wen
Yifei Wang 0001
Zequn Zeng
Zhong Peng
Yudi Su
Xinyang Liu
Bo Chen 0001
Hongwei Liu 0001

Many large-scale systems rely on high-quality deep representations (embeddings) to facilitate tasks like retrieval, search, and generative modeling. Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths, but it requires full model retraining and suffers from noticeable performance degradations at short lengths. In this paper, we show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. We propose Contrastive Sparse Representation ( CSR ), a method that specifies pre-trained embeddings into a high-dimensional but selectively activated feature space. By leveraging lightweight autoencoding and task-aware contrastive objectives, CSR preserves semantic quality while allowing flexible, cost-effective inference at different sparsity levels. Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in terms of both accuracy and retrieval speed—often by large margins—while also cutting training time to a fraction of that required by MRL. Our results establish sparse coding as a powerful paradigm for adaptive representation learning in real-world applications where efficiency and fidelity are both paramount. Code is available at this URL.

Details

NeurIPS Conference 2025 Conference Paper

Martian World Model: Controllable Video Synthesis with Physically Accurate 3D Reconstructions

Longfei Li
Zhiwen Fan
Wenyan Cong
Xinhang Liu
Yuyang Yin
Matt Foutter
Panwang Pan
Chenyu You

The synthesis of realistic Martian landscape videos, essential for mission rehearsal and robotic simulation, presents unique challenges. These primarily stem from the scarcity of high-quality Martian data and the significant domain gap relative to terrestrial imagery. To address these challenges, we introduce a holistic solution comprising two main components: 1) a data curation framework, Multimodal Mars Synthesis (M3arsSynth), which processes stereo navigation images to render high-fidelity 3D video sequences. 2) a video-based Martian terrain generator (MarsGen), that utilizes multimodal conditioning data to accurately synthesize novel, 3D-consistent frames. Our data are sourced from NASA’s Planetary Data System (PDS), covering diverse Martian terrains and dates, enabling the production of physics-accurate 3D surface models at metric-scale resolution. During inference, MarsGen is conditioned on an initial image frame and can be guided by specified camera trajectories or textual prompts to generate new environments. Experimental results demonstrate that our solution surpasses video synthesis approaches trained on terrestrial data, achieving superior visual quality and 3D structural consistency.

PDF Details

UAI Conference 2024 Conference Paper

Multi-Relational Structural Entropy

Yuwei Cao
Hao Peng 0001
Angsheng Li
Chenyu You
Zhifeng Hao
Philip S. Yu

Structural Entropy (SE) measures the structural information contained in a graph. Minimizing or maximizing SE helps to reveal or obscure the intrinsic structural patterns underlying graphs in an interpretable manner, finding applications in various tasks driven by networked data. However, SE ignores the heterogeneity inherent in the graph relations, which is ubiquitous in modern networks. In this work, we extend SE to consider heterogeneous relations and propose the first metric for multi-relational graph structural information, namely, Multi-relational Structural Entropy (MrSE). To this end, we first cast SE through the novel lens of the stationary distribution from random surfing, which readily extends to multi-relational networks by considering the choices of both nodes and relation types simultaneously at each step. The resulting MrSE is then optimized by a new greedy algorithm to reveal the essential structures within a multi-relational network. Experimental results highlight that the proposed MrSE offers a more insightful interpretation of the structure of multi-relational graphs compared to SE. Additionally, it enhances the performance of two tasks that involve real-world multi-relational graphs, including node clustering and social event detection.

Details

ICLR Conference 2024 Conference Paper

Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects

Chunming He
Kai Li 0012
Yachao Zhang 0001
Yulun Zhang 0001
Chenyu You
Zhenhua Guo 0001
Xiu Li 0001
Martin Danelljan

Camouflaged object detection (COD) is the challenging task of identifying camouflaged objects visually blended into surroundings. Albeit achieving remarkable success, existing COD detectors still struggle to obtain precise results in some challenging cases. To handle this problem, we draw inspiration from the prey-vs-predator game that leads preys to develop better camouflage and predators to acquire more acute vision systems and develop algorithms from both the prey side and the predator side. On the prey side, we propose an adversarial training framework, Camouflageator, which introduces an auxiliary generator to generate more camouflaged objects that are harder for a COD method to detect. Camouflageator trains the generator and detector in an adversarial way such that the enhanced auxiliary generator helps produce a stronger detector. On the predator side, we introduce a novel COD method, called Internal Coherence and Edge Guidance (ICEG), which introduces a camouflaged feature coherence module to excavate the internal coherence of camouflaged objects, striving to obtain more complete segmentation results. Additionally, ICEG proposes a novel edge-guided separated calibration module to remove false predictions to avoid obtaining ambiguous boundaries. Extensive experiments show that ICEG outperforms existing COD detectors and Camouflageator is flexible to improve various COD detectors, including ICEG, which brings state-of-the-art COD performance.

Details

NeurIPS Conference 2023 Conference Paper

Benchmarking Large Language Models on CMExam - A comprehensive Chinese Medical Exam Dataset

Junling Liu
Peilin Zhou
Yining Hua
Dading Chong
Zhongyu Tian
Andrew Liu
Helin Wang
Chenyu You

Recent advancements in large language models (LLMs) have transformed the field of question answering (QA). However, evaluating LLMs in the medical field is challenging due to the lack of standardized and comprehensive datasets. To address this gap, we introduce CMExam, sourced from the Chinese National Medical Licensing Examination. CMExam consists of 60K+ multiple-choice questions for standardized and objective evaluations, as well as solution explanations for model reasoning evaluation in an open-ended manner. For in-depth analyses of LLMs, we invited medical professionals to label five additional question-wise annotations, including disease groups, clinical departments, medical disciplines, areas of competency, and question difficulty levels. Alongside the dataset, we further conducted thorough experiments with representative LLMs and QA algorithms on CMExam. The results show that GPT-4 had the best accuracy of 61. 6% and a weighted F1 score of 0. 617. These results highlight a great disparity when compared to human accuracy, which stood at 71. 6%. For explanation tasks, while LLMs could generate relevant reasoning and demonstrate improved performance after finetuning, they fall short of a desired standard, indicating ample room for improvement. To the best of our knowledge, CMExam is the first Chinese medical exam dataset to provide comprehensive medical annotations. The experiments and findings of LLM evaluation also provide valuable insights into the challenges and potential solutions in developing Chinese medical QA systems and LLM evaluation pipelines.

PDF Details

NeurIPS Conference 2023 Conference Paper

Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective

Chenyu You
Weicheng Dai
Yifei Min
Fenglin Liu
David Clifton
S. Kevin Zhou
Lawrence Staib
James Duncan

For medical image segmentation, contrastive learning is the dominant practice to improve the quality of visual representations by contrasting semantically similar and dissimilar pairs of samples. This is enabled by the observation that without accessing ground truth labels, negative examples with truly dissimilar anatomical features, if sampled, can significantly improve the performance. In reality, however, these samples may come from similar anatomical features and the models may struggle to distinguish the minority tail-class samples, making the tail classes more prone to misclassification, both of which typically lead to model collapse. In this paper, we propose $\texttt{ARCO}$, a semi-supervised contrastive learning (CL) framework with stratified group theory for medical image segmentation. In particular, we first propose building $\texttt{ARCO}$ through the concept of variance-reduced estimation, and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks with extremely limited labels. Furthermore, we theoretically prove these sampling techniques are universal in variance reduction. Finally, we experimentally validate our approaches on eight benchmarks, i. e. , five 2D/3D medical and three semantic segmentation datasets, with different label settings, and our methods consistently outperform state-of-the-art semi-supervised methods. Additionally, we augment the CL frameworks with these sampling techniques and demonstrate significant gains over previous methods. We believe our work is an important step towards semi-supervised medical image segmentation by quantifying the limitation of current self-supervision objectives for accomplishing such challenging safety-critical tasks.

PDF Details

NeurIPS Conference 2022 Conference Paper

Class-Aware Adversarial Transformers for Medical Image Segmentation

Chenyu You
Ruihan Zhao
Fenglin Liu
Siyuan Dong
Sandeep Chinchali
Ufuk Topcu
Lawrence Staib
James Duncan

Transformers have made remarkable progress towards modeling long-range dependencies within the medical image analysis domain. However, current transformer-based models suffer from several disadvantages: (1) existing methods fail to capture the important features of the images due to the naive tokenization scheme; (2) the models suffer from information loss because they only consider single-scale feature representations; and (3) the segmentation label maps generated by the models are not accurate enough without considering rich semantic contexts and anatomical textures. In this work, we present CASTformer, a novel type of adversarial transformers, for 2D medical image segmentation. First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations. We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures. Lastly, we utilize an adversarial training strategy that boosts segmentation accuracy and correspondingly allows a transformer-based discriminator to capture high-level semantically correlated contents and low-level anatomical features. Our experiments demonstrate that CASTformer dramatically outperforms previous state-of-the-art transformer-based approaches on three benchmarks, obtaining 2. 54%-5. 88% absolute improvements in Dice over previous models. Further qualitative experiments provide a more detailed picture of the model’s inner workings, shed light on the challenges in improved transparency, and demonstrate that transfer learning can greatly improve performance and reduce the size of medical image datasets in training, making CASTformer a strong starting point for downstream medical image analysis tasks.

PDF Details

AAAI Conference 2022 Conference Paper

KerGNNs: Interpretable Graph Neural Networks with Graph Kernels

Aosong Feng
Chenyu You
Shiqiang Wang
Leandros Tassiulas

Graph kernels are historically the most widely-used technique for graph classification tasks. However, these methods suffer from limited performance because of the hand-crafted combinatorial features of graphs. In recent years, graph neural networks (GNNs) have become the state-of-the-art method in downstream graph-related tasks due to their superior performance. Most GNNs are based on Message Passing Neural Network (MPNN) frameworks. However, recent studies show that MPNNs can not exceed the power of the Weisfeiler- Lehman (WL) algorithm in graph isomorphism test. To address the limitations of existing graph kernel and GNN methods, in this paper, we propose a novel GNN framework, termed Kernel Graph Neural Networks (KerGNNs), which integrates graph kernels into the message passing process of GNNs. Inspired by convolution filters in convolutional neural networks (CNNs), KerGNNs adopt trainable hidden graphs as graph filters which are combined with subgraphs to update node embeddings using graph kernels. In addition, we show that MPNNs can be viewed as special cases of KerGNNs. We apply KerGNNs to multiple graph-related tasks and use cross-validation to make fair comparisons with benchmarks. We show that our method achieves competitive performance compared with existing state-of-the-art methods, demonstrating the potential to increase the representation ability of GNNs. We also show that the trained graph filters in KerGNNs can reveal the local graph structures of the dataset, which significantly improves the model interpretability compared with conventional GNN models.

PDF Details

NeurIPS Conference 2022 Conference Paper

Retrieve, Reason, and Refine: Generating Accurate and Faithful Patient Instructions

Fenglin Liu
Bang Yang
Chenyu You
Xian Wu
Shen Ge
Zhangdaihong Liu
Xu Sun
Yang Yang

The "Patient Instruction" (PI), which contains critical instructional information provided both to carers and to the patient at the time of discharge, is essential for the patient to manage their condition outside hospital. An accurate and easy-to-follow PI can improve the self-management of patients which can in turn reduce hospital readmission rates. However, writing an appropriate PI can be extremely time consuming for physicians, and is subject to being incomplete or error-prone for (potentially overworked) physicians. Therefore, we propose a new task that can provide an objective means of avoiding incompleteness, while reducing clinical workload: the automatic generation of the PI, which is imagined as being a document that the clinician can review, modify, and approve as necessary (rather than taking the human "out of the loop"). We build a benchmark clinical dataset and propose the Re$^3$Writer, which imitates the working patterns of physicians to first retrieve related working experience from historical PIs written by physicians, then reason related medical knowledge. Finally, it refines the retrieved working experience and reasoned medical knowledge to extract useful information, which is used to generate the PI for previously-unseen patient according to their health records during hospitalization. Our experiments show that, using our method, the performance of 6 different models can be substantially boosted across all metrics, with up to 20%, 11%, and 19% relative improvements in BLEU-4, ROUGE-L, and METEOR, respectively. Meanwhile, we show results from human evaluations to measure the effectiveness in terms of its usefulness for clinical practice. The code is available at https: //github. com/AI-in-Health/Patient-Instructions.

PDF Details

NeurIPS Conference 2021 Conference Paper

Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation

Fenglin Liu
Chenyu You
Xian Wu
Shen Ge
Sheng Wang
Xu Sun

Medical report generation, which aims to automatically generate a long and coherent report of a given medical image, has been receiving growing research interests. Existing approaches mainly adopt a supervised manner and heavily rely on coupled image-report pairs. However, in the medical domain, building a large-scale image-report paired dataset is both time-consuming and expensive. To relax the dependency on paired data, we propose an unsupervised model Knowledge Graph Auto-Encoder (KGAE) which accepts independent sets of images and reports in training. KGAE consists of a pre-constructed knowledge graph, a knowledge-driven encoder and a knowledge-driven decoder. The knowledge graph works as the shared latent space to bridge the visual and textual domains; The knowledge-driven encoder projects medical images and reports to the corresponding coordinates in this latent space and the knowledge-driven decoder generates a medical report given a coordinate in this space. Since the knowledge-driven encoder and decoder can be trained with independent sets of images and reports, KGAE is unsupervised. The experiments show that the unsupervised KGAE generates desirable medical reports without using any image-report training pairs. Moreover, KGAE can also work in both semi-supervised and supervised settings, and accept paired images and reports in training. By further fine-tuning with image-report pairs, KGAE consistently outperforms the current state-of-the-art models on two datasets.

PDF Details

IJCAI Conference 2021 Conference Paper

MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering

Chenyu You
Nuo Chen
Yuexian Zou

Spoken question answering (SQA) has recently drawn considerable attention in the speech community. It requires systems to find correct answers from the given spoken passages simultaneously. The common SQA systems consist of the automatic speech recognition (ASR) module and text-based question answering module. However, previous methods suffer from severe performance degradation due to ASR errors. To alleviate this problem, this work proposes a novel multi-modal residual knowledge distillation method (MRD-Net), which further distills knowledge at the acoustic level from the audio-assistant (Audio-A). Specifically, we utilize the teacher (T) trained on manual transcriptions to guide the training of the student (S) on ASR transcriptions. We also show that introducing an Audio-A helps this procedure by learning residual errors between T and S. Moreover, we propose a simple yet effective attention mechanism to adaptively leverage audio-text features as the new deep attention knowledge to boost the network performance. Extensive experiments demonstrate that the proposed MRD-Net achieves superior results compared with state-of-the-art methods on three spoken question answering benchmark datasets.

PDF Details DOI

ICLR Conference 2021 Conference Paper

Undistillable: Making A Nasty Teacher That CANNOT teach students

Haoyu Ma
Tianlong Chen 0001
Ting-Kuei Hu
Chenyu You
Xiaohui Xie
Zhangyang Wang

Knowledge Distillation (KD) is a widely used technique to transfer knowledge from pre-trained teacher models to (usually more lightweight) student models. However, in certain situations, this technique is more of a curse than a blessing. For instance, KD poses a potential risk of exposing intellectual properties (IPs): even if a trained machine learning model is released in ``black boxes'' (e.g., as executable software or APIs without open-sourcing code), it can still be replicated by KD through imitating input-output behaviors. To prevent this unwanted effect of KD, this paper introduces and investigates a concept called $\textit{Nasty Teacher}$: a specially trained teacher network that yields nearly the same performance as a normal one, but would significantly degrade the performance of student models learned by imitating it. We propose a simple yet effective algorithm to build the nasty teacher, called $\textit{self-undermining knowledge distillation}$. Specifically, we aim to maximize the difference between the output of the nasty teacher and a normal pre-trained network. Extensive experiments on several datasets demonstrate that our method is effective on both standard KD and data-free KD, providing the desirable KD-immunity to model owners for the first time. We hope our preliminary study can draw more awareness and interest in this new practical problem of both social and legal importance. Our codes and pre-trained models can be found at: $\url{https://github.com/VITA-Group/Nasty-Teacher}$.

Details