Arrow Research search

Author name cluster

Yifei Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

45 papers
2 author rows

Possible papers

45

AAAI Conference 2026 Conference Paper

MIRA: Evaluating Multimodal AI on Complex Clinical Reasoning in Interventional Radiology

  • Jingxiong Li
  • Chenglu Zhu
  • Sunyi Zheng
  • Yuxuan Sun
  • Yifei Wang
  • He Liu
  • Yunlong Zhang
  • Yixuan Si

We present MIRA (Multimodal Interventional RAdiology evaluation), a comprehensive benchmark for evaluating large multimodal models in expert-level interventional radiology tasks requiring specialized domain knowledge and advanced visual reasoning capabilities. Unlike existing medical benchmarks that primarily provide binary labels without contextual depth, MIRA offers diverse question formats, including open-ended, closed-ended, single-choice, and multiple-choice categories, each accompanied by detailed expert-validated explanations. The benchmark incorporates approximately 184K high-quality medical images spanning multiple imaging modalities with 1.2M meticulously generated question-answer pairs across various anatomical regions. These pairs were created through a sophisticated cascade methodology involving expert interventional radiologists at both the data collection and validation stages. Our comprehensive evaluation, encompassing zero-shot testing and fine-tuning experiments on large multimodal models, reveals significant performance gaps between AI systems and human specialists. Fine-tuning experiments demonstrate substantial improvements, with models achieving up to 0.80 accuracy on single-choice questions. MIRA establishes a challenging benchmark that suggests promising directions for developing specialized clinical AI systems for interventional radiology.

TMLR Journal 2026 Journal Article

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

  • Jialin Yang
  • Dongfu Jiang
  • Tony He
  • Sherman Siu
  • Yuxuan Zhang
  • Disen Liao
  • Zhuofeng Li
  • Huaye Zeng

As Large Language Models (LLMs) become integral to software development workflows, their ability to generate structured outputs has become critically important. We introduce $\textbf{StructEval}$, a comprehensive benchmark for evaluating LLMs' capabilities in producing both non-renderable (JSON, YAML, CSV) and renderable (HTML, React, SVG) structured formats. Unlike prior benchmarks, StructEval systematically evaluates structural fidelity across diverse formats through two paradigms: $\textbf{1)}$ generation tasks, producing structured output from natural language prompts, and $\textbf{2)}$ conversion tasks, translating between structured formats. Our benchmark encompasses 18 formats and 44 task types, with novel metrics for format adherence and structural correctness. Results reveal significant performance gaps—even state-of-the-art models like o1-mini achieve an average score of only $75.58$, with open-source alternatives lagging approximately $10$ points behind. We find generation tasks more challenging than conversion tasks, and producing correct visual content more difficult than generating text-only structures.
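The format-adherence idea described above can be sketched as a small scoring function (an illustrative sketch only; the function name and scoring weights here are hypothetical, not StructEval's actual metric code): an output earns credit for parsing as valid JSON and further credit for containing the keys the task requires.

```python
import json

def adherence_score(output: str, required_keys: set[str]) -> float:
    """Hypothetical format-adherence score: 0.0 if unparseable,
    0.5 for valid JSON of the wrong shape, up to 1.0 when all
    required keys are present."""
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return 0.0                      # not even syntactically valid
    if not isinstance(obj, dict):
        return 0.5                      # valid JSON, wrong structure
    present = required_keys & obj.keys()
    return 0.5 + 0.5 * len(present) / len(required_keys)

print(adherence_score('{"name": "a", "age": 3}', {"name", "age"}))  # 1.0
print(adherence_score('not json', {"name"}))                         # 0.0
```

Separating syntactic validity from structural correctness in this way mirrors the benchmark's distinction between producing a parseable format and producing the *right* structure within it.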

JBHI Journal 2026 Journal Article

USCNet: Transformer-Based Multimodal Fusion with Segmentation Guidance for Urolithiasis Classification

  • Changmiao Wang
  • Songqi Zhang
  • Yongquan Zhang
  • Yifei Wang
  • Liya Liu
  • Nannan Li
  • Xingzhi Li
  • Jiexin Pan

Kidney stone disease ranks among the most prevalent conditions in urology, and understanding the composition of these stones is essential for creating personalized treatment plans and preventing recurrence. Current methods for analyzing kidney stones depend on postoperative specimens, which prevents rapid classification before surgery. To overcome this limitation, we introduce a new approach called the Urinary Stone Segmentation and Classification Network (USCNet). This innovative method allows for precise preoperative classification of kidney stones by integrating Computed Tomography (CT) images with clinical data from Electronic Health Records (EHR). USCNet employs a Transformer-based multimodal fusion framework with CT-EHR attention and segmentation-guided attention modules for accurate classification. Moreover, a dynamic loss function is introduced to effectively balance the dual objectives of segmentation and classification. Experiments on an in-house kidney stone dataset show that USCNet demonstrates outstanding performance across all evaluation metrics, with its classification efficacy significantly surpassing existing mainstream methods. This study presents a promising solution for the precise preoperative classification of kidney stones, offering substantial clinical benefits. The source code has been made publicly available: https://github.com/fancccc/KidneyStoneSC.

NeurIPS Conference 2025 Conference Paper

$\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning

  • Xiaojun Guo
  • Ang Li
  • Yifei Wang
  • Stefanie Jegelka
  • Yisen Wang

Although Large Language Models (LLMs) have demonstrated remarkable progress, their proficiency in graph-related tasks remains notably limited, hindering the development of truly general-purpose models. Previous attempts, including pretraining graph foundation models or employing supervised fine-tuning, often face challenges such as the scarcity of large-scale, universally represented graph data. We introduce $\texttt{G1}$, a simple yet effective approach demonstrating that Reinforcement Learning (RL) on synthetic graph-theoretic tasks can significantly scale LLMs' graph reasoning abilities. To enable RL training, we curate \erdos, the largest graph reasoning dataset to date, comprising 50 diverse graph-theoretic tasks of varying difficulty with 100k training and 5k test examples, all derived from real-world graphs. With RL on \erdos, $\texttt{G1}$ obtains substantial improvements in graph reasoning, where our finetuned 3B model even outperforms Qwen2.5-72B-Instruct (24x its size). RL-trained models also show strong zero-shot generalization to unseen tasks, domains, and graph encoding schemes, including other graph-theoretic benchmarks as well as real-world node classification and link prediction tasks, without compromising general reasoning abilities. Our findings offer an efficient, scalable path for building strong graph reasoners by finetuning LLMs with RL on graph-theoretic tasks, which combines the strengths of pretrained LLM capabilities with abundant, automatically generated synthetic data, suggesting that LLMs possess graph understanding abilities that RL can elicit successfully. Our implementation is open-sourced at https://github.com/PKU-ML/G1, with models and datasets hosted on Hugging Face collections at https://huggingface.co/collections/PKU-ML/g1-683d659e992794fc99618cf2 for broader accessibility.

NeurIPS Conference 2025 Conference Paper

A Signed Graph Approach to Understanding and Mitigating Oversmoothing

  • Jiaqi Wang
  • Xinyi Wu
  • James Cheng
  • Yifei Wang

Deep graph neural networks (GNNs) often suffer from oversmoothing, where node representations become overly homogeneous with increasing depth. While techniques like normalization, residual connections, and edge dropout have been proposed to mitigate oversmoothing, they are typically developed independently, with limited theoretical understanding of their underlying mechanisms. In this work, we present a unified theoretical perspective based on the framework of signed graphs, showing that many existing strategies implicitly introduce negative edges that alter message-passing to resist oversmoothing. However, we show that merely adding negative edges in an unstructured manner is insufficient—the asymptotic behavior of signed propagation depends critically on the strength and organization of positive and negative edges. To address this limitation, we leverage the theory of structural balance, which promotes stable, cluster-preserving dynamics by connecting similar nodes with positive edges and dissimilar ones with negative edges. We propose Structural Balanced Propagation (SBP), a plug-and-play method that assigns signed edges based on either labels or feature similarity to explicitly enhance structural balance in the constructed signed graphs. Experiments on nine benchmarks across both homophilic and heterophilic settings demonstrate that SBP consistently improves classification accuracy and mitigates oversmoothing, even at depths of up to 300 layers. Our results provide a principled explanation for prior oversmoothing remedies and introduce a new direction for signed message-passing design in deep GNNs. Our code is available at https://github.com/kokolerk/sbp.
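The contrast the abstract describes can be sketched on a toy four-node graph (a minimal sketch, not the paper's SBP implementation; the averaging update and the complete-graph topology are simplifying assumptions): with all-positive edges, repeated propagation collapses the two clusters together, while structurally balanced signs (positive within clusters, negative across) keep them separated.

```python
import numpy as np

# Two clusters of two nodes each, with opposite scalar features.
x = np.array([1.0, 1.0, -1.0, -1.0])

# Unsigned complete graph: every edge positive.
A_pos = np.ones((4, 4)) - np.eye(4)

# Structurally balanced signed graph: cross-cluster edges made negative.
A_sgn = A_pos.copy()
A_sgn[:2, 2:] = -1.0
A_sgn[2:, :2] = -1.0

def propagate(A, x, steps=30):
    """Simple residual mean-aggregation: x <- (x + mean of signed neighbors)/2."""
    deg = np.abs(A).sum(axis=1)
    for _ in range(steps):
        x = 0.5 * x + 0.5 * (A @ x) / deg
    return x

smooth = propagate(A_pos, x)   # unsigned: features converge (oversmoothing)
signed = propagate(A_sgn, x)   # balanced signs: cluster gap is preserved
```

Here the unsigned gap shrinks geometrically toward zero, whereas the balanced signed update leaves the cluster-separating features as a fixed point, which is the cluster-preserving dynamic that structural balance is meant to guarantee.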

JMLR Journal 2025 Journal Article

An Augmentation Overlap Theory of Contrastive Learning

  • Qi Zhang
  • Yifei Wang
  • Yisen Wang

Recently, self-supervised contrastive learning has achieved great success on various tasks. However, its underlying working mechanism is yet unclear. In this paper, we first provide the tightest bounds based on the widely adopted assumption of conditional independence. Further, we relax the conditional independence assumption to a more practical assumption of augmentation overlap and derive the asymptotically closed bounds for the downstream performance. Our proposed augmentation overlap theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations, thus simply aligning the positive samples (augmented views of the same sample) could make contrastive learning cluster intra-class samples together. Moreover, from the newly derived augmentation overlap perspective, we develop an unsupervised metric for the representation evaluation of contrastive learning, which aligns well with the downstream performance almost without relying on additional modules. Code is available at https://github.com/PKU-ML/GARC.
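The alignment idea at the heart of the abstract can be illustrated numerically (a toy sketch under stated assumptions, not the paper's GARC metric: Gaussian noise stands in for data augmentation and the identity map for the encoder): positive pairs are two augmented views of the same sample, and alignment is their average representation distance, which is small relative to distances between unrelated samples.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, (100, 8))   # 100 "samples", 8-dim features

def augment(x, strength=0.1):
    # Small additive noise as a stand-in for a data augmentation.
    return x + rng.normal(0.0, strength, x.shape)

view1, view2 = augment(samples), augment(samples)   # positive pairs

# Alignment: mean distance between positive pairs (lower = better aligned).
align = np.linalg.norm(view1 - view2, axis=1).mean()

# Baseline: mean distance between views of *different* samples.
cross = np.linalg.norm(view1 - np.roll(view2, 1, axis=0), axis=1).mean()
```

Under aggressive augmentations the augmented supports of intra-class samples start to overlap, which is why, per the theory, merely keeping `align` small can suffice to cluster intra-class samples together.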

IROS Conference 2025 Conference Paper

An Evidence-Based Tri-Branch Cross-Pseudo Supervision Method for Semi-Supervised Medical Image Segmentation

  • Dongyue Li
  • Aocheng Luo
  • Shaoan Wang
  • Yaoqing Hu
  • Jie Pan 0008
  • Yifei Wang
  • Junzhi Yu 0001

Semi-supervised medical image segmentation with only a few annotated samples can provide significant help in robot-assisted surgery. This step plays a pivotal role in the identification of pathological regions, more appropriate planning of surgical procedures, and so on. In this work, we develop an evidence-based tri-branch cross-pseudo supervision model, which integrates evidence-based uncertainty estimation and multi-branch cross supervision to bolster the effectiveness of semi-supervised learning. The overall framework consists of a vanilla network and an evidential dual-branch network. Two evidential branches, EPB and ERB, are proposed to complement each other and improve the quality of pseudo-labels. The EPB places more focus on classification accuracy at the pixel level, while the ERB emphasizes the similarity and overall integrity of the segmented regions. Then, a novel cross-pseudo supervision strategy among the three branches is designed to guarantee that valuable and diverse unlabeled knowledge is explored and transferred for segmentation improvement. The effectiveness of the proposed method was verified on the ACDC dataset, achieving outstanding performance compared with other state-of-the-art methods. In addition, we conducted an ablation study to validate the effectiveness of the evidential branches (EPB and ERB) and the tri-branch cross-supervision strategy, respectively.

JBHI Journal 2025 Journal Article

BianCang: A Traditional Chinese Medicine Large Language Model

  • Sibo Wei
  • Xueping Peng
  • Yifei Wang
  • Tao Shen
  • Jiasheng Si
  • Weiyu Zhang
  • Fa Zhu
  • Athanasios V. Vasilakos

The surge of large language models (LLMs) has driven significant progress in medical applications, including traditional Chinese medicine (TCM). However, current medical LLMs struggle with TCM diagnosis and syndrome differentiation due to substantial differences between TCM and modern medical theory, and the scarcity of specialized, high-quality corpora. To this end, in this paper we propose BianCang (扁仓), a TCM-specific LLM, using a two-stage training process that first injects domain-specific knowledge and then aligns it through targeted stimulation to enhance diagnostic and differentiation capabilities. Specifically, we constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continual pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM. Evaluations across 11 test sets involving 31 models and 4 tasks demonstrate the effectiveness of BianCang, offering valuable insights for future research. Code, datasets, and models are available on GitHub.

JBHI Journal 2025 Journal Article

DRL-HNet: A Deep Residual Learning Framework for Microbe-Drug Associations Prediction Using Heterogeneous Network Feature

  • Jing Chen
  • Leyang Zhang
  • Yifei Wang
  • Susu Cui
  • Zhipan Liang
  • Xu Lu

In the field of biomedicine, predicting microbe-drug associations (MDAs) is crucial for advancing drug discovery and personalized therapy. However, traditional experimental approaches often fall short in meeting requirements for accuracy and scalability. Previous studies have primarily relied on feature similarities to predict microbe-drug associations, largely ignoring the complex interdependencies essential for improved prediction. In this paper, we propose a novel framework named Deep Residual Learning Framework Using Heterogeneous Network Feature (DRL-HNet) for MDAs prediction. DRL-HNet constructs a heterogeneous network representation by integrating relationships and features from multiple data sources for both microbes and drugs. The model incorporates deep residual learning with bottleneck layers to effectively reduce computational complexity while enhancing network expressiveness. Multi-source feature fusion is leveraged to capture complex interaction patterns, while residual connections mitigate overfitting and enhance training efficiency. Extensive cross-validation experiments demonstrate that DRL-HNet outperforms existing models across multiple evaluation metrics, validating its efficacy in accurately predicting microbe-drug associations.

IROS Conference 2025 Conference Paper

HEATS: A Hierarchical Framework for Efficient Autonomous Target Search with Mobile Manipulators

  • Hao Zhang
  • Yifei Wang
  • Weifan Zhang
  • Yu Wang
  • Haoyao Chen

Utilizing robots for autonomous target search in complex and unknown environments can greatly improve the efficiency of search and rescue missions. However, existing methods have shown inadequate performance due to hardware platform limitations, inefficient viewpoint selection strategies, and conservative motion planning. In this work, we propose HEATS, which enhances the search capability of mobile manipulators in complex and unknown environments. We design a target viewpoint planner tailored to the strengths of mobile manipulators, ensuring efficient and comprehensive viewpoint planning. Supported by this, a whole-body motion planner integrates global path search with local IPC optimization, enabling the mobile manipulator to safely and agilely visit target viewpoints, significantly improving search performance. We present extensive simulated and real-world tests, in which our method demonstrates reduced search time, higher target search completeness, and lower movement cost compared to classic and state-of-the-art approaches. Our method will be open-sourced for community benefit.

JBHI Journal 2025 Journal Article

HGBHAN: A Novel Framework for Microbe-Drug Interaction Prediction Using Heterogeneous Graphs and Bi-LSTM With Hierarchical Attention

  • Jing Chen
  • Leyang Zhang
  • Yifei Wang
  • Susu Cui
  • Zhipan Liang
  • Xu Lu

Predicting microbe–drug associations (MDAs) is vital for accelerating drug discovery and optimizing clinical interventions in biomedical research. Traditional laboratory-based methods, though reliable, are constrained by high costs and limited scalability. While many computational approaches have utilized feature similarities to infer MDAs, they often overlook the complex and heterogeneous relationships inherent in biological networks, as well as the challenge posed by imbalanced datasets. In this study, we propose HGBHAN, a novel framework for MDAs prediction using heterogeneous graphs and bidirectional long short-term memory (Bi-LSTM) with hierarchical attention, for robust MDAs prediction. HGBHAN constructs a comprehensive heterogeneous network by integrating microbe and drug similarities with known association information, capturing multi-level structural and sequential dependencies. The model employs Bi-LSTM modules and a hierarchical attention mechanism to learn discriminative node embeddings, while residual connections are incorporated to address the over-smoothing issue in graph neural networks. Extensive experiments conducted on three public benchmark datasets demonstrate that HGBHAN outperforms existing models across multiple evaluation metrics, validating its efficacy in accurately predicting microbe–drug associations.

NeurIPS Conference 2025 Conference Paper

Language Ranker: A Lightweight Ranking Framework for LLM Decoding

  • Chenheng Zhang
  • Tianqi Du
  • Jizhe Zhang
  • Mingqing Xiao
  • Yifei Wang
  • Yisen Wang
  • Zhouchen Lin

Conventional research on large language models (LLMs) has primarily focused on refining output distributions, while paying less attention to the decoding process that transforms these distributions into final responses. Recent advances, such as scaling inference-time computation with reward models, have underscored the importance of decoding, but these methods often suffer from high computational costs and limited applicability. In this paper, we revisit LLM generation through the lens of recommender systems, conceptualizing the decoding process as analogous to the ranking stage in recommendation pipelines. From this perspective, we observe that both traditional decoding methods and reward models exhibit clear limitations such as redundancy. Motivated by this insight, we propose Language Ranker, a novel framework that introduces a lightweight module to rerank candidate responses using features extracted by the base model. Experiments across a wide range of tasks show that Language Ranker achieves performance comparable to large-scale reward models, while requiring only <0.5M additional parameters, significantly reducing the computational overhead during both training and inference stages. This highlights the efficiency and effectiveness of our method, showcasing its potential to fully unlock the capabilities of LLMs.

NeurIPS Conference 2025 Conference Paper

Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

  • Cai Zhou
  • Chenyu Wang
  • Dinghuai Zhang
  • Shangyuan Tong
  • Yifei Wang
  • Stephen Bates
  • Tommi Jaakkola

In this paper we introduce Hierarchical Diffusion Language Models (HDLM) -- a novel family of discrete diffusion models for language modeling. HDLM builds on a hierarchical vocabulary where low-level tokens with detailed semantics are surjectively mapped to high-level tokens with coarse-grained meanings. In the forward process, each token is independently perturbed to its higher-level ancestor with more abstract semantics according to the scheduler, while in the reverse process the model progressively predicts the next, more detailed semantics. Taken together, HDLM provides a general time-varying next semantic scale prediction process for language modeling. We derive closed-form expressions for the diffusion Evidence Lower Bound (ELBO), and show that HDLM can be implemented in a flexible manner while including the existing MDLM as a special case. We also propose practical training techniques based on the insights. Extensive text generation experiments validate the effectiveness of HDLM, which demonstrates consistently lower validation and generative perplexity than baselines.

ICLR Conference 2025 Conference Paper

Scaling Large Language Model-based Multi-Agent Collaboration

  • Chen Qian
  • Zihao Xie
  • Yifei Wang
  • Wei Liu 0161
  • Kunlun Zhu
  • Hanchen Xia
  • Yufan Dang
  • Zhuoyun Du

Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual through collective reasoning. Inspired by the neural scaling law (increasing neurons enhances performance), this study explores whether the continuous addition of collaborative agents can yield similar benefits. Technically, we utilize directed acyclic graphs to organize agents into a multi-agent collaboration network (MacNet), upon which their interactive reasoning is topologically orchestrated for autonomous task solving. Extensive evaluations reveal that it effectively supports collaboration among over a thousand agents, with irregular topologies outperforming regular ones. We also identify a collaborative scaling law: the overall performance follows a logistic growth pattern as agents scale, with collaborative emergence occurring earlier than traditional neural emergence. We speculate this may be because scaling agents catalyzes their multidimensional considerations during interactive reflection and refinement, thereby producing more comprehensive artifacts. The code is available at https://github.com/OpenBMB/ChatDev/tree/macnet.

AAAI Conference 2025 Conference Paper

Semi-IIN: Semi-Supervised Intra-Inter Modal Interaction Learning Network for Multimodal Sentiment Analysis

  • Jinhao Lin
  • Yifei Wang
  • Yanwu Xu
  • Qi Liu

Despite multimodal sentiment analysis being a fertile research ground that merits further investigation, current approaches incur high annotation costs and suffer from label ambiguity, which hinders the acquisition of high-quality labeled data. Furthermore, choosing the right interactions is essential because the significance of intra- or inter-modal interactions can differ among various samples. To this end, we propose Semi-IIN, a Semi-supervised Intra-inter modal Interaction learning Network for multimodal sentiment analysis. Semi-IIN integrates masked attention and gating mechanisms, enabling effective dynamic selection after independently capturing intra- and inter-modal interactive information. Combined with the self-training approach, Semi-IIN fully utilizes the knowledge learned from unlabeled data. Experimental results on two public datasets, MOSI and MOSEI, demonstrate the effectiveness of Semi-IIN, establishing a new state-of-the-art on several metrics.

NeurIPS Conference 2025 Conference Paper

Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction

  • Yifei Wang
  • Weimin Bai
  • Colin Zhang
  • Debing Zhang
  • Weijian Luo
  • He Sun

In this paper, we unify more than 10 existing one-step diffusion distillation approaches, such as Diff-Instruct, DMD, SIM, SiD, $f$-distill, etc., inside a theory-driven framework which we name \textbf{\emph{Uni-Instruct}}. Uni-Instruct is motivated by our proposed diffusion expansion theory of the $f$-divergence family. We then introduce key theories that overcome the intractability issue of the original expanded $f$-divergence, resulting in an equivalent yet tractable loss that effectively trains one-step diffusion models by minimizing the expanded $f$-divergence family. The novel unification introduced by Uni-Instruct not only offers new theoretical contributions that help understand existing approaches from a high-level perspective but also leads to state-of-the-art one-step diffusion generation performances. On the CIFAR10 generation benchmark, Uni-Instruct achieves record-breaking Fréchet Inception Distance (FID) values of \textbf{\emph{1.46}} for unconditional generation and \textbf{\emph{1.38}} for conditional generation. On the ImageNet-$64\times 64$ generation benchmark, Uni-Instruct achieves a new SoTA one-step diffusion FID value of \textbf{\emph{1.06}}, which outperforms its 79-step teacher diffusion with a significant improvement margin of 1.29 (1.06 vs 2.35). We also apply Uni-Instruct on broader tasks like text-to-3D generation, where it gives decent results, slightly outperforming previous methods such as SDS and VSD in terms of both generation quality and diversity. Both the solid theoretical and empirical contributions of Uni-Instruct will potentially help future studies on one-step diffusion distillation and knowledge transfer in diffusion models.

NeurIPS Conference 2024 Conference Paper

A Canonicalization Perspective on Invariant and Equivariant Learning

  • George Ma
  • Yifei Wang
  • Derek Lim
  • Stefanie Jegelka
  • Yisen Wang

In many applications, we desire neural networks to exhibit invariance or equivariance to certain groups due to symmetries inherent in the data. Recently, frame-averaging methods have emerged as a unified framework for attaining symmetries efficiently by averaging over input-dependent subsets of the group, i.e., frames. What we currently lack is a principled understanding of the design of frames. In this work, we introduce a canonicalization perspective that provides an essential and complete view of the design of frames. Canonicalization is a classic approach for attaining invariance by mapping inputs to their canonical forms. We show that there exists an inherent connection between frames and canonical forms. Leveraging this connection, we can efficiently compare the complexity of frames as well as determine the optimality of certain frames. Guided by this principle, we design novel frames for eigenvectors that are strictly superior to existing methods --- some are even optimal --- both theoretically and empirically. The reduction to the canonicalization perspective further uncovers equivalences between previous methods. These observations suggest that canonicalization provides a fundamental understanding of existing frame-averaging methods and unifies existing equivariant and invariant learning methods. Code is available at https://github.com/PKU-ML/canonicalization.

NeurIPS Conference 2024 Conference Paper

A Theoretical Understanding of Self-Correction through In-context Alignment

  • Yifei Wang
  • Yuyang Wu
  • Zeming Wei
  • Stefanie Jegelka
  • Yisen Wang

Going beyond mimicking limited human experiences, recent studies show initial evidence that, like humans, large language models (LLMs) are capable of improving their abilities purely by self-correction, i.e., correcting previous responses through self-examination, as seen in models like OpenAI o1. Nevertheless, little is known about how such capabilities arise. In this work, based on a simplified setup akin to an alignment task, we theoretically analyze self-correction from an in-context learning perspective, showing that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way. Notably, going beyond previous theories on over-simplified linear transformers, our theoretical construction underpins the roles of several key designs of realistic transformers for self-correction: softmax attention, multi-head attention, and the MLP block. We validate these findings extensively on synthetic datasets. Inspired by these findings, we propose a simple self-correction strategy, Checking as Context (CaC), which finds novel applications in alleviating social bias and defending against LLM jailbreaks. We believe that these findings will inspire further research on understanding, exploiting, and enhancing self-correction for building better foundation models. Code is at https://github.com/yifeiwang77/Self-Correction.

NeurIPS Conference 2024 Conference Paper

An Expectation-Maximization Algorithm for Training Clean Diffusion Models from Corrupted Observations

  • Weimin Bai
  • Yifei Wang
  • Wenzheng Chen
  • He Sun

Diffusion models excel in solving imaging inverse problems due to their ability to model complex image priors. However, their reliance on large, clean datasets for training limits their practical use where clean data is scarce. In this paper, we propose EMDiffusion, an expectation-maximization (EM) approach to train diffusion models from corrupted observations. Our method alternates between reconstructing clean images from corrupted data using a known diffusion model (E-step) and refining diffusion model weights based on these reconstructions (M-step). This iterative process leads the learned diffusion model to gradually converge to a local optimum, that is, to approximate the true clean data distribution. We validate our method through extensive experiments on diverse computational imaging tasks, including random inpainting, denoising, and deblurring, achieving new state-of-the-art performance.
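The E-step/M-step alternation described above can be sketched on a toy 1-D analogue (an illustrative sketch only: a Gaussian prior stands in for the diffusion model, and the corruption is additive noise of known scale; none of this is the paper's EMDiffusion code): the E-step reconstructs clean samples from noisy observations under the current prior, and the M-step refits the prior to those reconstructions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown clean-data distribution and known corruption (a denoising task).
true_mean, true_std = 3.0, 1.0
noise_std = 2.0
y = rng.normal(true_mean, true_std, 5000) + rng.normal(0.0, noise_std, 5000)

# Current "model" of the clean data, initialized badly.
mu, sigma2 = 0.0, 1.0
for _ in range(50):
    # E-step: reconstruct clean samples via the posterior mean under the prior.
    w = sigma2 / (sigma2 + noise_std**2)
    x_hat = mu + w * (y - mu)
    post_var = sigma2 * noise_std**2 / (sigma2 + noise_std**2)
    # M-step: refit the prior (mean and variance) to the reconstructions.
    mu = x_hat.mean()
    sigma2 = x_hat.var() + post_var

print(round(mu, 2))  # converges close to the true clean mean (3.0)
```

In the paper, the E-step reconstruction is produced by posterior sampling with the current diffusion model and the M-step retrains the model's weights; the toy shows why the alternation drifts toward the clean data distribution rather than the corrupted one.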

NeurIPS Conference 2024 Conference Paper

Autonomous Agents for Collaborative Task under Information Asymmetry

  • Wei Liu
  • Chenxi Wang
  • Yifei Wang
  • Zihao Xie
  • Rennai Qiu
  • Yufan Dang
  • Zhuoyun Du
  • Weize Chen

Large Language Model Multi-Agent Systems (LLM-MAS) have greatly progressed in solving complex tasks. Agents within such a system communicate with one another to collaboratively solve tasks, under the premise of shared information. However, when agents' collaborations are leveraged to perform multi-person tasks, a new challenge arises due to information asymmetry, since each agent can only access the information of its human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents' communication towards effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents' task-solving ability under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.

NeurIPS Conference 2024 Conference Paper

Dissecting the Failure of Invariant Learning on Graphs

  • Qixun Wang
  • Yifei Wang
  • Yisen Wang
  • Xianghua Ying

Enhancing node-level Out-Of-Distribution (OOD) generalization on graphs remains a crucial area. In this paper, we develop a Structural Causal Model (SCM) to theoretically dissect the performance of two prominent invariant learning methods--Invariant Risk Minimization (IRM) and Variance-Risk Extrapolation (VREx)--in node-level OOD settings. Our analysis reveals a critical limitation: these methods may struggle to identify invariant features due to the complexities introduced by the message-passing mechanism, which can obscure causal features within a range of neighboring samples. To address this, we propose Cross-environment Intra-class Alignment (CIA), which explicitly eliminates spurious features by aligning representations within the same class, bypassing the need for explicit knowledge of underlying causal patterns. To adapt CIA to node-level OOD scenarios where environment labels are hard to obtain, we further propose CIA-LRA (Localized Reweighting Alignment) that leverages the distribution of neighboring labels to selectively align node representations, effectively distinguishing and preserving invariant features while removing spurious ones, all without relying on environment labels. We theoretically prove CIA-LRA's effectiveness by deriving an OOD generalization error bound based on PAC-Bayesian analysis. Experiments on graph OOD benchmarks validate the superiority of CIA and CIA-LRA, marking a significant advancement in node-level OOD generalization.

ECAI Conference 2024 Conference Paper

Dynamic Multimodal Prompt Tuning: Boost Few-Shot Learning with VLM-Guided Point Cloud Models

  • Xiang Gu
  • Shuchao Pang
  • Anan Du
  • Yifei Wang
  • Jixiang Miao
  • Jorge Díez 0001

Few-shot learning is crucial for downstream tasks involving point clouds, given the challenge of obtaining sufficient datasets due to extensive collecting and labeling efforts. Pre-trained VLM-Guided point cloud models, containing abundant knowledge, can compensate for the scarcity of training data, potentially leading to very good performance. However, adapting these pre-trained point cloud models to specific few-shot learning tasks is challenging due to their huge number of parameters and high computational cost. To this end, we propose a novel Dynamic Multimodal Prompt Tuning method, named DMMPT, for boosting few-shot learning with pre-trained VLM-Guided point cloud models. Specifically, we build a dynamic knowledge collector capable of gathering task- and data-related information from various modalities. Then, a multimodal prompt generator is constructed to integrate collected dynamic knowledge and generate multimodal prompts, which efficiently direct pre-trained VLM-guided point cloud models toward few-shot learning tasks and address the issue of limited training data. Our method is evaluated on benchmark datasets not only in a standard N-way K-shot few-shot learning setting, but also in a more challenging setting with all classes and K-shot few-shot learning. Notably, our method outperforms other prompt-tuning techniques, achieving highly competitive results comparable to full fine-tuning methods while significantly enhancing computational efficiency.

NeurIPS Conference 2024 Conference Paper

In-Context Symmetries: Self-Supervised Learning through Contextual World Models

  • Sharut Gupta
  • Chenyu Wang
  • Yifei Wang
  • Tommi Jaakkola
  • Stefanie Jegelka

At the core of self-supervised learning for vision is the idea of learning invariant or equivariant representations with respect to a set of data transformations. This approach, however, introduces strong inductive biases, which can render the representations fragile in downstream tasks that do not conform to these symmetries. In this work, drawing insights from world models, we propose to instead learn a general representation that can adapt to be invariant or equivariant to different transformations by paying attention to context --- a memory module that tracks task-specific states, actions and future states. Here, the action is the transformation, while the current and future states respectively represent the input's representation before and after the transformation. Our proposed algorithm, Contextual Self Supervised Learning (ContextSSL), learns equivariance to all transformations (as opposed to invariance). In this way, the model can learn to encode all relevant features as general representations while having the versatility to tailor itself to task-wise symmetries when given a few examples as the context. Empirically, we demonstrate significant performance gains over existing methods on equivariance-related tasks, supported by both qualitative and quantitative evaluations.

ICML Conference 2024 Conference Paper

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

  • Yihao Zhang
  • Hangzhou He
  • Jingyu Zhu
  • Huanran Chen
  • Yifei Wang
  • Zeming Wei

Adversarial Training (AT), which adversarially perturbs the input samples during training, has been acknowledged as one of the most effective defenses against adversarial attacks, yet it suffers from inevitably decreased clean accuracy. Instead of perturbing the samples, Sharpness-Aware Minimization (SAM) perturbs the model weights during training to find a flatter loss landscape and improve generalization. However, as SAM is designed for better clean accuracy, its effectiveness in enhancing adversarial robustness remains unexplored. In this work, considering the duality between SAM and AT, we investigate the adversarial robustness derived from SAM. Intriguingly, we find that using SAM alone can improve adversarial robustness. To understand this unexpected property of SAM, we first provide empirical and theoretical insights into how SAM can implicitly learn more robust features, and conduct comprehensive experiments to show that SAM can improve adversarial robustness notably without sacrificing any clean accuracy, shedding light on the potential of SAM to be a substitute for AT when accuracy comes at a higher priority. Code is available at https://github.com/weizeming/SAM_AT.
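The weight-perturbation step SAM performs can be illustrated on a toy quadratic loss; the names `rho` and `lr`, the quadratic objective, and the two-step update below are a minimal sketch of the generic SAM procedure, not the paper's training setup.

```python
import numpy as np

def loss_grad(w, A):
    # Gradient of the toy quadratic loss L(w) = 0.5 * w^T A w.
    return A @ w

def sam_step(w, A, rho=0.05, lr=0.1):
    g = loss_grad(w, A)
    # Ascend to the approximate worst-case weights in an L2 ball of radius rho ...
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # ... then descend using the gradient evaluated at the perturbed weights.
    return w - lr * loss_grad(w + eps, A)

A = np.diag([1.0, 10.0])            # ill-conditioned toy loss surface
w = np.array([1.0, 1.0])
initial_loss = 0.5 * w @ A @ w      # = 5.5
for _ in range(50):
    w = sam_step(w, A)
final_loss = 0.5 * w @ A @ w
print(initial_loss, final_loss)
```

The duality the paper studies is visible in the structure: AT would apply the ascent step to the inputs, while SAM applies it to the weights.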

NeurIPS Conference 2024 Conference Paper

On the Role of Attention Masks and LayerNorm in Transformers

  • Xinyi Wu
  • Amir Ajorlou
  • Yifei Wang
  • Stefanie Jegelka
  • Ali Jadbabaie

Self-attention is the key mechanism of transformers, which are the essential building blocks of modern foundation models. Recent studies have shown that pure self-attention suffers from an increasing degree of rank collapse as depth increases, limiting model expressivity and further utilization of model depth. The existing literature on rank collapse, however, has mostly overlooked other critical components in transformers that may alleviate the rank collapse issue. In this paper, we provide a general analysis of rank collapse under self-attention, taking into account the effects of attention masks and layer normalization (LayerNorm). In particular, we find that although pure masked attention still suffers from exponential collapse to a rank one subspace, sparse or local masked attention can provably slow down the collapse rate. In the case of self-attention with LayerNorm, we first show that for certain classes of value matrices, collapse to a rank one subspace still happens exponentially. However, through construction of nontrivial counterexamples, we then establish that with proper choice of value matrices, a general class of sequences may not converge to a rank one subspace, and the self-attention dynamics with LayerNorm can simultaneously possess a rich set of equilibria with any possible rank between one and full. Our result refutes the previous hypothesis that LayerNorm plays no role in the rank collapse of self-attention and suggests that self-attention with LayerNorm constitutes a much more expressive, versatile nonlinear dynamical system than what was originally thought.
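The exponential rank collapse of pure self-attention described above is easy to reproduce numerically. The sketch below (no masks, no LayerNorm, illustrative initialization scale) iterates softmax attention and tracks how far the token matrix is from rank one; it illustrates the phenomenon, not the paper's constructions.

```python
import numpy as np

def attention(X):
    # One round of pure softmax self-attention: X <- softmax(X X^T / sqrt(d)) X.
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)             # row-stochastic attention matrix
    return P @ X

def rank_one_residual(X):
    # Trailing singular values relative to the top one; 0 means exactly rank one.
    s = np.linalg.svd(X, compute_uv=False)
    return s[1:].sum() / s[0]

rng = np.random.default_rng(0)
X = 0.3 * rng.standard_normal((8, 4))   # 8 tokens, dimension 4
before = rank_one_residual(X)
for _ in range(30):
    X = attention(X)
after = rank_one_residual(X)
print(before, after)  # residual collapses by many orders of magnitude
```

Since each output row is a convex combination of the input rows, repeated application drives all tokens toward a common vector, which is exactly the rank-one subspace collapse the paper analyzes.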

ICML Conference 2024 Conference Paper

OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift

  • Lin Li 0070
  • Yifei Wang
  • Chawin Sitawarin
  • Michael W. Spratling

Existing works have made great progress in improving adversarial robustness, but typically test their method only on data from the same distribution as the training data, i.e. in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e. out-of-distribution (OOD) testing. This omission is concerning as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness in a positive linear way. The latter enables the prediction of OOD robustness from ID robustness. We then predict and verify that existing methods are unlikely to achieve high OOD robustness. Novel methods are therefore required to achieve OOD robustness beyond our prediction. To facilitate the development of these methods, we investigate a wide range of techniques and identify several promising directions. Code and models are available at: https://github.com/OODRobustBench/OODRobustBench.

NeurIPS Conference 2024 Conference Paper

Understanding the Role of Equivariance in Self-supervised Learning

  • Yifei Wang
  • Kaiwen Hu
  • Sharut Gupta
  • Ziyu Ye
  • Yisen Wang
  • Stefanie Jegelka

Contrastive learning has been a leading paradigm for self-supervised learning, but it is widely observed that it comes at the price of sacrificing useful features (e.g., colors) by being invariant to data augmentations. Given this limitation, there has been a surge of interest in equivariant self-supervised learning (E-SSL) that learns features to be augmentation-aware. However, even for the simplest rotation prediction method, there is a lack of rigorous understanding of why, when, and how E-SSL learns useful features for downstream tasks. To bridge this gap between practice and theory, we establish an information-theoretic perspective to understand the generalization ability of E-SSL. In particular, we identify a critical explaining-away effect in E-SSL that creates a synergy between the equivariant and classification tasks. This synergy effect encourages models to extract class-relevant features to improve their equivariant predictions, which, in turn, benefits downstream tasks requiring semantic features. Based on this perspective, we theoretically analyze the influence of data transformations and reveal several principles for practical designs of E-SSL. Our theory not only aligns well with existing E-SSL methods but also sheds light on new directions by exploring the benefits of model equivariance. We believe that a theoretically grounded understanding of the role of equivariance will inspire more principled and advanced designs in this field. Code is available at https://github.com/kaotty/Understanding-ESSL.

NeurIPS Conference 2023 Conference Paper

Adversarial Examples Are Not Real Features

  • Ang Li
  • Yifei Wang
  • Yiwen Guo
  • Yisen Wang

The existence of adversarial examples has been a mystery for years and attracted much interest. A well-known theory by Ilyas et al. (2019) explains adversarial vulnerability from a data perspective by showing that one can extract non-robust features from adversarial examples and these features alone are useful for classification. However, the explanation remains quite counter-intuitive since non-robust features are mostly noise features to humans. In this paper, we re-examine the theory from a larger context by incorporating multiple learning paradigms. Notably, we find that contrary to their good usefulness under supervised learning, non-robust features attain poor usefulness when transferred to other self-supervised learning paradigms, such as contrastive learning, masked image modeling, and diffusion models. It reveals that non-robust features are not really as useful as robust or natural features that enjoy good transferability between these paradigms. Meanwhile, for robustness, we also show that naturally trained encoders from robust features are largely non-robust under AutoAttack. Our cross-paradigm examination suggests that the non-robust features are not really useful but more like paradigm-wise shortcuts, and robust features alone might be insufficient to attain reliable model robustness. Code is available at https://github.com/PKU-ML/AdvNotRealFeatures.

NeurIPS Conference 2023 Conference Paper

Architecture Matters: Uncovering Implicit Mechanisms in Graph Contrastive Learning

  • Xiaojun Guo
  • Yifei Wang
  • Zeming Wei
  • Yisen Wang

With the prosperity of contrastive learning for visual representation learning (VCL), it has also been adapted to the graph domain and yields promising performance. However, through a systematic study of various graph contrastive learning (GCL) methods, we observe several common phenomena among existing GCL methods that differ markedly from the original VCL methods, including: 1) positive samples are not a must for GCL; 2) negative samples are not necessary for graph classification, nor for node classification when adopting specific normalization modules; 3) data augmentations have much less influence on GCL, as simple domain-agnostic augmentations (e.g., Gaussian noise) can also attain fairly good performance. By uncovering how the implicit inductive bias of GNNs works in contrastive learning, we theoretically provide insights into the above intriguing properties of GCL. Rather than directly porting existing VCL methods to GCL, we advocate for more attention toward the unique architecture of graph learning and consider its implicit influence when designing GCL methods. Code is available at https://github.com/PKU-ML/ArchitectureMattersGCL.

NeurIPS Conference 2023 Conference Paper

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective

  • Yifei Wang
  • Liangchen Li
  • Jiansheng Yang
  • Zhouchen Lin
  • Yisen Wang

Adversarial Training (AT) has become arguably the state-of-the-art algorithm for extracting robust features. However, researchers have recently noticed that AT suffers from severe robust overfitting problems, particularly after learning rate (LR) decay. In this paper, we explain this phenomenon by viewing adversarial training as a dynamic minimax game between the model trainer and the attacker. Specifically, we analyze how LR decay breaks the balance of the minimax game by empowering the trainer with a stronger memorization ability, and show that such imbalance induces robust overfitting as a result of memorizing non-robust features. We validate this understanding with extensive experiments, and provide a holistic view of robust overfitting from the dynamics of both game players. This understanding further inspires us to alleviate robust overfitting by rebalancing the two players, either by regularizing the trainer's capacity or by improving the attack strength. Experiments show that the proposed ReBalanced Adversarial Training (ReBAT) can attain good robustness and does not suffer from robust overfitting even after very long training. Code is available at https://github.com/PKU-ML/ReBAT.

JBHI Journal 2023 Journal Article

BMAnet: Boundary Mining With Adversarial Learning for Semi-Supervised 2D Myocardial Infarction Segmentation

  • Chenchu Xu
  • Yifei Wang
  • Dong Zhang
  • Longfei Han
  • Yanping Zhang
  • Jie Chen
  • Shuo Li

Automatic segmentation of myocardial infarction (MI) regions in late gadolinium-enhanced cardiac magnetic resonance images is an essential step in the computed diagnosis of myocardial infarction. Most of the current myocardial infarction region segmentation methods are based on fully supervised deep learning. However, cardiologists' annotation of myocardial infarction regions in cardiac magnetic resonance images during the diagnosis process is time-consuming and expensive. This paper proposes a semi-supervised myocardial infarction segmentation method. It consists of two models: 1) a boundary mining model and 2) an adversarial learning model. The boundary mining model can solve the boundary ambiguity problem by enlarging the gap between the foreground and background features, thus segmenting the myocardial infarction region accurately. The adversarial learning model can make the boundary mining model learn from additional unlabeled data by evaluating the segmentation performance and providing pseudo supervision, which significantly increases the robustness of the boundary mining model. We conduct extensive experiments on an in-house myocardial magnetic resonance dataset. The experimental results on six evaluation metrics demonstrate that our method achieves excellent results in myocardial infarction segmentation and outperforms the state-of-the-art semi-supervised methods.

IJCAI Conference 2023 Conference Paper

Contrastive Label Enhancement

  • Yifei Wang
  • Yiyang Zhou
  • Jihua Zhu
  • Xinyuan Liu
  • Wenbiao Yan
  • Zhiqiang Tian

Label distribution learning (LDL) is a new machine learning paradigm for solving label ambiguity. Since it is difficult to directly obtain label distributions, many studies are focusing on how to recover label distributions from logical labels, dubbed label enhancement (LE). Existing LE methods estimate label distributions by simply building a mapping relationship between features and label distributions under the supervision of logical labels. They typically overlook the fact that both features and logical labels are descriptions of the instance from different views. Therefore, we propose a novel method called Contrastive Label Enhancement (ConLE) which integrates features and logical labels into a unified projection space to generate high-level features via a contrastive learning strategy. In this approach, features and logical labels belonging to the same sample are pulled closer, while those of different samples are projected farther away from each other in the projection space. Subsequently, we leverage the obtained high-level features to estimate label distributions through a well-designed training strategy that considers the consistency of label attributes. Extensive experiments on LDL benchmark datasets demonstrate the effectiveness and superiority of our method.

NeurIPS Conference 2023 Conference Paper

Identifiable Contrastive Learning with Automatic Feature Importance Discovery

  • Qi Zhang
  • Yifei Wang
  • Yisen Wang

Existing contrastive learning methods rely on pairwise sample contrast $z_x^\top z_{x'}$ to learn data representations, but the learned features often lack clear interpretability from a human perspective. Theoretically, they lack feature identifiability, and different initializations may lead to totally different features. In this paper, we study a new method named tri-factor contrastive learning (triCL) that involves a 3-factor contrast in the form of $z_x^\top S z_{x'}$, where $S=\text{diag}(s_1, \dots, s_k)$ is a learnable diagonal matrix that automatically captures the importance of each feature. We show that by this simple extension, triCL can not only obtain identifiable features that eliminate randomness but also obtain more interpretable features that are ordered according to the importance matrix $S$. We show that features with high importance have nice interpretability by capturing common classwise features, and obtain superior performance when evaluated for image retrieval using a few features. The proposed triCL objective is general and can be applied to different contrastive learning methods like SimCLR and CLIP. We believe that it is a better alternative to existing 2-factor contrastive learning by improving its identifiability and interpretability with minimal overhead. Code is available at https://github.com/PKU-ML/Tri-factor-Contrastive-Learning.
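The 3-factor contrast $z_x^\top S z_{x'}$ is a one-line change to standard pairwise similarity. The NumPy sketch below assumes an InfoNCE wrapper and a noisy second view for illustration; in triCL the diagonal $S$ would be learned jointly with the encoder.

```python
import numpy as np

def tri_similarity(z1, z2, s):
    # 3-factor contrast z_x^T S z_{x'} with S = diag(s_1, ..., s_k).
    return z1 @ np.diag(s) @ z2.T

def info_nce(sim):
    # InfoNCE over the similarity matrix; matched pairs sit on the diagonal.
    sim = sim - sim.max(axis=1, keepdims=True)
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(logp)))

rng = np.random.default_rng(1)
z = rng.standard_normal((8, 16))
z1 = z / np.linalg.norm(z, axis=1, keepdims=True)
z2 = z1 + 0.1 * rng.standard_normal((8, 16))   # noisy second view of the same samples
s = np.ones(16)   # uniform importance: reduces to the usual 2-factor contrast
loss = info_nce(tri_similarity(z1, z2, s))
print(loss)
```

With $s_i = 1$ for all $i$ the objective is exactly the standard 2-factor contrast; learning $s$ is what lets the model order features by importance.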

NeurIPS Conference 2023 Conference Paper

Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding

  • George Ma
  • Yifei Wang
  • Yisen Wang

Spectral embedding is a powerful graph embedding technique that has received a lot of attention recently due to its effectiveness on Graph Transformers. However, from a theoretical perspective, the universal expressive power of spectral embedding comes at the price of losing two important invariance properties of graphs, sign and basis invariance, which also limits its effectiveness on graph data. To remedy this issue, many previous methods developed costly approaches to learn new invariants and suffer from high computational complexity. In this work, we explore a minimal approach that resolves the ambiguity issues by directly finding canonical directions for the eigenvectors, named Laplacian Canonization (LC). As a pure pre-processing method, LC is lightweight and can be applied to any existing GNNs. We provide a thorough investigation, from theory to algorithm, on this approach, and discover an efficient algorithm named Maximal Axis Projection (MAP) that works for both sign and basis invariance and successfully canonizes more than 90% of all eigenvectors. Experiments on real-world benchmark datasets like ZINC, MOLTOX21, and MOLPCBA show that MAP consistently outperforms existing methods while bringing minimal computation overhead. Code is available at https://github.com/PKU-ML/LaplacianCanonization.

AAAI Conference 2023 Conference Paper

On the Connection between Invariant Learning and Adversarial Training for Out-of-Distribution Generalization

  • Shiji Xin
  • Yifei Wang
  • Jingtong Su
  • Yisen Wang

Despite impressive success in many tasks, deep learning models are shown to rely on spurious features, which will catastrophically fail when generalized to out-of-distribution (OOD) data. Invariant Risk Minimization (IRM) is proposed to alleviate this issue by extracting domain-invariant features for OOD generalization. Nevertheless, recent work shows that IRM is only effective for a certain type of distribution shift (e.g., correlation shift) while it fails for other cases (e.g., diversity shift). Meanwhile, another line of work, Adversarial Training (AT), has shown better domain transfer performance, suggesting that it has the potential to be an effective candidate for extracting domain-invariant features. This paper investigates this possibility by exploring the similarity between the IRM and AT objectives. Inspired by this connection, we propose Domain-wise Adversarial Training (DAT), an AT-inspired method for alleviating distribution shift by domain-specific perturbations. Extensive experiments show that our proposed DAT can effectively remove domain-varying features and improve OOD generalization under both correlation shift and diversity shift.

AAAI Conference 2023 Conference Paper

USER: Unsupervised Structural Entropy-Based Robust Graph Neural Network

  • Yifei Wang
  • Yupan Wang
  • Zeyu Zhang
  • Song Yang
  • Kaiqi Zhao
  • Jiamou Liu

Unsupervised/self-supervised graph neural networks (GNN) are susceptible to the inherent randomness in the input graph data, which adversely affects the model's performance in downstream tasks. In this paper, we propose USER, an unsupervised and robust version of GNN based on structural entropy, to alleviate the interference of graph perturbations and learn appropriate representations of nodes without label information. To mitigate the effects of undesirable perturbations, we analyze the property of intrinsic connectivity and define the intrinsic connectivity graph. We also identify the rank of the adjacency matrix as a crucial factor in revealing a graph that provides the same embeddings as the intrinsic connectivity graph. To capture such a graph, we introduce structural entropy in the objective function. Extensive experiments conducted on clustering and link prediction tasks under random-perturbation and meta-attack over three datasets show that USER outperforms benchmarks and is robust to heavier perturbations.
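For intuition, the one-dimensional structural entropy that this line of work builds on has a simple closed form, $H = -\sum_i \frac{d_i}{2m}\log_2\frac{d_i}{2m}$ over node degrees $d_i$. The snippet below computes it for a 4-cycle and a star; USER's objective uses richer encoding-tree variants, so this is only the base quantity.

```python
import numpy as np

def structural_entropy_1d(A):
    # One-dimensional structural entropy: H = -sum_i (d_i / 2m) log2 (d_i / 2m),
    # where d_i is the degree of node i and 2m is the total degree.
    d = A.sum(axis=1)
    p = d / d.sum()
    return float(-(p * np.log2(p)).sum())

# 4-cycle: all degrees equal, so entropy is maximal, log2(4) = 2 bits.
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)
# Star on 4 nodes: a hub makes the degree distribution more compressible.
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]], dtype=float)
h_cycle = structural_entropy_1d(C4)
h_star = structural_entropy_1d(star)
print(h_cycle, h_star)  # the regular cycle has higher entropy than the star
```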

NeurIPS Conference 2022 Conference Paper

Beyond the Best: Distribution Functional Estimation in Infinite-Armed Bandits

  • Yifei Wang
  • Tavor Baharav
  • Yanjun Han
  • Jiantao Jiao
  • David Tse

In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution, and each arm can be sampled further to obtain noisy estimates of the average reward of that arm. Prior work focuses on the best arm, i.e., estimating the maximum of the average reward distribution. We consider a general class of distribution functionals beyond the maximum and obtain optimal sample complexities in both offline and online settings. We show that online estimation, where the learner can sequentially choose whether to sample a new or existing arm, offers no advantage over the offline setting for estimating the mean functional, but significantly reduces the sample complexity for other functionals such as the median, maximum, and trimmed mean. We propose unified meta algorithms for the online and offline settings and derive matching lower bounds using different Wasserstein distances. For the special case of median estimation, we identify a curious thresholding phenomenon on the indistinguishability between Gaussian convolutions with respect to the noise level, which may be of independent interest.

NeurIPS Conference 2022 Conference Paper

How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders

  • Qi Zhang
  • Yifei Wang
  • Yisen Wang

Masked Autoencoders (MAE) based on a reconstruction task have risen to be a promising paradigm for self-supervised learning (SSL) and achieve state-of-the-art performance across different benchmark datasets. However, despite its impressive empirical success, there is still limited theoretical understanding of it. In this paper, we propose a theoretical understanding of how masking matters for MAE to learn meaningful features. We establish a close connection between MAE and contrastive learning, which shows that MAE implicitly aligns the mask-induced positive pairs. Built upon this connection, we develop the first downstream guarantees for MAE methods and analyze the effect of the mask ratio. Besides, as a result of the implicit alignment, we also point out the dimensional collapse issue of MAE, and propose a Uniformity-enhanced MAE (U-MAE) loss that can effectively address this issue and bring significant improvements on real-world datasets, including CIFAR-10, ImageNet-100, and ImageNet-1K. Code is available at https://github.com/zhangq327/U-MAE.
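What a "uniformity-enhanced" loss targets can be illustrated with the standard log-sum-exp uniformity measure of Wang and Isola (2020): dimensionally collapsed features score poorly (high), well-spread features score well (low). The exact U-MAE term differs, so treat this as an illustrative sketch with assumed names and temperature.

```python
import numpy as np

def uniformity(Z, t=2.0):
    # log E exp(-t * ||z_i - z_j||^2) over distinct pairs; lower = more uniform.
    n = len(Z)
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    off = d2[~np.eye(n, dtype=bool)]          # drop the zero self-distances
    return float(np.log(np.mean(np.exp(-t * off))))

rng = np.random.default_rng(0)
spread = rng.standard_normal((32, 8))
spread /= np.linalg.norm(spread, axis=1, keepdims=True)   # points on the unit sphere
collapsed = np.ones((32, 8)) / np.sqrt(8.0)               # dimensional collapse: all equal
u_spread, u_collapsed = uniformity(spread), uniformity(collapsed)
print(u_spread, u_collapsed)  # spread features score strictly lower
```

Adding a term like this to a reconstruction loss penalizes the collapsed configuration, which is the intuition behind augmenting MAE with a uniformity regularizer.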

NeurIPS Conference 2022 Conference Paper

Improving Out-of-Distribution Generalization by Adversarial Training with Structured Priors

  • Qixun Wang
  • Yifei Wang
  • Hong Zhu
  • Yisen Wang

Deep models often fail to generalize well in test domains when the data distribution differs from that in the training domain. Among numerous approaches to address this Out-of-Distribution (OOD) generalization problem, there has been a growing surge of interest in exploiting Adversarial Training (AT) to improve OOD performance. Recent works have revealed that the robust model obtained by conducting sample-wise AT also retains transferability to biased test domains. In this paper, we empirically show that sample-wise AT has limited improvement on OOD performance. Specifically, we find that AT can only maintain performance at smaller scales of perturbation while Universal AT (UAT) is more robust to larger-scale perturbations. This provides us with clues that adversarial perturbations with universal (low dimensional) structures can enhance the robustness against large data distribution shifts that are common in OOD scenarios. Inspired by this, we propose two AT variants with low-rank structures to train OOD-robust models. Extensive experiments on DomainBed benchmark show that our proposed approaches outperform Empirical Risk Minimization (ERM) and sample-wise AT. Our code is available at https://github.com/NOVAglow646/NIPS22-MAT-and-LDAT-for-OOD.

NeurIPS Conference 2022 Conference Paper

When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture

  • Yichuan Mo
  • Dongxian Wu
  • Yifei Wang
  • Yiwen Guo
  • Yisen Wang

Vision Transformers (ViTs) have recently achieved competitive performance in broad vision tasks. Unfortunately, on popular threat models, naturally trained ViTs are shown to provide no more adversarial robustness than convolutional neural networks (CNNs). Adversarial training is still required for ViTs to defend against such adversarial attacks. In this paper, we provide the first and comprehensive study on the adversarial training recipe of ViTs via extensive evaluation of various training techniques across benchmark datasets. We find that pre-training and SGD optimizer are necessary for ViTs' adversarial training. Further considering ViT as a new type of model architecture, we investigate its adversarial robustness from the perspective of its unique architectural components. We find, when randomly masking gradients from some attention blocks or masking perturbations on some patches during adversarial training, the adversarial robustness of ViTs can be remarkably improved, which may potentially open up a line of work to explore the architectural information inside the newly designed models like ViTs. Our code is available at https://github.com/mo666666/When-Adversarial-Training-Meets-Vision-Transformers.

NeurIPS Conference 2021 Conference Paper

Dissecting the Diffusion Process in Linear Graph Convolutional Networks

  • Yifei Wang
  • Yisen Wang
  • Jiansheng Yang
  • Zhouchen Lin

Graph Convolutional Networks (GCNs) have attracted increasing attention in recent years. A typical GCN layer consists of a linear feature propagation step and a nonlinear transformation step. Recent works show that a linear GCN can achieve comparable performance to the original non-linear GCN while being much more computationally efficient. In this paper, we dissect the feature propagation steps of linear GCNs from a perspective of continuous graph diffusion, and analyze why linear GCNs fail to benefit from more propagation steps. Following that, we propose Decoupled Graph Convolution (DGC) that decouples the terminal time and the feature propagation steps, making it more flexible and capable of exploiting a very large number of feature propagation steps. Experiments demonstrate that our proposed DGC improves linear GCNs by a large margin and makes them competitive with many modern variants of non-linear GCNs.
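The decoupling can be sketched as K Euler steps of graph heat diffusion up to a terminal time T, so that the step count and the total diffusion time become independent knobs. The toy graph, the combinatorial Laplacian, and the parameter values below are illustrative assumptions, not the paper's exact propagation operator.

```python
import numpy as np

# Toy 4-node graph and its combinatorial Laplacian L = D - A.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

def dgc_propagate(X, L, T=2.0, K=100):
    # K Euler steps of the heat equation dX/dt = -L X with step size T/K:
    # the terminal time T is decoupled from the number of propagation steps K.
    step = np.eye(len(L)) - (T / K) * L
    for _ in range(K):
        X = step @ X
    return X

X = np.array([[1.0], [0.0], [0.0], [0.0]])  # one-hot signal on node 0
out = dgc_propagate(X, L)
print(out.ravel())  # mass diffuses toward a smoother signal, total mass preserved
```

Fixing T while growing K refines the discretization instead of over-smoothing, which is the flexibility the abstract refers to.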

NeurIPS Conference 2021 Conference Paper

Residual Relaxation for Multi-view Representation Learning

  • Yifei Wang
  • Zhengyang Geng
  • Feng Jiang
  • Chuming Li
  • Yisen Wang
  • Jiansheng Yang
  • Zhouchen Lin

Multi-view methods learn representations by aligning multiple views of the same image and their performance largely depends on the choice of data augmentation. In this paper, we notice that some other useful augmentations, such as image rotation, are harmful for multi-view methods because they cause a semantic shift that is too large to be aligned well. This observation motivates us to relax the exact alignment objective to better cultivate stronger augmentations. Taking image rotation as a case study, we develop a generic approach, Pretext-aware Residual Relaxation (Prelax), that relaxes the exact alignment by allowing an adaptive residual vector between different views and encoding the semantic shift through pretext-aware learning. Extensive experiments on different backbones show that our method can not only improve multi-view methods with existing augmentations, but also benefit from stronger image augmentations like rotation.

ICRA Conference 2007 Conference Paper

Real-Time High-Accuracy Micropipette Aspiration for Characterizing Mechanical Properties of Biological Cells

  • Xinyu Liu 0002
  • Yifei Wang
  • Yu Sun 0001

This paper presents a micropipette aspiration system and a cell contour visual tracking algorithm for real-time, high-accuracy mechanical characterization of individual cells. The computer vision tracking algorithm measures cell deformation parameters in real time (30 Hz) with a resolution down to 0.21 pixel, significantly enhancing the accuracy and efficiency of the micropipette aspiration technique. As another advantage over manual measurements in terms of characterization accuracy, the micropipette aspiration system features precise synchronization between cell deformations and applied pressure changes. Experimental results on both solid-like cells (interstitial cells) and liquid-like cells (neutrophils) demonstrate the effectiveness of the system and the visual tracking algorithm. Among several characterized mechanical parameters, the viscoelastic properties of porcine aortic valve interstitial cells were, for the first time, quantified in this study.