Arrow Research search

Author name cluster

Yawei Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers

TMLR Journal 2026 Journal Article

Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation

  • Bin Ren
  • Eduard Zamfir
  • Zongwei Wu
  • Yawei Li
  • Yidi Li
  • Danda Pani Paudel
  • Radu Timofte
  • Ming-Hsuan Yang

Restoring multiple degradations efficiently via just one model has become increasingly significant and impactful, especially with the proliferation of mobile devices. Traditional solutions typically involve training dedicated models per degradation, resulting in inefficiency and redundancy. More recent approaches either introduce additional modules to learn visual prompts, significantly increasing the size of the model, or incorporate cross-modal transfer from large language models trained on vast datasets, adding complexity to the system architecture. In contrast, our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations to enable both efficient and comprehensive restoration through a joint embedding mechanism, without scaling up the model or relying on large language models. Specifically, we examine the sub-latent space of each input, identifying key components and reweighting them first in a gated manner. To unify intrinsic degradation awareness with contextualized attention, we propose a spatial–frequency parallel fusion strategy that strengthens spatially informed local–global interactions and enriches restoration fidelity from the frequency domain. Comprehensive evaluations across four all-in-one restoration benchmarks demonstrate that AnyIR attains state-of-the-art performance while reducing model parameters by 84% and FLOPs by 80% relative to the baseline. These results highlight the potential of AnyIR as an effective and lightweight solution for further all-in-one image restoration. Our code is available at: https://github.com/Amazingren/AnyIR.

TMLR Journal 2026 Journal Article

Steering Large Reasoning Models towards Concise Reasoning via Flow Matching

  • Yawei Li
  • Benjamin Bergner
  • Yinghan Zhao
  • Vihang Prakash Patil
  • Bei Chen
  • Cheng Wang

Large Reasoning Models (LRMs) excel at complex reasoning tasks, but their efficiency is often hampered by overly verbose outputs. Prior steering methods attempt to address this issue by applying a single, global vector to hidden representations, an approach grounded in the restrictive linear representation hypothesis. In this work, we introduce FlowSteer, a nonlinear steering method that goes beyond uniform linear shifts by learning a complete transformation between the distributions associated with verbose and concise reasoning. This transformation is learned via Flow Matching as a velocity field, enabling precise, input-dependent control over the model's reasoning process. By aligning steered representations with the distribution of concise-reasoning activations, FlowSteer yields more compact reasoning than the linear shifts. Across diverse reasoning benchmarks, FlowSteer demonstrates strong task performance and token efficiency compared to leading inference-time baselines. Our work demonstrates that modeling the full distributional transport with generative techniques offers a more effective and principled foundation for controlling LRMs.
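
The contrast between a single global shift and an input-dependent velocity field can be sketched on a toy problem. The example below is not FlowSteer: it assumes scalar activations, assumes the verbose and concise activations are one-dimensional Gaussians, and replaces the learned network with a closed-form straight-line velocity field between those Gaussians. All function and parameter names are hypothetical.

```python
def velocity(x, t, m0, s0, m1, s1):
    """Closed-form velocity field that moves N(m0, s0^2) to N(m1, s1^2)
    along straight lines. Stands in for a learned velocity network."""
    m_t = (1.0 - t) * m0 + t * m1   # interpolated mean at time t
    s_t = (1.0 - t) * s0 + t * s1   # interpolated std at time t (s0, s1 > 0)
    return (m1 - m0) + (s1 - s0) / s_t * (x - m_t)


def flow_steer(x, m0, s0, m1, s1, steps=10):
    """Steer one activation by Euler-integrating the velocity field from t=0 to t=1."""
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt, m0, s0, m1, s1)
    return x


def linear_steer(x, m0, m1):
    """Baseline: add the same global vector (difference of means) to every input."""
    return x + (m1 - m0)


if __name__ == "__main__":
    # Toy source ("verbose") and target ("concise") statistics, chosen arbitrarily.
    m0, s0, m1, s1 = 0.0, 2.0, 1.0, 1.0
    for x0 in (4.0, -2.0):
        print(x0, flow_steer(x0, m0, s0, m1, s1), linear_steer(x0, m0, m1))
```

With these toy statistics the flow moves 4.0 to roughly 3.0 and -2.0 to roughly 0.0, so the displacement depends on the input, while the linear baseline shifts both inputs by the same +1.0 and leaves the spread of the activations unchanged.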

ICLR Conference 2025 Conference Paper

Calibrating LLMs with Information-Theoretic Evidential Deep Learning

  • Yawei Li
  • David Rügamer
  • Bernd Bischl
  • Mina Rezaei

Fine-tuned large language models (LLMs) often exhibit overconfidence, particularly when trained on small datasets, resulting in poor calibration and inaccurate uncertainty estimates. Evidential Deep Learning (EDL), an uncertainty-aware approach, enables uncertainty estimation in a single forward pass, making it a promising method for calibrating fine-tuned LLMs. However, despite its computational efficiency, EDL is prone to overfitting, as its training objective can result in overly concentrated probability distributions. To mitigate this, we propose regularizing EDL by incorporating an information bottleneck (IB). Our approach IB-EDL suppresses spurious information in the evidence generated by the model and encourages truly predictive information to influence both the predictions and uncertainty estimates. Extensive experiments across various fine-tuned LLMs and tasks demonstrate that IB-EDL outperforms both existing EDL and non-EDL approaches. By improving the trustworthiness of LLMs, IB-EDL facilitates their broader adoption in domains requiring high levels of confidence calibration.
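
The single-forward-pass uncertainty estimate that EDL provides can be illustrated with the standard Dirichlet formulation. This is a minimal sketch of plain EDL, not of IB-EDL or its information-bottleneck regularizer; using softplus as the evidence function is an assumption (other non-negative activations are also used), and the name `edl_predict` is hypothetical.

```python
import math


def softplus(z):
    """Numerically stable softplus, used here to map logits to non-negative evidence."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def edl_predict(logits):
    """Turn one forward pass worth of logits into class probabilities and an
    uncertainty score via a Dirichlet distribution (standard EDL formulation)."""
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]       # Dirichlet concentration parameters
    strength = sum(alpha)                     # total evidence plus number of classes
    probs = [a / strength for a in alpha]     # expected class probabilities
    uncertainty = len(alpha) / strength       # in (0, 1]; near 1 when evidence is small
    return probs, uncertainty


if __name__ == "__main__":
    print(edl_predict([0.0, 0.0, 0.0]))    # little evidence for any class
    print(edl_predict([10.0, 0.0, 0.0]))   # strong evidence for class 0
```

More total evidence lowers the uncertainty score, which is also why an objective that inflates evidence on the training set can produce the overly concentrated distributions the abstract describes.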

NeurIPS Conference 2025 Conference Paper

CamSAM2: Segment Anything Accurately in Camouflaged Videos

  • Yuli Zhou
  • Yawei Li
  • Yuqian Fu
  • Luca Benini
  • Ender Konukoglu
  • Guolei Sun

Video camouflaged object segmentation (VCOS), aiming at segmenting camouflaged objects that seamlessly blend into their environment, is a fundamental vision task with various real-world applications. With the release of SAM2, video segmentation has witnessed significant progress. However, SAM2's capability of segmenting camouflaged videos is suboptimal, especially when given simple prompts such as point and box. To address the problem, we propose Camouflaged SAM2 (CamSAM2), which enhances SAM2's ability to handle camouflaged scenes without modifying SAM2's parameters. Specifically, we introduce a decamouflaged token to provide the flexibility of feature adjustment for VCOS. To make full use of fine-grained and high-resolution features from the current frame and previous frames, we propose implicit object-aware fusion (IOF) and explicit object-aware fusion (EOF) modules, respectively. Object prototype generation (OPG) is introduced to abstract and memorize object prototypes with informative details using high-quality features from previous frames. Extensive experiments are conducted to validate the effectiveness of our approach. While CamSAM2 only adds negligible learnable parameters to SAM2, it substantially outperforms SAM2 on three VCOS datasets, especially achieving 12.2 mDice gains with click prompt on MoCA-Mask and 19.6 mDice gains with mask prompt on SUN-SEG-Hard, with Hiera-T as the backbone. The code is available at https://github.com/zhoustan/CamSAM2.

NeurIPS Conference 2025 Conference Paper

LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis

  • Berkay Döner
  • Thorir Mar Ingolfsson
  • Luca Benini
  • Yawei Li

Electroencephalography (EEG) offers a non-invasive lens into human brain activity, but building large-scale models is hampered by topological heterogeneity: each public corpus defines its own electrode layout, limiting generalization. We introduce LUNA (Latent Unified Network Architecture), a self-supervised foundation model that reconciles disparate electrode geometries while scaling linearly, not quadratically, with channel count. LUNA compresses multi-channel EEG into a fixed-size, topology-agnostic latent space via learned queries and cross-attention. Downstream transformer blocks then operate exclusively on this latent representation using patch-wise temporal self-attention, decoupling computation from electrode count. Pre-trained on TUEG and Siena (>21,000 h raw EEG across diverse montages) using a masked-patch reconstruction objective, LUNA transfers effectively to four downstream tasks: abnormality detection, artifact rejection, slowing classification, and emotion recognition. It demonstrates highly competitive performance across several benchmarks, achieving state-of-the-art results on TUAR and TUSL, e.g., 0.921 AUROC on TUAR, while reducing FLOPs by 300× and trimming GPU memory use by up to 10×. Critically, these gains are consistent across all evaluated electrode configurations. Code is available at https://github.com/pulp-bio/biofoundation

NeurIPS Conference 2025 Conference Paper

PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

  • Yanlong Chen
  • Mattia Orlandi
  • Pierangelo Rapa
  • Simone Benatti
  • Luca Benini
  • Yawei Li

Physiological signals are often corrupted by motion artifacts, baseline drift, and other low-SNR disturbances, posing significant challenges for analysis. Additionally, these signals exhibit strong non-stationarity, with sharp peaks and abrupt changes that evolve continuously, making them difficult to represent using traditional time-domain or filtering methods. To address these issues, a novel wavelet-based approach for physiological signal analysis is presented, aimed at capturing multi-scale time-frequency features across various physiological signals. Leveraging this technique, two large-scale pretrained models specific to EMG and ECG are introduced for the first time, achieving superior performance and setting new baselines in downstream tasks. Additionally, a unified multi-modal framework is constructed by integrating a pretrained EEG model, where each modality is guided through its dedicated branch and fused via learnable weighted fusion. This design effectively addresses challenges such as low signal-to-noise ratio, high inter-subject variability, and device mismatch, outperforming existing methods on multi-modal tasks. The proposed wavelet-based architecture lays a solid foundation for the analysis of diverse physiological signals, while the multi-modal design points to next-generation physiological signal processing with potential impacts on wearable health monitoring, clinical diagnostics, and broader biomedical applications. Code and data are available at: github.com/ForeverBlue816/PhysioWave

TMLR Journal 2024 Journal Article

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

  • Yawei Li
  • Yang Zhang
  • Kenji Kawaguchi
  • Ashkan Khakzar
  • Bernd Bischl
  • Mina Rezaei

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model’s behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. To address the limitations of previous evaluations, in this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.

ICLR Conference 2024 Conference Paper

Probabilistic Self-supervised Representation Learning via Scoring Rules Minimization

  • Amirhossein Vahidi
  • Simon Schoßer
  • Lisa Wimmer
  • Yawei Li
  • Bernd Bischl
  • Eyke Hüllermeier
  • Mina Rezaei

Self-supervised learning methods have shown promising results across a wide range of tasks in computer vision, natural language processing, and multimodal analysis. However, self-supervised approaches come with a notable limitation, dimensional collapse, where a model doesn't fully utilize its capacity to encode information optimally. Motivated by this, we propose ProSMin, a novel probabilistic self-supervised learning approach that leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks, the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through probabilistic knowledge distillation. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMin and demonstrate its modified scoring rule. This insight validates the method's optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, outperforming the self-supervised baseline in a variety of experiments on large datasets such as ImageNet-O and ImageNet-C. ProSMin thus demonstrates its scalability and real-world applicability. Our code is publicly available: https://github.com/amirvhd/SSL-sore-rule.

NeurIPS Conference 2024 Conference Paper

Sharing Key Semantics in Transformer Makes Efficient Image Restoration

  • Bin Ren
  • Yawei Li
  • Jingyun Liang
  • Rakesh Ranjan
  • Mengyuan Liu
  • Rita Cucchiara
  • Luc Van Gool
  • Ming-Hsuan Yang

Image Restoration (IR), a classic low-level vision task, has witnessed significant advancements through deep models that effectively model global information. Notably, the emergence of Vision Transformers (ViTs) has further propelled these advancements. When computing, the self-attention mechanism, a cornerstone of ViTs, tends to encompass all global cues, even those from semantically unrelated objects or regions. This inclusivity introduces computational inefficiencies, particularly noticeable with high input resolution, as it requires processing irrelevant information, thereby impeding efficiency. Additionally, for IR, it is commonly noted that small segments of a degraded image, particularly those closely aligned semantically, provide particularly relevant information to aid in the restoration process, as they contribute essential contextual cues crucial for accurate reconstruction. To address these challenges, we propose boosting IR's performance by sharing the key semantics via Transformer for IR (i.e., SemanIR) in this paper. Specifically, SemanIR initially constructs a sparse yet comprehensive key-semantic dictionary within each transformer stage by establishing essential semantic connections for every degraded patch. Subsequently, this dictionary is shared across all subsequent transformer blocks within the same stage. This strategy optimizes attention calculation within each block by focusing exclusively on semantically related components stored in the key-semantic dictionary. As a result, attention calculation achieves linear computational complexity within each window. Extensive experiments across 6 IR tasks confirm the proposed SemanIR's state-of-the-art performance, quantitatively and qualitatively showcasing advancements. The visual results, code, and trained models are available at: https://github.com/Amazingren/SemanIR.

NeurIPS Conference 2021 Conference Paper

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

  • Yang Zhang
  • Ashkan Khakzar
  • Yawei Li
  • Azade Farshad
  • Seong Tae Kim
  • Nassir Navab

One principal approach for illuminating a black-box neural network is feature attribution, i.e., identifying the importance of input features for the network’s prediction. The predictive information of features is recently proposed as a proxy for the measure of their importance. So far, the predictive information is only identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features' information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.