Arrow Research search

Author name cluster

Li Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

96 papers
2 author rows

Possible papers

96

AAAI Conference 2026 Conference Paper

Beyond Sharpness: The Role of Nonuniformity in Generalization

  • Yingcong Zhou
  • Pingfan Wu
  • Li Wang
  • Zhiguo Fu
  • Fengqin Yang

Sharpness-aware minimization (SAM) is widely recognized for enhancing the generalization performance of deep neural networks. However, recent works have challenged the claim that flatness implies generalization, demonstrating that it is insufficient as an indicator of generalization. In this paper, we reveal an insightful phenomenon: among minima of similar sharpness, stochastic optimization algorithms tend to prefer those with lower nonuniformity. We define nonuniformity by both the magnitude and structure of the gradient noise, and show that it fundamentally differs from sharpness and plays a critical role in generalization. Specifically, we first theoretically prove that the expected generalization gap of models trained via stochastic optimization algorithms is positively correlated with nonuniformity (the magnitude of the gradient noise). Empirically, we show that nonuniformity exhibits a stronger correlation with generalization than sharpness, especially in Transformer models. Furthermore, we demonstrate that nonuniformity (the structure of the gradient noise) guides the algorithm towards sparser solutions more effectively, and yields better generalization performance, than sharpness-based methods in the high-dimensional sparse regression problem. Finally, extensive experiments on various datasets and models confirm the advantages of nonuniformity for generalization: (1) optimization guided by nonuniformity achieves better generalization than optimization guided by flatness (in standard training, transfer learning, hyperparameter sensitivity, and robustness to label noise); (2) model architecture (such as depth and width) is closely related to nonuniformity.
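The magnitude component of nonuniformity can be illustrated with a quick numerical sketch: estimate how far minibatch gradients scatter around the full-batch gradient at a candidate minimum. Everything below (the least-squares toy problem, batch size, and the mean-squared-deviation estimator) is an illustrative assumption, not the paper's exact definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: least-squares loss whose per-minibatch gradients we can
# compare against the full-batch gradient at a candidate minimum w.
X = rng.normal(size=(1000, 5))
y = X @ np.ones(5) + rng.normal(scale=0.5, size=1000)
w = np.ones(5)  # the generating weights, a near-minimum of the loss

def minibatch_grad(idx):
    # Gradient of 0.5 * mean((X w - y)^2) over the examples in idx.
    r = X[idx] @ w - y[idx]
    return X[idx].T @ r / len(idx)

full_grad = minibatch_grad(np.arange(len(y)))

# Proxy for nonuniformity (magnitude component): mean squared deviation
# of minibatch gradients from the full-batch gradient.
batches = [rng.choice(len(y), size=32, replace=False) for _ in range(200)]
noise_sq = [np.sum((minibatch_grad(b) - full_grad) ** 2) for b in batches]
nonuniformity = float(np.mean(noise_sq))
print(f"estimated gradient-noise magnitude: {nonuniformity:.4f}")
```

Two minima with identical loss curvature can still differ in this quantity, which is the distinction the abstract draws between sharpness and nonuniformity.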

AAAI Conference 2026 Conference Paper

Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning

  • Li Wang
  • Changhao Zhang
  • Zengqi Xiu
  • Kai Lu
  • Xin Yu
  • Kui Zhang
  • Wenjun Wu

Despite recent advances in the reasoning capabilities of Large Language Models (LLMs), improving the reasoning ability of Small Language Models (SLMs, e.g., up to 1.5B parameters) remains challenging. A key obstacle lies in the complexity and variability of natural language: essentially equivalent problems often appear in diverse surface forms, often obscured by redundant or distracting details. This imposes a dual burden on SLMs: they must first extract the core problem from complex linguistic input, and then perform reasoning based on that understanding. The resulting vast and noisy problem space hinders optimization, particularly for models with limited capacity. To address this, we propose a new framework that decouples understanding from reasoning by mapping natural language problems into a canonical problem space: a semantically simplified yet expressive domain. This enables SLMs to focus on reasoning over standardized inputs, free from linguistic variability. Within this framework, we introduce DURIT (Decoupled Understanding from Reasoning via Iterative Training), a three-step algorithm that iteratively (1) maps natural language problems into the problem space via reinforcement learning, (2) aligns reasoning trajectories through self-distillation, and (3) trains reasoning policies in the problem space. The mapper and reasoner are co-trained in an alternating loop throughout this process. Experiments show that DURIT substantially improves SLMs' performance on both in-domain and out-of-domain mathematical and logical reasoning tasks. Beyond improving reasoning capabilities, DURIT also improves the robustness of reasoning, validating decoupling understanding from reasoning as an effective strategy for strengthening SLMs.
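The decoupling idea can be illustrated with a deliberately tiny toy, in which a rule-based "mapper" canonicalizes noisy word problems and a "reasoner" only ever sees the canonical form. This is purely a sketch of the framework's division of labor; DURIT's actual mapper and reasoner are learned via reinforcement learning and self-distillation, and every function and example below is hypothetical.

```python
import re

# Toy stand-ins: the mapper strips distractor phrasing down to a minimal
# arithmetic form; the reasoner operates only on canonical input.
def mapper(problem):
    nums = [int(n) for n in re.findall(r"\d+", problem)]
    op = "+" if ("more" in problem or "gains" in problem) else "-"
    return f"{nums[0]} {op} {nums[1]}"

def reasoner(canonical):
    a, op, b = canonical.split()
    return int(a) + int(b) if op == "+" else int(a) - int(b)

problems = [
    "Despite the rain, Ana had 3 apples and later gains 4 apples.",
    "Bo, who loves hats, had 9 coins but lost 2 coins at the fair.",
]
# Distractor phrasing never reaches the reasoner.
answers = [reasoner(mapper(p)) for p in problems]
print(answers)  # → [7, 7]
```

Because both surface forms collapse to the same canonical space, the reasoner's task stays small and uniform, which is the capacity argument the abstract makes for SLMs.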

JBHI Journal 2026 Journal Article

Direct PET-to-CT Generation for Attenuation Correction: A Slice-to-Slice Continual Transformer Segmentation-Aware Network

  • Rongjun Ge
  • Hanyuan Zheng
  • Yuxin Liu
  • Liutao Yang
  • Li Wang
  • Xu Ji
  • Jingtao Shen
  • Nan Li

Direct synthetic computed tomography (CT) generation from positron emission tomography (PET) plays a crucial role in PET attenuation correction, while providing detailed structural information to compensate for functional imaging. Compared to the widely used PET/CT and indirect PET/MR-CT, the direct PET-to-CT translation method (denoted as PET-to-CT) offers several advantages: 1) The CT required for PET-to-CT is directly obtained from PET, thereby avoiding the intermediate errors generated in the inter-step processes of multimodal scanning in PET/CT and PET/MR-CT. 2) Furthermore, direct PET-to-CT eliminates the requirement for supplementary imaging equipment, thereby reducing complexity and scan duration in contrast to PET/CT and PET/MR-CT imaging. Thus, direct PET-to-CT is highly promising for clinical applications. However, it faces challenges, including spatial resolution mismatches between PET and CT, as well as voxel-wise semantic differences arising from functional and structural imaging. To address these challenges, this paper proposes a 2D hierarchical method called the S2SCT (Slice-to-Slice Continual Transformer)-SA (Segmentation-Aware) Network. It uses a slice-continual network to acquire semantic transformation knowledge from each PET slice to a CT slice, facilitating the conversion between functional and structural imaging domains. Subsequently, the segmentation-aware network is designed to further capture spatial correlations both between and within slices, resulting in improved CT spatial resolution. The experimental results demonstrate that our proposed method outperforms mainstream methods in both CT generation and attenuation correction, as evidenced by both visual results and metric values.

JBHI Journal 2026 Journal Article

GPFD-Net: A Geometry-Pose Frequency Decoupling Network for Privacy-Preserving Human Action Recognition in Healthcare

  • Xing Li
  • Jingfan Liang
  • Ge Gao
  • Li Wang
  • Haifeng Wang
  • Shihao Han

Human Action Recognition (HAR) holds significant application value in healthcare informatics, facilitating tasks such as clinical diagnosis and rehabilitation monitoring. Point cloud sequences have emerged as a pivotal modality for balancing privacy preservation with high-fidelity geometric structural representation, ensuring anonymity while retaining critical 3D behavioral information. However, existing point cloud sequence encoding methods struggle to precisely encode micro-geometric details and macro-pose contours within the spatial dimension, as well as the dynamic heterogeneity of actions within the temporal dimension. These limitations impede the realization of high-precision clinical motion analysis. To address these challenges, we propose a Geometry-Pose Frequency Decoupling Network (GPFD-Net) for human action recognition. First, we design a Geometry-Pose Parallel-Collaborative Spatial Encoder (GPCSE). This module adopts a parallel dual-stream architecture to explicitly capture and fuse complementary micro-geometric details and macro-pose contours, generating an informative geometry-enhanced pose feature sequence. Second, we introduce a Frequency-Decoupled Temporal Capturer (FDTC). This module adaptively decomposes the geometry-enhanced pose feature sequence into a smooth trend sequence and a transient detail sequence, which are subsequently processed by two parallel expert encoders via differentiated encoding to achieve robust human action recognition. Extensive experiments on four public benchmark datasets demonstrate that GPFD-Net achieves superior performance. The proposed method provides a novel paradigm for high-precision and privacy-preserving motion analysis in healthcare applications.

AAAI Conference 2026 Conference Paper

MoMoREC: A Multi-agent Motivation Generation Framework for Residual Semantic ID-Aware Recommendation

  • Yige Wang
  • Mingming Li
  • Li Wang
  • Kaichen Zhao
  • Wangming Li
  • Weipeng Jiang
  • Xueying Li

Recent advances in the field of sequential recommendation have highlighted the potential of Large Language Models (LLMs) in enhancing item embeddings and improving user understanding. However, existing approaches face three major limitations: 1) insufficient understanding of the reasons behind users' purchase decisions, 2) the high-dimensional embeddings directly produced by LLMs are poorly compatible with traditional low-dimensional ID embeddings, and 3) reliance on additional fine-tuning and high inference overhead to adapt LLMs to the recommendation task. In this paper, we propose MoMoREC, a simple yet effective user-understanding-based recommendation strategy. This method leverages the intrinsic comprehension capabilities of LLMs combined with residual semantic IDs to better understand users. Specifically, starting from common user purchasing behaviors and incorporating item characteristics, we employ a multi-agent framework to utilize LLMs in analyzing user shopping motivations and extracting high-dimensional dense embeddings. These embeddings are then transformed into low-dimensional IDs using a residual semantic ID approach via clustering and residual dimensionality reduction, which can be fed into the recommendation model. MoMoREC effectively integrates the understanding power of LLMs with the strengths of recommendation systems, preserving rich semantic language embeddings while reducing or eliminating the need for auxiliary trainable modules. As a result, it seamlessly adapts to any sequential recommendation framework. Experiments on three benchmark datasets show that MoMoREC significantly improves traditional recommendation models, demonstrating its effectiveness and flexibility.
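The residual semantic ID step can be sketched as residual quantization: run k-means on the LLM embeddings, then run k-means again on the residuals, so each item collapses to a short tuple of small integers. The codebook sizes, the two-level depth, and the plain k-means routine below are illustrative assumptions, not MoMoREC's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=15):
    # Plain k-means; returns centroids and final assignments.
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        a = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(a == j):
                C[j] = X[a == j].mean(0)
    a = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
    return C, a

# Stand-ins for high-dimensional motivation embeddings from an LLM.
E = rng.normal(size=(500, 64))

# Level 1: coarse code; level 2: quantize what level 1 left behind.
C1, id1 = kmeans(E, k=16)
residual = E - C1[id1]
C2, id2 = kmeans(residual, k=16)

# Each item becomes a pair of small integers -- a residual semantic ID
# that fits ordinary low-dimensional ID embedding tables.
semantic_ids = np.stack([id1, id2], axis=1)
print(semantic_ids.shape)
```

Two 16-way codes give 256 distinguishable cells while each ID stays small enough to index a standard embedding table, which is the compatibility point the abstract raises.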

JBHI Journal 2026 Journal Article

TriFuse-Net: A Tri-Branch PET/CT Fusion Pyramid Network Enhanced by Lesion-Guided Structural-Metabolic Attention for Lung Cancer Diagnosis and Prognosis

  • Yuyu Liu
  • Jieqin Lv
  • Fangfang Yang
  • Huiqin Wu
  • Xiang Pan
  • Li Wang
  • Han Bai
  • Shunfang Wang

Diagnosis and prognosis of lung cancer via PET/CT imaging have long been major clinical concerns. However, existing multimodal approaches often focus on feature aggregation rather than cross-modal interactive collaboration, failing to capture the structural-metabolic correlations and multi-scale synergy essential for characterizing complex lesions. Therefore, this study proposes TriFuse-Net, a tri-branch PET/CT fusion pyramid network (FPN) enhanced by lesion-guided structural-metabolic attention (LSMA) to improve both diagnosis and prognosis prediction tasks. The model is composed of two identical unimodal branches (PET/CT) and one pyramid branch with an interacting channel and spatial attention. The pyramid structure enables bidirectional multiscale feature extraction and fusion, capturing both local details and global semantic information of lesions. Comprehensive experiments validated the model's superiority across three clinical tasks. TriFuse-Net achieved a C-index of 0.747 for progression-free survival (PFS) prediction, showing improvements of 14.7% and 11.0% over ResNet-CT and ResNet-PET, respectively. Additionally, the clinical-integrated model (TriFuse-Net-Cli) achieved AUCs of 0.947 for differentiating lung cancer from tuberculosis and 0.937 for identifying lymph-node metastasis. Ablation studies further confirmed the essential contributions of both FPN and LSMA. In summary, the proposed framework demonstrates that integrating multi-scale structural-metabolic relationships significantly enhances diagnosis and prognosis in lung cancer.

JBHI Journal 2026 Journal Article

Whisperization and Masked CycleGAN-Based Framework for Electrolaryngeal Speech Enhancement

  • Jie Zhou
  • Li Wang
  • Fengji Li
  • Shaochuan Zhang
  • Fan Fan
  • Tao Liu
  • Xiaohong Chen
  • Haijun Niu

Electrolarynx (EL) provides an effective approach to voice rehabilitation for patients with phonation disorders. However, due to its reliance on an external mechanical source, EL speech suffers from limited acoustic cues, leading to degraded quality and restricting the potential of subsequent modeling and enhancement. This paper proposes a novel EL speech enhancement framework that combines whisperization with a Masked CycleGAN model. The whisperization step removes redundant constant excitation and mechanical noise, generating an intermediate speech form, whisper-like EL (W-EL) speech, whose acoustic and perceptual properties are closer to natural whisper. Subsequently, the Masked CycleGAN employs a frame-level masking strategy to guide the generator in reconstructing missing prosodic and linguistic features. Thus, we achieve a dual-stage enhancement of “redundancy removal” and “deficiency compensation.” Acoustic feature analysis demonstrates that the converted W-EL speech is more similar to normal speech in terms of spectrogram, fundamental frequency (F0) values, and F0 contours, while also compensating for the missing low-frequency energy below 500 Hz. Objective evaluations show significant improvements across multiple metrics. Subjective evaluations confirm that W-EL speech exhibits higher naturalness and intelligibility compared to original EL speech. Moreover, the combined “whisperization + voice conversion” framework further enhances perceptual quality. This study not only offers a novel pathway for EL speech enhancement, but may also provide valuable insights for improving other types of pathological speech.

ICML Conference 2025 Conference Paper

Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy

  • Haoqi Wu
  • Wei Dai
  • Li Wang
  • Qiang Yan

Large Language Models (LLMs) have gained significant popularity due to their remarkable capabilities in text understanding and generation. However, despite their widespread deployment in inference services such as ChatGPT, concerns about the potential leakage of sensitive user data have arisen. Existing solutions primarily rely on privacy-enhancing technologies to mitigate such risks, facing a trade-off among efficiency, privacy, and utility. To narrow this gap, we propose Cape, a context-aware prompt perturbation mechanism based on differential privacy, to enable efficient inference with an improved privacy-utility trade-off. Concretely, we introduce a hybrid utility function that better captures token similarity. Additionally, we propose a bucketized sampling mechanism to handle the large sampling space, which might otherwise lead to long-tail phenomena. Extensive experiments across multiple datasets, along with ablation studies, demonstrate that Cape achieves a better privacy-utility trade-off compared to prior state-of-the-art works.
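A generic version of differentially private token perturbation can be sketched with the exponential mechanism: replace each token by sampling from the vocabulary with probability proportional to exp(ε·u/2Δ), where u is a similarity-based utility. This sketch uses plain cosine similarity over random embeddings; Cape's hybrid utility function and bucketized sampler are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy vocabulary of 100 tokens with random 16-dim embeddings.
V = rng.normal(size=(100, 16))
vocab_norm = V / np.linalg.norm(V, axis=1, keepdims=True)

def perturb(token_id, eps):
    # Utility: cosine similarity to the original token's embedding.
    u = vocab_norm @ vocab_norm[token_id]
    # Exponential mechanism: P(c) ∝ exp(eps * u(c) / (2 * sensitivity)).
    sensitivity = 2.0  # cosine similarity ranges over [-1, 1]
    logits = eps * u / (2 * sensitivity)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

low = [perturb(7, eps=0.1) for _ in range(200)]   # strong privacy, noisy
high = [perturb(7, eps=50.0) for _ in range(200)]  # weak privacy, faithful
```

Lower ε pushes the distribution toward uniform (the token is rarely preserved); higher ε concentrates mass on similar tokens, which is exactly the privacy-utility dial the abstract describes.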

NeurIPS Conference 2025 Conference Paper

Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning

  • Simin Li
  • Zihao Mao
  • Hanxiao Li
  • Zonglei Jing
  • Zhuohang bian
  • Jun Guo
  • Li Wang
  • Zhuoran Han

In cooperative Multi-Agent Reinforcement Learning (MARL), it is common practice to tune hyperparameters in ideal simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions, a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also vary by algorithm and uncertainty type. (2) Robustness and resilience do not generalize across uncertainty modalities or agent scopes: policies robust to action noise for all agents may fail under observation noise on a single agent. (3) Hyperparameter tuning is critical for trustworthy MARL: surprisingly, standard practices like parameter sharing, GAE, and PopArt can hurt robustness, while early stopping, high critic learning rates, and Leaky ReLU consistently help. By optimizing hyperparameters only, we observe substantial improvements in cooperation, robustness, and resilience across all MARL backbones, with the phenomenon also generalizing to robust MARL methods across these backbones.

JBHI Journal 2025 Journal Article

PKAN: Leveraging Kolmogorov–Arnold Networks and Multi-Modal Learning for Peptide Prediction With Advanced Language Models

  • Li Wang
  • Xiangzheng Fu
  • Xiucai Ye
  • Tetsuya Sakurai
  • Xiangxiang Zeng
  • Yiping Liu

Peptides can offer highly specific biological activities, serving as essential mediators of intercellular signaling, which are critical for advancing precision medicine and drug development. Their primary structure can be depicted either as an amino acid sequence or as a chemical molecule consisting of atoms and chemical bonds. Large language models (LLMs) hold the potential to thoroughly elucidate the intricate intrinsic properties of peptides. Here we present the Peptide Kolmogorov-Arnold Network (PKAN), a framework leveraging multi-modal representations inspired by advanced language models for peptide activity and functionality prediction. Comparative experiments across tasks show that PKAN outperforms state-of-the-art models while maintaining a streamlined design with superior predictive capabilities. The multi-modal feature importance scoring, anchored in global structures and the significant marginal impacts of derived features on the model, coupled with intricate symbolic regression of specific activation functions, further demonstrates the robustness and precision of the PKAN framework in identifying and elucidating key determinants of peptide functionality. This work provides scientific evidence for investigating the complex mechanisms of peptide materials and supports the progression of peptide language paradigms in biology.

JBHI Journal 2025 Journal Article

Synergistic Drug Combination Prediction via Dual-Level Feature Aggregation and Knowledge Graph-Based Deep Neural Network

  • Ying Zuo
  • Yan Zhang
  • Li Wang
  • Jianping Yu
  • Jiawei Luo
  • Qiu Xiao

Identifying synergistic drug combinations is a critical but difficult challenge in cancer treatment, owing to the sheer complexity and enormous number of possible drug combinations. However, most existing computational methods rely on a single data perspective and often overlook the complexity of interactions between different biological entities. Furthermore, they fail to fully integrate the intrinsic properties of drugs and cell lines with the broader biological relationships that play a crucial role in drug synergy. To address these challenges, we propose a novel framework called LGSyn that integrates two types of information: local features, including molecular fingerprints, descriptors, and gene expression profiles, and global features that encompass broader biological interactions, including drug-protein, protein-cell line, protein-protein, and cell line-tissue interactions. By combining these two types of features, LGSyn leverages the full spectrum of biological knowledge to predict drug synergy. In LGSyn, we developed three fusion strategies to effectively integrate local and global information and identified the most suitable strategy. The resulting fused feature vectors are then fed into a deep neural network for training and synergy prediction. Experimental results demonstrate that the proposed method outperforms current state-of-the-art models, achieving superior accuracy and stability in drug synergy prediction.

NeurIPS Conference 2025 Conference Paper

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

  • Lei Yang
  • Xinyu Zhang
  • Jun Li
  • Chen Wang
  • Jiaqi Ma
  • Zhiying Song
  • Tong Zhao
  • Ziying Song

Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby enhancing the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged; however, these datasets primarily focus on cameras and LiDAR, neglecting 4D Radar, a sensor used in single-vehicle autonomous driving to provide robust perception in adverse weather conditions. In this paper, to bridge the gap created by the absence of 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. The V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data encompasses sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as various typical challenging scenarios. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar frames, including 350K annotated boxes across five categories. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. Furthermore, we provide comprehensive benchmarks across these three sub-datasets.

IJCAI Conference 2024 Conference Paper

LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory

  • Zicheng Liu
  • Li Wang
  • Siyuan Li
  • Zedong Wang
  • Haitao Lin
  • Stan Z. Li

Transformer models have been successful in various sequence processing tasks, but the self-attention mechanism's computational cost limits its practicality for long sequences. Although existing attention variants improve computational efficiency, they have a limited ability to abstract global information effectively due to their hand-crafted mixing strategies. On the other hand, state-space models (SSMs) are tailored for long sequences but cannot capture complicated local information. Therefore, combining the two as a unified token mixer is a trend in recent long-sequence models. However, linearized attention degrades performance significantly even when equipped with SSMs. To address this issue, we propose a new method called LongVQ. LongVQ uses the vector quantization (VQ) technique to compress the global abstraction into a fixed-length codebook, enabling linear-time computation of the attention matrix. This technique effectively maintains dynamic global and local patterns, which helps compensate for the lack of long-range dependency modeling. Our experiments on the Long Range Arena benchmark, autoregressive language modeling, and image and speech classification demonstrate the effectiveness of LongVQ. Our model achieves significant improvements over other sequence models, including variants of Transformers, Convolutions, and recent State Space Models.
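The core complexity argument can be sketched as follows: if keys are quantized onto a codebook of m entries, each query attends to m pooled slots instead of n positions, so the attention cost drops from O(n²) to O(n·m). The fixed random codebook and mean-pooled values below are simplifying assumptions rather than LongVQ's learned formulation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 2048, 32, 64  # sequence length, head dim, codebook size

Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))

# A fixed random codebook stands in for the learned VQ codes.
codebook = rng.normal(size=(m, d))

# Assign each key to its nearest code and mean-pool values per code;
# afterwards each query touches m slots instead of n positions.
assign = ((K[:, None] - codebook[None]) ** 2).sum(-1).argmin(1)
pooled_v = np.zeros((m, d))
np.add.at(pooled_v, assign, V)
counts = np.bincount(assign, minlength=m)
pooled_v[counts > 0] /= counts[counts > 0][:, None]

# O(n * m) attention against the codebook rather than O(n^2).
logits = Q @ codebook.T / np.sqrt(d)
w = np.exp(logits - logits.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)
out = w @ pooled_v
print(out.shape)
```

With m fixed (here 64) the attention matrix has shape (n, m), so doubling the sequence length doubles, rather than quadruples, the attention cost.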

JMLR Journal 2024 Journal Article

Nonparametric Regression for 3D Point Cloud Learning

  • Xinyi Li
  • Shan Yu
  • Yueying Wang
  • Guannan Wang
  • Li Wang
  • Ming-Jun Lai

In recent years, there has been an exponential increase in the amount of point cloud data with irregular shapes collected in various areas. Motivated by the importance of solid modeling for point clouds, we develop a novel and efficient smoothing tool based on multivariate splines over the triangulation to extract the underlying signal and build up a 3D solid model from the point cloud. The proposed method can denoise or deblur the point cloud effectively, provide a multi-resolution reconstruction of the actual signal, and handle sparse and irregularly distributed point clouds to recover the underlying trajectory. In addition, our method provides a natural way of numerosity data reduction. We establish the theoretical guarantees of the proposed method, including the convergence rate and asymptotic normality of the estimator, and show that the convergence rate achieves the optimal nonparametric rate. We also introduce a bootstrap method to quantify the uncertainty of the estimators. Through extensive simulation studies and a real data example, we demonstrate the superiority of the proposed method over traditional smoothing methods in terms of estimation accuracy and efficiency of data reduction.

JBHI Journal 2024 Journal Article

Progressive Dual Priori Network for Generalized Breast Tumor Segmentation

  • Li Wang
  • Lihui Wang
  • Zixiang Kuai
  • Lei Tang
  • Yingfeng Ou
  • Min Wu
  • Tianliang Shi
  • Chen Ye

To promote the generalization ability of breast tumor segmentation models, as well as to improve segmentation performance for breast tumors with small size, low contrast, and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic contrast-enhanced magnetic resonance images (DCE-MRI) acquired at different centers. PDPNet first crops tumor regions with a coarse-segmentation-based localization module, then progressively refines the breast tumor mask using weak semantic priors and cross-scale correlation prior knowledge. To validate the effectiveness of PDPNet, we compared it with several state-of-the-art methods on multi-center datasets. The results showed that, compared with the second-best method, the DSC and HD95 of PDPNet were improved by at least 5.13% and 7.58%, respectively, on multi-center test sets. In addition, through ablations, we demonstrated that the proposed localization module can decrease the influence of normal tissues and therefore improve the generalization ability of the model. The weak semantic priors allow focusing on tumor regions to avoid missing small and low-contrast tumors. The cross-scale correlation priors are beneficial for promoting the shape-aware ability for irregular tumors. Thus, integrating them in a unified framework improves multi-center breast tumor segmentation performance.

JBHI Journal 2024 Journal Article

RClaNet: An Explainable Alzheimer's Disease Diagnosis Framework by Joint Registration and Classification

  • Liang Wu
  • Shunbo Hu
  • Duanwei Wang
  • Changchun Liu
  • Li Wang

Alzheimer's disease (AD) is an irreversible neurodegenerative disease that affects patients' ability to carry out daily activities. Unfortunately, there is currently no known cure for AD. Thus, early detection of AD plays a key role in preventing and controlling its progression. As a representative method for measuring brain atrophy, the image registration technique has been widely adopted for AD diagnosis. In this study, an AD-assisted diagnosis framework based on joint registration and classification is proposed. Specifically, to capture more local deformation information, a novel patch-based joint brain image registration and classification network (RClaNet) is proposed to estimate local dense deformation fields (DDF) and disease risk probability maps (DRM) that explain high-risk areas for AD patients. RClaNet consists of a registration network and a classification network, in which the deformation field from the registration network is fed into the classification network to enhance the prediction accuracy of the disease. Then, an exponential distance weighting method is used to obtain the global DDF and the global DRM without grid-like artifacts. Finally, the global classification network uses the global DRM for the early detection of AD. We evaluate the proposed method on the OASIS-3, AIBL, ADNI, and COVID-19 datasets, and experimental results show that the proposed RClaNet achieves superior registration performance compared with several state-of-the-art methods. Early diagnosis of AD using the global DRM also yielded competitive results. These experiments prove that the deformation information in the registration process can be used to characterize subtle changes in degenerative diseases and further assist clinicians in diagnosis.

IJCAI Conference 2024 Conference Paper

RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM

  • Ziying Song
  • Guoxing Zhang
  • Lin Liu
  • Lei Yang
  • Shaoqing Xu
  • Caiyan Jia
  • Feiyang Jia
  • Li Wang

Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). Although achieving state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. With the emergence of visual foundation models (VFMs), opportunities and challenges are presented for improving the robustness and generalization of multi-modal 3D object detection in AD. Therefore, we propose RoboFusion, a robust framework that leverages VFMs like SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM for AD scenarios, named SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN for upsampling the image features extracted by SAM. We further employ wavelet decomposition to denoise the depth-guided images, reducing residual noise and weather interference. Finally, we employ self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excess noise. In summary, RoboFusion significantly reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection. Consequently, RoboFusion achieves SOTA performance in noisy scenarios, as demonstrated by the KITTI-C and nuScenes-C benchmarks. Code is available at https://github.com/adept-thu/RoboFusion.

ICML Conference 2024 Conference Paper

Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

  • Zicheng Liu 0006
  • Siyuan Li 0002
  • Li Wang
  • Zedong Wang
  • Yunfan Liu 0002
  • Stan Z. Li

To mitigate the computational complexity of the self-attention mechanism on long sequences, linear attention utilizes computation tricks to achieve linear complexity, while state space models (SSMs) popularize a favourable practice of using a non-data-dependent memory pattern, i.e., emphasizing the near and neglecting the distant, to process sequences. Recent studies have shown the benefits of combining the two. However, the efficiency of linear attention remains only at the theoretical level in a causal setting, and SSMs require various designed constraints to operate effectively on specific data. Therefore, in order to unveil the true power of the hybrid design, two issues need to be addressed: (1) hardware-efficient implementation of linear attention and (2) stabilization of SSMs. To achieve this, we leverage the ideas of tiling and hierarchy to propose CHELA (short-long Convolutions with Hardware-Efficient Linear Attention), which replaces SSMs with short-long convolutions and implements linear attention in a divide-and-conquer manner. This approach enjoys global abstraction and data-dependent selection from stable SSMs and linear attention while maintaining real linear complexity. Our comprehensive experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
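The short-long convolution idea can be sketched in one dimension: a small kernel captures local patterns, while a sequence-length kernel applied via FFT provides global, SSM-like mixing in O(n log n). The decaying global filter and the plain sum of the two branches are assumptions for illustration, not CHELA's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1024
x = rng.normal(size=n)

# Short branch: a small kernel capturing local patterns.
short_k = rng.normal(size=7)
short_out = np.convolve(x, short_k, mode="same")

# Long branch: a sequence-length decaying filter applied via FFT,
# i.e., global mixing in O(n log n) instead of O(n^2).
long_k = np.exp(-0.01 * np.arange(n))
fft_len = 2 * n  # zero-pad so circular convolution equals linear
long_out = np.fft.irfft(
    np.fft.rfft(x, fft_len) * np.fft.rfft(long_k, fft_len), fft_len
)[:n]

# Combine the branches (a plain sum stands in for the hierarchical,
# gated combination described in the paper).
y = short_out + long_out
print(y.shape)
```

The zero-padding to 2n is what makes the FFT product equal a linear (not circular) convolution over the first n outputs, the standard trick behind long-convolution sequence models.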

NeurIPS Conference 2024 Conference Paper

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs

  • Qinpeng Cui
  • Yixuan Liu
  • Xinyi Zhang
  • Qiqi Bao
  • Qingmin Liao
  • Li Wang
  • Tian Lu
  • Zicheng Liu

Diffusion-based image super-resolution (SR) models have attracted substantial interest due to their powerful image restoration capabilities. However, prevailing diffusion models often struggle to strike an optimal balance between efficiency and performance. Typically, they either neglect to exploit the potential of existing extensive pretrained models, limiting their generative capacity, or they necessitate dozens of forward passes starting from random noise, compromising inference efficiency. In this paper, we present DoSSR, a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models while significantly enhancing efficiency by initiating the diffusion process with low-resolution (LR) images. At the core of our approach is a domain shift equation that integrates seamlessly with existing diffusion models. This integration not only improves the use of the diffusion prior but also boosts inference efficiency. Moreover, we advance our method by transitioning the discrete shift process to a continuous formulation, termed DoS-SDEs. This advancement leads to fast, customized solvers that further enhance sampling efficiency. Empirical results demonstrate that our proposed method achieves state-of-the-art performance on synthetic and real-world datasets, while notably requiring only 5 sampling steps. Compared to previous diffusion-prior-based methods, our approach achieves a remarkable speedup of 5-7 times, demonstrating its superior efficiency.

AAAI Conference 2023 Conference Paper

BERT-ERC: Fine-Tuning BERT Is Enough for Emotion Recognition in Conversation

  • Xiangyu Qin
  • Zhiyu Wu
  • Tingting Zhang
  • Yanran Li
  • Jian Luan
  • Bin Wang
  • Li Wang
  • Jinshi Cui

Previous works on emotion recognition in conversation (ERC) follow a two-step paradigm, which can be summarized as first producing context-independent features via fine-tuning pretrained language models (PLMs) and then analyzing contextual information and dialogue structure information among the extracted features. However, we discover that this paradigm has several limitations. Accordingly, we propose a novel paradigm, i.e., exploring contextual information and dialogue structure information in the fine-tuning step, and adapting the PLM to the ERC task in terms of input text, classification structure, and training strategy. Furthermore, we develop our model BERT-ERC according to the proposed paradigm, which improves ERC performance in three aspects, namely suggestive text, fine-grained classification module, and two-stage training. Compared to existing methods, BERT-ERC achieves substantial improvement on four datasets, indicating its effectiveness and generalization capability. Besides, we also set up a limited-resources scenario and an online prediction scenario to approximate real-world settings. Extensive experiments demonstrate that the proposed paradigm significantly outperforms the previous one and can be adapted to various scenarios.

AAAI Conference 2023 Conference Paper

The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models

  • Li Wang
  • Zhiguo Fu
  • Yingcong Zhou
  • Zili Yan

The study of the implicit regularization induced by gradient-based optimization in deep learning is a long-standing pursuit. In the present paper, we characterize the implicit regularization of momentum gradient descent (MGD) in the continuous-time view, the so-called momentum gradient flow (MGF). We show that the components of the weight vector of a deep linear neural network are learned at different evolution rates, and that this evolution gap increases with the depth. Firstly, we show that if the depth equals one, the evolution gap between the weight vector components is linear, which is consistent with the behavior of ridge regression. In particular, we establish a tight coupling between MGF and ridge for least squares regression. In detail, we show that when the regularization parameter of ridge is inversely proportional to the square of the time parameter of MGF, the risk of MGF is no more than 1.54 times that of ridge, and their relative Bayesian risks are almost indistinguishable. Secondly, if the model becomes deeper, i.e., the depth is greater than or equal to 2, the evolution gap becomes more significant, which implies an implicit bias towards sparse solutions. Numerical experiments strongly support our theoretical results.
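The continuous-time view and the ridge coupling stated in the abstract can be summarized as follows (the momentum coefficient α and loss L are assumed notation, not given in the abstract):

```latex
% Momentum gradient flow (MGF): the continuous-time limit of MGD.
\ddot{\theta}(t) + \alpha\,\dot{\theta}(t) = -\nabla L\bigl(\theta(t)\bigr)

% For least squares regression, coupling MGF at time t with ridge whose
% penalty decays as \lambda(t) \propto 1/t^{2} yields the stated bound:
\mathrm{Risk}_{\mathrm{MGF}}(t) \;\le\; 1.54\,\mathrm{Risk}_{\mathrm{ridge}}\bigl(\lambda(t)\bigr)
```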

AAAI Conference 2023 Conference Paper

Transfer Learning Enhanced DeepONet for Long-Time Prediction of Evolution Equations

  • Wuzhe Xu
  • Yulong Lu
  • Li Wang

Deep operator network (DeepONet) has demonstrated great success in various learning tasks, including learning solution operators of partial differential equations. In particular, it provides an efficient approach to predicting evolution equations over a finite time horizon. Nevertheless, the vanilla DeepONet suffers from stability degradation in long-time prediction. This paper proposes a transfer-learning-aided DeepONet to enhance stability. Our idea is to use transfer learning to sequentially update the DeepONets as the surrogates for propagators learned in different time frames. The evolving DeepONets can better track the varying complexities of the evolution equations, while only needing to be updated by efficient training of a tiny fraction of the operator networks. Through systematic experiments, we show that the proposed method not only improves the long-time accuracy of DeepONet while maintaining similar computational cost but also substantially reduces the sample size of the training set.

AAAI Conference 2023 Conference Paper

Video-Audio Domain Generalization via Confounder Disentanglement

  • Shengyu Zhang
  • Xusheng Feng
  • Wenyan Fan
  • Wenjing Fang
  • Fuli Feng
  • Wei Ji
  • Shuo Li
  • Li Wang

Existing video-audio understanding models are trained and evaluated in an intra-domain setting, facing performance degeneration in real-world applications where multiple domains and distribution shifts naturally exist. The key to video-audio domain generalization (VADG) lies in alleviating spurious correlations over multi-modal features. To achieve this goal, we resort to causal theory and attribute such correlation to confounders affecting both video-audio features and labels. We propose a DeVADG framework that conducts uni-modal and cross-modal deconfounding through back-door adjustment. DeVADG performs cross-modal disentanglement and obtains fine-grained confounders at both class-level and domain-level using half-sibling regression and unpaired domain transformation, which essentially identifies domain-variant factors and class-shared factors that cause spurious correlations between features and false labels. To promote VADG research, we collect a VADG-Action dataset for video-audio action recognition with over 5,000 video clips across four domains (e.g., cartoon and game) and ten action classes (e.g., cooking and riding). We conduct extensive experiments, i.e., multi-source DG, single-source DG, and qualitative analysis, validating the rationality of our causal analysis and the effectiveness of the DeVADG framework.

AAAI Conference 2022 Conference Paper

Cross-Dataset Collaborative Learning for Semantic Segmentation in Autonomous Driving

  • Li Wang
  • Dong Li
  • Han Liu
  • Jinzhang Peng
  • Lu Tian
  • Yi Shan

Semantic segmentation is an important task for scene understanding in self-driving cars and robotics, which aims to assign dense labels to all pixels in the image. Existing work typically improves semantic segmentation performance by exploring different network architectures on a target dataset. Little attention has been paid to building a unified system by simultaneously learning from multiple datasets, due to the inherent distribution shift across different datasets. In this paper, we propose a simple, flexible, and general method for semantic segmentation, termed Cross-Dataset Collaborative Learning (CDCL). Our goal is to train a unified model that improves performance on each dataset by leveraging information from all the datasets. Specifically, we first introduce a family of Dataset-Aware Blocks (DAB) as the fundamental computing units of the network, which help capture homogeneous convolutional representations and heterogeneous statistics across different datasets. Second, we present a Dataset Alternation Training (DAT) mechanism to facilitate the collaborative optimization procedure. We conduct extensive evaluations on diverse semantic segmentation datasets for autonomous driving. Experiments demonstrate that our method consistently achieves notable improvements over prior single-dataset and cross-dataset training methods without introducing extra FLOPs. In particular, with the same PSPNet (ResNet-18) architecture, our method outperforms the single-dataset baseline by 5.65%, 6.57%, and 5.79% mIoU on the validation sets of Cityscapes, BDD100K, and CamVid, respectively. We also apply CDCL to point cloud 3D semantic segmentation and achieve improved performance, which further validates the superiority and generality of our method. Code and models will be released.
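One plausible reading of the "homogeneous representations / heterogeneous statistics" split is shared weights with per-dataset normalization parameters; the sketch below illustrates that idea only, and the class name, shapes, and normalization choice are assumptions, not the paper's actual DAB design.

```python
import numpy as np

# Hedged sketch of a dataset-aware layer: weights are shared across
# datasets, while affine normalization parameters are kept per dataset.
class DatasetAwareLinear:
    def __init__(self, d_in, d_out, n_datasets, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_in, d_out)) * 0.1  # shared weights
        # per-dataset affine normalization parameters (heterogeneous part)
        self.gamma = np.ones((n_datasets, d_out))
        self.beta = np.zeros((n_datasets, d_out))

    def __call__(self, x, dataset_id):
        h = x @ self.W                                     # shared compute
        mu = h.mean(axis=0, keepdims=True)                 # batch statistics
        sd = h.std(axis=0, keepdims=True) + 1e-5
        hn = (h - mu) / sd                                 # normalize
        return self.gamma[dataset_id] * hn + self.beta[dataset_id]

layer = DatasetAwareLinear(d_in=8, d_out=4, n_datasets=3)
x = np.random.default_rng(1).standard_normal((16, 8))
y0 = layer(x, dataset_id=0)
layer.gamma[1] = 2.0          # pretend dataset 1 learned different statistics
y1 = layer(x, dataset_id=1)   # same input, dataset-specific output
```

During alternation training, each mini-batch would come from one dataset and update the shared weights plus only that dataset's normalization parameters.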

NeurIPS Conference 2022 Conference Paper

DARE: Disentanglement-Augmented Rationale Extraction

  • Linan Yue
  • Qi Liu
  • Yichao Du
  • Yanqing An
  • Li Wang
  • Enhong Chen

Rationale extraction can be considered a straightforward method of improving model explainability, where rationales are a subsequence of the original inputs that can be extracted to support the prediction results. Existing methods mainly cascade a selector, which extracts the rationale tokens, with a predictor, which makes the prediction based on the selected tokens. However, previous works fail to fully exploit the original input, since the information of non-selected tokens is ignored. In this paper, we therefore propose a Disentanglement-Augmented Rationale Extraction (DARE) method, which encapsulates more information from the input to extract rationales. Specifically, it first disentangles the input into rationale representations and non-rationale ones, and then learns more comprehensive rationale representations for extraction by minimizing the mutual information (MI) between the two disentangled representations. Besides, to improve the performance of MI minimization, we develop a new MI estimator by exploring existing MI estimation methods. Extensive experimental results on three real-world datasets and simulation studies clearly validate the effectiveness of our proposed method. Code is released at https://github.com/yuelinan/DARE.

NeurIPS Conference 2022 Conference Paper

HSDF: Hybrid Sign and Distance Field for Modeling Surfaces with Arbitrary Topologies

  • Li Wang
  • Jie Yang
  • Weikai Chen
  • Xiaoxu Meng
  • Bo Yang
  • Jintao Li
  • Lin Gao

Neural implicit functions based on the signed distance field (SDF) have achieved impressive progress in reconstructing 3D models with high fidelity. However, such approaches can only represent closed shapes. Recent works based on the unsigned distance function (UDF) have been proposed to handle both watertight and open surfaces. Nonetheless, as UDF is signless, its direct output is limited to point clouds, which imposes an additional challenge on extracting high-quality meshes from discrete points. To address this issue, we present a new learnable implicit representation, termed HSDF, that connects the good ends of SDF and UDF. In particular, HSDF is able to represent arbitrary topologies containing both closed and open surfaces while being compatible with existing iso-surface extraction techniques for easy field-to-mesh conversion. In addition to predicting a UDF, we propose to learn an additional sign field via a simple classifier. Unlike traditional SDF, HSDF is able to locate the surface of interest before level surface extraction by generating surface points following NDF (Chibane et al., 2020). We are then able to obtain open surfaces via an adaptive meshing approach that only instantiates regions containing the surface into a polygon mesh. We also propose HSDF-Net, a dedicated learning framework that factorizes the learning of HSDF into two easier problems. Experiments on multiple datasets show that HSDF outperforms state-of-the-art techniques both qualitatively and quantitatively.
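The core combination the abstract describes, pairing a UDF with a separately predicted sign field so that standard zero-level extraction applies, can be illustrated in one dimension; the toy surface location and the hard sign field below are assumptions standing in for learned networks.

```python
import numpy as np

# Hedged 1-D illustration of the HSDF idea: multiply an unsigned distance
# field by a predicted sign field to obtain a signed-like field whose
# zero crossings can be extracted like an ordinary SDF iso-surface.
# Toy "surface" at x = 0.3; the sign field stands in for a classifier.
x = np.linspace(0.0, 1.0, 100)
udf = np.abs(x - 0.3)                  # unsigned distance to the surface
sign = np.where(x < 0.3, -1.0, 1.0)   # stand-in for the learned sign field
hsdf = sign * udf                      # hybrid field, SDF-compatible

# Extract the surface as sign changes of the hybrid field.
crossings = np.nonzero(np.diff(np.sign(hsdf)) != 0)[0]
surface_x = 0.5 * (x[crossings] + x[crossings + 1])
```

In 3D the same composition feeds directly into marching-cubes-style extraction, which is exactly the compatibility the abstract claims UDF alone lacks.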

ICLR Conference 2022 Conference Paper

HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation

  • Boyan Li
  • Hongyao Tang
  • Yan Zheng 0002
  • Jianye Hao
  • Pengyi Li 0001
  • Zhen Wang 0004
  • Zhaopeng Meng
  • Li Wang

Discrete-continuous hybrid action space is a natural setting in many practical problems, such as robot control and game AI. However, most previous Reinforcement Learning (RL) works only demonstrate success in controlling with either discrete or continuous action space, while seldom taking the hybrid action space into account. One naive way to address hybrid action RL is to convert the hybrid action space into a unified homogeneous action space by discretization or continualization, so that conventional RL algorithms can be applied. However, this ignores the underlying structure of the hybrid action space and also induces scalability issues and additional approximation difficulties, thus leading to degenerated results. In this paper, we propose Hybrid Action Representation (HyAR) to learn a compact and decodable latent representation space for the original hybrid action space. HyAR constructs the latent space and embeds the dependence between discrete action and continuous parameter via an embedding table and a conditional Variational Auto-Encoder (VAE). To further improve the effectiveness, the action representation is trained to be semantically smooth through unsupervised environmental dynamics prediction. Finally, the agent learns its policy with conventional DRL algorithms in the learned representation space and interacts with the environment by decoding the hybrid action embeddings to the original action space. We evaluate HyAR in a variety of environments with discrete-continuous action space. The results demonstrate the superiority of HyAR when compared with previous baselines, especially for high-dimensional action spaces.

ICML Conference 2022 Conference Paper

Individual Reward Assisted Multi-Agent Reinforcement Learning

  • Li Wang
  • Yupeng Zhang
  • Yujing Hu
  • Weixun Wang
  • Chongjie Zhang
  • Yang Gao 0001
  • Jianye Hao
  • Tangjie Lv

In many real-world multi-agent systems, the sparsity of team rewards often makes it difficult for an algorithm to successfully learn a cooperative team policy. At present, the common way for solving this problem is to design some dense individual rewards for the agents to guide the cooperation. However, most existing works utilize individual rewards in ways that do not always promote teamwork and sometimes are even counterproductive. In this paper, we propose Individual Reward Assisted Team Policy Learning (IRAT), which learns two policies for each agent from the dense individual reward and the sparse team reward with discrepancy constraints for updating the two policies mutually. Experimental results in different scenarios, such as the Multi-Agent Particle Environment and the Google Research Football Environment, show that IRAT significantly outperforms the baseline methods and can greatly promote team policy learning without deviating from the original team objective, even when the individual rewards are misleading or conflict with the team rewards.

AAAI Conference 2022 Conference Paper

Privacy-Preserving Face Recognition in the Frequency Domain

  • Yinggui Wang
  • Jian Liu
  • Man Luo
  • Le Yang
  • Li Wang

Some applications require performing face recognition (FR) on third-party servers, which could be accessed by attackers with malicious intents to compromise the privacy of users’ face information. This paper advocates a practical privacy-preserving frequency-domain FR scheme without key management. The new scheme first collects the components with the same frequency from different blocks of a face image to form component channels. Only part of the channels are retained and fed into the analysis network, which performs an interpretable privacy-accuracy trade-off analysis to identify channels important for face image visualization but not crucial for maintaining high FR accuracy. For this purpose, the loss function of the analysis network consists of the empirical FR error loss and a face visualization penalty term, and the network is trained in an end-to-end manner. We find that with the developed analysis network, more than 94% of the image energy can be dropped while the face recognition accuracy stays almost undegraded. In order to further protect the remaining frequency components, we propose a fast masking method. The effectiveness of the new scheme in removing the visual information of face images while maintaining their distinguishability is validated over several large face datasets. Results show that the proposed scheme achieves recognition performance and inference time comparable to ArcFace operating directly on original face images.
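The channel-forming step the abstract describes, gathering the same-frequency component from every image block into one channel, is a pure reindexing and can be sketched directly; the block transform itself (e.g. a DCT) and the learned channel selection are omitted, and the stand-in coefficient array is an assumption.

```python
import numpy as np

# Hedged sketch of forming component channels: collect component (u, v)
# from every 8x8 block of a (transformed) face image into one channel,
# giving 64 channels of shape (H/8, W/8). The block transform and the
# learned analysis network that picks channels to drop are not shown.
def to_frequency_channels(coeffs, b=8):
    H, W = coeffs.shape
    blocks = coeffs.reshape(H // b, b, W // b, b)   # (rows, u, cols, v)
    # channels[u*b + v] holds component (u, v) from every block
    return blocks.transpose(1, 3, 0, 2).reshape(b * b, H // b, W // b)

coeffs = np.arange(32 * 32, dtype=float).reshape(32, 32)  # stand-in coefficients
channels = to_frequency_channels(coeffs)   # (64, 4, 4)
kept = channels[:6]   # mimic retaining only a subset of channels
```

Retention then amounts to slicing this channel axis, which is what makes the privacy-accuracy trade-off a per-channel decision.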

TIST Journal 2022 Journal Article

Toward Scalable and Privacy-preserving Deep Neural Network via Algorithmic-Cryptographic Co-design

  • Jun Zhou
  • Longfei Zheng
  • Chaochao Chen
  • Yan Wang
  • Xiaolin Zheng
  • Bingzhe Wu
  • Cen Chen
  • Li Wang

Deep Neural Networks (DNNs) have achieved remarkable progress in various real-world applications, especially when abundant training data are provided. However, data isolation has become a serious problem. Existing works build privacy-preserving DNN models from either an algorithmic or a cryptographic perspective. The former mainly splits the DNN computation graph between data holders or between data holders and server, which demonstrates good scalability but suffers from accuracy loss and potential privacy risks. In contrast, the latter leverages time-consuming cryptographic techniques, which have strong privacy guarantees but poor scalability. In this article, we propose SPNN, a Scalable and Privacy-preserving deep Neural Network learning framework designed from an algorithmic-cryptographic co-perspective. From the algorithmic perspective, we split the computation graph of DNN models into two parts, i.e., the private-data-related computations performed by data holders and the remaining heavy computations delegated to a semi-honest server with high computational capacity. From the cryptographic perspective, we propose using two types of cryptographic techniques, i.e., secret sharing and homomorphic encryption, for the isolated data holders to conduct private-data-related computations privately and cooperatively. Furthermore, we implement SPNN in a decentralized setting and introduce user-friendly APIs. Experimental results on real-world datasets demonstrate the superiority of the proposed SPNN.
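Of the two cryptographic tools the abstract names, additive secret sharing is the simpler to illustrate; the field size, party count, and protocol shape below are generic illustrations, not SPNN's actual protocol.

```python
import numpy as np

# Hedged sketch of additive secret sharing over a prime field: each data
# holder splits its private value into random shares, and sums can be
# computed sharewise without revealing any individual input.
P = 2_147_483_647                      # a Mersenne prime as the field modulus
rng = np.random.default_rng(42)

def share(x, n_parties=2):
    """Split integer x into n additive shares mod P."""
    shares = rng.integers(0, P, size=n_parties - 1)
    last = (x - shares.sum()) % P
    return list(shares) + [int(last)]

def reconstruct(shares):
    return int(sum(shares) % P)

# Two holders secretly share their private inputs ...
a_shares, b_shares = share(1234), share(5678)
# ... and any party can add sharewise without seeing 1234 or 5678.
sum_shares = [(sa + sb) % P for sa, sb in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 1234 + 5678
```

Linear operations commute with sharing in this way, which is why the private-data-related (typically linear) layers can be computed cooperatively while the heavy nonlinear work is delegated to the server in the clear.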

AAAI Conference 2022 Conference Paper

Two-Stage Octave Residual Network for End-to-End Image Compression

  • Fangdong Chen
  • Yumeng Xu
  • Li Wang

Octave Convolution (OctConv) is a generic convolutional unit that has already achieved good performance in many computer vision tasks. Recent studies have also shown the potential of applying OctConv in end-to-end image compression. However, considering the characteristics of the image compression task, current applications of OctConv may limit the performance of the image compression network due to the loss of spatial information caused by the sampling operations of inter-frequency communication. Besides, the correlation between multi-frequency latents produced by OctConv is not utilized in current architectures. In this paper, to address these problems, we propose a novel Two-stage Octave Residual (ToRes) block which strips the sampling operation from OctConv to strengthen the capability of preserving useful information. Moreover, to capture the redundancy between the multi-frequency latents, a context transfer module is designed. The results show that both the ToRes block and the incorporation of the context transfer module help to improve the rate-distortion performance, and the combination of these two strategies enables our model to achieve state-of-the-art performance and outperform the latest compression standard Versatile Video Coding (VVC) in terms of both PSNR and MS-SSIM.

IJCAI Conference 2022 Conference Paper

Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification

  • Chaochao Chen
  • Jun Zhou
  • Longfei Zheng
  • Huiwen Wu
  • Lingjuan Lyu
  • Jia Wu
  • Bingzhe Wu
  • Ziqi Liu

Recently, Graph Neural Networks (GNNs) have achieved remarkable progress in various real-world tasks on graph data, which consist of node features and adjacency information between different nodes. High-performance GNN models always depend on both rich features and complete edge information in the graph. However, such information could possibly be isolated by different data holders in practice, which is the so-called data isolation problem. To solve this problem, in this paper, we propose VFGNN, a federated GNN learning paradigm for the privacy-preserving node classification task under the vertically partitioned data setting, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the private-data-related computations (i.e., involving features, edges, and labels) on the data holders, and delegate the rest of the computations to a semi-honest server. We also propose to apply differential privacy to prevent potential information leakage from the server. We conduct experiments on three benchmarks and the results demonstrate the effectiveness of VFGNN.

NeurIPS Conference 2022 Conference Paper

Weighted Mutual Learning with Diversity-Driven Model Compression

  • Miao Zhang
  • Li Wang
  • David Campos
  • Wei Huang
  • Chenjuan Guo
  • Bin Yang

Online distillation attracts attention from the community as it simplifies the traditional two-stage knowledge distillation process into a single stage. Online distillation collaboratively trains a group of peer models, which are treated as students, and all students gain extra knowledge from each other. However, memory consumption and diversity among peers are two key challenges to the scalability and quality of online distillation. To address these two challenges, this paper presents a framework called Weighted Mutual Learning with Diversity-Driven Model Compression (WML) for online distillation. First, building on a hierarchical structure in which peers share different parts, we leverage structured network pruning to generate diversified peer models and reduce the memory requirements. Second, rather than taking the average of peers, this paper, for the first time, leverages a bi-level formulation to estimate the relative importance of peers in closed form, to further boost the effectiveness of the distillation from each other. Extensive experiments show the generalization of the proposed framework, which outperforms existing online distillation methods on a variety of deep neural networks. More interestingly, as a byproduct, WML produces a series of pruned models under different model sizes in a single run, which also achieve competitive results compared with existing channel pruning methods.

AAAI Conference 2022 Conference Paper

What about Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator

  • Hongyao Tang
  • Zhaopeng Meng
  • Jianye Hao
  • Chen Chen
  • Daniel Graves
  • Dong Li
  • Changmin Yu
  • Hangyu Mao

We study the Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables PeVFA to preserve the values of multiple policies at the same time and brings an appealing characteristic, i.e., value generalization among policies. We formally analyze the value generalization under Generalized Policy Iteration (GPI). Through theoretical and empirical lenses, we show that generalized value estimates offered by PeVFA may have lower initial approximation error to the true values of successive policies, which is expected to improve consecutive value approximation during GPI. Based on the above clues, we introduce a new form of GPI with PeVFA which leverages value generalization along the policy improvement path. Moreover, we propose a representation learning framework for RL policies, providing several approaches to learn effective policy embeddings from policy network parameters or state-action pairs. In our experiments, we evaluate the efficacy of the value generalization offered by PeVFA and policy representation learning in several OpenAI Gym continuous control tasks. For a representative instance of algorithm implementation, Proximal Policy Optimization (PPO) re-implemented under the paradigm of GPI with PeVFA achieves about 40% performance improvement over its vanilla counterpart in most environments.

IJCAI Conference 2021 Conference Paper

Preference-Adaptive Meta-Learning for Cold-Start Recommendation

  • Li Wang
  • Binbin Jin
  • Zhenya Huang
  • Hongke Zhao
  • Defu Lian
  • Qi Liu
  • Enhong Chen

In recommender systems, the cold-start problem is a critical issue. To alleviate this problem, an emerging direction adopts meta-learning frameworks and achieves success. Most existing works aim to learn globally shared prior knowledge across all users so that it can be quickly adapted to a new user with sparse interactions. However, globally shared prior knowledge may be inadequate to discern users’ complicated behaviors and causes poor generalization. Therefore, we argue that prior knowledge should be locally shared by users with similar preferences who can be recognized by social relations. To this end, in this paper, we propose a Preference-Adaptive Meta-Learning approach (PAML) to improve existing meta-learning frameworks with better generalization capacity. Specifically, to address two challenges imposed by social relations, we first identify reliable implicit friends to strengthen a user’s social relations based on our defined palindrome paths. Then, a coarse-fine preference modeling method is proposed to leverage social relations and capture the preference. Afterwards, a novel preference-specific adapter is designed to adapt the globally shared prior knowledge to the preference-specific knowledge so that users who have similar tastes share similar knowledge. We conduct extensive experiments on two publicly available datasets. Experimental results validate the power of social relations and the effectiveness of PAML.

NeurIPS Conference 2021 Conference Paper

Progressive Coordinate Transforms for Monocular 3D Object Detection

  • Li Wang
  • Li Zhang
  • Yi Zhu
  • Zhi Zhang
  • Tong He
  • Mu Li
  • Xiangyang Xue

Recognizing and localizing objects in 3D space is a crucial ability for an AI agent to perceive its surrounding environment. While significant progress has been achieved with expensive LiDAR point clouds, 3D object detection given only a monocular image poses a great challenge. While there exist different alternatives for tackling this problem, they are either equipped with heavy networks to fuse RGB and depth information or empirically ineffective at processing millions of pseudo-LiDAR points. With in-depth examination, we realize that these limitations are rooted in inaccurate object localization. In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations. Specifically, a localization boosting mechanism with confidence-aware loss is introduced to progressively refine the localization prediction. In addition, semantic image representation is also exploited to compensate for the usage of patch proposals. Despite being lightweight and simple, our strategy allows us to establish a new state of the art among the monocular 3D detectors on the competitive KITTI benchmark. At the same time, our proposed PCT shows great generalization to most coordinate-based 3D detection frameworks.

JBHI Journal 2020 Journal Article

Adaptive-Guided-Coupling-Probability Level Set for Retinal Layer Segmentation

  • Yue Sun
  • Sijie Niu
  • Xizhan Gao
  • Jie Su
  • Jiwen Dong
  • Yuehui Chen
  • Li Wang

Quantitative assessment of retinal layer thickness in spectral domain-optical coherence tomography (SD-OCT) images is vital for clinicians to determine the degree of ophthalmic lesions. However, due to complex retinal tissues, high-level speckle noise and low intensity contrast, accurately recognizing the retinal layer structure remains a challenge. To overcome this problem, this paper proposes an adaptive-guided-coupling-probability level set method for retinal layer segmentation in SD-OCT images. Specifically, based on Bayes' theorem, each voxel probability representation is composed of two probability terms in our method. The first term is constructed as a neighborhood Gaussian fitting distribution to characterize intensity information for each intra-retinal layer. The second one is a boundary probability map generated by combining anatomical priors and adaptive thickness information to ensure surfaces evolve within a proper range. Then, the voxel probability representation is introduced into the proposed segmentation framework based on coupling-probability level sets to detect layer boundaries. A total of 1792 retinal B-scan images from 4 SD-OCT cubes of healthy eyes, 5 cubes of abnormal eyes with central serous chorioretinopathy and 5 SD-OCT cubes of abnormal eyes with age-related macular disease are used to evaluate the proposed method. The experiments demonstrate that the segmentation results obtained by the proposed method have good consistency with ground truth, and the proposed method outperforms six methods in the layer segmentation of uneven retinal SD-OCT images.

AAAI Conference 2020 Conference Paper

Characterizing Membership Privacy in Stochastic Gradient Langevin Dynamics

  • Bingzhe Wu
  • Chaochao Chen
  • Shiwan Zhao
  • Cen Chen
  • Yuan Yao
  • Guangyu Sun
  • Li Wang
  • Xiaolu Zhang

Bayesian deep learning is recently regarded as an intrinsic way to characterize the weight uncertainty of deep neural networks (DNNs). Stochastic Gradient Langevin Dynamics (SGLD) is an effective method to enable Bayesian deep learning on large-scale datasets. Previous theoretical studies have shown various appealing properties of SGLD, ranging from convergence properties to generalization bounds. In this paper, we study the properties of SGLD from a novel perspective of membership privacy protection (i.e., preventing the membership attack). The membership attack, which aims to determine whether a specific sample was used for training a given DNN model, has emerged as a common threat against deep learning algorithms. To this end, we build a theoretical framework to analyze the information leakage (w.r.t. the training dataset) of a model trained using SGLD. Based on this framework, we demonstrate that SGLD can prevent the information leakage of the training dataset to a certain extent. Moreover, our theoretical analysis can be naturally extended to other types of Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) methods. Empirical results on different datasets and models verify our theoretical findings and suggest that the SGLD algorithm can not only reduce the information leakage but also improve the generalization ability of DNN models in real-world applications.
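The algorithm the abstract analyzes is the standard SGLD update (gradient step plus injected Gaussian noise); the toy target, step size, and chain length below are assumptions chosen so the sketch runs on a 1-D problem rather than a DNN.

```python
import numpy as np

# Hedged sketch of the standard SGLD update:
#   theta <- theta - (eta / 2) * grad U(theta) + N(0, eta)
# The injected noise turns gradient descent into an (approximate)
# posterior sampler. Toy target: negative log-density U = theta^2 / 2,
# i.e. a standard normal posterior.
rng = np.random.default_rng(0)
eta = 0.01                        # step size
theta = 3.0                       # far-from-mode initialization
samples = []
for _ in range(20_000):
    grad = theta                  # d/dtheta of theta^2 / 2
    theta += -0.5 * eta * grad + np.sqrt(eta) * rng.standard_normal()
    samples.append(theta)
samples = np.array(samples[5_000:])   # discard burn-in
# The empirical mean and variance should roughly match N(0, 1).
```

The same noise term that makes the chain a sampler is what blurs the dependence of the final weights on any single training example, which is the intuition behind the membership-privacy analysis.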

AAAI Conference 2020 Conference Paper

Feature Variance Regularization: A Simple Way to Improve the Generalizability of Neural Networks

  • Ranran Huang
  • Hanbo Sun
  • Ji Liu
  • Lu Tian
  • Li Wang
  • Yi Shan
  • Yu Wang

To improve the generalization ability of neural networks, we propose a novel regularization method that regularizes the empirical risk using a penalty on the empirical variance of the features. Intuitively, our approach introduces confusion into feature extraction and prevents the models from learning features that may relate to specific training samples. According to our theoretical analysis, our method encourages models to generate closer feature distributions for the training set and the unobservable true data, and to minimize the expected risk as well, which allows the model to adapt to new samples better. We provide a thorough empirical justification of our approach, which achieves a greater improvement than other regularization methods. The experimental results show the effectiveness of our method on multiple visual tasks, including classification (CIFAR100, ImageNet, fine-grained datasets) and semantic segmentation (Cityscapes).
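The penalty term the abstract describes, a scaled empirical variance of the extracted features across a batch added to the task loss, could look roughly like this (function name and toy batches are illustrative assumptions):

```python
import numpy as np

def variance_penalty(features):
    """Empirical variance of each feature dimension across the batch,
    averaged over dimensions; added to the task loss with a weight lambda."""
    return np.var(features, axis=0).mean()

batch = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])  # identical features
assert variance_penalty(batch) == 0.0                   # no penalty
spread = np.array([[0.0, 0.0], [2.0, 2.0]])             # variance 1.0 per dim
assert variance_penalty(spread) == 1.0
```

Minimizing this term pushes the features of different training samples toward a common distribution, matching the "confusion into feature extraction" intuition above.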

TIST Journal 2020 Journal Article

Practical Privacy Preserving POI Recommendation

  • Chaochao Chen
  • Jun Zhou
  • Bingzhe Wu
  • Wenjing Fang
  • Li Wang
  • Yuan Qi
  • Xiaolin Zheng

Point-of-Interest (POI) recommendation has been extensively studied and successfully applied in industry recently. However, most existing approaches build centralized models on the basis of collecting users' data. Both private data and models are held by the recommender, which causes serious privacy concerns. In this article, we propose a novel Privacy preserving POI Recommendation (PriRec) framework. First, to protect data privacy, users' private data (features and actions) are kept on their own side, e.g., a cellphone or tablet. Meanwhile, the public data that need to be accessed by all users are kept by the recommender to reduce the storage costs of users' devices. These public data include: (1) static data related only to the status of a POI, such as POI categories, and (2) dynamic data dependent on user-POI actions, such as visit counts. The dynamic data could be sensitive, and we develop local differential privacy techniques to release such data to the public with privacy guarantees. Second, PriRec follows the representation of the Factorization Machine (FM), which consists of a linear model and a feature interaction model. To protect model privacy, the linear models are saved on the users' side, and we propose a secure decentralized gradient descent protocol for users to learn them collaboratively. The feature interaction model is kept by the recommender since it carries no privacy risk, and we adopt a secure aggregation strategy in a federated learning paradigm to learn it. In this way, PriRec keeps users' private raw data and models in users' own hands, and protects user privacy to a large extent. We apply PriRec to real-world datasets, and comprehensive experiments demonstrate that, compared with FM, PriRec achieves comparable or even better recommendation accuracy.
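The abstract does not specify which local differential privacy mechanism PriRec uses; a classical building block for releasing sensitive per-user action data is randomized response, sketched here on a binary visit indicator purely for illustration (function name and epsilon value are assumptions):

```python
import math
import random

def randomized_response(bit, epsilon):
    """epsilon-LDP release of a binary visit indicator: report the true bit
    with probability e^eps / (e^eps + 1), otherwise report its flip."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_true else 1 - bit

random.seed(0)
reports = [randomized_response(1, epsilon=2.0) for _ in range(10000)]
# with eps = 2, the true bit is kept with probability ~0.88
assert 0.85 < sum(reports) / len(reports) < 0.91
```

An aggregator can debias the noisy reports to recover accurate population-level counts while no individual report reveals the true bit with certainty.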

AAAI Conference 2020 Conference Paper

Show, Recall, and Tell: Image Captioning with Recall Mechanism

  • Li Wang
  • Zechen Bai
  • Yonghua Zhang
  • Hongtao Lu

Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: a recall unit, a semantic guide (SG), and a recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed to make the best use of the recalled words. The SG branch generates a recalled context, which can guide the process of generating the caption. The RWS branch is responsible for copying recalled words into the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed method (SG+RWS+WR) achieves BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, surpassing the results of other state-of-the-art methods.
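The soft switch described above mixes two word distributions per decoding step, as in pointer-generator models; a minimal sketch (function name, gate value, and three-word toy vocabulary are illustrative assumptions):

```python
import numpy as np

def mixed_word_probs(p_generate, p_recall, gate):
    """Soft switch between the SG (generate) distribution over the vocabulary
    and the RWS (copy recalled words) distribution; gate in [0, 1]."""
    return gate * np.asarray(p_generate) + (1.0 - gate) * np.asarray(p_recall)

p_gen = np.array([0.7, 0.2, 0.1])    # generator favors word 0
p_copy = np.array([0.0, 0.0, 1.0])   # recalled-word slot points at word 2
mixed = mixed_word_probs(p_gen, p_copy, gate=0.4)
assert abs(mixed.sum() - 1.0) < 1e-9  # still a valid distribution
assert mixed.argmax() == 2            # copy branch dominates at a low gate
```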

AAAI Conference 2020 Conference Paper

Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement

  • Jianing Deng
  • Li Wang
  • Shiliang Pu
  • Cheng Zhuo

Recent years have witnessed the remarkable success of deep learning methods in quality enhancement for compressed video. To better exploit temporal information, existing methods usually estimate optical flow for temporal motion compensation. However, since compressed video can be seriously distorted by various compression artifacts, the estimated optical flow tends to be inaccurate and unreliable, resulting in ineffective quality enhancement. In addition, optical flow estimation for consecutive frames is generally conducted in a pairwise manner, which is computationally expensive and inefficient. In this paper, we propose a fast yet effective method for compressed video quality enhancement by incorporating a novel Spatio-Temporal Deformable Fusion (STDF) scheme to aggregate temporal information. Specifically, the proposed STDF takes a target frame along with its neighboring reference frames as input to jointly predict an offset field that deforms the spatio-temporal sampling positions of convolution. As a result, complementary information from both target and reference frames can be fused within a single Spatio-Temporal Deformable Convolution (STDC) operation. Extensive experiments show that our method achieves state-of-the-art performance in compressed video quality enhancement in terms of both accuracy and efficiency.
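As a loose intuition only, here is a toy 1-D temporal analogue of deformable sampling: each pixel samples the frame stack at a predicted fractional offset with linear interpolation. The real STDC predicts dense 2-D spatio-temporal offset fields inside a convolution; everything here (names, shapes, values) is an illustrative assumption.

```python
import numpy as np

def deform_sample(frames, t_offsets):
    """Sample a (T, P) frame stack per pixel at fractional temporal positions,
    linearly interpolating between the two nearest frames."""
    T = frames.shape[0]
    t = np.clip(t_offsets, 0, T - 1)
    lo = np.floor(t).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    frac = t - lo
    idx = np.arange(frames.shape[1])
    return (1 - frac) * frames[lo, idx] + frac * frames[hi, idx]

frames = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 40.0]])  # 3 frames, 2 pixels
out = deform_sample(frames, np.array([0.5, 2.0]))
assert out[0] == 1.0   # halfway between frame 0 and frame 1 at pixel 0
assert out[1] == 40.0  # exactly frame 2 at pixel 1
```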

AAAI Conference 2019 Short Paper

An Optimal Rewiring Strategy for Cooperative Multiagent Social Learning

  • Hongyao Tang
  • Jianye Hao
  • Li Wang
  • Tim Baarslag
  • Zan Wang

Multiagent coordination in cooperative multiagent systems (MASs) has been widely studied in both the fixed-agent repeated interaction setting and the static social learning framework. However, two aspects of the dynamics in real-world MASs are currently missing. First, the network topologies can change dynamically during the course of interaction. Second, the interaction utilities between each pair of agents may not be identical and are not known a priori. Both issues increase the difficulty of coordination. In this paper, we consider multiagent social learning in a dynamic environment in which agents can alter their connections and interact with randomly chosen neighbors whose utilities are unknown beforehand. We propose an optimal rewiring strategy to select the most beneficial peers to maximize the accumulated payoffs in long-run interactions. We empirically demonstrate the effectiveness of our approach in large-scale MASs.

AAMAS Conference 2019 Conference Paper

An Optimal Rewiring Strategy for Cooperative Multiagent Social Learning

  • Hongyao Tang
  • Jianye Hao
  • Li Wang
  • Zan Wang
  • Tim Baarslag

Multiagent coordination is a key problem in cooperative multiagent systems (MASs). It has been widely studied in both the fixed-agent repeated interaction setting and the static social learning framework. However, two aspects of the dynamics in real-world MASs are currently neglected. First, the network topologies can change dynamically during the course of interaction. Second, the interaction utilities can differ between each pair of agents and are usually unknown before interaction. Both issues increase the difficulty of coordination. In this paper, we consider multiagent social learning in a dynamic environment in which agents can alter their connections and interact with randomly chosen neighbors whose utilities are unknown beforehand. We propose an optimal rewiring strategy to select the most beneficial peers to maximize the accumulated payoffs in long-run interactions. We empirically demonstrate the effectiveness of our approach in a variety of large-scale MASs.
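The rewiring idea, replacing a poorly performing neighbor with the candidate whose estimated long-run payoff is highest, can be sketched as a greedy simplification. The paper derives an optimal strategy accounting for long-run accumulated payoffs; this toy version (names and values are assumptions) only illustrates the connection-swapping step:

```python
def rewire(neighbors, payoff_estimates, candidates):
    """Greedy sketch: drop the worst-estimated current neighbor and connect
    to the best-estimated candidate, if that candidate is an improvement."""
    worst = min(neighbors, key=lambda a: payoff_estimates.get(a, 0.0))
    best = max(candidates, key=lambda a: payoff_estimates.get(a, 0.0))
    if payoff_estimates.get(best, 0.0) > payoff_estimates.get(worst, 0.0):
        neighbors = [a for a in neighbors if a != worst] + [best]
    return neighbors

est = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}
# "a" (0.2) is replaced by the best candidate "b" (0.9)
assert sorted(rewire(["a", "c"], est, ["b", "d"])) == ["b", "c"]
```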

AAAI Conference 2019 Conference Paper

Difficulty-Aware Attention Network with Confidence Learning for Medical Image Segmentation

  • Dong Nie
  • Li Wang
  • Lei Xiang
  • Sihang Zhou
  • Ehsan Adeli
  • Dinggang Shen

Medical image segmentation is a key step for various applications, such as image-guided radiation therapy and diagnosis. Recently, deep neural networks have provided promising solutions for automatic image segmentation; however, they often perform well only on regular samples (i.e., easy-to-segment samples), since datasets are dominated by easy and regular samples. For medical images, due to huge inter-subject variations or disease-specific effects on subjects, there exist several difficult-to-segment cases that are often overlooked by previous works. To address this challenge, we propose a difficulty-aware deep segmentation network with confidence learning for end-to-end segmentation. The proposed framework makes two main contributions: 1) Besides the segmentation network, we also propose a fully convolutional adversarial network for confidence learning to provide voxel-wise and region-wise confidence information for the segmentation network. We relax adversarial learning to confidence learning by decreasing the priority of adversarial learning, so that we can avoid the training imbalance between generator and discriminator. 2) We propose a difficulty-aware attention mechanism to properly handle hard samples or hard regions considering structural information, which may overcome the shortcomings of focal loss. We further propose a fusion module to selectively fuse the concatenated feature maps in encoder-decoder architectures. Experimental results on clinical and challenge datasets show that our proposed network achieves state-of-the-art segmentation accuracy. Further analysis also indicates that each individual component of the proposed network contributes to the overall performance improvement.

NeurIPS Conference 2019 Conference Paper

Generalization in Generative Adversarial Networks: A Novel Perspective from Privacy Protection

  • Bingzhe Wu
  • Shiwan Zhao
  • Chaochao Chen
  • Haoyang Xu
  • Li Wang
  • Xiaolu Zhang
  • Guangyu Sun
  • Jun Zhou

In this paper, we aim to understand the generalization properties of generative adversarial networks (GANs) from the new perspective of privacy protection. Theoretically, we prove that a differentially private learning algorithm used for training a GAN does not overfit beyond a certain degree, i.e., the generalization gap can be bounded. Moreover, some recent works, such as the Bayesian GAN, can be re-interpreted based on our theoretical insight from privacy protection. Quantitatively, to evaluate the information leakage of well-trained GAN models, we perform various membership attacks on these models. The results show that previous Lipschitz regularization techniques are effective not only in reducing the generalization gap but also in alleviating the information leakage of the training dataset.

IJCAI Conference 2018 Conference Paper

A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization

  • Li Wang
  • Junlin Yao
  • Yunzhe Tao
  • Li Zhong
  • Wei Liu
  • Qiang Du

In this paper, we propose a deep learning approach to tackle automatic summarization tasks by incorporating topic information into the convolutional sequence-to-sequence (ConvS2S) model and using self-critical sequence training (SCST) for optimization. Through jointly attending to topics and word-level alignment, our approach can improve the coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. On the other hand, reinforcement training such as SCST directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids exposure bias during inference. We carry out experimental evaluations against state-of-the-art methods on the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in abstractive summarization.
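The SCST objective mentioned above uses the greedy decode's reward as a baseline, so only sampled summaries that beat it are reinforced; this is how the non-differentiable ROUGE metric enters training. A minimal surrogate-loss sketch (function name and toy numbers are illustrative assumptions):

```python
import math

def scst_loss(logp_sampled, reward_sampled, reward_greedy):
    """Self-critical surrogate loss: -(r_sample - r_greedy) * sum log p.
    Minimizing it raises the likelihood of samples that beat the baseline."""
    advantage = reward_sampled - reward_greedy
    return -advantage * sum(logp_sampled)

# sampled summary scores higher ROUGE than the greedy decode
loss = scst_loss([math.log(0.5), math.log(0.25)],
                 reward_sampled=0.4, reward_greedy=0.3)
assert 0.15 < loss < 0.25  # positive loss -> gradient raises sample likelihood
```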

AAAI Conference 2018 Conference Paper

Efficient Test-Time Predictor Learning With Group-Based Budget

  • Li Wang
  • Dajiang Zhu
  • Yujie Chi

Learning a test-time efficient predictor is becoming important for many real-world applications in which accessing the necessary features of a test point is costly. In this paper, we propose a novel approach to learn a linear predictor by introducing binary indicator variables for selecting feature groups and imposing an explicit budget constraint to upper-bound the total cost of the selected groups. We solve the convex relaxation of the resulting problem, with most elements of the optimal solution proved to be integral at the optimum, independent of the specific form of the loss function used. We propose a general and efficient algorithm to solve the relaxed problem by leveraging existing SVM solvers with various loss functions. For certain loss functions, the proposed algorithm can further take advantage of SVM solvers in the primal to tackle large-scale and high-dimensional data. Experiments on various datasets demonstrate the effectiveness and efficiency of the proposed method in comparison with various baselines.
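The budgeted group selection being modeled above can be illustrated with a greedy utility-per-cost heuristic. Note the paper instead solves a convex relaxation with provably near-integral solutions; this sketch (names and values are assumptions) only shows the shape of the constraint:

```python
def select_groups(groups, budget):
    """Greedy sketch of budgeted group selection: groups maps a name to a
    (utility, cost) pair; pick by utility/cost ratio until the budget is spent."""
    chosen, spent = [], 0.0
    ranked = sorted(groups.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    for name, (utility, cost) in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen

groups = {"g1": (3.0, 1.0), "g2": (4.0, 4.0), "g3": (2.0, 1.0)}
# g2 has high utility but cannot fit within the budget of 2
assert select_groups(groups, budget=2.0) == ["g1", "g3"]
```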

AAAI Conference 2017 Conference Paper

Latent Smooth Skeleton Embedding

  • Li Wang
  • Qi Mao
  • Ivor Tsang

Learning a smooth skeleton in a low-dimensional space from noisy data becomes important in computer vision and computational biology. Existing methods assume that the manifold constructed from the data is smooth, but they lack the ability to model skeleton structures from noisy data. To overcome this issue, we propose a novel probabilistic structured learning model to learn the density of latent embedding given high-dimensional data and its neighborhood graph. The embedded points that form a smooth skeleton structure are obtained by maximum a posteriori (MAP) estimation. Our analysis shows that the resulting similarity matrix is sparse and unique, and its associated kernel has eigenvalues that follow a power law distribution, which leads to the embeddings of a smooth skeleton. The model is extended to learn a sparse similarity matrix when the graph structure is unknown. Extensive experiments demonstrate the effectiveness of the proposed methods on various datasets by comparing them with existing methods.

AAAI Conference 2016 Conference Paper

Learning Sparse Confidence-Weighted Classifier on Very High Dimensional Data

  • Mingkui Tan
  • Yan Yan
  • Li Wang
  • Anton van den Hengel
  • Ivor W. Tsang
  • Qinfeng (Javen) Shi

Confidence-weighted (CW) learning is a successful online learning paradigm which maintains a Gaussian distribution over classifier weights and adopts a covariance matrix to represent the uncertainties of the weight vectors. However, there are two deficiencies in existing full CW learning paradigms, these being the sensitivity to irrelevant features, and the poor scalability to high dimensional data due to the maintenance of the covariance structure. In this paper, we begin by presenting an online-batch CW learning scheme, and then present a novel paradigm to learn sparse CW classifiers. The proposed paradigm essentially identifies feature groups and naturally builds a block diagonal covariance structure, making it very suitable for CW learning over very high-dimensional data. Extensive experimental results demonstrate the superior performance of the proposed methods over state-of-the-art counterparts on classification and feature selection tasks.

JMLR Journal 2014 Journal Article

Towards Ultrahigh Dimensional Feature Selection for Big Data

  • Mingkui Tan
  • Ivor W. Tsang
  • Li Wang

In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Different from traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Owing to this optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions and can achieve a lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world datasets with tens of millions of data points and $O(10^{14})$ features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of both generalization performance and training efficiency.

AAAI Conference 2012 Conference Paper

Convex Matching Pursuit for Large-Scale Sparse Coding and Subset Selection

  • Mingkui Tan
  • Ivor Tsang
  • Li Wang
  • Xinming Zhang

In this paper, a new convex matching pursuit scheme is proposed for tackling large-scale sparse coding and subset selection problems. In contrast with current matching pursuit algorithms such as subspace pursuit (SP), the proposed algorithm has a convex formulation and guarantees that the objective value can be monotonically decreased. Moreover, theoretical analysis and experimental results show that the proposed method achieves better scalability while maintaining similar or better decoding ability compared with state-of-the-art methods on large-scale problems.