Arrow Research search

Author name cluster

Shaowen Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers


NeurIPS Conference 2025 Conference Paper

Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws

  • Zhixuan Pan
  • Shaowen Wang
  • Liao Pengfei
  • Jian Li

Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet principled explanations for their underlying mechanisms and several phenomena, such as scaling laws, hallucinations, and related behaviors, remain elusive. In this work, we revisit the classical relationship between compression and prediction, grounded in Kolmogorov complexity and Shannon information theory, to provide deeper insights into LLM behaviors. By leveraging the Kolmogorov Structure Function and interpreting LLM compression as a two-part coding process, we offer a detailed view of how LLMs acquire and store information across increasing model and data scales -- from pervasive syntactic patterns to progressively rarer knowledge elements. Motivated by this theoretical perspective and natural assumptions inspired by Heaps' and Zipf's laws, we introduce a simplified yet representative hierarchical data-generation framework called the Syntax-Knowledge model. Under the Bayesian setting, we show that prediction and compression within this model naturally lead to diverse learning and scaling behaviors of LLMs. In particular, our theoretical analysis offers intuitive and principled explanations for both data and model scaling laws, the dynamics of knowledge acquisition during training and fine-tuning, and factual knowledge hallucinations in LLMs. Experimental results validate our theoretical predictions.
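The compression-prediction link this abstract builds on can be made concrete: under Shannon coding, a model that assigns probability p to the next symbol spends -log2 p bits on it, so a better predictor yields a shorter code. A minimal sketch of that link (not the paper's Syntax-Knowledge model; the predictors here are toy illustrations):

```python
import math
from collections import Counter

def code_length_bits(sequence, predict):
    """Total Shannon code length: sum of -log2 p(symbol | prefix)."""
    return sum(-math.log2(predict(sequence[:i], s))
               for i, s in enumerate(sequence))

def unigram_predictor(alphabet):
    """Laplace-smoothed unigram model built online from the prefix."""
    def predict(prefix, sym):
        counts = Counter(prefix)
        return (counts[sym] + 1) / (len(prefix) + len(alphabet))
    return predict

def bigram_predictor(alphabet):
    """Laplace-smoothed bigram model: conditions on the previous symbol."""
    def predict(prefix, sym):
        if not prefix:
            return 1 / len(alphabet)
        pairs = Counter(zip(prefix, prefix[1:]))
        ctx = prefix[-1]
        ctx_total = sum(v for (a, _), v in pairs.items() if a == ctx)
        return (pairs[(ctx, sym)] + 1) / (ctx_total + len(alphabet))
    return predict

text = "ab" * 32                       # highly regular "syntactic" data
alphabet = sorted(set(text))
uni_bits = code_length_bits(text, unigram_predictor(alphabet))
bi_bits = code_length_bits(text, bigram_predictor(alphabet))
# The model that captures the sequence's structure compresses it better,
# mirroring the prediction-equals-compression view used in the paper.
```
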

AIJ Journal 2024 Journal Article

Learning spatio-temporal dynamics on mobility networks for adaptation to open-world events

  • Zhaonan Wang
  • Renhe Jiang
  • Hao Xue
  • Flora D. Salim
  • Xuan Song
  • Ryosuke Shibasaki
  • Wei Hu
  • Shaowen Wang

As a decisive factor in the success of Mobility-as-a-Service (MaaS), spatio-temporal dynamics modeling on mobility networks is a challenging task, particularly in scenarios where open-world events drive mobility behavior to deviate from routine patterns. While tremendous progress has been made in modeling high-level spatio-temporal regularities with deep learning, most, if not all, of the existing methods are neither aware of the dynamic interactions among multiple transport modes on mobility networks nor adaptive to the unprecedented volatility brought by potential open-world events. In this paper, we are therefore motivated to improve the canonical spatio-temporal network (ST-Net) from two perspectives: (1) design a heterogeneous mobility information network (HMIN) to explicitly represent intermodality in multimodal mobility; (2) propose a memory-augmented dynamic filter generator (MDFG) to generate sequence-specific parameters on the fly for various scenarios. The enhanced event-aware spatio-temporal network, namely EAST-Net, is evaluated on several real-world datasets with a wide variety and coverage of open-world events. Both quantitative and qualitative experimental results verify the superiority of our approach over state-of-the-art baselines. Moreover, experiments demonstrate the ability of EAST-Net to generalize, performing zero-shot inference over open-world events that were not seen during training.
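The MDFG idea, generating sequence-specific parameters on the fly by reading from a memory, can be sketched roughly as follows. All names, sizes, and the attention read-out below are illustrative assumptions, not EAST-Net's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class DynamicFilterGenerator:
    """Toy memory-augmented filter generator: a sequence summary queries
    a learned memory, and the read-out is mapped to convolution weights
    used for that sequence alone (illustrative, not the paper's MDFG)."""
    def __init__(self, d_in, d_mem, n_slots, filt_len):
        self.memory = rng.normal(size=(n_slots, d_mem))      # memory slots
        self.W_q = rng.normal(size=(d_in, d_mem)) * 0.1      # query projection
        self.W_f = rng.normal(size=(d_mem, filt_len)) * 0.1  # read-out -> filter

    def __call__(self, seq):
        summary = seq.mean(axis=0)            # (d_in,) sequence summary
        query = summary @ self.W_q            # (d_mem,) query vector
        attn = softmax(self.memory @ query)   # attention over memory slots
        read = attn @ self.memory             # (d_mem,) memory read-out
        return read @ self.W_f                # sequence-specific filter

gen = DynamicFilterGenerator(d_in=4, d_mem=8, n_slots=6, filt_len=3)
seq = rng.normal(size=(12, 4))                # 12 time steps, 4 features
filt = gen(seq)                               # parameters generated on the fly
out = np.convolve(seq[:, 0], filt, mode="same")  # apply to one channel
```

Because the filter is a function of the input sequence, different traffic scenarios (e.g., a routine weekday vs. a disaster) receive different effective parameters without retraining.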

NeurIPS Conference 2024 Conference Paper

LoRA-GA: Low-Rank Adaptation with Gradient Approximation

  • Shaowen Wang
  • Linxi Yu
  • Jian Li

Fine-tuning large-scale pretrained models is prohibitively expensive in terms of computational and memory costs. LoRA, as one of the most popular Parameter-Efficient Fine-Tuning (PEFT) methods, offers a cost-effective alternative by fine-tuning an auxiliary low-rank model that has significantly fewer parameters. Although LoRA reduces the computational and memory requirements significantly at each iteration, extensive empirical evidence indicates that it converges at a considerably slower rate compared to full fine-tuning, ultimately leading to increased overall compute and often worse test performance. In our paper, we perform an in-depth investigation of the initialization method of LoRA and show that careful initialization (without any change to the architecture or the training algorithm) can significantly enhance both efficiency and performance. In particular, we introduce a novel initialization method, LoRA-GA (Low Rank Adaptation with Gradient Approximation), which aligns the gradients of the low-rank matrix product with those of full fine-tuning at the first step. Our extensive experiments demonstrate that LoRA-GA achieves a convergence rate comparable to that of full fine-tuning (hence being significantly faster than vanilla LoRA as well as various recent improvements) while simultaneously attaining comparable or even better performance. For example, on the subset of the GLUE dataset with T5-Base, LoRA-GA outperforms LoRA by 5.69% on average. On larger models such as Llama 2-7B, LoRA-GA shows performance improvements of 0.34, 11.52%, and 5.05% on MT-bench, GSM8K, and HumanEval, respectively. Additionally, we observe up to a 2-4x convergence speed improvement compared to vanilla LoRA, validating its effectiveness in accelerating convergence and enhancing model performance.
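The initialization idea can be sketched with an SVD of a (stand-in) first-step gradient: the leading singular directions of the full-fine-tuning gradient seed the low-rank factors so that the product's first update points in a similar direction. The exact slicing and scaling used by LoRA-GA may differ from this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_ga_init(grad, rank):
    """Sketch of gradient-aligned LoRA initialization: take the SVD of the
    first full-fine-tuning gradient and use its leading singular directions
    to initialize the low-rank factors B (d_out x r) and A (r x d_in).
    Illustrative only; LoRA-GA's published slicing/scaling may differ."""
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    A = Vt[:rank, :]          # right singular vectors -> down-projection
    B = U[:, rank:2 * rank]   # next singular directions -> up-projection
    return B, A

d_out, d_in, r = 16, 32, 4
G = rng.normal(size=(d_out, d_in))   # stand-in for the first-step gradient
B, A = lora_ga_init(G, r)
delta_W = B @ A                      # the low-rank update applied to W
```

Because A and B are built from orthonormal singular vectors, the adapter starts in the subspace where the full gradient has the most energy, rather than from the random A / zero B of vanilla LoRA.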

TIST Journal 2022 Journal Article

Weakly Supervised Spatial Deep Learning for Earth Image Segmentation Based on Imperfect Polyline Labels

  • Zhe Jiang
  • Wenchong He
  • Marcus Stephen Kirby
  • Arpan Man Sainju
  • Shaowen Wang
  • Lawrence V. Stanislawski
  • Ethan J. Shavers
  • E. Lynn Usery

In recent years, deep learning has achieved tremendous success in image segmentation for computer vision applications. The performance of these models heavily relies on the availability of large-scale high-quality training labels (e.g., PASCAL VOC 2012). Unfortunately, such large-scale high-quality training data are often unavailable in many real-world spatial or spatiotemporal problems in earth science and remote sensing (e.g., mapping the nationwide river streams for water resource management). Although extensive efforts have been made to reduce the reliance on labeled data (e.g., semi-supervised or unsupervised learning, few-shot learning), the complex nature of geographic data such as spatial heterogeneity still requires sufficient training labels when transferring a pre-trained model from one region to another. On the other hand, it is often much easier to collect lower-quality training labels with imperfect alignment with earth imagery pixels (e.g., through interpreting coarse imagery by non-expert volunteers). However, directly training a deep neural network on imperfect labels with geometric annotation errors could significantly impact model performance. Existing research that overcomes imperfect training labels either focuses on errors in label class semantics or characterizes label location errors at the pixel level. These methods do not fully incorporate the geometric properties of label location errors in the vector representation. To fill the gap, this article proposes a weakly supervised learning framework to simultaneously update deep learning model parameters and infer hidden true vector label locations. Specifically, we model label location errors in the vector representation to partially preserve geometric properties (e.g., spatial contiguity within line segments). Evaluations on real-world datasets in the National Hydrography Dataset (NHD) refinement application illustrate that the proposed framework outperforms baseline methods in classification accuracy.
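The alternating scheme, updating model parameters while inferring hidden true label locations, resembles an EM loop. A toy 1-D sketch under simplified assumptions (a linear scorer and a snap-to-best-pixel E-step; not the paper's actual framework):

```python
import numpy as np

def em_refine_labels(features, noisy_pos, n_iters=5, window=3):
    """Toy EM-style refinement: alternate (M) fitting a linear scorer to
    the current label locations and (E) snapping each imperfect label to
    the best-scoring pixel in a small window around it."""
    pos = np.array(noisy_pos)
    for _ in range(n_iters):
        # M-step: fit a scorer so currently-labeled pixels score high
        target = np.full(len(features), -1.0)
        target[pos] = 1.0
        w, *_ = np.linalg.lstsq(features, target, rcond=None)
        # E-step: move each label to the best-scoring nearby pixel
        scores = features @ w
        snapped = []
        for p in pos:
            lo, hi = max(0, p - window), min(len(scores), p + window + 1)
            snapped.append(lo + int(np.argmax(scores[lo:hi])))
        pos = np.array(sorted(set(snapped)))
    return pos

# Synthetic 1-D "imagery": true line pixels at 10, 20, 30 leave a smooth
# feature bump; the volunteer-drawn labels are each off by one pixel.
xs = np.arange(50)
bump = sum(np.exp(-((xs - t) ** 2) / 8.0) for t in [10, 20, 30])
features = np.stack([bump, np.ones_like(bump)], axis=1)
refined = em_refine_labels(features, [11, 19, 31])
```

The window constraint is a crude stand-in for the geometric prior: the inferred true location is assumed to lie near the annotated polyline rather than anywhere in the image.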

TIST Journal 2018 Journal Article

GeoBurst+

  • Chao Zhang
  • Dongming Lei
  • Quan Yuan
  • Honglei Zhuang
  • Lance Kaplan
  • Shaowen Wang
  • Jiawei Han

The real-time discovery of local events (e.g., protests, disasters) has been widely recognized as a fundamental socioeconomic task. Recent studies have demonstrated that the geo-tagged tweet stream serves as an unprecedentedly valuable source for local event detection. Nevertheless, how to effectively extract local events from massive geo-tagged tweet streams in real time remains challenging. To bridge the gap, we propose a method for effective and real-time local event detection from geo-tagged tweet streams. Our method, named GeoBurst+, first leverages a novel cross-modal authority measure to identify several pivots in the query window. Such pivots reveal different geo-topical activities and naturally attract similar tweets to form candidate events. GeoBurst+ further summarizes the continuous stream and compares the candidates against the historical summaries to pinpoint truly interesting local events. Better still, as the query window shifts, GeoBurst+ is capable of updating the event list with little time cost, thus achieving continuous monitoring of the stream. We used crowdsourcing to evaluate GeoBurst+ on two million-scale datasets and found it significantly more effective than existing methods while being orders of magnitude faster.