Arrow Research search

Author name cluster

Wenya Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
1 author row

Possible papers (14)

AAAI Conference 2026 · Conference Paper

Causality Matters: How Temporal Information Emerges in Video Language Models

  • Yumeng Shi
  • Quanyu Long
  • Yin Wu
  • Wenya Wang

Video language models (VideoLMs) have made significant progress in multimodal understanding. However, temporal understanding, which involves identifying event order, duration, and relationships across time, remains a core challenge. Prior works emphasize positional encodings (PEs) as a key mechanism for encoding temporal structure. Surprisingly, we find that removing or modifying PEs in video inputs yields minimal degradation in temporal understanding performance. In contrast, reversing the frame sequence while preserving the original PEs causes a substantial drop. To explain this behavior, we conduct extensive analysis experiments to trace how temporal information is integrated within the model. We uncover a causal information pathway: temporal cues are progressively synthesized through inter-frame attention, aggregated in the final frame, and subsequently integrated into the query tokens. This mechanism shows that temporal reasoning emerges from interactions among visual tokens under the constraints of causal attention, which implicitly encodes temporal structure. Based on these insights, we propose two efficiency-oriented strategies: staged cross-modal attention and a temporal exit mechanism for early token truncation. Experiments on two benchmarks validate the effectiveness of both approaches.
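
The order-sensitivity finding can be pictured in a toy setting. The sketch below (an illustration, not the paper's code) builds a single self-attention layer with no positional encodings and shows that the causal mask alone makes the output order-dependent, whereas the mask-free layer is permutation-equivariant:

```python
# Toy demo: causal attention implicitly encodes order even without PEs.
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
frames = torch.randn(1, 8, 16)  # 8 "frame tokens"; no positional encoding added

def layer_out(x, causal):
    T = x.size(1)
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1) if causal else None
    out, _ = attn(x, x, x, attn_mask=mask)
    return out

rev = torch.flip(frames, dims=[1])  # reversed frame order
for causal in (False, True):
    # if the layer were order-insensitive, out(reversed) would equal reversed(out)
    diff = (layer_out(frames, causal) - torch.flip(layer_out(rev, causal), dims=[1])).norm()
    print(f"causal={causal}: order sensitivity = {diff.item():.6f}")
# prints ~0 without the causal mask, clearly nonzero with it
```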

AAAI Conference 2026 · Conference Paper

LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-Play Toolkit

  • Chengtao Lv
  • Bilang Zhang
  • Yang Yong
  • Ruihao Gong
  • Yushi Huang
  • Shiqiao Gu
  • Jiajun Wu
  • Yumeng Shi

Large Vision-Language Models (VLMs) exhibit impressive multi-modal capabilities but suffer from prohibitive computational and memory demands due to their long visual token sequences and massive parameter sizes. To address these issues, recent works have proposed training-free compression methods. However, existing efforts often suffer from three major limitations: (1) current approaches do not decompose techniques into comparable modules, hindering fair evaluation across spatial and temporal redundancy; (2) evaluation is confined to simple single-turn tasks, failing to reflect performance in realistic scenarios; and (3) individual compression techniques are used in isolation, without exploring their joint potential. To overcome these gaps, we introduce LLMC+, a comprehensive VLM compression benchmark with a versatile, plug-and-play toolkit. LLMC+ supports over 20 algorithms across five representative VLM families and enables systematic study of token-level and model-level compression. Our benchmark reveals that: (1) spatial and temporal redundancies demand distinct technical strategies; (2) token reduction methods degrade significantly in multi-turn dialogue and detail-sensitive tasks; and (3) combining token and model compression achieves extreme compression with minimal performance loss. We believe LLMC+ will facilitate fair evaluation and inspire future research in efficient VLMs.
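
As a flavor of the token-level compression such a benchmark covers, here is a minimal training-free token-reduction sketch; the scoring rule (embedding norm) and function names are illustrative assumptions, not the toolkit's API:

```python
# Sketch: keep the top-k visual tokens by a simple importance proxy and drop
# the rest before they enter the LLM (hypothetical helper, not LLMC+ code).
import torch

def prune_visual_tokens(visual_tokens: torch.Tensor, keep_ratio: float = 0.25):
    """visual_tokens: (batch, num_tokens, dim) -> (batch, k, dim)."""
    b, n, d = visual_tokens.shape
    k = max(1, int(n * keep_ratio))
    scores = visual_tokens.norm(dim=-1)                          # (b, n) importance proxy
    top_idx = scores.topk(k, dim=1).indices.sort(dim=1).values   # preserve original order
    return torch.gather(visual_tokens, 1, top_idx.unsqueeze(-1).expand(b, k, d))

tokens = torch.randn(2, 576, 1024)        # e.g. a 24x24 visual grid
print(prune_visual_tokens(tokens).shape)  # torch.Size([2, 144, 1024])
```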

AAAI Conference 2026 · Conference Paper

Skill Path: Unveiling Language Skills from Circuit Graphs

  • Hang Chen
  • Xinyu Yang
  • Jiaying Zhu
  • Wenya Wang

Circuit graph discovery has emerged as a fundamental approach to elucidating the skill mechanisms of language models. Despite the output faithfulness of circuit graphs, they suffer from atomic ablation, which causes the loss of causal dependencies between connected components. In addition, their discovery process, designed to preserve output faithfulness, inadvertently captures extraneous effects beyond an isolated target skill. To alleviate these challenges, we introduce skill paths, which offer a more refined and compact representation by isolating individual skills within a linear chain of components. To enable skill path extraction from circuit graphs, we propose a three-step framework consisting of decomposition, pruning, and post-hoc causal mediation. In particular, we offer a complete linear decomposition of the transformer model, which leads to a disentangled computation graph. After pruning, we further adopt causal analysis techniques, including counterfactuals and interventions, to extract the final skill paths from the circuit graph. To underscore the significance of skill paths, we investigate three generic language skills (Previous Token Skill, Induction Skill, and In-Context Learning Skill) using our framework. Experiments support two crucial properties of these skills, namely stratification and inclusiveness.
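
A minimal sketch of the prune-then-extract idea, with an assumed toy graph representation (node names and effect weights are hypothetical, not the paper's data structures): prune low-effect edges of the decomposed computation graph, then take the highest-effect input-to-output chain as the path:

```python
# Toy: components as DAG nodes, edge weights as estimated causal effects.
from collections import defaultdict

edges = {  # (src, dst): estimated effect (hypothetical values)
    ("emb", "attn.0"): 0.9, ("emb", "mlp.0"): 0.1,
    ("attn.0", "attn.5"): 0.8, ("mlp.0", "attn.5"): 0.05,
    ("attn.5", "logits"): 0.7, ("attn.0", "logits"): 0.2,
}

def skill_path(edges, src="emb", dst="logits", threshold=0.15):
    graph = defaultdict(list)
    for (u, v), w in edges.items():
        if w >= threshold:                  # pruning step drops weak edges
            graph[u].append((v, w))
    best = {src: (0.0, [src])}              # node -> (total effect, path so far)
    stack = [src]
    while stack:                            # longest-path relaxation on the DAG
        u = stack.pop()
        for v, w in graph[u]:
            cand = best[u][0] + w
            if v not in best or cand > best[v][0]:
                best[v] = (cand, best[u][1] + [v])
                stack.append(v)
    return best.get(dst)

print(skill_path(edges))  # ≈ (2.4, ['emb', 'attn.0', 'attn.5', 'logits'])
```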

AAAI Conference 2025 · Conference Paper

Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation

  • Ziyan Wang
  • Yingpeng Du
  • Zhu Sun
  • Haoyan Chua
  • Kaidong Feng
  • Wenya Wang
  • Jie Zhang

Emerging advancements in large language models (LLMs) show significant potential for enhancing recommendations. However, prompt-based methods often struggle to find ideal prompts without task-specific feedback, while fine-tuning-based methods are hindered by high computational demands and dependence on open-source backbones. To address these challenges, we propose a Reflective Reinforcement Large Language Model (Re2LLM) for session-based recommendation, which refines LLMs to generate and utilize specialized knowledge effectively and efficiently. Specifically, we first devise the Reflective Exploration Module to extract and present knowledge in a form that LLMs can easily process. This module enables LLMs to reflect on their recommendation mistakes and construct a hint knowledge base to rectify them. Next, we design the Reinforcement Utilization Module to train a lightweight retrieval agent that elicits correct LLM reasoning. This module treats hints as signals that facilitate LLM recommendations and efficiently learns to select appropriate hints from the constructed knowledge base using task-specific feedback. Lastly, we conduct experiments on real-world datasets and demonstrate the superiority of Re2LLM over state-of-the-art methods.
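
The retrieval agent can be pictured as a simple bandit over the hint knowledge base; the sketch below is an assumption about the general shape of such a module, not the paper's implementation:

```python
# Toy hint selector: epsilon-greedy bandit updated from task feedback.
import random

class HintSelector:
    def __init__(self, hints, epsilon=0.1):
        self.hints = hints
        self.values = {h: 0.0 for h in hints}
        self.counts = {h: 0 for h in hints}
        self.epsilon = epsilon

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.hints)          # explore
        return max(self.hints, key=self.values.get)   # exploit

    def update(self, hint, reward):
        self.counts[hint] += 1
        # incremental mean of observed rewards for this hint
        self.values[hint] += (reward - self.values[hint]) / self.counts[hint]

kb = ["Prefer items co-viewed in this session.",       # hypothetical hints
      "Weight the most recent click most heavily.",
      "Avoid repeating already-purchased items."]
agent = HintSelector(kb)
hint = agent.select()
# ... prepend `hint` to the LLM prompt, observe hit/miss, then:
agent.update(hint, reward=1.0)
```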

NeurIPS Conference 2025 · Conference Paper

Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates

  • Hang Chen
  • Jiaying Zhu
  • Xinyu Yang
  • Wenya Wang

Circuit discovery has gradually become one of the prominent methods for mechanistic interpretability, and research on circuit completeness has garnered increasing attention. Circuit discovery methods that do not guarantee completeness not only produce circuits that vary across runs but also omit key mechanisms. This incompleteness arises from the presence of OR gates within the circuit, which standard circuit discovery methods often detect only partially. To this end, we systematically introduce three types of logic gates: AND, OR, and ADDER gates, and decompose the circuit into combinations of these gates. Through the concept of these gates, we derive the minimum requirements necessary to achieve faithfulness and completeness. Furthermore, we propose a framework that combines noising-based and denoising-based interventions, which can be easily integrated into existing circuit discovery methods without significantly increasing computational complexity. This framework fully identifies the logic gates and distinguishes them within the circuit. Beyond extensive experimental validation of the framework's ability to restore the faithfulness, completeness, and sparsity of circuits, we use it to uncover fundamental properties of the three logic gates, such as their proportions and contributions to the output, and to explore how they behave across the functionalities of language models.
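
A toy boolean example makes the OR-gate problem concrete; this illustrates the general phenomenon, not the paper's procedure:

```python
# With y = a OR b and both inputs active, corrupting either input alone leaves
# the output unchanged, so noising-only ablation marks both branches as
# unimportant. Denoising from the fully corrupted run recovers both.
def circuit(a, b):
    return a or b

a, b = 1, 1
clean = circuit(a, b)

# Noising: corrupt one input at a time starting from the clean run.
for name, out in [("a", circuit(0, b)), ("b", circuit(a, 0))]:
    print(f"noising {name}: effect = {clean - out}")      # both print 0 -> missed

# Denoising: restore one input at a time starting from the corrupted run.
corrupt = circuit(0, 0)
for name, out in [("a", circuit(1, 0)), ("b", circuit(0, 1))]:
    print(f"denoising {name}: effect = {out - corrupt}")  # both print 1 -> found
```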

AAAI Conference 2020 · Conference Paper

Integrating Deep Learning with Logic Fusion for Information Extraction

  • Wenya Wang
  • Sinno Jialin Pan

Information extraction (IE) aims to produce structured information from an input text, e.g., Named Entity Recognition and Relation Extraction. Various attempts have been made at IE via feature engineering or deep learning. However, most of them fail to capture the complex dependencies inherent in the task itself, which have proven to be especially crucial. For example, the relation between two entities is highly dependent on their entity types. These dependencies can be regarded as complex constraints that can be efficiently expressed as logical rules. To combine such logic reasoning capabilities with the learning capabilities of deep neural networks, we propose to integrate logical knowledge in the form of first-order logic into a deep learning system, which can be trained jointly in an end-to-end manner. The integrated framework is able to enhance neural outputs with knowledge regularization via logic rules, and at the same time update the weights of the logic rules to comply with the characteristics of the training data. We demonstrate the effectiveness and generalization of the proposed model on multiple IE tasks.
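
One common way to realize such rule regularization, shown here as a hedged sketch rather than the paper's exact formulation, is to soften a first-order rule with a product t-norm and add its negative log-truth to the task loss:

```python
# Soft penalty for the rule  works_for(x, y) -> PER(x) AND ORG(y).
import torch

def rule_penalty(p_rel, p_per_x, p_org_y):
    """Inputs are predicted probabilities in [0, 1]."""
    head = p_per_x * p_org_y            # soft AND over the head atoms
    truth = 1.0 - p_rel * (1.0 - head)  # soft implication (product t-norm)
    return -torch.log(truth + 1e-8)     # low rule truth -> large penalty

# Toy predicted probabilities; in practice these come from the network heads.
p_rel = torch.tensor(0.9)   # p(works_for(x, y))
p_per = torch.tensor(0.95)  # p(PER(x))
p_org = torch.tensor(0.2)   # p(ORG(y))

task_loss = torch.tensor(0.0)  # stand-in for the supervised loss
total = task_loss + 0.5 * rule_penalty(p_rel, p_per, p_org)
print(total.item())  # sizeable here because the ORG type probability is low
```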

IJCAI Conference 2019 · Conference Paper

Cooperative Pruning in Cross-Domain Deep Neural Network Compression

  • Shangyu Chen
  • Wenya Wang
  • Sinno Jialin Pan

The advancement of deep models poses great challenges to real-world deployment because of the limited computational ability and storage space on edge devices. To solve this problem, existing works have made progress in compressing deep models by pruning or quantization. However, most existing methods rely on a large amount of training data and a pre-trained model in the same domain. When only limited in-domain training data is available, these methods fail to perform well. This prompts the idea of transferring knowledge from a resource-rich source domain to a target domain with limited data to perform model compression. In this paper, we propose a method to perform cross-domain pruning by cooperatively training in both domains: taking advantage of data and a pre-trained model from the source domain to assist pruning in the target domain. Specifically, source and target pruned models are trained simultaneously and interactively, with source information transferred through the construction of a cooperative pruning mask. Our method significantly improves pruning quality in the target domain and sheds light on model compression in the cross-domain setting.
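
A minimal sketch of what a cooperative pruning mask could look like (the blending rule is an assumption, not the paper's algorithm): combine magnitude-based importance scores from source and target weights before thresholding:

```python
# Blend source and target weight importances, then keep the top weights.
import numpy as np

def cooperative_mask(w_src, w_tgt, sparsity=0.7, alpha=0.5):
    """alpha blends source vs. target importance; returns a binary keep-mask."""
    score = alpha * np.abs(w_src) + (1 - alpha) * np.abs(w_tgt)
    k = int(score.size * sparsity)                 # number of weights to prune
    thresh = np.partition(score.ravel(), k)[k]     # k-th smallest score
    return (score >= thresh).astype(np.float32)

rng = np.random.default_rng(0)
w_src, w_tgt = rng.normal(size=(64, 64)), rng.normal(size=(64, 64))
mask = cooperative_mask(w_src, w_tgt)
w_tgt_pruned = w_tgt * mask
print("kept fraction:", mask.mean())  # ~0.3
```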

AAAI Conference 2019 · Conference Paper

Deep Neural Network Quantization via Layer-Wise Optimization Using Limited Training Data

  • Shangyu Chen
  • Wenya Wang
  • Sinno Jialin Pan

The advancement of deep models poses great challenges to real-world deployment because of the limited computational ability and storage space on edge devices. To solve this problem, existing works have made progress in pruning or quantizing deep models. However, most existing methods rely heavily on a supervised training process to achieve satisfactory performance, requiring a large amount of labeled training data, which may not be practical for real deployment. In this paper, we propose a novel layer-wise quantization method for deep neural networks that only requires limited training data (1% of the original dataset). Specifically, we formulate parameter quantization for each layer as a discrete optimization problem and solve it using the Alternating Direction Method of Multipliers (ADMM), which gives an efficient closed-form solution. We prove that the final performance drop after quantization is bounded by a linear combination of the reconstruction errors incurred at each layer. Based on this theorem, we propose an algorithm to quantize a deep neural network layer by layer, with an additional weight-update step to minimize the final error. Extensive experiments on benchmark deep models demonstrate the effectiveness of our proposed method using 1% of the CIFAR10 and ImageNet datasets. Code is available at: https://github.com/csyhhu/L-DNQ
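
The layer-wise formulation can be sketched as a discrete least-squares problem solved with an ADMM-style loop, alternating a closed-form continuous update with projection onto the quantization grid; this is a simplified illustration, not the released L-DNQ code:

```python
# Minimize ||X W - X Q||^2 with Q constrained to a discrete level set.
import numpy as np

def quantize_layer(W, X, levels, rho=1.0, iters=50):
    """W: (d_in, d_out) weights; X: (n, d_in) limited calibration data."""
    G = X.T @ X
    A = G + rho * np.eye(W.shape[0])
    proj = lambda Z: levels[np.abs(Z[..., None] - levels).argmin(-1)]  # nearest level
    Q = proj(W)
    U = np.zeros_like(W)                                 # scaled dual variable
    for _ in range(iters):
        V = np.linalg.solve(A, G @ W + rho * (Q - U))    # closed-form LS step
        Q = proj(V + U)                                  # projection step
        U = U + V - Q                                    # dual update
    return Q

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))
X = rng.normal(size=(64, 32))                # "limited data" stand-in
levels = np.array([-0.5, 0.0, 0.5])          # e.g. a scaled ternary grid
Q = quantize_layer(W, X, levels)
print("reconstruction error:", np.linalg.norm(X @ W - X @ Q))
```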

NeurIPS Conference 2019 · Conference Paper

MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization

  • Shangyu Chen
  • Wenya Wang
  • Sinno Jialin Pan

The tremendous number of parameters in deep neural networks makes them impractical to deploy for edge-device-based real-world applications due to limits on computational power and storage space. Existing studies have made progress on learning quantized deep models to reduce model size and energy consumption, i.e., converting full-precision weights ($r$'s) into discrete values ($q$'s) in a supervised training manner. However, the quantization step is non-differentiable, which leads to either infinite or zero gradients ($g_r$) w.r.t. $r$. To address this problem, most training-based quantization methods approximate $g_r$ from the gradient w.r.t. $q$ ($g_q$), either by clipping it via the Straight-Through Estimator (STE) or by manually designing its computation. However, these methods only heuristically make training-based quantization applicable, without further analysis of how the approximated gradients assist training of a quantized network. In this paper, we propose to learn $g_r$ with a neural network. Specifically, a meta network is trained using $g_q$ and $r$ as inputs, and outputs $g_r$ for subsequent weight updates. The meta network is updated together with the original quantized network. Our proposed method alleviates the problem of non-differentiability and can be trained in an end-to-end manner. Extensive experiments are conducted with CIFAR10/100 and ImageNet on various deep networks to demonstrate the advantage of our proposed method in terms of a faster convergence rate and better performance. Code is released at: https://github.com/csyhhu/MetaQuant
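
For context, the hand-designed STE baseline the abstract describes looks like the following in PyTorch; MetaQuant's contribution is to replace this fixed backward rule with a learned meta network over ($g_q$, $r$):

```python
# The fixed STE-with-clipping rule: quantize in forward, copy the gradient
# through in backward but zero it where |r| > 1.
import torch

class STEQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, r):
        ctx.save_for_backward(r)
        return torch.sign(r)                 # q: binary quantization

    @staticmethod
    def backward(ctx, g_q):
        (r,) = ctx.saved_tensors
        return g_q * (r.abs() <= 1).float()  # hand-designed g_r from g_q

r = torch.randn(5, requires_grad=True)
q = STEQuantize.apply(r)
q.sum().backward()
print(r.grad)  # g_r approximated from g_q by the fixed rule
```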

AAAI Conference 2019 · Conference Paper

Transferable Interactive Memory Network for Domain Adaptation in Fine-Grained Opinion Extraction

  • Wenya Wang
  • Sinno Jialin Pan

In fine-grained opinion mining, aspect and opinion term extraction has become a fundamental task that provides key information for user-generated texts. Despite its importance, a lack of annotated resources in many domains impedes the ability to train a precise model. Very few attempts have applied unsupervised domain adaptation methods to transfer fine-grained (word-level) knowledge from labeled source domain(s) to an unlabeled target domain. Existing methods depend on the construction of “pivot” knowledge, e.g., common opinion terms or syntactic relations between aspect and opinion words. In this work, we propose an interactive memory network that consists of local and global memory units. The model exploits both local and global memory interactions to capture intra-correlations among aspect words or opinion words themselves, as well as the interconnections between aspect and opinion words. The source and target spaces are aligned through these domain-invariant interactions by incorporating an auxiliary task and domain adversarial networks. The proposed model does not require any external resources and demonstrates promising results on three benchmark datasets.
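
The domain-adversarial ingredient mentioned above is commonly implemented with a gradient reversal layer; the sketch below shows that standard component only, not the paper's memory network:

```python
# Gradient reversal: the domain classifier descends while the feature
# extractor ascends, pushing features toward domain invariance.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, g):
        return -ctx.lam * g, None  # flip the gradient sign for the features

feature_net = nn.Linear(50, 32)
domain_clf = nn.Linear(32, 2)

x = torch.randn(8, 50)
feats = feature_net(x)
domain_logits = domain_clf(GradReverse.apply(feats, 1.0))
loss = nn.functional.cross_entropy(domain_logits, torch.randint(0, 2, (8,)))
loss.backward()  # domain_clf minimizes the loss; feature_net maximizes it
```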

IJCAI Conference 2018 · Conference Paper

Transition-based Adversarial Network for Cross-lingual Aspect Extraction

  • Wenya Wang
  • Sinno Jialin Pan

In fine-grained opinion mining, the task of aspect extraction involves the identification of explicit product features in customer reviews. This task has been widely studied in some major languages, e.g., English, but has seldom been addressed in lower-resource languages due to the lack of annotated corpora. To address this, we develop a novel deep model to transfer knowledge from a source language with labeled training data to a target language without any annotations. Different from cross-lingual sentiment classification, aspect extraction across languages requires more fine-grained adaptation. To this end, we utilize a transition-based mechanism that reads one word at a time and forms a series of configurations representing the status of the whole sentence. We represent each configuration as a continuous feature vector and align these representations from different languages into a shared space through an adversarial network. In addition, syntactic structures are integrated into the deep model to achieve more syntactically sensitive adaptations. The proposed method is end-to-end and achieves state-of-the-art performance on English, French, and Spanish restaurant review datasets.
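
A toy transition reader conveys the configuration idea; the action set and configuration features here are simplified assumptions, not the paper's transition system:

```python
# Read one word at a time; each action both labels the word and advances the
# configuration. The configuration vectors are what an adversarial network
# would align across languages.
def run_transitions(words, actions):
    """actions[i] in {"SHIFT-ASP", "SHIFT-O"}; returns (config history, aspects)."""
    buffer, tagged, history = list(words), [], []
    for act in actions:
        word = buffer.pop(0)
        tagged.append((word, "ASP" if act == "SHIFT-ASP" else "O"))
        # a configuration summarizes the processed words and remaining buffer
        history.append({"done": len(tagged), "remaining": len(buffer)})
    return history, [w for w, t in tagged if t == "ASP"]

words = ["the", "battery", "life", "is", "great"]
actions = ["SHIFT-O", "SHIFT-ASP", "SHIFT-ASP", "SHIFT-O", "SHIFT-O"]
history, aspects = run_transitions(words, actions)
print(aspects)  # ['battery', 'life']
```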

AAAI Conference 2017 · Conference Paper

Coupled Multi-Layer Attentions for Co-Extraction of Aspect and Opinion Terms

  • Wenya Wang
  • Sinno Jialin Pan
  • Daniel Dahlmeier
  • Xiaokui Xiao

The task of aspect and opinion term co-extraction aims to explicitly extract aspect terms describing features of an entity and opinion terms expressing emotions from user-generated texts. One effective approach to this task is to exploit relations between aspect terms and opinion terms by parsing the syntactic structure of each sentence. However, this approach requires expensive parsing effort and depends heavily on the quality of the parsing results. In this paper, we offer a novel deep learning model, named coupled multi-layer attentions. The proposed model provides an end-to-end solution and does not require any parsers or other linguistic resources for preprocessing. Specifically, the proposed model is a multi-layer attention network, where each layer consists of a pair of attentions with tensor operators. One attention extracts aspect terms, while the other extracts opinion terms. They are learned interactively to dually propagate information between aspect terms and opinion terms. Through multiple layers, the model can further exploit indirect relations between terms for more precise information extraction. Experimental results on three benchmark datasets from the SemEval Challenge 2014 and 2015 show that our model achieves state-of-the-art performance compared with several baselines.
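
The dual propagation between the two attentions can be sketched as follows; the update rule and prototype vectors are simplifying assumptions, not the paper's tensor-based formulation:

```python
# Two attentions, one per task, each scoring tokens against a prototype that
# is refreshed from the OTHER attention's summary at every layer.
import torch

def coupled_layers(H, u_asp, u_op, num_layers=3):
    """H: (seq_len, dim) token features; u_*: (dim,) task prototypes."""
    for _ in range(num_layers):
        a_asp = torch.softmax(H @ u_asp, dim=0)  # aspect attention weights
        a_op = torch.softmax(H @ u_op, dim=0)    # opinion attention weights
        s_asp, s_op = a_asp @ H, a_op @ H        # per-task summaries
        # coupling: each prototype absorbs the other task's summary
        u_asp = u_asp + 0.5 * s_op
        u_op = u_op + 0.5 * s_asp
    return a_asp, a_op

H = torch.randn(6, 8)
scores_asp, scores_op = coupled_layers(H, torch.randn(8), torch.randn(8))
print(scores_asp.sum().item(), scores_op.sum().item())  # each sums to 1
```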