Arrow Research search

Author name cluster

Longyue Wang

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

14 papers
2 author rows

Possible papers

14

AAAI Conference 2026 Conference Paper

Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs

  • Xinwei Wu
  • Heng Liu
  • Xiaohu Zhao
  • Yuqi Ren
  • Linlong Xu
  • Longyue Wang
  • Deyi Xiong
  • Weihua Luo

Large Language Models (LLMs) frequently exhibit strong translation abilities, even without task-specific fine-tuning. However, the internal mechanisms governing this innate capability remain largely opaque. To demystify this process, we leverage Sparse Autoencoders (SAEs) and introduce a novel framework for identifying task-specific features. Our method first recalls features that are frequently co-activated on translation inputs and then filters them for functional coherence using a PCA-based consistency metric. This framework successfully isolates a small set of "translation initiation" features. Causal interventions demonstrate that amplifying these features steers the model towards correct translation, while ablating them induces hallucinations and off-task outputs, confirming they represent a core component of the model's innate translation competency. Moving from analysis to application, we leverage this mechanistic insight to propose a new data selection strategy for efficient fine-tuning. Specifically, we prioritize training on "mechanistically hard" samples: those that fail to naturally activate the translation initiation features. Experiments show this approach significantly improves data efficiency and suppresses hallucinations. Furthermore, we find these mechanisms are transferable to larger models of the same family. Our work not only decodes a core component of the translation mechanism in LLMs but also provides a blueprint for using internal model mechanisms to create more robust and efficient models.
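
The recall-and-filter idea from the abstract can be pictured with a small sketch. The snippet below is a minimal illustration, assuming SAE activations over a batch of translation prompts and using first-principal-component variance as the consistency score; the function names, thresholds, and array shapes are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: recall-and-filter selection of "translation initiation"
# SAE features. Shapes, thresholds, and the consistency metric are assumptions.
import numpy as np

def recall_features(acts: np.ndarray, freq_thresh: float = 0.9) -> np.ndarray:
    """Indices of SAE features active on at least `freq_thresh` of the
    translation prompts. `acts` has shape (n_prompts, n_features)."""
    fire_rate = (acts > 0).mean(axis=0)
    return np.where(fire_rate >= freq_thresh)[0]

def pca_consistency(contexts: np.ndarray) -> float:
    """Fraction of variance captured by the first principal component of the
    hidden states at positions where the feature fires: a crude proxy for
    'this feature responds to one coherent direction'."""
    centered = contexts - contexts.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values only
    var = s ** 2
    return float(var[0] / var.sum())

def select_task_features(acts, hidden_by_feature, freq_thresh=0.9, cons_thresh=0.7):
    recalled = recall_features(acts, freq_thresh)
    return [f for f in recalled
            if pca_consistency(hidden_by_feature[f]) >= cons_thresh]

# Toy usage with random data standing in for real SAE activations
# (thresholds are loosened here so the random toy data yields candidates).
rng = np.random.default_rng(0)
acts = rng.random((64, 512))                       # 64 prompts, 512 SAE features
hidden_by_feature = {f: rng.normal(size=(32, 128)) for f in range(512)}
selected = select_task_features(acts, hidden_by_feature, freq_thresh=0.5, cons_thresh=0.05)
print(len(selected), "candidate task features")
```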

NeurIPS Conference 2025 Conference Paper

Alleviating Hallucinations in Large Language Models through Multi-Model Contrastive Decoding and Dynamic Hallucination Detection

  • Chenyu Zhu
  • Yefeng Liu
  • Hao Zhang
  • Aowen Wang
  • Guanhua Chen
  • Longyue Wang
  • Weihua Luo
  • Kaifu Zhang

Despite their outstanding performance in numerous applications, large language models (LLMs) remain prone to hallucinations, generating content inconsistent with their pretraining corpora. Currently, almost all contrastive decoding approaches alleviate hallucinations by introducing a model susceptible to hallucinations and appropriately widening the contrastive logits gap between hallucinatory tokens and target tokens. However, although existing contrastive decoding methods mitigate hallucinations, they lack enough confidence in the factual accuracy of the generated content. In this work, we propose Multi-Model Contrastive Decoding (MCD), which integrates a pretrained language model with an evil model and a truthful model for contrastive decoding. Intuitively, a token is assigned a high probability only when deemed potentially hallucinatory by the evil model while being considered factual by the truthful model. This decoding strategy significantly enhances the model’s confidence in its generated responses and reduces potential hallucinations. Furthermore, we introduce a dynamic hallucination detection mechanism that facilitates token-by-token identification of hallucinations during generation and a tree-based revision mechanism to diminish hallucinations further. Extensive experimental evaluations demonstrate that our MCD strategy effectively reduces hallucinations in LLMs and outperforms state-of-the-art methods across various benchmarks.
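
A minimal sketch of the contrastive combination at a single decoding step, assuming we already have next-token logits from the base, "evil", and truthful models; the mixing rule and the weight alpha are assumptions, and the dynamic hallucination detection and tree-based revision mechanisms are omitted.

```python
# Hypothetical sketch of multi-model contrastive decoding at one step.
# The combination rule and alpha are illustrative assumptions.
import torch
import torch.nn.functional as F

def mcd_next_token_logits(base_logits, evil_logits, truthful_logits, alpha=1.0):
    """Boost tokens the truthful model prefers over the hallucination-prone
    'evil' model, on top of the base model's own distribution."""
    base = F.log_softmax(base_logits, dim=-1)
    evil = F.log_softmax(evil_logits, dim=-1)
    truthful = F.log_softmax(truthful_logits, dim=-1)
    return base + alpha * (truthful - evil)

# Toy usage over a 10-token vocabulary.
torch.manual_seed(0)
base, evil, truthful = (torch.randn(10) for _ in range(3))
print(mcd_next_token_logits(base, evil, truthful).argmax().item())
```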

ICLR Conference 2025 Conference Paper

D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

  • Zhongwei Wan
  • Xinjian Wu
  • Yu Zhang 0133
  • Yi Xin 0003
  • Chaofan Tao
  • Zhihong Zhu
  • Xin Wang 0120
  • Siqi Luo

Efficient generative inference in Large Language Models (LLMs) is impeded by the growing memory demands of the Key-Value (KV) cache, especially for longer sequences. Traditional KV cache eviction strategies, which discard less critical KV pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce Dynamic Discriminative Operations (D2O), a novel method that optimizes KV cache size dynamically and discriminatively at two levels without fine-tuning, while preserving essential context. At the layer level, by observing the varying densities of attention weights between shallow and deep layers, we dynamically determine which layers should avoid excessive eviction via our proposed dynamic allocation strategy to minimize information loss. At the token level, for the eviction strategy in each layer, D2O innovatively incorporates a compensation mechanism that maintains a similarity threshold to re-discriminate the importance of currently discarded tokens, determining whether they should be recalled and merged with similar tokens. Extensive experiments on various benchmarks and LLM architectures have shown that D2O not only achieves significant memory savings and enhances inference throughput by more than 3× but also maintains high-quality long-text generation.
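
The token-level compensation idea can be pictured as eviction followed by similarity-gated merging. The snippet below is illustrative only: the budget, similarity threshold, and mean-merging rule are assumptions, and the layer-level budget allocation is not shown.

```python
# Hypothetical sketch of token-level KV eviction with merge-based compensation,
# in the spirit of D2O. All hyperparameters here are assumptions.
import torch
import torch.nn.functional as F

def evict_and_merge(keys, values, scores, budget, sim_thresh=0.5):
    """keys/values: (seq, dim); scores: (seq,) importance, e.g. summed attention.
    Keep the top-`budget` entries; merge each evicted entry into its most
    similar kept key when cosine similarity exceeds `sim_thresh`."""
    keep = scores.topk(budget).indices
    kept = set(keep.tolist())
    k_keep, v_keep = keys[keep].clone(), values[keep].clone()
    for i in range(keys.size(0)):
        if i in kept:
            continue
        sims = F.cosine_similarity(keys[i].unsqueeze(0), k_keep, dim=-1)
        j = sims.argmax()
        if sims[j] >= sim_thresh:
            # Merge the evicted entry into its nearest kept neighbour
            # instead of discarding its information outright.
            k_keep[j] = (k_keep[j] + keys[i]) / 2
            v_keep[j] = (v_keep[j] + values[i]) / 2
    return k_keep, v_keep

# Toy usage: 16 cached tokens, keep a budget of 8.
torch.manual_seed(0)
K, V = torch.randn(16, 64), torch.randn(16, 64)
importance = torch.rand(16)
k2, v2 = evict_and_merge(K, V, importance, budget=8)
print(k2.shape, v2.shape)
```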

NeurIPS Conference 2024 Conference Paper

Benchmarking LLMs via Uncertainty Quantification

  • Fanghua Ye
  • Mingming Yang
  • Jianhui Pang
  • Longyue Wang
  • Derek F. Wong
  • Emine Yilmaz
  • Shuming Shi
  • Zhaopeng Tu

The proliferation of open-source Large Language Models (LLMs) from various institutions has highlighted the urgent need for comprehensive evaluation methods. However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect -- uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking approach for LLMs that integrates uncertainty quantification. Our examination involves nine LLMs (LLM series) spanning five representative natural language processing tasks. Our findings reveal that: I) LLMs with higher accuracy may exhibit lower certainty; II) Larger-scale LLMs may display greater uncertainty compared to their smaller counterparts; and III) Instruction-finetuning tends to increase the uncertainty of LLMs. These results underscore the significance of incorporating uncertainty in the evaluation of LLMs. Our implementation is available at https://github.com/smartyfh/LLM-Uncertainty-Bench.
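
As a rough illustration of attaching an uncertainty score to a multiple-choice evaluation item, the snippet below normalizes per-option log-likelihoods and reports predictive entropy alongside the predicted answer; it is a generic stand-in, not necessarily the benchmark's exact procedure (see the linked repository for that).

```python
# Illustrative sketch: score one multiple-choice item by both its prediction
# and a simple uncertainty measure (entropy over the answer options).
import math

def option_probabilities(option_logprobs):
    """Normalize per-option log-likelihoods into a distribution over options."""
    m = max(option_logprobs)
    exp = [math.exp(lp - m) for lp in option_logprobs]
    z = sum(exp)
    return [e / z for e in exp]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy usage: log-likelihoods a model assigns to options A-D of one question.
logprobs = {"A": -1.2, "B": -0.4, "C": -2.3, "D": -3.0}
probs = option_probabilities(list(logprobs.values()))
pred = max(zip(logprobs, probs), key=lambda kv: kv[1])[0]
print(pred, round(entropy(probs), 3))
```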

JBHI Journal 2024 Journal Article

BloodPatrol: Revolutionizing Blood Cancer Diagnosis - Advanced Real-Time Detection Leveraging Deep Learning & Cloud Technologies

  • Jinhang Wei
  • Longyue Wang
  • Zhecheng Zhou
  • Linlin Zhuo
  • Xiangxiang Zeng
  • Xiangzheng Fu
  • Quan Zou
  • Keqin Li

Cloud computing and Internet of Things (IoT) technologies are gradually becoming the technological changemakers in cancer diagnosis. Blood cancer is an aggressive disease affecting the blood, bone marrow, and lymphatic system, and its early detection is crucial for subsequent treatment. Flow cytometry has been widely studied as a commonly used method for detecting blood cancer. However, the high computation and resource consumption severely limit its practical application, especially in regions with limited medical and computational resources. In this study, with the help of cloud computing and IoT technologies, we develop a novel blood cancer dynamic monitoring diagnostic model named BloodPatrol based on an intelligent feature weight fusion mechanism. The proposed model is capable of capturing the dual-view importance relationship between cell samples and features, greatly improving prediction accuracy and significantly surpassing previous models. Besides, benefiting from the powerful processing ability of cloud computing, BloodPatrol can run on a distributed network to efficiently process large-scale cell data, which provides immediate and scalable blood cancer diagnostic services. We have also created a cloud diagnostic platform to facilitate access to our work; the latest access link and updates are available at: https://github.com/kkkayle/BloodPatrol.

ECAI Conference 2024 Conference Paper

On the Cultural Gap in Text-to-Image Generation

  • Bingshuai Liu
  • Longyue Wang
  • Chenyang Lyu
  • Yong Zhang 0034
  • Jinsong Su
  • Shuming Shi 0001
  • Zhaopeng Tu

One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data, which signifies the disparity in generated image quality when the cultural elements of the input text are rarely collected in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model’s ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-tuning data in the target culture, which is used to fine-tune a T2I model to improve cross-cultural generation. Experimental results show that our multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, in which the object-text alignment is crucial. We release the benchmark, data, code, and generated images to facilitate future research on culturally diverse T2I generation.
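
One way to picture the data-filtering step is as an alignment-gated filter over (image, caption) pairs. The sketch below leaves the alignment scorer abstract (a CLIP-style image-text similarity is one natural choice); the function names, threshold, and the "keep only if every mentioned object aligns" rule are assumptions, not the paper's released metric.

```python
# Hypothetical sketch: filter fine-tuning pairs by an object-text alignment
# score, keeping a pair only if the cultural objects mentioned in its caption
# are scored as sufficiently aligned with the image.
from typing import Callable, Iterable

def filter_by_object_alignment(
    pairs: Iterable[tuple[str, str, list[str]]],   # (image_path, caption, objects)
    score_fn: Callable[[str, str], float],          # alignment of one object phrase with the image
    thresh: float = 0.5,
):
    kept = []
    for image_path, caption, objects in pairs:
        # Require every mentioned cultural object to be sufficiently aligned.
        if all(score_fn(image_path, obj) >= thresh for obj in objects):
            kept.append((image_path, caption))
    return kept

# Toy usage with a dummy scorer standing in for a real image-text model.
dummy_score = lambda img, obj: 0.8 if "temple" in obj else 0.2
data = [("img1.png", "a red temple at dusk", ["temple"]),
        ("img2.png", "a street food stall", ["food stall"])]
print(filter_by_object_alignment(data, dummy_score))
```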

ECAI Conference 2024 Conference Paper

Reassessing Non-Autoregressive Neural Machine Translation with a Fine-Grained Error Taxonomy

  • Yan Liu
  • Longyue Wang
  • Zhaopeng Tu
  • Deyi Xiong

Non-autoregressive neural machine translation (NAT) has made remarkable progress since it was first proposed. The performance of NAT in terms of BLEU has approached or even matched that of autoregressive neural machine translation (AT). However, other evaluation metrics show that NAT still lags behind. Unfortunately, these metrics only provide a numerical difference, and it is unclear how the translations produced by NAT differ from those produced by AT. In addition, the multimodality problem is always a significant issue in NAT. To assess whether NAT models are fully capable of solving the multimodality problem and achieving the performance of AT, we specifically design an error taxonomy to annotate errors in translations. The taxonomy is grounded on a systematic and hierarchical error analysis. We carry out an extensive annotation with professional annotators and analyze four NAT models and two AT models. Our analysis and experiments show that (1) the number of errors in NAT translations marked by annotators is 1.54 times that of AT translations, (2) the multimodality problem of NAT affects translations from lexical to syntactic levels, and even up to discourse, and (3) the four NAT models cannot fully eradicate the multimodality problem despite mitigation efforts.

ICML Conference 2024 Conference Paper

VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

  • Yunxin Li
  • Baotian Hu
  • Haoyuan Shi
  • Wei Wang 0164
  • Longyue Wang
  • Min Zhang 0005

Large Multimodal Models (LMMs) have achieved impressive success in visual reasoning, particularly in visual mathematics. However, problem-solving capabilities in graph theory remain less explored for LMMs, despite being a crucial aspect of mathematical reasoning that requires an accurate understanding of graphical structures and multi-step reasoning on visual graphs. To step forward in this direction, we are the first to design a benchmark named VisionGraph, used to explore the capabilities of advanced LMMs in solving multimodal graph theory problems. It encompasses eight complex graph problem tasks, from connectivity to shortest path problems. Subsequently, we present a Description-Program-Reasoning (DPR) chain to enhance the logical accuracy of reasoning processes through graphical structure description generation and algorithm-aware multi-step reasoning. Our extensive study shows that 1) GPT-4V outperforms Gemini Pro in multi-step graph reasoning; 2) All LMMs exhibit inferior perception accuracy for graphical structures, whether in zero/few-shot settings or with supervised fine-tuning (SFT), which further affects problem-solving performance; 3) DPR significantly improves the multi-step graph reasoning capabilities of LMMs and the GPT-4V (DPR) agent achieves SOTA performance.
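
The Description-Program-Reasoning idea of separating structure extraction from algorithmic reasoning can be pictured as follows: the model first emits an edge-list description of the visual graph, and an explicit algorithm then answers the question. In the sketch below the description is hard-coded as a stand-in for model output, and the prompt and parsing details are assumptions.

```python
# Hypothetical sketch of a Description-Program-Reasoning style pipeline:
# describe the graph structure, then run a plain algorithm over it.
from collections import deque

def parse_edges(description: str):
    """Parse lines like 'A - B' from the model's structure description."""
    graph = {}
    for line in description.strip().splitlines():
        a, b = [x.strip() for x in line.split("-")]
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path_length(graph, src, dst):
    """Algorithm-aware step: BFS instead of free-form multi-step reasoning."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

described = """A - B
B - C
C - D
A - E"""
print(shortest_path_length(parse_edges(described), "A", "D"))  # -> 3
```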

ICLR Conference 2021 Conference Paper

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning

  • Xuebo Liu 0002
  • Longyue Wang
  • Derek F. Wong
  • Liang Ding 0006
  • Lidia S. Chao
  • Zhaopeng Tu

Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven effective on various NLP tasks. However, it is still not entirely clear why and when EncoderFusion should work. In this paper, our main contribution is to take a step further in understanding EncoderFusion. Many previous studies believe that the success of EncoderFusion comes from exploiting surface and syntactic information embedded in lower encoder layers. Unlike them, we find that the encoder embedding layer is more important than other intermediate encoder layers. In addition, the uppermost decoder layer consistently pays more attention to the encoder embedding layer across NLP tasks. Based on this observation, we propose a simple fusion method, SurfaceFusion, by fusing only the encoder embedding layer for the softmax layer. Experimental results show that SurfaceFusion outperforms EncoderFusion on several NLP benchmarks, including machine translation, text summarization, and grammatical error correction. It obtains state-of-the-art performance on the WMT16 Romanian-English and WMT14 English-French translation tasks. Extensive analyses reveal that SurfaceFusion learns more expressive bilingual word embeddings by building a closer relationship between relevant source and target embeddings. Source code is freely available at https://github.com/SunbowLiu/SurfaceFusion.
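
A minimal sketch of the "fuse only the encoder embedding layer at the softmax layer" idea, assuming the decoder state attends over the source embeddings and the pooled result is added before projecting to the vocabulary; the dimensions, fusion weight, and attention form are illustrative assumptions, not the released SurfaceFusion code.

```python
# Hypothetical sketch: shortcut from the encoder embedding layer into the
# output (softmax) layer. All sizes and the mixing rule are assumptions.
import torch
import torch.nn.functional as F

def surface_fuse_logits(dec_state, src_embed, out_proj, fuse_weight=0.5):
    """dec_state: (d,); src_embed: (src_len, d); out_proj: (vocab, d)."""
    attn = F.softmax(src_embed @ dec_state / dec_state.size(0) ** 0.5, dim=0)
    pooled = attn @ src_embed                      # (d,) summary of surface info
    fused = dec_state + fuse_weight * pooled       # shortcut from the embeddings
    return fused @ out_proj.t()                    # (vocab,) pre-softmax logits

torch.manual_seed(0)
d, src_len, vocab = 64, 7, 100
logits = surface_fuse_logits(torch.randn(d), torch.randn(src_len, d), torch.randn(vocab, d))
print(logits.shape)
```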

ICLR Conference 2021 Conference Paper

Understanding and Improving Lexical Choice in Non-Autoregressive Translation

  • Liang Ding 0006
  • Longyue Wang
  • Xuebo Liu 0002
  • Derek F. Wong
  • Dacheng Tao
  • Zhaopeng Tu

Knowledge distillation (KD) is essential for training non-autoregressive translation (NAT) models by reducing the complexity of the raw data with an autoregressive teacher model. In this study, we empirically show that as a side effect of this training, the lexical choice errors on low-frequency words are propagated to the NAT model from the teacher model. To alleviate this problem, we propose to expose the raw data to NAT models to restore the useful information of low-frequency words, which are missed in the distilled data. To this end, we introduce an extra Kullback-Leibler divergence term derived by comparing the lexical choice of NAT model and that embedded in the raw data. Experimental results across language pairs and model architectures demonstrate the effectiveness and universality of the proposed approach. Extensive analyses confirm our claim that our approach improves performance by reducing the lexical choice errors on low-frequency words. Encouragingly, our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
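
The extra objective can be pictured as a KL term between the model's token distributions and a lexical prior estimated from the raw (undistilled) data. The sketch below assumes such a prior is already available per target position (e.g. derived from word alignments); its construction and the loss weight are assumptions.

```python
# Hypothetical sketch of an extra raw-data objective: alongside the usual loss
# on distilled data, penalize divergence from a prior estimated on raw data.
import torch
import torch.nn.functional as F

def raw_data_kl(model_logits: torch.Tensor, raw_prior: torch.Tensor) -> torch.Tensor:
    """KL(raw_prior || model) averaged over positions.
    model_logits: (seq, vocab); raw_prior: (seq, vocab), rows sum to 1."""
    log_p_model = F.log_softmax(model_logits, dim=-1)
    return F.kl_div(log_p_model, raw_prior, reduction="batchmean")

torch.manual_seed(0)
seq, vocab = 5, 50
logits = torch.randn(seq, vocab)
prior = F.softmax(torch.randn(seq, vocab), dim=-1)   # stand-in for alignment-derived counts
loss = 0.1 * raw_data_kl(logits, prior)              # added to the distillation loss
print(loss.item())
```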

AAAI Conference 2020 Conference Paper

Go From the General to the Particular: Multi-Domain Translation with Domain Transformation Networks

  • Yong Wang
  • Longyue Wang
  • Shuming Shi
  • Victor O.K. Li
  • Zhaopeng Tu

The key challenge of multi-domain translation lies in simultaneously encoding both the general knowledge shared across domains and the particular knowledge distinctive to each domain in a unified model. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge, but misses the domain-specific knowledge. In response to this problem, we augment the NMT model with additional domain transformation networks to transform the general representations to domain-specific representations, which are subsequently fed to the NMT decoder. To guarantee the knowledge transformation, we also propose two complementary supervision signals by leveraging the power of knowledge distillation and adversarial learning. Experimental results on several language pairs, covering both balanced and unbalanced multi-domain translation, demonstrate the effectiveness and universality of the proposed approach. Encouragingly, the proposed unified model achieves comparable results with the fine-tuning approach that requires multiple models to preserve the particular knowledge. Further analyses reveal that the domain transformation networks successfully capture the domain-specific knowledge as expected.
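
A domain transformation network can be sketched as a small conditioning module between encoder and decoder. The snippet below is illustrative, assuming a residual correction of the shared encoder output conditioned on a domain embedding; the sizes and the form of the transformation are assumptions, and the distillation and adversarial supervision signals are not shown.

```python
# Hypothetical sketch of a domain transformation network that maps the shared
# encoder output to a domain-specific representation before the decoder.
import torch
import torch.nn as nn

class DomainTransform(nn.Module):
    def __init__(self, d_model: int, n_domains: int):
        super().__init__()
        self.domain_embed = nn.Embedding(n_domains, d_model)
        self.proj = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Tanh())

    def forward(self, enc_out: torch.Tensor, domain_id: torch.Tensor) -> torch.Tensor:
        """enc_out: (batch, src_len, d); domain_id: (batch,)."""
        dom = self.domain_embed(domain_id).unsqueeze(1).expand_as(enc_out)
        # General representation plus a residual, domain-conditioned correction.
        return enc_out + self.proj(torch.cat([enc_out, dom], dim=-1))

net = DomainTransform(d_model=32, n_domains=4)
x = torch.randn(2, 10, 32)
print(net(x, torch.tensor([0, 3])).shape)   # (2, 10, 32)
```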

AAAI Conference 2019 Conference Paper

Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement

  • Zi-Yi Dou
  • Zhaopeng Tu
  • Xing Wang
  • Longyue Wang
  • Shuming Shi
  • Tong Zhang

With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine translation. However, most of the previous methods combine layers in a static fashion in that their aggregation strategy is independent of specific hidden states. Inspired by recent progress on capsule networks, in this paper we propose to use routing-by-agreement strategies to aggregate layers dynamically. Specifically, the algorithm learns the probability of a part (individual layer representations) assigned to a whole (aggregated representations) in an iterative way and combines parts accordingly. We implement our algorithm on top of the state-of-the-art neural machine translation model TRANSFORMER and conduct experiments on the widely-used WMT14 English⇒German and WMT17 Chinese⇒English translation datasets. Experimental results across language pairs show that the proposed approach consistently outperforms the strong baseline model and a representative static aggregation model.
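
The routing-by-agreement aggregation can be sketched as a few iterations of capsule-style routing over the per-layer outputs at one position: coupling weights are refined toward layers whose representations agree with the current aggregate. The iteration count and squash nonlinearity below follow the standard capsule recipe and are assumptions, not necessarily the paper's exact variant.

```python
# Hypothetical sketch of routing-by-agreement over layer outputs ("parts")
# to form an aggregated representation ("whole").
import torch

def squash(v: torch.Tensor) -> torch.Tensor:
    norm2 = (v ** 2).sum(-1, keepdim=True)
    return (norm2 / (1 + norm2)) * v / (norm2.sqrt() + 1e-8)

def route_layers(layer_states: torch.Tensor, n_iter: int = 3) -> torch.Tensor:
    """layer_states: (n_layers, d) per-position layer outputs -> (d,) aggregate."""
    logits = torch.zeros(layer_states.size(0))
    for _ in range(n_iter):
        coupling = torch.softmax(logits, dim=0)          # how much each layer contributes
        agg = squash((coupling.unsqueeze(-1) * layer_states).sum(0))
        logits = logits + layer_states @ agg             # reward layers that agree
    return agg

torch.manual_seed(0)
print(route_layers(torch.randn(6, 16)).shape)   # aggregate of 6 layer outputs
```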

AAAI Conference 2018 Conference Paper

Translating Pro-Drop Languages With Reconstruction Models

  • Longyue Wang
  • Zhaopeng Tu
  • Shuming Shi
  • Tong Zhang
  • Yvette Graham
  • Qun Liu

Pronouns are frequently omitted in pro-drop languages, such as Chinese, generally leading to significant challenges with respect to the production of complete translations. To date, very little attention has been paid to the dropped pronoun (DP) problem within neural machine translation (NMT). In this work, we propose a novel reconstruction-based approach to alleviating DP translation problems for NMT models. Firstly, DPs within all source sentences are automatically annotated with parallel information extracted from the bilingual training corpus. Next, the annotated source sentence is reconstructed from hidden representations in the NMT model. With auxiliary training objectives, in terms of reconstruction scores, the parameters associated with the NMT model are guided to produce enhanced hidden representations that are encouraged as much as possible to embed annotated DP information. Experimental results on both Chinese–English and Japanese–English dialogue translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline, which is directly built on the training data annotated with DPs.
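
The auxiliary objective can be pictured as a small reconstructor that reads the NMT hidden states and is trained to regenerate the DP-annotated source, so those states are encouraged to retain dropped-pronoun information. The tiny GRU reconstructor, sizes, and loss weight below are assumptions for illustration, not the paper's architecture.

```python
# Hypothetical sketch of an auxiliary reconstruction objective over NMT hidden
# states, targeting the DP-annotated source sentence.
import torch
import torch.nn as nn

class Reconstructor(nn.Module):
    def __init__(self, d_model: int, vocab: int):
        super().__init__()
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        """hidden_states: (batch, len, d) from the NMT model -> per-position logits."""
        h, _ = self.rnn(hidden_states)
        return self.out(h)

batch, length, d, vocab = 2, 8, 32, 1000
recon = Reconstructor(d, vocab)
hidden = torch.randn(batch, length, d)                     # stand-in NMT hidden states
annotated_src = torch.randint(0, vocab, (batch, length))   # DP-annotated source ids
logits = recon(hidden)
recon_loss = nn.functional.cross_entropy(logits.reshape(-1, vocab), annotated_src.reshape(-1))
total_loss_example = 1.0 * recon_loss   # added to the usual translation loss
print(total_loss_example.item())
```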