Arrow Research search

Author name cluster

Nayu Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
1 author row

Possible papers (8)

AAAI Conference 2026 · Conference Paper

HyCoRA: Hyper-Contrastive Role-Adaptive Learning for Role-Playing

  • Shihao Yang
  • Zhicong Lu
  • Yong Yang
  • Bo Lv
  • Yang Shen
  • Nayu Liu

Multi-character role-playing aims to equip models with the capability to simulate diverse roles. Existing methods either use one shared parameterized module across all roles or assign a separate parameterized module to each role. However, the role-shared module may ignore distinct traits of each role, weakening personality learning, while the role-specific module may overlook shared traits across multiple roles, hindering commonality modeling. In this paper, we propose HyCoRA, a novel Hyper-Contrastive Role-Adaptive learning framework that efficiently improves the ability of multi-character role-playing agents by balancing the learning of distinct and shared traits. Specifically, we propose a Hyper-Half Low-Rank Adaptation structure, where one half is a role-specific module generated by a lightweight hyper-network and the other half is a trainable role-shared module. The role-specific module is devised to represent distinct persona signatures, while the role-shared module serves to capture common traits. Moreover, to better reflect distinct personalities across different roles, we design a hyper-contrastive learning mechanism to help the hyper-network distinguish their unique characteristics. Extensive experimental results on available English and Chinese benchmarks demonstrate the superiority of our framework. Further GPT-4 evaluations and visual analyses also verify the capability of HyCoRA to capture role characteristics.
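
To make the Hyper-Half structure concrete, below is a minimal PyTorch sketch of one adapted layer, assuming the hyper-network generates the down-projection half of the low-rank update while the up-projection half is shared across roles; the choice of halves, all names and shapes, and the InfoNCE-style stand-in for the hyper-contrastive objective are our assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch of a Hyper-Half LoRA layer (assumed layout:
# hyper-generated down-projection A per role, shared up-projection B).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperHalfLoRA(nn.Module):
    def __init__(self, d_in, d_out, rank, n_roles, role_dim=64):
        super().__init__()
        self.rank, self.d_in = rank, d_in
        self.role_emb = nn.Embedding(n_roles, role_dim)
        # Lightweight hyper-network: role embedding -> role-specific half A.
        self.hyper = nn.Linear(role_dim, rank * d_in)
        # Role-shared half B, trained directly and reused by every role.
        self.B = nn.Parameter(torch.zeros(d_out, rank))

    def delta(self, role_id):
        e = self.role_emb(role_id)                    # (role_dim,)
        A = self.hyper(e).view(self.rank, self.d_in)  # role-specific half
        return self.B @ A                             # low-rank update B A

    def forward(self, x, role_id, base_weight):
        # y = x (W_base + B A(role))^T
        return x @ (base_weight + self.delta(role_id)).T

def hyper_contrastive_loss(role_embs, temp=0.1):
    # InfoNCE-style stand-in: push role representations apart so the
    # hyper-network can tell personas apart.
    z = F.normalize(role_embs, dim=-1)
    sim = z @ z.T / temp
    labels = torch.arange(z.size(0))
    return F.cross_entropy(sim, labels)
```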

AAAI Conference 2026 · Conference Paper

Rectify Evaluation Preference: Improving LLMs’ Critique on Math Reasoning via Perplexity-aware Reinforcement Learning

  • Changyuan Tian
  • Zhicong Lu
  • Shuang Qian
  • Nayu Liu
  • Peiguang Li
  • Li Jin
  • Leiyi Hu
  • Zhizhao Zeng

To improve Multi-step Mathematical Reasoning (MsMR) in Large Language Models (LLMs), it is crucial to obtain scalable supervision from the corpus by automatically critiquing mistakes in the reasoning process and rendering a final verdict on the problem-solution. Most existing methods rely on crafting high-quality supervised fine-tuning demonstrations to enhance critiquing capability and pay little attention to the underlying reason for the poor critiquing performance of LLMs. In this paper, we take an orthogonal perspective: we quantify and investigate a potential reason, imbalanced evaluation preference, through a statistical preference analysis. Motivated by this analysis, a novel perplexity-aware reinforcement learning algorithm is proposed to rectify the evaluation preference, elevating the critiquing capability. Specifically, to probe LLMs' critiquing characteristics, a One-to-many Problem-Solution (OPS) benchmark is meticulously constructed to quantify the behavior difference of LLMs when evaluating problem solutions generated by themselves versus by others. Then, to investigate the behavior difference in depth, we conduct a statistical preference analysis oriented on perplexity and find an intriguing phenomenon: LLMs tend to judge solutions with lower perplexity as correct, which we dub imbalanced evaluation preference. To rectify this preference, we use perplexity as the baton in Group Relative Policy Optimization (GRPO), encouraging the LLMs to explore trajectories that judge lower-perplexity solutions as wrong and higher-perplexity solutions as correct. Extensive experimental results on our OPS benchmark and existing critic benchmarks demonstrate the validity of our method.
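
As a rough illustration of how perplexity could serve as the "baton" in GRPO, the hedged sketch below shapes the correctness reward so that verdicts running against the low-perplexity bias earn extra credit before the usual group-relative normalization; the weighting scheme and all names are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch: reward shaping that counteracts the "low perplexity
# looks correct" bias before GRPO's group-relative normalization.
import torch

def shaped_rewards(verdicts, labels, sol_ppl, ppl_median, bonus=0.5):
    # verdicts/labels: 1 = judged/actually correct, 0 = wrong (tensors).
    # sol_ppl: perplexity of each evaluated solution under the critic.
    base = (verdicts == labels).float()  # plain correctness reward
    # Anti-bias verdicts: low-ppl solution judged wrong, or high-ppl judged correct.
    anti_bias = ((sol_ppl < ppl_median) & (verdicts == 0)) | \
                ((sol_ppl >= ppl_median) & (verdicts == 1))
    return base * (1.0 + bonus * anti_bias.float())

def group_relative_advantages(rewards):
    # GRPO: normalize rewards within each sampled group of trajectories.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)
```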

AAAI Conference 2026 · Conference Paper

UMNet: Uncertainty-guided Memory Network for Hyperspectral Pansharpening

  • Xiaozheng Wang
  • Yong Yang
  • Shuying Huang
  • Nayu Liu
  • Ziyang Liu

At present, most hyperspectral (HS) pansharpening methods neither fully utilize the feature correlation between adjacent bands in HS images nor explore the feature uncertainty generated by the model during the fusion process. This can lead to inaccurate fusion features and, in turn, spatial and spectral distortions in the fusion results. To address these issues, we propose an uncertainty-guided memory network (UMNet) for HS pansharpening. A spatial-spectral recurrent fusion unit (SRFU) is designed based on the concept of temporal data modeling, which utilizes the correlation between adjacent bands to fuse spectral and spatial features from panchromatic (PAN) and low-resolution hyperspectral (LRHS) images. In SRFU, a state memory interaction unit (SMIU) is constructed based on non-negative matrix factorization (NMF) to learn the global spatial-spectral dependency of PAN and HS images in the recurrent state space. Moreover, based on uncertainty theory, we define two spatial-spectral uncertainty-guided loss functions for the HS pansharpening task to train the model step by step, ensuring that the network can reconstruct more accurate spectral and spatial features. Extensive experiments on three widely used datasets demonstrate that, compared with state-of-the-art (SOTA) methods, the proposed UMNet achieves significant improvements in both spatial and spectral quality metrics.
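
The abstract does not give the loss formulas, but a common heteroscedastic form conveys the idea of uncertainty-guided training; the sketch below is an assumption about what such a spatial-spectral loss could look like, not the paper's definition.

```python
# Hedged sketch of an uncertainty-guided reconstruction loss in the
# common heteroscedastic form; the paper's exact losses may differ.
import torch

def uncertainty_loss(pred, target, log_var):
    # High predicted uncertainty down-weights a pixel/band's residual,
    # while the log-variance term keeps the network from declaring
    # everything uncertain.
    return (torch.exp(-log_var) * (pred - target).abs() + log_var).mean()
```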

NeurIPS Conference 2025 · Conference Paper

SpecEM: Training-Free LLM Ensembling via Iterative Drafting, Verification, and Online Feedback

  • Bo Lv
  • Nayu Liu
  • Chen Tang
  • Xin Liu
  • Yue Yu
  • Ping Luo

Ensembles of generative large language models (LLMs) are a promising way to compensate for individual model limitations, integrating the strengths of different LLMs. Existing LLM ensemble methods, however, face limitations such as first-token delay and challenges in long-range semantic collaboration between models. Moreover, they typically assume equal voting weights for all models during ensembling, ignoring performance differences between models on a given task. In this work, we propose SpecEM, a training-free, plug-and-play LLM ensemble framework that dynamically adjusts each model's contribution in real time based on task performance. Inspired by speculative decoding, SpecEM iteratively performs drafting and verification, allowing models to collaborate semantically at the segment level for integrated output. Furthermore, we introduce an online feedback mechanism with multiplicative weight updates, where each model's voting weight is adjusted on the fly according to how often it "outperforms" others during the verification stage, ensuring that stronger models exert greater influence on the ensemble during generation. Experimental results on five popular LLMs (ranging from 7B to 72B parameters) and six benchmark tasks, spanning instruction following, reasoning, commonsense, and general instruction response, demonstrate consistent performance improvements compared to state-of-the-art LLM ensemble methods.
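
A minimal sketch of the multiplicative-weights feedback loop, assuming a model's voting weight is multiplied up whenever its draft wins a verification round and the weights are then renormalized; the exact form of the update rule and the names are our guesses, not the paper's algorithm.

```python
# Hedged sketch of the online multiplicative-weights feedback:
# the model whose draft wins verification gets its vote scaled up.
def update_weights(weights, winner, eta=0.1):
    weights = {m: w * (1.0 + eta) if m == winner else w
               for m, w in weights.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}  # renormalize

# Usage: start uniform, then reinforce whichever model won this round.
weights = {"model_a": 1 / 3, "model_b": 1 / 3, "model_c": 1 / 3}
weights = update_weights(weights, winner="model_b")
```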

AAAI Conference 2024 · Conference Paper

CAMEL: Capturing Metaphorical Alignment with Context Disentangling for Multimodal Emotion Recognition

  • Linhao Zhang
  • Li Jin
  • Guangluan Xu
  • Xiaoyu Li
  • Cai Xu
  • Kaiwen Wei
  • Nayu Liu
  • Haonan Liu

Understanding the emotional polarity of multimodal content with metaphorical characteristics, such as memes, poses a significant challenge in Multimodal Emotion Recognition (MER). Previous MER research has overlooked the phenomenon of metaphorical alignment in multimedia content, which involves non-literal associations between concepts to convey implicit emotional tones. Metaphor-agnostic MER methods may be misled by isolated unimodal emotions, which are distinct from the real emotions blended in multimodal metaphors. Moreover, contextual semantics can further affect the emotions associated with similar metaphors, leading to the challenge of maintaining contextual compatibility. To address the issue of metaphorical alignment in MER, we propose to leverage a conditional generative approach for capturing metaphorical analogies. Our approach formulates schematic prompts and corresponding references based on theoretical foundations, which allows the model to better grasp metaphorical nuances. To maintain contextual sensitivity, we incorporate a disentangled contrastive matching mechanism, whose intensity is regulated by a curricular adjustment during the learning process. Automatic and human evaluation experiments on two benchmarks show that our model provides considerable and stable improvements in recognizing multimodal emotion with metaphorical attributes.
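
The abstract describes the contrastive matching term only at a high level; the sketch below shows one plausible reading, a symmetric image-text InfoNCE loss whose intensity is ramped in by a curriculum schedule. The disentangling component is not modeled, and every specific here is an assumption.

```python
# Hedged sketch: symmetric image-text InfoNCE whose weight is eased in
# by a curriculum; the disentangling component is not modeled here.
import torch
import torch.nn.functional as F

def contrastive_matching(img_z, txt_z, step, total_steps, temp=0.07):
    img_z = F.normalize(img_z, dim=-1)
    txt_z = F.normalize(txt_z, dim=-1)
    logits = img_z @ txt_z.T / temp              # (batch, batch) similarities
    labels = torch.arange(img_z.size(0))         # matched pairs on the diagonal
    loss = (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
    ramp = min(1.0, step / (0.3 * total_steps))  # curricular intensity schedule
    return ramp * loss
```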

AAAI Conference 2024 · Conference Paper

Video Event Extraction with Multi-View Interaction Knowledge Distillation

  • Kaiwen Wei
  • Runyan Du
  • Li Jin
  • Jian Liu
  • Jianhua Yin
  • Linhao Zhang
  • Jintao Liu
  • Nayu Liu

Video event extraction (VEE) aims to extract key events from a video and generate the event arguments for their semantic roles. Although promising results have been achieved by existing methods, they still lack an elaborate learning strategy that adequately considers: (1) inter-object interaction, which reflects the relations between objects; and (2) inter-modality interaction, which aligns the features from the text and video modalities. In this paper, we propose a Multi-view Interaction with knowledge Distillation (MID) framework to solve the above problems with a Knowledge Distillation (KD) mechanism. Specifically, we propose self-Relational KD (self-RKD) to enhance inter-object interaction, where the relation between objects is measured by a distance metric and the high-level relational knowledge from the deeper layer is taken as guidance for boosting the shallow layers in the video encoder. Meanwhile, to improve inter-modality interaction, Layer-to-layer KD (LKD) is proposed, which integrates additional cross-modal supervision (i.e., the results of cross-attention) with the textual supervising signal for training each transformer decoder layer. Extensive experiments show that, without any additional parameters, MID achieves state-of-the-art performance compared to other strong methods in VEE.
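
A hedged sketch of the self-relational KD idea as we read it: build distance-based relation matrices over object features and let the deeper layer's matrix guide the shallower one. The normalization and loss choices below are our assumptions.

```python
# Hedged sketch of self-relational KD: the deeper layer's pairwise
# object-relation structure guides the shallower layer.
import torch
import torch.nn.functional as F

def relation_matrix(feats):
    # feats: (n_objects, d). Row-normalized closeness between objects.
    d = torch.cdist(feats, feats)   # (n, n) pairwise L2 distances
    return F.softmax(-d, dim=-1)    # closer objects -> stronger relation

def self_relational_kd(shallow_feats, deep_feats):
    with torch.no_grad():
        teacher = relation_matrix(deep_feats)   # deeper layer as guidance
    student = relation_matrix(shallow_feats)
    return F.kl_div(student.log(), teacher, reduction="batchmean")
```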

AAAI Conference 2023 · Conference Paper

TOT: Topology-Aware Optimal Transport for Multimodal Hate Detection

  • Linhao Zhang
  • Li Jin
  • Xian Sun
  • Guangluan Xu
  • Zequn Zhang
  • Xiaoyu Li
  • Nayu Liu
  • Qing Liu

Multimodal hate detection, which aims to identify harmful online content such as memes, is crucial for building a wholesome internet environment. Previous work has made enlightening explorations in detecting explicit hate remarks. However, most approaches neglect the analysis of implicit harm, which is particularly challenging as explicit text markers and demographic visual cues are often twisted or missing. The cross-modal attention mechanisms they leverage also suffer from the distributional modality gap and lack logical interpretability. To address these semantic gap issues, we propose TOT, a topology-aware optimal transport framework that deciphers implicit harm in memes by formulating the cross-modal alignment problem as the solution of optimal transport plans. Specifically, we leverage an optimal transport kernel method to capture complementary information from multiple modalities. The kernel embedding provides a non-linear transformation into a reproducing kernel Hilbert space (RKHS), which is significant for eliminating the distributional modality gap. Moreover, we perceive topology information based on the aligned representations to conduct bipartite-graph path reasoning. The newly achieved state-of-the-art performance on two publicly available benchmark datasets, together with further visual analysis, demonstrates the superiority of TOT in capturing implicit cross-modal alignment.
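
For readers unfamiliar with optimal transport, the standard Sinkhorn iteration below computes an entropic transport plan between two modality feature sets; it illustrates the general machinery TOT builds on, but the paper's kernel embedding and topology-aware components are not reproduced here.

```python
# Standard Sinkhorn iteration for an entropic optimal transport plan
# between two feature sets with uniform marginals (background, not TOT).
import torch

def sinkhorn(cost, eps=0.05, iters=50):
    # cost: (n, m) cross-modal cost matrix, e.g. pairwise feature distances.
    n, m = cost.shape
    K = torch.exp(-cost / eps)          # Gibbs kernel
    u = torch.full((n,), 1.0 / n)       # source marginal
    v = torch.full((m,), 1.0 / m)       # target marginal
    a, b = torch.ones(n), torch.ones(m)
    for _ in range(iters):              # alternating marginal projections
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :]  # transport plan
```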

AAAI Conference 2022 · Conference Paper

PolygonE: Modeling N-ary Relational Data as Gyro-Polygons in Hyperbolic Space

  • Shiyao Yan
  • Zequn Zhang
  • Xian Sun
  • Guangluan Xu
  • Shuchao Li
  • Qing Liu
  • Nayu Liu
  • Shensi Wang

N-ary relational knowledge base (KB) embedding aims to map binary and beyond-binary facts into a low-dimensional vector space simultaneously. Existing approaches typically decompose n-ary relational facts into subtuples, and they generally model n-ary relational KBs in Euclidean space. However, n-ary relational facts are semantically and structurally intact; decomposition undermines their semantic and structural integrity. Moreover, compared to binary relational KBs, n-ary ones are characterized by more abundant and complicated hierarchical structures, which cannot be well expressed in Euclidean space. To address these issues, we propose PolygonE, a gyro-polygon embedding framework that preserves the integrity of n-ary facts and captures their hierarchy. Specifically, n-ary relational facts are modeled as gyro-polygons in hyperbolic space, where we denote entities in facts as vertexes of gyro-polygons and relations as entity translocation operations. Importantly, we design a fact plausibility measuring strategy based on the vertex-gyrocentroid geodesic to optimize the relation-adjusted gyro-polygon. Experimental results demonstrate that PolygonE shows SOTA performance on all benchmark datasets and generalizes well to binary data. Finally, we also visualize the embeddings to help comprehend PolygonE's awareness of hierarchies.
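
As background for the vertex-gyrocentroid geodesic, the sketch below implements the standard Poincaré-ball distance used in hyperbolic embeddings; PolygonE's gyro-polygon construction and relation operations are paper-specific and not reproduced here.

```python
# Geodesic distance in the Poincare ball, the standard hyperbolic
# primitive behind vertex-gyrocentroid geodesics (background only).
import torch

def poincare_distance(x, y, eps=1e-6):
    sq = ((x - y) ** 2).sum(-1)
    nx = (1 - (x ** 2).sum(-1)).clamp_min(eps)  # 1 - ||x||^2, kept positive
    ny = (1 - (y ** 2).sum(-1)).clamp_min(eps)
    return torch.acosh(1 + 2 * sq / (nx * ny))
```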