
Author name cluster

Nan Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
1 author row

Possible papers (8)

AAAI Conference 2025 · Conference Paper

3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

  • Xindian Ma
  • Wenyuan Liu
  • Peng Zhang
  • Nan Xu

An essential component in Large Language Models (LLMs) is Rotary Position Encoding (RoPE), which efficiently manages positional dependencies in long-context modeling. However, when the number of input tokens surpasses the pretrained capacity of LLMs, their ability to process and generate text is markedly weakened. Although position interpolation techniques for RoPE can mitigate this issue, increasing the degree of interpolation decreases positional resolution. To tackle this challenge, drawing inspiration from the Bloch sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D RoPE, with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay, 3D-RPE allows long-term decay to be regulated within the chunk size, ensuring that relative positional information is still modeled between tokens at large relative distances. For improved position resolution, 3D-RPE mitigates the degradation of position resolution caused by position interpolation on RoPE. We have conducted experiments on long-context Natural Language Understanding (NLU) and long-sequence Language Modeling (LM) tasks. The experimental results show that 3D-RPE achieves performance improvements over RoPE, especially on long-context NLU tasks.
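
As background for the abstract above, here is a minimal NumPy sketch of the standard 2D RoPE that 3D-RPE extends: each pair of query/key dimensions is rotated by an angle proportional to the token position, so attention scores carry relative positional information. The 3D spherical construction of 3D-RPE itself is not reproduced; the function and parameter names below are illustrative only.

```python
import numpy as np

def rope_2d(x, base=10000.0):
    """Apply standard 2D rotary position encoding to x of shape (seq_len, dim).

    Each consecutive pair of dimensions (2i, 2i+1) is rotated by
    position * theta_i, where theta_i = base ** (-2i / dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    theta = base ** (-np.arange(half) * 2.0 / dim)           # rotation frequencies
    angles = np.arange(seq_len)[:, None] * theta[None, :]    # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# The rotated queries and keys make attention logits depend on the relative
# position of tokens, the property that RoPE (and 3D-RPE) rely on.
q = rope_2d(np.random.randn(128, 64))
k = rope_2d(np.random.randn(128, 64))
scores = q @ k.T
```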

IS Journal 2022 · Journal Article

An Orthogonal Subspace Decomposition Method for Cross-Modal Retrieval

  • Zhixiong Zeng
  • Nan Xu
  • Wenji Mao
  • Daniel Zeng

As a general characteristic observed in real-world datasets, multimodal data are usually partially associated: they comprise information commonly shared across modalities (i.e., modality-shared information) and information that exists only in a single modality (i.e., modality-specific information). Cross-modal retrieval methods typically use this information in multimodal data as a whole and project it into a common representation space to compute the similarity measure. In fact, only modality-shared information can be well aligned when learning common representations, whereas modality-specific information usually introduces an interference term and decreases cross-modal retrieval performance. Explicitly distinguishing and utilizing these two kinds of multimodal information is important for cross-modal retrieval, but has rarely been studied in previous research. In this article, we explicitly distinguish and utilize modality-shared and modality-specific features to learn better common representations, and propose an orthogonal subspace decomposition method for cross-modal retrieval. Specifically, we introduce a structure preservation loss to ensure that modality-shared information is well preserved, and optimize the intramodal discrimination loss and intermodal invariance loss to learn semantically discriminative features for cross-modal retrieval. We conduct comprehensive experiments on four widely used benchmark datasets, and the experimental results demonstrate the effectiveness of our proposed method.
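
To make the decomposition idea concrete, here is a minimal PyTorch-style sketch, assuming each modality's feature is split into a shared part and a specific part whose overlap is penalized with an orthogonality term; the paper's actual losses (structure preservation, intramodal discrimination, intermodal invariance) are not reproduced, and all module and variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubspaceDecomposer(nn.Module):
    """Project a modality feature into modality-shared and modality-specific parts."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.shared = nn.Linear(in_dim, out_dim)
        self.specific = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.shared(x), self.specific(x)

def orthogonality_loss(shared, specific):
    # Penalize the squared Frobenius norm of the cross-correlation between the
    # shared and specific features, pushing the two subspaces apart.
    shared = F.normalize(shared, dim=-1)
    specific = F.normalize(specific, dim=-1)
    return (shared.transpose(0, 1) @ specific).pow(2).mean()

# Usage sketch: only the shared parts of image and text enter the common
# retrieval space; the specific parts are kept apart by the orthogonality term.
img_dec, txt_dec = SubspaceDecomposer(2048, 256), SubspaceDecomposer(768, 256)
img_sh, img_sp = img_dec(torch.randn(32, 2048))
txt_sh, txt_sp = txt_dec(torch.randn(32, 768))
loss = (orthogonality_loss(img_sh, img_sp)
        + orthogonality_loss(txt_sh, txt_sp)
        + F.mse_loss(img_sh, txt_sh))  # simple stand-in for an invariance loss
```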

IS Journal 2021 · Journal Article

MDA: Multimodal Data Augmentation Framework for Boosting Performance on Sentiment/Emotion Classification Tasks

  • Nan Xu
  • Wenji Mao
  • Penghui Wei
  • Daniel Zeng

Multimodal data analysis has drawn increasing attention with the explosive growth of multimedia data. Although traditional unimodal analysis tasks have accumulated abundant labeled datasets, there are few labeled multimodal datasets due to the difficulty and complexity of multimodal data annotation, nor is it easy to directly transfer unimodal knowledge to multimodal data. Unfortunately, there is little related data augmentation work in the multimodal domain, especially for image–text data. In this article, to address the scarcity of labeled multimodal data, we propose a Multimodal Data Augmentation (MDA) framework for boosting performance on multimodal image–text classification tasks. Our framework learns a cross-modality matching network to select image–text pairs from existing unimodal datasets as a synthetic multimodal dataset, and uses this dataset to enhance the performance of classifiers. We take multimodal sentiment analysis and multimodal emotion analysis as the experimental tasks, and the experimental results show the effectiveness of our framework for boosting performance on multimodal classification tasks.
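
A hedged sketch of the selection step described above, assuming a matching network `match_net` that scores image–text compatibility as a single logit; the network architecture, the threshold value, and the same-label pairing strategy are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

def build_synthetic_pairs(match_net, image_set, text_set, threshold=0.9):
    """Pair examples from two unimodal labeled datasets into image-text pairs.

    image_set / text_set: lists of (feature, label). For each image, the
    best-matching text with the same label is kept if the matching network
    scores the pair above the threshold.
    """
    synthetic = []
    with torch.no_grad():
        for img, img_lab in image_set:
            candidates = [txt for txt, lab in text_set if lab == img_lab]
            if not candidates:
                continue
            scores = torch.stack(
                [torch.sigmoid(match_net(img, txt)).reshape(()) for txt in candidates]
            )
            best = scores.argmax().item()
            if scores[best] >= threshold:
                synthetic.append((img, candidates[best], img_lab))
    return synthetic
```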

AAAI Conference 2020 · Conference Paper

MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control

  • Xinshi Zang
  • Huaxiu Yao
  • Guanjie Zheng
  • Nan Xu
  • Kai Xu
  • Zhenhui Li

Using reinforcement learning for traffic signal control has attracted increasing interest recently. Various value-based reinforcement learning methods have been proposed to deal with this classical transportation problem and have achieved better performance than traditional transportation methods. However, current reinforcement learning models rely on tremendous amounts of training data and computational resources, which may have bad consequences (e.g., traffic jams or accidents) in the real world. In traffic signal control, some algorithms have been proposed to enable quick learning from scratch, but little attention has been paid to learning by transferring and reusing learned experience. In this paper, we propose a novel framework, named MetaLight, to speed up the learning process in new scenarios by leveraging the knowledge learned from existing scenarios. MetaLight is a value-based meta-reinforcement learning workflow based on the representative gradient-based meta-learning algorithm MAML, which periodically alternates between individual-level adaptation and global-level adaptation. Moreover, MetaLight improves the state-of-the-art reinforcement learning model FRAP for traffic signal control by optimizing its model structure and updating paradigm. Experiments on four real-world datasets show that our proposed MetaLight not only adapts more quickly and stably to new traffic scenarios, but also achieves better performance.
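
A minimal sketch of the MAML-style update pattern the abstract refers to, using a first-order (Reptile-style) approximation of the outer update: each scenario adapts a copy of the shared parameters (individual-level adaptation), and the shared initialization is then moved toward the adapted weights (global-level adaptation). The FRAP model structure is not shown, and `scenario.sample_batches` and `value_loss` are illustrative assumptions.

```python
import copy
import torch

def maml_meta_step(model, scenarios, value_loss, inner_lr=1e-3, meta_lr=1e-4):
    """One meta-update of the shared initialization over a batch of scenarios."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for scenario in scenarios:
        adapted = copy.deepcopy(model)
        opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for batch in scenario.sample_batches(n=5):        # individual-level adaptation
            opt.zero_grad()
            value_loss(adapted, batch).backward()         # TD / value regression loss
            opt.step()
        for g, p0, p1 in zip(meta_grads, model.parameters(), adapted.parameters()):
            g += p0.data - p1.data                        # direction toward adapted weights
    with torch.no_grad():                                 # global-level adaptation
        for p, g in zip(model.parameters(), meta_grads):
            p -= meta_lr * g / len(scenarios)
    return model
```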

AAAI Conference 2020 · Conference Paper

Toward A Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control

  • Chacha Chen
  • Hua Wei
  • Nan Xu
  • Guanjie Zheng
  • Ming Yang
  • Yuanhao Xiong
  • Kai Xu
  • Zhenhui Li

Traffic congestion plagues cities around the world. Recent years have witnessed an unprecedented trend in applying reinforcement learning for traffic signal control. However, the primary challenge is to control and coordinate traffic lights in large-scale urban networks. No one has ever tested RL models on a network of more than a thousand traffic lights. In this paper, we tackle the problem of multi-intersection traffic signal control, especially for large-scale networks, based on RL techniques and transportation theories. This problem is quite difficult because of challenges such as scalability, signal coordination, and data feasibility. To address these challenges, we (1) design our RL agents utilizing the 'pressure' concept to achieve region-level signal coordination; (2) show that implicit coordination can be achieved by individual control agents with a well-crafted reward design, thus reducing the dimensionality; and (3) conduct extensive experiments on multiple scenarios, including a real-world scenario with 2,510 traffic lights in Manhattan, New York City.
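
A minimal sketch of the 'pressure' quantity mentioned in point (1), following the common definition in this line of work as the difference between vehicle queues on incoming and outgoing lanes of an intersection; the paper's exact state and reward design is not reproduced, and the reward form below is an assumption.

```python
def intersection_pressure(incoming_queues, outgoing_queues):
    """Pressure of one intersection: vehicles queued on incoming lanes
    minus vehicles on outgoing lanes."""
    return sum(incoming_queues) - sum(outgoing_queues)

def pressure_reward(incoming_queues, outgoing_queues):
    # Negative absolute pressure as reward: each agent is rewarded for keeping
    # inflow and outflow balanced, which yields implicit network-level coordination.
    return -abs(intersection_pressure(incoming_queues, outgoing_queues))

# Example: 12 vehicles queued on incoming lanes, 5 on outgoing lanes.
print(pressure_reward([4, 3, 5], [2, 3]))   # -> -7
```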

AAAI Conference 2019 · Conference Paper

Multi-Interactive Memory Network for Aspect Based Multimodal Sentiment Analysis

  • Nan Xu
  • Wenji Mao
  • Guandan Chen

As a fundamental task of sentiment analysis, aspect-level sentiment analysis aims to identify the sentiment polarity of a specific aspect in the context. Previous work on aspect-level sentiment analysis is text-based. With the prevalence of multimodal user-generated content (e.g., text and image) on the Internet, multimodal sentiment analysis has attracted increasing research attention in recent years. In the context of aspect-level sentiment analysis, multimodal data are often more informative than text-only data, and exhibit various correlations, including the impact that the aspect brings to the text and image as well as the interactions between text and image. However, no related work has so far been carried out at the intersection of aspect-level and multimodal sentiment analysis. To fill this gap, we are among the first to put forward the new task, aspect-based multimodal sentiment analysis, and propose a novel Multi-Interactive Memory Network (MIMN) model for this task. Our model includes two interactive memory networks to supervise the textual and visual information with the given aspect, and learns not only the interactive influences between cross-modality data but also the self-influences within single-modality data. We provide a new publicly available multimodal aspect-level sentiment dataset to evaluate our model, and the experimental results demonstrate the effectiveness of our proposed model for this new task.
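
A minimal PyTorch sketch of one aspect-guided memory hop of the kind the abstract describes: the aspect representation attends over a textual or visual memory, and the attended summary updates the query for the next hop. The actual multi-interactive coupling between the two memories is not reproduced, and all tensor names and sizes are illustrative.

```python
import torch
import torch.nn.functional as F

def memory_hop(query, memory):
    """One attention hop: query (batch, d) attends over memory (batch, n, d)."""
    attn = F.softmax(torch.bmm(memory, query.unsqueeze(-1)).squeeze(-1), dim=-1)
    summary = torch.bmm(attn.unsqueeze(1), memory).squeeze(1)   # (batch, d)
    return query + summary   # updated query feeds the next hop

aspect = torch.randn(8, 128)            # aspect embedding
text_memory = torch.randn(8, 30, 128)   # word-level textual features
image_memory = torch.randn(8, 49, 128)  # region-level visual features

q_text, q_img = aspect, aspect
for _ in range(3):                      # multiple memory hops per modality
    q_text = memory_hop(q_text, text_memory)
    q_img = memory_hop(q_img, image_memory)
```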