Arrow Research search

Author name cluster

Ang Lv

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

6

ICML Conference 2025 Conference Paper

Autonomy-of-Experts Models

  • Ang Lv
  • Ruobing Xie
  • Yining Qian
  • Songhao Wu
  • Xingwu Sun
  • Zhanhui Kang
  • Di Wang 0052
  • Rui Yan 0001

Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation between the router’s decision-making and the experts’ execution is a critical yet overlooked issue, leading to suboptimal expert selection and ineffective learning. To address this, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which experts autonomously select themselves to process inputs. AoE is based on the insight that an expert is aware of its own capacity to effectively process a token, an awareness reflected in the scale of its internal activations. In AoE, routers are removed; instead, experts pre-compute internal activations for inputs and are ranked based on their activation norms. Only the top-ranking experts proceed with the forward pass, while the others abort. The overhead of pre-computing activations is reduced through a low-rank weight factorization. This self-evaluating-then-partner-comparing approach ensures improved expert selection and effective learning. We pre-train language models with 700M to 4B parameters, demonstrating that AoE outperforms traditional MoE models with comparable efficiency.
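The selection mechanism the abstract describes can be sketched as follows. This is a minimal NumPy illustration of the idea, not the paper's code; all names, dimensions, and the use of a single low-rank factor per expert are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_low, n_experts, top_k = 16, 4, 8, 2

# Hypothetical low-rank factors: each expert's first projection W1 ~ A @ B,
# so the cheap pre-activation x @ A approximates the expert's response scale.
A = rng.standard_normal((n_experts, d_model, d_low))

def aoe_select(x, A, top_k):
    """Rank experts by the norm of their low-rank pre-activation;
    only the top-k experts would run their full forward pass."""
    pre = np.einsum('d,edl->el', x, A)    # cheap partial activations, (experts, d_low)
    norms = np.linalg.norm(pre, axis=-1)  # self-evaluation signal per expert
    chosen = np.argsort(-norms)[:top_k]   # partner comparison: keep the largest
    return chosen, norms

x = rng.standard_normal(d_model)
chosen, norms = aoe_select(x, A, top_k)
```

The router disappears entirely: the ranking signal is computed from the experts' own (factorized) weights, which is the "self-evaluating-then-partner-comparing" step.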

IJCAI Conference 2025 Conference Paper

GETMusic: Generating Music Tracks with a Unified Representation and Diffusion Framework

  • Ang Lv
  • Xu Tan
  • Peiling Lu
  • Wei Ye
  • Shikun Zhang
  • Jiang Bian
  • Rui Yan

Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrument tracks based on provided source tracks. In practical scenarios where there is a predefined ensemble of tracks and various composition needs, an efficient and effective generative model that can generate any target tracks based on the other tracks becomes crucial. However, previous efforts have fallen short of addressing this need due to limitations in their music representations and models. In this paper, we introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks". This framework encompasses a novel music representation, "GETScore", and a diffusion model, "GETDiff". GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time. At each training step, every track of a music piece is randomly selected as either the target or the source. Training involves two processes: in the forward process, target tracks are corrupted by masking their tokens, while source tracks remain as the ground truth; in the denoising process, GETDiff is trained to predict the masked target tokens conditioned on the source tracks. Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations. Our experiments demonstrate that the versatile GETMusic outperforms prior works proposed for specific composition tasks.
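The forward (corruption) process over the 2D track-by-time grid can be sketched as below. This is a hypothetical illustration of the masking scheme only; the grid shape, vocabulary, and `MASK` sentinel are assumptions, not GETMusic's actual tokenization.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK = -1
n_tracks, n_steps, vocab = 4, 8, 100

# A GETScore-like 2D grid: tracks stacked vertically, time flowing horizontally.
score = rng.integers(0, vocab, size=(n_tracks, n_steps))

def corrupt(score, target_tracks):
    """Forward process: mask every token of the target tracks,
    leaving the source tracks as ground truth."""
    noisy = score.copy()
    noisy[target_tracks, :] = MASK
    return noisy

targets = [1, 3]          # randomly chosen target tracks for this step
noisy = corrupt(score, targets)
```

The denoiser would then be trained to fill the masked rows conditioned on the untouched source rows, which is what makes any source-target track split available at inference.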

NeurIPS Conference 2025 Conference Paper

PolarQuant: Leveraging Polar Transformation for Key Cache Quantization and Decoding Acceleration

  • Songhao Wu
  • Ang Lv
  • Xiao Feng
  • Yufei Zhang
  • Xun Zhang
  • Guojun Yin
  • Wei Lin
  • Rui Yan

The increasing demand for long-context generation has made the KV cache in large language models a bottleneck in memory consumption. Quantizing the cache to lower bit widths is an effective way to reduce memory costs; however, previous methods struggle with key cache quantization due to outliers, resulting in suboptimal performance. We propose a novel quantization approach, PolarQuant, which provides a new perspective on key cache quantization and efficiently addresses the outlier dilemma. We observe that the distribution of the key states reveals well-structured patterns under polar transformation. Outliers generally appear in only one of the two dimensions, which are rotated together by a specific angle when rotary position embeddings are applied. When represented as two-dimensional vectors, these dimensions exhibit well-organized patterns, with radii and angles smoothly distributed in polar space. This alleviates the channel-wise outliers, making them well-suited for key cache quantization. PolarQuant divides key vectors into groups of two-dimensional sub-vectors, encoding them as a quantized radius and polar angle, rather than quantizing the original key vectors directly. PolarQuant achieves superior efficiency in KV cache quantization and accelerates the decoding process by turning the query-key inner product into a table lookup, all while maintaining the downstream performance of full-precision models. Our code is available at https://github.com/ericshwu/PolarQuant.
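The core encoding can be sketched as follows: split the key vector into 2-D sub-vectors, store a quantized radius and angle for each, and reconstruct on read. This is a simplified, hypothetical version (uniform codebooks, per-vector radius scale); the paper's actual quantizer and bit allocation may differ.

```python
import numpy as np

def polar_quant(k, r_bits=4, a_bits=4):
    """Encode a key vector as quantized (radius, angle) per 2-D sub-vector."""
    pairs = k.reshape(-1, 2)
    r = np.linalg.norm(pairs, axis=1)
    theta = np.arctan2(pairs[:, 1], pairs[:, 0])        # angle in [-pi, pi]
    r_max = r.max() + 1e-8                              # per-vector radius scale
    r_q = np.round(r / r_max * (2**r_bits - 1))         # uniform radius codes
    t_q = np.round((theta + np.pi) / (2 * np.pi) * (2**a_bits - 1))
    return r_q.astype(int), t_q.astype(int), r_max

def polar_dequant(r_q, t_q, r_max, r_bits=4, a_bits=4):
    """Reconstruct the key vector from its polar codes."""
    r = r_q / (2**r_bits - 1) * r_max
    theta = t_q / (2**a_bits - 1) * 2 * np.pi - np.pi
    pairs = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
    return pairs.reshape(-1)

rng = np.random.default_rng(0)
k = rng.standard_normal(8)
k_hat = polar_dequant(*polar_quant(k))
```

Because the codes are small integers, a query-key dot product can in principle be served from a precomputed table indexed by (radius code, angle code), which is the decoding-acceleration angle the abstract mentions.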

NeurIPS Conference 2024 Conference Paper

Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

  • Hongzhan Lin
  • Ang Lv
  • Yuhan Chen
  • Chen Zhu
  • Yang Song
  • Hengshu Zhu
  • Rui Yan

Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utilizing RoPE as position embeddings, we introduce a novel method called "Mixture of In-Context Experts" (MoICE) to address this challenge. MoICE comprises two key components: a router integrated into each attention head within LLMs, and a lightweight router-only training strategy. (1) MoICE views each RoPE angle as an 'in-context' expert, demonstrated to be capable of directing the attention of a head to specific contextual positions. Consequently, each attention head flexibly processes tokens using multiple RoPE angles dynamically selected by the router to attend to the needed positions. This approach mitigates the risk of overlooking essential contextual information. (2) The router-only training strategy entails freezing LLM parameters and exclusively updating routers for only a few steps. When applied to open-source LLMs including Llama and Mistral, MoICE surpasses prior methods across multiple long-context understanding and generation tasks, all while maintaining commendable inference efficiency.
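The per-head routing over RoPE angles can be sketched as below. This is a hypothetical single-head, single-query illustration; the candidate bases, router shape, and mixture normalization are assumptions, not the published implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_bases, d_head, top_k = 4, 8, 2
bases = np.array([1e2, 1e3, 1e4, 1e5])  # candidate RoPE bases: the "in-context experts"
rng = np.random.default_rng(0)
W_router = rng.standard_normal((d_head, n_bases)) * 0.1  # per-head router weights

def rope(x, pos, base):
    """Standard RoPE rotation of a head vector at position `pos`."""
    half = x.reshape(-1, 2)
    idx = np.arange(half.shape[0])
    theta = pos / base ** (2 * idx / x.shape[0])
    cos, sin = np.cos(theta), np.sin(theta)
    rot = np.stack([half[:, 0] * cos - half[:, 1] * sin,
                    half[:, 0] * sin + half[:, 1] * cos], axis=1)
    return rot.reshape(-1)

def moice_query(q, pos):
    """Router picks top-k RoPE bases; the head uses a weighted mixture
    of the correspondingly rotated queries."""
    w = softmax(q @ W_router)
    top = np.argsort(-w)[:top_k]
    mix = sum(w[i] * rope(q, pos, bases[i]) for i in top) / w[top].sum()
    return mix, top

q = rng.standard_normal(d_head)
mix, top = moice_query(q, pos=5)
```

Only `W_router` would be trained, with the rest of the model frozen, which is what keeps the method lightweight.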

IJCAI Conference 2024 Conference Paper

Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation

  • Ang Lv
  • Xu Tan
  • Tao Qin
  • Tie-Yan Liu
  • Rui Yan

Current lyric-to-melody generation methods struggle with the lack of paired lyric-melody data to train on, and with poor adherence to composition guidelines, resulting in melodies that do not sound human-composed. To address these issues, we propose a novel paradigm called Re-creation of Creations (ROC) that combines the strengths of both rule-based and neural-based methods. ROC consists of a two-stage generation-retrieval pipeline: the creation and re-creation stages. In the creation stage, we train a melody language model using melody data to generate high-quality music fragments, which are stored in a database indexed by key features. In the re-creation stage, users provide lyrics and a preferred chord progression, and ROC infers melody features for each lyric sentence. By querying the database, we obtain relevant melody fragments that satisfy composition guidelines, and these candidates are filtered, re-ranked, and concatenated based on the guidelines and the melody language model scores. ROC offers two main advantages: it does not require paired lyric-melody data, and it incorporates commonly used composition guidelines, resulting in music that sounds more human-composed with better controllability. Both objective and subjective evaluation results on English and Chinese lyrics show the effectiveness of ROC.
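The re-creation stage's retrieve-then-rerank loop can be sketched as below. Everything here is a toy stand-in: the feature keys, fragment database, and the stepwise-motion scoring heuristic are hypothetical, substituting for the trained melody language model and the paper's guideline filters.

```python
# Hypothetical fragment database: key feature (chord, length) -> MIDI-pitch fragments.
db = {
    ("C", 4): [[60, 62, 64, 65], [60, 64, 67, 64]],
    ("G", 4): [[67, 71, 74, 71], [67, 69, 71, 72]],
}

def lm_score(frag):
    """Stand-in for a melody language model score: prefer stepwise motion."""
    return -sum(abs(a - b) for a, b in zip(frag, frag[1:]))

def recreate(lyric_features):
    """Re-creation stage: per lyric sentence, retrieve candidates by
    inferred features, re-rank by score, and concatenate the winners."""
    melody = []
    for feat in lyric_features:
        candidates = db.get(feat, [])
        best = max(candidates, key=lm_score)
        melody.extend(best)
    return melody

song = recreate([("C", 4), ("G", 4)])
```

Because the database is built once in the creation stage, no paired lyric-melody data is ever needed; the lyrics only drive feature inference and retrieval.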

ICLR Conference 2022 Conference Paper

Target-Side Input Augmentation for Sequence to Sequence Generation

  • Shufang Xie 0003
  • Ang Lv
  • Yingce Xia
  • Lijun Wu 0003
  • Tao Qin 0001
  • Tie-Yan Liu
  • Rui Yan 0001

Autoregressive sequence generation, a prevalent task in machine learning and natural language processing, generates every target token conditioned on both a source input and previously generated target tokens. Previous data augmentation methods, which have been shown to be effective for the task, mainly enhance source inputs (e.g., injecting noise into the source sequence by random swapping or masking, back translation, etc.) while overlooking the target-side augmentation. In this work, we propose a target-side augmentation method for sequence generation. In training, we use the decoder output probability distributions as soft indicators, which are multiplied with target token embeddings, to build pseudo tokens. These soft pseudo tokens are then used as target tokens to enhance the training. We conduct comprehensive experiments on various sequence generation tasks, including dialog generation, machine translation, and abstractive summarization. Without using any extra labeled data or introducing additional model parameters, our method significantly outperforms strong baselines. The code is available at https://github.com/TARGET-SIDE-DATA-AUG/TSDASG.
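The soft pseudo-token construction can be sketched as follows: the decoder's output distribution is used as mixing weights over the target embedding matrix. A minimal NumPy illustration under assumed shapes; it omits the surrounding training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_emb, seq_len = 10, 6, 5
E = rng.standard_normal((vocab, d_emb))  # target token embedding matrix

def soft_pseudo_tokens(logits, E):
    """Build soft pseudo tokens: the decoder output distribution acts as a
    soft indicator, mixing target token embeddings per position."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)  # (seq_len, vocab)
    return p @ E                                           # (seq_len, d_emb)

logits = rng.standard_normal((seq_len, vocab))  # decoder outputs for one target
pseudo = soft_pseudo_tokens(logits, E)
```

Each pseudo token is a convex combination of real embeddings, so it can replace a hard target token during training without any extra labeled data or parameters.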