Arrow Research search

Author name cluster

Ang Lv

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

6

ICML Conference 2025 Conference Paper

Autonomy-of-Experts Models

  • Ang Lv
  • Ruobing Xie
  • Yining Qian
  • Songhao Wu
  • Xingwu Sun
  • Zhanhui Kang
  • Di Wang 0052
  • Rui Yan 0001

Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation between the router’s decision-making and the experts’ execution is a critical yet overlooked issue, leading to suboptimal expert selection and ineffective learning. To address this, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which experts autonomously select themselves to process inputs. AoE is based on the insight that an expert is aware of its own capacity to effectively process a token, an awareness reflected in the scale of its internal activations. In AoE, routers are removed; instead, experts pre-compute internal activations for inputs and are ranked based on their activation norms. Only the top-ranking experts proceed with the forward pass, while the others abort. The overhead of pre-computing activations is reduced through a low-rank weight factorization. This self-evaluating-then-partner-comparing approach ensures improved expert selection and effective learning. We pre-train language models with 700M to 4B parameters, demonstrating that AoE outperforms traditional MoE models with comparable efficiency.
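The selection mechanism the abstract describes can be sketched as follows. This is a minimal NumPy illustration of the idea, not the paper's code; all names, dimensions, and the use of a single low-rank factor per expert are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_low, n_experts, top_k = 16, 4, 8, 2

# Hypothetical low-rank factors: each expert's first projection W1 ~ A @ B,
# so the cheap pre-activation x @ A approximates the expert's response scale.
A = rng.standard_normal((n_experts, d_model, d_low))

def aoe_select(x, A, top_k):
    """Rank experts by the norm of their low-rank pre-activation;
    only the top-k experts would run their full forward pass."""
    pre = np.einsum('d,edl->el', x, A)    # cheap partial activations, (experts, d_low)
    norms = np.linalg.norm(pre, axis=-1)  # self-evaluation signal per expert
    chosen = np.argsort(-norms)[:top_k]   # partner comparison: keep the largest
    return chosen, norms

x = rng.standard_normal(d_model)
chosen, norms = aoe_select(x, A, top_k)
```

The router disappears entirely: the ranking signal is computed from the experts' own (factorized) weights, which is the "self-evaluating-then-partner-comparing" step.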

IJCAI Conference 2025 Conference Paper

GETMusic: Generating Music Tracks with a Unified Representation and Diffusion Framework

  • Ang Lv
  • Xu Tan
  • Peiling Lu
  • Wei Ye
  • Shikun Zhang
  • Jiang Bian
  • Rui Yan

Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrument tracks based on provided source tracks. In practical scenarios where there is a predefined ensemble of tracks and various composition needs, an efficient and effective generative model that can generate any target tracks based on the other tracks becomes crucial. However, previous efforts have fallen short of addressing this need due to limitations in their music representations and models. In this paper, we introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks". This framework encompasses a novel music representation, "GETScore", and a diffusion model, "GETDiff". GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time. At each training step, every track of a music piece is randomly selected as either the target or the source. Training involves two processes: in the forward process, target tracks are corrupted by masking their tokens, while source tracks remain as the ground truth; in the denoising process, GETDiff is trained to predict the masked target tokens conditioned on the source tracks. Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations. Our experiments demonstrate that the versatile GETMusic outperforms prior works proposed for specific composition tasks.
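The forward (corruption) process over the 2D track-by-time grid can be sketched as below. This is a hypothetical illustration of the masking scheme only; the grid shape, vocabulary, and `MASK` sentinel are assumptions, not GETMusic's actual tokenization.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK = -1
n_tracks, n_steps, vocab = 4, 8, 100

# A GETScore-like 2D grid: tracks stacked vertically, time flowing horizontally.
score = rng.integers(0, vocab, size=(n_tracks, n_steps))

def corrupt(score, target_tracks):
    """Forward process: mask every token of the target tracks,
    leaving the source tracks as ground truth."""
    noisy = score.copy()
    noisy[target_tracks, :] = MASK
    return noisy

targets = [1, 3]          # randomly chosen target tracks for this step
noisy = corrupt(score, targets)
```

The denoiser would then be trained to fill the masked rows conditioned on the untouched source rows, which is what makes any source-target track split available at inference.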

NeurIPS Conference 2025 Conference Paper

PolarQuant: Leveraging Polar Transformation for Key Cache Quantization and Decoding Acceleration

  • Songhao Wu
  • Ang Lv
  • Xiao Feng
  • Yufei Zhang
  • Xun Zhang
  • Guojun Yin
  • Wei Lin
  • Rui Yan

The increasing demand for long-context generation has made the KV cache in large language models a bottleneck in memory consumption. Quantizing the cache to lower bit widths is an effective way to reduce memory costs; however, previous methods struggle with key cache quantization due to outliers, resulting in suboptimal performance. We propose a novel quantization approach, PolarQuant, which provides a new perspective on key cache quantization and efficiently addresses the outlier dilemma. We observe that the distribution of the key states reveals well-structured patterns under polar transformation. Outliers generally appear in only one of the two dimensions, which are rotated together by a specific angle when rotary position embeddings are applied. When represented as two-dimensional vectors, these dimensions exhibit well-organized patterns, with radii and angles smoothly distributed in polar space. This alleviates the channel-wise outliers, making them well-suited for key cache quantization. PolarQuant divides key vectors into groups of two-dimensional sub-vectors, encoding them as a quantized radius and polar angle, rather than quantizing the original key vectors directly. PolarQuant achieves superior efficiency in KV cache quantization and accelerates the decoding process by turning the query-key inner product into a table lookup, all while maintaining the downstream performance of full-precision models. Our code is available at https://github.com/ericshwu/PolarQuant.
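The core encoding can be sketched as follows: split the key vector into 2-D sub-vectors, store a quantized radius and angle for each, and reconstruct on read. This is a simplified, hypothetical version (uniform codebooks, per-vector radius scale); the paper's actual quantizer and bit allocation may differ.

```python
import numpy as np

def polar_quant(k, r_bits=4, a_bits=4):
    """Encode a key vector as quantized (radius, angle) per 2-D sub-vector."""
    pairs = k.reshape(-1, 2)
    r = np.linalg.norm(pairs, axis=1)
    theta = np.arctan2(pairs[:, 1], pairs[:, 0])        # angle in [-pi, pi]
    r_max = r.max() + 1e-8                              # per-vector radius scale
    r_q = np.round(r / r_max * (2**r_bits - 1))         # uniform radius codes
    t_q = np.round((theta + np.pi) / (2 * np.pi) * (2**a_bits - 1))
    return r_q.astype(int), t_q.astype(int), r_max

def polar_dequant(r_q, t_q, r_max, r_bits=4, a_bits=4):
    """Reconstruct the key vector from its polar codes."""
    r = r_q / (2**r_bits - 1) * r_max
    theta = t_q / (2**a_bits - 1) * 2 * np.pi - np.pi
    pairs = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
    return pairs.reshape(-1)

rng = np.random.default_rng(0)
k = rng.standard_normal(8)
k_hat = polar_dequant(*polar_quant(k))
```

Because the codes are small integers, a query-key dot product can in principle be served from a precomputed table indexed by (radius code, angle code), which is the decoding-acceleration angle the abstract mentions.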

NeurIPS Conference 2024 Conference Paper

Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

  • Hongzhan Lin
  • Ang Lv
  • Yuhan Chen
  • Chen Zhu
  • Yang Song
  • Hengshu Zhu
  • Rui Yan

Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utilizing RoPE as position embeddings, we introduce a novel method called "Mixture of In-Context Experts" (MoICE) to address this challenge. MoICE comprises two key components: a router integrated into each attention head within LLMs, and a lightweight router-only training strategy. (1) MoICE views each RoPE angle as an 'in-context' expert, demonstrated to be capable of directing the attention of a head to specific contextual positions. Consequently, each attention head flexibly processes tokens using multiple RoPE angles dynamically selected by the router to attend to the needed positions. This approach mitigates the risk of overlooking essential contextual information. (2) The router-only training strategy entails freezing LLM parameters and exclusively updating routers for only a few steps. When applied to open-source LLMs including Llama and Mistral, MoICE surpasses prior methods across multiple long-context understanding and generation tasks, all while maintaining commendable inference efficiency.
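The per-head routing over RoPE angles can be sketched as below. This is a hypothetical single-head, single-query illustration; the candidate bases, router shape, and mixture normalization are assumptions, not the published implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_bases, d_head, top_k = 4, 8, 2
bases = np.array([1e2, 1e3, 1e4, 1e5])  # candidate RoPE bases: the "in-context experts"
rng = np.random.default_rng(0)
W_router = rng.standard_normal((d_head, n_bases)) * 0.1  # per-head router weights

def rope(x, pos, base):
    """Standard RoPE rotation of a head vector at position `pos`."""
    half = x.reshape(-1, 2)
    idx = np.arange(half.shape[0])
    theta = pos / base ** (2 * idx / x.shape[0])
    cos, sin = np.cos(theta), np.sin(theta)
    rot = np.stack([half[:, 0] * cos - half[:, 1] * sin,
                    half[:, 0] * sin + half[:, 1] * cos], axis=1)
    return rot.reshape(-1)

def moice_query(q, pos):
    """Router picks top-k RoPE bases; the head uses a weighted mixture
    of the correspondingly rotated queries."""
    w = softmax(q @ W_router)
    top = np.argsort(-w)[:top_k]
    mix = sum(w[i] * rope(q, pos, bases[i]) for i in top) / w[top].sum()
    return mix, top

q = rng.standard_normal(d_head)
mix, top = moice_query(q, pos=5)
```

Only `W_router` would be trained, with the rest of the model frozen, which is what keeps the method lightweight.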

IJCAI Conference 2024 Conference Paper

Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation

  • Ang Lv
  • Xu Tan
  • Tao Qin
  • Tie-Yan Liu
  • Rui Yan

Current lyric-to-melody generation methods struggle with the lack of paired lyric-melody data to train on, and with poor adherence to composition guidelines, resulting in melodies that do not sound human-composed. To address these issues, we propose a novel paradigm called Re-creation of Creations (ROC) that combines the strengths of both rule-based and neural-based methods. ROC consists of a two-stage generation-retrieval pipeline: the creation and re-creation stages. In the creation stage, we train a melody language model using melody data to generate high-quality music fragments, which are stored in a database indexed by key features. In the re-creation stage, users provide lyrics and a preferred chord progression, and ROC infers melody features for each lyric sentence. By querying the database, we obtain relevant melody fragments that satisfy composition guidelines, and these candidates are filtered, re-ranked, and concatenated based on the guidelines and the melody language model scores. ROC offers two main advantages: it does not require paired lyric-melody data, and it incorporates commonly used composition guidelines, resulting in music that sounds more human-composed with better controllability. Both objective and subjective evaluation results on English and Chinese lyrics show the effectiveness of ROC.
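The re-creation stage's retrieve-then-rerank loop can be sketched as below. Everything here is a toy stand-in: the feature keys, fragment database, and the stepwise-motion scoring heuristic are hypothetical, substituting for the trained melody language model and the paper's guideline filters.

```python
# Hypothetical fragment database: key feature (chord, length) -> MIDI-pitch fragments.
db = {
    ("C", 4): [[60, 62, 64, 65], [60, 64, 67, 64]],
    ("G", 4): [[67, 71, 74, 71], [67, 69, 71, 72]],
}

def lm_score(frag):
    """Stand-in for a melody language model score: prefer stepwise motion."""
    return -sum(abs(a - b) for a, b in zip(frag, frag[1:]))

def recreate(lyric_features):
    """Re-creation stage: per lyric sentence, retrieve candidates by
    inferred features, re-rank by score, and concatenate the winners."""
    melody = []
    for feat in lyric_features:
        candidates = db.get(feat, [])
        best = max(candidates, key=lm_score)
        melody.extend(best)
    return melody

song = recreate([("C", 4), ("G", 4)])
```

Because the database is built once in the creation stage, no paired lyric-melody data is ever needed; the lyrics only drive feature inference and retrieval.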

ICLR Conference 2022 Conference Paper

Target-Side Input Augmentation for Sequence to Sequence Generation

  • Shufang Xie 0003
  • Ang Lv
  • Yingce Xia
  • Lijun Wu 0003
  • Tao Qin 0001
  • Tie-Yan Liu
  • Rui Yan 0001

Autoregressive sequence generation, a prevalent task in machine learning and natural language processing, generates every target token conditioned on both a source input and previously generated target tokens. Previous data augmentation methods, which have been shown to be effective for the task, mainly enhance source inputs (e.g., injecting noise into the source sequence by random swapping or masking, back translation, etc.) while overlooking the target-side augmentation. In this work, we propose a target-side augmentation method for sequence generation. In training, we use the decoder output probability distributions as soft indicators, which are multiplied with target token embeddings, to build pseudo tokens. These soft pseudo tokens are then used as target tokens to enhance the training. We conduct comprehensive experiments on various sequence generation tasks, including dialog generation, machine translation, and abstractive summarization. Without using any extra labeled data or introducing additional model parameters, our method significantly outperforms strong baselines. The code is available at https://github.com/TARGET-SIDE-DATA-AUG/TSDASG.
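The soft pseudo-token construction can be sketched as follows: the decoder's output distribution is used as mixing weights over the target embedding matrix. A minimal NumPy illustration under assumed shapes; it omits the surrounding training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_emb, seq_len = 10, 6, 5
E = rng.standard_normal((vocab, d_emb))  # target token embedding matrix

def soft_pseudo_tokens(logits, E):
    """Build soft pseudo tokens: the decoder output distribution acts as a
    soft indicator, mixing target token embeddings per position."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)  # (seq_len, vocab)
    return p @ E                                           # (seq_len, d_emb)

logits = rng.standard_normal((seq_len, vocab))  # decoder outputs for one target
pseudo = soft_pseudo_tokens(logits, E)
```

Each pseudo token is a convex combination of real embeddings, so it can replace a hard target token during training without any extra labeled data or parameters.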