Arrow Research search

Author name cluster

Jiaxi Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers

7

AAAI Conference 2025 Conference Paper

Fine-Tuning Language Models with Collaborative and Semantic Experts

  • Jiaxi Yang
  • Binyuan Hui
  • Min Yang
  • Jian Yang
  • Lei Zhang
  • Qiang Qu
  • Junyang Lin

Recent advancements in large language models (LLMs) have broadened their application scope but revealed challenges in balancing capabilities across general knowledge, coding, and mathematics. To address this, we introduce a Collaborative and Semantic Experts (CoE) approach for supervised fine-tuning (SFT), which employs a two-phase training strategy. Initially, expert training fine-tunes the feed-forward network on specialized datasets, developing distinct experts in targeted domains. Subsequently, expert leveraging synthesizes these trained experts into a structured model with semantic guidance to activate specific experts, enhancing performance and interpretability. Evaluations on comprehensive benchmarks across MMLU, HumanEval, GSM8K, MT-Bench, and AlpacaEval confirm CoE's efficacy, demonstrating improved performance and expert collaboration in diverse tasks, significantly outperforming traditional SFT methods.
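The two-phase idea in the abstract (train domain experts, then activate them with semantic guidance at inference) can be sketched as a toy routing step. This is a minimal sketch, not the paper's implementation: the names (`EXPERTS`, `route`) and the scalar "experts" are illustrative assumptions.

```python
# Hypothetical sketch of the CoE inference path: a semantic router picks one
# domain expert (here a stand-in for a fine-tuned FFN) per input.

def general_expert(x): return [v * 1.0 for v in x]
def code_expert(x):    return [v * 2.0 for v in x]
def math_expert(x):    return [v * 3.0 for v in x]

EXPERTS = {"general": general_expert, "code": code_expert, "math": math_expert}

def route(domain_label, hidden):
    """Semantic guidance: activate only the expert for the detected domain."""
    return EXPERTS[domain_label](hidden)

out = route("math", [1.0, 2.0])   # the math expert transforms the hidden state
```

In the paper the router operates inside the model with semantic guidance; here the domain label is simply passed in to keep the sketch self-contained.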

AAAI Conference 2025 Conference Paper

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

  • Lei Zhang
  • Yunshui Li
  • Jiaming Li
  • Xiaobo Xia
  • Jiaxi Yang
  • Run Luo
  • Minzheng Wang
  • Longze Chen

Several recently released Code Large Language Models (Code LLMs) have been trained on repository-level code data, enabling them to perceive repository structure and use cross-file code information. This capability makes it possible to concatenate the content of repository code files directly in prompts to achieve repository-level code completion. However, in real development scenarios, concatenating all code repository files in a prompt can easily exceed the context window of Code LLMs, leading to a significant decline in completion performance. Overly long prompts also increase completion latency, hurting the user experience. In this study, we conducted extensive experiments, including completion error analysis, topology dependency analysis, and cross-file content analysis, to investigate the factors affecting repository-level code completion. Based on these preliminary experiments, we propose a strategy called **Hierarchical Context Pruning (HCP)** to construct high-quality completion prompts. We applied **HCP** to six Code LLMs and evaluated them on the CrossCodeEval dataset. The results show that, compared to previous methods, prompts constructed with our **HCP** strategy achieved higher completion accuracy on five of the six Code LLMs. **HCP** also kept the prompt length around 8k tokens (whereas the full repository code is approximately 50k tokens), significantly improving completion throughput. Our code and data will be publicly available.
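The core pruning idea (keep the most relevant cross-file context while staying under a token budget, ~8k in the paper) can be sketched as a greedy selection. This is an illustrative sketch only; the function name, file scores, and token counts below are made-up assumptions, not the paper's actual scoring scheme.

```python
def build_prompt(files, budget):
    """Greedy sketch of budgeted context pruning: rank cross-file context by a
    relevance score (e.g. dependency distance) and keep whole files until the
    token budget is reached."""
    kept, used = [], 0
    for name, tokens, score in sorted(files, key=lambda f: -f[2]):
        if used + tokens <= budget:
            kept.append(name)
            used += tokens
    return kept, used

# Toy repository: (file name, token count, relevance score).
files = [("utils.py", 3000, 0.9), ("main.py", 4000, 0.8), ("legacy.py", 6000, 0.1)]
kept, used = build_prompt(files, budget=8000)   # stays under the ~8k budget
```

The actual HCP strategy prunes hierarchically (file, class, function levels) rather than keeping or dropping whole files.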

NeurIPS Conference 2025 Conference Paper

Parallel Scaling Law for Language Models

  • Mouxiang Chen
  • Binyuan Hui
  • Zeyu Cui
  • Jiaxi Yang
  • Dayiheng Liu
  • Jianling Sun
  • Junyang Lin
  • Zhongxin Liu

It is commonly believed that scaling language models must incur a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce a third, more inference-efficient scaling paradigm: increasing the model's parallel computation during both training and inference. We apply $P$ diverse and learnable transformations to the input, execute forward passes of the model in parallel, and dynamically aggregate the $P$ outputs. This method, termed parallel scaling (ParScale), scales parallel computation by reusing existing parameters and can be applied to any model structure, optimization procedure, data, or task. We theoretically propose a new scaling law and validate it through large-scale pre-training, which shows that a model with $P$ parallel streams is comparable to scaling the parameters by $\mathcal O(\log P)$ while showing superior inference efficiency. For example, ParScale can use up to 22$\times$ less memory increase and 6$\times$ less latency increase than parameter scaling that achieves the same performance improvement. It can also recycle an off-the-shelf pre-trained model into a parallelly scaled one by post-training on a small number of tokens, further reducing the training budget. The new scaling law we discovered potentially facilitates the deployment of more powerful models in low-resource scenarios, and offers an alternative perspective on the role of computation in machine learning. Our code and 67 trained model checkpoints are publicly available at https://github.com/QwenLM/ParScale and https://huggingface.co/ParScale.
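The forward pass the abstract describes ($P$ transformed streams through one shared model, then a learned aggregation) can be sketched in a few lines. This is a toy sketch under stated assumptions: the "model" is a scalar function, the transformations are simple shifts, and the aggregation weights are fixed rather than learned.

```python
def parscale_forward(model, x, transforms, weights):
    """Run P transformed copies of the input through the same model and
    aggregate the P outputs with (learned) weights."""
    outs = [model(t(x)) for t in transforms]      # P parallel streams
    return sum(w * o for w, o in zip(outs, weights)) / sum(weights)

# Toy demo with P = 3 input transformations and a scalar stand-in "model".
model = lambda v: v * v
transforms = [lambda v: v, lambda v: v + 1.0, lambda v: v + 2.0]
weights = [0.5, 0.3, 0.2]
out = parscale_forward(model, 2.0, transforms, weights)   # ~7.9
```

In the real method the transformations are learnable, the streams share one set of model parameters, and the aggregation is dynamic per input.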

AAAI Conference 2025 Conference Paper

Separating the Wheat from the Chaff: Spatio-Temporal Transformer with View-interweaved Attention for Photon-Efficient Depth Sensing

  • Letian Yu
  • Jiaxi Yang
  • Bo Dong
  • Qirui Bao
  • Yuanbo Wang
  • Felix Heide
  • Xiaopeng Wei
  • Xin Yang

Time-resolved imaging is an emerging sensing modality that enables advanced applications including remote sensing, fluorescence lifetime imaging, and even non-line-of-sight sensing. Single-photon avalanche diodes (SPADs) outperform other time-resolved imaging technologies thanks to their excellent photon sensitivity and superior temporal resolution on the order of tens of picoseconds. SPADs' ability to exceed the sensing limits of conventional cameras has also drawn attention to photon-efficient imaging. However, photon-efficient imaging under degraded conditions, with low photon counts and a low signal-to-background ratio (SBR), remains a significant challenge. In this paper, we propose a spatio-temporal transformer network for photon-efficient imaging in low-flux scenarios. In particular, we introduce a view-interweaved attention mechanism (VIAM) to extract both spatial-view and temporal-view self-attention in each transformer block. We also design an adaptive-weighting scheme to dynamically adjust the weights between the different views of self-attention in VIAM for different signal-to-background levels. We extensively validate the effectiveness of our approach on the simulated Middlebury dataset and a self-collected dataset with real-world SPAD measurements and well-annotated ground-truth depth maps.

NeurIPS Conference 2023 Conference Paper

Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs

  • Jinyang Li
  • Binyuan Hui
  • Ge Qu
  • Jiaxi Yang
  • Binhua Li
  • Bowen Li
  • Bailin Wang
  • Bowen Qin

Text-to-SQL parsing, which aims at converting natural language instructions into executable SQL, has gained increasing attention in recent years. In particular, GPT-4 and Claude-2 have shown impressive results on this task. However, most of the prevalent benchmarks, i.e., Spider and WikiSQL, focus on database schemas with few rows of database contents, leaving a gap between academic study and real-world applications. To mitigate this gap, we present BIRD, a BIg benchmark for laRge-scale Databases grounded in text-to-SQL tasks, containing 12,751 pairs of text-to-SQL data and 95 databases with a total size of 33.4 GB, spanning 37 professional domains. Our emphasis on database values highlights the new challenges of dirty database contents, external knowledge bridging NL questions and database contents, and SQL efficiency, particularly in the context of massive databases. To solve these problems, text-to-SQL models must feature database value comprehension in addition to semantic parsing. The experimental results demonstrate the significance of database values in generating accurate text-to-SQL for big databases. Furthermore, even the most popular and effective text-to-SQL model, i.e., GPT-4, only achieves 54.89% execution accuracy, still far from the human result of 92.96%, proving that challenges remain. We also provide an efficiency analysis to offer insights into generating text-to-efficient-SQLs that are beneficial to industries. We believe that BIRD will contribute to advancing real-world applications of text-to-SQL research. The leaderboard and source code are available at https://bird-bench.github.io/.

AAAI Conference 2021 Conference Paper

A User-Adaptive Layer Selection Framework for Very Deep Sequential Recommender Models

  • Lei Chen
  • Fajie Yuan
  • Jiaxi Yang
  • Xiang Ao
  • Chengming Li
  • Min Yang

Sequential recommender systems (SRS) have become a research hotspot in recent studies. To capture users' dynamic interests, sequential neural network based recommender models often need to be stacked with many more hidden layers (e.g., up to 100 layers) than standard collaborative filtering methods. However, high network latency has become the main obstacle to deploying very deep recommender models in a production environment. In this paper, we argue that the typical prediction framework, which treats all users equally during the inference phase, is inefficient in running time as well as sub-optimal in accuracy. To resolve this issue, we present SkipRec, an adaptive inference framework that learns to skip inactive hidden layers on a per-user basis. Specifically, we devise a policy network to automatically determine which layers should be retained and which may be skipped, so as to achieve user-specific decisions. To derive the optimal skipping policy, we propose using Gumbel-Softmax and reinforcement learning to handle the non-differentiability during backpropagation. We perform extensive experiments on three real-world recommendation datasets and demonstrate that SkipRec attains comparable or better accuracy with much less inference time.
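The per-user layer-skipping decision can be sketched with a plain Gumbel-Softmax sample per layer. This is an illustrative sketch, not SkipRec itself: the policy logits are given directly instead of being produced by a policy network, and the "layers" are scalar functions.

```python
import math, random

def gumbel_softmax(logits, tau=1.0):
    """Relaxed categorical sample (a sketch of the Gumbel-Softmax trick)."""
    g = [-math.log(-math.log(random.random())) for _ in logits]  # Gumbel noise
    z = [(l + gi) / tau for l, gi in zip(logits, g)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def skiprec_forward(x, layers, policy_logits):
    """Apply each layer only when the per-user policy samples 'keep'.
    policy_logits[i] = [keep_logit, skip_logit] for layer i (illustrative)."""
    for layer, logits in zip(layers, policy_logits):
        keep, skip = gumbel_softmax(logits)
        if keep > skip:                      # hard decision at inference time
            x = layer(x)
    return x

random.seed(0)                               # fixed seed for a reproducible demo
layers = [lambda v: v + 1.0] * 3
# Strongly 'keep'-biased logits, so all three layers run for this user.
out = skiprec_forward(1.0, layers, [[10.0, -10.0]] * 3)
```

During training the soft Gumbel-Softmax probabilities keep the skipping decision differentiable; the hard threshold here mimics only the inference path.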

IJCAI Conference 2021 Conference Paper

Differentially Private Correlation Alignment for Domain Adaptation

  • Kaizhong Jin
  • Xiang Cheng
  • Jiaxi Yang
  • Kaiyuan Shen

Domain adaptation solves a learning problem in a target domain by utilizing training data from a different but related source domain. As a simple and efficient method for domain adaptation, correlation alignment transforms the distribution of the source domain using the covariance matrix of the target domain, so that a model trained on the transformed source data can be applied to the target data. However, when the source and target domains come from different institutes, exchanging information between them may pose a privacy risk. In this paper, we propose, for the first time, a differentially private correlation alignment approach for domain adaptation called PRIMA, which provides privacy guarantees for both the source and target data. In PRIMA, to relieve the performance degradation caused by perturbing the covariance matrix in high-dimensional settings, we present a random-subspace-ensemble-based covariance estimation method that splits the feature spaces of the source and target data into several low-dimensional subspaces. Moreover, since perturbing the covariance matrix may destroy its positive semi-definiteness, we develop a shrinking-based method to recover the positive semi-definiteness of the covariance matrix. Experimental results on standard benchmark datasets confirm the effectiveness of our approach.
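The (non-private) correlation alignment step underlying PRIMA rescales source statistics to match the target. A minimal one-dimensional sketch is below; real correlation alignment operates on full covariance matrices, and PRIMA would additionally perturb those matrices for differential privacy, which this sketch omits.

```python
import statistics

def align_variance(source, target):
    """1-D sketch of correlation alignment: whiten source features by their
    own standard deviation, then re-color with the target's."""
    scale = statistics.pstdev(target) / statistics.pstdev(source)
    return [v * scale for v in source]

src = [1.0, 2.0, 3.0]
tgt = [10.0, 20.0, 30.0]
aligned = align_variance(src, tgt)   # source now matches the target spread
```

After alignment the transformed source has the same second-order statistics (here, standard deviation) as the target, which is what lets a model trained on it transfer.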