
Author name cluster

Ye Qiao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers


AAAI Conference 2026 Short Paper

APEX-Q: Arbitrary-dimension Product-EXtension Quantization for Accelerated LLM Deployment (Student Abstract)

  • Yian Wang
  • Ye Qiao
  • Sitao Huang
  • Hyoukjun Kwon

We present APEX-Q, a flexible product quantization framework for compressing large language models. Unlike prior multi-codebook quantization methods with fixed partitions, APEX-Q supports arbitrary-dimensional tensor quantization, better capturing weight redundancy. It achieves performance on par with 4-bit and 8-bit baselines, enables post-training quantization without retraining, and reveals key trade-offs across subvector dimensions, codebook sizes, and hardware efficiency. APEX-Q thus provides a unified, hardware-friendly approach to scalable LLM deployment.
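The sketch below roughly illustrates the product-quantization idea this abstract builds on. It is not APEX-Q itself, because the abstract does not describe APEX-Q's partitioning or codebook training. The code splits each weight row into subvectors of a configurable length d, learns one k-entry codebook per subvector position with plain k-means, and stores only the codeword indices. The function names, the deterministic k-means initialization, and the zero-padding of the last subvector are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def split_subvectors(row: Vector, d: int) -> List[Vector]:
    """Cut a weight row into consecutive length-d subvectors, zero-padding the tail."""
    padded = list(row) + [0.0] * ((-len(row)) % d)
    return [padded[i:i + d] for i in range(0, len(padded), d)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword closest to p (squared Euclidean distance)."""
    return min(range(len(codebook)), key=lambda c: sq_dist(p, codebook[c]))


def kmeans(points: List[Vector], k: int, iters: int = 10) -> List[Vector]:
    """Plain Lloyd's k-means with a deterministic, evenly spaced initialization."""
    step = max(1, len(points) // k)
    centroids = [list(points[(i * step) % len(points)]) for i in range(k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def pq_quantize(W: List[Vector], d: int, k: int,
                iters: int = 10) -> Tuple[List[List[Vector]], List[List[int]]]:
    """Learn one k-entry codebook per subvector position; return (codebooks, codes)."""
    subs = [split_subvectors(row, d) for row in W]
    n_sub = len(subs[0])
    codebooks: List[List[Vector]] = []
    codes = [[0] * n_sub for _ in W]
    for s in range(n_sub):
        points = [r[s] for r in subs]
        cb = kmeans(points, k, iters)
        codebooks.append(cb)
        for i, p in enumerate(points):
            codes[i][s] = nearest(p, cb)
    return codebooks, codes


def pq_dequantize(codebooks: List[List[Vector]], codes: List[List[int]],
                  n_cols: int) -> List[Vector]:
    """Rebuild each row by concatenating its codewords, then drop the padding."""
    out = []
    for row_codes in codes:
        row: Vector = []
        for s, c in enumerate(row_codes):
            row.extend(codebooks[s][c])
        out.append(row[:n_cols])
    return out


def bits_per_weight(d: int, k: int) -> float:
    """Index storage cost per weight, ignoring the codebooks themselves."""
    return math.log2(k) / d
```

The index cost is log2(k)/d bits per weight. For example, d = 4 with k = 256 gives 2 bits per weight before codebook overhead. Balancing d against k in this way is one example of the subvector-dimension versus codebook-size trade-off the abstract mentions.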

AAAI Conference 2026 Short Paper

HARK: Hierarchical Agentic Retrieval with Keyframing for Video Understanding (Student Abstract)

  • Jingcheng Li
  • Ye Qiao
  • Sitao Huang

Current video understanding models struggle with temporal reasoning and with balancing detail preservation against computational efficiency. We propose a hierarchical memory system that segments videos into action and scene units and pairs it with question-aware agentic keyframe selection. Our method achieves 70.3% overall accuracy on the VideoMME short-video benchmark.
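This abstract does not specify HARK's segmentation or its agentic selection, so the sketch below only illustrates the general pattern of question-aware keyframe selection over a segmented video. It makes three assumptions. First, frames and the question are already embedded in a shared vector space by some encoder, which is not shown. Second, a new segment starts wherever the cosine similarity between consecutive frames drops below a threshold; this is a simple stand-in for action and scene segmentation. Third, the code keeps the most question-relevant frame from each of the top-k segments. All names and the threshold rule are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def segment(frames: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Group frame indices into segments, cutting where adjacent frames look dissimilar."""
    if not frames:
        return []
    segments = [[0]]
    for t in range(1, len(frames)):
        if cosine(frames[t - 1], frames[t]) < threshold:
            segments.append([t])  # similarity dropped: start a new unit
        else:
            segments[-1].append(t)
    return segments


def select_keyframes(frames: List[Sequence[float]], question: Sequence[float],
                     threshold: float = 0.9, k: int = 2) -> List[int]:
    """Pick each segment's most question-relevant frame, keep the k best, return in time order."""
    candidates: List[Tuple[float, int]] = []
    for seg in segment(frames, threshold):
        best = max(seg, key=lambda t: cosine(frames[t], question))
        candidates.append((cosine(frames[best], question), best))
    top = sorted(candidates, reverse=True)[:k]
    return sorted(t for _, t in top)
```

The code returns keyframes in temporal order rather than in score order, so any downstream reasoning sees the selected frames in the sequence they occur in the video.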

AAAI Conference 2026 Short Paper

Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs (Student Abstract)

  • Ye Qiao
  • Sitao Huang

Extending LLM context windows is key for long-range tasks. RoPE-based position interpolation (PI) extends the usable input length without retraining, and post-training quantization (PTQ) enables efficient deployment. However, combining PI with PTQ degrades accuracy due to long-context aliasing, dynamic-range dilation, axis-grid anisotropy, and outlier shifts that induce position-dependent logit noise. We give the first systematic analysis of PI+PTQ and propose two diagnostics: Interpolation Pressure (per-band phase-scaling sensitivity) and Tail Inflation Ratio (outlier shift from short to long contexts). We then introduce Q-ROAR, a RoPE-aware, weight-only stabilization that groups RoPE dimensions into bands and runs a lightweight search over per-band scales for W_Q and W_K, with an optional symmetric variant. Q-ROAR needs only a tiny long-context dev set and no fine-tuning or kernel changes. It recovers up to 0.7% accuracy and reduces GovReport perplexity by more than 14% while preserving short-context performance.
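The abstract does not give Q-ROAR's exact procedure. The following is therefore only a hypothetical sketch of the ingredients it names:

- RoPE angles under position interpolation
- grouping frequency pairs into bands
- scaling the matching rows of W_Q and W_K per band
- a small grid search per band

The interleaved pair layout, contiguous bands, greedy coordinate search, and the caller-supplied `objective` are all assumptions, and the symmetric variant is not shown.

```python
import math
from typing import Callable, List, Sequence


def rope_angles(position: float, head_dim: int, base: float = 10000.0,
                pi_scale: float = 1.0) -> List[float]:
    """Angle of frequency pair i: (position * pi_scale) * base^(-2i/head_dim).
    Position interpolation shrinks positions by pi_scale = L_train / L_target."""
    return [position * pi_scale * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def band_of_pairs(head_dim: int, n_bands: int) -> List[int]:
    """Assign each of the head_dim // 2 frequency pairs to one of n_bands contiguous bands
    (pair 0 has the highest frequency)."""
    n_pairs = head_dim // 2
    return [i * n_bands // n_pairs for i in range(n_pairs)]


def scale_qk_rows(W: List[List[float]], bands: Sequence[int],
                  scales: Sequence[float]) -> List[List[float]]:
    """Scale the output rows of a per-head W_Q or W_K slice band by band.
    Assumes the interleaved RoPE layout, where rows 2i and 2i+1 form pair i."""
    return [[x * scales[bands[r // 2]] for x in row] for r, row in enumerate(W)]


def search_band_scales(n_bands: int, grid: Sequence[float],
                       objective: Callable[[List[float]], float]) -> List[float]:
    """Greedy coordinate search: tune one band's scale at a time, holding the others fixed.
    `objective` is a caller-supplied stand-in, e.g. perplexity of the quantized model
    on a small long-context dev set."""
    scales = [1.0] * n_bands
    for b in range(n_bands):
        scales[b] = min(grid, key=lambda a: objective(scales[:b] + [a] + scales[b + 1:]))
    return scales
```

A per-pair scale multiplies both coordinates of a 2-D rotation pair by the same factor, so it commutes with the RoPE rotation. This is consistent with the abstract's claim that the stabilization is weight-only and needs no kernel changes.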