Arrow Research search

Author name cluster

Qi Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

AAAI Conference 2026 Conference Paper

Circuit-Think: A Multimodal Reasoning Framework for Automated Circuit-to-Netlist Translation with Trajectory-Guided Reinforcement Learning

  • Yuqi Jiang
  • Yupeng Hu
  • Jinyuan Deng
  • Xiaotian Qiu
  • Yucheng Cui
  • Xuyang He
  • Ruidong Li
  • Qi Sun

Vision Language Models (VLMs) have shown strong performance in multimodal understanding, offering promise for the circuit-to-netlist translation task. However, the diverse component symbols and complex connections in circuit images challenge VLMs in understanding physical layouts and reasoning about electrical connection logic. To address these challenges, we propose Circuit-Think, the first multimodal reasoning framework for the automated circuit-to-netlist translation task, which employs a Trajectory-Guided Reinforcement Learning (TGRL) paradigm for structured logical reasoning on circuit images. Circuit-Think initializes reasoning capabilities through supervised fine-tuning (SFT) on image-netlist pairs, then optimizes reasoning trajectories and netlist generation decisions using TGRL. Firstly, TGRL introduces a step-by-step reasoning paradigm, which guides the model with stepwise reward functions to simulate the human cognitive trajectory of "identifying ports, recognizing devices, and inferring connections". Secondly, we customize a multi-level reward that maps reasoning and answers into graph structures and node sets, jointly optimizing logical consistency and netlist accuracy via graph similarity and set matching. Thirdly, TGRL contains a reflective learning mechanism for low-scoring samples, which corrects the reasoning trajectory using reference answers as hints, avoiding local optima caused by sparse reward signals or erroneous reasoning paths. Moreover, we construct a circuit image-netlist reasoning dataset with 3,100 samples, offering step-by-step annotations for converting circuit images to netlists. Extensive experiments demonstrate that Circuit-Think achieves SOTA netlist accuracy and significantly improves the accuracy of downstream tasks.
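The abstract's multi-level reward maps answers into node sets and scores them by set matching. As a generic illustration only (a Jaccard similarity over predicted vs. reference netlist nodes, with hypothetical component names; this is a simplification, not the paper's exact reward), a set-matching term could look like:

```python
def set_match_reward(pred_nodes, ref_nodes):
    """Jaccard similarity between predicted and reference node sets,
    a common stand-in for the set-matching term of a netlist reward.
    Returns 1.0 for a perfect match, 0.0 for disjoint sets."""
    pred, ref = set(pred_nodes), set(ref_nodes)
    if not pred and not ref:
        # Both empty: treat as a trivially perfect match.
        return 1.0
    return len(pred & ref) / len(pred | ref)

print(set_match_reward(["R1", "C1"], ["R1", "C1"]))  # 1.0
print(set_match_reward(["R1"], ["R1", "C1"]))        # 0.5
```

The graph-similarity term described in the abstract would additionally compare edges (connections), not just the node sets.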

TMLR Journal 2026 Journal Article

Cost-Aware Routing for Efficient Text-To-Image Generation

  • Qinchan Li
  • Kenneth Chen
  • Changyue Su
  • Wittawat Jitkrittum
  • Qi Sun
  • Patsorn Sangkloy

Diffusion models are well known for their ability to generate a high-fidelity image for an input prompt through an iterative denoising process. Unfortunately, the high fidelity also comes at a high computational cost due to the inherently sequential generative process. In this work, we seek to optimally balance quality and computational cost, and propose a framework to allow the amount of computation to vary for each prompt, depending on its complexity. Each prompt is automatically routed to the most appropriate text-to-image generation function, which may correspond to a distinct number of denoising steps of a diffusion model, or a disparate, independent text-to-image model. Unlike uniform cost reduction techniques (e.g., distillation, model quantization), our approach achieves the optimal trade-off by learning to reserve expensive choices (e.g., 100+ denoising steps) only for a few complex prompts, and employ more economical choices (e.g., small distilled model) for less sophisticated prompts. We empirically demonstrate on COCO and DiffusionDB that by learning to route to nine already-trained text-to-image models, our approach is able to deliver an average quality that is higher than that achievable by any of these models alone.
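As a toy illustration of prompt-conditional routing (not the paper's learned router: the candidate table, costs, and the word-count complexity heuristic below are all hypothetical), a router maps a cheap complexity estimate to the least expensive generator expected to handle it:

```python
# Candidates sorted by ascending cost: (name, cost, complexity headroom).
# All values are illustrative, not from the paper.
CANDIDATES = [
    ("distilled-4-step", 1.0, 0.3),
    ("base-25-step", 5.0, 0.6),
    ("base-100-step", 20.0, 1.0),
]

def complexity(prompt: str) -> float:
    """Crude stand-in for a learned complexity predictor:
    longer prompts are treated as harder, capped at 1.0."""
    return min(len(prompt.split()) / 30.0, 1.0)

def route(prompt: str) -> str:
    """Return the cheapest candidate whose headroom covers the prompt."""
    c = complexity(prompt)
    for name, cost, headroom in CANDIDATES:
        if headroom >= c:
            return name
    return CANDIDATES[-1][0]

print(route("a cat"))  # easy prompt -> cheapest model
print(route("a baroque oil painting of a clockwork city at dusk " * 10))
```

The actual system learns the routing decision from data rather than using a hand-written heuristic, and its candidates can mix step counts of one diffusion model with entirely different models.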

AAAI Conference 2026 Conference Paper

Optimization Method for Surrogate Function in Spiking Neural Networks Based on Membrane Potential Distribution

  • Qi Sun
  • Zhen Cao
  • Kaige Geng
  • Ziyi Zhang
  • Biao Hou

Spiking Neural Networks (SNNs) offer promising energy efficiency and temporal sparsity for edge intelligence, but their training remains difficult due to gradient mismatch, membrane potential drift, and discretization errors. In this paper, we propose a membrane potential-guided surrogate optimization (MPO) framework that dynamically aligns the surrogate function with the membrane potential distribution to enhance gradient propagation. Specifically, we introduce a KL-divergence-based regularization to stabilize membrane potential dynamics, and an adaptive width constraint to synchronize the surrogate gradient range with neural activity statistics. Additionally, we design a spike discretization error metric and a correction strategy to mitigate temporal discretization effects. Experiments on CIFAR-10, CIFAR-100, and ImageNet show our method achieves 94.76%, 74.20%, and 65.70% top-1 accuracy respectively, while improving gradient stability and energy efficiency. This work provides a principled optimization scheme for robust and scalable SNN training in practical neuromorphic systems.
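For background on what a surrogate function is in this context: the spike nonlinearity is a hard threshold whose derivative is zero almost everywhere, so training substitutes a smooth derivative in the backward pass. The sketch below is the generic sigmoid-derivative surrogate (not the paper's MPO alignment scheme; the `width` parameter is illustrative, though it is the kind of knob an adaptive width constraint would tune against membrane-potential statistics):

```python
import math

def spike(v: float, threshold: float = 1.0) -> float:
    """Forward pass: a hard threshold (Heaviside) on membrane potential v."""
    return 1.0 if v >= threshold else 0.0

def surrogate_grad(v: float, threshold: float = 1.0, width: float = 2.0) -> float:
    """Backward pass: replace the Heaviside's zero/undefined derivative
    with the derivative of a sigmoid of steepness `width`. The gradient
    peaks at v == threshold and decays away from it."""
    s = 1.0 / (1.0 + math.exp(-width * (v - threshold)))
    return width * s * (1.0 - s)
```

The mismatch between the hard forward function and the smooth backward substitute is exactly the "gradient mismatch" the abstract refers to.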

AAAI Conference 2026 Conference Paper

ShoppingBench: A Real-World Intent-Grounded Shopping Benchmark for LLM-based Agents

  • Jiangyuan Wang
  • Kejun Xiao
  • Qi Sun
  • Huaipeng Zhao
  • Tao Luo
  • Jian Dong Zhang
  • Xiaoyi Zeng

Existing benchmarks in e-commerce primarily focus on basic user intents, such as finding or purchasing products. However, real-world users often pursue more complex goals, such as applying vouchers, managing budgets, and finding multi-product sellers. To bridge this gap, we propose ShoppingBench, a novel end-to-end shopping benchmark designed to encompass increasingly challenging levels of grounded intent. Specifically, we propose a scalable framework to simulate user instructions based on various intents derived from sampled real-world products. To facilitate consistent and reliable evaluations, we provide a large-scale shopping sandbox that serves as an interactive simulated environment, incorporating over 2.5 million real-world products. Experimental results demonstrate that even state-of-the-art language agents (such as GPT-4.1) achieve absolute success rates under 50% on our benchmark tasks, highlighting the significant challenges posed by our ShoppingBench. In addition, we propose a trajectory distillation strategy and leverage supervised fine-tuning, along with reinforcement learning on synthetic trajectories, to distill the capabilities of a large language agent into a smaller one. As a result, our trained agent achieves competitive performance compared to GPT-4.1.

ICLR Conference 2025 Conference Paper

An Evolved Universal Transformer Memory

  • Edoardo Cetin
  • Qi Sun
  • Tianyu Zhao 0001
  • Yujin Tang

Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs atop pre-trained transformers to provide different latent contexts focusing on the most relevant information for individual layers and attention heads. NAMMs are universally applicable to any model using self-attention as they condition exclusively on the values in the produced attention matrices. Learning NAMMs on a small set of problems, we achieve substantial performance improvements across multiple long-context benchmarks while cutting the model's input contexts down to a fraction of their original sizes. We show that the generality of our conditioning enables zero-shot transfer of NAMMs trained only on language to entirely new transformer architectures, even across input modalities, with their benefits carrying over to vision and reinforcement learning.

JBHI Journal 2025 Journal Article

Benchmarking Large Language Models in Evidence-Based Medicine

  • Jin Li
  • Yiyan Deng
  • Qi Sun
  • Junjie Zhu
  • Yu Tian
  • Jingsong Li
  • Tingting Zhu

Evidence-based medicine (EBM) represents a paradigm of providing patient care grounded in the most current and rigorously evaluated research. Recent advances in large language models (LLMs) offer a potential solution to transform EBM by automating labor-intensive tasks and thereby improving the efficiency of clinical decision-making. This study explores integrating LLMs into the key stages in EBM, evaluating their ability across evidence retrieval (PICO extraction, biomedical question answering), synthesis (summarizing randomized controlled trials), and dissemination (medical text simplification). We conducted a comparative analysis of seven LLMs, including both proprietary and open-source models, as well as those fine-tuned on medical corpora. Specifically, we benchmarked the performance of various LLMs on each EBM task under zero-shot settings as baselines, and employed prompting techniques, including in-context learning, chain-of-thought reasoning, and knowledge-guided prompting to enhance their capabilities. Our extensive experiments revealed the strengths of LLMs, such as remarkable understanding capabilities even in zero-shot settings, strong summarization skills, and effective knowledge transfer via prompting. Prompting strategies such as knowledge-guided prompting proved highly effective (e.g., improving the performance of GPT-4 by 13.10% over zero-shot in PICO extraction). However, the experiments also showed limitations, with LLM performance falling well below state-of-the-art baselines like PubMedBERT in handling named entity recognition tasks. Moreover, human evaluation revealed persisting challenges with factual inconsistencies and domain inaccuracies, underscoring the need for rigorous quality control before clinical application. This study provides insights into enhancing EBM using LLMs while highlighting critical areas for further research.

ICML Conference 2025 Conference Paper

Concurrent Reinforcement Learning with Aggregated States via Randomized Least Squares Value Iteration

  • Yan Chen
  • Qinxun Bai
  • Yiteng Zhang
  • Maria Dimakopoulou
  • Shi Dong
  • Qi Sun
  • Zhengyuan Zhou

Designing learning agents that explore efficiently in a complex environment has been widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions on a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents concurrently explore an environment. The theoretical results established in this work tender an affirmative answer to this question. We adapt the concurrent learning framework to randomized least-squares value iteration (RLSVI) with aggregated state representation. We demonstrate polynomial worst-case regret bounds in both finite- and infinite-horizon environments. In both setups the per-agent regret decreases at an optimal rate of $\Theta\left(\frac{1}{\sqrt{N}}\right)$, highlighting the advantage of concurrent learning. Our algorithm exhibits significantly lower space complexity compared to Russo (2019) and Agrawal et al. (2021). We reduce the space complexity by a factor of $K$ while incurring only a $\sqrt{K}$ increase in the worst-case regret bound, compared to Russo (2019) and Agrawal et al. (2021). Interestingly, our algorithm improves the worst-case regret bound of Russo (2019) by a factor of $H^{1/2}$, matching the improvement in Agrawal et al. (2021). However, this result is achieved through a fundamentally different algorithmic enhancement and proof technique. Additionally, we conduct numerical experiments to demonstrate our theoretical findings.

ICLR Conference 2025 Conference Paper

EG4D: Explicit Generation of 4D Object without Score Distillation

  • Qi Sun
  • Zhiyang Guo
  • Ziyu Wan
  • Jing Nathan Yan
  • Shengming Yin
  • Wengang Zhou 0001
  • Jing Liao 0001
  • Houqiang Li

In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on the score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and the Janus problem. Therefore, inspired by recent progress of video diffusion models, we propose to optimize a 4D representation by explicitly generating multi-view videos from one input image. However, it is far from trivial to handle the practical challenges faced by such a pipeline, including dramatic temporal inconsistency, inter-frame geometry and texture diversity, and semantic defects brought by video generation results. To address these issues, we propose EG4D, a novel multi-stage framework that generates high-quality and consistent 4D assets without score distillation. Specifically, collaborative techniques and solutions are developed, including an attention injection strategy to synthesize temporally consistent multi-view videos, a robust and efficient dynamic reconstruction method based on Gaussian Splatting, and a refinement stage with diffusion prior for semantic restoration. The qualitative comparisons and quantitative results demonstrate that our framework outperforms the baselines in generation quality by a considerable margin.

AAAI Conference 2025 Conference Paper

Transformer Layers as Painters

  • Qi Sun
  • Marc Pickett
  • Aakash Kumar Nain
  • Llion Jones

Despite their nearly universal adoption for large language models, the internal workings of transformers are not well understood. We aim to better understand the impact of removing or reorganizing information throughout the layers of a pretrained transformer. Such an understanding could both yield better usage of existing models and inform architectural improvements that produce new variants. We present a series of empirical studies on frozen models showing that the lower and final layers of pretrained transformers differ from middle layers, but that middle layers have a surprising amount of uniformity. We further show that some classes of problems are robust to skipping layers, running the layers in an order different from how they were trained, or running the layers in parallel. Our observations suggest that even frozen pretrained models may gracefully trade accuracy for latency by skipping layers or running layers in parallel.
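The skip/reorder/parallel interventions described can be mimicked on any sequential stack of functions. The harness below uses toy arithmetic "layers" (hypothetical; the paper's experiments are on real transformer layers) just to make the three interventions concrete:

```python
def run_stack(layers, x, order=None, skip=()):
    """Apply a list of layer functions to x, optionally in a custom
    order and with some layer indices skipped entirely -- the kind of
    frozen-model intervention the abstract describes."""
    order = list(range(len(layers))) if order is None else order
    for i in order:
        if i not in skip:
            x = layers[i](x)
    return x

# Toy "layers": each is just a simple arithmetic transform.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

print(run_stack(layers, 5))                   # trained order: ((5+1)*2)-3 = 9
print(run_stack(layers, 5, skip={1}))         # skip middle layer: (5+1)-3 = 3
print(run_stack(layers, 5, order=[1, 0, 2]))  # reordered: ((5*2)+1)-3 = 8
```

Comparing outputs under each intervention against the trained-order output is, in miniature, the kind of probe the paper runs on frozen pretrained models.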

ICLR Conference 2025 Conference Paper

Transformer-Squared: Self-adaptive LLMs

  • Qi Sun
  • Edoardo Cetin
  • Yujin Tang

Self-adaptive large language models (LLMs) aim to solve the challenges posed by traditional fine-tuning methods, which are often computationally intensive and static in their ability to handle diverse tasks. We introduce Transformer-Squared, a novel self-adaptation framework that adapts LLMs for unseen tasks in real-time by selectively adjusting only the singular components of their weight matrices. During inference, Transformer-Squared employs a two-pass mechanism: first, a dispatch system identifies the task properties, and then task-specific 'expert' vectors, trained using reinforcement learning, are dynamically mixed to obtain targeted behavior for the incoming prompt. Our method consistently outperforms ubiquitous approaches such as LoRA, with fewer parameters and greater efficiency. Furthermore, Transformer-Squared demonstrates versatility across different LLM architectures and modalities, including vision-language tasks. Transformer-Squared represents a significant leap forward, offering a scalable, efficient solution for enhancing the adaptability and task-specific performance of LLMs, paving the way for truly dynamic, self-organizing AI systems.

IJCAI Conference 2024 Conference Paper

RSAP-DFM: Regime-Shifting Adaptive Posterior Dynamic Factor Model for Stock Returns Prediction

  • Quanzhou Xiang
  • Zhan Chen
  • Qi Sun
  • Rujun Jiang

As the latest development of asset pricing research, how to use machine learning to improve the performance of factor models has become a topic of concern in recent years. The variability of the instantaneous macro environment brings great difficulties to quantitative investment, so an extended factor model must learn to self-adapt in order to extract the macro pattern from the massive stock volume and price information, and how to continuously map the extracted macro pattern to stock investment is also an open question. To this end, we propose the first continuous regime-based dynamic factor model, RSAP-DFM, which adaptively extracts continuous macroeconomic information and, for the first time, completes the dynamic explicit mapping of stock returns through dual regime shifting, while the adversarial posterior factors effectively correct the mapping deviation of prior factors. In addition, our model integrates an innovative two-stage optimization algorithm and normally distributed sampling, which further enhances its robustness. Performance on three real stock datasets validates the effectiveness of our model, which outperforms all previously available methods.

AAAI Conference 2023 Conference Paper

AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution

  • Yuxuan Zhao
  • Qi Sun
  • Zhuolun He
  • Yang Bai
  • Bei Yu

Deep learning frameworks optimize the computation graphs and intra-operator computations to boost the inference performance on GPUs, while inter-operator parallelism is usually ignored. In this paper, a unified framework, AutoGraph, is proposed to obtain highly optimized computation graphs in favor of parallel executions of GPU kernels. A novel dynamic programming algorithm, combined with backtracking search, is adopted to explore the optimal graph optimization solution, with the fast performance estimation from the mixed critical path cost. Accurate runtime information based on GPU Multi-Stream launched with CUDA Graph is utilized to determine the convergence of the optimization. Experimental results demonstrate that our method achieves up to 3.47x speedup over existing graph optimization methods. Moreover, AutoGraph outperforms state-of-the-art parallel kernel launch frameworks by up to 1.26x.

AAAI Conference 2022 Conference Paper

Context-Based Contrastive Learning for Scene Text Recognition

  • Xinyun Zhang
  • Binwu Zhu
  • Xufeng Yao
  • Qi Sun
  • Ruiyu Li
  • Bei Yu

Pursuing accurate and robust recognizers has been a long-lasting goal for scene text recognition (STR) researchers. Recently, attention-based methods have demonstrated their effectiveness and achieved impressive results on public benchmarks. The attention mechanism enables models to recognize scene text with severe visual distortions by leveraging contextual information. However, recent studies revealed that implicit over-reliance on context leads to catastrophic out-of-vocabulary performance. In contrast to the superior accuracy on seen text, models are prone to misrecognize unseen text even with good image quality. We propose a novel framework, Context-based Contrastive Learning (ConCLR), to alleviate this issue. Our proposed method first generates characters with different contexts via simple image concatenation operations and then optimizes a contrastive loss on their embeddings. By pulling together clusters of identical characters within various contexts and pushing apart clusters of different characters in embedding space, ConCLR suppresses the side effect of overfitting to specific contexts and learns a more robust representation. Experiments show that ConCLR significantly improves out-of-vocabulary generalization and achieves state-of-the-art performance on public benchmarks together with attention-based recognizers.
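The pull-together/push-apart objective described above is, in its generic form, an InfoNCE-style contrastive loss. The sketch below is that generic loss over plain-Python embedding vectors (illustrative only; it is not ConCLR's context-generation pipeline, and the 2-D embeddings in the usage lines are hypothetical):

```python
import math

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: -log softmax of the anchor-positive similarity
    against the anchor-negative similarities. Lower loss means the
    positive is pulled closer and the negatives pushed apart."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    logits = [cos(anchor, positive) / temperature] + [
        cos(anchor, n) / temperature for n in negatives
    ]
    # Numerically stable log-sum-exp for the softmax denominator.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

In ConCLR's setting, the "identical character in a different context" plays the positive, and other characters play the negatives.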

IJCAI Conference 2020 Conference Paper

Multi-scale Two-way Deep Neural Network for Stock Trend Prediction

  • Guang Liu
  • Yuzhao Mao
  • Qi Sun
  • Hailong Huang
  • Weiguo Gao
  • Xuan Li
  • Jianping Shen
  • Ruifan Li

Stock Trend Prediction (STP) has drawn wide attention from various fields, especially Artificial Intelligence. Most previous studies are single-scale oriented, which results in information loss from a multi-scale perspective. In fact, multi-scale behavior is vital for making intelligent investment decisions. A mature investor will thoroughly investigate the state of a stock market at various time scales. To automatically learn the multi-scale information in stock data, we propose a Multi-scale Two-way Deep Neural Network. It learns multi-scale patterns from two types of scale information, wavelet-based and downsampling-based, by eXtreme Gradient Boosting and a Recurrent Convolutional Neural Network, respectively. After combining the learned patterns from the two ways, our model achieves state-of-the-art performance on FI-2010 and CSI-2016, where the latter is our published long-range stock dataset to help future studies on the STP task. Extensive experimental results on the two datasets indicate that multi-scale information can significantly improve the STP performance and our model is superior in capturing such information.

NeurIPS Conference 2018 Conference Paper

Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling

  • Yunzhe Tao
  • Qi Sun
  • Qiang Du
  • Wei Liu

Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space. In this paper, we study the nature of diffusion and damping effect of nonlocal networks by doing spectrum analysis on the weight matrices of the well-trained networks, and then propose a new formulation of the nonlocal block. The new block not only learns the nonlocal interactions but also has stable dynamics, thus allowing deeper nonlocal structures. Moreover, we interpret our formulation from the general nonlocal modeling perspective, where we make connections between the proposed nonlocal network and other nonlocal models, such as nonlocal diffusion process and Markov jump process.