Arrow Research search

Author name cluster

Qi Guo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

43 papers
2 author rows

Possible papers

43

EAAI Journal 2026 Journal Article

DWCL: Dual-Weighted Contrastive Learning for robust multi-view clustering

  • Hanning Yuan
  • Zhihui Zhang
  • Qi Guo
  • Lianhua Chi
  • Sijie Ruan
  • Wei Zhou
  • Jinhui Pang
  • Xiaoshuai Hao

Multi-view contrastive clustering (MVCC) aims to learn consistent clustering structures from multiple views by maximizing the agreement between view-specific representations. However, existing methods often construct all pairwise cross-views indiscriminately, leading to numerous unreliable view combinations and representation degeneration. To address these issues, we propose Dual-Weighted Contrastive Learning (DWCL), a novel framework that selects the most reliable view using the silhouette coefficient and constructs targeted cross-views with other views via a Best-Other (B-O) contrastive mechanism. This strategy reduces the number of cross-views from quadratic to linear complexity, significantly improving computational efficiency. Additionally, we introduce a dual-weighting strategy that combines a view quality weight and a view discrepancy weight to adaptively emphasize high-quality, low-discrepancy cross-views. Extensive experiments on eight multi-view datasets demonstrate that DWCL consistently outperforms state-of-the-art methods. Specifically, DWCL achieves an absolute accuracy improvement of 3.5% on Caltech5V7 and 4.4% on CIFAR10. Theoretical analysis further validates the advantages of DWCL in improving mutual information bounds and reducing the influence of low-quality views. These results confirm that DWCL is a robust and efficient solution for scalable multi-view clustering.
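The Best-Other pairing idea from the abstract can be sketched in a few lines: score each view's clustering with a silhouette coefficient, then pair only the best view with every other view, giving a linear rather than quadratic number of cross-views. This is an illustrative sketch, not the paper's implementation; the simplified silhouette computation and the assumption of shared cluster labels across views are ours.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient of one view's clustering (simplified)."""
    scores = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        own = labels == labels[i]
        a = d[own].sum() / max(own.sum() - 1, 1)   # mean intra-cluster distance
        b = min(d[labels == c].mean() for c in set(labels) - {labels[i]})
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def best_other_pairs(views, labels):
    """Pick the most reliable view and pair it with every other view."""
    best = int(np.argmax([silhouette(X, labels) for X in views]))
    return best, [(best, v) for v in range(len(views)) if v != best]
```

With V views this yields V-1 cross-view pairs instead of the V(V-1)/2 pairs of exhaustive pairwise contrast.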

AAAI Conference 2026 Conference Paper

PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems

  • Qi Guo
  • Xiaojun Jia
  • Shanmin Pang
  • Simeng Qin
  • Lin Wang
  • Ju Jia
  • Yang Liu
  • Qing Guo

Multimodal Large Language Models (MLLMs) are becoming integral to autonomous driving (AD) systems due to their strong vision-language reasoning capabilities. However, MLLMs are vulnerable to adversarial attacks—particularly adversarial patch attacks—which can pose serious threats in real-world scenarios. Existing patch-based attack methods are primarily designed for object detection models. Due to the more complex architectures and strong reasoning capabilities of MLLMs, these approaches perform poorly when transferred to MLLM-based systems. To address these limitations, we propose PhysPatch, a physically realizable and transferable adversarial patch framework tailored for MLLM-based AD systems. PhysPatch jointly optimizes patch location, shape, and content to enhance attack effectiveness and real-world applicability. It introduces a semantic-based mask initialization strategy for realistic placement, an SVD-based local alignment loss with patch-guided crop-resize to improve transferability, and a potential field-based mask refinement method. Extensive experiments across open-source, commercial, and reasoning-capable MLLMs demonstrate that PhysPatch significantly outperforms state-of-the-art (SOTA) methods in steering MLLM-based AD systems toward target-aligned perception and planning outputs. Moreover, PhysPatch consistently places adversarial patches in physically feasible regions of AD scenes, ensuring strong real-world applicability and deployability.

AAAI Conference 2026 Conference Paper

QiMeng-CRUX: Narrowing the Gap Between Natural Language and Verilog via Core Refined Understanding eXpression

  • Lei Huang
  • Rui Zhang
  • Jiaming Guo
  • Yang Zhang
  • Di Huang
  • Shuyao Cheng
  • Pengwei Jin
  • Chongxiao Li

Large language models (LLMs) have shown promising capabilities in hardware description language (HDL) generation. However, existing approaches often rely on free-form natural language descriptions that are ambiguous, redundant, and unstructured, which poses significant challenges for downstream Verilog code generation. We treat hardware code generation as a complex transformation from an open-ended natural language space to a domain-specific, highly constrained target space. To bridge this gap, we introduce Core Refined Understanding eXpression (CRUX), a structured intermediate space that captures the essential semantics of user intent while organizing the expression for precise Verilog code generation. We further design a two-stage training framework, comprising Joint Expression Modeling and Dual-Space Optimization, to enhance the quality of both CRUX and Verilog code. Experiments across multiple Verilog generation benchmarks demonstrate that our model, QiMeng-CRUX, achieves state-of-the-art performance among general models, particularly under challenging design tasks. Furthermore, the CRUX space proves transferable and beneficial when used as input prompts for other code models, highlighting its effectiveness in narrowing the gap between free-form natural language descriptions and precise Verilog generation.

AAAI Conference 2026 Conference Paper

QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

  • Xinguo Zhu
  • Shaohui Peng
  • Jiaming Guo
  • Yunji Chen
  • Qi Guo
  • Yuanbo Wen
  • Hang Qin
  • Ruizhi Chen

Developing high-performance GPU kernels is critical for AI and scientific computing, but remains challenging due to its reliance on expert crafting and poor portability. While large language models (LLMs) offer promise for automation, both general-purpose and finetuned LLMs suffer from two fundamental and conflicting limitations: correctness and efficiency. The key reason is that existing LLM-based approaches directly generate the entire optimized low-level programs, requiring exploration of an extremely vast space encompassing both optimization policies and implementation codes. To address the challenge of exploring an intractable space, we propose Macro Thinking Micro Coding (MTMC), a hierarchical framework inspired by the staged optimization strategy of human experts. It decouples optimization strategy from implementation details, ensuring efficiency through high-level strategy and correctness through low-level implementation. Specifically, Macro Thinking employs reinforcement learning to guide lightweight LLMs in efficiently exploring and learning semantic optimization strategies that maximize hardware utilization. Micro Coding leverages general-purpose LLMs to incrementally implement the stepwise optimization proposals from Macro Thinking, avoiding full-kernel generation errors. Together, they effectively navigate the vast optimization space and intricate implementation details, enabling LLMs for high-performance GPU kernel generation. Comprehensive results on widely adopted benchmarks demonstrate the superior performance of MTMC on GPU kernel generation in both accuracy and running time. On KernelBench, MTMC achieves near-100% accuracy at Levels 1-2 and 70% at Level 3, over 50% higher than SOTA general-purpose and domain-finetuned LLMs, with up to 7.3× speedup over LLMs and 2.2× over expert-optimized PyTorch Eager kernels. On the more challenging TritonBench, MTMC attains up to 59.64% accuracy and 34× speedup. All models and datasets will be made publicly available.

AAAI Conference 2026 Conference Paper

Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective

  • Deyang Kong
  • Qi Guo
  • Xiangyu Xi
  • Wei Wang
  • Jingang Wang
  • Xunliang Cai
  • Shikun Zhang
  • Wei Ye

The low sampling efficiency during the rollout phase poses a significant challenge to scaling reinforcement learning for large language model reasoning. Existing methods attempt to improve efficiency by scheduling problems based on problem difficulties. However, these approaches suffer from unstable and biased estimations of problem difficulty and fail to capture the alignment between model competence and problem difficulty in RL training, leading to suboptimal results. To address these challenges, we introduce Competence-Difficulty Alignment Sampling (CDAS). This approach allows for accurate and stable estimation of problem difficulties by aggregating historical performance discrepancies across problems. Subsequently, model competence is quantified to adaptively select problems whose difficulties align with the model's current competence using a fixed-point system. Extensive experiments in mathematical RL training show that CDAS consistently outperforms strong baselines, achieving the highest average accuracy of 45.89%. Furthermore, CDAS reduces the training step time overhead by 57.06% compared to the widely-used Dynamic Sampling strategy, verifying the efficiency of CDAS. Additional experiments on different tasks, model architectures, and model sizes demonstrate the generalization capability of CDAS.
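The abstract's core loop can be illustrated with a minimal sketch: maintain a running difficulty estimate per problem from historical pass/fail outcomes, and sample the problems whose estimated difficulty sits closest to the model's current competence. The exponential moving average and the "target failure rate" mapping below are our illustrative simplifications, not the paper's fixed-point formulation.

```python
import numpy as np

def update_difficulty(diff, idx, solved, lr=0.3):
    """Exponential moving estimate of a problem's failure rate."""
    diff[idx] = (1 - lr) * diff[idx] + lr * (0.0 if solved else 1.0)

def cdas_select(diff, competence, k):
    """Select the k problems whose difficulty best matches model competence."""
    target = 1.0 - competence   # e.g. competence 0.7 -> aim near 30% failure rate
    return np.argsort(np.abs(diff - target))[:k]
```

Aggregating over history stabilizes the difficulty estimate compared to a single noisy rollout, which is the abstract's stated motivation.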

IJCAI Conference 2025 Conference Paper

Automated Superscalar Processor Design by Learning Data Dependencies

  • Shuyao Cheng
  • Rui Zhang
  • Wenkai He
  • Pengwei Jin
  • Chongxiao Li
  • Zidong Du
  • Xing Hu
  • Yifan Hao

Automated processor design, which can significantly reduce human efforts and accelerate design cycles, has received considerable attention. While recent advancements have automatically designed single-cycle processors that execute one instruction per cycle, their performance cannot compete with modern superscalar processors that execute multiple instructions per cycle. Previous methods fail on superscalar processor design because they cannot address inter-instruction data dependencies, leading to inefficient sequential instruction execution. This paper proposes a novel approach to automatically designing superscalar processors using a hardware-friendly model called the Stateful Binary Speculation Diagram (State-BSD). We observe that processor parallelism can be enhanced through on-the-fly inter-instruction dependent data predictors, reusing the processor's internal states to learn the data dependency. To meet the challenge of both hardware-resource limitation and design functional correctness, State-BSD consists of two components: 1) a lightweight state-selector trained by a simulated annealing method to detect the most reusable processor states and store them in a small buffer; and 2) a highly precise state-speculator trained by a BSD expansion method to predict the inter-instruction dependent data using the selected states. This is the first work to achieve automated superscalar processor design, i.e., QiMeng-CPU-v2, which improves performance by about 380x over the state-of-the-art automated design and is comparable to human-designed superscalar processors such as ARM Cortex A53.

ICLR Conference 2025 Conference Paper

Causal Effect Estimation with Mixed Latent Confounders and Post-treatment Variables

  • Yaochen Zhu
  • Jing Ma 0002
  • Liang Wu 0006
  • Qi Guo
  • Liangjie Hong
  • Jundong Li

Causal inference from observational data has attracted considerable attention among researchers. One main obstacle is the handling of confounders. As direct measurement of confounders may not be feasible, recent methods seek to address the confounding bias via proxy variables, i.e., covariates postulated to be conducive to the inference of latent confounders. However, the selected proxies may scramble both confounders and post-treatment variables in practice, which risks biasing the estimation by controlling for variables affected by the treatment. In this paper, we systematically investigate the bias due to latent post-treatment variables, i.e., latent post-treatment bias, in causal effect estimation. Specifically, we first derive the bias when selected proxies scramble both latent confounders and post-treatment variables, which we demonstrate can be arbitrarily bad. We then propose a Confounder-identifiable VAE (CiVAE) to address the bias. Based on a mild assumption that the prior of latent variables that generate the proxy belongs to a general exponential family with at least one invertible sufficient statistic in the factorized part, CiVAE individually identifies latent confounders and latent post-treatment variables up to bijective transformations. We then prove that with individual identification, the intractable disentanglement problem of latent confounders and post-treatment variables can be transformed into a tractable independence test problem even though arbitrary dependence may exist among them. Finally, we prove that the true causal effects can be unbiasedly estimated with transformed confounders inferred by CiVAE. Experiments on both simulated and real-world datasets demonstrate significantly improved robustness of CiVAE.

EAAI Journal 2025 Journal Article

Inspection of cracking in stamping parts surfaces using anomaly detection

  • Xingjun Dong
  • Changsheng Zhang
  • Dawei Wang
  • Qi Guo
  • Xinrui Deng
  • Chenyu Li

Stamping parts are critical components of automobiles, and cracking represents the most serious quality issue in these parts. To effectively address the challenges of delay and low efficiency inherent in manual visual inspections, this article proposes an automated cracking detection framework. Owing to the difficulty of collecting cracking data in actual production, this research proposes a local and global self-supervision (LGSS) network, used within the designed framework, that achieves cracking detection using only normal data for model training. The proposed LGSS network leverages collected normal samples of stamping parts to self-supervise the pre-trained model for fine-tuning, thereby enabling the model to extract features more effectively. A multivariate Gaussian distribution is employed to calculate the feature distribution of each pixel for anomaly detection (AD), addressing the issue of excessive exposure on stamping part surfaces being misidentified as defects. AD is conducted through both global and local branches, balancing hardware resource utilization while enhancing feature extraction and abnormal score calculation. On the actual cracking dataset, the LGSS network achieved an area under the receiver operating characteristic curve (AUROC) of 100.0% for detection and 99.3% for localization. It can process stamping part images of sizes up to 1792 × 448 within 3 s using limited hardware resources. Experimental results demonstrate that the proposed framework can detect surface cracking on stamping parts both promptly and accurately. Furthermore, the versatility of the proposed algorithm for defect detection in other industrial products was evaluated using the MVTec AD and BeanTech AD (BTAD) datasets.
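The per-pixel multivariate-Gaussian scoring mentioned in the abstract is a standard construction: fit a Gaussian to each pixel location's features across normal images, then score test features by Mahalanobis distance. A minimal sketch under that assumption (function names and the regularization term are ours, not the paper's):

```python
import numpy as np

def fit_pixel_gaussian(feats):
    """Fit a multivariate Gaussian to one pixel's features over N normal images.

    feats: (N, C) array; returns the mean and inverse covariance."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def anomaly_score(x, mu, cov_inv):
    """Mahalanobis distance of a test feature from the normal distribution."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))
```

Pixels whose score exceeds a threshold calibrated on normal data are flagged as anomalous, which is how training on only normal samples suffices.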

AAAI Conference 2025 Conference Paper

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

  • Yutong Wu
  • Di Huang
  • Wenxuan Shi
  • Wei Wang
  • Yewen Pu
  • Lingzhe Gao
  • Shihao Liu
  • Ziyuan Nan

Recent advancements in open-source code large language models (LLMs) have been driven by fine-tuning on the data generated from powerful closed-source LLMs, which are expensive to obtain. This paper explores whether it is possible to use a fine-tuned open-source model to generate additional data to augment its instruction-tuning dataset. We make two observations: (1) A code snippet can serve as the response to different instructions. (2) Instruction-tuned code LLMs perform better at translating code into instructions than the reverse. Based on these observations, we propose Inverse-Instruct, a data augmentation technique that uses a fine-tuned LLM to generate additional instructions of code responses from its own training dataset. The additional instruction-response pairs are added to the original dataset, and a stronger code LLM can be obtained by fine-tuning on the augmented dataset. We empirically validate Inverse-Instruct on a range of open-source code models (e.g. CodeLlama-Python and DeepSeek-Coder) and benchmarks (e.g., HumanEval(+), MBPP(+), DS-1000 and MultiPL-E), showing it consistently improves the base models.
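The Inverse-Instruct augmentation step described above reduces to a simple data-flow: for each (instruction, code) pair, ask the model for a new instruction matching the code, and append the new pair. A minimal sketch, with `code_to_instruction` as a hypothetical stand-in for the fine-tuned LLM's code-to-instruction call:

```python
def inverse_instruct(dataset, code_to_instruction):
    """Augment an instruction-tuning dataset with model-written instructions.

    dataset: list of (instruction, code) pairs.
    code_to_instruction: stand-in callable for the fine-tuned LLM's
    code -> instruction direction (a hypothetical stub here).
    """
    augmented = list(dataset)
    for _, code in dataset:
        # same code snippet, paired with a newly generated instruction
        augmented.append((code_to_instruction(code), code))
    return augmented
```

This exploits both observations from the abstract: one code snippet can answer multiple instructions, and the code-to-instruction direction is the easier one for an instruction-tuned model.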

NeurIPS Conference 2025 Conference Paper

MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions

  • Pucheng Dang
  • Di Huang
  • Dong Li
  • Kang Chen
  • Yuanbo Wen
  • Qi Guo
  • Xing Hu

Out-of-tree kernel patches are essential for adapting the Linux kernel to new hardware or enabling specific functionalities. Maintaining and updating these patches across different kernel versions demands significant effort from experienced engineers. Large language models (LLMs) have shown remarkable progress across various domains, suggesting their potential for automating out-of-tree kernel patch migration. However, our findings reveal that LLMs, while promising, struggle with incomplete code context understanding and inaccurate migration point identification. In this work, we propose MigGPT, a framework that employs a novel code fingerprint structure to retain code snippet information and incorporates three meticulously designed modules to improve the migration accuracy and efficiency of out-of-tree kernel patches. Furthermore, we establish a robust benchmark using real-world out-of-tree kernel patch projects to evaluate LLM capabilities. Evaluations show that MigGPT significantly outperforms the direct application of vanilla LLMs, achieving an average completion rate of 72.59% (↑50.74%) for migration tasks.

NeurIPS Conference 2025 Conference Paper

QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation

  • Yaoyu Zhu
  • Di Huang
  • Hanqi Lyu
  • Xiaoyun Zhang
  • Chongxiao Li
  • Wenxuan Shi
  • Yutong Wu
  • Jianan Mu

Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problems. Extending RLVR to electronic design automation (EDA), especially automatically generating hardware description languages (HDLs) like Verilog from natural-language (NL) specifications, however, poses three key challenges: the lack of automated and accurate verification environments, the scarcity of high-quality NL-code pairs, and the prohibitive computation cost of RLVR. To this end, we introduce CodeV-R1, an RLVR framework for training Verilog generation LLMs. First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data synthesis method that pairs open-source Verilog snippets with LLM-generated NL descriptions, verifies code-NL-code consistency via the generated testbench, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage "distill-then-RL" training pipeline: distillation for the cold start of reasoning abilities, followed by adaptive DAPO, our novel RLVR algorithm that can reduce training cost by adaptively adjusting sampling rate. The resulting model, CodeV-R1-7B, achieves 68.6% and 72.9% pass@1 on VerilogEval v2 and RTLLM v1.1, respectively, surpassing prior state-of-the-art by 12-20%, while even exceeding the performance of 671B DeepSeek-R1 on RTLLM. We have released our model, training code, and dataset to facilitate research in EDA and LLM communities.
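The round-trip data synthesis step above is essentially a filter: describe code in NL, regenerate code from the description, and keep the pair only if the two programs agree on a testbench. A toy sketch, modelling Verilog modules as callables and the testbench as sampled inputs; `describe` and `regenerate` are hypothetical stand-ins for the two LLM calls:

```python
def round_trip_filter(snippets, describe, regenerate, test_inputs):
    """Keep (description, snippet) pairs that survive a round-trip check."""
    kept = []
    for fn in snippets:
        desc = describe(fn)          # code -> NL description
        fn2 = regenerate(desc)       # NL description -> code
        # equivalence check on the sampled test vectors (the "testbench")
        if all(fn(x) == fn2(x) for x in test_inputs):
            kept.append((desc, fn))
    return kept
```

Pairs whose regenerated code diverges behaviourally are discarded, which is how inequivalent NL-code examples are filtered out of the training set.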

AAAI Conference 2025 Conference Paper

QiMeng-GEMM: Automatically Generating High-Performance Matrix Multiplication Code by Exploiting Large Language Models

  • Qirui Zhou
  • Yuanbo Wen
  • Ruizhi Chen
  • Ke Gao
  • Weiqiang Xiong
  • Ling Li
  • Qi Guo
  • Yanjun Wu

As a crucial operator in numerous scientific and engineering computing applications, the automatic optimization of General Matrix Multiplication (GEMM) with full utilization of ever-evolving hardware architectures (e.g. GPUs and RISC-V) is of paramount importance. While Large Language Models (LLMs) can generate functionally correct code for simple tasks, they have yet to produce high-performance code. The key challenge resides in deeply understanding diverse hardware architectures and crafting prompts that effectively unleash the potential of LLMs to generate high-performance code. In this paper, we propose a novel prompt mechanism called QiMeng-GEMM which enables LLMs to comprehend the architectural characteristics of different hardware platforms and automatically search for the optimization combinations for GEMM. The key of QiMeng-GEMM is a set of informative, adaptive, and iterative meta-prompts. Based on this, a searching strategy for optimal combinations of meta-prompts is used to iteratively generate high-performance code. Extensive experiments conducted on 4 leading LLMs, various paradigmatic hardware platforms, and representative matrix dimensions unequivocally demonstrate QiMeng-GEMM’s superior performance in auto-generating optimized GEMM code. Compared to vanilla prompts, our method achieves a performance enhancement of up to 113×. Even when compared to human experts, our method can reach 115% of cuBLAS on NVIDIA GPUs and 211% of OpenBLAS on RISC-V CPUs. Notably, while human experts often take months to optimize GEMM, our approach reduces the development cost by over 240×.

NeurIPS Conference 2025 Conference Paper

QiMeng-MuPa: Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

  • Changxin Ke
  • Rui Zhang
  • Shuo Wang
  • Li Ding
  • Guangli Li
  • Yuanbo Wen
  • Shuoming Zhang
  • Ruiyuan Xu

The rise of GPU-based high-performance computing (HPC) has driven the widespread adoption of parallel programming models such as CUDA. Yet, the inherent complexity of parallel programming creates a demand for automated sequential-to-parallel approaches. However, data scarcity poses a significant challenge for machine learning-based sequential-to-parallel code translation. Although recent back-translation methods show promise, they still fail to ensure functional equivalence in the translated code. In this paper, we propose QiMeng-MuPa, a novel Mutual-Supervised Learning framework for Sequential-to-Parallel code translation, to address the functional equivalence issue. QiMeng-MuPa consists of two models, a Translator and a Tester. Through an iterative loop consisting of Co-verify and Co-evolve steps, the Translator and the Tester mutually generate data for each other and improve collectively. The Tester generates unit tests to verify and filter functionally equivalent translated code, thereby evolving the Translator, while the Translator generates translated code as augmented input to evolve the Tester. Experimental results demonstrate that QiMeng-MuPa significantly enhances the performance of the base models: when applied to Qwen2.5-Coder, it not only improves Pass@1 by up to 28.91% and boosts Tester performance by 68.90%, but also outperforms the previous state-of-the-art method CodeRosetta by 1.56 and 6.92 in BLEU and CodeBLEU scores, while achieving performance comparable to DeepSeek-R1 and GPT-4.1. Our code is available at https://github.com/kcxain/mupa.

NeurIPS Conference 2025 Conference Paper

QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code

  • Hainan Fang
  • Yuanbo Wen
  • Jun Bi
  • Yihan Wang
  • Tonghui He
  • Yanlin Tang
  • Di Huang
  • Jiaming Guo

Compilers, while essential, are notoriously complex systems that demand prohibitively expensive human expertise to develop and maintain. The recent advancements in Large Language Models (LLMs) offer a compelling new paradigm: Neural Compilation, which could potentially simplify compiler development for new architectures and facilitate the discovery of innovative optimization techniques. However, several critical obstacles impede its practical adoption. Firstly, a significant lack of dedicated benchmarks and robust evaluation methodologies hinders objective assessment and tracking of progress in the field. Secondly, systematically enhancing the reliability and performance of LLM-generated assembly remains a critical challenge. Addressing these challenges, this paper introduces NeuComBack, a novel benchmark dataset specifically designed for IR-to-assembly compilation. Leveraging this dataset, we first define a foundational Neural Compilation workflow and conduct a comprehensive evaluation of the capabilities of recent frontier LLMs on Neural Compilation, establishing new performance baselines. We further propose a self-evolving prompt optimization method that enables LLMs to iteratively evolve their internal prompt strategies by extracting insights from prior self-debugging traces, thereby enhancing their neural compilation capabilities. Experiments demonstrate that our method significantly improves both the functional correctness and the performance of LLM-generated assembly code. Compared to baseline prompts, the functional correctness rates improved from 44% to 64% on x86_64 and from 36% to 58% on aarch64, respectively. More significantly, among the 16 correctly generated x86_64 programs using our method, 14 (87.5%) surpassed clang-O3 performance. These consistent improvements across diverse architectures (x86_64 and aarch64) and program distributions (NeuComBack L1 and L2) validate our method's superiority over conventional approaches and its potential for broader adoption in low-level neural compilation.

NeurIPS Conference 2025 Conference Paper

QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation

  • Yang Zhang
  • Rui Zhang
  • Jiaming Guo
  • Huang Lei
  • Di Huang
  • Yunpu Zhao
  • Shuyao Cheng
  • Pengwei Jin

The remarkable progress of Large Language Models (LLMs) presents promising opportunities for Verilog code generation, which is significantly important for automated circuit design. The lack of meaningful functional rewards hinders preference optimization based on Reinforcement Learning (RL) for producing functionally correct Verilog code. In this paper, we propose Signal-Aware Learning for Verilog code generation (QiMeng-SALV), which leverages code segments of functionally correct output signals to optimize RL training. Since Verilog code specifies the structural interconnection of hardware gates and wires, different output signals are independent; the key insight of QiMeng-SALV is therefore to extract verified signal-aware implementations from partially incorrect modules, so as to enhance the extraction of meaningful functional rewards. Concretely, we verify the functional correctness of signals in a generated module by comparing them with those of the reference module in the training data. An abstract syntax tree (AST) is then employed to identify signal-aware code segments that can provide meaningful functional rewards from erroneous modules. Finally, we introduce signal-aware DPO, which is optimized on the correct signal-level code segments, thereby preventing noise and interference from incorrect signals. The proposed QiMeng-SALV underscores the paradigm shift from conventional module-level to fine-grained signal-level optimization in Verilog code generation, addressing the issue of insufficient functional rewards. Experiments demonstrate that our method achieves state-of-the-art performance on VerilogEval and RTLLM, with a 7B parameter model matching the performance of the DeepSeek v3 671B model and significantly outperforming the leading open-source model CodeV trained on the same dataset.
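The signal-level verification idea can be sketched compactly: since each Verilog output signal is independent, a generated module's signals can be checked one by one against the reference waveforms, and only the matching ones contribute reward. A toy sketch under that framing (traces modelled as dicts of sampled waveforms; names are ours):

```python
def verified_signals(generated_trace, reference_trace):
    """Signals of a generated module whose sampled waveform matches the reference.

    Each trace maps signal name -> list of sampled output values; a signal
    counts as verified only if its entire waveform matches the reference."""
    return {sig for sig, wave in generated_trace.items()
            if wave == reference_trace.get(sig)}
```

A partially incorrect module can thus still yield useful training signal from its correct outputs, which is the paper's motivation for moving below module-level rewards.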

IJCAI Conference 2025 Conference Paper

QiMeng-TensorOp: One-Line Prompt is Enough for High-Performance Tensor Operator Generation with Hardware Primitives

  • Xuzhi Zhang
  • Shaohui Peng
  • Qirui Zhou
  • Yuanbo Wen
  • Qi Guo
  • Ruizhi Chen
  • Xinguo Zhu
  • Weiqiang Xiong

Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures like RISC-V, ARM, and GPUs, as manually optimized implementation takes at least months and lacks portability. LLMs excel at generating high-level language codes, but they struggle to fully comprehend hardware characteristics and produce high-performance tensor operators. We introduce a tensor-operator auto-generation framework with a one-line user prompt (QiMeng-TensorOp), which enables LLMs to automatically exploit hardware characteristics to generate tensor operators with hardware primitives, and tune parameters for optimal performance across diverse hardware. Experimental results on various hardware platforms, SOTA LLMs, and typical tensor operators demonstrate that QiMeng-TensorOp effectively unleashes the computing capability of various hardware platforms, and automatically generates tensor operators of superior performance. Compared with vanilla LLMs, QiMeng-TensorOp achieves up to 1291× performance improvement. Even compared with human experts, QiMeng-TensorOp could reach 251% of OpenBLAS on RISC-V CPUs, and 124% of cuBLAS on NVIDIA GPUs. Additionally, QiMeng-TensorOp also significantly reduces development costs by 200× compared with human experts.

NeurIPS Conference 2025 Conference Paper

SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens

  • Yinhan He
  • Wendy Zheng
  • Yaochen Zhu
  • Zaiyi Zheng
  • Lin Su
  • Sriram Vasudevan
  • Qi Guo
  • Liangjie Hong

Chain-of-Thought (CoT) enhances the performance of Large Language Models (LLMs) on reasoning tasks by encouraging step-by-step solutions. However, the verbosity of CoT reasoning hinders its mass deployment in efficiency-critical applications. Recently, implicit CoT approaches have emerged, which encode reasoning steps within an LLM's hidden embeddings (termed "implicit reasoning") rather than explicit tokens. This approach accelerates CoT reasoning by reducing the reasoning length and bypassing some LLM components. However, existing implicit CoT methods face two significant challenges: (1) they fail to preserve the semantic alignment between the implicit reasoning (when transformed to natural language) and the ground-truth reasoning, resulting in a significant CoT performance degradation, and (2) they focus on reducing the length of the implicit reasoning but neglect the considerable time cost for an LLM to generate one individual implicit reasoning token. To tackle these challenges, we propose a novel semantically-aligned implicit CoT framework termed SemCoT. In particular, for the first challenge, we design a contrastively trained sentence transformer that evaluates semantic alignment between implicit and explicit reasoning, which is used to enforce semantic preservation during implicit reasoning optimization. To address the second challenge, we introduce an efficient implicit reasoning generator by finetuning a lightweight language model using knowledge distillation. This generator is guided by our sentence transformer to distill ground-truth reasoning into semantically aligned implicit reasoning, while also optimizing for accuracy. SemCoT is the first approach that enhances CoT efficiency by jointly optimizing token-level generation speed and preserving semantic alignment with ground-truth reasoning. Extensive experiments demonstrate the superior performance of SemCoT compared to state-of-the-art methods in both efficiency and effectiveness.
Our code can be found at https://github.com/YinhanHe123/SemCoT/.

IJCAI Conference 2024 Conference Paper

Automated CPU Design by Learning from Input-Output Examples

  • Shuyao Cheng
  • Pengwei Jin
  • Qi Guo
  • Zidong Du
  • Rui Zhang
  • Xing Hu
  • Yongwei Zhao
  • Yifan Hao

Designing a central processing unit (CPU) requires intensive manual work by talented experts to implement the circuit logic from design specifications. Although considerable progress has been made in electronic design automation (EDA) to relieve human efforts, all existing EDA tools require hand-crafted formal program code (e.g., Verilog, Chisel, or C) as the input. To automate CPU design without human programming, we are motivated to learn the CPU design from only input-output (IO) examples. The key challenge is that the learned CPU design should have almost zero tolerance for inaccuracy, which makes well-known approximate algorithms such as neural networks ineffective. We propose a new AI approach to generate the CPU design in the form of a large-scale Boolean function, from only external IO examples instead of formal program code. This approach employs a novel graph structure called Binary Speculative Diagram (BSD) to approximate the CPU-scale Boolean function accurately. We propose an efficient BSD expansion method based on Boolean Distance, a new metric to quantitatively measure the structural similarity between Boolean functions, gradually increasing the design accuracy up to 100%. Our approach generates an industrial-scale RISC-V CPU design within 5 hours, reducing the design cycle by about 1000x without human involvement. The taped-out chip, Enlightenment-1, the world's first CPU designed by AI, successfully runs the Linux operating system and performs comparably against the human-designed Intel 80486SX CPU. Our approach even autonomously discovers human knowledge of the von Neumann architecture.

NeurIPS Conference 2024 Conference Paper

AutoSurvey: Large Language Models Can Automatically Write Surveys

  • Yidong Wang
  • Qi Guo
  • Wenjin Yao
  • Hongbo Zhang
  • Xin Zhang
  • Zhen Wu
  • Meishan Zhang
  • Xinyu Dai

This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due to the vast volume and complexity of information, prompting the need for efficient survey methods. While large language models (LLMs) offer promise in automating this process, challenges such as context window limitations, parametric knowledge constraints, and the lack of evaluation benchmarks remain. AutoSurvey addresses these challenges through a systematic approach that involves initial retrieval and outline generation, subsection drafting by specialized LLMs, integration and refinement, and rigorous evaluation and iteration. Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness.

NeurIPS Conference 2024 Conference Paper

ColJailBreak: Collaborative Generation and Editing for Jailbreaking Text-to-Image Deep Generation

  • Yizhuo Ma
  • Shanmin Pang
  • Qi Guo
  • Tianyu Wei
  • Qing Guo

The commercial text-to-image deep generation models (e.g., DALL·E) can produce high-quality images based on input language descriptions. These models incorporate a black-box safety filter to prevent the generation of unsafe or unethical content, such as violent, criminal, or hateful imagery. Recent jailbreaking methods generate adversarial prompts capable of bypassing safety filters and producing unsafe content, exposing vulnerabilities in influential commercial models. However, once these adversarial prompts are identified, the safety filter can be updated to prevent the generation of unsafe images. In this work, we propose an effective, simple, and difficult-to-detect jailbreaking solution: generating safe content initially with normal text prompts and then editing the generations to embed unsafe content. The intuition behind this idea is that the deep generation model cannot reject safe generation with normal text prompts, while the editing models focus on modifying the local regions of images and do not involve a safety strategy. However, implementing such a solution is non-trivial, and we need to overcome several challenges: how to automatically confirm the normal prompt to replace the unsafe prompts, and how to effectively perform editable replacement and naturally generate unsafe content. In this work, we propose the collaborative generation and editing for jailbreaking text-to-image deep generation (ColJailBreak), which comprises three key components: adaptive normal safe substitution, inpainting-driven injection of unsafe content, and contrastive language-image-guided collaborative optimization. We validate our method on three datasets and compare it to two baseline methods. Our method could generate unsafe content through two commercial deep generation models including GPT-4 and DALL·E 2.

AAAI Conference 2024 Conference Paper

Emergent Communication for Numerical Concepts Generalization

  • Enshuai Zhou
  • Yifan Hao
  • Rui Zhang
  • Yuxuan Guo
  • Zidong Du
  • Xishan Zhang
  • Xinkai Song
  • Chao Wang

Research on emergent communication has recently gained significant traction as a promising avenue for the linguistic community to unravel human language's origins and explore artificial intelligence's generalization capabilities. Current research has predominantly concentrated on recognizing qualitative patterns of object attributes (e.g., shape and color) and paid little attention to the quantitative relationships among object quantities, known as part of numerical concepts. The ability to generalize numerical concepts, i.e., counting and calculations with unseen quantities, is essential, as it mirrors humans' foundational abstract reasoning abilities. In this work, we introduce NumGame, leveraging the referential game framework to force agents to communicate and generalize numerical concepts effectively. Inspired by the human learning process of numbers, we present a two-stage training approach that sequentially fosters a rudimentary numerical sense followed by the ability of arithmetic calculation, ultimately aiding agents in generating semantically stable and unambiguous language for numerical concepts. The experimental results indicate impressive generalization to unseen quantities and the regularity of the language that emerges from communication.

EAAI Journal 2024 Journal Article

Enhancing accuracy, diversity, and random input compatibility in face attribute manipulation

  • Qi Guo
  • Xiaodong Gu

Recent advancements in semantic face attribute manipulation have marked significant progress, yet challenges persist regarding flexible manipulation while retaining high-accuracy reconstruction, especially given the limitations of fixed angles and layout in input facial images. To address these limitations, this paper introduces the Accurate Results, Diverse Options, and Random Input Face Attribute Manipulation Model (ADR-FACEM), a novel text-guided approach designed for nuanced and disentangled manipulation of facial attributes. This method stands out for its adaptability in attribute selection, offering a unique blend of flexibility and randomness. At the core of our proposed model lies the innovative Latent Direction Model (LDM), which leverages an adaptive nonlinear transformation trajectory. This model adeptly processes face latent codes, enabling precise manipulation of targeted attributes while preserving other facial features, all conditioned on textual descriptions. Complementing this, the Feature Distortion Alignment Model (FDAM) is intricately designed to rectify feature distortions within the image feature space, thereby significantly enhancing the reconstruction quality of non-frontal images. Through comprehensive experiments covering the accuracy of facial attribute manipulation, the diversity of manipulation options, and the inclusiveness of random unbiased input, our model ADR-FACEM demonstrates an outstanding ability to maintain the complex details of facial images. Quantitative comparison and qualitative analysis across nine indicators further reinforce the superiority of our method, highlighting the wider range of options it provides and its compatibility with random input in facial attribute manipulation.

AAAI Conference 2024 Conference Paper

Hypothesis, Verification, and Induction: Grounding Large Language Models with Self-Driven Skill Learning

  • Shaohui Peng
  • Xing Hu
  • Qi Yi
  • Rui Zhang
  • Jiaming Guo
  • Di Huang
  • Zikang Tian
  • Ruizhi Chen

Large language models (LLMs) show their powerful automatic reasoning and planning capability with a wealth of semantic knowledge about the human world. However, the grounding problem still hinders the applications of LLMs in the real-world environment. Existing studies try to fine-tune the LLM or utilize pre-defined behavior APIs to bridge the LLMs and the environment, which not only costs huge human efforts to customize for every single task but also weakens the generality strengths of LLMs. To autonomously ground the LLM onto the environment, we proposed the Hypothesis, Verification, and Induction (HYVIN) framework to automatically and progressively ground the LLM with self-driven skill learning. HYVIN first employs the LLM to propose the hypothesis of sub-goals to achieve tasks and then verify the feasibility of the hypothesis via interacting with the underlying environment. Once verified, HYVIN can then learn generalized skills with the guidance of these successfully grounded subgoals. These skills can be further utilized to accomplish more complex tasks that fail to pass the verification phase. Verified in the famous instruction following task set, BabyAI, HYVIN achieves comparable performance in the most challenging tasks compared with imitation learning methods that cost millions of demonstrations, proving the effectiveness of learned skills and showing the feasibility and efficiency of our framework.

NeurIPS Conference 2023 Conference Paper

ANPL: Towards Natural Programming with Interactive Decomposition

  • Di Huang
  • Ziyuan Nan
  • Xing Hu
  • Pengwei Jin
  • Shaohui Peng
  • Yuanbo Wen
  • Rui Zhang
  • Zidong Du

Though LLMs are capable of generating plausible programs, it’s challenging to interact with the LLMs further to revise the program, especially if the user’s specific requirements are different from the initial proposal. In this paper, we introduce ANPL, an interactive programming system that ensures users can always refine the generated code towards their specific programmatic intents via structured decompositions. Borrowing the paradigm of sketching from program synthesis, an ANPL program consists of a set of input-outputs that it must satisfy, a “sketch” — control/data flow expressed in precise code (e.g., Python), and “holes” — sub-modules to be implemented by the LLM specified with natural language. The user revises an ANPL program by either modifying the sketch, changing the language used to describe the holes, or providing additional input-outputs to a particular hole, turning it into a sub-ANPL program that can be solved recursively. This workflow allows the users to offload programming burdens to the LLM as much as possible while retaining the ability to pinpoint and resolve bugs locally, without exposing the rest of the program to the LLM. We deploy ANPL on the Abstraction and Reasoning Corpus (ARC), a set of unique tasks that are challenging for state-of-the-art AI systems, showing it outperforms baseline programming systems that (a) lack the ability to decompose tasks interactively and (b) lack the guarantee that the modules can be correctly composed together. Additional evaluations on APPS, HumanEval, and real-world programming tasks have validated that the ANPL framework is applicable to multiple programming domains. We release the ANPL solutions to the ARC tasks as a dataset, providing insights into how humans decompose novel tasks programmatically.

AAAI Conference 2023 Conference Paper

Conceptual Reinforcement Learning for Language-Conditioned Tasks

  • Shaohui Peng
  • Xing Hu
  • Rui Zhang
  • Jiaming Guo
  • Qi Yi
  • Ruizhi Chen
  • Zidong Du
  • Ling Li

Despite the broad application of deep reinforcement learning (RL), transferring and adapting the policy to unseen but similar environments is still a significant challenge. Recently, the language-conditioned policy is proposed to facilitate policy transfer through learning the joint representation of observation and text that catches the compact and invariant information across various environments. Existing studies of language-conditioned RL methods often learn the joint representation as a simple latent layer for the given instances (episode-specific observation and text), which inevitably includes noisy or irrelevant information and causes spurious correlations that are dependent on instances, thus hurting generalization performance and training efficiency. To address the above issue, we propose a conceptual reinforcement learning (CRL) framework to learn the concept-like joint representation for language-conditioned policy. The key insight is that concepts are compact and invariant representations in human cognition, formed by extracting similarities from numerous instances in the real world. In CRL, we propose a multi-level attention encoder and two mutual information constraints for learning compact and invariant concepts. Verified in two challenging environments, RTFM and Messenger, CRL significantly improves the training efficiency (up to 70%) and generalization ability (up to 30%) to new environment dynamics.

NeurIPS Conference 2023 Conference Paper

Context Shift Reduction for Offline Meta-Reinforcement Learning

  • Yunkai Gao
  • Rui Zhang
  • Jiaming Guo
  • Fan Wu
  • Qi Yi
  • Shaohui Peng
  • Siming Lan
  • Ruizhi Chen

Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline datasets to enhance the agent's generalization ability on unseen tasks. However, the context shift problem arises due to the distribution discrepancy between the contexts used for training (from the behavior policy) and testing (from the exploration policy). The context shift problem leads to incorrect task inference and further deteriorates the generalization ability of the meta-policy. Existing OMRL methods either overlook this problem or attempt to mitigate it with additional information. In this paper, we propose a novel approach called Context Shift Reduction for OMRL (CSRO) to address the context shift problem with only offline datasets. The key insight of CSRO is to minimize the influence of policy in context during both the meta-training and meta-test phases. During meta-training, we design a max-min mutual information representation learning mechanism to diminish the impact of the behavior policy on task representation. In the meta-test phase, we introduce the non-prior context collection strategy to reduce the effect of the exploration policy. Experimental results demonstrate that CSRO significantly reduces the context shift and improves the generalization ability, surpassing previous methods across various challenging domains.

NeurIPS Conference 2023 Conference Paper

Decompose a Task into Generalizable Subtasks in Multi-Agent Reinforcement Learning

  • Zikang Tian
  • Ruizhi Chen
  • Xing Hu
  • Ling Li
  • Rui Zhang
  • Fan Wu
  • Shaohui Peng
  • Jiaming Guo

In recent years, Multi-Agent Reinforcement Learning (MARL) techniques have made significant strides in achieving high asymptotic performance in single task. However, there has been limited exploration of model transferability across tasks. Training a model from scratch for each task can be time-consuming and expensive, especially for large-scale Multi-Agent Systems. Therefore, it is crucial to develop methods for generalizing the model across tasks. Considering that there exist task-independent subtasks across MARL tasks, a model that can decompose such subtasks from the source task could generalize to target tasks. However, ensuring true task-independence of subtasks poses a challenge. In this paper, we propose to decompose a task into a series of generalizable subtasks (DT2GS), a novel framework that addresses this challenge by utilizing a scalable subtask encoder and an adaptive subtask semantic module. We show that these components endow subtasks with two properties critical for task-independence: avoiding overfitting to the source task and maintaining consistent yet scalable semantics across tasks. Empirical results demonstrate that DT2GS possesses sound zero-shot generalization capability across tasks, exhibits sufficient transferability, and outperforms existing methods in both multi-task and single-task problems.

NeurIPS Conference 2023 Conference Paper

Efficient Symbolic Policy Learning with Differentiable Symbolic Expression

  • Jiaming Guo
  • Rui Zhang
  • Shaohui Peng
  • Qi Yi
  • Xing Hu
  • Ruizhi Chen
  • Zidong Du
  • Xishan Zhang

Deep reinforcement learning (DRL) has led to a wide range of advances in sequential decision-making tasks. However, the complexity of neural network policies makes it difficult to understand and deploy with limited computational resources. Currently, employing compact symbolic expressions as symbolic policies is a promising strategy to obtain simple and interpretable policies. Previous symbolic policy methods usually involve complex training processes and pre-trained neural network policies, which are inefficient and limit the application of symbolic policies. In this paper, we propose an efficient gradient-based learning method named Efficient Symbolic Policy Learning (ESPL) that learns the symbolic policy from scratch in an end-to-end way. We introduce a symbolic network as the search space and employ a path selector to find the compact symbolic policy. By doing so we represent the policy with a differentiable symbolic expression and train it in an off-policy manner which further improves the efficiency. In addition, in contrast with previous symbolic policies which only work in single-task RL because of complexity, we expand ESPL on meta-RL to generate symbolic policies for unseen tasks. Experimentally, we show that our approach generates symbolic policies with higher performance and greatly improves data efficiency for single-task RL. In meta-RL, we demonstrate that compared with neural network policies the proposed symbolic policy achieves higher performance and efficiency and shows the potential to be interpretable.

NeurIPS Conference 2023 Conference Paper

Emergent Communication for Rules Reasoning

  • Yuxuan Guo
  • Yifan Hao
  • Rui Zhang
  • Enshuai Zhou
  • Zidong Du
  • Xishan Zhang
  • Xinkai Song
  • Yuanbo Wen

Research on emergent communication between deep-learning-based agents has received extensive attention due to its inspiration for linguistics and artificial intelligence. However, previous attempts have hovered around emerging communication under perception-oriented environmental settings, that forces agents to describe low-level perceptual features intra image or symbol contexts. In this work, inspired by the classic human reasoning test (namely Raven's Progressive Matrix), we propose the Reasoning Game, a cognition-oriented environment that encourages agents to reason and communicate high-level rules, rather than perceived low-level contexts. Moreover, we propose 1) an unbiased dataset (namely rule-RAVEN) as a benchmark to avoid overfitting, 2) and a two-stage curriculum agent training method as a baseline for more stable convergence in the Reasoning Game, where contexts and semantics are bilaterally drifting. Experimental results show that, in the Reasoning Game, a semantically stable and compositional language emerges to solve reasoning problems. The emerged language helps agents apply the extracted rules to the generalization of unseen context attributes, and to the transfer between different context attributes or even tasks.

AAAI Conference 2023 Conference Paper

Online Symbolic Regression with Informative Query

  • Pengwei Jin
  • Di Huang
  • Rui Zhang
  • Xing Hu
  • Ziyuan Nan
  • Zidong Du
  • Qi Guo
  • Yunji Chen

Symbolic regression, the task of extracting mathematical expressions from the observed data, plays a crucial role in scientific discovery. Despite the promising performance of existing methods, most of them conduct symbolic regression in an offline setting. That is, they treat the observed data points as given ones that are simply sampled from uniform distributions without exploring the expressive potential of data. However, for real-world scientific problems, the data used for symbolic regression are usually actively obtained by doing experiments, which is an online setting. Thus, how to obtain informative data that can facilitate the symbolic regression process is an important problem that remains challenging. In this paper, we propose QUOSR, a query-based framework for online symbolic regression that can automatically obtain informative data in an iterative manner. Specifically, at each step, QUOSR receives historical data points, generates new x, and then queries the symbolic expression to get the corresponding y, where the (x, y) serves as new data points. This process repeats until the maximum number of query steps is reached. To make the generated data points informative, we implement the framework with a neural network and train it by maximizing the mutual information between generated data points and the target expression. Through comprehensive experiments, we show that QUOSR can facilitate modern symbolic regression methods by generating informative data.

ICML Conference 2023 Conference Paper

Quantized Distributed Training of Large Models with Convergence Guarantees

  • Ilia Markov
  • Adrian Vladu
  • Qi Guo
  • Dan Alistarh

Communication-reduction techniques are a popular way to improve scalability in data-parallel training of deep neural networks (DNNs). The recent emergence of large language models such as GPT has created the need for new approaches to exploit data-parallelism. Among these, fully-sharded data parallel (FSDP) training is highly popular, yet it still encounters scalability bottlenecks. One reason is that applying compression techniques to FSDP is challenging: as the vast majority of the communication involves the model’s weights, direct compression alters convergence and leads to accuracy loss. We present QSDP, a variant of FSDP which supports both gradient and weight quantization with theoretical guarantees, is simple to implement and has essentially no overheads. To derive QSDP we prove that a natural modification of SGD achieves convergence even when we only maintain quantized weights, and thus the domain over which we train consists of quantized points and is, therefore, highly non-convex. We validate this approach by training GPT-family models with up to 1.3 billion parameters on a multi-node cluster. Experiments show that QSDP preserves model accuracy, while completely removing the communication bottlenecks of FSDP, providing end-to-end speedups of up to 2.2x.

NeurIPS Conference 2022 Conference Paper

Causality-driven Hierarchical Structure Discovery for Reinforcement Learning

  • Shaohui Peng
  • Xing Hu
  • Rui Zhang
  • Ke Tang
  • Jiaming Guo
  • Qi Yi
  • Ruizhi Chen
  • Xishan Zhang

Hierarchical reinforcement learning (HRL) has been proven to be effective for tasks with sparse rewards, for it can improve the agent's exploration efficiency by discovering high-quality hierarchical structures (e.g., subgoals or options). However, automatically discovering high-quality hierarchical structures is still a great challenge. Previous HRL methods can only find the hierarchical structures in simple environments, as they are mainly achieved through the randomness of agent's policies during exploration. In complicated environments, such a randomness-driven exploration paradigm can hardly discover high-quality hierarchical structures because of the low exploration efficiency. In this paper, we propose CDHRL, a causality-driven hierarchical reinforcement learning framework, to build high-quality hierarchical structures efficiently in complicated environments. The key insight is that the causalities among environment variables are naturally fit for modeling reachable subgoals and their dependencies; thus, the causality is suitable to be the guidance in building high-quality hierarchical structures. Roughly, we build the hierarchy of subgoals based on causality autonomously, and utilize the subgoal-based policies to unfold further causality efficiently. Therefore, CDHRL leverages a causality-driven discovery instead of a randomness-driven exploration for high-quality hierarchical structure construction. The results in two complex environments, 2D-Minecraft and Eden, show that CDHRL can discover high-quality hierarchical structures and significantly enhance exploration efficiency.

NeurIPS Conference 2022 Conference Paper

Object-Category Aware Reinforcement Learning

  • Qi Yi
  • Rui Zhang
  • Shaohui Peng
  • Jiaming Guo
  • Xing Hu
  • Zidong Du
  • Xishan Zhang
  • Qi Guo

Object-oriented reinforcement learning (OORL) is a promising way to improve the sample efficiency and generalization ability over standard RL. Recent works that try to solve OORL tasks without additional feature engineering mainly focus on learning the object representations and then solving tasks via reasoning based on these object representations. However, none of these works tries to explicitly model the inherent similarity between different object instances of the same category. Objects of the same category should share similar functionalities; therefore, the category is the most critical property of an object. Following this insight, we propose a novel framework named Object-Category Aware Reinforcement Learning (OCARL), which utilizes the category information of objects to facilitate both perception and reasoning. OCARL consists of three parts: (1) Category-Aware Unsupervised Object Discovery (UOD), which discovers the objects as well as their corresponding categories; (2) Object-Category Aware Perception, which encodes the category information and is also robust to the incompleteness of (1) at the same time; (3) Object-Centric Modular Reasoning, which adopts multiple independent and object-category-specific networks when reasoning based on objects. Our experiments show that OCARL can improve both the sample efficiency and generalization in the OORL domain.

IJCAI Conference 2021 Conference Paper

Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment

  • Jiaming Guo
  • Rui Zhang
  • Xishan Zhang
  • Shaohui Peng
  • Qi Yi
  • Zidong Du
  • Xing Hu
  • Qi Guo

Policy gradient methods are appealing in deep reinforcement learning but suffer from high variance of gradient estimate. To reduce the variance, the state value function is applied commonly. However, the effect of the state value function becomes limited in stochastic dynamic environments, where the unexpected state dynamics and rewards will increase the variance. In this paper, we propose to replace the state value function with a novel hindsight value function, which leverages the information from the future to reduce the variance of the gradient estimate for stochastic dynamic environments. Particularly, to obtain an ideally unbiased gradient estimate, we propose an information-theoretic approach, which optimizes the embeddings of the future to be independent of previous actions. In our experiments, we apply the proposed hindsight value function in stochastic dynamic environments, including discrete-action environments and continuous-action environments. Compared with the standard state value function, the proposed hindsight value function consistently reduces the variance, stabilizes the training, and improves the eventual policy.

NeurIPS Conference 2021 Conference Paper

ScaleCert: Scalable Certified Defense against Adversarial Patches with Sparse Superficial Layers

  • Husheng Han
  • Kaidi Xu
  • Xing Hu
  • Xiaobing Chen
  • Ling Liang
  • Zidong Du
  • Qi Guo
  • Yanzhi Wang

Adversarial patch attacks that craft the pixels in a confined region of the input images show their powerful attack effectiveness in physical environments even with noises or deformations. Existing certified defenses towards adversarial patch attacks work well on small images like MNIST and CIFAR-10 datasets, but achieve very poor certified accuracy on higher-resolution images like ImageNet. It is urgent to design both robust and effective defenses against such a practical and harmful attack in industry-level larger images. In this work, we propose the certified defense methodology that achieves high provable robustness for high-resolution images and largely improves the practicality for real adoption of the certified defense. The basic insight of our work is that the adversarial patch intends to leverage localized superficial important neurons (SIN) to manipulate the prediction results. Hence, we leverage the SIN-based DNN compression techniques to significantly improve the certified accuracy, by reducing the adversarial region searching overhead and filtering the prediction noises. Our experimental results show that the certified accuracy is increased from 36.3% (the state-of-the-art certified detection) to 60.4% on the ImageNet dataset, largely pushing the certified defenses for practical use.

AAAI Conference 2020 Conference Paper

DWM: A Decomposable Winograd Method for Convolution Acceleration

  • Di Huang
  • Xishan Zhang
  • Rui Zhang
  • Tian Zhi
  • Deyuan He
  • Jiaming Guo
  • Chang Liu
  • Qi Guo

Winograd’s minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing. However, it is only effective on convolutions with kernel size 3x3 and stride 1, because it suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3x3, and fails on convolutions with stride larger than 1. In this paper, we propose a novel Decomposable Winograd Method (DWM), which breaks through the limitation of the original Winograd minimal filtering algorithm to wide and general convolutions. DWM decomposes kernels with large size or large stride into several small kernels with stride 1 for further applying the Winograd method, so that DWM can reduce the number of multiplications while keeping the numerical accuracy. It enables the fast exploration of larger kernel sizes and larger stride values in CNNs for high performance and accuracy, and even the potential for new CNNs. Compared against the original Winograd, the proposed DWM is able to support all kinds of convolutions with a speedup of ∼2x, without affecting the numerical accuracy.

JBHI Journal 2018 Journal Article

Epileptic Seizure Classification of EEGs Using Time–Frequency Analysis Based Multiscale Radial Basis Functions

  • Yang Li
  • Xu-Dong Wang
  • Mei-Lin Luo
  • Ke Li
  • Xiao-Feng Yang
  • Qi Guo

The automatic detection of epileptic seizures from electroencephalography (EEG) signals is crucial for the localization and classification of epileptic seizure activity. However, seizure processes are typically dynamic and nonstationary, and thus, distinguishing rhythmic discharges from nonstationary processes is one of the challenging problems. In this paper, an adaptive and localized time–frequency representation in EEG signals is proposed by means of multiscale radial basis functions (MRBF) and a modified particle swarm optimization (MPSO) to improve both time and frequency resolution simultaneously, which is a novel MRBF-MPSO framework of time–frequency feature extraction for epileptic EEG signals. The dimensionality of extracted features can be greatly reduced by the principal component analysis algorithm before the most discriminative features selected are fed into a support vector machine (SVM) classifier with the radial basis function (RBF) in order to separate epileptic seizure from seizure-free EEG signals. The classification performance of the proposed method has been evaluated against several state-of-the-art feature extraction algorithms and five other classifiers, such as linear discriminant analysis and logistic regression. The experimental results indicate that the proposed MRBF-MPSO-SVM classification method outperforms competing techniques in terms of classification accuracy, and shows the effectiveness of the proposed method for classification of seizure epochs and seizure-free epochs.

IJCAI Conference 2016 Conference Paper

Questimator: Generating Knowledge Assessments for Arbitrary Topics

  • Qi Guo
  • Chinmay Kulkarni
  • Aniket Kittur
  • Jeffrey P. Bigham
  • Emma Brunskill

Formative assessments allow learners to quickly identify knowledge gaps. In traditional educational settings, expert instructors can create assessments, but in informal learning environments it is difficult for novice learners to self-assess because they don't know what they don't know. This paper introduces Questimator, an automated system that generates multiple-choice assessment questions for any topic contained within Wikipedia. Given a topic, Questimator traverses the Wikipedia link graph to find and rank related topics, and uses article text to form questions, answers, and distractor options. In a study with 833 participants from Mechanical Turk, we found that participants' scores on Questimator-generated quizzes correlated well with their scores on existing online quizzes on topics ranging from philosophy to economics. Questimator also generates questions with discriminatory power comparable to that of existing online quizzes. Our results suggest Questimator may be useful for assessing learning on topics for which no existing quiz is available.
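The graph-traversal step can be sketched with a plain breadth-first search: starting from the seed topic, collect linked topics within a hop limit and rank them by distance. This is a hypothetical mini-example of the idea, not Questimator's actual ranking algorithm (which also scores topics using article text):

```python
from collections import deque

def related_topics(graph, topic, max_hops=2):
    """Rank topics reachable from `topic` by BFS distance in a link graph
    given as an adjacency dict {page: [linked pages]}."""
    dist = {topic: 0}
    queue = deque([topic])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # don't expand beyond the hop limit
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    dist.pop(topic)                      # the seed itself is not a candidate
    return sorted(dist, key=dist.get)    # nearer topics rank higher
```

Nearby topics found this way are natural sources of plausible distractor options, since they are related to, but distinct from, the assessed topic.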

TIST Journal 2013 Journal Article

Effective and efficient microprocessor design space exploration using unlabeled design configurations

  • Tianshi Chen
  • Yunji Chen
  • Qi Guo
  • Zhi-Hua Zhou
  • Ling Li
  • Zhiwei Xu

Ever-increasing design complexity and advances of technology impose great challenges on the design of modern microprocessors. One such challenge is to determine promising microprocessor configurations that meet specific design constraints, which is called Design Space Exploration (DSE). In the computer architecture community, supervised learning techniques have been applied to DSE to build regression models for predicting the qualities of design configurations. For supervised learning, however, considerable simulation costs are required to attain the labeled design configurations. Given limited resources, it is difficult to achieve high accuracy. In this article, inspired by recent advances in semi-supervised learning and active learning, we propose the COAL approach, which can exploit unlabeled design configurations to significantly improve the models. Empirical study demonstrates that COAL significantly outperforms a state-of-the-art DSE technique, reducing mean squared error by 35% to 95%, so that promising architectures can be attained more efficiently.
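The active-learning half of this idea can be sketched generically: rather than simulating configurations at random, repeatedly simulate the unlabeled configuration on which two bootstrap-trained models disagree most, then retrain. This is our own simplified illustration with linear regressors (COAL's actual models and query criterion differ), where `simulate` is a hypothetical stand-in for a cycle-accurate simulator run:

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with a bias term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ coef

def active_dse(X_lab, y_lab, X_pool, simulate, budget):
    """Greedy disagreement-based active learning over a pool of
    unlabeled design configurations."""
    rng = np.random.default_rng(0)
    pool = list(range(len(X_pool)))
    for _ in range(budget):
        # Two bootstrap models; their disagreement flags uncertain configs.
        preds = []
        for _ in range(2):
            idx = rng.integers(0, len(X_lab), len(X_lab))
            preds.append(predict(fit_linear(X_lab[idx], y_lab[idx]),
                                 X_pool[pool]))
        pick = pool.pop(int(np.argmax(np.abs(preds[0] - preds[1]))))
        # Spend one simulation on the most uncertain configuration.
        X_lab = np.vstack([X_lab, X_pool[pick:pick + 1]])
        y_lab = np.append(y_lab, simulate(X_pool[pick]))
    return fit_linear(X_lab, y_lab)
```

The point of the sketch is the budgeted loop: each expensive "simulation" is spent where the current models are least certain, mirroring how COAL reduces labeling cost.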

IJCAI Conference 2011 Conference Paper

Effective and Efficient Microprocessor Design Space Exploration Using Unlabeled Design Configurations

  • Qi Guo
  • Tianshi Chen
  • Yunji Chen
  • Zhi-Hua Zhou
  • Weiwu Hu
  • Zhiwei Xu

During the design of a microprocessor, Design Space Exploration (DSE) is a critical step that determines the appropriate design configuration of the microprocessor. In the computer architecture community, supervised learning techniques have been applied to DSE to build models for predicting the qualities of design configurations. For supervised learning, however, considerable simulation costs are required to attain the labeled design configurations. Given limited resources, it is difficult to achieve high accuracy. In this paper, inspired by recent advances in semi-supervised learning, we propose the COMT approach, which can exploit unlabeled design configurations to improve the models. In addition to improved predictive accuracy, COMT is able to guide the design of microprocessors, owing to its use of comprehensible model trees. Empirical study demonstrates that COMT significantly outperforms a state-of-the-art DSE technique by reducing mean squared error by 30% to 84%, and thus promising architectures can be attained more efficiently.
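A model tree is "comprehensible" because each leaf holds a small linear model over a readable region of the design space. As our own minimal illustration of that idea (a depth-1 tree on one feature, far simpler than the M5-style model trees COMT uses), assuming a single numeric design parameter `x` and a quality metric `y`:

```python
import numpy as np

def fit_line(x, y):
    # Least-squares line y ≈ a*x + b.
    a, b = np.polyfit(x, y, 1)
    return a, b

def fit_model_stump(x, y):
    """Depth-1 model tree: choose the split threshold that minimizes the
    summed squared error of a separate linear model in each leaf."""
    best = None
    for t in np.unique(x)[1:]:
        left, right = x < t, x >= t
        if left.sum() < 2 or right.sum() < 2:
            continue  # need at least two points per leaf to fit a line
        err, leaves = 0.0, {}
        for name, mask in (("left", left), ("right", right)):
            a, b = fit_line(x[mask], y[mask])
            err += np.sum((a * x[mask] + b - y[mask]) ** 2)
            leaves[name] = (a, b)
        if best is None or err < best[0]:
            best = (err, t, leaves)
    return best[1], best[2]  # threshold and per-leaf linear models

def predict_stump(t, leaves, x):
    a_l, b_l = leaves["left"]
    a_r, b_r = leaves["right"]
    return np.where(x < t, a_l * x + b_l, a_r * x + b_r)
```

The fitted structure reads as a design rule ("if x < t, quality grows like a_l*x + b_l, otherwise like a_r*x + b_r"), which is the kind of interpretable guidance the abstract attributes to model trees.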