Arrow Research search

Author name cluster

Yao Lai

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

6

AAAI 2025 Conference Paper

AnalogCoder: Analog Circuit Design via Training-Free Code Generation

  • Yao Lai
  • Sungyoung Lee
  • Guojin Chen
  • Souradip Poddar
  • Mengkang Hu
  • David Z. Pan
  • Ping Luo

Analog circuit design is a significant task in modern chip technology, focusing on the selection of component types, connectivity, and parameters to ensure proper circuit functionality. Despite advances made by Large Language Models (LLMs) in digital circuit design, the complexity and scarcity of data in analog circuitry pose significant challenges. To mitigate these issues, we introduce AnalogCoder, the first training-free LLM agent for designing analog circuits through Python code generation. Firstly, AnalogCoder incorporates a feedback-enhanced flow with tailored domain-specific prompts, enabling the automated and self-correcting design of analog circuits with a high success rate. Secondly, it proposes a circuit tool library to archive successful designs as reusable modular sub-circuits, simplifying composite circuit creation. Thirdly, extensive experiments on a benchmark designed to cover a wide range of analog circuit tasks show that AnalogCoder outperforms other LLM-based methods. It has successfully designed 20 circuits, 5 more than standard GPT-4o. We believe AnalogCoder can significantly improve the labor-intensive chip design process, enabling non-experts to design analog circuits efficiently.
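The feedback-enhanced flow described above can be pictured as a generate-simulate-correct loop. The sketch below is a minimal illustration of that control flow only; `generate_circuit_code` and `simulate` are hypothetical stubs standing in for the LLM call and the SPICE simulation the paper actually uses.

```python
# Minimal sketch of a feedback-enhanced design loop in the spirit of
# AnalogCoder. Both helpers are hypothetical stand-ins, not the paper's API.

def generate_circuit_code(prompt):
    # Stand-in for an LLM call with a domain-specific prompt.
    return "netlist = ['V1 in 0 DC 1', 'R1 in out 1k', 'R2 out 0 1k']"

def simulate(code):
    # Stand-in for a SPICE run: returns (ok, error_message).
    try:
        scope = {}
        exec(code, scope)
        ok = "netlist" in scope
        return ok, "" if ok else "no netlist produced"
    except Exception as e:
        return False, str(e)

def design_loop(task, max_rounds=3):
    prompt = task
    for _ in range(max_rounds):
        code = generate_circuit_code(prompt)
        ok, err = simulate(code)
        if ok:
            return code  # a success would be archived as a reusable sub-circuit
        # self-correction: feed the simulator error back into the next prompt
        prompt = f"{task}\nPrevious attempt failed: {err}"
    return None
```

The key design point is that the simulator error message, not a human, closes the loop, which is what makes the agent training-free and self-correcting.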

NeurIPS 2025 Conference Paper

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

  • Jin Wang
  • Yao Lai
  • Aoxue Li
  • Shifeng Zhang
  • Jiacheng Sun
  • Ning Kang
  • Chengyue Wu
  • Zhenguo Li

The rapid progress of large language models (LLMs) has catalyzed the emergence of multimodal large language models (MLLMs) that unify visual understanding and image generation within a single framework. However, most existing MLLMs rely on autoregressive (AR) architectures, which impose inherent limitations on future development, such as the raster-scan order in image generation and restricted reasoning abilities in causal context modeling. In this work, we challenge the dominance of AR-based approaches by introducing FUDOKI, a unified multimodal model purely based on discrete flow matching, as an alternative to conventional AR paradigms. By leveraging metric-induced probability paths with kinetic optimal velocities, our framework goes beyond the previous masking-based corruption process, enabling iterative refinement with self-correction capability and richer bidirectional context integration during generation. To mitigate the high cost of training from scratch, we initialize FUDOKI from pre-trained AR-based MLLMs and adaptively transition to the discrete flow matching paradigm. Experimental results show that FUDOKI achieves performance comparable to state-of-the-art AR-based MLLMs across both visual understanding and image generation tasks, highlighting its potential as a foundation for next-generation unified multimodal models. Furthermore, we show that applying test-time scaling techniques to FUDOKI yields significant performance gains, further underscoring its promise for future enhancement through reinforcement learning.
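To make the non-autoregressive generation idea concrete, the toy below samples every position of a sequence in parallel along a simple linear-mixture probability path over discrete states: each position jumps to a sample from the target distribution at rate dt/(1-t). This is a generic discrete-flow toy under a uniform source and independent positions, not FUDOKI's metric-induced, kinetic-optimal construction; all names are illustrative.

```python
# Toy discrete-flow sampler: all positions refine simultaneously, unlike
# raster-scan autoregressive decoding. `target_dist` plays the role of the
# model's per-position output distribution (hypothetical interface).
import random

def discrete_flow_sample(target_dist, vocab, steps=8, seed=0):
    rng = random.Random(seed)
    x = [rng.choice(vocab) for _ in target_dist]  # x_0 ~ uniform source
    for k in range(steps):
        t, dt = k / steps, 1 / steps
        for i, dist in enumerate(target_dist):
            # jump rate dt/(1-t) realizes the linear mixture path
            # p_t = (1-t) p_0 + t p_1; the final step forces a jump,
            # but earlier tokens may still be revised (self-correction)
            if rng.random() < dt / (1 - t):
                tokens, probs = zip(*dist.items())
                x[i] = rng.choices(tokens, probs)[0]
    return x
```

Because every position can be resampled at every step, the sampler sees bidirectional context and can revise earlier choices, which is the property the abstract contrasts with AR raster-scan order.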

NeurIPS 2024 Conference Paper

Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs

  • Yao Lai
  • Jinxin Liu
  • David Z. Pan
  • Ping Luo

Across a wide range of hardware scenarios, the computational efficiency and physical size of the arithmetic units significantly influence the speed and footprint of the overall hardware system. Nevertheless, the effectiveness of prior arithmetic design techniques proves inadequate, as they do not sufficiently optimize speed and area, resulting in increased latency and larger module size. To boost computing performance, this work focuses on the two most common and fundamental arithmetic modules, adders and multipliers. We cast the design tasks as single-player tree generation games, leveraging reinforcement learning techniques to optimize their arithmetic tree structures. This tree generation formulation allows us to efficiently navigate the vast search space and discover superior arithmetic designs that improve computational efficiency and hardware size within just a few hours. Our proposed method, ArithTreeRL, achieves significant improvements for both adders and multipliers. For adders, our approach discovers designs of 128-bit adders that achieve Pareto optimality in theoretical metrics. Compared with PrefixRL, it reduces delay and size by up to 26% and 30%, respectively. For multipliers, compared to RL-MUL, our method enhances speed and reduces size by as much as 49% and 45%. Additionally, ArithTreeRL's flexibility and scalability enable seamless integration into 7nm technology. We believe our work will offer valuable insights into hardware design, further accelerating speed and reducing size through the refined search space and our tree generation methodologies.
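The "arithmetic tree" being searched over for adders is a parallel-prefix tree of generate/propagate signals. As background for what the RL agent optimizes, here is a standard Kogge-Stone-shaped prefix adder in plain Python; it is a textbook baseline structure, not a design produced by ArithTreeRL.

```python
# Parallel-prefix adder: per-bit generate (g) and propagate (p) signals are
# combined over log2(n) levels; the tree's shape is exactly the structure
# ArithTreeRL searches over to trade delay against area.

def prefix_add(a, b, n=8):
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]        # g_i = a_i & b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]      # p_i = a_i ^ b_i
    G, P = g[:], p[:]
    d = 1
    while d < n:                       # Kogge-Stone: one level per power of two
        for i in range(n - 1, d - 1, -1):
            G[i] |= P[i] & G[i - d]    # (G,P) prefix-combine operator
            P[i] &= P[i - d]
        d *= 2
    c = [0] + G                        # carry into bit i is the prefix G[i-1:0]
    s = sum((p[i] ^ c[i]) << i for i in range(n))
    return s | (c[n] << n)             # append the carry-out
```

Different prefix trees (Kogge-Stone, Sklansky, Brent-Kung, and the irregular trees an RL agent can discover) all compute the same carries but differ in depth and node count, which is the delay/size Pareto trade-off the abstract refers to.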

ICML 2023 Conference Paper

ChiPFormer: Transferable Chip Placement via Offline Decision Transformer

  • Yao Lai
  • Jinxin Liu
  • Zhentao Tang
  • Bin Wang 0034
  • Jianye Hao
  • Ping Luo 0002

Placement is a critical step in modern chip design, aiming to determine the positions of circuit modules on the chip canvas. Recent works have shown that reinforcement learning (RL) can improve human performance in chip placement. However, such an RL-based approach suffers from long training time and low transfer ability in unseen chip circuits. To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data. ChiPFormer has several advantages that prior arts do not have. First, ChiPFormer can exploit offline placement designs to learn transferable policies more efficiently in a multi-task setting. Second, ChiPFormer can promote effective finetuning for unseen chip circuits, reducing the placement runtime from hours to minutes. Third, extensive experiments on 32 chip circuits demonstrate that ChiPFormer achieves significantly better placement quality while reducing the runtime by 10x compared to recent state-of-the-art approaches in both public benchmarks and realistic industrial tasks. The deliverables are released at https://sites.google.com/view/chipformer/home.
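A decision transformer learns from fixed offline trajectories by conditioning on the return-to-go at each step. The sketch below shows only that data-preparation idea; the function and token names are illustrative, not ChiPFormer's actual interfaces.

```python
# Toy sketch of decision-transformer-style offline data packing:
# each trajectory becomes a sequence of (return-to-go, state, action)
# tokens, so the policy can be conditioned on a desired outcome.

def returns_to_go(rewards):
    """Suffix sums of the rewards: rtg[t] = sum(rewards[t:])."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return rtg[::-1]

def pack_trajectory(states, actions, rewards):
    """Interleave (rtg_t, s_t, a_t) triples for sequence modeling."""
    tokens = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens += [("rtg", g), ("state", s), ("action", a)]
    return tokens
```

Because the policy is trained purely on such packed offline sequences, no environment interaction is needed at training time, which is what enables the fast finetuning on unseen circuits described above.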

NeurIPS 2022 Conference Paper

MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning

  • Yao Lai
  • Yao Mu
  • Ping Luo

Placement is an essential task in modern chip design, aiming at placing millions of circuit modules on a 2D chip canvas. Unlike the human-centric solution, which requires months of intense effort by hardware engineers to produce a layout to minimize delay and energy consumption, deep reinforcement learning has become an emerging autonomous tool. However, the learning-centric method is still in its early stage, impeded by a massive design space of size ten to the power of a few thousand. This work presents MaskPlace to automatically generate a valid chip layout design within a few hours, whose performance can be superior or comparable to recent advanced approaches. It has several appealing benefits that prior arts do not have. Firstly, MaskPlace recasts placement as a problem of learning pixel-level visual representation to comprehensively describe millions of modules on a chip, enabling placement in a high-resolution canvas and a large action space. It outperforms recent methods that represent a chip as a hypergraph. Secondly, it enables training the policy network by an intuitive reward function with dense reward, rather than a complicated reward function with sparse reward from previous methods. Thirdly, extensive experiments on many public benchmarks show that MaskPlace outperforms existing RL approaches in all key performance metrics, including wirelength, congestion, and density. For example, it achieves 60%-90% wirelength reduction and guarantees zero overlaps. We believe MaskPlace can improve AI-assisted chip layout design. The deliverables are released at https://laiyao1.github.io/maskplace.
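The pixel-level representation makes the zero-overlap guarantee mechanical: a position mask over the canvas marks exactly the cells where the next module fits without touching anything already placed. The brute-force sketch below illustrates that mask computation on a tiny grid; it is an illustration of the idea, not MaskPlace's implementation.

```python
# Toy position mask: given a 0/1 occupancy grid and a module of size w x h,
# mark every top-left corner where the module can be placed with no overlap.
# Restricting the action space to this mask is what guarantees validity.

def valid_position_mask(occupancy, w, h):
    H, W = len(occupancy), len(occupancy[0])
    mask = [[False] * W for _ in range(H)]
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            if all(occupancy[y + dy][x + dx] == 0
                   for dy in range(h) for dx in range(w)):
                mask[y][x] = True
    return mask
```

In the paper this mask is one of several image-like channels the policy network consumes, so the same convolutional machinery that scores positions also enforces legality.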

ECAI 2020 Conference Paper

OpenSMax: Unknown Domain Generation Algorithm Detection

  • Yao Lai
  • Guolou Ping
  • Yuexin Wu
  • Chenhui Lu
  • Xiaojun Ye

Botnets have become one of the most frequent attack patterns in cyberspace, and most of them rely on Domain Generation Algorithms (DGAs). Therefore, many researchers have proposed various machine learning models for DGA domain name detection, but how to detect unknown classes of DGA domain names (unknown DGAs) is still a challenging problem. In fact, the problem of detecting unknown classes is also called the open set recognition problem. To tackle this issue, we propose a novel classification model, OpenSMax, which can not only detect various DGA domain names but also classify them into known and unknown classes of DGAs. In this model, we use the one-hot encoding method and the Long Short-Term Memory (LSTM) model to extract the features of the Top Level Domain (TLD) and the Second Level Domain (SLD), respectively. Then, these two feature categories are concatenated and propagated forwards by two fully connected layers for known DGA domain name detection and classification. Finally, both the openmax layer (the layer before the softmax layer) and the softmax layer are used to build One-Class Support Vector Machine (SVM) models for unknown classes recognition. In our experiments, the OpenSMax model outperforms state-of-the-art methods in both known and unknown DGA domain name detection tasks. Also, OpenSMax provides a bounded open space risk in theory, and therefore it formally provides an effective solution for unknown DGA domain name detection.
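The open-set decision itself can be illustrated with a much simpler stand-in for the paper's per-class one-class SVMs: classify a penultimate-layer activation to its nearest known-class centroid, but reject it as "unknown" when no centroid is close enough. This is a generic open-set sketch with hypothetical names, not OpenSMax's actual model.

```python
# Toy open-set classifier over penultimate activations: nearest-centroid
# classification with a distance threshold tau playing the role of the
# per-class one-class SVM boundary (illustrative simplification).
import math

def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def open_set_predict(activation, centroids, tau):
    # centroids: {class_name: mean activation of that known DGA class}
    cls, dist = min(((c, euclid(activation, mu)) for c, mu in centroids.items()),
                    key=lambda pair: pair[1])
    return cls if dist <= tau else "unknown"
```

Bounding acceptance regions around known classes is what gives this family of methods a bounded open space risk: activations far from every known class can never be labeled as a known class.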