Arrow Research search

Author name cluster

Tao Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

AAAI Conference 2026 Conference Paper

ShoppingBench: A Real-World Intent-Grounded Shopping Benchmark for LLM-based Agents

  • Jiangyuan Wang
  • Kejun Xiao
  • Qi Sun
  • Huaipeng Zhao
  • Tao Luo
  • Jian Dong Zhang
  • Xiaoyi Zeng

Existing benchmarks in e-commerce primarily focus on basic user intents, such as finding or purchasing products. However, real-world users often pursue more complex goals, such as applying vouchers, managing budgets, and finding sellers that offer multiple desired products. To bridge this gap, we propose ShoppingBench, a novel end-to-end shopping benchmark designed to encompass increasingly challenging levels of grounded intent. Specifically, we propose a scalable framework that simulates user instructions based on various intents derived from sampled real-world products. To facilitate consistent and reliable evaluation, we provide a large-scale shopping sandbox that serves as an interactive simulated environment, incorporating over 2.5 million real-world products. Experimental results demonstrate that even state-of-the-art language agents (such as GPT-4.1) achieve absolute success rates under 50% on our benchmark tasks, highlighting the significant challenges posed by ShoppingBench. In addition, we propose a trajectory distillation strategy and leverage supervised fine-tuning, along with reinforcement learning on synthetic trajectories, to distill the capabilities of a large language agent into a smaller one. As a result, our trained agent achieves performance competitive with GPT-4.1.
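Since the abstract frames evaluation as an agent acting in an interactive sandbox until a task succeeds or times out, a minimal evaluation loop helps make the protocol concrete. Everything below (the `Agent` protocol and the sandbox's `reset`/`step` interface) is an illustrative assumption, not the released ShoppingBench API.

```python
from typing import Protocol

class Agent(Protocol):
    def act(self, observation: str) -> str: ...

def evaluate(agent: Agent, sandbox, tasks, max_steps: int = 30) -> float:
    """Return the absolute success rate of `agent` over `tasks` (hypothetical API)."""
    successes = 0
    for task in tasks:
        obs = sandbox.reset(task)          # load one intent-grounded instruction
        for _ in range(max_steps):
            action = agent.act(obs)        # e.g. search / add-to-cart / apply-voucher
            obs, done, success = sandbox.step(action)
            if done:
                successes += int(success)
                break
    return successes / len(tasks)
```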

JBHI Journal 2025 Journal Article

A Drug-Drug Interaction Prediction Method Based on Atomic 3D Position Encoding and Elastic Message Passing Graph Neural Network

  • Tao Luo
  • Tao Lin
  • Chun Yang
  • Lingjie Fan
  • Wei Wang

Drug-drug interaction (DDI) refers to the inhibitory or enhancing effects between different drugs. Existing DDI prediction methods primarily use graph neural networks (GNNs) to directly represent drug molecular features. However, they often ignore the 3D structures of the atoms within drug molecules and the impact of noise in GNNs on DDI prediction. Consequently, the accuracy of GNN-based DDI prediction remains unsatisfactory. To address these limitations, this study proposes a DDI prediction method based on atomic 3D position encoding and an elastic message passing graph neural network (A3DPE-EMPGNN). Firstly, we construct an atomic feature network based on an attention mechanism and a message passing neural network. This network leverages 3D position encoding relative to the molecular centroid to learn the features of different atoms and their associated chemical bonds, thereby constructing a graph-based molecular representation. Secondly, we design a molecular feature network that incorporates an attention mechanism, utilizing multi-head attention to capture interaction information between different drug molecules. Thirdly, we employ an adversarial attack detection and defense strategy, integrating supervised and contrastive loss learning to optimize the model and enhance its robustness while performing DDI prediction. Lastly, we evaluate the effectiveness of A3DPE-EMPGNN on two real-world datasets. Experimental results clearly demonstrate that our method achieves over 98% on accuracy (ACC), AUC, average precision (AP), and F1-score, outperforming state-of-the-art GNN-based models.
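As a rough illustration of the centroid-based idea, the sketch below encodes each atom by its offset from the molecular centroid plus the radial distance; the exact encoding used in A3DPE-EMPGNN may differ.

```python
import numpy as np

def centroid_position_encoding(coords: np.ndarray) -> np.ndarray:
    """coords: (n_atoms, 3) array of 3D atom positions.
    Returns per-atom features: offset from the centroid plus radial distance."""
    centroid = coords.mean(axis=0)               # molecular centroid
    offsets = coords - centroid                  # translation-invariant offsets
    radii = np.linalg.norm(offsets, axis=1, keepdims=True)
    return np.concatenate([offsets, radii], axis=1)   # (n_atoms, 4)

# Example: water-like geometry (coordinates in angstroms, illustrative only)
coords = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
print(centroid_position_encoding(coords))
```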

NeurIPS Conference 2025 Conference Paper

ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

  • Xiaoyang Liu
  • Kangjie Bao
  • Jiashuo Zhang
  • Yunqi Liu
  • Yu Chen
  • Yuntian Liu
  • Yang Jiao
  • Tao Luo

Autoformalization, the automatic translation of mathematical content from natural language into machine-verifiable formal languages, has seen significant progress driven by advances in large language models (LLMs). Nonetheless, a primary barrier to further improvements is the limited availability of parallel corpora that map informal mathematical text to its formal counterpart. To address this limitation, we propose ATLAS (Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data), a novel data generation framework designed to produce large-scale, high-quality parallel corpora of theorem statements. Distinct from prior approaches, ATLAS begins with a concept repository, accelerates the improvement of the student model through expert iteration combined with knowledge distillation, and introduces two novel augmentation strategies that exploit the structural characteristics of formal languages. Running the proposed ATLAS framework for 10 iterations, we construct an undergraduate-level dataset of 117k theorem statements and develop the ATLAS Translator by fine-tuning Llama3.1-8B-Instruct with LoRA. This model establishes a new state of the art, demonstrating statistically significant improvements over both the Herald Translator and the Kimina-Autoformalizer across all benchmarks (p<0.05, two-sided t-test). Furthermore, we demonstrate that the full-parameter fine-tuning of a stronger base model on the ATLAS dataset leads to superior performance. The datasets, model, and code are available at https://github.com/XiaoyangLiu-sjtu/ATLAS.
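The expert-iteration-with-distillation loop described in the abstract can be sketched schematically as below; every callable is a placeholder for machinery the paper implements (statement lifting, Lean type-checking, augmentation, fine-tuning), not ATLAS's actual code.

```python
from typing import Callable, Iterable

def atlas_round(student, teacher, corpus: list,
                sample: Callable[[], Iterable[str]],      # lift concepts -> statements
                formalize: Callable[[object, str], str],  # model x informal -> Lean
                check: Callable[[str], bool],             # Lean type-checking
                augment: Callable[[tuple], list],         # structure-aware augmentation
                train: Callable[[object, list], object]): # fine-tune, e.g. with LoRA
    """One hypothetical expert-iteration round over the concept repository."""
    for stmt in sample():
        formal = formalize(student, stmt)       # student attempt first
        if not check(formal):
            formal = formalize(teacher, stmt)   # distill from the expert model
            if not check(formal):
                continue                        # drop unverifiable pairs
        corpus.extend(augment((stmt, formal)))  # grow the parallel corpus
    return train(student, corpus), corpus       # improved student for the next round
```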

IROS Conference 2025 Conference Paper

Automated Manipulation of Magnetic Microswarms for Temporal Logic Cargo Delivery Tasks in Complex Environments

  • Naifu Zhang
  • Tao Luo
  • Chongjie Jiang
  • Xiang Yin 0003
  • Xiao Yu 0002
  • Rongrong Ji

Micromanipulation using magnetic microswarms has garnered significant attention in recent years due to their potential in microscale cargo delivery tasks. While existing studies have demonstrated the capabilities of microswarms in basic manipulation tasks, they often lack the autonomy required to handle more complex specifications, particularly temporal logic tasks. In this paper, we propose a novel formal planning strategy for magnetic microswarms that enables cargo delivery in complex environments while satisfying finite linear temporal logic (LTL$_f$) specifications. Our approach consists of two key components. First, we develop a high-level path planner based on a bidirectional temporal logic rapidly-exploring random tree star (BTL-RRT*) algorithm, which facilitates efficient planning while ensuring compliance with the given task specifications. Second, we employ an automaton to manage the manipulation modes of the microswarm, enabling real-time control over the capture and release of cargoes. In addition, we implement the planning strategy on microswarms actuated by a visual-feedback magnetic tweezers system. Extensive simulations and experimental results demonstrate the effectiveness of the proposed planning strategy. The results indicate that, using the proposed approach, microswarms can autonomously select and deliver multiple microbeads to designated regions in both static and dynamic environments while adhering to the LTL$_f$ specifications.
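The mode-managing automaton can be pictured as a small state machine gating capture, transport, and release. The states and events in this toy sketch are illustrative, not the automaton used in the paper.

```python
# Illustrative mode-switching automaton: (state, event) -> next state.
TRANSITIONS = {
    ("searching", "cargo_in_range"): "capturing",
    ("capturing", "cargo_secured"): "transporting",
    ("transporting", "goal_region_reached"): "releasing",
    ("releasing", "cargo_released"): "searching",
}

def step(state: str, event: str) -> str:
    """Advance the manipulation mode; unknown events leave the mode unchanged."""
    return TRANSITIONS.get((state, event), state)

mode = "searching"
for ev in ["cargo_in_range", "cargo_secured", "goal_region_reached", "cargo_released"]:
    mode = step(mode, ev)
    print(mode)   # capturing, transporting, releasing, searching
```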

NeurIPS Conference 2025 Conference Paper

Embedding Principle of Homogeneous Neural Network for Classification Problem

  • Jiahan Zhang
  • Yaoyu Zhang
  • Tao Luo

In this paper, we study the Karush-Kuhn-Tucker (KKT) points of the associated maximum-margin problem in homogeneous neural networks, including fully-connected and convolutional neural networks. In particular, we investigate the relationship between such KKT points across networks of different widths. We introduce and formalize the KKT point embedding principle, establishing that KKT points of a homogeneous network's max-margin problem ($P_{\Phi}$) can be embedded into the KKT points of a larger network's problem ($P_{\tilde{\Phi}}$) via specific linear isometric transformations. We rigorously prove that this principle holds for neuron splitting in fully-connected networks and channel splitting in convolutional neural networks. Furthermore, we connect this static embedding to the dynamics of gradient flow training with smooth losses. We demonstrate that trajectories initiated from appropriately mapped points remain mapped throughout training and that the resulting $\omega$-limit sets of directions are correspondingly mapped, thereby preserving the alignment with KKT directions dynamically when directional convergence occurs. We conduct several experiments to verify that trajectories are preserved. Our findings offer insights into the effects of network width, parameter redundancy, and the structural connections between solutions found via optimization in homogeneous networks of varying sizes.
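The neuron-splitting construction at the heart of the embedding can be checked numerically in a few lines: duplicating a hidden neuron and splitting its outgoing weight leaves a two-layer ReLU network's output function unchanged (the paper's embedding additionally chooses the splitting so that the parameter map is a linear isometry). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))     # input weights, 4 hidden neurons
a = rng.standard_normal(4)          # output weights

def f(W, a, x):
    """Two-layer ReLU network f(x) = a^T relu(Wx)."""
    return a @ np.maximum(W @ x, 0.0)

# Split neuron 0 into two copies carrying fractions lam and 1-lam of its
# outgoing weight, with the same input weight row.
lam = 0.3
W_big = np.vstack([W, W[0:1]])                   # duplicate the input weight row
a_big = np.concatenate([a, [a[0] * (1 - lam)]])
a_big[0] *= lam

x = rng.standard_normal(3)
assert np.allclose(f(W, a, x), f(W_big, a_big, x))   # same output function
```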

NeurIPS Conference 2025 Conference Paper

From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics

  • Zheng-An Chen
  • Tao Luo

Although transformer-based models have shown exceptional empirical performance, the fundamental principles governing their training dynamics are inadequately characterized beyond configuration-specific studies. Inspired by empirical evidence showing improved reasoning capabilities under small initialization scales in language models, we employ the gradient-flow analytical framework established in (Zhou et al., 2022) to systematically investigate linearized Transformer training dynamics. Our theoretical analysis dissects the dynamics of attention modules into two distinct stages. In the first stage, asymmetric weight perturbations from random initialization sustain non-degenerate gradient dynamics in the parameter matrices, facilitating systematic escape from small-initialization regimes. Subsequently, these matrices undergo condensation, progressively aligning toward the target orientation. In the second stage, the previously static key-query matrices actively participate in training, driving the normalized matrices toward asymptotic rank collapse. This two-stage framework generalizes classical directional convergence results.
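One simple way to monitor the rank-collapse phenomenon empirically is to track the stable rank of the combined key-query matrix during training; this diagnostic is our illustrative choice, not the paper's analytical tool.

```python
import numpy as np

def stable_rank(A: np.ndarray) -> float:
    """||A||_F^2 / ||A||_2^2: a smooth proxy for rank (1 signals rank collapse)."""
    s = np.linalg.svd(A, compute_uv=False)
    return float((s ** 2).sum() / (s[0] ** 2))

rng = np.random.default_rng(0)
WQ, WK = rng.standard_normal((64, 16)), rng.standard_normal((64, 16))
# The combined key-query matrix W_Q W_K^T has rank at most 16; its stable
# rank is well above 1 at random init and would approach 1 under collapse.
print(stable_rank(WQ @ WK.T))
```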

AAAI Conference 2025 Conference Paper

VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

  • Chun-Mei Feng
  • Yang Bai
  • Tao Luo
  • Zhen Li
  • Salman Khan
  • Wangmeng Zuo
  • Rick Siow Mong Goh
  • Yong Liu

Although progress has been made in Composed Image Retrieval (CIR), we empirically find that a certain percentage of failed retrieval results are not consistent with their relative captions. To address this issue, this work provides a Visual Question Answering (VQA) perspective to boost the performance of CIR. The resulting VQA4CIR is a post-processing approach and can be directly plugged into existing CIR methods. Given the top-C retrieved images from a CIR method, VQA4CIR aims to decrease the adverse effect of failed retrievals being inconsistent with the relative caption. To find the retrieved images inconsistent with the relative caption, we resort to a "QA generation → VQA" self-verification pipeline. For QA generation, we suggest fine-tuning an LLM (e.g., LLaMA) to generate several pairs of questions and answers from each relative caption. We then fine-tune an LVLM (e.g., LLaVA) to obtain the VQA model. By feeding a retrieved image and a question to the VQA model, one can identify images inconsistent with the relative caption whenever the VQA answer disagrees with the answer in the QA pair. Consequently, CIR performance can be boosted by modifying the ranks of inconsistently retrieved images. Experimental results show that our proposed method outperforms state-of-the-art CIR methods on the CIRR and Fashion-IQ datasets.
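The self-verification rerank can be sketched as follows; `gen_qa_pairs` and `vqa_answer` are stand-ins (with dummy bodies) for the fine-tuned LLM and LVLM the paper uses, and the mismatch threshold is our illustrative choice.

```python
def gen_qa_pairs(caption: str) -> list[tuple[str, str]]:
    # Placeholder: the paper fine-tunes an LLM to produce (question, answer)
    # pairs from the relative caption.
    return [("Is the dress red?", "yes")]

def vqa_answer(image, question: str) -> str:
    # Placeholder: the paper fine-tunes an LVLM (e.g., LLaVA) for this step.
    return "yes"

def rerank(caption: str, retrieved: list, threshold: int = 1) -> list:
    """Demote retrieved images whose VQA answers contradict the caption's QA pairs."""
    qa_pairs = gen_qa_pairs(caption)
    consistent, inconsistent = [], []
    for image in retrieved:                      # top-C candidates from any CIR model
        mismatches = sum(vqa_answer(image, q) != a for q, a in qa_pairs)
        (inconsistent if mismatches >= threshold else consistent).append(image)
    return consistent + inconsistent             # inconsistent images move down
```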

NeurIPS Conference 2024 Conference Paper

Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

  • Sili Huang
  • Jifeng Hu
  • Zhejian Yang
  • Liwei Yang
  • Tao Luo
  • Hechang Chen
  • Lichao Sun
  • Bo Yang

Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents can self-improve in online environments when provided with task contexts, such as multiple trajectories, a setting called in-context RL. However, due to the quadratic computational complexity of attention in transformers, current in-context RL methods suffer from huge computational costs as the task horizon increases. In contrast, the Mamba model is renowned for efficiently processing long-term dependencies, which provides an opportunity for in-context RL to solve tasks that require long-term memory. To this end, we first implement Decision Mamba (DM) by replacing the backbone of the Decision Transformer (DT). Then, we propose Decision Mamba-Hybrid (DM-H), which combines the merits of transformers and Mamba: high-quality prediction and long-term memory. Specifically, DM-H first generates high-value sub-goals from long-term memory through the Mamba model. Then, we use the sub-goals to prompt the transformer, establishing high-quality predictions. Experimental results demonstrate that DM-H achieves state-of-the-art results on long- and short-term tasks, such as the D4RL, Grid World, and Tmaze benchmarks. Regarding efficiency, online testing of DM-H on the long-term task is 28$\times$ faster than transformer-based baselines.
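Structurally, the hybrid step can be sketched as: a Mamba scan over the long history proposes sub-goals, which then prompt a short-context transformer. The interfaces and tensor layout below are illustrative placeholders, not the DM-H implementation.

```python
import torch

def dm_h_step(mamba, transformer, history: torch.Tensor, recent: torch.Tensor):
    """history: full trajectory tokens (batch, T, d); recent: short window
    near the present (batch, t, d). `mamba` and `transformer` are assumed
    callables; shapes are illustrative."""
    with torch.no_grad():
        subgoals = mamba(history)             # long-range scan -> high-value sub-goals
    prompt = torch.cat([subgoals, recent], dim=1)   # sub-goals prepended as prompt
    return transformer(prompt)                # high-quality next-action prediction
```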

TMLR Journal 2023 Journal Article

Limitation of Characterizing Implicit Regularization by Data-independent Functions

  • Leyang Zhang
  • Zhi-Qin John Xu
  • Tao Luo
  • Yaoyu Zhang

In recent years, understanding the implicit regularization of neural networks (NNs) has become a central task in deep learning theory. However, implicit regularization itself is not completely defined or well understood. In this work, we attempt to mathematically define and study implicit regularization. Importantly, we explore the limitations of a common approach to characterizing implicit regularization using data-independent functions. We propose two dynamical mechanisms, i.e., Two-point and One-point Overlapping mechanisms, based on which we provide two recipes for producing classes of one-hidden-neuron NNs that provably cannot be fully characterized by a given type of, or even by all, data-independent functions. In line with previous works, our results further emphasize the profound data dependency of implicit regularization in general, motivating a detailed future study of the data dependency of NN implicit regularization.

NeurIPS Conference 2022 Conference Paper

Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width

  • Hanxu Zhou
  • Zhou Qixuan
  • Zhenyuan Jin
  • Tao Luo
  • Yaoyu Zhang
  • Zhi-Qin Xu

Substantial work indicates that the dynamics of neural networks (NNs) are closely related to the initialization of their parameters. Inspired by the phase diagram for two-layer ReLU NNs with infinite width (Luo et al., 2021), we take a step towards drawing a phase diagram for three-layer ReLU NNs with infinite width. First, we derive a normalized gradient flow for three-layer ReLU NNs and obtain two key independent quantities that distinguish different dynamical regimes for common initialization methods. With carefully designed experiments and a large computational budget, for both synthetic and real datasets, we find that the dynamics of each layer can likewise be divided into a linear regime and a condensed regime, separated by a critical regime. The criterion is the relative change of the input weights (the input weight of a hidden neuron consists of the weight from its input layer to the hidden neuron and its bias term) as the width approaches infinity during training, which tends to $0$, $+\infty$, and $O(1)$, respectively. We also demonstrate that different layers can lie in different dynamical regimes during a single training process within a deep NN. In the condensed regime, we further observe the condensation of weights in isolated orientations with low complexity. Through experiments in the three-layer setting, our phase diagram suggests a complicated landscape of dynamical regimes, consisting of three possible regimes together with their mixtures, for deep NNs, and provides guidance for studying deep NNs in different initialization regimes, revealing the possibility of completely different dynamics emerging across the layers of a single deep NN.
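The per-layer regime criterion reduces to a one-line diagnostic, sketched below under our own normalization choice rather than the paper's exact one.

```python
import numpy as np

def layer_relative_change(W_init: np.ndarray, W_final: np.ndarray) -> float:
    """Relative change of one layer's input weights between initialization and
    the end of training; as width grows, -> 0 suggests the linear regime,
    O(1) the critical regime, and -> infinity the condensed regime."""
    return np.linalg.norm(W_final - W_init) / np.linalg.norm(W_init)
```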

NeurIPS Conference 2022 Conference Paper

Towards Understanding the Condensation of Neural Networks at Initial Training

  • Hanxu Zhou
  • Zhou Qixuan
  • Tao Luo
  • Yaoyu Zhang
  • Zhi-Qin Xu

Empirical works show that for ReLU neural networks (NNs) with small initialization, the input weights of hidden neurons (the input weight of a hidden neuron consists of the weight from its input layer to the hidden neuron and its bias term) condense onto isolated orientations. The condensation dynamics implies that training implicitly regularizes a NN towards one with a much smaller effective size. In this work, we illustrate the formation of condensation in multi-layer fully connected NNs and show that the maximal number of condensed orientations in the initial training stage is twice the multiplicity of the activation function, where "multiplicity" denotes the multiplicity of the root of the activation function at the origin. Our theoretical analysis confirms experiments for two cases: one for activation functions of multiplicity one with arbitrary-dimensional input, which covers many common activation functions, and the other for layers with one-dimensional input and arbitrary multiplicity. This work takes a step towards understanding how small initialization leads NNs to condensation at the initial training stage.
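Condensation can be read off empirically by counting distinct input-weight orientations among hidden neurons; the greedy clustering rule below is our illustrative choice, not the paper's measurement.

```python
import numpy as np

def condensed_orientations(W: np.ndarray, tol: float = 0.99) -> int:
    """W: (n_neurons, d+1) input weights including the bias column. Counts
    distinct orientations, grouping neurons whose unit input-weight vectors
    have cosine similarity above `tol`."""
    U = W / np.linalg.norm(W, axis=1, keepdims=True)   # normalize to directions
    reps: list[np.ndarray] = []
    for u in U:
        if not any(u @ r > tol for r in reps):          # new orientation found
            reps.append(u)
    return len(reps)

# Example: 5 neurons on one orientation, 3 on another -> 2 condensed orientations
W = np.vstack([np.tile([1.0, 2.0, 0.5], (5, 1)), np.tile([-1.0, 0.3, 2.0], (3, 1))])
print(condensed_orientations(W))   # -> 2
```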

NeurIPS Conference 2021 Conference Paper

Embedding Principle of Loss Landscape of Deep Neural Networks

  • Yaoyu Zhang
  • Zhongwang Zhang
  • Tao Luo
  • Zhiqin J Xu

Understanding the structure of the loss landscape of deep neural networks (DNNs) is obviously important. In this work, we prove an embedding principle: the loss landscape of a DNN "contains" all the critical points of all narrower DNNs. More precisely, we propose a critical embedding such that any critical point, e.g., a local or global minimum, of a narrower DNN can be embedded into a critical point/affine subspace of the target DNN with higher degeneracy while preserving the DNN output function. Note that this embedding structure of critical points holds for any training data, differentiable loss function, and differentiable activation function. This general structure of DNNs is starkly different from other nonconvex problems such as protein folding. Empirically, we find that a wide DNN is often attracted by highly degenerate critical points that are embedded from narrow DNNs. The embedding principle provides a new perspective from which to study the general ease of optimizing wide DNNs and unravels a potential implicit low-complexity regularization during training. Overall, our work provides a skeleton for the study of the loss landscape of DNNs and its implications, by which a more exact and comprehensive understanding can be anticipated in the near future.

JMLR Journal 2021 Journal Article

Phase Diagram for Two-layer ReLU Neural Networks at Infinite-width Limit

  • Tao Luo
  • Zhi-Qin John Xu
  • Zheng Ma
  • Yaoyu Zhang

How a neural network behaves during training under different choices of hyperparameters is an important question in the study of neural networks. In this work, inspired by the phase diagram in statistical mechanics, we draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit to completely characterize its dynamical regimes and their dependence on initialization-related hyperparameters. Through both experimental and theoretical approaches, we identify three regimes in the phase diagram, i.e., the linear regime, critical regime, and condensed regime, based on the relative change of input weights as the width approaches infinity, which tends to $0$, $O(1)$, and $+\infty$, respectively. In the linear regime, NN training dynamics are approximately linear, similar to a random feature model, with an exponential loss decay. In the condensed regime, we demonstrate through experiments that active neurons condense at several discrete orientations. The critical regime serves as the boundary between the above two regimes and exhibits an intermediate nonlinear behavior, with the mean-field model as a typical example. Overall, our phase diagram for the two-layer ReLU NN serves as a map for future studies and is a first step towards a more systematic investigation of the training behavior and the implicit regularization of NNs of different structures.
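As a toy numerical probe of this criterion (not the paper's normalized gradient-flow setup), one can train small two-layer ReLU networks of increasing width under a given initialization scale and measure the relative change of the input weights. Scalings, data, and hyperparameters below are illustrative.

```python
import numpy as np

def relative_change(width, scale, steps=500, lr=0.05, seed=0):
    """Train a two-layer ReLU net by plain gradient descent on a toy regression
    task and return the relative change of the input weights."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((32, 2)); y = np.sin(X[:, 0])    # toy regression data
    W = scale * rng.standard_normal((width, 2))              # input weights
    a = scale * rng.standard_normal(width) / width           # output weights
    W0 = W.copy()
    for _ in range(steps):
        H = np.maximum(X @ W.T, 0.0)                         # hidden activations
        r = H @ a - y                                        # residuals
        G = (H > 0) * np.outer(r, a)                         # dL/d(pre-activation)
        W -= lr * G.T @ X / len(X)                           # gradient step on W
        a -= lr * H.T @ r / len(X)                           # gradient step on a
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

for m in (16, 256, 4096):                                    # increasing width
    print(m, relative_change(m, scale=1.0))
```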

ICRA Conference 2017 Conference Paper

A high-precision robot-aided single-cell biopsy system

  • Adnan Shakoor
  • Tao Luo
  • Shuxun Chen
  • Mingyang Xie
  • James K. Mills
  • Dong Sun 0001

In this paper, we present a precise robot-aided single-cell surgery system that performs biopsies on cells <25 μm in diameter. A microfluidic chip is designed to arrange up to 100 individual cells in an array. A micropipette mounted on a 3-DOF micromanipulator, together with a computer mouse-operated high-precision XY stage, is developed to perform high-precision, high-throughput single-cell biopsy. The system is evaluated experimentally by extracting two organelles from adherent cells patterned in the microfluidic chip. The fluorescently labeled nuclei and mitochondria of human foreskin fibroblast cells are biopsied to demonstrate the capability of the proposed system. The survival rates of the semi-automated biopsy are 73% and 45% for mitochondrial and nuclear biopsies, respectively.