Author name cluster

Shu Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Establishing Best Practices in Building Rigorous Agentic Benchmarks

Yuxuan Zhu
Tengjun Jin
Yada Pruksachatkun
Andy Zhang
Shu Liu
Sasha Cui
Sayash Kapoor
Shayne Longpre

Benchmarks are essential for quantitatively tracking progress in AI. As AI agents become increasingly capable, researchers and practitioners have introduced agentic benchmarks to evaluate agents on complex, real-world tasks. These benchmarks typically measure agent capabilities by evaluating task outcomes via specific reward designs. However, we show that many agentic benchmarks have issues in task setup or reward design. For example, SWE-bench-Verified uses insufficient test cases, while $\tau$-bench counts empty responses as successes. Such issues can lead to under- or overestimation of agents’ performance by up to 100% in relative terms. To make agentic evaluation rigorous, we introduce the Agentic Benchmark Checklist (ABC), a set of guidelines that we synthesized from our benchmark-building experience, a survey of best practices, and previously reported issues. When applied to CVE-Bench, a benchmark with a particularly complex evaluation design, ABC reduces performance overestimation by 33%.


YNICL Journal 2025 Journal Article

Understanding the development of neural abnormalities in adolescents with mental health problems: A longitudinal study

Jiangyun Hou
Laurens van de Mortel
Weijian Liu
Shu Liu
Arne Popma
Dirk J.A. Smit
Guido van Wingen

BACKGROUND: Many mental health problems are neurodevelopmental in nature and have an onset during childhood. These disorders are associated with neural abnormalities, but it is unclear when these emerge and how this relates to the development of different mental health problems. METHODS: We identified children who developed mental health problems over two years and controls who remained healthy from the Adolescent Brain Cognitive Development (ABCD) study. Six mental health conditions (N = 58 to N = 173) were compared to controls (N = 2500) using separate linear models to assess group differences at baseline and in neurodevelopment for six disorders and six modalities. Shared neurodevelopmental changes were assessed by comparing spatial patterns of brain alterations across conditions at baseline and over time. RESULTS: The baseline data showed brain-wide abnormalities in children who later developed mental health problems, which were comparable between internalizing problems and different from externalizing problems. Brain-region specific abnormalities were limited to individuals who later developed oppositional defiant symptoms. The longitudinal data showed differential neurodevelopmental trajectories for specific brain regions in adolescents who developed ADHD, conduct and anxiety problems, as well as brain-wide neurodevelopmental abnormalities that were comparable between mental health problem groups compared to controls. CONCLUSIONS: Our findings reveal both shared and problem-specific neural abnormalities, providing critical insights into the evolving neurobiological mechanisms that underlie both shared and distinct mental health problems, highlighting how disorder-specific and transdiagnostic brain abnormalities emerge across development.


NeurIPS Conference 2024 Conference Paper

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

Shaoteng Liu
Haoqi Yuan
Minda Hu
Yanwei Li
Yukang Chen
Shu Liu
Zongqing Lu
Jiaya Jia

Large Language Models (LLMs) have demonstrated proficiency in utilizing various tools by coding, yet they face limitations in handling intricate logic and precise control. In embodied tasks, high-level planning is amenable to direct coding, while low-level actions often necessitate task-specific refinement, such as Reinforcement Learning (RL). To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent. The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks. This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline. Our approach outperforms traditional RL methods and existing GPT agents, demonstrating superior efficiency. In the Minecraft game, it rapidly obtains diamonds within a single day on an RTX3090. Additionally, it achieves SOTA performance across all designated MineDojo tasks.


AAAI Conference 2023 Conference Paper

Learning Context-Aware Classifier for Semantic Segmentation

Zhuotao Tian
Jiequan Cui
Li Jiang
Xiaojuan Qi
Xin Lai
Yixin Chen
Shu Liu
Jiaya Jia

Semantic segmentation is still a challenging task for parsing diverse contexts in different scenes, thus the fixed classifier might not be able to well address varying feature distributions during testing. Different from the mainstream literature where the efficacy of strong backbones and effective decoder heads has been well studied, in this paper, additional contextual hints are instead exploited via learning a context-aware classifier whose content is data-conditioned, decently adapting to different latent distributions. Since only the classifier is dynamically altered, our method is model-agnostic and can be easily applied to generic segmentation models. Notably, with only negligible additional parameters and +2% inference time, decent performance gain has been achieved on both small and large models with challenging benchmarks, manifesting substantial practical merits brought by our simple yet effective method. The implementation is available at https://github.com/tianzhuotao/CAC.


TMLR Journal 2023 Journal Article

Neural Monge Map estimation and its applications

Jiaojiao Fan
Shu Liu
Shaojun Ma
Hao-Min Zhou
Yongxin Chen

The Monge map refers to the optimal transport map between two probability distributions and provides a principled approach to transform one distribution to another. Neural network-based optimal transport map solvers have gained great attention in recent years. Along this line, we present a scalable algorithm for computing the neural Monge map between two probability distributions. Our algorithm is based on a weak form of the optimal transport problem, thus it only requires samples from the marginals instead of their analytic expressions, and can be applied in large-scale settings. Furthermore, using the duality gap we rigorously prove an a posteriori error analysis for the method. Our algorithm is suitable for general cost functions, compared with other existing methods for estimating Monge maps using samples, which are usually for quadratic costs. The performance of our algorithms is demonstrated through a series of experiments with both synthetic and realistic data, including text-to-image generation, class-preserving map, and image inpainting tasks.


IJCAI Conference 2021 Conference Paper

EmbedMask: Embedding Coupling for Instance Segmentation

Hui Ying
Zhaojin Huang
Shu Liu
Tianjia Shao
Kun Zhou

Current instance segmentation methods can be categorized into segmentation-based methods and proposal-based methods. The former perform segmentation first and then do clustering, while the latter detect objects first and then predict the mask for each object proposal. In this work, we propose a single-stage method, named EmbedMask, that unifies both methods by combining their advantages, so it can achieve good performance in instance segmentation and produce high-resolution masks at high speed. EmbedMask introduces two newly defined embeddings for mask prediction, which are pixel embedding and proposal embedding. During training, we enforce the pixel embedding to be close to its coupled proposal embedding if they belong to the same instance. During inference, pixels are assigned to the mask of the proposal if their embeddings are similar. This mechanism brings several benefits. First, the pixel-level clustering enables EmbedMask to generate high-resolution masks and avoids the complicated two-stage mask prediction. Second, the existence of proposal embedding simplifies and strengthens the clustering procedure, so our method can achieve higher speed and better performance than segmentation-based methods. Without bells and whistles, EmbedMask outperforms the state-of-the-art instance segmentation method Mask R-CNN on the challenging COCO dataset, obtaining more detailed masks at a higher speed.


ICML Conference 2021 Conference Paper

Learning Stochastic Behaviour from Aggregate Data

Shaojun Ma
Shu Liu
Hongyuan Zha
Haomin Zhou 0001

Learning nonlinear dynamics from aggregate data is a challenging problem because the full trajectory of each individual is not available: the individual observed at one time may not be observed at the next time point, or the identity of the individual is unavailable. This is in sharp contrast to learning dynamics with full trajectory data, on which the majority of existing methods are based. We propose a novel method using the weak form of the Fokker-Planck equation (FPE), a partial differential equation (PDE), to describe the density evolution of data in a sampled form, which is then combined with a Wasserstein generative adversarial network (WGAN) in the training process. In such a sample-based framework we are able to learn the nonlinear dynamics from aggregate data without explicitly solving the FPE. We demonstrate our approach in the context of a series of synthetic and real-world data sets.


IROS Conference 2019 Conference Paper

Mobile Robot Learning from Human Demonstrations with Nonlinear Model Predictive Control

Yingbai Hu
Guang Chen 0001
Xiangyu Ning
Jinhu Dong
Shu Liu
Alois C. Knoll

Learning by imitation is a powerful approach that can reduce the complexity of the search space. It can help a mobile robot acquire new skills from interaction with a human in a natural way. In this paper, dynamic movement primitives (DMPs) are used to imitate the trajectory of human walking. DMPs are a modified formulation of the virtual spring-damper (VSD) system that offers better fitting performance in learning. Further, to deal with the trajectory tracking problem of mobile robots, a novel nonlinear model predictive control (MPC) approach is proposed for motion control. The nonlinear MPC scheme applies a new neural network named Varying-parameter Lagrangian Neural Network (VP-LNN) to solve a Quadratic Programming (QP) problem by iterating over a finite receding horizon. The VP-LNN can converge to the global optimal solution. Thus, a new human-robot interaction (HRI) scheme for mobile robots is proposed, which can reduce the complexity of motion planning in various applications.


NeurIPS Conference 2018 Conference Paper

Sequential Context Encoding for Duplicate Removal

Lu Qi
Shu Liu
Jianping Shi
Jiaya Jia

Duplicate removal is a critical step for obtaining a reasonable number of predictions in prevalent proposal-based object detection frameworks. Albeit simple and effective, most previous algorithms utilized a greedy process without making sufficient use of properties of input data. In this work, we design a new two-stage framework to effectively select the appropriate proposal candidate for each object. The first stage suppresses most easy negative object proposals, while the second stage selects true positives in the reduced proposal set. These two stages share the same network structure, an encoder and a decoder formed as recurrent neural networks (RNN) with global attention and context gate. The encoder scans proposal candidates in a sequential manner to capture the global context information, which is then fed to the decoder to extract optimal proposals. In our extensive experiments, the proposed method outperforms other alternatives by a large margin.


YNICL Journal 2017 Journal Article

Polygenic risk for five psychiatric disorders and cross-disorder and disorder-specific neural connectivity in two independent populations

Tianqi Wang
Xiaolong Zhang
Ang Li
Meifang Zhu
Shu Liu
Wen Qin
Jin Li
Chunshui Yu

Major psychiatric disorders, including attention deficit hyperactivity disorder (ADHD), autism (AUT), bipolar disorder (BD), major depressive disorder (MDD), and schizophrenia (SZ), are highly heritable and polygenic. Evidence suggests that these five disorders have both shared and distinct genetic risks and neural connectivity abnormalities. To measure aggregate genetic risks, the polygenic risk score (PGRS) was computed. Two independent general populations (N = 360 and N = 323) were separately examined to investigate whether the cross-disorder PGRS and PGRS for a specific disorder were associated with individual variability in functional connectivity. Consistent altered functional connectivity was found with the bilateral insula: for the left supplementary motor area and the left superior temporal gyrus with the cross-disorder PGRS, for the left insula and right middle and superior temporal lobe associated with the PGRS for autism, for the bilateral midbrain, posterior cingulate, cuneus, and precuneus associated with the PGRS for BD, and for the left angular gyrus and the left dorsolateral prefrontal cortex associated with the PGRS for schizophrenia. No significant functional connectivity was found associated with the PGRS for ADHD and MDD. Our findings indicated that genetic effects on the cross-disorder and disorder-specific neural connectivity of common genetic risk loci are detectable in the general population. Our findings also indicated that polygenic risk contributes to the main neurobiological phenotypes of psychiatric disorders and that identifying cross-disorder and specific functional connectivity related to polygenic risks may elucidate the neural pathways for these disorders.
